## Sentiment Analysis Practical on Book Reviews Data
**VADER vs Transformer-Based Models**

In this notebook, we perform sentiment analysis on a sample of book reviews using two different approaches:

- **VADER**: a rule-based sentiment analysis method
- **Transformer-based models**: pre-trained deep learning models from Hugging Face

We intentionally apply **minimal text preprocessing** to preserve sentiment-related words.

---

## 1️⃣ Import Required Libraries

```python
import re  # regular expressions

import numpy as np
import pandas as pd
from transformers import pipeline  # Hugging Face high-level inference API
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # rule-based VADER scorer
```


## 2️⃣ Load the Dataset

```python
data = pd.read_csv("../../data/book_reviews_sample.csv")
```

```python
data.head()
```

```python
data['reviewText'][0]
```

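## 3️⃣ Text Cleaning (Minimal Preprocessing)

We apply only light preprocessing:

- Convert text to lowercase
- Remove punctuation

We do not remove stopwords or apply stemming or lemmatization, because these steps can discard or alter words that carry sentiment (for example, "not" appears in common English stopword lists). Even this light cleaning has a cost for VADER, which treats exclamation marks and ALL-CAPS words as intensity cues, so its scores on cleaned text may be somewhat less extreme than on the raw reviews.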
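
To make the effect of these two steps concrete, here is a minimal stdlib-only sketch that applies the same regular expression as the notebook to a single string. The function name `clean_review` is hypothetical. It shows that negation words such as "not" survive cleaning, while apostrophes inside contractions are removed along with other punctuation ("didn't" becomes "didnt").

```python
import re


def clean_review(text: str) -> str:
    """Lowercase the text, then drop every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower())


print(clean_review("Not good!!! I didn't like it."))  # prints: not good i didnt like it
```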

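```python
# Step 1: lowercase the raw review text
data['reviewText_clean'] = data['reviewText'].str.lower()
```

```python
# Step 2: remove punctuation from the lowercased column.
# (Applying re.sub to 'reviewText' here would overwrite the lowercasing from step 1.)
data['reviewText_clean'] = data['reviewText_clean'].apply(lambda text: re.sub(r"[^\w\s]", "", text))
```

```python
data.head()
```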

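## 4️⃣ VADER Sentiment Analysis
**Rule-Based Approach**

VADER's `polarity_scores()` returns four scores for each text: `neg`, `neu`, `pos`, and `compound`.
We use only the compound score, which summarizes the overall sentiment on a scale from -1 (most negative) to +1 (most positive).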
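
VADER arrives at the compound score by adding up the rule-adjusted valences of the lexicon words in a text and then squashing that unbounded sum into the range -1 to 1. The sketch below re-implements only the squashing step, x / sqrt(x² + α) with α = 15. The formula and constant come from our reading of the vaderSentiment source and are an assumption to verify against the installed version; `normalize` here is a standalone illustration, not a call into the library.

```python
import math

ALPHA = 15.0  # normalization constant; assumed to match VADER's default


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Map an unbounded sum of valences into [-1, 1] with x / sqrt(x^2 + alpha)."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp in case floating-point error pushes the value just outside [-1, 1]
    return max(-1.0, min(1.0, norm_score))


for raw in (-8.0, -1.5, 0.0, 1.5, 8.0):
    print(raw, round(normalize(raw), 4))
```

Because the curve flattens as the raw sum grows, differences between compound scores close to -1 or +1 are small even when the underlying sums differ a lot.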

```python
vader_sentiment = SentimentIntensityAnalyzer()
```

```python
# Keep only the compound score for each cleaned review
data['vader_sentiment_score'] = data['reviewText_clean'].apply(
    lambda review: vader_sentiment.polarity_scores(review)['compound']
)
```

```python
data.head()
```

## 5️⃣ Convert VADER Scores to Sentiment Labels

We classify compound scores into three sentiment categories:

- Negative
- Neutral
- Positive
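
The thresholds used in the next cell are passed to `pd.cut`, which by default treats each interval as right-closed. The stdlib sketch below reproduces that rule with `bisect` so the edge cases are easy to check; `label_compound` is a hypothetical helper, not part of pandas. A score of exactly -0.1 counts as negative, exactly 0.1 as neutral, and exactly -1 falls outside every interval unless the lowest edge is explicitly included, which is what `pd.cut(..., include_lowest=True)` does. (The VADER README suggests a neutral band of ±0.05; this notebook uses the wider ±0.1, and the helper accepts either.)

```python
from bisect import bisect_left
from typing import Optional, Sequence


def label_compound(
    score: float,
    bins: Sequence[float] = (-1.0, -0.1, 0.1, 1.0),
    names: Sequence[str] = ("negative", "neutral", "positive"),
    include_lowest: bool = True,
) -> Optional[str]:
    """Assign a label using right-closed intervals (bins[i], bins[i+1]], like pd.cut."""
    if include_lowest and score == bins[0]:
        return names[0]  # the lowest edge belongs to the first interval only on request
    j = bisect_left(bins, score)  # index of the first edge >= score
    if j == 0 or j == len(bins):
        return None  # outside every interval (pd.cut would produce NaN)
    return names[j - 1]


for s in (-1.0, -0.1, 0.0, 0.1, 0.65):
    print(s, label_compound(s), label_compound(s, include_lowest=False))
```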

```python
# bins: the edges of the score ranges used to divide the compound scores
bins = [-1, -0.1, 0.1, 1]
names = ['negative', 'neutral', 'positive']

# pd.cut segments values into bins. Intervals are right-closed by default:
# (-1, -0.1], (-0.1, 0.1], (0.1, 1]. include_lowest=True makes the first interval
# include -1, so a compound score of exactly -1 is labelled instead of becoming NaN.
data['vader_sentiment_label'] = pd.cut(
    data['vader_sentiment_score'], bins, labels=names, include_lowest=True
)
```

```python
data.head()
```

```python
data['vader_sentiment_label'].value_counts().plot.bar()
```

## 6️⃣ Transformer-Based Sentiment Analysis
**Pre-Trained Model**

Modern sentiment analysis often relies on transformer models, which represent each word in the context of the surrounding text and therefore tend to handle negation and word relationships better than rule-based methods.

We use the Hugging Face pipeline API, which allows us to run sentiment analysis without training a model from scratch.
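
Under the hood, a text-classification pipeline tokenizes each review, runs the model to get one raw score (logit) per class, and turns those logits into probabilities with a softmax before reporting the most likely label. The stdlib sketch below imitates only that last step. The logit values and the `id2label` mapping are illustrative assumptions (the mapping mirrors the binary NEGATIVE/POSITIVE labels of the default sentiment model), and `postprocess` is a hypothetical helper, not the library's code.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits: List[float], id2label: Dict[int, str]) -> List[Dict[str, object]]:
    """Return the top label and its probability as a list of one {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical logits for one review: class 0 = NEGATIVE, class 1 = POSITIVE
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-2.0, 3.0], id2label))
```

The real pipeline's output has the same `{'label': ..., 'score': ...}` shape, which is why the notebook reads the `'label'` key of each result.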

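```python
# Load a pre-trained sentiment model from the Hugging Face Hub.
# Without a model argument the pipeline falls back to a default model (this one,
# at the time of writing) and prints a warning; naming it keeps results reproducible.
transformer_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
```

```python
# Classify all cleaned reviews in one call; truncation=True cuts reviews longer than
# the model's maximum input length instead of raising an error.
results = transformer_pipeline(data['reviewText_clean'].tolist(), truncation=True)
data['transformer_sentiment_label'] = [result['label'] for result in results]
```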

## 7️⃣ Transformer Sentiment Distribution

```python
data['transformer_sentiment_label'].value_counts().plot.bar()
```
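
The two bar charts are hard to compare side by side because the transformer labels are binary while VADER's are three-way. One simple comparison is to cross-tabulate the label pairs and measure agreement only on reviews VADER did not call neutral. The stdlib sketch below does this with `collections.Counter`; `compare_labels` and the sample labels are hypothetical, not results from this dataset.

```python
from collections import Counter
from typing import Sequence, Tuple


def compare_labels(vader_labels: Sequence[str], transformer_labels: Sequence[str]) -> Tuple[Counter, float]:
    """Cross-tabulate label pairs and compute agreement on reviews VADER did not call neutral."""
    pairs = list(zip(vader_labels, (t.lower() for t in transformer_labels)))
    table = Counter(pairs)  # (vader_label, transformer_label) -> count
    polar = [(v, t) for v, t in pairs if v != "neutral"]
    agreement = sum(v == t for v, t in polar) / len(polar) if polar else float("nan")
    return table, agreement


# Hypothetical labels for four reviews
vader = ["positive", "negative", "neutral", "positive"]
transformer = ["POSITIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]
table, agreement = compare_labels(vader, transformer)
print(table)
print(f"agreement on polar reviews: {agreement:.2f}")
```

Inside the notebook, `pd.crosstab(data['vader_sentiment_label'], data['transformer_sentiment_label'])` produces the same table as a DataFrame.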
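
## 🔍 Comparison Insights

- VADER is fast and interpretable, but it relies on a fixed lexicon and hand-written rules
- Transformer models are generally better at capturing context, negation, and subtle sentiment
- The default sentiment-analysis model predicts only `POSITIVE` or `NEGATIVE`, so unlike VADER it has no neutral class
- Minimal preprocessing helps preserve sentiment meaning
- Transformers typically outperform rule-based approaches on complex text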
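
The "interpretable but rule-based" trade-off can be illustrated with a toy lexicon scorer that borrows VADER's idea of flipping and damping a word's valence when a negation word appears shortly before it. This is a hypothetical sketch, not VADER itself: the four-word lexicon and its valences are made up, while the scaling factor -0.74 and the three-word look-back follow our reading of the vaderSentiment source and should be treated as assumptions. Every part of the score traces back to a specific word and rule, but anything the rules do not anticipate is simply missed.

```python
from typing import Dict, Set

# Hypothetical valences; VADER's real lexicon has thousands of human-rated entries
LEXICON: Dict[str, float] = {"good": 1.9, "great": 3.1, "boring": -1.3, "awful": -2.0}
# Apostrophe-free forms, since the cleaning step strips punctuation
NEGATIONS: Set[str] = {"not", "never", "no", "didnt", "isnt"}
NEGATION_SCALAR = -0.74  # assumed to mirror VADER's negation constant
LOOK_BACK = 3            # number of preceding words checked for a negation


def toy_score(text: str) -> float:
    """Sum word valences, flipping and damping any word preceded by a nearby negation."""
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        if valence and any(w in NEGATIONS for w in words[max(0, i - LOOK_BACK):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return total


for review in ("the plot was good", "the plot was not good", "the plot was hardly good"):
    print(f"{review!r}: {toy_score(review):+.2f}")
```

The third review shows the limitation: "hardly" is not in the toy negation list, so the score stays positive. Adding it fixes this case, but each fix is another hand-written rule, which is the gap that context-aware transformer models aim to close.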

## 🎯 Final Takeaways

- Rule-based sentiment analysis is simple but limited
- Transformer models represent the modern standard for NLP sentiment tasks
- Pre-trained models enable rapid experimentation
- Choosing the right model depends on:
  - domain
  - text style
  - application requirements