
# Sentiment Analysis of Product Reviews 🛍️🧠

This project performs **sentiment analysis** on product reviews using the **VADER** sentiment analysis tool from the NLTK library.

Each review is scored from its text alone and then labeled **positive**, **neutral**, or **negative** according to VADER's compound score.


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize

plt.style.use('ggplot')


In [None]:

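# Tokenizer models used by word_tokenize
nltk.download('punkt')
# Newer NLTK releases look for 'punkt_tab' instead; on older releases this call is a harmless no-op
nltk.download('punkt_tab')
# Lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')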


## Load Dataset

In [None]:

# Load the dataset - update the path if needed
df = pd.read_csv('data/sample_reviews.csv', encoding='unicode_escape')

# Display shape and first few rows
print(f"Dataset contains {df.shape[0]} rows and {df.shape[1]} columns")
df.head()


## Preprocess Dataset

In [None]:

# For demo purposes, keep only the first 500 reviews
df = df.head(500)

# Drop rows with missing review text; VADER needs a string for every review.
# Chaining avoids in-place modification of a slice of the original frame.
df = df.dropna(subset=['body']).reset_index(drop=True)
print(f"Trimmed dataset contains {df.shape[0]} reviews.")
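
`dropna` only removes missing values, so reviews that contain nothing but whitespace would still reach the scorer. The sketch below shows one way to filter those as well; the `toy` frame and its contents are made up for illustration and only the `body` column name comes from this notebook.

```python
import pandas as pd

# Toy data standing in for the real reviews (values are illustrative only)
toy = pd.DataFrame({'body': ['Great product', None, '   ', 'Broke after a week']})

# Drop missing values first, then rows whose text is empty after stripping whitespace
cleaned = toy.dropna(subset=['body'])
cleaned = cleaned[cleaned['body'].str.strip() != ''].reset_index(drop=True)

print(cleaned)
```

Whether blank reviews should be dropped or kept as neutral depends on the dataset, so treat this as an optional extra step.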


## 🔍 Tokenization Example (Optional Demo)

In [None]:

# Pick the fifth review by position so this works regardless of the index labels
example_text = df['body'].iloc[4]
print("Example Review:", example_text)
word_tokenize(example_text)


## 🧠 VADER Sentiment Analysis with Review ID Mapping

In [None]:

sia = SentimentIntensityAnalyzer()

# Run the polarity score on the entire dataset
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    body = row['body']
    review_id = i  # If you have an 'Id' column, replace with row['Id']
    res[review_id] = sia.polarity_scores(body)

# Convert results to DataFrame
vaders = pd.DataFrame(res).T
vaders = vaders.reset_index().rename(columns={'index': 'review_id'})
vaders = vaders.merge(df, left_on='review_id', right_index=True)
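
The reshaping in the cell above is easier to follow on a tiny example. The sketch below uses hand-written placeholder scores (not real VADER output) and hypothetical `toy_*` names to show why the transpose is needed and how the merge lines scores up with reviews.

```python
import pandas as pd

# Toy reviews; the text and scores below are made-up placeholders, not real VADER output
toy_df = pd.DataFrame({'body': ['loved it', 'terrible', 'it is a phone']})
toy_res = {
    0: {'neg': 0.0, 'neu': 0.2, 'pos': 0.8, 'compound': 0.6},
    1: {'neg': 0.9, 'neu': 0.1, 'pos': 0.0, 'compound': -0.5},
    2: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
}

# A dict of dicts puts the outer keys in columns, so transpose to get one row per review
toy_scores = pd.DataFrame(toy_res).T
toy_scores = toy_scores.reset_index().rename(columns={'index': 'review_id'})

# Join the scores back onto the original rows using the DataFrame index
toy_merged = toy_scores.merge(toy_df, left_on='review_id', right_index=True)
print(toy_merged.columns.tolist())
```

Because the join key is the row index, this only lines up correctly if the index has not changed between scoring and merging.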


In [None]:

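# Add sentiment label using the conventional VADER cutoffs:
# compound >= 0.05 is positive, compound <= -0.05 is negative, anything between is neutral
vaders['sentiment'] = vaders['compound'].apply(
    lambda c: 'positive' if c >= 0.05 else ('negative' if c <= -0.05 else 'neutral')
)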

vaders.head()
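
The labeling rule can also be written as a small named function, which makes the cutoffs easier to test and adjust. This is a minimal sketch: `label_sentiment` and its `threshold` parameter are hypothetical names, and the inclusive comparisons follow the cutoffs usually suggested for VADER rather than anything specific to this dataset.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Check the behavior at and around the boundaries
for score in (0.6, 0.05, 0.0, -0.05, -0.6):
    print(score, label_sentiment(score))
```

It could then be applied with `vaders['compound'].apply(label_sentiment)`; widening `threshold` sends more borderline reviews to the neutral class.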


## 📊 Visualize Sentiment Distribution

In [None]:

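# Fix the bar order so the plot reads the same way regardless of row order
sns.countplot(x='sentiment', data=vaders, palette='pastel',
              order=['positive', 'neutral', 'negative'])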
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Number of Reviews')
plt.show()
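
Raw counts are hard to compare across samples of different sizes, so it can help to report each label's share as a percentage alongside the plot. The sketch below uses a made-up `toy_labels` series; in the notebook the same call would run on `vaders['sentiment']`.

```python
import pandas as pd

# Illustrative labels only; replace with vaders['sentiment'] in the notebook
toy_labels = pd.Series(['positive', 'positive', 'neutral', 'negative'], name='sentiment')

# normalize=True returns fractions, which are scaled to percentages here
shares = toy_labels.value_counts(normalize=True).mul(100).round(1)
print(shares)
```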



## 📌 Conclusion

- VADER is a rule-based, lexicon-driven scorer, so it needs no training data and is quick to run on review text.
- Plotting the label counts shows which sentiment dominates the sample and which is rare.
- Businesses can use this breakdown to monitor customer opinion and decide where to respond first.

---

**Next Steps:** Try other sentiment models like TextBlob or transformer-based models for deeper insights.
