# 03 — Sentiment Mapping (Text + Rating Combined)

In this notebook, we score the review text with VADER, derive a second sentiment label from the star rating,
and combine the two into a final sentiment label.

✅ Input: `../data/netflix_reviews_clean.csv`  
✅ Output: `../data/netflix_reviews_Cleaned.csv`


## Step 1: Load Clean Dataset

In [1]:
import pandas as pd
from pathlib import Path

CLEAN_DATA_PATH = Path("../data/netflix_reviews_clean.csv")
df = pd.read_csv(CLEAN_DATA_PATH)

print(f"✅ Loaded cleaned dataset with {len(df)} reviews")
df.head()


✅ Loaded cleaned dataset with 41238 reviews


Unnamed: 0,username,review,rating,date,country,source,clean_review
0,Robert,"brainwashing children. don't deny it, the proo...",1,2025-10-08 00:57:12,NG,Play Store,brainwashing children dont deny proof pudding
1,Abraham Bernabe,disgusting woke agenda. just cancel and uninst...,1,2025-10-08 00:35:03,NG,Play Store,disgusting woke agenda cancel uninstall garbage
2,Kyle Martin,Please fix the pausing issue. I am trying to w...,2,2025-10-08 00:33:36,NG,Play Store,please fix pausing issue trying watch monday n...
3,Jarrett,"Doesn't really have that much titles, heh lol",3,2025-10-08 00:30:51,NG,Play Store,doesnt really much titles heh lol
4,Michael Raber,"gotta pay for the app, they raise the price wi...",1,2025-10-08 00:26:11,NG,Play Store,gotta pay app raise price warning work half time


## Step 2: Initialize VADER Sentiment Analyzer


In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()


## Step 3: Sentiment from Text (VADER)



In [3]:
# 1. VADER Sentiment Analysis & Categorization

def categorize_netflix_sentiment(score):
    if score >= 0.05:
        return "positive"
    elif score <= -0.05:
        return "negative"
    else:
        return "neutral"
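
To make the cutoffs concrete, the sketch below runs the same thresholds over a few hand-picked compound scores, including both boundary values. The scores are illustrative assumptions, not values taken from the dataset, and the function is restated only so the snippet runs on its own.

```python
def categorize_netflix_sentiment(score):
    """Map a VADER compound score in [-1, 1] to a label using the 0.05 cutoffs."""
    if score >= 0.05:
        return "positive"
    elif score <= -0.05:
        return "negative"
    else:
        return "neutral"

# Illustrative scores (assumed values), chosen to hit both boundaries.
sample_scores = [0.6597, 0.05, 0.049, 0.0, -0.049, -0.05, -0.4215]
labels = [categorize_netflix_sentiment(s) for s in sample_scores]

for score, label in zip(sample_scores, labels):
    print(f"{score:+.4f} -> {label}")
```

Both cutoffs are inclusive, so a score of exactly 0.05 or -0.05 is labeled positive or negative rather than neutral.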


In [4]:
# 2. Apply VADER to each review
df["vader_score"] = df["clean_review"].apply(lambda x: sia.polarity_scores(str(x))["compound"])
df["sentiment"] = df["vader_score"].apply(categorize_netflix_sentiment)

## Step 4: Sentiment from Rating


In [5]:
# 3. Sentiment from star ratings
# -----------------------------
def sentiment_from_rating(rating):
    if rating >= 4:   # 4 or 5 stars
        return "positive"
    elif rating == 3: # middle rating
        return "neutral"
    else:             # 1 or 2 stars
        return "negative"

In [6]:
# 4. Apply to dataset
df["sentiment_rating"] = df["rating"].apply(sentiment_from_rating)


## Step 5: Combined Sentiment Logic


In [7]:
# 5. Combined sentiment function
def combined_sentiment(vader, rating):
    if vader == rating:
        return vader  # Both agree
    elif vader == "neutral":
        return rating  # Trust rating if VADER is neutral
    elif rating == "neutral":
        return vader  # Trust VADER if rating is neutral
    else:
        # If one says positive and the other says negative → neutral
        return "neutral"
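
Since each input can take only three values, the rule can be checked exhaustively. The sketch below enumerates all nine (VADER, rating) pairs and prints the resulting label; the `LABELS` list and `truth_table` name are introduced here for illustration only.

```python
from itertools import product

def combined_sentiment(vader, rating):
    """Restated from the cell above so the snippet is self-contained."""
    if vader == rating:
        return vader          # both agree
    elif vader == "neutral":
        return rating         # VADER is neutral: trust the rating
    elif rating == "neutral":
        return vader          # rating is neutral: trust VADER
    else:
        return "neutral"      # positive vs negative conflict

LABELS = ["positive", "neutral", "negative"]  # hypothetical helper list

# Every (vader, rating) pair mapped to its combined label.
truth_table = {
    (vader, rating): combined_sentiment(vader, rating)
    for vader, rating in product(LABELS, repeat=2)
}

for (vader, rating), combined in truth_table.items():
    print(f"VADER={vader:<8} rating={rating:<8} -> {combined}")
```

The only pairs that end up neutral are the two direct conflicts and the pair where both inputs are already neutral.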


In [8]:
# 6. Apply to dataset
df["sentiment_combined"] = df.apply(lambda x: combined_sentiment(x["sentiment"], x["sentiment_rating"]), axis=1)

## Step 6: Sentiment Alignment Analysis 

This section compares how VADER text sentiment aligns with user star-rating sentiment and analyzes how conflicts are resolved through the combined sentiment logic.


In [9]:
# 7. Compare VADER vs Rating
# -----------------------------
comparison = (df["sentiment"] == df["sentiment_rating"]).mean()
print(f"✅ VADER matches rating-based sentiment {comparison*100:.2f}% of the time.")

✅ VADER matches rating-based sentiment 59.73% of the time.
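
A single match rate does not show where the two labels disagree. One possible way to see that is a cross-tabulation, sketched below on a small toy frame; the rows are made-up stand-ins for `df`, so the printed proportions say nothing about the real dataset.

```python
import pandas as pd

# Toy stand-in for the notebook's df (assumed values, not real reviews).
toy = pd.DataFrame({
    "sentiment":        ["positive", "positive", "neutral", "negative", "positive", "neutral"],
    "sentiment_rating": ["positive", "negative", "negative", "negative", "neutral", "neutral"],
})

# Share of all rows in each (VADER label, rating label) cell.
agreement = pd.crosstab(toy["sentiment"], toy["sentiment_rating"], normalize="all")
print(agreement.round(2))

# The diagonal cells are the agreements, so they sum to the match rate.
labels = ["positive", "neutral", "negative"]
diagonal_share = sum(agreement.loc[label, label] for label in labels)
match_rate = (toy["sentiment"] == toy["sentiment_rating"]).mean()
print(f"diagonal share = {diagonal_share:.2f}, match rate = {match_rate:.2f}")
```

On the real data, the off-diagonal cells would show whether most mismatches come from neutral VADER scores or from direct positive/negative conflicts.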


In [10]:
# 8. Compare combined sentiment vs rating
comparison = (df["sentiment_combined"] == df["sentiment_rating"]).mean()
print(f"✅ Combined sentiment matches rating-based sentiment {comparison*100:.2f}% of the time.")

✅ Combined sentiment matches rating-based sentiment 79.86% of the time.
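
The jump from 59.73% to 79.86% follows directly from the combination rule: the combined label equals the rating label exactly when VADER already agrees with the rating or VADER is neutral. The 20.13-point gap is therefore the share of reviews with a neutral VADER label and a non-neutral rating. The sketch below checks that identity over all nine label pairs; it is a verification of the logic, not a re-run on the dataset.

```python
from itertools import product

import pandas as pd

def combined_sentiment(vader, rating):
    """Restated from Step 5 so the snippet is self-contained."""
    if vader == rating:
        return vader
    elif vader == "neutral":
        return rating
    elif rating == "neutral":
        return vader
    else:
        return "neutral"

LABELS = ["positive", "neutral", "negative"]  # hypothetical helper list

# One row per possible (VADER, rating) pair.
pairs = pd.DataFrame(list(product(LABELS, repeat=2)),
                     columns=["sentiment", "sentiment_rating"])
pairs["sentiment_combined"] = [
    combined_sentiment(v, r)
    for v, r in zip(pairs["sentiment"], pairs["sentiment_rating"])
]

# Rows where the combined label matches the rating label.
combined_matches = pairs["sentiment_combined"] == pairs["sentiment_rating"]

# Claimed equivalent condition: VADER agrees with the rating, or VADER is neutral.
predicted = (pairs["sentiment"] == pairs["sentiment_rating"]) | (pairs["sentiment"] == "neutral")

identity_holds = bool((combined_matches == predicted).all())
print(f"Identity holds for all {len(pairs)} label pairs: {identity_holds}")
```

Because the combined label defers to the rating whenever VADER is neutral, its agreement with the rating is higher by construction and should not be read as independent validation.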


## Step 7: Summary Statistics 

In [11]:
# 9. Summary statistics
print("\n📊 Sentiment Summary")
print(df['sentiment'].value_counts())


📊 Sentiment Summary
sentiment
positive    22570
negative     9868
neutral      8800
Name: count, dtype: int64


In [12]:
# 10. Summary statistics for rating-based sentiment
print("\n📊 Rating-based Sentiment Summary")
print(df['sentiment_rating'].value_counts())


📊 Rating-based Sentiment Summary
sentiment_rating
negative    19770
positive    19094
neutral      2374
Name: count, dtype: int64


In [13]:
# 11. Summary statistics for combined sentiment
print("\n📊 Combined Sentiment Summary")
print(df['sentiment_combined'].value_counts())


📊 Combined Sentiment Summary
sentiment_combined
positive    19729
negative    14578
neutral      6931
Name: count, dtype: int64
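
The three summaries are easier to compare side by side as proportions. A minimal sketch, assuming the same three column names as above and using a small toy frame in place of `df`:

```python
import pandas as pd

# Toy stand-in for the notebook's df (assumed values, not real reviews).
toy = pd.DataFrame({
    "sentiment":          ["positive", "positive", "neutral", "negative"],
    "sentiment_rating":   ["positive", "negative", "negative", "negative"],
    "sentiment_combined": ["positive", "neutral", "negative", "negative"],
})

label_cols = ["sentiment", "sentiment_rating", "sentiment_combined"]

# One column of label proportions per sentiment column; labels missing
# from a column are filled with 0.0 after alignment.
summary = pd.concat(
    {col: toy[col].value_counts(normalize=True) for col in label_cols},
    axis=1,
).fillna(0.0)

print(summary.round(3))
```

Replacing `toy` with `df` would give the same table for the full dataset.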


## Step 8: Save Cleaned Dataset


In [14]:
# 12. Save Cleaned dataset
CLEANED_OUTPUT_LABELED_PATH = Path("../data/netflix_reviews_Cleaned.csv")
df.to_csv(CLEANED_OUTPUT_LABELED_PATH, index=False)

print("\n✅ Sentiment-labeled dataset saved to 'data/netflix_reviews_Cleaned.csv'")
print(f"📊 Final labeled reviews: {len(df)}")
df.head()



✅ Sentiment-labeled dataset saved to 'data/netflix_reviews_Cleaned.csv'
📊 Final labeled reviews: 41238


Unnamed: 0,username,review,rating,date,country,source,clean_review,vader_score,sentiment,sentiment_rating,sentiment_combined
0,Robert,"brainwashing children. don't deny it, the proo...",1,2025-10-08 00:57:12,NG,Play Store,brainwashing children dont deny proof pudding,-0.119,negative,negative,negative
1,Abraham Bernabe,disgusting woke agenda. just cancel and uninst...,1,2025-10-08 00:35:03,NG,Play Store,disgusting woke agenda cancel uninstall garbage,-0.6597,negative,negative,negative
2,Kyle Martin,Please fix the pausing issue. I am trying to w...,2,2025-10-08 00:33:36,NG,Play Store,please fix pausing issue trying watch monday n...,-0.1779,negative,negative,negative
3,Jarrett,"Doesn't really have that much titles, heh lol",3,2025-10-08 00:30:51,NG,Play Store,doesnt really much titles heh lol,0.235,positive,neutral,positive
4,Michael Raber,"gotta pay for the app, they raise the price wi...",1,2025-10-08 00:26:11,NG,Play Store,gotta pay app raise price warning work half time,-0.4215,negative,negative,negative


## Final Sentiment Label Used for Analysis: `sentiment_combined`

For this project, multiple sentiment labels were generated:

| Column | Description |
|--------|------------|
| `sentiment` | Sentiment classified using **VADER** based on review text only. |
| `sentiment_rating` | Sentiment derived from **user star rating** (⭐). |
| `sentiment_combined` | ✅ **Final label used for all analysis and modeling.** |

## ✅ Why I Chose `sentiment_combined`
Using only VADER or only star rating can be misleading:

- 📌 **Star ratings** can be **coarse**: a 3-star rating maps to neutral even when the review text is clearly positive or negative.
- 📌 **Text sentiment (VADER)** can **misread sarcasm and very short reviews**.
- ✅ By **combining both**, I created a more **balanced and realistic sentiment signal**:
  - If both agree → keep that label.
  - If exactly one is neutral → use the other, non-neutral label.
  - If one is positive and the other negative → mark as **neutral** to reduce noise.

> 👉 Therefore, **all further analysis, visualizations, and ML modeling use `sentiment_combined` as the final sentiment column.**


In [15]:
print(df['sentiment_combined'].value_counts())
print("\n✅ Confirmed: Using `sentiment_combined` for all downstream analysis.")


sentiment_combined
positive    19729
negative    14578
neutral      6931
Name: count, dtype: int64

✅ Confirmed: Using `sentiment_combined` for all downstream analysis.


### Sentiment Labeling Completed!

➡️ Next Notebook: **04_exploratory_data_analysis.ipynb**
