# YouTube Comments Sentiment Analysis
### Full Interactive Demo Notebook

This notebook demonstrates a complete pipeline for:
- Fetching YouTube comments
- Filtering irrelevant comments
- Detecting emoji sentiment
- Detecting gibberish text
- Applying transformer-based sentiment analysis
- Exporting results to CSV


## 1️⃣ Install Dependencies
Run this cell if you are using a fresh environment (e.g. Google Colab).

In [None]:
!pip install google-api-python-client pandas transformers emoji torch matplotlib --quiet

## 2️⃣ Imports & Configuration
Insert your YouTube API key and target video ID below.

In [None]:
from googleapiclient.discovery import build
import pandas as pd
from transformers import pipeline
import emoji
import re

API_KEY = "YOUR_API_KEY_HERE"
VIDEO_ID = "YOUR_VIDEO_ID_HERE"  # the value after "v=" in the video URL
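A key hard-coded in a shared notebook is easy to leak. As a minimal sketch, assuming the key has been exported in an environment variable called `YOUTUBE_API_KEY` (a hypothetical name, not something the notebook defines), it could be read like this, falling back to the placeholder when the variable is unset:

```python
import os

def load_api_key(var_name="YOUTUBE_API_KEY", default="YOUR_API_KEY_HERE"):
    """Return the API key from the environment, or the placeholder if unset."""
    # var_name is an assumed variable name; use whatever your setup defines.
    return os.environ.get(var_name, default)

API_KEY = load_api_key()
```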

## 3️⃣ Initialize YouTube Client

In [None]:
youtube = build("youtube", "v3", developerKey=API_KEY)

## 4️⃣ Keyword Filter Configuration

In [None]:
keywords = [
    "which", "wallpaper", "wallpapers", "confused", "dubbing",
    "intro", "intros", "lottery", "marques", "viewers",
    "content", "shirt", "MKBHD", "mustache", "he",
    "guy", "or", "choice", "watching"
]

## 5️⃣ Fetch YouTube Comments

In [None]:
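def getComments(video_id, max_comments=500):
    """Return up to max_comments top-level comments for video_id as plain text."""
    comments = []
    next_page = None

    while len(comments) < max_comments:
        request = youtube.commentThreads().list(
            part='snippet',
            videoId=video_id,
            # The API returns at most 100 threads per page; ask only for what is still needed.
            maxResults=min(100, max_comments - len(comments)),
            pageToken=next_page,
            textFormat='plainText'
        )
        response = request.execute()

        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        # A missing nextPageToken means this was the last page.
        next_page = response.get('nextPageToken')
        if not next_page:
            break

    return comments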
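The pagination loop can be exercised without an API key by separating it from the network call. The sketch below is an illustration under that assumption: `collect_paged` and `fake_fetch` are hypothetical names, and the three-page dictionary is made-up data standing in for API responses.

```python
def collect_paged(fetch_page, max_items=500):
    """Collect items page by page until max_items is reached or pages run out.

    fetch_page(token) is a hypothetical callable returning (items, next_token).
    """
    items, token = [], None
    while len(items) < max_items:
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if not token:          # no further page available
            break
    return items[:max_items]   # a page may overshoot the limit, so trim


# Fake three-page source standing in for the API (illustrative data only).
_pages = {None: (["a", "b"], "p2"), "p2": (["c", "d"], "p3"), "p3": (["e"], None)}

def fake_fetch(token):
    return _pages[token]

print(collect_paged(fake_fetch, max_items=3))   # ['a', 'b', 'c']
print(collect_paged(fake_fetch, max_items=10))  # ['a', 'b', 'c', 'd', 'e']
```

The final slice matters whenever the limit is not a multiple of the page size: the loop condition is only checked between pages.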

## 6️⃣ Filter Comments

In [None]:
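def filterComments(comments, keywords):
    # Match whole words only, so short keywords like "he" or "or" do not hit "the" or "for".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    return [c for c in comments if not (keywords and pattern.search(c))]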
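Short keywords such as "he" and "or" behave very differently under substring matching and whole-word matching. The sketch below compares the two on one sample comment; `contains_substring` and `contains_word` are hypothetical helper names used only for this illustration.

```python
import re

def contains_substring(comment, keywords):
    """True if any keyword appears anywhere in the comment, even inside a word."""
    return any(k.lower() in comment.lower() for k in keywords)

def contains_word(comment, keywords):
    """True only if a keyword appears as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, comment, flags=re.IGNORECASE) is not None

sample = "The camera is great for the price"
short_keywords = ["he", "or"]

print(contains_substring(sample, short_keywords))  # True: "The" contains "he"
print(contains_word(sample, short_keywords))       # False: no standalone "he" or "or"
```

With substring matching, a two-letter keyword can discard most English comments, so it is worth checking how many comments survive the filter before trusting the results.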

## 7️⃣ Emoji Sentiment Engine

In [None]:
positive_emojis = {"😂","🤣","😍","❤️","🔥","😁","😊","😃","👍","🙏"}
negative_emojis = {"😡","🤬","😢","😭","👎","😠","🙁","😞"}

def emoji_sentiment_score(text):
    # Count substrings rather than single characters: "❤️" is two code points
    # (heart + variation selector), so a per-character test never matches it.
    pos = sum(text.count(e) for e in positive_emojis)
    neg = sum(text.count(e) for e in negative_emojis)

    if pos > neg:
        return "POSITIVE"
    if neg > pos:
        return "NEGATIVE"
    return None  # no listed emoji, or a tie
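Some emoji are stored as more than one code point, which affects how they can be counted in a string. The following sketch, using the hypothetical helper `count_emojis` and escape sequences instead of literal emoji, shows that a character-by-character membership test misses the red heart while substring counting finds it.

```python
HEART = "\u2764\ufe0f"   # red heart: two code points (heart + variation selector)
JOY = "\U0001F602"       # face with tears of joy: a single code point

def count_emojis(text, emoji_set):
    """Count each emoji as a substring, so multi-code-point emoji still match."""
    return sum(text.count(e) for e in emoji_set)

text = "love it " + HEART + JOY + JOY
print(len(HEART))                          # 2
print(sum(ch in {HEART} for ch in text))   # 0: no single character equals the pair
print(count_emojis(text, {HEART, JOY}))    # 3
```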

## 8️⃣ Gibberish Detection

In [None]:
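def is_gibberish(text):
    # Judge only the letters; digits, punctuation and emoji are ignored.
    cleaned = re.sub(r'[^a-zA-Z]', '', text).lower()
    if len(cleaned) < 5:
        return False  # too short to judge
    if not re.search(r'[aeiou]', cleaned):
        return True   # no vowels, e.g. "sdfghjkl"
    half = len(cleaned) // 2
    if cleaned[:half] == cleaned[half:]:
        return True   # first half repeats exactly, e.g. "asdasd"
    if len(set(cleaned)) <= 2:
        return True   # at most two distinct letters, e.g. "aaaaaaa"
    return False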
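To see which heuristic fires for a given comment, the same three rules can be written so that they report a reason instead of a boolean. This is a debugging sketch only; `gibberish_reason` and its rule names are hypothetical and not part of the pipeline.

```python
import re

def gibberish_reason(text):
    """Return the name of the first rule that flags text, or None if none do."""
    cleaned = re.sub(r"[^a-zA-Z]", "", text).lower()
    if len(cleaned) < 5:
        return None                      # too short to judge
    if not re.search(r"[aeiou]", cleaned):
        return "no_vowels"
    half = len(cleaned) // 2
    if cleaned[:half] == cleaned[half:]:
        return "repeated_half"
    if len(set(cleaned)) <= 2:
        return "few_distinct_letters"
    return None

for sample in ["sdfghjkl", "asdasd", "aaaaaaa", "great video!", "lol", "hahaha"]:
    print(repr(sample), gibberish_reason(sample))
```

Note that "hahaha" is flagged by the last rule because it uses only two distinct letters, so laughter written this way ends up labeled as gibberish rather than being scored by the model.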

## 9️⃣ Load Sentiment Model

In [None]:
sentiment_model = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

## 🔟 Hybrid Sentiment Analyzer

In [None]:
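def analyze_comment(text):
    # 1) An emoji majority decides the label outright.
    emoji_label = emoji_sentiment_score(text)
    if emoji_label:
        return emoji_label
    # 2) Gibberish carries no sentiment.
    if is_gibberish(text):
        return "NEUTRAL"
    # 3) Otherwise ask the model; emoji become text names such as ":thumbs_up:".
    cleaned = emoji.demojize(text)
    result = sentiment_model(cleaned[:256])[0]  # cap input at 256 characters
    return result['label'].upper()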
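The order of the three stages determines the label: emoji first, then the gibberish check, then the model. One way to check that precedence without downloading a model is to pass the stages in as arguments. In the sketch below, `analyze_with` and the three `fake_*` functions are hypothetical stand-ins, not the notebook's real components.

```python
def analyze_with(text, emoji_fn, gibberish_fn, model_fn):
    """Apply the three stages in order; the first stage with an answer wins."""
    label = emoji_fn(text)
    if label:
        return label
    if gibberish_fn(text):
        return "NEUTRAL"
    return model_fn(text[:256])[0]["label"].upper()

# Stand-ins for the real stages (illustrative only).
def fake_emoji(text):
    return "POSITIVE" if ":)" in text else None

def fake_gibberish(text):
    return text.isalpha() and not set(text) & set("aeiou")

def fake_model(text):
    return [{"label": "negative", "score": 0.9}]

print(analyze_with("nice :)", fake_emoji, fake_gibberish, fake_model))      # POSITIVE
print(analyze_with("xkcd", fake_emoji, fake_gibberish, fake_model))         # NEUTRAL
print(analyze_with("this is bad", fake_emoji, fake_gibberish, fake_model))  # NEGATIVE
```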

## 1️⃣1️⃣ Run Full Pipeline

In [None]:
comments = getComments(VIDEO_ID)
filtered_comments = filterComments(comments, keywords)

sentiments = [analyze_comment(c) for c in filtered_comments]

df = pd.DataFrame({
    "comment": filtered_comments,
    "sentiment": sentiments
})

df.head(10)

## 1️⃣2️⃣ Save Results

In [None]:
df.to_csv("youtube_comments_sentiment.csv", index=False)
print("Saved youtube_comments_sentiment.csv")

## 1️⃣3️⃣ Sentiment Distribution

In [None]:
df['sentiment'].value_counts().plot(kind='bar', title='Sentiment Distribution')
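Raw counts are hard to compare across videos with different comment volumes, so percentages can be a useful complement to the bar chart. A minimal standard-library sketch follows; `sentiment_shares` is a hypothetical helper and the label list is made-up sample data.

```python
from collections import Counter

def sentiment_shares(labels):
    """Return each label's share of the total as a percentage, rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

print(sentiment_shares(["POSITIVE", "POSITIVE", "NEGATIVE", "NEUTRAL"]))
# {'POSITIVE': 50.0, 'NEGATIVE': 25.0, 'NEUTRAL': 25.0}
```

In the notebook, the same function could be called as `sentiment_shares(df["sentiment"].tolist())`.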