<a href="https://colab.research.google.com/github/jessicasmelton/YTCommentAnalysis/blob/main/Step%203%3A%20Sentiment%20Analysis%20Program.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Sentiment Analysis Program for YouTube Comments:**

This program performs sentiment analysis on YouTube comments. It keeps only the comments that match a topic-specific keyword list or an emoji rule, then scores each remaining comment with TextBlob. It prints the sentiment proportions, plots the sentiment distribution and the monthly average polarity, and saves the filtered, scored comments to a new CSV file.

---

**Usage**

* Ensure the CSV file containing the cleaned and translated comments (the output of the previous data cleaning and translation steps) is saved where the program can read it. The code expects columns named Comment Text and Date Published.

* Set the file_path variable in the code to the path of your CSV file. An incorrect path raises a FileNotFoundError.

* Create an extensive keyword list specific to your topic. This list is crucial for filtering relevant comments. It is recommended to use a minimum of 150 keywords.

* Execute the program in a Python environment such as Google Colab, Jupyter Notebook, or any local Python environment.

* The program will read the CSV file, filter comments based on the keyword list, perform sentiment analysis, and save the results to a new CSV file.
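
Before running the full analysis, it can help to confirm that the CSV has the columns the code reads. The following is a minimal sketch using only the standard library. REQUIRED_COLUMNS and missing_columns are hypothetical names introduced for illustration; the two column names are the ones the analysis code below accesses.

```python
import csv
import io

# Column names the analysis cell reads from the CSV.
REQUIRED_COLUMNS = ("Comment Text", "Date Published")

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# Demonstration on an in-memory file (hypothetical data). With a real file, pass
# open(file_path, newline="", encoding="utf-8") instead.
sample = io.StringIO("Comment Text,Likes\nGreat video,3\n")
print(missing_columns(sample))  # ['Date Published']
```

An empty list means both columns are present and the file can be passed to the program.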

---

**Notes**

* The keyword list is essential for filtering relevant comments. Customize the keywords list in the code to match your specific research topic.

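* The program includes a basic dictionary for emoji sentiment. Expand or modify the emoji_sentiment_dict as needed for your analysis. Note that the code below defines this dictionary but does not use it when scoring comments; the sentiment scores come from TextBlob alone.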

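* The sentiment analysis is conducted using TextBlob, which returns two scores for each comment: polarity, from -1.0 (most negative) to 1.0 (most positive), and subjectivity, from 0.0 (objective) to 1.0 (subjective). The program labels a comment Positive when its polarity is above 0, Negative when it is below 0, and Neutral when it is exactly 0.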
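
The emoji_sentiment_dict maps each emoji to a sentiment category. One way such a mapping could be applied is to count how many emojis of each category a comment contains. The sketch below is a hypothetical illustration and not part of the program: count_emoji_sentiment is an invented helper, and the dictionary here is a four-entry excerpt of the full one.

```python
# Four-entry excerpt of the notebook's emoji_sentiment_dict, for illustration only.
emoji_sentiment_dict = {
    "😍": "positive", "👍": "positive", "😡": "negative", "🤔": "neutral",
}

def count_emoji_sentiment(comment, mapping=emoji_sentiment_dict):
    """Count how many emojis of each sentiment category appear in a comment."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for symbol, label in mapping.items():
        # str.count finds every non-overlapping occurrence of the emoji
        counts[label] += comment.count(symbol)
    return counts

print(count_emoji_sentiment("Love this 😍😍 but the ending 😡"))
# {'positive': 2, 'negative': 1, 'neutral': 0}
```

Counts like these could be stored in extra columns next to the TextBlob scores; how to combine the two signals is a research decision this sketch leaves open.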


---

**Potential Errors and Fixes**

* Ensure the file path to the CSV file is correct. Verify that the file exists at the specified location.

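* If the filtering does not work as expected, check that the keywords are spelled correctly and relevant to your topic. Keywords are escaped before matching, so they cannot cause regular expression syntax errors, but each one is matched as a whole word. A keyword that begins or ends with a non-word character, such as a hashtag or an emoji, may therefore never match.

* If there are issues with saving the CSV file, check for special characters in the file path or name that may cause problems. By default the results are written to Total_Sentiment_Analysis.csv in the current working directory; if you change output_file_path, ensure the target directory exists and is writable.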
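
To see how whole-word matching behaves, the keyword check can be tried in isolation. This is a minimal sketch that combines all keywords into one compiled, case-insensitive pattern instead of looping over them; build_keyword_pattern and the two sample keywords are hypothetical and chosen only for illustration.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole word."""
    escaped = [re.escape(k) for k in keywords if k.strip()]
    if not escaped:
        return None  # an empty keyword list matches nothing
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

pattern = build_keyword_pattern(["oil", "border"])  # sample keywords, not a real list
print(bool(pattern.search("The BORDER is closed")))  # True
print(bool(pattern.search("Too much turmoil")))      # False: "oil" is inside a longer word
```

Compiling the pattern once also avoids rebuilding a regular expression for every keyword on every comment, which can matter with long keyword lists.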

In [None]:
!pip install textblob
!pip install fuzzywuzzy
!pip install python-Levenshtein
!pip install emoji

In [None]:
# Sentiment Analysis Program

# Import necessary libraries
import pandas as pd  # Library for data manipulation and analysis
from textblob import TextBlob  # Library for text processing and sentiment analysis
import matplotlib.pyplot as plt  # Library for plotting data
import re  # Regular expressions library for keyword matching
import emoji  # Library for handling emojis

# Load the cleaned and translated comments CSV file
file_path = 'INSERT YOUR FILE PATH HERE.csv'  # Replace with your actual file path
df = pd.read_csv(file_path)

# Define the keywords for filtering relevant comments
# Note: It is important to create an extensive keyword list specific to your topic.
keywords = [
    # ... (your extensive keyword list)
]

# Define the emoji sentiment dictionary
emoji_sentiment_dict = {
    # Mapping of emojis to sentiment categories
    "❤️": "positive", "💩": "negative", "😍": "positive",
    "😊": "positive", "😢": "negative", "😡": "negative", "👍": "positive",
    "👎": "negative", "🎉": "positive", "🙌": "positive", "😞": "negative",
    "😭": "negative", "😃": "positive", "😔": "negative", "🤔": "neutral",
    "😐": "neutral", "🙄": "negative", "😤": "negative", "😉": "positive",
    "😁": "positive", "😠": "negative", "😩": "negative", "😅": "positive",
    "🤢": "negative", "🤮": "negative", "🥳": "positive", "😎": "positive",
    "🤯": "negative", "😇": "positive", "😈": "negative", "👿": "negative",
    "🇬🇾": "neutral",  # Guyana flag
    "🇻🇪": "neutral"   # Venezuela flag
}

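# Function to extract emojis from text
# emoji.emoji_list() (emoji >= 2.0) returns whole emoji sequences, so multi-code-point
# emojis such as flags and "❤️" stay intact instead of being split character by character
def extract_emojis(text):
    return [match['emoji'] for match in emoji.emoji_list(text)]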

# Function to determine if a comment should be included based on emojis and text
def should_include_comment_based_on_emojis(comment):
    emojis_in_comment = extract_emojis(comment)

    # Include if it has a national flag and another emoji
    if "🇬🇾" in emojis_in_comment or "🇻🇪" in emojis_in_comment:
        if len(emojis_in_comment) > 1:
            return True
    return False

# Function to filter relevant comments based on keywords and emojis
def filter_comments(comment):
    comment_lower = comment.lower()  # Convert comment to lowercase

    # Include if there's relevant text (keywords)
    for keyword in keywords:
        if re.search(r'\b' + re.escape(keyword.lower()) + r'\b', comment_lower):
            return True

    # Include based on emoji criteria
    if should_include_comment_based_on_emojis(comment):
        return True

    return False

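# Treat missing or non-text comments as strings so .lower() and TextBlob do not fail
df['Comment Text'] = df['Comment Text'].fillna('').astype(str)

# Apply the filter to determine relevance of comments
df['Relevant'] = df['Comment Text'].apply(filter_comments)
# .copy() makes filtered_df independent of df, so the column assignments below
# do not trigger pandas' SettingWithCopyWarning
filtered_df = df[df['Relevant']].copy()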

# Function to conduct sentiment analysis on comments
def get_sentiment(comment):
    blob = TextBlob(comment)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

# Apply sentiment analysis to relevant comments
filtered_df[['Polarity', 'Subjectivity']] = filtered_df['Comment Text'].apply(lambda x: pd.Series(get_sentiment(x)))

# Function to categorize sentiment based on polarity
def sentiment_category(polarity):
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'

# Categorize sentiment for each comment
filtered_df['Sentiment'] = filtered_df['Polarity'].apply(sentiment_category)

# Calculate the proportion of comments that are positive, negative, or neutral
sentiment_counts = filtered_df['Sentiment'].value_counts(normalize=True) * 100

print("Sentiment Proportions:")
print(sentiment_counts)

# Track sentiment over time by month
filtered_df['Date Published'] = pd.to_datetime(filtered_df['Date Published'])
filtered_df['Month'] = filtered_df['Date Published'].dt.to_period('M')
monthly_sentiment = filtered_df.groupby('Month')['Polarity'].mean()

# Visualize the results
# Sentiment Distribution Bar Chart
plt.figure(figsize=(10, 6))
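# Fix the category order so each bar always gets its matching color
# (value_counts orders by frequency, which varies between datasets)
sentiment_counts.reindex(['Positive', 'Neutral', 'Negative'], fill_value=0).plot(kind='bar', color=['green', 'gray', 'red'])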
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Proportion (%)')
plt.show()

# Sentiment Over Time Line Chart
plt.figure(figsize=(10, 6))
monthly_sentiment.plot(kind='line', marker='o')
plt.title('Sentiment Over Time')
plt.xlabel('Month')
plt.ylabel('Average Polarity')
plt.grid(True)
plt.show()

# Save the filtered and analyzed data to a new CSV file
output_file_path = 'Total_Sentiment_Analysis.csv'
filtered_df.to_csv(output_file_path, index=False)

print(f"Filtered and analyzed data has been saved to {output_file_path}")