# **Analyzing Global Reactions to the 2024 U.S. Presidential Debate on YouTube: A Sentiment Analysis and Keyword Analysis of Non-U.S. Citizen Perspectives**

## **Introduction**
###### The 2024 U.S. Presidential Election represents a pivotal moment not only for American democracy but also for its role on the global stage. As one of the world's most influential nations, the United States' political leadership directly impacts international policy, economic stability, and diplomatic relationships. The outcomes of its elections reverberate far beyond its borders, influencing decisions in capitals and boardrooms worldwide. In an age of instant digital communication, social media platforms such as YouTube have become arenas for real-time international discourse, offering unprecedented access to unfiltered global perspectives. These platforms provide a unique lens through which to examine how non-U.S. citizens perceive the political dynamics of the United States.

###### Recent events surrounding the 2024 U.S. Presidential Election underscore why this analysis is timely and necessary. Coverage from international media outlets, including publications such as The Guardian and Al Jazeera, has highlighted a growing global disillusionment with American leadership. Headlines focusing on the fractious nature of U.S. political debates, perceived inefficacy in addressing global challenges like climate change and security, and polarizing rhetoric have contributed to a narrative of declining U.S. influence. For instance, articles published during the election month highlighted the frustration of European leaders regarding the unpredictability of U.S. foreign policy, as well as concerns in Asian capitals about Washington's ability to maintain credibility as a global power broker. These sentiments, echoed in both diplomatic statements and media coverage, form the backdrop against which international audiences evaluate the U.S. political process.

###### Against this global context, the 2024 U.S. Presidential Debate serves as a critical touchpoint for understanding how foreign audiences interpret and react to U.S. leadership. Presidential debates are not only a domestic spectacle but also a highly visible demonstration of political values, governance philosophies, and future policy directions. They offer a window into the priorities of American leaders and, by extension, into the strengths and weaknesses of its democratic institutions. This makes debates an ideal subject for analyzing international sentiment, particularly as they often provoke reactions tied to broader issues such as transparency, accountability, and competence.

###### This research focuses specifically on how non-U.S. citizens reacted to the 2024 U.S. Presidential Debate on YouTube, exploring their commentary to uncover patterns of sentiment and recurring themes. The central research question guiding this work is: How do non-U.S. citizens perceive the 2024 U.S. Presidential Debate, and what do these perceptions reveal about global attitudes toward U.S. leadership and its political processes? My hypothesis is that non-U.S. citizens predominantly express negative sentiment, characterized by cynicism toward the candidates' competence and skepticism regarding the legitimacy of the political process. This negative outlook likely reflects a broader disillusionment with U.S. leadership, accompanied by concerns about the erosion of its democratic institutions, as perceived from abroad.

###### Understanding these global perceptions is not merely an academic exercise; it carries significant implications for the United States' international standing. Perceptions of a nation's leadership play a critical role in shaping soft power, which, in turn, influences its ability to form alliances, negotiate trade agreements, and lead on global issues. When foreign audiences view U.S. political leadership with skepticism, this can erode trust in its commitments and leadership. Conversely, identifying the sources of discontent and addressing them presents an opportunity to rebuild credibility and foster stronger international relationships.

###### This research is particularly important in light of the increasing reliance on digital platforms to gauge public opinion. YouTube, as one of the most widely used platforms globally, captures a diversity of voices that traditional opinion polls or formal media channels may overlook. User-generated comments reflect unfiltered and immediate reactions, providing a raw yet insightful data source for sentiment analysis. By analyzing these reactions, this study seeks to go beyond media narratives and official statements to understand how ordinary individuals across the world perceive the United States during a critical electoral moment.

###### Moreover, this research contributes to the broader conversation about the role of democratic leadership in a multipolar world. At a time when authoritarian regimes challenge liberal democracies, the United States' ability to project stability, competence, and democratic values is under heightened scrutiny. How global audiences perceive its political processes may influence not only their trust in American leadership but also their faith in the broader democratic model. The 2024 Presidential Debate offers a microcosm of this larger issue, making it a compelling subject for investigation.

###### By examining the sentiment and themes in YouTube comments from non-U.S. citizens, this research aims to provide policymakers with actionable insights into the areas where U.S. leadership falls short and where it has the potential to rebuild trust. It also seeks to highlight the evolving ways in which international audiences engage with U.S. politics in the digital age. Understanding these dynamics is essential for fostering more transparent, inclusive, and effective global engagement in the years to come.

## **Methodology**
###### This research employed a structured and systematic approach to analyze how non-U.S. citizens reacted to the 2024 U.S. Presidential Debate. The methodology consists of five key steps, each carefully designed to ensure rigorous analysis:

###### 1. Data Collection: Comments were extracted from the YouTube video of the debate using the YouTube Data API. This step provided a robust dataset of user-generated content, capturing the global audience's reactions.

###### 2. Data Preprocessing and Filtering Foreign Perspectives: After collecting the comments, a comprehensive cleaning process was applied to standardize the text and remove irrelevant content. This included stripping URLs, special characters, and redundant text while retaining emojis to preserve sentiment-related nuances. Following this preprocessing, comments were filtered to identify those reflecting foreign perspectives. Regular expressions (regex) and keywords such as “in my country,” “from abroad,” or mentions of specific countries were used to flag comments likely authored by non-U.S. citizens.

###### 3. Sentiment Analysis: VADER was used to analyze the sentiment of each filtered comment. This lexicon-based tool assigned a compound score reflecting the positivity, negativity, or neutrality of the text. Custom rules were also integrated to account for context-specific phrases, such as sarcasm or emotionally charged language, to enhance accuracy.

###### 4. Keyword Analysis: To identify recurring themes and topics within the comments, keywords were extracted using CountVectorizer. This analysis highlighted the frequency and importance of specific terms that offered insight into the dominant areas of focus for non-U.S. viewers.

###### 5. Visualization: The results were visualized using a histogram and Word Cloud. The histogram illustrated the distribution of sentiment scores, revealing patterns of neutrality, negativity, and positivity among the comments. The Word Cloud provided a visual representation of the most frequent terms, emphasizing key themes and topics that resonated with foreign audiences.

###### These steps collectively allowed for a comprehensive examination of international sentiment and thematic patterns in the YouTube comments. The following sections delve into each step, providing detailed explanations of the techniques and tools used.



#### 1. Data Collection
###### To collect user-generated comments, I utilized the YouTube Data API, which allows programmatic access to video data, including comments. I specifically focused on the official debate video using its Video ID: GdSDngmDLmY. A total of 1,000 comments were extracted, providing a robust dataset for analysis. Python's googleapiclient library facilitated the API connection, and comments were stored in a pandas DataFrame for subsequent processing. The collection process ensured that comments were retrieved in an efficient and structured manner, allowing me to build a foundation for further filtering and analysis.

###### 1.1 Importing the necessary libraries for the analysis:

In [None]:
import pandas as pd #For data manipulation and organization
import re #For cleaning and preprocessing text data using regex
from nltk.sentiment.vader import SentimentIntensityAnalyzer #For performing sentiment analysis on text
from wordcloud import WordCloud #For creating word cloud visualizations
import matplotlib.pyplot as plt #For generating data visualizations and plots
from googleapiclient.discovery import build #For accessing YouTube Data API to fetch comments
from sklearn.feature_extraction.text import CountVectorizer #For converting text data into numerical format

###### 1.2 Initializing the YouTube API client with an API key and extracting the video's comments, with explanatory comments in the code below:

In [None]:
API_KEY = 'YOUR_API_KEY'  # Placeholder; the real key is withheld for privacy and substituted at run time
VIDEO_ID = 'GdSDngmDLmY'  # YouTube video ID of the 2024 U.S. Presidential Debate video
youtube = build('youtube', 'v3', developerKey=API_KEY)

# Defining function to fetch comments from the YouTube video
def get_comments(video_id):
    # Setting up my list to store YouTube comments
    comments = []
    request = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id,
        maxResults=100
    )
    response = request.execute()

    # Looping to fetch all pages of comments if there are multiple pages
    while response:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        if 'nextPageToken' in response:
            request = youtube.commentThreads().list(
                part='snippet',
                videoId=video_id,
                maxResults=100,
                pageToken=response['nextPageToken']
            )
            response = request.execute()
        else:
            break

    return comments

comments = get_comments(VIDEO_ID)
comments_df = pd.DataFrame(comments, columns=['comment'])
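
###### The loop above keeps requesting pages until the API stops returning a `nextPageToken`, whereas the dataset described in this section holds 1,000 comments. A minimal sketch of how such a cap could be enforced is shown below. The names `collect_comments`, `fetch_page`, and `fake_fetch_page` are hypothetical and not part of the original notebook; `fake_fetch_page` only imitates the shape of an API response so that the sketch runs without network access.

```python
from typing import Callable, List, Optional


def collect_comments(fetch_page: Callable[[Optional[str]], dict],
                     max_comments: int = 1000) -> List[str]:
    """Collect top-level comment texts page by page, stopping at max_comments."""
    comments: List[str] = []
    page_token: Optional[str] = None
    while len(comments) < max_comments:
        response = fetch_page(page_token)
        for item in response.get("items", []):
            text = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
            comments.append(text)
            if len(comments) >= max_comments:
                break
        # Stop when the response carries no token for a further page
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments


def fake_fetch_page(page_token: Optional[str]) -> dict:
    """Hypothetical stand-in for the API: three pages of two comments each."""
    pages = {None: (["a", "b"], "p2"), "p2": (["c", "d"], "p3"), "p3": (["e", "f"], None)}
    texts, next_token = pages[page_token]
    response = {
        "items": [{"snippet": {"topLevelComment": {"snippet": {"textDisplay": t}}}}
                  for t in texts]
    }
    if next_token:
        response["nextPageToken"] = next_token
    return response


capped = collect_comments(fake_fetch_page, max_comments=5)
print(capped)
```

###### With the real client, `fetch_page` would wrap the same `youtube.commentThreads().list(...).execute()` call used above, passing `pageToken` only when a token is present.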

#### 2. Data Preprocessing by Filtering Foreign Perspectives
###### Before the foreign comments were identified, I applied a preprocessing step to clean and prepare the text for analysis. Preprocessing involved removing irrelevant elements while preserving emojis, as emojis can provide critical sentiment context (such as sarcasm, humor, or anger) that words alone might miss. The steps included converting text to lowercase, removing URLs, and stripping special characters while retaining punctuation and emojis. The cleaned text was stored in a new column labeled cleaned_comment, ensuring the data remained sentiment-rich and ready for analysis.

###### Since my focus was on non-U.S., or foreign, citizen perspectives, the next step was to identify comments that explicitly reflected a foreign perspective. I filtered comments using a combination of keywords and regular expressions (regex). Keywords such as “in my country,” “from abroad,” “outside the U.S.,” and other phrases indicating international origins were used to flag relevant comments. Additionally, I implemented a regex pattern to detect specific mentions of country names following words like “in” or “from,” such as “in Japan” or “from Germany.” By combining keyword matching and regex, this filtering process flagged comments likely to be written by international viewers. Because the patterns are broad, the flagged set is best read as an approximation of non-U.S. commenters rather than an exact one. The filtered comments were stored in a new DataFrame for further processing.

###### 2.1 Defining a function to clean and preprocess comments by removing irrelevant text elements:


In [None]:
def preprocess_comment(comment):
    # Removes URLs from comments
    comment = re.sub(r'http\S+', '', comment)
    # Converts all text to lowercase for uniformity purposes
    comment = comment.lower()
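    # Retains only letters, numbers, basic punctuation, and emojis
    # (the emoji range in this cell was garbled; the Unicode Emoticons block is assumed here)
    comment = re.sub(r'[^\w\s,.!?\U0001F600-\U0001F64F]', '', comment, flags=re.UNICODE)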
    return comment.strip()

comments_df['cleaned_comment'] = comments_df['comment'].apply(preprocess_comment)
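
###### One detail the cleaning step does not cover: as far as I understand the YouTube Data API, `textDisplay` is returned as HTML by default, so comments can contain tags such as `<br>` and entities such as `&#39;`. The sketch below shows one way to convert such text to plain text before `preprocess_comment` runs. The helper name `strip_html` and the sample string are assumptions made for illustration only.

```python
import html
import re


def strip_html(comment: str) -> str:
    """Turn HTML-formatted comment text into plain text."""
    # Line breaks become spaces so that adjacent words do not merge
    comment = re.sub(r'<br\s*/?>', ' ', comment, flags=re.IGNORECASE)
    # Any remaining tags (links, bold, and so on) are dropped
    comment = re.sub(r'<[^>]+>', '', comment)
    # Entities are decoded last, e.g. &#39; becomes ' and &amp; becomes &
    return html.unescape(comment)


sample = 'I&#39;m from Germany<br>and this is <b>wild</b> &amp; sad'
plain = strip_html(sample)
print(plain)
```

###### Applying a step like this first would keep entity fragments such as "39" from surviving the special-character filter as stray tokens.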

###### 2.2 Enhancing the filter for identifying foreign perspective comments:

In [None]:
# Defines a function to flag whether a comment is likely written from a foreign perspective
def identify_foreign_comments(comment):
    # Matching is by substring, so short entries such as "in", "america", and "u.s."
    # flag most comments; "from [country]" is a literal string, not a pattern
    keywords = [
        "in my country", "from abroad", "in germany", "in japan",
        "from the uk", "america", "u.s.", "as a non-american",
        "outside the u.s.", "not american", "foreign perspective",
        "in", "outside america", "not from here", "from [country]"
    ]
    # Looks for "in ..." or "from ..." followed by letters; this matches any such phrase
    # (for example "in the debate"), not only country names
    country_mentions = re.findall(r'\b(in|from) [a-zA-Z ]+\b', comment)
    if country_mentions:
        return True
    for keyword in keywords:
        if keyword in comment:
            return True
    return False

# Applies the foreign comment identification function
comments_df['is_foreign'] = comments_df['cleaned_comment'].apply(identify_foreign_comments)

# Keeps only non-empty foreign comments; .copy() avoids pandas' SettingWithCopyWarning
# when the sentiment_score column is added later
foreign_comments_df = comments_df[comments_df['is_foreign']]
foreign_comments_df = foreign_comments_df[foreign_comments_df['cleaned_comment'].str.strip() != ''].copy()

# Falls back to all non-empty comments, with a warning, if no foreign comments are identified
if foreign_comments_df.empty:
    print("Warning: No foreign comments identified. Falling back to all comments.")
    foreign_comments_df = comments_df[comments_df['cleaned_comment'].str.strip() != ''].copy()
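
###### Because the filter above matches by substring and accepts any "in ..." or "from ..." phrase, it flags far more than self-identified foreign viewers. A stricter variant could restrict matches to explicit phrases and to a list of country names. The sketch below is one possible version, not the filter used for the results in this notebook; `COUNTRIES`, `PHRASES`, and `looks_foreign` are hypothetical names, and the country list is deliberately short for illustration.

```python
import re

# Illustrative, deliberately short lists; a real run would need fuller ones
COUNTRIES = ["germany", "japan", "india", "canada", "brazil", "france", "the uk", "australia"]
PHRASES = ["in my country", "from abroad", "as a non-american", "not american", "outside the u.s"]

# Matches "in <country>" or "from <country>" on word boundaries only
COUNTRY_RE = re.compile(
    r'\b(?:in|from) (?:' + '|'.join(re.escape(c) for c in COUNTRIES) + r')\b'
)


def looks_foreign(comment: str) -> bool:
    """Return True if the comment contains an explicit foreign-perspective cue."""
    text = comment.lower()
    if any(phrase in text for phrase in PHRASES):
        return True
    return COUNTRY_RE.search(text) is not None


examples = [
    "Watching from Germany, this is wild",
    "I was thinking about this in the morning",
    "In my country debates are calmer",
]
flags = [looks_foreign(c) for c in examples]
print(flags)
```

###### The second example shows the difference: the bare keyword "in" is a substring of "thinking", so the original filter would flag that comment, while the stricter version does not.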

#### 3. Sentiment Analysis
###### For sentiment classification, I employed VADER, a lexicon-based sentiment analysis tool designed to perform well on social media text. VADER assigns each comment a compound sentiment score ranging from -1 (most negative) to +1 (most positive). However, because YouTube comments often include context-specific phrases that VADER may misinterpret, I implemented custom rules to adjust sentiment scores for particular expressions. For example, a phrase like “What a joke,” which VADER might score neutrally, was manually adjusted to reflect a more negative sentiment; conversely, a phrase like “Finally someone who gets it” was given a positive adjustment. By combining VADER's lexicon-based scoring with these custom rules, my analysis aimed to capture nuanced sentiments, including sarcasm and emotionally charged language.



###### 3.1 Defining a function to adjust sentiment scores for specific phrases that are not well-handled by VADER:

In [None]:
def adjust_sentiment(comment, vader_score):
    if "what a joke" in comment: # Phrase indicating negativity
        vader_score -= 0.5 # Decrease the sentiment score
    if "finally someone who gets it" in comment: # Phrase indicating positivity
        vader_score += 0.5 # Increase the sentiment score
    if "oh sure" in comment and "😂" in comment:  # Sarcastic phrase with an emoji
        vader_score -= 0.3 #Adjust score for sarcasm
    return vader_score #Return the adjusted score

###### 3.2 Applying the VADER sentiment analyzer to comments:

In [None]:
# Initializes the VADER sentiment analyzer
sid = SentimentIntensityAnalyzer()

# Applies VADER to cleaned comments and adjusts scores using the custom function
foreign_comments_df['sentiment_score'] = foreign_comments_df['cleaned_comment'].apply(
    lambda comment: adjust_sentiment(comment, sid.polarity_scores(comment)['compound'])
)
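
###### Two follow-up steps are implied by the text but not shown in the cell above. First, the custom adjustments add or subtract up to 0.5, so an adjusted score can leave VADER's -1 to +1 range. Second, the Results section speaks of positive, neutral, and negative comments, which requires cut-offs. The sketch below illustrates both under stated assumptions: the helper names `clamp` and `label_sentiment` are hypothetical, and the ±0.05 thresholds are the conventional values suggested in VADER's documentation, not values confirmed by this notebook.

```python
def clamp(score: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep an adjusted score inside the compound-score range."""
    return max(low, min(high, score))


def label_sentiment(score: float, threshold: float = 0.05) -> str:
    """Map a compound score to a coarse label using symmetric cut-offs."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# A strongly negative comment that also triggers the "what a joke" rule
adjusted = clamp(-0.8 - 0.5)
print(adjusted, label_sentiment(adjusted))
```

###### Counting labels over the `sentiment_score` column would then give the share of each class, which makes statements about dominance of one class directly checkable.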

#### 4. Keyword Analysis
###### Finally, to uncover recurring themes within the comments, I used CountVectorizer from Scikit-learn to extract frequent words. I generated a Word Cloud to visually represent the most common words, with larger font sizes indicating higher frequency. This visualization complemented the sentiment analysis by revealing specific topics and terms that dominated the conversation.




###### 4.1 Using CountVectorizer to extract frequent keywords from comments and generate a co-occurrence matrix:

In [None]:
# Builds the vectorizer, removing common English stopwords
vectorizer = CountVectorizer(stop_words='english')
if not foreign_comments_df['cleaned_comment'].empty:
    X = vectorizer.fit_transform(foreign_comments_df['cleaned_comment']) # Transform comments into keyword frequency matrix
    keyword_freq = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out()) # Convert matrix to DataFrame

    # Generates co-occurrence matrix
    co_occurrence = keyword_freq.T.dot(keyword_freq)
else:
    keyword_freq = pd.DataFrame()
    co_occurrence = pd.DataFrame()
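
###### The cell above builds the frequency matrix but does not rank the terms. A small standard-library sketch of the ranking idea follows. It uses `collections.Counter` as a stand-in for the CountVectorizer matrix; the tokenizer only approximates CountVectorizer's default of keeping tokens with two or more characters, and `STOPWORDS`, `top_keywords`, and the sample comments are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; scikit-learn's English list is much longer
STOPWORDS = {"the", "a", "is", "this", "and", "of", "to", "in", "for", "what"}


def top_keywords(comments, n=3):
    """Return the n most frequent non-stopword tokens across all comments."""
    counts = Counter()
    for comment in comments:
        # Tokens are runs of two or more letters or digits, lowercased
        tokens = re.findall(r"[a-z0-9]{2,}", comment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


sample_comments = [
    "what a joke this debate is",
    "the debate was a joke",
    "kamala won the debate",
]
top = top_keywords(sample_comments, n=2)
print(top)
```

###### With the DataFrame built above, the equivalent ranking would be `keyword_freq.sum().sort_values(ascending=False).head(10)`.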

#### 5. Data Visualization from Sentiment and Word Cloud Analyses

###### 5.1 Sentiment Trend Visualization

In [None]:
# Sentiment Trend
if not foreign_comments_df['sentiment_score'].empty:
    plt.hist(foreign_comments_df['sentiment_score'], bins=20, color='skyblue', edgecolor='black')
    plt.title('Sentiment Score Distribution')
    plt.xlabel('Sentiment Score')
    plt.ylabel('Frequency')
    plt.show()
else:
    print("No sentiment scores to visualize.")

###### 5.2 Word Cloud Visualization

In [None]:
# Word Cloud
if not foreign_comments_df['cleaned_comment'].empty:
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(foreign_comments_df['cleaned_comment']))
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title('Word Cloud of Comments')
    plt.show()
else:
    print("No comments to generate a word cloud.")

# Saves results (just in case)
foreign_comments_df.to_csv('sentiment_analysis_results.csv', index=False)
if not co_occurrence.empty:
    co_occurrence.to_csv('keyword_co_occurrence.csv')
else:
    print("No keyword co-occurrence data to save.")

## **Results**
###### The findings from the sentiment analysis and keyword analysis offer a nuanced understanding of how non-U.S. citizens reacted to the 2024 U.S. Presidential Debate. These insights, derived from the collected and filtered YouTube comments, provide valuable evidence of the global audience's attitudes toward U.S. political leadership and its processes.


#### Sentiment Analysis
###### The distribution of sentiment scores reveals a notable trend: neutral and negative sentiment overwhelmingly dominated the reactions. The histogram of sentiment scores (shown below) illustrates this pattern, with the majority of comments clustered around a score of 0, signifying neutrality. At the same time, a substantial portion leaned toward the negative range, scoring between -0.5 and -1.0. This skew indicates that while many foreign viewers approached the debate with observational curiosity, a significant fraction expressed discontent and distrust. The dominance of neutral sentiment suggests that numerous international viewers engaged with the debate without significant emotional investment. Such comments often provided factual or descriptive observations about the debate or the candidates' statements. For instance, remarks like “They discussed housing policies for too long” reflect a detached engagement, where viewers assessed the content without revealing strong opinions. This neutrality may indicate a lack of familiarity with the intricacies of U.S. politics or a cautious approach to forming judgments.

###### However, the substantial left skew toward negativity highlights a broader theme of skepticism and cynicism among non-U.S. viewers. Comments in this category frequently criticized the candidates' competence, the lack of substantive debate, and the overall perceived legitimacy of the process. Statements such as “This debate is nothing but a staged performance” or “What a joke—these people can't lead the world” underscore the prevailing dissatisfaction. Negative sentiment often revealed frustration with both the rhetoric and the broader political context, suggesting disillusionment with the United States' ability to present credible leadership on the global stage.

###### Positive sentiment, though present, was far less common. Comments classified as positive were often associated with specific moments or behaviors during the debate that resonated with viewers. These comments reflected approval for instances where candidates were seen as offering clarity or addressing pressing issues. For example, “Finally someone who understands the real challenges we face” highlights rare moments of alignment between candidates' statements and viewers' expectations. However, the relative scarcity of such comments reinforces the conclusion that optimism toward U.S. leadership among foreign viewers is limited. These patterns confirm the hypothesis that global audiences, particularly non-U.S. citizens, approach U.S. political discourse with a mixture of skepticism and limited enthusiasm. The results also highlight the broader disillusionment with American political leadership, further emphasizing the need for a more substantive and globally relevant approach in engaging international audiences.

[Figure: histogram titled "Sentiment Score Distribution" (x-axis: Sentiment Score, y-axis: Frequency)]

#### Keyword Analysis
###### The keyword analysis, visualized through the Word Cloud below, provides further insights into the themes and topics that captured the attention of non-U.S. viewers. This analysis identifies the most frequently used terms in the comments, revealing key areas of focus and concern. The prominence of the word “Kamala” underscores the significant attention directed toward Kamala Harris, reflecting her role as a symbolic figure of U.S. leadership. As the current Vice President, Harris holds a unique position that combines domestic policy influence with global visibility. The high frequency of her name in the comments suggests that international audiences closely observed her performance, possibly scrutinizing her statements and demeanor as a reflection of U.S. leadership's current and future direction. This observation is consistent with the idea that symbolic figures often attract heightened global scrutiny, particularly when they represent diverse and historically underrepresented groups.

###### Beyond specific individuals, the Word Cloud also highlights recurring themes of critique and skepticism. Words such as “joke,” “lying,” “stop,” and “need” point to a pervasive distrust of the candidates' rhetoric and the perceived lack of authenticity in their messages. Comments containing these words often reflected frustration with the debate's substance and questioned the candidates' ability to address critical global challenges. For instance, phrases like “They're just making promises they won't keep” and “This is why no one trusts U.S. politicians anymore” reveal deep-seated doubts about the legitimacy and effectiveness of the political discourse.

###### Interestingly, the Word Cloud also highlights neutral or observational terms like “debate,” “talk,” and “question.” These words suggest that many viewers approached the debate as an opportunity to observe rather than participate emotionally. Such terms indicate an intellectual curiosity about the U.S. political process, where viewers sought to understand the arguments presented without necessarily forming strong opinions. This neutrality, when contrasted with the strong negative terms, paints a picture of a divided audience—one part disillusioned and critical, the other detached and analytical.

###### The analysis further reveals the absence of certain globally significant terms, such as “climate” or “cooperation,” which might have been expected in discussions of leadership on shared global issues. This omission may reflect dissatisfaction with the debate's focus on domestic policies, potentially alienating international viewers who look to the U.S. for leadership on global challenges.
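
###### A Word Cloud shows only the most frequent terms, so the absence of a word from it does not by itself show that the word never occurs. A direct count is a simple way to check such a claim. The sketch below counts how many comments mention each term as a whole word; `comments_mentioning` and the sample comments are hypothetical and stand in for the `cleaned_comment` column.

```python
import re


def comments_mentioning(comments, terms):
    """Count, for each term, the comments that contain it as a whole word."""
    result = {}
    for term in terms:
        pattern = re.compile(r'\b' + re.escape(term.lower()) + r'\b')
        result[term] = sum(1 for c in comments if pattern.search(c.lower()))
    return result


sample = [
    "They ignored climate change again",
    "Cooperation? Not with these two",
    "what a joke",
]
mentions = comments_mentioning(sample, ["climate", "cooperation", "trade"])
print(mentions)
```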

###### Together, the keyword analysis complements the sentiment findings by highlighting the specific areas where U.S. leadership fails to meet global expectations. The prominence of critical terms, coupled with the high frequency of observational words, underscores a dual theme of engagement and disillusionment, offering valuable insights into the global perception of U.S. politics.

[Figure: "Word Cloud of Comments"]

## **Discussion and Conclusion**
###### The findings from both the sentiment analysis and keyword analysis largely support the hypothesis that non-U.S. citizens lean toward negative sentiment when reacting to the 2024 U.S. Presidential Debate, with the caveat that neutral comments formed the largest single group. Among comments that expressed a clear sentiment, skepticism and cynicism far outweighed approval, while the keyword analysis emphasizes themes of distrust, incompetence, and dissatisfaction with the substance of the debate. These findings shed light on the complex and multifaceted ways in which foreign audiences engage with U.S. political discourse.

###### One of the most striking observations is the significant attention directed toward Kamala Harris. As a symbolic figure, Harris represents not only U.S. leadership but also the broader aspirations of inclusion and progressivism on the global stage. However, the critical themes associated with her name in the comments suggest that symbolic representation alone is insufficient to satisfy international expectations. Global audiences appear to demand both symbolic and substantive leadership, emphasizing the need for authenticity, clarity, and action.

###### The prevalence of negative sentiment and critical keywords has important implications for U.S. policymakers and leaders. The erosion of trust, as reflected in these findings, highlights the urgent need for transparent and substantive communication that addresses both domestic and global concerns. The frequent references to broken promises and unfulfilled rhetoric underscore the importance of aligning words with action, particularly on issues of global relevance. Policymakers must recognize that foreign audiences are not passive observers but active evaluators of U.S. political processes, whose perceptions can influence international relations and soft power.

###### These insights also present an opportunity for rebuilding. By addressing the concerns raised in these comments, U.S. leaders can begin to restore credibility and strengthen international partnerships. For example, incorporating global perspectives into domestic debates and demonstrating a commitment to addressing shared challenges, such as climate change and international security, can help rebuild trust. Furthermore, ensuring that symbolic leaders like Kamala Harris are supported by substantive policies can bridge the gap between expectations and performance, reinforcing the United States' position as a credible global leader.

###### Future research could expand on these findings by incorporating data from additional debates or analyzing comments over a longer period. Advanced techniques such as topic modeling could uncover deeper thematic narratives, while segmenting comments by geographic regions could provide more granular insights into how perceptions vary across the globe. Such analyses would offer a richer understanding of global attitudes toward U.S. politics and identify further opportunities for engagement and improvement.

###### In conclusion, the analysis reveals a complex relationship between global audiences and U.S. political leadership. While skepticism and disillusionment dominate, there remains a level of curiosity and engagement that reflects the enduring influence of U.S. politics on the world stage. By addressing the criticisms and leveraging the insights uncovered in this research, U.S. leaders can work toward rebuilding trust and fostering stronger, more inclusive global relationships.

### **Full Code Below**

In [None]:
import pandas as pd
import re
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from googleapiclient.discovery import build
from sklearn.feature_extraction.text import CountVectorizer

In [None]:
# 1. Data Collection
API_KEY = 'YOUR_API_KEY'  # Placeholder; replaced with my personal API key at run time
VIDEO_ID = 'GdSDngmDLmY'  # YouTube video ID of the 2024 U.S. Presidential Debate video
youtube = build('youtube', 'v3', developerKey=API_KEY)

def get_comments(video_id):
    comments = []
    request = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id,
        maxResults=100
    )
    response = request.execute()

    while response:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        if 'nextPageToken' in response:
            request = youtube.commentThreads().list(
                part='snippet',
                videoId=video_id,
                maxResults=100,
                pageToken=response['nextPageToken']
            )
            response = request.execute()
        else:
            break

    return comments

comments = get_comments(VIDEO_ID)
comments_df = pd.DataFrame(comments, columns=['comment'])

In [None]:
# 2. Data Preprocessing
def preprocess_comment(comment):
    comment = re.sub(r'http\S+', '', comment)
    comment = comment.lower()
    comment = re.sub(r'[^\w\s,.!?😂😡]', '', comment, flags=re.UNICODE)
    return comment.strip()

comments_df['cleaned_comment'] = comments_df['comment'].apply(preprocess_comment)

def identify_foreign_comments(comment):
    keywords = [
        "in my country", "from abroad", "in germany", "in japan",
        "from the uk", "america", "u.s.", "as a non-american",
        "outside the u.s.", "not american", "foreign perspective",
        "in", "outside america", "not from here", "from [country]"
    ]

    country_mentions = re.findall(r'\b(in|from) [a-zA-Z ]+\b', comment)
    if country_mentions:
        return True
    for keyword in keywords:
        if keyword in comment:
            return True
    return False

comments_df['is_foreign'] = comments_df['cleaned_comment'].apply(identify_foreign_comments)

foreign_comments_df = comments_df[comments_df['is_foreign']]
# .copy() avoids pandas' SettingWithCopyWarning when sentiment_score is added later
foreign_comments_df = foreign_comments_df[foreign_comments_df['cleaned_comment'].str.strip() != ''].copy()

if foreign_comments_df.empty:
    print("Warning: No foreign comments identified. Falling back to all comments.")
    foreign_comments_df = comments_df[comments_df['cleaned_comment'].str.strip() != ''].copy()

In [None]:
# 3. Sentiment Analysis
def adjust_sentiment(comment, vader_score):
    if "what a joke" in comment:
        vader_score -= 0.5
    if "finally someone who gets it" in comment:
        vader_score += 0.5
    if "oh sure" in comment and "😂" in comment:
        vader_score -= 0.3
    return vader_score

sid = SentimentIntensityAnalyzer()

foreign_comments_df['sentiment_score'] = foreign_comments_df['cleaned_comment'].apply(
    lambda comment: adjust_sentiment(comment, sid.polarity_scores(comment)['compound'])
)

In [None]:
# 4. Keyword Analysis
vectorizer = CountVectorizer(stop_words='english')
if not foreign_comments_df['cleaned_comment'].empty:
    X = vectorizer.fit_transform(foreign_comments_df['cleaned_comment'])
    keyword_freq = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())

    co_occurrence = keyword_freq.T.dot(keyword_freq)
else:
    keyword_freq = pd.DataFrame()
    co_occurrence = pd.DataFrame()

In [None]:
# 5. Visualization
# Sentiment Trend Visualization
if not foreign_comments_df['sentiment_score'].empty:
    plt.hist(foreign_comments_df['sentiment_score'], bins=20, color='skyblue', edgecolor='black')
    plt.title('Sentiment Score Distribution')
    plt.xlabel('Sentiment Score')
    plt.ylabel('Frequency')
    plt.show()
else:
    print("No sentiment scores to visualize.")

# Word Cloud Visualization
if not foreign_comments_df['cleaned_comment'].empty:
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(foreign_comments_df['cleaned_comment']))
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title('Word Cloud of Comments')
    plt.show()
else:
    print("No comments to generate a word cloud.")

# Saving results (just in case)
foreign_comments_df.to_csv('sentiment_analysis_results.csv', index=False)

if not co_occurrence.empty:
    co_occurrence.to_csv('keyword_co_occurrence.csv')
else:
    print("No keyword co-occurrence data to save.")