# A Sentiment Analysis on Trending Hashtags and Its Related Content during the COVID-19 Pandemic

#### Introduction
- ***Focus***: Sentiment analysis of trending hashtags and their respective videos on TikTok during the COVID-19 pandemic.
- ***Importance***: Social media is increasingly used as a communication tool in many aspects of life. This was especially visible during the pandemic, when it played a crucial role in public health communication as people quarantined and adapted to remote life.
- ***Objective***: Analyze how public health-related messages are emotionally received by the public, particularly younger demographics, as they constitute the largest user base of social media platforms such as TikTok.

#### Research Question
“What do trending hashtags on TikTok reveal about public sentiment towards public health messages during the COVID-19 pandemic?”

## TikTok by the Numbers 

- ***How many people use TikTok globally?***
    - "The number of users on TikTok is anticipated to rise by 6.7% in 2024, to surpass 2 billion."
    - "TikTok rose to popularity in 2020, reaching a [peak](https://www.statista.com/statistics/1377008/tiktok-worldwide-downloads-quarterly/) of approximately 313.5 million downloads during the first quarter of the year. The popular social media platform is forecasted to reach approximately 2.35 billion users in 2029."
    - ![Global Statistics of TikTok Users from 2018 to 2029](assets/statista-Number-of-TikTok-users-worldwide-from-2018-to-2029.jpg)

 
- ***TikTok Age Demographics***
    - In 2024, 18 to 24 year olds make up 36.2% of TikTok users, and 25 to 34 year olds make up 33.9%.
    - More than 2/3 of TikTok users are between 18 and 34 years old.
    - ![Age Demographics of TikTok from 2024](assets/TikTok-users-by-age-2024.jpg)

 
- ***An Increase of Using TikTok as "News" Source***
    - "A small but growing share of U.S. adults say they regularly get news on TikTok... In just three years, the share of U.S. adults who say they regularly get news from TikTok has more than quadrupled, from 3% in 2020 to 14% in 2023." (4)
    - ![TikToks being used as a news source, statistics by age group over the years](assets/tiktok-as-news-source.jpg)



- ***Why are we focusing on TikTok and not other social media platforms?***
    - TikTok is increasingly being used as a primary platform for younger demographics, making it an ideal case study for analyzing the emotional reception of public health-related messages among this key audience. "This is in contrast with many other social media sites, where news consumption has either declined or stayed about the same in recent years." (4)
    - ![TikTok Increasingly being used over other social media platforms](assets/news-on-tiktok-vs-others.jpg)



- Source Links: [1](https://www.statista.com/forecasts/1142687/tiktok-users-worldwide), [2](https://www.statista.com/statistics/1377008/tiktok-worldwide-downloads-quarterly/), [3](https://www.oberlo.com/statistics/tiktok-age-demographics), [4](https://www.pewresearch.org/short-reads/2023/11/15/more-americans-are-getting-news-on-tiktok-bucking-the-trend-seen-on-most-other-social-media-sites/)

## Defining the Project Scope
### Time Period Selection
- Time Period: March 2020 to December 2020
- Reasoning:
    - **March 2020**: The World Health Organization declared COVID-19 a global pandemic on March 11, 2020. This period marks the beginning of widespread awareness and significant changes in public behavior and communication patterns.
        - [Source Link](https://www.cdc.gov/museum/timeline/covid19.html#:~:text=March%2011%2C%202020,declares%20COVID%2D19%20a%20pandemic.)
    - **December 2020**: By the end of 2020, various key events had taken place, including the approval of the first vaccines (Pfizer-BioNTech and Moderna) in December.
        - [Source Link](https://www.mayoclinic.org/diseases-conditions/history-disease-outbreaks-vaccine-timeline/covid-19#:~:text=The%20FDA%20gives,since%20December%202020.)
  
I chose these months because this period captures the initial pandemic response, lockdowns, social distancing measures, and the beginning of vaccine distribution.

### Big Picture Hashtag Selection
- Hashtags:
    - **#COVID19**: broadly covers all content related to COVID-19, ensuring a wide array of public health messages
    - **#Coronavirus**: similar to #COVID19 but may capture different audiences or slightly different content
    - **#Pandemic**: encompasses the broader impact and responses to the pandemic beyond just the virus itself
    - **#StayHome**: reflects public health messaging around lockdowns and staying indoors to prevent the spread
    - **#WearAMask**: focuses on wearing masks, which was a crucial public health directive during the pandemic
    - **#SocialDistancing**: a term that became common during the pandemic for keeping physical distance from others in an effort to reduce transmission
    - **#FlattenTheCurve**: this represents early efforts to reduce the infection rate to manageable levels
    - **#Vaccine**: includes general information and sentiment about different vaccines, or vaccines in general, during the pandemic
    - **#Pfizer**: this is specific to the Pfizer-BioNTech vaccine, capturing public sentiment around this particular vaccine
    - **#Moderna**: this is specific to the Moderna vaccine, similarly capturing public sentiment about this vaccine

### Smaller Picture
- For this project, I will focus on one hashtag: "#SocialDistancing"

## Tools and Techniques
### Research and Sentiment Analysis Tools, Libraries, and Techniques
- **Apify**: a web scraping and automation platform used for extracting data from websites
    - [TikTok Hashtags Scraper](https://apify.com/clockworks/tiktok-hashtag-scraper)
    - [TikTok Comments Scraper](https://apify.com/clockworks/tiktok-comments-scraper)
      
- **BERT** (Bidirectional Encoder Representations from Transformers)
    - I originally planned to use BERT, but switched to VADER
   
- **VADER** (Valence Aware Dictionary and Sentiment Reasoner): a rule-based sentiment analysis tool that uses a dictionary of words and rules to determine the sentiment of a text; typically used for analyzing social media texts
    - used here to classify comments as "positive", "neutral", or "negative"
    - *Why VADER?* VADER is optimized for social media and short text, which makes it well-suited for analyzing TikTok comments, which tend to be short and informal. Additionally, VADER's lexicon and rules are designed to handle slang and emojis, another typical feature of TikTok comments.
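
VADER reports a `compound` score between -1 and 1 for each text. The following is a minimal sketch of how such a score could be mapped to the three labels used in this project, assuming the commonly used cut-offs of +0.05 and -0.05. The function name and the sample scores are illustrative and are not taken from the project data.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, not real TikTok data.
for score in (0.62, 0.0, -0.41):
    print(score, label_from_compound(score))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the positive/neutral/negative split is to the chosen cut-off.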

### Method
- TikTok Scraper: search videos by hashtag, filter for 2020 dates, retrieve the comments, and run sentiment analysis on them to determine the percentage of comments that were positive, neutral, or negative.
    - For a more in-depth look at what it can do, visit this [link](https://apify.com/clockworks/tiktok-scraper).
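
The date filter in this method could look roughly like the sketch below. It assumes each scraped video record carries a `createTime` field holding a Unix timestamp in seconds; that field name, the helper name, and the sample records are assumptions for illustration and should be checked against the actual scraper output.

```python
from datetime import datetime, timezone

# Study window from the project scope: March 2020 through December 2020.
STUDY_START = datetime(2020, 3, 1, tzinfo=timezone.utc)
STUDY_END = datetime(2021, 1, 1, tzinfo=timezone.utc)  # exclusive upper bound


def in_study_window(record: dict) -> bool:
    """Return True if the record's (assumed) Unix timestamp falls in the window."""
    created = datetime.fromtimestamp(record["createTime"], tz=timezone.utc)
    return STUDY_START <= created < STUDY_END


# Hypothetical records, not real scraper output.
videos = [
    {"id": "a", "createTime": int(datetime(2020, 4, 8, tzinfo=timezone.utc).timestamp())},
    {"id": "b", "createTime": int(datetime(2021, 7, 1, tzinfo=timezone.utc).timestamp())},
]
kept = [v for v in videos if in_study_window(v)]
print([v["id"] for v in kept])
```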
 
### Visualization
- Pie Chart
- Bar Graph
- Word cloud of typical words used in negative versus positive comments (may require tokenizing the comments first)
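
A word cloud needs word frequencies per sentiment class. The sketch below shows one possible way to tokenize comments and count words by label using only the standard library. The stop-word list is a tiny illustrative subset, and the sample comments and labels are made up rather than taken from the scraped data.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "it", "this", "i", "so"}


def tokenize(text: str) -> list:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def word_counts_by_label(labeled_comments) -> dict:
    """Build one word-frequency Counter per sentiment label."""
    counts = {}
    for text, label in labeled_comments:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


# Hypothetical (comment, label) pairs.
sample = [
    ("I love this, staying home is so safe", "positive"),
    ("This is boring and I hate staying home", "negative"),
]
counts = word_counts_by_label(sample)
print(counts["positive"].most_common(3))
```

Each resulting `Counter` is a plain word-to-frequency mapping, which is the kind of input a word cloud library typically accepts.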

## Data Collection

Here is a quick overview of what I did:
- start with one hashtag and retrieve 200 videos for it
- for each video, retrieve all the comments
- pass the comments into a sentiment analysis to rank each as positive, neutral, or negative
- create a pie chart showing the percentage of positive, neutral, and negative comments for that hashtag

### Outline of Plan

**Step 1**: Data Collection
1. Choose One Hashtag: I used #SocialDistancing.
2. Retrieve 200 Videos: Used Apify's TikTok Hashtag Scraper to scrape data for 200 videos associated with the chosen hashtag.
3. Retrieve Comments: Used Apify's TikTok Comments Scraper to retrieve all the comments for each video.

**Step 2**: Sentiment Analysis
1. Pre-process Comments: Cleaned and tokenized the comments to remove any unnecessary characters, URLs, and stop words.
2. Perform Sentiment Analysis: Used VADER to classify each comment as "positive", "neutral", or "negative".

**Step 3**: Data Visualization
1. Pie Chart: Created a pie chart to display the percentage of positive, neutral, and negative comments for the hashtag.
2. Bar Graph: Created a bar graph to show the distribution of sentiment scores.
3. Word Cloud: Generated word clouds to visualize common words used in positive, neutral, and negative comments separately.
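
The pie chart in Step 3 is driven by the share of each label. Here is a minimal sketch of that calculation, assuming the comments have already been labeled. The helper name and the sample labels are illustrative only and do not represent project results.

```python
from collections import Counter


def sentiment_percentages(labels) -> dict:
    """Return the percentage share of each sentiment label, rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no comments, nothing to plot
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical labels for ten comments.
labels = ["positive"] * 5 + ["neutral"] * 3 + ["negative"] * 2
shares = sentiment_percentages(labels)
print(shares)
```

The resulting values can be passed as slice sizes to a pie chart, with the dictionary keys as slice labels.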

Limitations of the project:
- This project is at a very early stage and relies on a limited dataset due to the constraints of the free account. However, I encourage using this approach more extensively to analyze patterns and to uncover which methods and messaging reach younger demographics more effectively.

In [None]:
# Install the sentiment analysis library (the %pip magic runs inside the notebook kernel)
%pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


In [None]:
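# Initialize the Apify client and run the hashtag scraper
from apify_client import ApifyClient

client = ApifyClient('your_apify_api_token')

# NOTE: the input field names below are assumptions; check them against the
# input schema on the actor's Apify page before running.
run_input = {
    "hashtags": ["socialdistancing"],
    "resultsPerPage": 200,
}
# Actor ID taken from the TikTok Hashtags Scraper link in the Tools section
run = client.actor('clockworks/tiktok-hashtag-scraper').call(run_input=run_input)
dataset_items = client.dataset(run['defaultDatasetId']).list_items().items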

In [None]:
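import re

import matplotlib.pyplot as plt
import pandas as pd

# Load the scraped items into a DataFrame
df = pd.DataFrame(dataset_items)

def clean_text(text):
    """Strip URLs and @mentions; keep punctuation, case, and emojis for VADER."""
    text = re.sub(r"https?://\S+|@\w+", " ", str(text))
    return re.sub(r"\s+", " ", text).strip()

# Clean the text data ('comment' is the column name assumed here; adjust it
# if the scraper output stores the comment text under another field)
df['cleaned_text'] = df['comment'].apply(clean_text)

analyzer = SentimentIntensityAnalyzer()

# Function to get the compound sentiment score, from -1 (negative) to +1 (positive)
def get_sentiment_score(text):
    return analyzer.polarity_scores(text)['compound']

# Function to map a compound score to a label (commonly used VADER cut-offs)
def get_sentiment_label(score):
    if score >= 0.05:
        return 'positive'
    if score <= -0.05:
        return 'negative'
    return 'neutral'

# Apply sentiment analysis to each cleaned comment
df['sentiment_score'] = df['cleaned_text'].apply(get_sentiment_score)
df['sentiment'] = df['sentiment_score'].apply(get_sentiment_label)

# Plot the number of comments per sentiment label
sentiment_counts = df['sentiment'].value_counts()
sentiment_counts.plot(kind='bar')
plt.title('Sentiment Analysis of TikTok Comments')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()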
