
This project analyzes tweets that used the hashtag #BLM. It examines their sentiment and word usage, and applies topic modeling with Latent Semantic Analysis and Latent Dirichlet Allocation to pull out the main themes that appear when #BLM is used.


roweyerboat/Twitter_Hashtag_Analysis


BLM Hashtag Analysis

Background Information

Over the years, the Black Lives Matter (BLM) movement has gained attention across many platforms. Twitter in particular has been a major place where the message of BLM has been articulated. With the events of May and June 2020, following the deaths of Ahmaud Arbery, Breonna Taylor, and George Floyd, BLM has been in the spotlight more than ever before. As a result, there is a wide variety of messages about what BLM means and what it really accomplishes. Some see it as a new wave of the Civil Rights movement of the 1960s. Others consider BLM a dangerous terrorist organization. To sort out the message of BLM, I scraped Twitter data from the past 7 years, using twint to pull tweets that contain the hashtag BLM.
I recognize that people who use the hashtag might not support Black Lives Matter and could even be critics of it. I felt it was important to gather all that I could to see what rose to the surface when analyzing the text. I also recognize that people can delete tweets, so I do not have a full collection of every tweet.
With those acknowledgements, I was able to obtain 220,504 tweets from over 140,000 Twitter users.

Libraries Needed

Twint
Nest_asyncio
Vader
Textblob
Wordcloud
Nltk
Sklearn

The Data

Since the data was too large to upload to GitHub, here are the links to the raw data and the clean data:
Raw data
Clean data

picture of a graph showing tweets over the years with the hashtag blm

Scraping Notebook

The scraping notebook was also too large for GitHub, so here is a link to it.

Repo Files

Tweet Cleaning Notebook - Cleaning of the raw data
EDA and Visualization Notebook - Exploratory data analysis and visualizations
Time Series Notebook - Notebook looking at the hashtag over time
Latent Semantic Analysis - Notebook of the LSA process
Latent Dirichlet Allocation - Notebook of the LDA process
BLM_Hashtag_Analysis - Summarization of the whole project
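
As a rough illustration of what the tweet cleaning step can involve, here is a minimal sketch using only the Python standard library. It is not the code from the Tweet Cleaning Notebook: the function name `clean_tweet`, the regular expressions, and the tiny stop word set are all assumptions made for this example (a real run would more likely use a full stop word list such as the one in Nltk).

```python
import re

# Hypothetical patterns for this sketch; the notebook may clean differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

# Tiny illustrative stop word set (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are",
             "to", "of", "in", "for", "on", "rt"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, and punctuation, and tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits and punctuation
    return [tok for tok in text.split()
            if tok not in STOPWORDS and len(tok) > 1]


tokens = clean_tweet("RT @user: Justice for all! #BLM https://t.co/abc123")
print(tokens)
```

The hashtag word is kept on purpose, since in this project the hashtag itself is part of what is being studied.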

Findings

Sentiment analysis was inconclusive, because a tweet scored as "negative" wasn't necessarily against the BLM movement. Likewise, a "positive" sentiment didn't mean the message was for BLM or promoted its values. Therefore, I chose to look deeper at the words through topic modeling with two methods: LSA and LDA.

LSA topic counts
LDA 4 topics

Conclusion

Based on the analysis and on what continued to rise to the top, the main themes of tweets with the hashtag BLM are about protecting Black lives. Antifa did not appear as often as mainstream news outlets assume. However, this is definitely a limited study. It was interesting to see that sentiment analysis isn't always helpful. Future work with this and other data sets would look at how BLM is portrayed by major news media and whether the words used are similar. One application of this project is to better understand the context behind sentiment analysis results.

Blog and Video

Blog post about the project
Video of final presentation
