sumalathakonjeti/ZCW-FinalProject

 
 

Sentiment Analysis of Conversation Surrounding Covid-19 Vaccine


For our final project at Zip Code Wilmington, we performed a sentiment analysis of the Twitter conversation surrounding the COVID-19 vaccine in the United States. We streamed the tweets using the Twitter API and stored the data in a MySQL database hosted on AWS Lightsail. We then cleaned the data with Spark and wrote it back to the database.
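The README does not show the cleaning rules, so the following is only a minimal sketch of the kind of per-tweet cleanup such a step might perform. The function name `clean_tweet` and the specific rules (dropping the retweet marker, URLs, mentions, and hashtag symbols) are assumptions, and the sketch is plain Python rather than the project's actual Spark job.

```python
import html
import re

# Hypothetical cleaning rules; the project's real Spark job is not shown in this README.
RETWEET_RE = re.compile(r"^RT\s+")       # leading retweet marker
URL_RE = re.compile(r"https?://\S+")     # links
MENTION_RE = re.compile(r"@\w+:?")       # @user or @user:


def clean_tweet(text: str) -> str:
    """Strip retweet markers, URLs, mentions, and hashtag symbols from one tweet."""
    text = html.unescape(text)           # turn entities such as &amp; back into characters
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")         # keep the hashtag word, drop the symbol
    return " ".join(text.split())        # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_tweet("RT @user: Got my #vaccine today! https://t.co/abc123"))
```

In PySpark, a function like this could be wrapped with `pyspark.sql.functions.udf` and applied to the tweet text column. Punctuation and capitalization are left alone here because some sentiment scorers use them as signals.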

After acquiring this data, we used NLTK machine learning models to analyze the sentiment of the tweets. We then separated the tweets into four tables based on the region of the United States they came from, and created visualizations of the data using WordCloud and Matplotlib. The whole process was automated with an Apache Airflow DAG.
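The README does not say which NLTK model was used. As one possible illustration, the sketch below assumes NLTK's VADER analyzer (a lexicon-based scorer, not a trained classifier) and the commonly used cutoff of 0.05 on its compound score. The helper names `label_from_compound`, `build_analyzer`, and `classify` are hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def build_analyzer():
    """Create a VADER analyzer (assumed model; needs the vader_lexicon resource)."""
    # Imported lazily so label_from_compound can be used without NLTK installed.
    # One-time setup: import nltk; nltk.download("vader_lexicon")
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def classify(text: str, analyzer) -> str:
    """Score one tweet with the given analyzer and return its sentiment label."""
    scores = analyzer.polarity_scores(text)  # dict with neg, neu, pos, compound
    return label_from_compound(scores["compound"])
```

Building the analyzer once and passing it to `classify` avoids reloading the lexicon for every tweet.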

Lastly, we built an interactive data visualization in Tableau where you can view the sentiment analysis for the whole USA or isolate each region.
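Isolating a region requires a rule that maps each tweet's location to one of the four regions. The README does not state which grouping was used, so this sketch assumes the four US Census Bureau regions and a location string of the form "City, ST". The names `CENSUS_REGIONS` and `region_for_location` are hypothetical.

```python
from typing import Optional

# Assumed grouping: the four US Census Bureau regions (50 states plus DC).
CENSUS_REGIONS = {
    "Northeast": "CT ME MA NH RI VT NJ NY PA".split(),
    "Midwest": "IL IN MI OH WI IA KS MN MO NE ND SD".split(),
    "South": "DE FL GA MD NC SC VA DC WV AL KY MS TN AR LA OK TX".split(),
    "West": "AZ CO ID MT NV NM UT WY AK CA HI OR WA".split(),
}

# Invert the mapping so each state code points at its region.
STATE_TO_REGION = {
    state: region
    for region, states in CENSUS_REGIONS.items()
    for state in states
}


def region_for_location(location: Optional[str]) -> Optional[str]:
    """Return the region for a 'City, ST' location, or None if no state code is found."""
    if not location:
        return None
    state = location.split(",")[-1].strip().upper()
    return STATE_TO_REGION.get(state)
```

Tweets whose location does not end in a recognizable state code return `None` and would need to be dropped or handled separately.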

Below are the basic steps of our program, followed by a flowchart showing how all of the technologies worked together. View our PowerPoint presentation by clicking here.


In the sentiment analysis, tweets were split into three categories: positive, negative and neutral. Using WordCloud, we generated images of the keywords for each category. The larger the word, the more often it appeared.
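The sizing rule described above comes down to counting how often each word appears in a category. The sketch below shows one way that count could be computed and passed to the `wordcloud` package; the stopword list, the minimum word length, and the helper names `keyword_frequencies` and `render_cloud` are assumptions rather than the project's code.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real run would likely use a fuller one.
STOPWORDS = {"the", "and", "for", "this", "that", "with", "was", "are"}


def keyword_frequencies(tweets, stopwords=STOPWORDS):
    """Count how often each non-stopword of three or more letters appears."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(w for w in words if len(w) > 2 and w not in stopwords)
    return counts


def render_cloud(frequencies, path):
    """Draw a word cloud in which more frequent words are rendered larger."""
    # Imported lazily so the counting helper works without the wordcloud package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies).to_file(path)
```

Calling `keyword_frequencies` once per sentiment category and passing each result to `render_cloud` would give one image per category.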

Here are the keywords found in positive tweets:

Here are the keywords found in negative tweets:

API Used

  • Twitter API

Frameworks Used

  • Airflow
  • AWS Lightsail MySQL
  • pandas
  • Matplotlib
  • Tableau
  • NLTK
  • Papermill
  • PySpark

Meet the Team


Anusha Jangalapalli

Connect on LinkedIn

Lee Givhan

Connect on LinkedIn

Sumalatha Konjeti

Connect on LinkedIn
