This project performs a sentiment analysis of COVID vaccination tweets retrieved through Twitter's API. It is written in Python and uses packages such as Tweepy, NLTK, scikit-learn, and WordCloud to query the Twitter API and run a sentiment analysis on the tweets.
Some of the keywords that have been used are: Covid Vaccine, Covid Booster.
The following steps are carried out to conduct the sentiment analysis:
- Choosing the keywords for the live search (for example, Covid booster).
- Running the live search using Twitter API by leveraging tweepy.
- Using a sentiment analyzer to see if the tweets are positive, negative or neutral.
- Creating visualizations to categorize the tweets as positive, negative or neutral.
- Creating a word cloud for different categories of the tweets.
- Cleaning up the tweets by removing punctuation and stopwords and applying stemming.
- Showing the most used words in the search.
- Creating bigrams & trigrams.
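The labelling, cleaning and n-gram steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in main.py: the function names, the tiny stopword list and the 0.05 cutoff are assumptions made for the example, stemming is left out, and the project itself relies on NLTK and scikit-learn for these steps. `label_sentiment` assumes a VADER-style compound score between -1 and 1, such as the one NLTK's `SentimentIntensityAnalyzer` returns.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK's stopword corpus is much larger.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "it",
             "i", "my", "for", "on", "this", "that"}


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive, negative or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def clean_tweet(text):
    """Lowercase, drop URLs, mentions and punctuation, then remove stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [token for token in tokens if token not in STOPWORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tweets, n, k=20):
    """Count n-grams across all cleaned tweets and return the k most frequent."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(clean_tweet(tweet), n))
    return counts.most_common(k)
```

Calling `top_ngrams(tweets, 1)` gives the most used words, while `n=2` and `n=3` give bigrams and trigrams as lists of `(phrase, count)` pairs.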
To run the code, clone the repo and open the main.py file in Visual Studio Code. Install all the packages that are imported in the header section of the file and resolve any errors that are logged in the terminal. Then request a Twitter API key and put it in the code wherever the key is referenced. When you run the code, it asks in the terminal for the keywords and the number of tweets you want to analyze, and then performs the sentiment analysis.
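As a rough illustration of the prompt-and-search flow described above, the sketch below reads the key from an environment variable instead of editing it into the source. It is not taken from main.py: the helper names and the `TWITTER_BEARER_TOKEN` variable are hypothetical, and it assumes Tweepy 4.x with the Twitter API v2 recent search endpoint, which accepts between 10 and 100 results per request.

```python
import os


def build_query(keyword):
    """Build a search query that keeps English tweets and skips retweets."""
    return f'"{keyword.strip()}" -is:retweet lang:en'


def clamp_count(limit):
    """Keep the requested count inside the 10 to 100 range of recent search."""
    return max(10, min(limit, 100))


def fetch_tweets(keyword, limit, bearer_token):
    """Return the text of recent tweets matching the keyword."""
    import tweepy  # imported lazily so the helpers above work without Tweepy

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=build_query(keyword),
        max_results=clamp_count(limit),
    )
    # response.data is None when nothing matches the query.
    return [tweet.text for tweet in (response.data or [])]


def main():
    keyword = input("Keyword to search for: ")
    limit = int(input("Number of tweets to analyze: "))
    # Hypothetical variable name; set it in your shell before running.
    token = os.environ["TWITTER_BEARER_TOKEN"]
    for text in fetch_tweets(keyword, limit, token):
        print(text)
```

Calling `main()` starts the prompts. Keeping the key in the environment means it does not end up committed to the repo.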
booster 11 vaccin 4 shot 3 jab 3 got 2 front 2 omicron 2 die 2 death 2 data 2
[('booster jabs', 2), ('blood clots', 1), ('clots vaccine', 1), ('vaccine realize', 1), ('won save', 1), ('save nhs', 1), ('nhs omicron', 1), ('omicron booster', 1), ('jabs prime', 1), ('prime minister', 1), ('minister really', 1), ('really ought', 1), ('ought telling', 1), ('telling people', 1), ('people cut', 1), ('think tank', 1), ('tank boomer', 1), ('boomer maybe', 1), ('maybe 5th', 1), ('5th booster', 1)]
[('blood clots vaccine', 1), ('clots vaccine realize', 1), ('won save nhs', 1), ('save nhs omicron', 1), ('nhs omicron booster', 1), ('omicron booster jabs', 1), ('booster jabs prime', 1), ('jabs prime minister', 1), ('prime minister really', 1), ('minister really ought', 1), ('really ought telling', 1), ('ought telling people', 1), ('telling people cut', 1), ('think tank boomer', 1), ('tank boomer maybe', 1), ('boomer maybe 5th', 1), ('maybe 5th booster', 1), ('5th booster ll', 1), ('booster ll start', 1), ('ll start question', 1)]
- Deploy to the web and let the user run the query through a form-based input (predetermined keywords) rather than via the terminal.
- Use multiple Twitter API accounts to work around the Tweepy limitations.
- Connect the Twitter feed to Firebase Database to save the information & use it again to do a more accurate sentiment analysis for subsequent runs.