smrutijethwani/Covid-Vaccine-Data

Project Overview


In this project, we pre-processed a dataset of 100K tweets in Python. We then implemented six classification algorithms: Logistic Regression, Random Forest, Decision Tree, KNN, Linear SVC, and AdaBoost. We tuned the hyperparameters of each algorithm and compared the resulting accuracies. We also created a new dataset by scraping tweets from Twitter with the Python package snscrape. Finally, we visualized the data at each step using scikit-learn, Seaborn, Matplotlib, and Plotly. Comparing the predictions of the classification algorithms, we achieved about 92% accuracy.
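
The workflow described above (clean the tweets, then tune and compare six classifiers) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the TF-IDF features, the `clean_tweet` helper, the hyperparameter grids, and the toy tweets are all assumptions made for the example, and the scores it produces on toy data say nothing about the 92% figure.

```python
import re

from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def clean_tweet(text):
    """Hypothetical cleaner: lowercase, then strip URLs, @mentions, and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)  # drops '#', digits, punctuation
    return " ".join(text.split())


# The six model families named in the README, each with a small assumed grid.
CANDIDATES = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [50, 100]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [3, None]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [1, 3, 5]}),
    "linear_svc": (LinearSVC(), {"clf__C": [0.1, 1.0, 10.0]}),
    "adaboost": (AdaBoostClassifier(random_state=0), {"clf__n_estimators": [25, 50]}),
}


def compare_models(texts, labels, cv=3):
    """Grid-search each candidate; return {name: (best CV accuracy, best params)}."""
    results = {}
    for name, (model, grid) in CANDIDATES.items():
        pipeline = Pipeline([
            ("tfidf", TfidfVectorizer(preprocessor=clean_tweet)),
            ("clf", model),
        ])
        search = GridSearchCV(pipeline, grid, cv=cv, scoring="accuracy")
        search.fit(texts, labels)
        results[name] = (search.best_score_, search.best_params_)
    return results


# Made-up tweets, only so the sketch runs end to end.
TOY_TWEETS = [
    "Got my vaccine today, feeling great!", "So grateful for the vaccine rollout",
    "Vaccine appointment was quick and easy", "Happy to be fully vaccinated",
    "Great news about the vaccine results", "Feeling safe after my second dose",
    "Terrible side effects from the vaccine", "I do not trust this vaccine",
    "Worried about the vaccine rollout delays", "Awful wait at the vaccine site",
    "Scared of the second dose side effects", "Bad experience booking my vaccine",
]
TOY_LABELS = ["positive"] * 6 + ["negative"] * 6

if __name__ == "__main__":
    for name, (score, params) in compare_models(TOY_TWEETS, TOY_LABELS).items():
        print(f"{name}: accuracy={score:.2f} params={params}")
```

Wrapping the vectorizer and the classifier in one `Pipeline` keeps the TF-IDF vocabulary fitted only on each training fold, so the cross-validated accuracies are not inflated by leakage from the held-out tweets.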

I learned a lot about the challenges and opportunities of working with large datasets. I can now answer questions such as how to identify and address data quality issues, how to choose the right classification algorithm for the task at hand, and how to interpret the results of an analysis.


About

Sentiment Analysis about the vaccine data
