Twitter Sentiment Analysis using Python & Machine Learning
Twitter Sentiment Analysis is a Natural Language Processing (NLP) technique used to understand public opinions, emotions, and attitudes expressed in tweets. In this project, we analyze people's sentiment toward Pfizer COVID-19 vaccines using machine learning models.
The goal is to clean and process tweet data, convert text into meaningful features, build ML models, and classify tweets into Positive, Negative, and Neutral sentiments. The performance of multiple classifiers is compared to identify the most accurate model.
Project Objectives
Collect and preprocess real-world Twitter data
Perform sentiment classification using Machine Learning
Evaluate different ML algorithms for accuracy
Visualize sentiment distribution and model performance
Identify public perception towards Pfizer vaccines
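To make the preprocessing objective concrete, here is a minimal sketch of a tweet-cleaning step. It is not the notebook's actual code: the names `clean_tweet` and `STOPWORDS` are hypothetical, the stopword set is a tiny stand-in for NLTK's English list, and stripping @mentions is an extra assumption beyond the steps this README names. Stemming is left as an optional callable so that, for example, NLTK's `PorterStemmer().stem` could be passed in.

```python
import re
from typing import AbstractSet, Callable, Optional

# Tiny illustrative stopword set (an assumption); NLTK's English list is far larger.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "to", "of", "and",
    "in", "for", "on", "it", "this", "that", "i", "my",
})

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")          # assumption: drop @user handles too
NON_LETTER_RE = re.compile(r"[^a-z\s]")   # digits, punctuation, '#', emoji


def clean_tweet(
    text: str,
    stopwords: AbstractSet[str] = STOPWORDS,
    stem: Optional[Callable[[str], str]] = None,
) -> str:
    """Lowercase, strip URLs/mentions/special characters, drop stopwords, optionally stem."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in stopwords]
    if stem is not None:
        tokens = [stem(tok) for tok in tokens]
    return " ".join(tokens)


example = "Got my second #Pfizer dose today!! https://t.co/abc123 @CDCgov"
print(clean_tweet(example))  # -> got second pfizer dose today
```

The order of the steps matters: URLs and mentions are removed before the special-character pass, since that pass would otherwise break them into stray word fragments.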
Project Workflow
- Data Collection: Dataset sourced from Kaggle containing Pfizer-related tweets
- Data Preprocessing: Cleaning tweets (removing URLs, stopwords, special characters, stemming, etc.)
- Feature Extraction: Converting text to numerical features using TF-IDF / Bag-of-Words
- Model Training: Applying ML algorithms to train sentiment classifier
- Model Evaluation: Accuracy comparison and confusion matrix for performance analysis
- Output Visualization: Sentiment results and graphs for insights

Technologies & Tools Used
- Programming Language: Python
- Data Handling: Pandas, NumPy
- Machine Learning: Scikit-Learn
- NLP & Text Processing: NLTK, Regex, Stopwords, Stemming
- Feature Engineering: TF-IDF Vectorizer, CountVectorizer
- Model Evaluation: Accuracy Score, Confusion Matrix, Classification Report
- Visualization: Matplotlib, Seaborn, WordCloud
- Environment: Jupyter Notebook / Google Colab

Algorithms Used
The following ML models were trained and evaluated:
Logistic Regression
Naive Bayes (MultinomialNB)
Support Vector Machine (SVM)
Random Forest Classifier
Decision Tree Classifier
The model with the highest accuracy is selected as the final sentiment classifier.
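The comparison described here can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: the toy tweets are made up (they are not the Kaggle data), the helper names `make_models` and `compare_models` are hypothetical, the hyperparameters are arbitrary, and a linear-kernel `SVC` stands in for the SVM because the README does not say which variant was used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

CLASSES = ["Negative", "Neutral", "Positive"]

# Made-up, already-cleaned toy tweets; repeated so each class lands in both splits.
positive = ["great vaccine feeling fine", "happy got pfizer dose",
            "thankful safe effective shot", "love quick easy appointment"]
negative = ["terrible side effects sore arm", "worried bad reaction vaccine",
            "angry long wait awful", "hate fever after dose"]
neutral = ["appointment scheduled tomorrow", "second dose next week",
           "clinic opens at nine", "pfizer announced shipment"]
texts = (positive + negative + neutral) * 3
labels = (["Positive"] * 4 + ["Negative"] * 4 + ["Neutral"] * 4) * 3

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)


def make_models():
    """The five classifier families listed in this README (settings are assumptions)."""
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes (MultinomialNB)": MultinomialNB(),
        "SVM (linear kernel)": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
    }


def compare_models(X_train, X_test, y_train, y_test):
    """Fit a TF-IDF + classifier pipeline per model and return test accuracies."""
    scores, fitted = {}, {}
    for name, clf in make_models().items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
        fitted[name] = pipe
    return scores, fitted


scores, fitted = compare_models(X_train, X_test, y_train, y_test)
best_name = max(scores, key=scores.get)  # first model wins on ties
y_pred = fitted[best_name].predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=CLASSES)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} accuracy={acc:.3f}")
print("Selected:", best_name)
print(cm)
print(classification_report(y_test, y_pred, labels=CLASSES, zero_division=0))
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, so the test accuracy is not inflated by leakage. On real data, accuracy alone can hide weak performance on a minority class, which is why the sketch also prints the confusion matrix and per-class report for the selected model.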
Project Outcomes
- Successfully classified tweets into Positive, Negative, and Neutral sentiment
- Identified the sentiment trend towards Pfizer vaccines
- Compared models and highlighted the best-performing classifier
- Visualized sentiment distribution using bar charts and word clouds
Key Insight:
Sentiment analysis showed a mix of opinions, with a significant percentage expressing positive views about Pfizer vaccines, along with some negative concerns and neutral feedback.
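One way to put numbers on such a distribution is to count the predicted labels and convert them to percentages. The sketch here uses made-up labels, not the project's results, and the function name `sentiment_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def sentiment_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return each label's share of the total as a percentage, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100.0 * n / total, 1) for label, n in counts.most_common()}


# Made-up predictions for illustration only.
predicted = ["Positive"] * 5 + ["Neutral"] * 3 + ["Negative"] * 2
dist = sentiment_distribution(predicted)
print(dist)  # {'Positive': 50.0, 'Neutral': 30.0, 'Negative': 20.0}
```

The resulting dictionary can be handed to Matplotlib for a bar chart, for example `plt.bar(list(dist), list(dist.values()))`.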
Project Structure

    Twitter-Sentiment-Analysis/
    │
    ├── Dataset/            # Kaggle dataset used for training
    ├── notebook.ipynb      # Main Jupyter Notebook with code
    ├── models/             # Trained ML models (if saved)
    ├── visuals/            # Charts and WordCloud images
    ├── requirements.txt    # Python dependencies
    └── README.md           # Project documentation

How to Run the Project
Step 1: Install Dependencies

    pip install -r requirements.txt

Step 2: Run the Jupyter Notebook

    jupyter notebook

Step 3: Open notebook.ipynb and run all cells
Future Improvements
- Integrate the live Twitter API to analyze real-time tweets
- Deploy as a web app using Flask or Streamlit
- Use deep learning models such as LSTM, Bi-LSTM, or BERT for higher accuracy
- Add multi-class emotion detection (joy, anger, fear, sarcasm, etc.)
Author
Sai Aishwarya Reddy Devarapalli
Machine Learning | AI Enthusiast
GitHub: https://github.com/sai-aishwarya-codes