~ This project implements an embedding-based sentiment classifier that labels tweets as Positive, Negative, or Neutral using Sentence Transformer embeddings and a machine learning classifier.
~ Unlike API-based solutions, the system runs offline once the embedding model has been downloaded, so it needs no API key, is not subject to rate limits, and gives reproducible results.
~ Social media platforms generate millions of posts daily, making manual sentiment analysis impractical. Understanding public sentiment helps brands, governments, and organizations make informed decisions.
~ The goal of this project is to build a sentiment classifier using:
- Text preprocessing and cleaning
- Transformer-based semantic embeddings
- A machine learning classification model
~ Dataset: Twitter Tweets Sentiment Dataset
~ Size: ~27,000 tweets
~ Columns:
- textID
- text
- selected_text
- sentiment
~ Sentiment Labels:
- Positive
- Negative
- Neutral
https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
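As a rough illustration of the dataset layout described above, the sketch below reads the CSV, checks for the four listed columns, and counts tweets per label. It uses only the standard library so it runs without the notebook's dependencies (the notebook itself uses Pandas). The function names `load_tweets` and `label_counts`, the file name `Tweets.csv`, and the choice to lowercase labels and drop rows with empty text are assumptions for this example, not taken from the notebook.

```python
import csv
from collections import Counter

# Columns listed in the dataset description above.
EXPECTED_COLUMNS = {"textID", "text", "selected_text", "sentiment"}


def load_tweets(file_obj):
    """Read the tweet CSV and keep rows that have both text and a label."""
    reader = csv.DictReader(file_obj)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        text = (row["text"] or "").strip()
        label = (row["sentiment"] or "").strip().lower()
        if text and label:  # skip rows with empty text or label
            rows.append({"text": text, "sentiment": label})
    return rows


def label_counts(rows):
    """Count how many tweets carry each sentiment label."""
    return Counter(row["sentiment"] for row in rows)


# Usage, assuming the Kaggle file was saved as Tweets.csv (hypothetical name):
# with open("Tweets.csv", newline="", encoding="utf-8") as f:
#     rows = load_tweets(f)
# print(label_counts(rows))
```

Checking the label counts early is a cheap way to spot class imbalance before training.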
- Python
- Pandas, NumPy
- NLTK (text preprocessing)
- Sentence Transformers (all-MiniLM-L6-v2)
- Scikit-learn (Logistic Regression)
- Matplotlib, Seaborn
- WordCloud
- VS Code (Jupyter Notebook)
~ We use the lightweight transformer model:
all-MiniLM-L6-v2
~ Key Features:
- 384-dimensional semantic embeddings
- Captures contextual meaning of sentences
- Fast and lightweight
- Works offline once the model has been downloaded
- No API dependency
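A minimal sketch of how embeddings could be generated with this model, assuming the `sentence-transformers` package is installed. The wrapper `embed_tweets` and its `model` and `expected_dim` parameters are hypothetical names introduced here. The model is loaded lazily, so the first call downloads the weights and later calls use the local cache.

```python
MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def embed_tweets(texts, model=None, batch_size=64, expected_dim=EMBEDDING_DIM):
    """Encode an iterable of strings into one vector per string.

    `model` can be any object with an `encode` method; when omitted,
    the Sentence Transformers model named above is loaded.
    """
    if model is None:
        # Imported here so the module can be loaded without the package.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(MODEL_NAME)

    vectors = model.encode(
        list(texts), batch_size=batch_size, show_progress_bar=False
    )
    # Guard against accidentally loading a model with another output size.
    if len(vectors) and len(vectors[0]) != expected_dim:
        raise ValueError(
            f"expected {expected_dim}-dim vectors, got {len(vectors[0])}"
        )
    return vectors


# Usage (downloads the model on first run):
# vectors = embed_tweets(["I love this phone", "This is awful"])
# print(vectors.shape)
```

Accepting an optional `model` argument lets the notebook load the model once and reuse it across cells rather than reloading it for every call.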
- Exploratory Data Analysis (EDA)
- Text preprocessing and cleaning
- Word cloud visualization
- Embedding generation using Sentence Transformers
- Train-test split
- Model training using Logistic Regression
- Model evaluation using classification metrics
- Custom tweet sentiment prediction
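The preprocessing step in the workflow above might look roughly like the sketch below. It is a regex-only stand-in, not the NLTK-based cleaning the notebook uses, and the exact rules (lowercasing, dropping URLs and @mentions, keeping hashtag words and basic punctuation) are assumptions. Light cleaning is a deliberate choice here: sentence embedding models are trained on natural text, so aggressive steps such as stop-word removal may not help.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s'!?]")  # applied after lowercasing
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and stray symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_TEXT_RE.sub(" ", text)  # drop remaining punctuation and emoji
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("@bob I LOVE this!! https://t.co/xyz #happy"))
# prints: i love this!! happy
```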
~ The model successfully classifies tweets into positive, negative, and neutral categories.
~ Transformer-based embeddings capture contextual meaning effectively.
~ The classifier performs strongly on short social media texts.
~ Custom user-defined tweets were accurately classified.
"I absolutely love this new phone!" → Positive
"This service is horrible and frustrating" → Negative
"The event happened yesterday" → Neutral
1️⃣ Clone the repository
git clone https://github.com/coderShreyIn/Sentiment_Classification_Using_Embeddings.git
cd Sentiment_Classification_Using_Embeddings
2️⃣ Install dependencies
pip install -r requirements.txt
3️⃣ Run the Jupyter Notebook
Open in VS Code or Jupyter:
jupyter notebook
Run all cells sequentially.
This project uses offline Sentence Transformers, so:
❌ No Gemini API key needed
❌ No rate limits
❌ No internet dependency after first model download
- Logistic Regression trained on transformer embeddings
- High accuracy on multi-class sentiment classification
- Good generalization on unseen tweets
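One way the training and evaluation steps could be wired together with scikit-learn is sketched below. The helper names, the 80/20 stratified split, and `max_iter=1000` are assumptions rather than the notebook's actual settings. The demo data consists of random stand-in vectors, not real tweet embeddings, so the scores it produces say nothing about the real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, test_size=0.2, seed=42):
    """Fit Logistic Regression on embeddings and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    report = classification_report(y_test, y_pred, zero_division=0)
    return clf, accuracy_score(y_test, y_pred), report


def make_demo_data(n_per_class=60, dim=384, seed=0):
    """Build well-separated random vectors as a placeholder for embeddings."""
    rng = np.random.default_rng(seed)
    labels = ["negative", "neutral", "positive"]
    X = np.vstack([
        rng.normal(loc=i * 3.0, scale=1.0, size=(n_per_class, dim))
        for i in range(len(labels))
    ])
    y = np.repeat(labels, n_per_class)
    return X, y


X, y = make_demo_data()
clf, accuracy, report = train_and_evaluate(X, y)
print(report)
```

With real data, `X` would be the matrix of tweet embeddings and `y` the sentiment column. Calling `clf.predict_proba` on a new embedding returns one probability per class, which could serve as a confidence score.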
- Sarcasm detection is challenging
- Very short texts may reduce classification confidence
- Mixed sentiment sentences can cause ambiguity
- Fine-tune a transformer model directly for sentiment classification
- Add confidence score display
- Implement hyperparameter tuning
- Deploy as a web app (Streamlit/Flask)
- Add real-time tweet streaming analysis
- Social media sentiment monitoring
- Brand reputation analysis
- Customer feedback analytics
- Political opinion mining
- Product review classification
- Chatbot emotion detection
~ Shrey Dak
AI & Machine Learning Enthusiast
GitHub: https://github.com/coderShreyIn
~ Please consider giving this repository a ⭐ on GitHub!