A real-time sentiment analysis application for YouTube live chat messages using Python, transformers, and Streamlit.
This project demonstrates a complete machine learning pipeline that:
- Collects live chat messages from YouTube livestreams
- Preprocesses text data (removing URLs, emojis, special characters)
- Performs sentiment analysis using HuggingFace Transformers
- Classifies messages as Positive, Negative, or Neutral
- Displays real-time analytics in an interactive Streamlit dashboard
Use Cases:
- Monitor audience sentiment during live events
- Understand viewer reactions in real-time
- Analyze content engagement and reception
- Academic prototype for NLP/ML projects
```
YouTube Live Chat Source
        ↓
Chat Collection Module (pytchat)
        ↓
Live Message Stream
        ↓
Text Preprocessing
├── Remove URLs
├── Remove Emojis
├── Remove Special Characters
├── Normalize Whitespace
└── Lowercase
        ↓
Sentiment Analysis Model
(HuggingFace Transformers)
        ↓
Sentiment Classification
├── POSITIVE (score > 0.7)
├── NEUTRAL (0.4 ≤ score ≤ 0.7)
└── NEGATIVE (score < 0.4)
        ↓
Statistics Aggregation
├── Count by sentiment
├── Percentages
└── Confidence scores
        ↓
Visualization Dashboard (Streamlit)
├── Sentiment counters
├── Bar chart
├── Pie chart
├── Message feed
└── Statistics
```
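The statistics-aggregation step above can be sketched in a few lines of Python (the function name `aggregate_sentiments` is illustrative, not the project's actual API):

```python
from collections import Counter

def aggregate_sentiments(sentiments):
    """Count labels and compute percentages for the dashboard.

    `sentiments` is a list of labels such as "POSITIVE"/"NEGATIVE"/"NEUTRAL".
    """
    counts = Counter(sentiments)
    total = sum(counts.values())
    if not total:
        return {}
    return {
        label: {"count": n, "pct": round(100 * n / total, 1)}
        for label, n in counts.items()
    }

stats = aggregate_sentiments(["POSITIVE", "POSITIVE", "NEGATIVE", "NEUTRAL"])
# stats["POSITIVE"] == {"count": 2, "pct": 50.0}
```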
```
youtube-chat-sentiment/
├── app.py                # Main Streamlit application
├── chat_collector.py     # YouTube live chat collection
├── preprocessing.py      # Text preprocessing module
├── sentiment_model.py    # Sentiment analysis with transformers
├── visualization.py      # Dashboard visualizations
├── requirements.txt      # Python dependencies
└── README.md             # This file
```
Main Streamlit application that orchestrates the entire pipeline. Features:
- Sidebar controls for video ID input and chat collection
- Three main tabs: Dashboard, Watch, Messages
- Real-time sentiment counters and statistics
- Interactive charts (bar and pie)
- Message feed display
- Data export functionality (CSV)
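The CSV export can be sketched with pandas (the column names follow the export description in this README; the Streamlit wiring via `st.download_button` is omitted, and the sample row is illustrative):

```python
import pandas as pd

# One analyzed message per row; columns match the documented export format.
rows = [{
    "Author": "alice",
    "Original Message": "STREAMER INI LUCU BANGET!!!",
    "Cleaned Text": "streamer ini lucu banget",
    "Sentiment": "POSITIVE",
    "Confidence": 0.85,
    "Timestamp": "2024-01-01 00:00:00",
}]
csv_bytes = pd.DataFrame(rows).to_csv(index=False).encode("utf-8")
# In app.py these bytes would be passed to st.download_button(data=csv_bytes, ...)
```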
Handles YouTube live chat collection using the pytchat library:
- `YouTubeChatCollector` class with connect/disconnect methods
- Stream-based message retrieval
- Error handling and logging
- Message format: `{author, message, timestamp}`
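A minimal collection loop along these lines, based on pytchat's documented `create`/`is_alive`/`sync_items` API (`to_record` and `collect` are illustrative helpers, not necessarily the project's exact code):

```python
def to_record(item) -> dict:
    """Normalize a pytchat chat item into the {author, message, timestamp} format."""
    return {
        "author": item.author.name,
        "message": item.message,
        "timestamp": item.datetime,
    }

def collect(video_id):
    """Yield chat records from an active livestream (requires `pip install pytchat`)."""
    import pytchat  # imported lazily so to_record stays usable without it
    chat = pytchat.create(video_id=video_id)
    while chat.is_alive():
        for item in chat.get().sync_items():
            yield to_record(item)
```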
Text preprocessing pipeline:
- `TextPreprocessor` class with modular preprocessing steps
- URL removal
- Emoji removal
- Special character removal
- Whitespace normalization
- Lowercase conversion
- Batch processing support
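The steps above can be sketched as a single cleaning function (a simplified stand-in for the `TextPreprocessor` class; the emoji character range is a rough approximation):

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002700-\U000027BF\U0001F1E6-\U0001F1FF]+"
)

def clean(text: str) -> str:
    """Apply the README's preprocessing steps to one chat message."""
    text = URL_RE.sub(" ", text)                       # remove URLs
    text = EMOJI_RE.sub(" ", text)                     # remove emojis
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # special chars + lowercase
    return re.sub(r"\s+", " ", text).strip()           # normalize whitespace
```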
Sentiment analysis using pre-trained transformers:
- `SentimentAnalyzer` class using HuggingFace models
- GPU/CPU device management
- Three-class sentiment mapping (POSITIVE/NEGATIVE/NEUTRAL)
- Batch processing capability
- Confidence scores
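The three-class mapping can be sketched like this (illustrative; it mirrors the 0.7 confidence threshold described in this README, with the HuggingFace `pipeline` call shown commented out because it downloads a model on first run):

```python
def map_to_three_class(label: str, score: float) -> str:
    """Map a binary POSITIVE/NEGATIVE model output to three classes:
    low-confidence predictions (score <= 0.7) become NEUTRAL."""
    return "NEUTRAL" if score <= 0.7 else label

# Usage with a transformers pipeline:
# from transformers import pipeline
# clf = pipeline("sentiment-analysis",
#                model="distilbert-base-uncased-finetuned-sst-2-english")
# out = clf("streamer ini lucu banget")[0]   # {"label": ..., "score": ...}
# print(map_to_three_class(out["label"], out["score"]))

print(map_to_three_class("POSITIVE", 0.85))  # → POSITIVE
print(map_to_three_class("NEGATIVE", 0.55))  # → NEUTRAL
```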
Streamlit-based visualization components:
- Sentiment counters with icons
- Bar chart visualization
- Pie chart visualization
- Recent messages table
- Overall statistics display
- CSV export functionality
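A minimal sketch of the bar and pie charts with matplotlib (the project renders these through Streamlit; the figure layout and colors here are illustrative):

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; Streamlit manages this itself
import matplotlib.pyplot as plt

counts = {"POSITIVE": 120, "NEGATIVE": 35, "NEUTRAL": 50}

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 3))
ax_bar.bar(counts.keys(), counts.values(), color=["green", "red", "gray"])
ax_bar.set_title("Sentiment counts")
ax_pie.pie(counts.values(), labels=counts.keys(), autopct="%1.1f%%")
ax_pie.set_title("Sentiment share")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # in app.py: st.pyplot(fig) instead
```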
- Python 3.10 or higher
- pip (Python package manager)
- Internet connection (for downloading models)
- YouTube livestream URL/Video ID
```bash
cd "/path/to/youtube-chat-sentiment"

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
```

This will install:
- pytchat (1.5.7): YouTube live chat scraping
- streamlit (1.28.1): Interactive dashboard
- transformers (4.38.1): Pre-trained NLP models
- torch (2.1.2): PyTorch deep learning framework
- pandas (2.1.4): Data manipulation and analysis
- matplotlib (3.8.2): Data visualization
- regex (2023.12.25): Regular expression library
Note: First download will take a few minutes as the transformer model is cached locally (~500MB).
```bash
# Ensure you're in the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

# Run the Streamlit app
streamlit run app.py
```

The app will open in your browser at http://localhost:8501. If `streamlit` is not on your PATH, run it as a module instead:

```bash
python -m streamlit run app.py
```

1. Get the Video ID
   - Open a YouTube livestream URL
   - Copy the video ID from the URL: in `https://www.youtube.com/watch?v=dQw4w9WgXcQ`, the part after `v=` (`dQw4w9WgXcQ`) is the video ID
2. Open the Streamlit App
   - Run `streamlit run app.py`
   - The browser opens automatically at http://localhost:8501
3. Enter the YouTube Video ID
   - Paste the video ID in the sidebar input field
   - Example: `dQw4w9WgXcQ`
4. Start Collecting Messages
   - Click the "▶️ Start Collecting" button in the sidebar
   - Wait for connection confirmation (✅ or ❌)
5. Monitor in Real-Time
   - Watch sentiment counters update automatically
   - View live statistics on the Dashboard tab
   - Check the Watch tab for status updates
   - Browse individual messages on the Messages tab
6. Export Analysis
   - Click "📥 Download analyzed messages as CSV" to export results
   - Data includes: Author, Original Message, Cleaned Text, Sentiment, Confidence, Timestamp
Example message: "STREAMER INI LUCU BANGET!!! 😂😂 https://youtube.com" (Indonesian for "THIS STREAMER IS SO FUNNY!!!")

1. Preprocessing
   - Input: `"STREAMER INI LUCU BANGET!!! 😂😂 https://youtube.com"`
   - Output: `"streamer ini lucu banget"`
2. Sentiment Analysis
   - Model Output: POSITIVE (confidence: 0.85)
   - Classification: POSITIVE (score > 0.7)
3. Dashboard Update
   - Positive counter: +1
   - Statistics: Updated percentages
   - Charts: Regenerated visualizations
```
Positive: 120 (58.5%)
Negative:  35 (17.1%)
Neutral:   50 (24.4%)
─────────────────────
Total:    205
```
The application now uses a custom fine-tuned DistilBERT model trained on Twitter sentiment data for improved performance.
Model Location: ./fine_tuned_distilbert_sentiment/
Training Details:
- Base Model: distilbert-base-uncased
- Dataset: Kaggle - Twitter Entity Sentiment Analysis (74,681 samples)
- Training Split: 80% training, 10% validation, 10% test (stratified)
- Epochs: 3
- Batch Size: 16
- Learning Rate: 2e-5
- Max Sequence Length: 128
- Optimizer: AdamW with weight decay (0.01)
- Scheduler: Linear warmup (500 steps)
Model Performance:
- Test Accuracy: 90.2%
- Test Precision: 0.902
- Test Recall: 0.902
- Test F1-Score: 0.902
Model Specs:
- Type: DistilBERT (Distilled BERT)
- Task: Sequence classification for sentiment analysis
- Hidden size: 768 dimensions
- Parameters: ~66 million
- Size: ~250 MB
- Speed: ~100-500 messages/second (depending on hardware)
- Device: Automatically uses CUDA (GPU) if available, falls back to CPU
If the fine-tuned model is not available, the application automatically uses:
- Model: `distilbert-base-uncased-finetuned-sst-2-english`
- Accuracy: 91% on the SST-2 dataset
```
Model Output: {label: "POSITIVE"/"NEGATIVE", score: 0.0-1.0}

Mapping to 3-class:
├── POSITIVE (model) + score > 0.7 → POSITIVE
├── POSITIVE (model) + score ≤ 0.7 → NEUTRAL
├── NEGATIVE (model) + score > 0.7 → NEGATIVE
└── NEGATIVE (model) + score ≤ 0.7 → NEUTRAL
```

```
Input: "STREAMER INI LUCU BANGET!!! 😂😂 https://youtube.com"
  ↓ [lowercase]
"streamer ini lucu banget!!! 😂😂 https://youtube.com"
  ↓ [remove_urls]
"streamer ini lucu banget!!! 😂😂"
  ↓ [remove_emojis]
"streamer ini lucu banget!!!"
  ↓ [remove_special_characters]
"streamer ini lucu banget"
  ↓ [remove_extra_whitespace]
"streamer ini lucu banget"
```

- GPU Acceleration: Automatically uses CUDA if available
- Batch Processing: Can process multiple messages simultaneously
- Memory: ~2-3 GB RAM for models + message buffer
- Network: Requires persistent connection to YouTube
- Scalability: Can analyze ~100-500 messages/second
Problem: Cannot connect to the livestream

Solution:
- Verify the YouTube video ID is correct
- Ensure the video is an active livestream (not a regular video)
- Check your internet connection
Problem: `pytchat` is missing or fails to import

Solution:

```bash
pip install pytchat==1.5.7
```

Problem: GPU runs out of memory

Solution:
- Use CPU instead (automatic fallback available)
- Reduce batch size in code
Problem: No messages appear

Solution:
- Verify the livestream is active and has chat enabled
- Check browser console for errors
- Try a different livestream video ID
Problem: First run is slow

Solution:
- First run downloads the model (~500MB) - this is normal
- Subsequent runs are much faster
- Ensure sufficient RAM (4GB+ recommended)
- Track audience sentiment during live events
- Identify negative feedback immediately
- Monitor engagement levels
- Understand which topics trigger positive/negative responses
- Evaluate content quality through sentiment trends
- A/B test different content approaches
- Identify toxic comments early
- Respond to negative sentiment promptly
- Celebrate positive audience reactions
- Collect datasets for NLP research
- Study sentiment patterns in live streaming
- Analyze multilingual sentiment (with appropriate models)
To train a custom fine-tuned model:
1. Prepare Training Notebook
   - Use the `fine_tune_model.ipynb` Jupyter notebook
   - Requires kagglehub authentication for dataset download
2. Install Notebook Dependencies
   - `pip install jupyter kagglehub`
3. Run Fine-Tuning
   - `jupyter notebook fine_tune_model.ipynb`
4. Model Export
   - The fine-tuned model automatically exports to `./fine_tuned_distilbert_sentiment/`
   - Includes: model weights, tokenizer, config, and label mappings
5. Integration
   - `sentiment_model.py` automatically detects and loads the fine-tuned model
   - Falls back to the pre-trained model if the fine-tuned version is unavailable
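The detect-and-fall-back logic can be sketched as follows (an assumed shape, not the exact code in `sentiment_model.py`):

```python
import os

FINE_TUNED_DIR = "./fine_tuned_distilbert_sentiment"
FALLBACK_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"

def resolve_model(local_dir: str = FINE_TUNED_DIR,
                  fallback: str = FALLBACK_MODEL) -> str:
    """Return the local fine-tuned model directory if it exists,
    otherwise the pre-trained HuggingFace model id."""
    return local_dir if os.path.isdir(local_dir) else fallback

# The returned string can be passed straight to
# transformers.pipeline("sentiment-analysis", model=resolve_model()).
```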
- Data Collection: Only collects publicly available chat messages from livestreams
- Data Storage: Messages stored locally in session state (not persisted to disk by default)
- Model Bias: DistilBERT may have biases from training data - use with awareness
- Usage: Comply with YouTube's Terms of Service when using pytchat
- Multi-language support with multilingual models
- Real-time alerts for negative sentiment spikes
- User authentication and session persistence
- Database integration for historical analysis
- Advanced NLP features (topic modeling, emotion detection)
- Custom model fine-tuning (Completed - DistilBERT fine-tuned on Twitter sentiment data)
- HuggingFace Hub model deployment
- API endpoint for integration with other services
- Mobile app support
- pytchat Documentation: https://github.com/taizan-hokuto/pytchat
- HuggingFace Transformers: https://huggingface.co/docs/transformers
- DistilBERT Model Card: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english
- Streamlit Documentation: https://docs.streamlit.io
- sentencepiece Tokenizer: https://github.com/google/sentencepiece
This project is provided as an educational prototype. Use freely for learning and academic purposes.
Contributions welcome! Feel free to:
- Report bugs and issues
- Suggest new features
- Improve documentation
- Optimize code performance
For issues or questions:
- Check the Troubleshooting section above
- Review code comments in individual modules
- Verify all dependencies are correctly installed
- Check console logs for detailed error messages
- Complete ML Pipeline: Data collection β preprocessing β analysis β visualization
- Production-Ready: Error handling, logging, and robust design
- User-Friendly: Intuitive Streamlit interface with real-time updates
- Scalable Architecture: Modular design allows easy customization
- Educational Value: Well-documented code suitable for learning ML/NLP concepts
Built with ❤️ for sentiment analysis and real-time insights
Last Updated: March 14, 2026
Model Version: v1.1 (Fine-Tuned DistilBERT)
Python Version: 3.10+
Status: ✅ Production Ready with Custom Fine-Tuned Model