Link: https://netflixpulse.streamlit.app/
A comprehensive data pipeline and dashboard application that ingests social media and survey data, performs sentiment analysis, and surfaces consumer insights about Netflix original content over time.
- Multi-Platform Data Ingestion: Twitter, Reddit, and Survey data collection
- Advanced Sentiment Analysis: VADER and Transformer-based models
- Real-time Dashboard: Interactive Streamlit interface
- Automated Pipeline: Airflow orchestration for scheduled data processing
- Comprehensive Analytics: Trend analysis, comparison views, and keyword insights
- Python 3.9+
- PostgreSQL (optional, SQLite included for development)
- Twitter Developer Account
- Reddit API Access
```bash
git clone https://github.com/shockwave22/StreamPulse.git
cd StreamPulse

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file with your API credentials:
```env
# Database
DATABASE_URL=sqlite:///streampulse.db

# Twitter API
TWITTER_BEARER_TOKEN=your_bearer_token
TWITTER_API_KEY=your_api_key
TWITTER_API_SECRET=your_api_secret
TWITTER_ACCESS_TOKEN=your_access_token
TWITTER_ACCESS_TOKEN_SECRET=your_access_token_secret

# Reddit API
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
```
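For reference, a minimal sketch of how these variables might be loaded at startup, assuming the python-dotenv package; the repo's config.py may do this differently:

```python
# Hypothetical config loading, assuming python-dotenv is installed;
# config.py in this repo may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # read .env into the process environment

DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///streampulse.db")
TWITTER_BEARER_TOKEN = os.getenv("TWITTER_BEARER_TOKEN")
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
```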
```bash
# Setup database and collect initial data
python main.py --full

# Start the dashboard
streamlit run dashboard/app.py
```

The dashboard will be available at http://localhost:8501.
```bash
# Start all services
docker-compose up -d

# Access dashboard at http://localhost:8501
# Access Airflow at http://localhost:8080
```

The dashboard provides the following (a rendering sketch in Streamlit follows the list):

- Total posts/comments metrics
- Average sentiment scores
- Survey satisfaction ratings
- Sentiment distribution charts
- Sentiment trends over time
- Platform-specific breakdowns
- Interactive time series visualizations
- Survey vs social media sentiment
- Alignment analysis
- Key insights and recommendations
- Word clouds from recent content
- Top trending keywords
- Theme analysis
- Recent tweets and comments
- Engagement metrics
- Content filtering options
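As a flavor of the implementation, here is a minimal sketch of how an overview view might render a few of these metrics in Streamlit. The DataFrame and its column names are hypothetical; the real dashboard/app.py may differ:

```python
import pandas as pd
import streamlit as st

# Hypothetical aggregated daily metrics; real column names may differ.
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=7),
    "avg_sentiment": [0.12, 0.18, 0.05, 0.22, 0.30, 0.25, 0.28],
    "post_count": [420, 515, 390, 610, 700, 640, 580],
})

st.metric("Total posts/comments", int(daily["post_count"].sum()))
st.metric("Average sentiment", f"{daily['avg_sentiment'].mean():.2f}")
st.line_chart(daily.set_index("date")["avg_sentiment"])  # sentiment trend over time
```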
The pipeline consists of four main stages:
- Data Ingestion: Collect data from Twitter, Reddit, and survey sources
- Sentiment Analysis: Process text using VADER or transformer models
- Data Aggregation: Calculate daily metrics and trends
- Dashboard Update: Refresh visualizations and insights
The pipeline runs automatically every 6 hours (a minimal DAG sketch follows this list):
- Fetch new social media content
- Process sentiment analysis
- Update aggregated metrics
- Refresh dashboard data
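A minimal sketch of how this schedule could be wired up in airflow_dags/streampulse_pipeline.py, assuming Airflow 2.x; the task breakdown mirrors the stages above but is illustrative, not the repo's exact DAG:

```python
# Hypothetical 6-hourly DAG, assuming Airflow 2.x; the real
# airflow_dags/streampulse_pipeline.py may be structured differently.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="streampulse_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(hours=6),  # run every 6 hours
    catchup=False,
) as dag:
    collect = BashOperator(task_id="collect", bash_command="python main.py --collect")
    sentiment = BashOperator(task_id="sentiment", bash_command="python main.py --sentiment")
    aggregate = BashOperator(task_id="aggregate", bash_command="python main.py --aggregate")

    collect >> sentiment >> aggregate  # the dashboard reads the refreshed aggregates
```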
```text
StreamPulse/
├── config.py                    # Configuration settings
├── main.py                      # Main application entry point
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Docker configuration
├── docker-compose.yml           # Multi-service deployment
├── database/
│   ├── models.py                # SQLAlchemy models
│   └── database.py              # Database connection
├── data_ingestion/
│   ├── twitter_collector.py     # Twitter API integration
│   ├── reddit_collector.py      # Reddit API integration
│   └── survey_collector.py      # Survey data handling
├── sentiment_analysis/
│   └── analyzer.py              # Sentiment analysis engine
├── data_aggregation/
│   └── aggregator.py            # Data aggregation logic
├── dashboard/
│   └── app.py                   # Streamlit dashboard
├── airflow_dags/
│   └── streampulse_pipeline.py  # Airflow DAG
├── data/                        # Data storage directory
└── logs/                        # Application logs
```
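For orientation, a minimal sketch of what a model in database/models.py might look like, assuming SQLAlchemy 2.x declarative mapping; the class and field names here are hypothetical:

```python
# Hypothetical model, assuming SQLAlchemy 2.x declarative mapping;
# the real database/models.py may differ.
from datetime import datetime
from typing import Optional

from sqlalchemy import DateTime, Float, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class SocialPost(Base):
    __tablename__ = "social_posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    platform: Mapped[str] = mapped_column(String(16))          # "twitter" or "reddit"
    title: Mapped[str] = mapped_column(String(128))            # tracked Netflix title
    text: Mapped[str] = mapped_column(String)                  # post/comment body
    sentiment: Mapped[Optional[float]] = mapped_column(Float)  # set by the analyzer
    created_at: Mapped[datetime] = mapped_column(DateTime)
```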
- Stranger Things
- The Witcher
- Wednesday
- Ozark
- The Crown
- Bridgerton
- Squid Game
- Money Heist
- Dark
- The Umbrella Academy
- VADER: Rule-based sentiment analysis (default)
- Transformers: DistilBERT fine-tuned model (optional; see the sketch after this list)
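A minimal sketch of the two model options, assuming the vaderSentiment and transformers packages; the repo's sentiment_analysis/analyzer.py may wrap them differently:

```python
# Hypothetical use of the two model options; analyzer.py may differ.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()
scores = vader.polarity_scores("Wednesday season 2 was fantastic!")
print(scores["compound"])  # compound score in [-1, 1]

# Optional transformer-based scoring (heavier, but often stronger on nuance)
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Wednesday season 2 was fantastic!"))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```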
```bash
# Run specific pipeline stages
python main.py --collect    # Collect new data
python main.py --sentiment  # Process sentiment
python main.py --aggregate  # Update aggregations
```

Collectors can also be used programmatically:

```python
from data_ingestion.twitter_collector import TwitterCollector

collector = TwitterCollector()
tweets = collector.collect_tweets_for_title("Stranger Things", max_results=50)
```

```bash
# Run tests
pytest

# Run specific test modules
pytest tests/test_sentiment_analysis.py
pytest tests/test_data_ingestion.py
```

Security best practices:

- Store API credentials in environment variables
- Use read-only database credentials for dashboard
- Implement rate limiting for API calls (a minimal sketch follows this list)
- Keep dependencies up to date with regular security patches
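One way to implement the rate-limiting point is a small client-side limiter like the sketch below; note that tweepy and praw also ship their own rate-limit handling, which the repo's collectors may already rely on:

```python
# Hypothetical minimal rate limiter; the actual collectors may instead
# rely on tweepy/praw built-in rate-limit handling.
import time


class RateLimiter:
    """Enforce a minimum interval between successive API calls."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_call = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self._last_call = time.monotonic()


limiter = RateLimiter(min_interval_s=2.0)  # at most ~30 requests per minute
limiter.wait()  # call before each API request
```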
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Database Connection Errors
- Verify DATABASE_URL in .env file
- Ensure PostgreSQL is running (if using)
- Check database permissions
API Rate Limits
- Monitor API usage in logs
- Implement longer delays between requests
- Consider upgrading API plans
Dashboard Not Loading
- Check if all dependencies are installed
- Verify Streamlit is running on correct port
- Review application logs for errors
- Check the Issues page
- Review logs in the `logs/` directory
- Enable debug logging: `LOG_LEVEL=DEBUG`
- Use PostgreSQL for production deployments
- Configure database connection pooling
- Implement caching for dashboard data (see the sketch after this list)
- Monitor memory usage with large datasets
- Use batch processing for sentiment analysis
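A minimal sketch of dashboard-data caching with Streamlit's built-in cache; the table name and connection string here are hypothetical:

```python
import pandas as pd
import sqlalchemy
import streamlit as st


# Hypothetical loader; table and connection details are illustrative.
@st.cache_data(ttl=600)  # re-query at most every 10 minutes
def load_daily_metrics() -> pd.DataFrame:
    engine = sqlalchemy.create_engine("sqlite:///streampulse.db")
    return pd.read_sql("SELECT * FROM daily_metrics", engine)


daily = load_daily_metrics()  # subsequent reruns hit the cache
```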