Real-time sentiment pipeline that determines the most loved and hated things on the internet.

```
[Reddit Producer] ──→ raw.reddit ──┐
[YouTube Producer] ─→ raw.youtube ─┤→ [Flink] → processed.signals → [LLM Service] → SQLite → Streamlit
[News Producer] ────→ raw.news ────┘
```

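Each producer turns its source's payload into messages on a `raw.*` topic. A minimal sketch of the Reddit side, assuming the standard listing JSON shape returned by endpoints such as `/r/<subreddit>/new.json` (`data.children[].data`, where posts carry `title`/`selftext` and comments carry `body`). The record fields below are hypothetical, not the project's actual schema, and the Kafka send itself is omitted; with kafka-python it would be a `KafkaProducer.send("raw.reddit", value=...)` call.

```python
import json
from typing import Any


def listing_to_records(listing: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Reddit listing into records for the raw.reddit topic.

    The record schema here is hypothetical, for illustration only.
    """
    records = []
    for child in listing.get("data", {}).get("children", []):
        item = child.get("data", {})
        # Posts use title/selftext, comments use body; join whatever is present.
        parts = (item.get("title", ""), item.get("selftext", ""), item.get("body", ""))
        text = " ".join(p for p in parts if p)
        if not text:
            continue  # nothing to score
        records.append({
            "source": "reddit",
            "id": item.get("id"),
            "subreddit": item.get("subreddit"),
            "created_utc": item.get("created_utc"),
            "text": text,
        })
    return records


def encode(record: dict[str, Any]) -> bytes:
    """Serialize a record as UTF-8 JSON bytes, ready to hand to a Kafka producer."""
    return json.dumps(record).encode("utf-8")
```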
Stack:
- Kafka — message broker for raw and processed data streams
- Apache Flink — sliding window aggregation and signal normalization
- VADER / llama.cpp — sentiment scoring (VADER by default; llama.cpp if running locally)
- SQLite — persistence layer for sentiment scores
- Streamlit — real-time leaderboard dashboard
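Flink's sliding windows have a fixed size and advance by a smaller slide, so each event belongs to several overlapping windows. The plain-Python sketch below illustrates that window assignment and a per-entity average over `(entity, timestamp, score)` tuples. It is an illustration of the technique, not PyFlink code, and the event shape, window size, and slide are assumptions.

```python
from collections import defaultdict


def window_starts(ts: int, size: int, slide: int) -> list[int]:
    """Start times of every window [start, start + size) containing ts.

    Windows begin at multiples of `slide`, as with Flink sliding windows
    that use no offset.
    """
    start = ts - (ts % slide)  # latest window that starts at or before ts
    starts = []
    while start > ts - size:  # window still covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)


def sliding_average(events, size: int, slide: int) -> dict[tuple[str, int], float]:
    """Average score per (entity, window_start) over (entity, ts, score) events."""
    sums: dict[tuple[str, int], float] = defaultdict(float)
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for entity, ts, score in events:
        for start in window_starts(ts, size, slide):
            sums[(entity, start)] += score
            counts[(entity, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With `size=60` and `slide=30`, every event lands in two windows, which smooths the averages compared with non-overlapping (tumbling) windows.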
Project structure:

```
zeitgeist/
├── start.ps1                # Start the full pipeline (one command)
├── stop.ps1                 # Stop everything and tear down Docker
├── verify_pipeline.py       # Health check for all components
├── docker-compose.yml       # Kafka, Zookeeper, Flink, Kafdrop
├── .env.example             # Environment variables template
├── entities.py              # Seed entity list
├── producers/
│   ├── reddit_producer.py   # Reddit public JSON API (no credentials needed)
│   ├── youtube_producer.py  # YouTube Data API v3
│   └── news_producer.py     # Google News RSS + NewsAPI
├── flink/
│   └── sentiment_pipeline.py
├── llm_service/
│   └── sentiment_scorer.py
├── dashboard/
│   └── app.py
└── data/
    └── zeitgeist.db         # SQLite database (auto-created, gitignored)
```
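`data/zeitgeist.db` is created automatically. A minimal sketch of what that persistence and a "most loved / most hated" ranking could look like with Python's built-in `sqlite3`; the `scores` table, its columns, and the `min_mentions` threshold are hypothetical, not the project's actual schema.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema; the project's real table layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scores (
    entity    TEXT NOT NULL,
    source    TEXT NOT NULL,
    score     REAL NOT NULL,  -- sentiment, assumed to be in [-1, 1]
    scored_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def connect(path: str = "data/zeitgeist.db") -> sqlite3.Connection:
    """Open the database, creating the file, its folder, and the table if needed."""
    if path != ":memory:":
        Path(path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, entity: str, source: str, score: float) -> None:
    """Insert one score; the `with` block commits on success and rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO scores (entity, source, score) VALUES (?, ?, ?)",
            (entity, source, score),
        )


def leaderboard(conn: sqlite3.Connection, limit: int = 10,
                min_mentions: int = 1, most_loved: bool = True) -> list[tuple]:
    """Rank entities by average score: highest first if loved, lowest first if hated."""
    order = "DESC" if most_loved else "ASC"  # only these two literals reach the SQL
    return conn.execute(
        "SELECT entity, AVG(score) AS avg_score, COUNT(*) AS mentions "
        "FROM scores GROUP BY entity HAVING COUNT(*) >= ? "
        f"ORDER BY avg_score {order} LIMIT ?",
        (min_mentions, limit),
    ).fetchall()
```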
Prerequisites:

- Docker Desktop (running)
- Python 3.10+
Setup:

```powershell
git clone https://github.com/KylePeiman/Zeitgeist.git
cd Zeitgeist
cp .env.example .env
# Add your YouTube API key to .env (optional; the pipeline works without it)
pip install -r requirements.txt
```

Start the pipeline:

```powershell
.\start.ps1
```

This single command:
- Tears down any previous Docker state (prevents Zookeeper stale-node crashes)
- Starts Kafka, Zookeeper, Flink, and Kafdrop via Docker
- Waits for Kafka to be healthy
- Launches all 6 pipeline services silently in the background
- Logs each service's output to `logs/<name>.log`
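
Check that every component is healthy:

```powershell
python verify_pipeline.py
```

Stop the pipeline:

```powershell
.\stop.ps1
```

This kills all pipeline processes and tears down Docker, including volumes.

API keys:
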
| Source | Key required? | Where to get one |
|---|---|---|
| Reddit | No | Not needed; uses the public JSON API |
| YouTube | Optional | Google Cloud Console — enable YouTube Data API v3 |
| News | Optional | newsapi.org free tier (100 req/day); falls back to Google News RSS |
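The News producer falls back to Google News RSS when no NewsAPI key is set. A sketch of that decision and of pulling headlines out of an RSS 2.0 feed with the standard library; the `NEWSAPI_KEY` variable name is an assumption (check `.env.example` for the real one), and fetching the feed over HTTP is omitted. For untrusted feeds, the Python `xml` documentation recommends hardened parsers such as `defusedxml`.

```python
import os
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def choose_news_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Use NewsAPI when a key is configured, otherwise Google News RSS.

    NEWSAPI_KEY is a hypothetical variable name.
    """
    env = os.environ if env is None else env
    return "newsapi" if env.get("NEWSAPI_KEY", "").strip() else "google_news_rss"


def parse_rss_headlines(xml_text: str) -> list[dict[str, str]]:
    """Extract title, link, and pubDate from each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    headlines = []
    for item in root.iter("item"):
        headlines.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return headlines
```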
Services:

| Service | URL |
|---|---|
| Streamlit Dashboard | http://localhost:8501 |
| Kafdrop (Kafka UI) | http://localhost:9000 |
| Flink UI | http://localhost:8081 |
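`verify_pipeline.py` checks the components, and `start.ps1` waits for Kafka before launching services. A simplified sketch of both ideas as plain TCP reachability checks against the URLs in the table above; it is not the project's script, and a real check would likely go further (for example, listing Kafka topics). Kafka's broker port is not listed in the table, so `wait_for_port` takes it as an argument.

```python
import socket
import time
from urllib.parse import urlparse

# URLs from the services table.
SERVICES = {
    "Streamlit Dashboard": "http://localhost:8501",
    "Kafdrop (Kafka UI)": "http://localhost:9000",
    "Flink UI": "http://localhost:8081",
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 60.0,
                  interval_s: float = 2.0) -> bool:
    """Poll until host:port accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if port_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


def check_services(services: dict[str, str] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its host:port is reachable."""
    results = {}
    for name, url in services.items():
        parsed = urlparse(url)
        results[name] = port_open(parsed.hostname or "localhost", parsed.port or 80)
    return results
```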
Kafka topics:

| Topic | Description |
|---|---|
| `raw.reddit` | Raw Reddit posts and comments |
| `raw.youtube` | Raw YouTube comments and metadata |
| `raw.news` | Raw news headlines and articles |
| `processed.signals` | Normalized signals from Flink, ready for scoring |
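`processed.signals` carries normalized records built from the three raw topics. A sketch of one plausible normalization step: match each raw record's text against a seed entity list like the one in `entities.py` and emit one signal per mentioned entity. The entity list, the `text`/`ts` field names, and the whole-word matching rule are all assumptions, not the project's actual format.

```python
import re
from dataclasses import asdict, dataclass

# Hypothetical seed entities; the project's real list lives in entities.py.
SEED_ENTITIES = ["Taylor Swift", "pineapple pizza", "Python"]


@dataclass
class Signal:
    entity: str
    source: str
    text: str
    ts: float


def _pattern(entity: str) -> re.Pattern:
    # Case-insensitive whole-word match, so "Python" does not match "Pythonic".
    # (\b assumes entities start and end with word characters.)
    return re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)


PATTERNS = {entity: _pattern(entity) for entity in SEED_ENTITIES}


def normalize(raw: dict, source: str) -> list[dict]:
    """Emit one signal dict per seed entity mentioned in a raw record's text."""
    text = raw.get("text", "")
    ts = float(raw.get("ts", 0.0))
    return [asdict(Signal(entity, source, text, ts))
            for entity, pattern in PATTERNS.items() if pattern.search(text)]
```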
The scorer defaults to VADER for fast local scoring. If you have llama.cpp running locally at http://localhost:8080 with a compatible model, it will use that instead for richer sentiment analysis.
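A sketch of that fallback, assuming the llama.cpp HTTP server's `/completion` endpoint (which accepts a JSON `prompt` and returns the generated text in `content`) and VADER's `compound` score. The prompt wording, the [-1, 1] scale, and the reply parsing are assumptions rather than the project's scorer, and this version falls back per request instead of detecting llama.cpp once. VADER is imported lazily, so the sketch loads without it when another fallback is passed in.

```python
import json
import re
import urllib.request
from functools import lru_cache
from typing import Callable

LLAMA_URL = "http://localhost:8080"

# Hypothetical prompt; the project's actual prompt is not shown here.
PROMPT = ("Rate the sentiment of the text below from -1 (very negative) to 1 "
          "(very positive). Reply with only the number.\n\nText: {text}\nScore:")


@lru_cache(maxsize=1)
def _vader():
    # Needs `pip install vaderSentiment`; imported lazily so this module loads without it.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_score(text: str) -> float:
    """VADER's compound score, which is already in [-1, 1]."""
    return _vader().polarity_scores(text)["compound"]


def parse_score(reply: str) -> float:
    """Take the first number in the model's reply and clamp it to [-1, 1]."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no number in reply: {reply!r}")
    return max(-1.0, min(1.0, float(match.group())))


def llama_score(text: str, base_url: str = LLAMA_URL, timeout: float = 10.0) -> float:
    """POST to the llama.cpp server's /completion endpoint and parse `content`."""
    payload = {"prompt": PROMPT.format(text=text), "n_predict": 8, "temperature": 0}
    req = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_score(json.load(resp)["content"])


def score(text: str, base_url: str = LLAMA_URL,
          fallback: Callable[[str], float] = vader_score) -> tuple[float, str]:
    """Try llama.cpp first; on connection, HTTP, or parsing errors use the fallback."""
    try:
        return llama_score(text, base_url, timeout=2.0), "llama.cpp"
    except (OSError, ValueError, KeyError):
        return fallback(text), "fallback"
```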