This project requires building a big data streaming dashboard using Kafka and Streamlit, featuring separate real-time and historical data views.
A dual-pipeline architecture separates live streaming from long-term storage and analysis.
| Pipeline | Flow | Output |
|---|---|---|
| Real-time | Kafka → live consumer | Real-time Streaming View |
| Historical | Kafka → HDFS/MongoDB → query | Historical Data View |
- Kafka Producer/Consumer.
- HDFS or MongoDB integration.
- Two-page Streamlit dashboard with charts.
- Robust error handling.
Create a Kafka Producer that fetches real data from an existing API (e.g., a public weather or stock market API); a minimal sketch follows the schema fields below.
Required Data Schema Fields:
- timestamp (ISO format)
- value (numeric)
- metric_type (string)
- sensor_id (string)
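As a starting point, here is a minimal producer sketch, assuming the kafka-python and requests packages, a broker on localhost:9092, a topic named sensor-data, and a placeholder API URL and response field; swap these for your chosen API and configuration.

```python
import json
import time
from datetime import datetime, timezone

import requests
from kafka import KafkaProducer

API_URL = "https://example.com/weather"  # placeholder: substitute your chosen public API

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    try:
        response = requests.get(API_URL, timeout=10)
        response.raise_for_status()
        payload = response.json()
        # Map the API response onto the required schema fields.
        message = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "value": float(payload.get("temperature", 0.0)),  # field name is an assumption
            "metric_type": "temperature",
            "sensor_id": "sensor-001",
        }
        producer.send("sensor-data", value=message)  # assumed topic name
        producer.flush()
    except (requests.RequestException, ValueError) as exc:
        # Keep the producer alive on transient API or serialization errors.
        print(f"Fetch/produce failed: {exc}")
    time.sleep(5)
```

The try/except block and request timeout address the robust error handling deliverable on the producer side: the loop keeps running through transient API failures instead of crashing.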
Implement the Streamlit logic:
- consume_kafka_data(): real-time processing.
- query_historical_data(): data retrieval from storage.
- Create interactive widgets (filters, time-range selector) for the Historical View.
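One possible shape for these helpers, assuming kafka-python, pymongo, and pandas, the same sensor-data topic and localhost services as above, and a MongoDB collection bigdata.readings (all names and metric values are placeholders):

```python
import json
from datetime import date, timedelta

import pandas as pd
import streamlit as st
from kafka import KafkaConsumer
from pymongo import MongoClient


def consume_kafka_data(max_messages=100):
    """Poll a batch of recent messages from Kafka for the Real-time View."""
    consumer = KafkaConsumer(
        "sensor-data",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="latest",
        consumer_timeout_ms=2000,  # stop iterating when no new messages arrive
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    records = []
    for message in consumer:
        records.append(message.value)
        if len(records) >= max_messages:
            break
    consumer.close()
    return pd.DataFrame(records)


def query_historical_data(metric_type, start, end):
    """Fetch stored readings from MongoDB for the Historical View."""
    collection = MongoClient("mongodb://localhost:27017")["bigdata"]["readings"]
    # Timestamps are stored as ISO strings, so lexicographic range filters work.
    cursor = collection.find({
        "metric_type": metric_type,
        "timestamp": {"$gte": start.isoformat(), "$lte": end.isoformat()},
    })
    return pd.DataFrame(list(cursor))


# Historical View widgets: a metric filter and a time-range selector.
metric = st.selectbox("Metric", ["temperature", "humidity"])  # placeholder metric names
date_range = st.date_input("Time range", (date.today() - timedelta(days=7), date.today()))
if len(date_range) == 2:
    df = query_historical_data(metric, *date_range)
    if not df.empty:
        st.line_chart(df.set_index("timestamp")["value"])
```

Setting consumer_timeout_ms keeps consume_kafka_data() from blocking the Streamlit script indefinitely when no new messages arrive.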
Implement data writing and querying for ONE of the following: HDFS or MongoDB.
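If you pick the MongoDB option, the write path can be a small standalone consumer that persists every message; this sketch assumes kafka-python and pymongo, a local MongoDB instance, and the placeholder bigdata.readings collection used above.

```python
import json

from kafka import KafkaConsumer
from pymongo import MongoClient

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
collection = MongoClient("mongodb://localhost:27017")["bigdata"]["readings"]

for message in consumer:
    try:
        collection.insert_one(message.value)  # persist each reading for the Historical View
    except Exception as exc:
        # Skip bad records rather than stopping the ingestion loop.
        print(f"Write failed, skipping message: {exc}")
```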
Python 3.8+, Apache Kafka, HDFS OR MongoDB.
- Set up the environment:
- Download Miniconda.
- Create your Python environment:
conda create -n bigdata python=3.10.13
- Clone Repo & Install:
git clone [REPO_URL]
conda activate bigdata
pip install -r requirements.txt
- Configure: Set up Kafka and your chosen Storage System.
- Optional environment file (.env): use for connection details.
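If you do use a .env file, one way to read it is with the python-dotenv package (an assumption, not a requirement of the assignment); the variable names below are placeholders.

```python
import os

from dotenv import load_dotenv

load_dotenv()  # no-op if the file is absent
KAFKA_BOOTSTRAP = os.getenv("KAFKA_BOOTSTRAP", "localhost:9092")
MONGO_URI = os.getenv("MONGO_URI", "mongodb://localhost:27017")
```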
- Start Kafka Broker (and Controller).
- Start Producer:
python producer.py
- Launch Dashboard:
streamlit run app.py
Submit the following files:
- app.py
- producer.py
- requirements.txt
- README.md