This project streams data from Kafka to Snowflake and provides SQL optimization techniques to improve query performance using:
✅ Clustering
✅ Materialized Views
✅ Result Caching
First, install the required dependencies:
```bash
pip install -r requirements.txt
```
- Add your Kafka and Snowflake credentials in the appropriate places.
- Open `stock_consumer.py` and update the table name where the data will be inserted.
Run the producer to send data:
```bash
python stock_producer.py
```
Then run the consumer to receive the data and insert it into Snowflake:
```bash
python stock_consumer.py
```
The `logs` folder is empty at first; once the consumer starts processing data, log files are written there.
After data is successfully inserted into Snowflake, go to the `sql` folder:
```bash
cd sql/
```
Copy the queries into Snowflake, replacing:
- `TABLE_NAME` with your actual table name.
- `COLUMN_NAME` with your relevant column names.
📌 Clustering (`clustering.sql`) – Organizes data storage to improve query efficiency.
📌 Materialized Views (`materialized_views.sql`) – Stores precomputed results for faster queries.
📌 Result Caching (`result_caching.sql`) – Reuses query results to boost performance.

Illustrative sketches of each technique follow below.
- Modify SQL scripts to match your dataset structure.
- Ensure Kafka is running before starting the producer and consumer.
- Monitor execution times to see performance improvements.
🚀 Now you’re ready to optimize your Snowflake queries for maximum efficiency! 🔥
Happy Querying! 😊🎯