This project is the first assignment for the Big Data for Engineering class. It uses Docker to deploy an end-to-end data pipeline on your local computer: containerized Kafka for data streaming, Cassandra as the NoSQL database, and Jupyter Lab and the Dash framework for data analysis and visualization. There are 3 pipelines, using data from the Twitter and OpenWeatherMap APIs, the Faker API, and PokeAPI.
The pipeline is organized as follows:
- Kafka producers and consumers stream data from the APIs above.
- The data is then stored in a Cassandra database.
- Jupyter Lab (or Dash) queries the database and visualizes the data.
- Every component runs in its own Docker container.
The containers for every part of this project are in the src folder. All images are pre-built; if you want to replicate the pipeline, you can rebuild the images and bring the containers up again with the following steps:
- Create the Docker networks:
docker network create kafka-network
docker network create cassandra-network
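Running docker network create a second time fails because the network already exists. A minimal sketch of an idempotent variant, assuming a POSIX shell; the helper name ensure_network is hypothetical and not part of this project:

```shell
# Hypothetical helper: create a Docker network only if it does not exist yet,
# so the setup steps can be re-run safely.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create "$1"
  fi
}

# Usage:
#   ensure_network kafka-network
#   ensure_network cassandra-network
```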
- Start Cassandra and Kafka:
docker-compose -f cassandra/docker-compose.yml up -d --build
docker-compose -f kafka/docker-compose.yml up -d
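Cassandra can take a while to accept connections after its container starts, so producers and consumers launched right away may fail to connect. A sketch of a readiness check, assuming the container name cassandra and the cqlsh version used in the query step below; wait_for_cassandra and its retry defaults are hypothetical:

```shell
# Hypothetical helper: poll until Cassandra answers a CQL statement.
# Assumes the container is named "cassandra" and cqlsh version 3.4.4,
# as in the query step of this guide; adjust both if your setup differs.
wait_for_cassandra() {
  max_tries="${1:-30}"   # assumed default: 30 attempts
  delay="${2:-5}"        # assumed default: 5 seconds between attempts
  i=1
  while [ "$i" -le "$max_tries" ]; do
    if docker exec cassandra cqlsh --cqlversion=3.4.4 127.0.0.1 -e "describe keyspaces" >/dev/null 2>&1; then
      echo "cassandra is ready"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "cassandra not ready after $max_tries tries" >&2
  return 1
}

# Usage (before starting the producers and consumers):
#   wait_for_cassandra
```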
- OpenWeatherMap Producer:
docker-compose -f owm-producer/docker-compose.yml up -d --build
- Twitter Producer:
docker-compose -f twitter-producer/docker-compose.yml up --build
- Faker API Producer:
docker-compose -f faker-producer/docker-compose.yml up -d --build
- Pokemon Producer:
docker-compose -f pokemon-producer/docker-compose.yml up -d --build
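If you restart the producers often, a small loop can bring up the detached ones in one go. A sketch, assuming the folder names from the commands above; start_producers and the COMPOSE override variable are hypothetical. The Twitter producer is left out because the guide runs it in the foreground:

```shell
# Hypothetical helper: start the detached producers one after another.
# COMPOSE is an assumed override, e.g. COMPOSE="docker compose" for Compose v2.
start_producers() {
  compose="${COMPOSE:-docker-compose}"
  for p in owm-producer faker-producer pokemon-producer; do
    echo "starting $p"
    # $compose is intentionally unquoted so "docker compose" splits into two words
    $compose -f "$p/docker-compose.yml" up -d --build || return 1
  done
}

# Usage:
#   start_producers
```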
- Consumers:
docker-compose -f consumers/docker-compose.yml up --build
- Open a shell in the Cassandra container:
docker exec -it cassandra bash
- Query the data from the 4 tables:
$ cqlsh --cqlversion=3.4.4 127.0.0.1 # make sure you use the correct cqlversion
cqlsh> use kafkapipeline; # keyspace name
cqlsh:kafkapipeline> describe tables; # lists every table in the keyspace
cqlsh:kafkapipeline> select * from twitterdata;
cqlsh:kafkapipeline> select * from weatherreport;
cqlsh:kafkapipeline> select * from fakerdata;
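To run a single query from the host without an interactive session, cqlsh accepts a statement through its -e option. A sketch reusing the container name, cqlsh version and keyspace from the steps above; run_query is a hypothetical helper name:

```shell
# Hypothetical helper: run one CQL statement inside the "cassandra" container.
# No -it is needed because nothing is read from the terminal.
run_query() {
  docker exec cassandra cqlsh --cqlversion=3.4.4 127.0.0.1 -e "$1"
}

# Usage:
#   run_query "select * from kafkapipeline.weatherreport limit 5;"
```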
- Visualize the data with Jupyter Lab:
docker-compose -f data-vis/docker-compose.yml up -d --build
- Or visualize it with Dash:
docker-compose -f dashboard/docker-compose.yml up -d
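Once everything is up, docker ps shows which containers are running. A sketch that checks for specific container names, assuming a POSIX shell; check_running is hypothetical, and apart from cassandra the container names depend on the compose files, so take them from your own docker ps output:

```shell
# Hypothetical helper: report whether each named container is running.
# Returns non-zero if any of the given names is missing.
check_running() {
  running="$(docker ps --format '{{.Names}}')"
  missing=0
  for name in "$@"; do
    if printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "up: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (add the other container names listed by docker ps):
#   check_running cassandra
```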
Based on: https://github.com/salcaino/sfucmpt733/tree/main/foobar-kafka and https://github.com/vnyennhi/docker-kafka-cassandra
If you find this project useful, you can let me know. I would love to hear about it! 🔥