Data Streaming Nanodegree Projects

1. Data Ingestion with Kafka & Kafka Streaming

Worked on ingesting data into kafka and stream processing on kafka topics.

Tasks Completed:
- Set up Kafka Topics and Schemas
- Set up postgres kafka connector
- Implement Kafka Producers using kafka-client and rest proxy
- Implement Kafka Consumers/Pipelines with faust and ksql
Concepts Learned:
- Apache Kafka Infrastructure
- Data Schemas and Apache Avro
- Kafka Connect and REST Proxy
- Stream Processing Fundamentals
- Stream Processing with Faust
- KSQL
Core Technologies Used:
- Python (confluent-kafka, faust)
- Apache Kafka

Worked on processing apache kafka data streams using pyspark streaming APIs

Tasks Completed:
- Implement spark streaming pipelines
- Consume from apache kafka topic
- decode base64 encoded json data from stream
- perform transformations and aggregation on data
- join streams and output to new kafka topic or console
Concepts Learned:
- Spark Dataframes, Views and Spark SQL
- Reading binary, json and base64 encoded data from kafka streams
- Joining streams
Core Technologies Used:
- Python (pyspark)
- Apache Spark
- Apache Kafka
- Redis

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
kafka-streaming-pipeline		kafka-streaming-pipeline
spark-streaming-pipeline		spark-streaming-pipeline
.gitignore		.gitignore
README.md		README.md