An event streaming pipeline built using Kafka, KSQL, Faust, and Spark Structured Streaming

ggbong734/udacity-data-streaming

Udacity Data Streaming Nanodegree

Projects completed in the Udacity Data Streaming Nanodegree program.

Constructed a streaming event pipeline around Apache Kafka and its ecosystem.

  • Configured producers to send events to Kafka with Avro key and value schemas
  • Ingested data from a PostgreSQL database using Kafka Connect
  • Transformed the ingested data with Faust
  • Aggregated data using KSQL
  • Configured a consumer to read the data from Kafka
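
As a rough illustration of the Kafka Connect step, the sketch below builds (but does not send) a request that would register a JDBC source connector through the Kafka Connect REST API. It assumes the Confluent JDBC source connector is installed on the Connect worker. The connector name, database URL, credentials, table, column, and topic prefix are all hypothetical placeholders and are not taken from this repository.

```python
import json
from urllib.request import Request


def build_jdbc_source_request(connect_url, name, table, id_column):
    """Build a POST request that registers a JDBC source connector.

    All connection details below are hypothetical placeholders.
    """
    payload = {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:postgresql://localhost:5432/exampledb",
            "connection.user": "example_user",
            "connection.password": "example_password",
            # Only copy the named table into Kafka.
            "table.whitelist": table,
            # Pick up new rows by watching a strictly increasing column.
            "mode": "incrementing",
            "incrementing.column.name": id_column,
            # The topic name becomes <prefix><table>.
            "topic.prefix": "example.connect.",
            "poll.interval.ms": "5000",
        },
    }
    return Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build the request only; sending it needs a running Kafka Connect worker.
request = build_jdbc_source_request(
    "http://localhost:8083", "example-source", "example_table", "id"
)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running Connect worker. With the settings above, rows from the table would be published to the topic `example.connect.example_table`.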

Proficiencies used: Apache Kafka, Kafka Connect, Faust stream processing, KSQL

Created a Kafka server to produce data and ingested the data using Spark Structured Streaming.

  • Built a simple Kafka server (with ZooKeeper) for producing data
  • Created a Spark consumer to perform data aggregation and joining
  • Modified SparkSession property parameters to optimize processing throughput
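
The source does not say which SparkSession properties were changed or what values were chosen. The sketch below only shows one way such properties could be applied, using two real Spark settings that commonly affect throughput. The values, the application name, and the helper names `apply_settings` and `build_session` are assumptions made for illustration.

```python
# Illustrative values only; the properties and values actually used
# in the project are not stated in the source.
THROUGHPUT_SETTINGS = {
    # Number of partitions used when shuffling for joins and aggregations.
    "spark.sql.shuffle.partitions": "8",
    # Default number of partitions for RDD operations.
    "spark.default.parallelism": "8",
}


def apply_settings(builder, settings):
    """Apply each key/value pair through the builder's config() method."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


def build_session(app_name="example-streaming-app"):
    """Create a SparkSession with the settings above (requires pyspark)."""
    # Imported here so the rest of the module loads without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    return apply_settings(builder, THROUGHPUT_SETTINGS).getOrCreate()
```

A sensible way to tune these values is to change one property at a time and compare the processed rows per second reported in the streaming query progress output.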

Proficiencies used: Apache Kafka, Apache Spark, Spark Structured Streaming
