3-way INNER JOINS with aggregation -- Python SQL, Pandas, Scala Kafka Streams, Scala Flink and Scala Spark Structured Streams

msb1/triple-join

Triple Joins with Simulated Data

  • Cards are generated, each with an id and additional data
  • Verifications of cards by users are generated
  • Users are generated
  • First, verifications are joined to cards
  • Next, users are joined to the verified cards
  • Then the result is aggregated by user to determine which cards each user verified
  • Finally, a filter keeps only users with more than a set number (200) of verifications
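The steps above can be sketched in plain Python before looking at any particular engine. This is only an illustration, not the repository's code: the field names (`card_id`, `user_id`, `data`, `name`), the function names, and the generator sizes are all assumptions made for the example.

```python
import random
from collections import defaultdict


def generate(num_cards=50, num_users=5, num_verifications=400, seed=0):
    """Simulate cards, users, and verifications (hypothetical schema)."""
    rng = random.Random(seed)
    cards = {cid: {"card_id": cid, "data": f"card-{cid}"} for cid in range(num_cards)}
    users = {uid: {"user_id": uid, "name": f"user-{uid}"} for uid in range(num_users)}
    verifications = [
        {"card_id": rng.randrange(num_cards), "user_id": rng.randrange(num_users)}
        for _ in range(num_verifications)
    ]
    return cards, users, verifications


def triple_join(cards, users, verifications, min_verifications=200):
    """Inner-join verifications to cards and users, group by user, then filter."""
    cards_by_user = defaultdict(list)
    for v in verifications:
        card = cards.get(v["card_id"])   # join 1: verification -> card
        user = users.get(v["user_id"])   # join 2: verified card -> user
        if card is None or user is None:
            continue                     # inner join: drop unmatched rows
        cards_by_user[user["user_id"]].append(card["card_id"])
    # keep only users with more than min_verifications verified cards
    return {
        uid: ids for uid, ids in cards_by_user.items() if len(ids) > min_verifications
    }
```

Calling `triple_join(*generate(), min_verifications=50)` returns a dict that maps each qualifying user id to the list of card ids that user verified. The threshold is a parameter so the sketch can be tried on small inputs; the README's value is 200.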

Case 1: SQL with SQLite3 in Python (no ORM)
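A minimal sketch of how this case might look with the standard-library `sqlite3` module. The table and column names, the `build_demo_db` helper, and the tiny demo rows are assumptions for illustration; the repository's actual schema may differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cards (card_id INTEGER PRIMARY KEY, data TEXT);
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE verifications (card_id INTEGER, user_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO cards VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(10, "alice"), (11, "bob")])
    conn.executemany(
        "INSERT INTO verifications VALUES (?, ?)",
        [(1, 10), (2, 10), (3, 10), (1, 11), (9, 11)],  # card 9 does not exist
    )
    conn.commit()
    return conn


def heavy_verifiers(conn, min_verifications=200):
    """Two inner joins, a GROUP BY per user, and a HAVING filter on the count."""
    query = """
        SELECT u.user_id, u.name, COUNT(*) AS verified_cards
        FROM verifications AS v
        INNER JOIN cards AS c ON c.card_id = v.card_id
        INNER JOIN users AS u ON u.user_id = v.user_id
        GROUP BY u.user_id, u.name
        HAVING COUNT(*) > ?
        ORDER BY u.user_id
    """
    return conn.execute(query, (min_verifications,)).fetchall()
```

With the demo rows, `heavy_verifiers(build_demo_db(), min_verifications=2)` returns `[(10, 'alice', 3)]`. The verification that points at the missing card 9 is dropped by the inner join, so bob's count is 1.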

Case 2: Pandas in Python with Dataframes
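A comparable sketch with pandas, assuming the three datasets are DataFrames with hypothetical columns `card_id`, `user_id`, and `name`. The function name `heavy_verifiers_df` is also an assumption and is not taken from the repository.

```python
import pandas as pd


def heavy_verifiers_df(cards, verifications, users, min_verifications=200):
    """Inner-join three DataFrames, count verified cards per user, then filter."""
    # join 1: verifications -> cards
    verified = verifications.merge(cards, on="card_id", how="inner")
    # join 2: verified cards -> users
    with_users = verified.merge(users, on="user_id", how="inner")
    # aggregation: number of verified cards per user
    counts = with_users.groupby(["user_id", "name"], as_index=False).agg(
        verified_cards=("card_id", "count")
    )
    # filter: keep users above the threshold
    return counts[counts["verified_cards"] > min_verifications].reset_index(drop=True)
```

Each `merge` call with `how="inner"` corresponds to one `INNER JOIN` in the SQL version, and the boolean mask on `verified_cards` plays the role of a `HAVING` clause.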

Case 3: Scala Spark Structured Streaming with Kafka generated streams

  1. Program 1 generates three simulated data streams and produces them to three Kafka topics
  2. Program 2 performs the Triple Join with Aggregation

Case 4: Scala Kafka Streams

  1. Use the same Program 1 from Case 3 to produce data records to the Kafka topics
  2. The program is a Scala Kafka Streams implementation of the Triple Join with aggregation

Case 5: Scala Flink with Kafka generated streams

  1. Use the same Program 1 from Case 3 to produce data records to the Kafka topics
  2. The program is a Scala Flink implementation of the Triple Join with aggregation
