A simple example of using Spark Structured Streaming for an imaginary case: aggregating customers' bank transactions
Our customers make debit and credit transactions. The task is to aggregate them all in one-hour windows and save the aggregates to a database. "To aggregate" means to group transactions by customer, time window, and type, and then sum the amounts. Transactions arrive as CSV files. Apache Cassandra is used as the aggregate storage.
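For illustration, a single transaction might be modeled as the case class below; the field names are assumptions rather than the project's actual schema.

```scala
import java.sql.Timestamp

// Hypothetical shape of a single transaction; the real field names may differ.
case class Transaction(
  customerId: String,    // who made the transaction
  transactionId: String, // unique id, needed for deduplication
  kind: String,          // "debit" or "credit"
  amount: BigDecimal,    // must be non-zero for a valid record
  time: Timestamp        // event time, used for windowing
)
```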
A few complications:
- Transactions may be incorrect (malformed CSV, a zero amount).
- A transaction may be duplicated within the hour, and the same transaction must not be summed twice.
- Transactions may arrive a few hours late, but no more than 24 hours late; a watermark covers this, as the sketch after this list shows.
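A minimal sketch of how the last two complications can be handled together, assuming the columns from the hypothetical schema above: a 24-hour watermark bounds how late events may arrive (and how much deduplication state Spark keeps), and dropDuplicates then discards repeats of the same transaction.

```scala
import org.apache.spark.sql.DataFrame

// A 24-hour watermark bounds both lateness and deduplication state;
// dropDuplicates on the id plus the event-time column then discards
// repeats of the same transaction. Column names are assumptions.
def deduplicated(transactions: DataFrame): DataFrame =
  transactions
    .withWatermark("time", "24 hours")
    .dropDuplicates("transactionId", "time")
```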
The plan for the CSV source (a full pipeline sketch follows the list):
- Read the stream of CSV files and filter off incorrect records.
- Apply a time-window aggregation.
- Write batches of aggregates to Cassandra via spark-cassandra-connector.
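A self-contained sketch of the CSV pipeline under stated assumptions: the schema, input and checkpoint paths, Cassandra host, and keyspace/table names are all hypothetical, and foreachBatch needs Spark 2.4 or later. It illustrates the approach, not the project's exact code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object TransactionAggregator {

  // Flatten the window struct and append one micro-batch to Cassandra
  // through spark-cassandra-connector's DataFrame API.
  def writeToCassandra(batch: DataFrame, batchId: Long): Unit =
    batch
      .withColumn("window_start", col("window.start"))
      .withColumn("window_end", col("window.end"))
      .drop("window")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "bank", "table" -> "aggregates")) // hypothetical names
      .mode("append")
      .save()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("transaction-aggregator")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumption: local Cassandra
      .getOrCreate()

    val schema = StructType(Seq(
      StructField("customerId", StringType),
      StructField("transactionId", StringType),
      StructField("kind", StringType),
      StructField("amount", DecimalType(18, 2)),
      StructField("time", TimestampType)
    ))

    val aggregates = spark.readStream
      .schema(schema)
      .option("mode", "DROPMALFORMED")          // drop malformed CSV lines
      .csv("/data/transactions")                // hypothetical input directory
      .filter(col("amount") =!= 0)              // zero-amount records are incorrect
      .withWatermark("time", "24 hours")        // tolerate up to 24 hours of lateness
      .dropDuplicates("transactionId", "time")  // count each transaction once
      .groupBy(
        window(col("time"), "1 hour"),          // one-hour tumbling windows
        col("customerId"),
        col("kind"))
      .agg(sum("amount").as("total"))

    aggregates.writeStream
      .outputMode("update")
      .option("checkpointLocation", "/data/checkpoints") // hypothetical path
      .foreachBatch(writeToCassandra _)
      .start()
      .awaitTermination()
  }
}
```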
The second part solves the same task, but transactions arrive as JSON messages from Kafka:
- Read the stream of JSON messages via spark-sql-kafka-0-10_2.11 (see the sketch after this list).
- Use the same time-window aggregation as before.
- Write batches of aggregates to Cassandra via spark-cassandra-connector, as before.
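The Kafka part only swaps the source; the aggregation and the Cassandra writer stay as above. A sketch, assuming a local broker, a hypothetical topic name, and JSON fields matching the earlier schema:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType

// Replace the CSV reader with a Kafka reader; downstream code is unchanged.
def kafkaTransactions(spark: SparkSession, schema: StructType): DataFrame =
  spark.readStream
    .format("kafka")                                      // provided by spark-sql-kafka-0-10
    .option("kafka.bootstrap.servers", "localhost:9092")  // assumption: local broker
    .option("subscribe", "transactions")                  // hypothetical topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).as("t")) // parse the JSON body
    .select("t.*")                                        // same columns as the CSV source
```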