Project to learn different data processing architectures using Scala and Domain-driven design.

fortega/bigdata-architectures-scala

BigData architectures in Scala

About

This is a project to learn different data processing architectures using Scala and Domain-driven design.

Problem: your GPS provider sometimes sends you bad data. You have to validate each event and report your findings.
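To make the problem concrete, here is a minimal sketch of what per-event validation could look like in plain Scala. The `GpsEvent` fields, the `ValidationError` cases and the rules themselves are assumptions made for illustration; they are not taken from the repository's `core` module.

```scala
// Hypothetical domain model: field names and rules are assumptions, not the project's code.
final case class GpsEvent(deviceId: String, latitude: Double, longitude: Double, timestampMillis: Long)

sealed trait ValidationError
case object EmptyDeviceId extends ValidationError
case object LatitudeOutOfRange extends ValidationError
case object LongitudeOutOfRange extends ValidationError
case object NegativeTimestamp extends ValidationError

object GpsValidator {
  // Returns the event unchanged when every rule passes,
  // otherwise the full list of violated rules so all findings can be reported.
  def validate(event: GpsEvent): Either[List[ValidationError], GpsEvent] = {
    // Each pair is (rule violated?, error to report).
    val checks: List[(Boolean, ValidationError)] = List(
      (event.deviceId.trim.isEmpty, EmptyDeviceId),
      (event.latitude.isNaN || math.abs(event.latitude) > 90.0, LatitudeOutOfRange),
      (event.longitude.isNaN || math.abs(event.longitude) > 180.0, LongitudeOutOfRange),
      (event.timestampMillis < 0L, NegativeTimestamp)
    )
    val errors = checks.collect { case (true, error) => error }
    if (errors.isEmpty) Right(event) else Left(errors)
  }
}
```

With these assumed rules, `GpsValidator.validate(GpsEvent("bus-1", 95.0, 10.0, 0L))` yields `Left(List(LatitudeOutOfRange))`. Collecting all errors instead of stopping at the first one keeps the "findings" complete for each event.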

How to run

Batch (Spark)

Use "input" as the path of the input files (Parquet) and "output" as the path where validated events are written.

sbt "batch/run input output"

Producer (+ RabbitMQ)

docker-compose up

This starts a RabbitMQ server on the default ports (server: 5672, UI: 15672) with the default credentials (guest:guest).

Lambda (AMQP client)

sbt "lambda/run localhost raw deadLetter valid invalid"
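The command above passes five positional arguments. A plausible reading, which is an assumption and not confirmed by this README, is a broker host followed by four queue names: the raw input queue, a dead-letter queue for messages that cannot be parsed, and one queue each for valid and invalid events. Under that assumption, argument parsing and routing could be sketched as follows; `LambdaConfig`, `LambdaArgs`, `Outcome` and `Router` are hypothetical names.

```scala
// Assumption: the arguments are <host> <raw> <deadLetter> <valid> <invalid>.
final case class LambdaConfig(
  host: String,
  rawQueue: String,
  deadLetterQueue: String,
  validQueue: String,
  invalidQueue: String
)

object LambdaArgs {
  // Parses exactly five positional arguments, or explains what was wrong.
  def parse(args: Seq[String]): Either[String, LambdaConfig] = args match {
    case Seq(host, raw, deadLetter, valid, invalid) =>
      Right(LambdaConfig(host, raw, deadLetter, valid, invalid))
    case other =>
      Left(s"expected 5 arguments (host raw deadLetter valid invalid), got ${other.length}")
  }
}

// Hypothetical outcome of processing one raw message.
sealed trait Outcome
case object Unparseable extends Outcome
case object Valid extends Outcome
case object Invalid extends Outcome

object Router {
  // Picks the output queue name for a processed message.
  def targetQueue(config: LambdaConfig, outcome: Outcome): String = outcome match {
    case Unparseable => config.deadLetterQueue
    case Valid       => config.validQueue
    case Invalid     => config.invalidQueue
  }
}
```

For example, parsing `Seq("localhost", "raw", "deadLetter", "valid", "invalid")` and routing an `Unparseable` outcome gives the queue name `deadLetter`. The sketch deliberately leaves out the AMQP client itself.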

Application structure

  • core: business logic
  • batch: batch processing in Apache Spark
  • lambda: stream processing in Apache Flink
  • beam: stream or batch processing in Apache Beam (Scio)

Theory

Classic "big data" or Batch architecture

https://en.wikipedia.org/wiki/Batch_processing

Lambda architecture

https://en.wikipedia.org/wiki/Lambda_architecture

Kappa architecture

https://en.wikipedia.org/wiki/Lambda_architecture#Kappa_architecture

Beam model or "unified architecture"

https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/

Domain-driven design

https://en.wikipedia.org/wiki/Domain-driven_design
