CQRS & Event Sourcing are "THE BUZZ" words these days and how it fits in real time analytics. Microservices architecture causes segregation of applications (also called mushrooming of application landscape). This posses challenge for the Data Engineering teams. The proposed architecture (in the GitHub blog) demystifies CQRS + Event Sourcing in Real time analytics and hence bridges the gap between Full Stack Developers & Data Engineer.
NOTE: CQRS + Event Sourcing = Elegant DDD
Event Souring - “All changes to an application state are stored as a sequence of events.” Martin Fowler
- Change made to state are tracked as events
- Event are stored in event store (any database)
- Use stored events and summation of all these events (always arrive in current state)
NOTE Event Sourcing is not part of CQRS
The Change Data Capture (CDC) provides an easy mechanism to implement these. Further CDC, in the world of micro-services, data services & real time analytics is a central part of a modern architecture these days.Check out my GitHub project which demonstrates:
- CQRS
- Event Sourcing
- CDC using Debezium (good bye to expensive CDC products & complicated integration)
- Kafka Connect
- Kafka
- Real time streaming
- Spring Cloud Stream
A bank application which demonstrates CQRS design pattern. This application performs following operations:
-
Money withdrawal using debit card
-
List all the money withdrawal (mini bank statement)
This application uses following two tables for above operations:
-
debit_card
-
money_withdrawal
A debit card withdrawl operation is stored in debit_card table. Once the transaciton is successfully committed in the debit_card table, using Debezium's & Kafka connect the CDC is moved to Kafka. Once the message arrive in Kafka topic, using Spring Cloud Stream Stream Listener, an entry in made to money_withdrawl table. This table is used to create mini statement (query)
NOTE For sake of simplicity same DB is used but as can be seen - "a command to perform debit operation" is separated from mini statement.
Following picture shows architecture of this application:
To demonstrate OLAP capabilities, this application write mini statement to Kafka topic - "ministatement". From this topic, Druid picks it up and provide fast querying abilities
-
MySQL
-
Apache Kafka
-
Kafka Connect
-
Debezium
-
Spring Cloud Stream
-
Zookeeper
-
Druid
-
Docker
NOTE: This application is completely dockerized.
The complete reference architecture implementation is dockerized. Hence it takes just few minutes to run this application locally or any cloud provider of your choice.
Execute following steps to run the application:
- Build bank app
mvn clean install -DskipTests
- Run bank application complete infrastructure:
docker-compose up
- Instruct Kafka Connect to tail transaction log of MySQL DB and start sending messages as CDC to Kafka:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @mysqlsource.json --verbose
- Instruct Druid connect with Kafka
curl -XPOST -H'Content-Type: application/json' -d @bank-app-mini-statement-supervisor.json http://localhost:8081/druid/indexer/v1/supervisor
- Money withdrawal operation:
curl http://localhost:8080/moneywithdrawals -X POST --header 'Content-Type: application/json' -d '{"debitCard":"123456789", "amount": 10.00}' --verbose
- Mini statement fetching operation (query/read model)
curl http://localhost:8080/moneywithdrawals?debitCardId=123456789 --verbose
In section some of key concepts of CQRS & Event Sourcing are explained along with it's usage in analytics
Command Vs Event
Command – “submit order”
- A request (imperative statement)
- May fail
- May affect multiple aggregates
NOTE: Rebuild Aggregate State from Commands
Event – “order submitted”
- Statement of fact (past tense)
- Never fails
- May affect a single aggregate
NOTE: Rebuild Aggregate State from Events
Command to Event
Following digram explain creation of Event from a Command
Once a Command is converted to an Event, the state can be reterived as shown below:
Classical BI
A classical BI is shown below: NOT A NEW IDEA
Tying knots together - CQRS & Event Sourcing
Below shows CQRS & Event Sourcing:
But "?" is still missing block. This missing "?" is called "State" or "Materialized View"
- Query a RDMS? Old style
- RDMS is option (will become bottleneck as volume of data increases)
- Views are optimized for specific query use cases
- multiple view of same events
- Update is asynchronously delayed eventual consistency build to scale
- Easy to evolve or fix
- Change or fix logic; rebuild view from events event log is the source of truth not the view
- Views can be rebuilt from the events
Indexing is key concerns when "Command" is separated from "Query". Following diagram shows architecture to achieve this:
Snapshotting Snapshotting is one of the key concepts in "Event Sourcing". Following diagram shows this:
Event sourcing/cqrs drawback
- No “One-Size-Fits-All”
- Multiple “topic’ implementation
- Delayed reads
- No ACID transactions
- Additional complexity
Event sourcing/cqrs benefits
- No “One-Size-Fits-All”
- “topic’ are optimized for usecases
- Eventual consistency
- History, temporal queries
- Robust for data corruption
The Complete Picture
The below picture shows the end to end architecture using CQRS & Event Sourcing