A simple high performance bank application using command sourcing.
-
Process around
200,000
write-requests per second on a singleleader
node.Result of sending 500k write-requests (deposit only) to the
leader
with 64 grpc connections (running on a MacBook Pro, 13-inch, M1, 16 GB, 2020): -
By adding more the
follower
nodes, theread
throughput can increase linearly, theoretically reachinginfinity
.
NOTE: This project is slated for significant performance enhancements through the implementation of Cap'n Proto
serialization (serde) and Cap'n Proto RPC
, or alternatively, technologies such as RSocket
.
Benchmarking results will be updated accordingly to reflect these improvements in due course.
The architecture resembles that of Lmax Architecture, but in a simplified form.
This is achieved by journaling command logs
into Kafka and by omitting the use of the replicator processor
.
-
cluster-app
:leader
node: handles all incoming commands, queries.follower
node: handles all incoming queries, replays command-logs published byleader
.learner
node: replays command-logs published byleader
, takes snapshot of state-machine.
-
client-app
interacts withcluster-app
viagrpc
protocol, provides Restful Api. Including modules:admin
user
- All commands requested from client-apps are published into an inbound ring-buffer (command-buffer).
- The commands are then grouped into chunks and then streamed into disk (kafka - one partition) when disruptor's
EventHandler
reachesendOfBatch
. - The business-logic consumer then processes all incoming commands in order to build
state-machine
. - Finally, the results are published into an out-bound ring-buffer (reply-buffer) in order to reply back to
client-apps
.
Before we dive in, let's go over a few preliminary notes:
- Only the
leader
node is responsible for writing thecommand-log
into the intocommand-log-storage
. - Meanwhile, the
learner
node is exclusively tasked with snapshotting the state machine.
- Assume that the latest offset in kafka is
x
and thelearner
replayscommand-log
up to m'th offset. - The
learner
snapshots thestate-machine
interval or everycommand-size
. - Assume that the
learner
snapshots up to n'th offset.- For
optimization
, thelearner
snapshots only thestates
that have changed from the last-snapshot-offset to n'th offset.
- For
- When the
cluster
(leader
,follower
orlearner
) restart, it first loads thesnapshot
, then replays thecommand-log
fromn + 1
'th offset to rebuild state-machine.- If there is no
snapshot
stored in thedatabase
, then thecluster
will replay allcommand-log
from the beginning.
- If there is no
The snapshot trigger and logic can be found in LearnerBootstrap -> startReplayMessage()
and ReplayBufferHandlerByLearner
.
cluster-core
: domain logic.cluster-app
: framework & transport layer, implementscluster-core
's interface ports.
Note: All producers
(or dispatcher
) and consumers
(or processor
) interacting with the same ring-buffer
are managed as children of a buffer-channel
.
- Journaling command logs.
- Replaying command logs.
- Managing state machine.
- Replicating state machine.
- Snapshotting state machine.
- Processing domain logic.
- Create balance.
- Deposit money.
- Withdraw money.
- Transfer money.
- Get balance by id.
- List all balances.
-
Admin
:- Create balance.
- Deposit money.
- Withdraw money.
- Transfer money.
- List all balances.
- Get balance by id.
-
User
:- Get current balance.
- Deposit money.
- Withdraw money.
- Transfer money.
cluster-core
: Domain logic.cluster-app
: Implementscluster-core
and provides transport-layer (ex: grpc), framework-layer.client-core
: Provideslibs
to interacts withcluster
, providesrequest-reply
channel for incoming requests.client
: Interacts withcluster-app
, providersapi-resource
.
make help
- Setup dev environment
make setup-dev
- Start
leader
node - processing read and write requests
make run-leader
- Start
follower
node - processing read requests
make run-follower
- Start
learner
node - snapshotting state machine
make run-learner
- Start
admin
app - CRUD app
make run-admin
- Start
user
app (Not available yet)
make run-user
In order to test grpc server, you can use portman to send message like this
We use ghz
(link) as a benchmarking and load testing tool.
ghz --insecure --proto ./bank-libs/bank-cluster-proto/src/main/proto/balance.proto \
--call gc.garcol.bank.proto.BalanceQueryService/sendQuery \
-d '{"singleBalanceQuery": {"id": 1,"correlationId": "random-uuid"}}' \
-c 200 -n 100000 \
127.0.0.1:9500
We use autocannon
(it can produce more load than wrk
and wrk2
).
See BENCHMARK for more details.