
Provide a kafka streams/event stores based event log #47

Open
SemanticBeeng opened this issue May 24, 2018 · 6 comments

Comments

@SemanticBeeng

SemanticBeeng commented May 24, 2018

Feature request as per

https://blog.softwaremill.com/event-sourcing-using-kafka-53dfd72ad45d

"Hence, we need a better way. That’s where kafka-streams and state stores come into play. Kafka-streams applications run across a cluster of nodes, which jointly consume some topics."

This way, the concept of an "entity" from akka-persistence and the "runtime state" from event stores complement each other.
Ideally, the akka-persistence runtime would not be necessary to run aecor.
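For illustration, a minimal sketch of what the quoted post describes, using the Kafka Streams Scala DSL. The topic name, the string serdes and the string-concatenation "fold" are placeholder assumptions, not aecor or kafka-streams specifics:

```scala
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

val builder = new StreamsBuilder()

// Hypothetical topic "entity-events", keyed by entity id.
val events = builder.stream[String, String]("entity-events")

// Fold each entity's events into its current state; the resulting
// KTable is backed by a fault-tolerant state store whose changelog
// lives on the brokers. String concatenation stands in for a real fold.
val state = events
  .groupByKey
  .aggregate("empty")((_, event, st) => s"$st|$event")
```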

@SemanticBeeng changed the title from "Proved a kafka streams/event stores based event log" to "Provide a kafka streams/event stores based event log" on May 24, 2018
@notxcain
Owner

notxcain commented Oct 2, 2018

@SemanticBeeng that's a topic I've thought a lot about. When you use Kafka as an event log, you lose the ability to query events by entity key. Imagine how long it would take to recover the whole system if you ever need to change how you fold entity state. Also, a runtime state that uses a KTable or similar is eventually consistent, which is acceptable for the read side, but not for the write side.
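To make the first point concrete, here is a rough sketch of the per-key access a write side needs, loosely modeled on aecor's journal algebra (signatures simplified, not the actual API):

```scala
// Rough sketch, loosely modeled on aecor's EventJournal algebra.
trait EventJournal[F[_], K, E] {
  def append(key: K, events: List[E]): F[Unit]

  // Recovering one entity means reading only that key's events.
  // A raw Kafka topic can only replay a whole partition, so this
  // degenerates into scanning every co-partitioned entity.
  def foldByKey[S](key: K, zero: S)(step: (S, E) => S): F[S]
}
```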

@SemanticBeeng
Author

Please help me understand your points better, especially "acceptable for the read side, but not for the write side".

But for "ability to query events by entity key." we could use a separate topic per entity.
And then there is KQL: https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html
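For reference, a minimal interactive-queries sketch; the store name "entity-state" is a placeholder, and the store must have been materialized under that name when the topology was built:

```scala
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.state.QueryableStoreTypes

// Assumes a running KafkaStreams instance with a key-value store
// materialized as "entity-state" (hypothetical name).
def currentState(streams: KafkaStreams, entityId: String): Option[String] = {
  val store = streams.store(
    "entity-state",
    QueryableStoreTypes.keyValueStore[String, String]()
  )
  Option(store.get(entityId)) // local, eventually consistent read
}
```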

Again, please explain in a bit more depth so I can see what I am missing, since I have not thought about this as much as you have.

@schrepfler

I think there are limitations to a separate topic per entity on Kafka, as (at least until the previous version) it would only scale to around ~10k topics. I think this has since been increased to the order of ~100k, but it's still not the most efficient way of using Kafka.
Being a durable log, a Kafka topic is typically consumed as a consumer group (multiple nodes connected to the brokers), and its messages are guaranteed to be ordered within a partition; those are the dimensions you can play with to increase parallel processing. So if you partition by entity ID, you know you're consistently sending the messages for a given entity to the same host, which I think is what you want (sketched below).
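A small sketch of that point, assuming a local broker and a hypothetical "entity-events" topic: keying records by entity id makes Kafka's default partitioner send every event of one entity to the same partition, preserving per-entity order.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // assumption: local broker

val producer =
  new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)

// Same key => same partition => all of entity-42's events stay ordered
// and are consumed by the same member of the consumer group.
producer.send(new ProducerRecord("entity-events", "entity-42", """{"type":"Created"}"""))
```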
In the Lagom world, if you have, say, a 3-host distributed cluster, you can process all the messages as a consumer group; if you're not partitioning by key, or you don't have a good partition key, the message is routed via Akka remoting to the cluster node that actually hosts the persistent entity (not sure of the exact mechanism).

Apache Pulsar is also an interesting middleware to explore as a target. It supports both a pub/sub topic interface and a queue interface (suitable for work scheduling, for example), and it scales to millions of topics. It's therefore conceivable you could have a topic for each host in your Akka cluster, so if you're doing something like an Ask pattern you might be able to implement a direct (and durable) return-to-sender reply without routing through Akka remoting (though you'd then need to figure out what happens if the host that asked dies).

Apologies if this explains something obvious.

I also didn't understand what was meant by "acceptable for the read side, but not for the write side". In Lagom, a Read Side Processor designates a component that processes the Events (in the Command (input) -> State (hidden) -> Event (output) sense, sketched below) and then prepares some sort of view that is queryable, or optimised for some read pattern. Are you talking about the problem of re-processing Commands, which requires some strategy to de-duplicate the already-processed ones?
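To pin down the terminology, a tiny sketch of that Command -> State -> Event shape (plain Scala, illustrative names only, not Lagom's API):

```scala
sealed trait Command
final case class Deposit(amount: Long) extends Command

sealed trait Event
final case class Deposited(amount: Long) extends Event

final case class State(balance: Long) {
  // A command produces an event and a new (hidden) state.
  def handle(cmd: Command): (State, Event) = cmd match {
    case Deposit(a) => (State(balance + a), Deposited(a))
  }
}
// A read-side processor would consume Deposited events and maintain
// a queryable view, e.g. a table of balances.
```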

@notxcain
Owner

notxcain commented Dec 5, 2018

@schrepfler what I mean is that you can't use an eventually consistent KTable to validate commands against.
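A plain-Scala illustration of that point (no Kafka involved; names are made up): two commands validated against the same stale view can both pass.

```scala
// Validating against an eventually consistent view lets two
// conflicting commands both succeed.
final case class Account(balance: Long) {
  def withdraw(amount: Long): Either[String, Account] =
    if (amount <= balance) Right(copy(balance = balance - amount))
    else Left("insufficient funds")
}

val staleView = Account(100)       // what the KTable shows right now
val cmdA = staleView.withdraw(100) // Right(Account(0)), event appended
val cmdB = staleView.withdraw(100) // also accepted: the KTable has not
                                   // folded cmdA's event yet -> overdraft
```

The write side needs read-your-writes over the entity's own history, which is exactly what the per-key journal above provides.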

@SemanticBeeng
Author

SemanticBeeng commented Jan 3, 2019

From the latest post by Vladimir: https://pavkin.ru/aecor-part-4a/

It highlights the similarity between the event journal, in the context of event sourcing, and the Kafka "log".

[image: diagram from the post comparing the event journal to the Kafka log]

It also mentions kafka-journal.
Not sure how much of this could be reused for the purpose of this issue.

@notxcain
Owner

notxcain commented Jun 6, 2019

There is a promising Akka Persistence journal implementation based on both Cassandra and Kafka. You may want to take a look at the trade-offs made.
