Provide a kafka streams/event stores based event log #47

SemanticBeeng · 2018-05-24T18:02:50Z

Feature request as per

https://blog.softwaremill.com/event-sourcing-using-kafka-53dfd72ad45d

"Hence, we need a better way. That’s where kafka-streams and state stores come into play. Kafka-streams applications run across a cluster of nodes, which jointly consume some topics."

This way, the concepts of "entity" from akka persistence and "runtime state" from event stores complement each other.
Ideally akka persistence runtime would not be necessary to run aecor.


notxcain · 2018-10-02T14:50:13Z

@SemanticBeeng that's a topic I have thought a lot about. When you use Kafka as an event log you lose the ability to query events by entity key. Imagine how long it would take to recover the whole system if you need to change how you fold an entity's state. Also, runtime state that uses a KTable or similar is eventually consistent, which is acceptable for the read side, but not for the write side.
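
To make the recovery cost concrete, here is a minimal Python sketch. It is not aecor or Kafka code: the shared log is modelled as a plain list and the keyed journal as a dict, and `Event`, `fold`, and the deposit/withdraw event types are hypothetical names chosen for illustration. Under those assumptions, rebuilding one entity from the shared log has to read every event, while the keyed journal reads only that entity's events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str    # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


def fold(events):
    """Fold a sequence of events into an entity state (here: a balance)."""
    balance = 0
    for e in events:
        balance += e.amount if e.kind == "deposited" else -e.amount
    return balance


# A shared log interleaves the events of all entities, like a topic partition.
shared_log = [
    Event("a", "deposited", 100),
    Event("b", "deposited", 50),
    Event("a", "withdrawn", 30),
    Event("c", "deposited", 10),
    Event("b", "withdrawn", 20),
]

# A keyed journal indexes the same events by entity key.
keyed_journal = defaultdict(list)
for e in shared_log:
    keyed_journal[e.entity_id].append(e)


def recover_from_log(entity_id):
    """Scan the whole log; return (state, number of events read)."""
    read = 0
    mine = []
    for e in shared_log:
        read += 1
        if e.entity_id == entity_id:
            mine.append(e)
    return fold(mine), read


def recover_from_journal(entity_id):
    """Read only this entity's events; return (state, number of events read)."""
    mine = keyed_journal[entity_id]
    return fold(mine), len(mine)


print(recover_from_log("a"))
print(recover_from_journal("a"))
```

Both functions produce the same state; the difference is how many events each has to read, which grows with the total log size in the first case and only with the entity's own history in the second.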

SemanticBeeng · 2018-10-02T15:23:37Z

Please help me understand your points better, especially "acceptable for read side, but not for write side".

But for "ability to query events by entity key" we could use a separate topic per entity.
And then there are Kafka Streams interactive queries: https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html

Again, please explain in a bit more depth so I can see what I am missing, since I have not thought about this as much as you have.

schrepfler · 2018-12-05T03:11:52Z

I think there are limitations to a separate topic per entity in Kafka, as (at least until the previous version) it would scale up to around 10k topics. I think this has been increased to the order of 100k, but it's still not the most efficient way of using Kafka.
Being a durable log, Kafka is typically used by consuming the messages as a consumer group (multiple nodes connected to the brokers), and its messages are guaranteed to be ordered within a partition, so these are the dimensions you can play with to increase parallel processing. In this way, if you partition by entity ID, you know you're consistently sending the messages for a given entity to the same host, which I think is what you want.
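
As a rough illustration of partitioning by entity ID, here is a minimal Python sketch with no Kafka client involved. The function names, the host names, and the round-robin assignment are hypothetical, and CRC32 is only a stand-in hash (Kafka's default partitioner hashes the serialized key with murmur2). The point is just that a deterministic key-to-partition mapping sends every message for one entity to the same partition, and therefore to whichever consumer currently owns that partition.

```python
import zlib


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Map an entity ID to a partition deterministically.

    Assumption: CRC32 stands in for the real hash function.
    """
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Hypothetical round-robin assignment of partitions to consumers."""
    return {p: consumers[p % len(consumers)] for p in partitions}


NUM_PARTITIONS = 6
owners = assign(range(NUM_PARTITIONS), ["host-1", "host-2", "host-3"])


def host_for(entity_id: str) -> str:
    """Return the host that currently consumes this entity's partition."""
    return owners[partition_for(entity_id, NUM_PARTITIONS)]


for entity_id in ["order-1", "order-2", "order-1"]:
    print(entity_id, "->", host_for(entity_id))
```

The mapping only holds while the partition count and the assignment stay the same; a consumer group rebalance can move a partition, and with it all of its entities, to another host.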
In the Lagom world, if you have, say, a 3-host distributed cluster, you can process all the messages as a consumer group, and if you're not partitioning by key or you don't have a good partition key, the message is routed via Akka remoting to the actual cluster node that hosts the persistent entity (not sure about the mechanism).

Apache Pulsar is also an interesting middleware to explore as a target. It supports both a topic/pub-sub interface and a queue interface (suitable for work scheduling, for example), and it scales to millions of topics. It's therefore conceivable that you could have a topic for each host in your Akka cluster, and if you're doing something similar to an Ask pattern you might be able to implement a direct (and durable) return-to-sender reply without having to route it via Akka remoting (but then you need to figure out what happens if the host that asked dies).

Apologies if this explains something obvious.

I also didn't understand what was meant by "acceptable for read side, but not for write side". In Lagom, a Read Side Processor designates a role that processes the Events (in the Command (input) -> State (hidden) -> Event (output) sense) and then prepares some sort of view which is queryable or optimised for some read pattern. Are you talking about the problem that arises from re-processing Commands, which requires some sort of strategy to de-dupe the ones already processed?

notxcain · 2018-12-05T14:48:11Z

@schrepfler what I mean is that you can’t use an eventually consistent KTable to validate commands against.
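
A minimal Python sketch of why that is a problem, under stated assumptions: `Journal`, `LaggingView`, and `withdraw` are hypothetical names, and the view only stands in for a KTable-like store that is updated some time after events are written. Two commands validated against the same stale view are both accepted, although together they break the invariant.

```python
class Journal:
    """Strongly consistent write side: appends are immediately visible."""

    def __init__(self):
        self.events = []    # signed amounts; negative means a withdrawal

    def append(self, amount):
        self.events.append(amount)


class LaggingView:
    """Eventually consistent view, standing in for a KTable-like store."""

    def __init__(self, journal):
        self.journal = journal
        self.applied = 0    # number of journal events folded so far
        self.balance = 0

    def catch_up(self):
        for amount in self.journal.events[self.applied:]:
            self.balance += amount
        self.applied = len(self.journal.events)


def withdraw(journal, current_balance, amount):
    """Accept the command only if the balance we were given covers it."""
    if current_balance < amount:
        return False
    journal.append(-amount)
    return True


journal = Journal()
view = LaggingView(journal)
journal.append(100)
view.catch_up()    # the view now shows a balance of 100

# Two withdrawals arrive before the view catches up again,
# so the second one is validated against the stale balance.
first = withdraw(journal, view.balance, 80)
second = withdraw(journal, view.balance, 80)
view.catch_up()
print(first, second, view.balance)

# Validating against the journal itself rejects the second withdrawal.
strict = Journal()
strict.append(100)
ok1 = withdraw(strict, sum(strict.events), 80)
ok2 = withdraw(strict, sum(strict.events), 80)
print(ok1, ok2, sum(strict.events))
```

The read side can tolerate the lag because it only reports state; the write side cannot, because it decides which events are allowed to exist.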

SemanticBeeng · 2019-01-03T08:44:15Z

From Vladimir's latest post: https://pavkin.ru/aecor-part-4a/

It highlights the similarity between the event journal in the context of event sourcing and the Kafka "log".

It also mentions kafka-journal.
Not sure how much of this could be reused for the purpose of this issue.

notxcain · 2019-06-06T12:55:12Z

There is a promising Akka Persistence journal implementation based on both Cassandra and Kafka. You may want to take a look at the trade-offs made.

SemanticBeeng changed the title ~~Proved a kafka streams/event stores based event log~~ Provide a kafka streams/event stores based event log May 24, 2018

SemanticBeeng mentioned this issue Dec 17, 2018

Composable event sourcing #65

Open
