Snapshotting with event store #6

Open
rbanks54 opened this issue Sep 15, 2016 · 11 comments

Comments

@rbanks54
Owner

In the domain services, show how snapshotting can be used to avoid long rebuilds of model state.

For the sample we might do snapshotting at every 5 events.

@dasiths

dasiths commented Oct 31, 2016

Would the best design practice be to use a different storage provider for snapshots (like Redis) that caters for a quick lookup by key?

@rbanks54
Owner Author

Not really. It's better to keep using EventStore as a single repository, with a separate stream for the snapshots.

i.e. if we have an event stream for a domain object such as product-xyz, we could have an event stream named product-xyz-snapshot.

In the snapshot stream we store the current state of the domain object and the version of the domain stream at the time the snapshot was made.

To rehydrate objects we:

  • read backwards in the snapshot stream to get the last snapshotted state
  • read and replay any events in the domain stream after the version in the snapshot.
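
The two steps above can be sketched roughly as follows. This is a minimal illustration in Python using in-memory lists, not the EventStore client API: `Snapshot`, `apply` and `rehydrate` are hypothetical names, events are assumed to be plain dicts of field updates, and an event's version is assumed to be its zero-based position in the domain stream.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    state: dict      # serialised state of the domain object
    version: int     # version of the domain stream when the snapshot was taken


def apply(state, event):
    """Hypothetical event handler: each event is a dict of field updates."""
    new_state = dict(state)
    new_state.update(event)
    return new_state


def rehydrate(events, snapshots):
    """Rebuild state from the latest snapshot plus any later events.

    `events` stands in for the domain stream (e.g. product-xyz) and
    `snapshots` for the snapshot stream (e.g. product-xyz-snapshot).
    Returns the state and the version of the last event in the stream.
    """
    state, next_version = {}, 0
    if snapshots:
        last = snapshots[-1]              # "read backwards": newest entry first
        state, next_version = dict(last.state), last.version + 1
    for event in events[next_version:]:   # replay only events after the snapshot
        state = apply(state, event)
    return state, len(events) - 1


events = [{"name": "Latte"}, {"price": 4.0}, {"price": 4.5}, {"name": "Flat White"}]
snapshots = [Snapshot(state={"name": "Latte", "price": 4.0}, version=1)]
state, version = rehydrate(events, snapshots)
```

With no snapshot available, `rehydrate` falls back to replaying the whole domain stream and gives the same result.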

Does that make sense?

@dasiths

dasiths commented Nov 1, 2016

Yes it does make sense.

I've implemented my SnapshotStorageProvider in a very similar way. https://github.com/dasiths/NEventLite/blob/master/NEventLite%20Storage%20Providers/EventStore/EventstoreSnapshotStorageProvider.cs

But I replaced it with Redis and got better read times. The only drawback with the Redis cache was how my implementation always overwrote the last entry with the new one. I wasn't worried about contention as I always stored the version number too. Is there a specific need for storing past snapshots? Is that for cases when we have to rebuild to a state faster at a specific past date/time?

@rbanks54
Owner Author

rbanks54 commented Nov 3, 2016

Having the snapshots and event streams stored in different places means you have to think about transactional boundaries or accept potential data loss. Of course, the loss of a snapshot isn't a problem, really, given it's just another read model and thus easy to rebuild. Plus the next event for the domain entity would trigger the creation of a new snapshot.

The other thing to consider is when rebuilding state you query two separate data sources, so your code will possibly be a little more complex, and you're making two database connections instead of one.

Yes, multiple snapshots make it easier to calculate state at a point in time, but there aren't many domains where that ability is a feature, so it might not be necessary for you. If you do need it, you could still support it with Redis by storing a SET (i.e. collection) of snapshots related to a domain entity.
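
As a rough sketch of why keeping several snapshots helps with point-in-time state: pick the newest snapshot at or before the target version, then replay only the events between the two. The code below is Python with an in-memory list standing in for whatever collection of snapshots you keep; `state_at` is a hypothetical helper, the state is assumed to be a running total, and each event is assumed to be an amount.

```python
from bisect import bisect_right


def state_at(events, snapshots, target_version):
    """State after the event at `target_version` (zero-based) was applied.

    `snapshots` is a list of (version, total) pairs sorted by version.
    """
    versions = [version for version, _ in snapshots]
    i = bisect_right(versions, target_version)   # count of snapshots at or before target
    if i:
        start, total = snapshots[i - 1][0] + 1, snapshots[i - 1][1]
    else:
        start, total = 0, 0                      # no usable snapshot: replay from the start
    for amount in events[start:target_version + 1]:
        total += amount
    return total


events = [10, -3, 5, 7, -2, 4]
snapshots = [(1, 7), (3, 19)]    # totals after versions 1 and 3
```

Here `state_at(events, snapshots, 2)` starts from the snapshot at version 1 and replays a single event, whereas a store that only keeps the latest snapshot would have to replay from the beginning.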

@feanz

feanz commented Mar 10, 2017

Could you not just add snapshots to the current stream as an event? You're going to have to come up with some sort of snapshot event anyway, I assume, whether you use separate streams or a single stream. Then you would just read back to the last snapshot and replay the events up to the snapshot over the top.

What do you think?

@dasiths

dasiths commented Mar 10, 2017

This introduces contention issues afaik. Calculating the snapshot and storing it in the right place in the stream requires locking. Using a separate stream, we just store the snapshot with its version number and then read all events from that number forward to rehydrate. This way the second stream (the snapshot stream) doesn't suffer from contention issues or locks.

@feanz

feanz commented Mar 10, 2017

Cool, thanks for the info. So I guess the million dollar question is when you apply snapshots and how: as an out-of-band batch process, or inline with updates to the domain model? As with most things, I'm guessing the answer is "it depends" 😄

@dasiths

dasiths commented Mar 10, 2017

Have a look here if you're keen https://dasith.me/2016/12/31/event-sourcing-examined-part-2-of-3/

@feanz

feanz commented Mar 10, 2017

Thanks man very useful info.

@rbanks54
Owner Author

rbanks54 commented Mar 10, 2017

@feanz @dasiths If I assume "current stream" means "aggregate root's stream", then snapshotting that way is a bad idea for a few reasons:

  1. Snapshotting is an optimisation for the application, not a domain event for the aggregate root object. It doesn't belong in the aggregate's event stream.
  2. If I want to rebuild my aggregate from the last snapshot, I have to scan backwards through all the events until I find the last snapshot, then replay forward again.
  3. If I wanted to replay all the events for an aggregate root (bug fixes, new read models, etc) then I need to ensure I skip the snapshots during replay.
  4. Let's say there's a bug fix with event state that caused snapshots with incorrect state to be saved. I can't fix those snapshots in the event stream because event streams are immutable. I'd have to dump the existing stream and recreate it from scratch, recreating new snapshots as I go.

The best approach is to store a separate snapshot stream for an aggregate root, similar to an event stream. Rebuilding state simply involves reading the last snapshot in the snapshot stream and then reading the tail of the event stream to get the events that have occurred since that snapshot.

If you have a bug that requires you to rebuild state, simply delete the snapshot stream. The next event that occurs on the aggregate should then trigger a new snapshot, which will have bug fixes applied.
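
A small sketch of that recovery path, in Python with in-memory lists standing in for the two streams. `rebuild`, `buggy_apply` and `fixed_apply` are hypothetical names, the state is assumed to be a running total, and clearing the list stands in for deleting the snapshot stream.

```python
def rebuild(events, snapshots, apply):
    """Latest snapshot (if any) plus later events; full replay otherwise."""
    if snapshots:
        version, total = snapshots[-1]
    else:
        version, total = -1, 0
    for event in events[version + 1:]:
        total = apply(total, event)
    return total


def buggy_apply(total, event):
    return total + abs(event)      # bug: refunds (negative amounts) are added


def fixed_apply(total, event):
    return total + event


events = [10, -3, 5]
# Snapshot written by the buggy code after version 1: 10 + 3 instead of 10 - 3.
snapshots = [(1, buggy_apply(buggy_apply(0, 10), -3))]

stale = rebuild(events, snapshots, fixed_apply)   # fixed code, but still built on the bad snapshot
snapshots.clear()                                 # "delete the snapshot stream"
fresh = rebuild(events, snapshots, fixed_apply)   # full replay of the immutable events
```

The event stream itself is never touched: only the derived snapshots are thrown away, and the next snapshot is computed by the fixed code.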

As to how often you snapshot? It's somewhat up to you and your performance criteria, and how volatile an aggregate root is (i.e. how often new events occur for an object). Maybe start with snapshotting every few hundred events and see if that helps or not. Adjust up or down to suit the performance profile of your application.

When you save an event to storage you should be able to create a new snapshot (if applicable) in the same transaction.
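
For illustration, a count-based trigger might look like the sketch below. It is Python with a hypothetical `InMemoryStore` class, not a real storage provider, and it only shows when a snapshot would be written. It does not demonstrate atomicity: a real store would need to write the event and the snapshot in one transaction, or tolerate a missing snapshot.

```python
class InMemoryStore:
    """Stand-in for a store holding one event stream and one snapshot stream."""

    def __init__(self, snapshot_every):
        self.snapshot_every = snapshot_every   # tune to your performance profile
        self.events = []
        self.snapshots = []                    # (version, state) pairs

    def save(self, event, state_after):
        """Append an event and, every `snapshot_every` events, a snapshot too."""
        self.events.append(event)
        version = len(self.events) - 1         # zero-based version of this event
        if (version + 1) % self.snapshot_every == 0:
            self.snapshots.append((version, state_after))
        return version


store = InMemoryStore(snapshot_every=5)
total = 0
for amount in range(1, 13):      # 12 events with amounts 1..12
    total += amount
    store.save(amount, total)
```

With an interval of 5 and 12 events, snapshots are taken after the 5th and 10th events (versions 4 and 9).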

@gromas

gromas commented Nov 20, 2019

Hi, my two cents on the problem.

I have two separate streams: one for domain events and one for aggregate snapshots. I also have a "system stream" which describes both the "events" and "snapshots" streams as an "aggregate root event source" and adds "event stream rotation" logic that splits both streams into time slices.

When I need to restore application state, I read the system stream first and resolve the latest events stream URI and snapshots stream URI. I then read the latest snapshot slice and restore the aggregate state from the snapshot, then read the latest events slice to restore the full aggregate state.

At each checkpoint I close the event stream and create a new one, then run the snapshotting service, which generates a snapshot from the previous snapshot and the events published before the checkpoint. The snapshot service then publishes the newly generated snapshots as events to the snapshot stream and writes to the system stream that the snapshot stream URI has changed.

I generate new snapshots on a 1-hour interval and rotate the events stream every 120 seconds, while processing over 9 million domain events per minute.

PS: sorry for bad english )
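
One simplified reading of the rotation scheme described above, sketched in Python. `RotatingSource` is a hypothetical class, not the commenter's code: the system stream is collapsed into plain attributes, the state is assumed to be a running total, and a snapshot is taken at every checkpoint rather than on a separate hourly schedule.

```python
class RotatingSource:
    """One aggregate's event slices plus its latest snapshot (a running total)."""

    def __init__(self):
        self.snapshot = 0          # state as of the last checkpoint
        self.closed_slices = []    # rotated-out event slices, oldest first
        self.current_slice = []    # events published since the last checkpoint

    def publish(self, amount):
        self.current_slice.append(amount)

    def checkpoint(self):
        """Close the current slice, open a new one, then fold the closed
        slice into a new snapshot built from the previous snapshot."""
        closed, self.current_slice = self.current_slice, []
        self.closed_slices.append(closed)
        self.snapshot += sum(closed)

    def restore(self):
        """Latest snapshot plus the events in the latest slice."""
        return self.snapshot + sum(self.current_slice)


source = RotatingSource()
for amount in (5, 7):
    source.publish(amount)
source.checkpoint()
source.publish(-2)
```

Restoring never reads a closed slice: the snapshot already covers everything published before the last checkpoint, so only the current slice is replayed.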


4 participants