Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gap detection to projections #221

Merged
merged 3 commits into from
Mar 1, 2020
Merged

Add gap detection to projections #221

merged 3 commits into from
Mar 1, 2020

Conversation

codeliner
Copy link
Member

@codeliner codeliner commented Feb 22, 2020

Fixes #189

The PR adds a "gap detection" mechanism to projections. This means, that if the projection detects a gap in the stream position while processing events, it pauses processing for a configurable time, before rereading events from the last known position.
Gap detection is disabled by default.

Enabling gap detection for a projection:

$gapDetection = new GapDetection();

$projection = $projectionManager->createProjection('test_projection', [
            PdoEventStoreProjector::OPTION_GAP_DETECTION => $gapDetection,
        ]);

Enabling gap detection for a read model projection with custom configuration:

$gapDetection = new GapDetection(
    // Configure retries in case a gap is detected
    [
        0, // First retry without any sleep time
        10, // Second retry after 10 ms
        30, // Wait another 30 ms before performing a third retry
    ],
    \DateInterval('PT60S') //Set detection window to 60s
);

$projection = $projectionManager->createReadModelProjection('test_projection', [
            PdoEventStoreReadModelProjector::OPTION_GAP_DETECTION => $gapDetection,
        ]);

Detection window

If you set a detection window, gap detection is only performed on events not older than NOW - window
If the detection window is set to PT60S, only events created within the last 60 seconds are checked. Be careful when setting Event::createdAt manually (for example during a migration).

@codeliner
Copy link
Member Author

Some more tests + documentation is missing, but I'd like to collect some feedback on the PR. We will run some experiments next week, to check if gap detection solves the issue of skipped events.

@coveralls
Copy link

coveralls commented Feb 22, 2020

Coverage Status

Coverage increased (+0.3%) to 84.267% when pulling 0c7fc05 on feature/gap_detection into 0681a63 on master.

src/Projection/GapDetection.php Show resolved Hide resolved
src/Projection/PdoEventStoreProjector.php Outdated Show resolved Hide resolved
src/Projection/PdoEventStoreProjector.php Show resolved Hide resolved
src/Projection/GapDetection.php Show resolved Hide resolved
@prolic
Copy link
Member

prolic commented Feb 24, 2020

I'm not sure about those predefined timing windows. Some transactions might be long-running and the sleep times are fairly short. From what I saw in the past, those gabs are pretty rare, so even if the projection falls behind for 5 secs, that's still acceptable in most situations, so maybe we should add some longer retry times as well. But without data that's only speculation.

@codeliner
Copy link
Member Author

"normal gaps" can occur very easily. They are caused by concurrency exceptions. That's the reason why I'd like to avoid a long sleep time by default. 50ms are already a very long time for 99% of write operations in a typical event sourced implementation.

@fritz-gerneth
Copy link
Contributor

This is pretty awesome stuff and I'll put this to testing on our side this week too!

@codeliner
Copy link
Member Author

Cool, looking forward to your results @fritz-gerneth
We performed first tests yesterday but still had gaps with the default sleep times. So I guess @prolic is right that we need longer defaults. But we're testing Postgres event-store without GET_LOCK.
Hopefully we can perform some more tests tomorrow.

GET_LOCK + gap detection should work with short sleep times, right? Because in that case faulty gaps only occur when one aggregate records multiple events with different payload size in one transaction.

@fritz-gerneth
Copy link
Contributor

GET_LOCK + gap detection should work with short sleep times, right? Because in that case faulty gaps only occur when one aggregate records multiple events with different payload size in one transaction.

I think so.

We ran error-prone operations a couple of thousand times today without any issue (default gap detection settings). Error-prone in the magnitude of "once every 1k - 2k saves" though..

@codeliner
Copy link
Member Author

Great news @fritz-gerneth 🎉

We can also confirm that for our case a longer sleep time fixed the issue (No GET_LOCK, only gap detection). We're trying to balance sleep time vs. gaps now.

So I'll definitely continue with the PR and finish it. Sleep times need to be documented anyway, but I think I'd go with @prolic suggestion and increase the default sleep time. If it is too slow for someone, they can simply override the setting and define their own.

@prolic
Copy link
Member

prolic commented Feb 28, 2020

I think you can still use the sleep times you have right now, just add more retries with larger intervals. I would be totally fine with a projection hanging 5 secs every now and then, when it means it processes correctly. Way better than handling stuff quick and still having gabs.

@ahmed-alaa
Copy link

@codeliner I tried the solution on Postgres with max retry of 100ms as well as windows time of 120s and without GET_LOCK strategy.

So far It works good without missing events within 10,000 events in average of 15-20 events/second.

Please Note: for example with our current implementation to override the PostgresSingleStreamStrategy or MySqlSingleStreamStrategy, this might break any service that does the same because of these changes 2b2f943#diff-38bb66b2f052fb5ca610f76079f40f3f, would be nice if could take that into consideration during the release :)

Thanks alot for the fix!

@codeliner
Copy link
Member Author

thx @ahmed-alaa I started a discussion about BC break for persistence strategies. This will not be included in a 1.12 release: #217 (comment)

@codeliner
Copy link
Member Author

codeliner commented Feb 28, 2020

@prolic I guess you mean this PR in your question, right?

Open Todos

  • more tests
  • Documentation

I try to finish it on Sunday.

@prolic
Copy link
Member

prolic commented Feb 28, 2020 via email

@codeliner codeliner changed the title WIP: Add gap detection to projections Add gap detection to projections Mar 1, 2020
@codeliner
Copy link
Member Author

@prolic ready to merge

@prolic prolic merged commit f26bcf0 into master Mar 1, 2020
@prolic prolic deleted the feature/gap_detection branch March 1, 2020 18:01
@keywinf
Copy link

keywinf commented May 11, 2020

Very nice work, I personally used a Fibonacci sequence (0, 5, 5, 10, 15, 25, ... until 2s) as GapDetection options. Thanks

@webdevilopers
Copy link

Open Todos

* [x]  more tests

* [x]  Documentation

I guess the OPTION_GAP_DETECTION option should be added here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MySQL Projections skipping events (SingleStreamStrategy)
8 participants