
Make Ringbuffer not throw StaleSequenceException on read-many operation #16303

Merged

merged 15 commits into hazelcast:master from ringbuffer_improvement on Jan 7, 2020

Conversation

@jbartok (Contributor) commented Dec 19, 2019

The problem we want to fix is that when producers are fast, consumers can have a really hard time keeping up with and catching up to them (they often fail to do so, even when they could process items fast enough). For example, if the head is at 1 and you ask for 100 items starting from 0, the Ringbuffer.readManyAsync call just throws an exception instead of returning what's available. By the time the exception travels to the consumer, gets processed, and a new request is made, the requested sequence number tends to be stale again, resulting in yet another exception. Meanwhile, data keeps being overwritten in the Ringbuffer and is lost to the consumer.

The solution removes the StaleSequenceException thrown by the ReadManyOperation in such situations and instead has it return the data that is available.

We are able to do this for Ringbuffer.readManyAsync because it returns a ReadResultSet carrying sequence numbers, allowing the client to notice potential sequence gaps and decide whether it can tolerate them. On the other hand, we can't do the same for Ringbuffer.readOne, because there is no way to observe sequence numbers in that case (it returns just an element from the Ringbuffer).
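For illustration, here is a minimal sketch of such a gap-tolerant consumer. It assumes the Hazelcast 4.x API, where readManyAsync returns a CompletionStage; the ringbuffer name and the printing are made up and stand in for real processing logic:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.ringbuffer.ReadResultSet;
import com.hazelcast.ringbuffer.Ringbuffer;

public class GapTolerantConsumer {
    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        Ringbuffer<String> rb = hz.getRingbuffer("events"); // "events" is a made-up name

        long seq = rb.headSequence();
        while (!Thread.currentThread().isInterrupted()) {
            // With this change, a stale 'seq' no longer throws StaleSequenceException;
            // the call simply returns whatever items are still available.
            ReadResultSet<String> rs =
                    rb.readManyAsync(seq, 1, 100, null).toCompletableFuture().get();
            for (int i = 0; i < rs.readCount(); i++) {
                long itemSeq = rs.getSequence(i);
                if (itemSeq != seq) {
                    // Gap detected: items in [seq, itemSeq) were overwritten unread.
                    System.out.println("lost " + (itemSeq - seq) + " items");
                }
                System.out.println("processing " + rs.get(i)); // stand-in for real work
                seq = itemSeq + 1;
            }
        }
    }
}
```

Whether such gaps are acceptable is exactly the decision this change delegates to the caller; a consumer that cannot tolerate loss can still detect it from the sequence numbers and fail explicitly.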

@cangencer (Contributor)

readManyAsync javadoc would need updating also.

@cangencer force-pushed the ringbuffer_improvement branch 5 times, most recently from b731826 to 3c8aa0c on December 20, 2019
@cangencer (Contributor)

run-lab-run

@mmedenjak (Contributor) left a comment


Looks good, added some minor points for discussion and improvement.

@ihsandemir self-requested a review on December 26, 2019
@cangencer (Contributor)

@jbartok could you add description of what this fixes?

@jbartok (Contributor, Author) commented Dec 27, 2019

> @jbartok could you add description of what this fixes?

Added to PR description.

@mmedenjak (Contributor)

@ihsandemir are all review comments addressed?

@hazelcast deleted two comments from jbartok on Jan 7, 2020
@mmedenjak merged commit f5328fc into hazelcast:master on Jan 7, 2020
@mmedenjak (Contributor)

Thank you for the PR and the reviews, everyone!

@jbartok deleted the ringbuffer_improvement branch on January 7, 2020
mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this pull request on Jan 8, 2020
@mmedenjak mentioned this pull request on Jan 8, 2020
mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this pull request on Jan 9, 2020
mmedenjak added a commit that referenced this pull request on Jan 9, 2020

Fix LossToleranceTest

Because of the behaviour change introduced in #16303, when the requested sequence is larger than the largest sequence (tailSequence) + 1, we don't listen from the oldest sequence (headSequence) but rather from tailSequence + 1. Both approaches are valid, and each works better in some scenarios. Since the listener is loss tolerant, we can skip the items in headSequence..tailSequence+1 anyway.

Fixed the test to adhere to the new behaviour. We assume that an item, once published, will eventually reach the listener.

A better fix would be to introduce unique IDs per ringbuffer, which would let us distinguish between a completely lost ringbuffer and a ringbuffer that has not received the last few items, and reset the requested sequence to the headSequence or tailSequence accordingly.

Fixes: #16430
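For clarity, the reset behaviour described above amounts to something like the following sketch (illustrative only; the method name is made up and this is not the actual listener code):

```java
// Clamp a requested sequence into the valid range [head, tail + 1].
// Illustrative sketch, not the actual Hazelcast internals.
static long resetRequestedSequence(long requested, long head, long tail) {
    if (requested > tail + 1) {
        // Requested sequence is ahead of everything in the buffer (e.g. the
        // ringbuffer was lost and recreated): listen for new items only.
        return tail + 1;
    }
    if (requested < head) {
        // Requested items were already overwritten: skip ahead to the oldest
        // available item. A loss-tolerant listener can afford this.
        return head;
    }
    return requested;
}
```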
@mmedenjak added the "Source: Internal" label (PR or issue was opened by an employee) on Apr 13, 2020
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 15, 2021
@frant-hartm mentioned this pull request on Oct 15, 2021
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 15, 2021
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 18, 2021
frant-hartm added a commit that referenced this pull request on Oct 22, 2021

Fixes #19696 (ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved).
Created a simpler reproducer, RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved.
The cause was the recreation of the RingbufferContainer in ReadOneOperation.getWaitKey.
This also addresses a review comment from #19630.

Fixes #16469 (RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer).
The stress test was incorrect: since #16303, the ReadManyOperation doesn't throw StaleSequenceException when the head is stale; the overwritten items are simply missing from the result.

Also changed a HashMap to a ConcurrentHashMap in RingbufferService. This map is modified from an operation thread when the RingbufferContainer is created, and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
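The HashMap point is a general pattern: a registry mutated from both operation threads (container creation) and user threads (destroy) needs a concurrent map. A minimal sketch of the idea, with made-up names rather than the actual RingbufferService code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustrative sketch of the concurrency fix, not the real RingbufferService.
final class ContainerRegistry<C> {
    private final ConcurrentMap<String, C> containers = new ConcurrentHashMap<>();

    // Called from operation threads when a container is first needed.
    C getOrCreate(String name, Function<String, C> factory) {
        // computeIfAbsent creates the container at most once per key,
        // even under concurrent callers.
        return containers.computeIfAbsent(name, factory);
    }

    // May run directly on the user's thread when the ringbuffer is local,
    // which is why a plain HashMap would race with getOrCreate.
    void destroyContainer(String name) {
        containers.remove(name);
    }
}
```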
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 22, 2021 (backport of #19788)
frant-hartm added a commit that referenced this pull request on Oct 25, 2021

Fix Ringbuffer test failures [5.0.z]

Backport of #19788 (same commit message as above).