
Make Ringbuffer not throw StaleSequenceException on read-many operation #16303

Merged

merged 15 commits into hazelcast:master from ringbuffer_improvement on Jan 7, 2020

Conversation

@jbartok (Contributor) commented Dec 19, 2019

The problem we want to fix is that when producers are fast, consumers can have a really hard time keeping up with and catching up to them (they often fail to do so, even when they could process items fast enough). For example, if the head is at 1 and you ask for 100 items starting from 0, the Ringbuffer.readManyAsync call just throws an exception instead of returning what's available. By the time the exception travels to the consumer, gets processed, and a new request is made, the requested sequence number tends to be stale again, resulting in yet another exception. Meanwhile, data keeps being overwritten in the Ringbuffer and is lost to the consumer.

The solution removes the StaleSequenceException thrown by the ReadManyOperation in such situations and instead has it return the data that is available.

We are able to do this for Ringbuffer.readManyAsync because it returns a ReadResultSet carrying sequence numbers, allowing the client to notice potential sequence gaps and decide whether it can tolerate them. On the other hand, we can't do the same for Ringbuffer.readOne, because there is no way to observe sequence numbers in that case (it returns just an element from the Ringbuffer).
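For illustration, here is a minimal sketch of such a gap-tolerant consumer. It assumes the Hazelcast 4.x API, where readManyAsync returns a CompletionStage; the ringbuffer name and the printing are made up and stand in for real processing logic:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.ringbuffer.ReadResultSet;
import com.hazelcast.ringbuffer.Ringbuffer;

public class GapTolerantConsumer {
    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        Ringbuffer<String> rb = hz.getRingbuffer("events"); // "events" is a made-up name

        long seq = rb.headSequence();
        while (!Thread.currentThread().isInterrupted()) {
            // With this change, a stale 'seq' no longer throws StaleSequenceException;
            // the call simply returns whatever items are still available.
            ReadResultSet<String> rs =
                    rb.readManyAsync(seq, 1, 100, null).toCompletableFuture().get();
            for (int i = 0; i < rs.readCount(); i++) {
                long itemSeq = rs.getSequence(i);
                if (itemSeq != seq) {
                    // Gap detected: items in [seq, itemSeq) were overwritten unread.
                    System.out.println("lost " + (itemSeq - seq) + " items");
                }
                System.out.println("processing " + rs.get(i)); // stand-in for real work
                seq = itemSeq + 1;
            }
        }
    }
}
```

Whether such gaps are acceptable is exactly the decision this change delegates to the caller; a consumer that cannot tolerate loss can still detect it from the sequence numbers and fail explicitly.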

@cangencer (Contributor)

readManyAsync javadoc would need updating also.

@cangencer force-pushed the ringbuffer_improvement branch 5 times, most recently from b731826 to 3c8aa0c on December 20, 2019
@cangencer (Contributor)

run-lab-run

@mmedenjak (Contributor) left a comment


Looks good, added some minor points for discussion and improvement.

@ihsandemir self-requested a review on December 26, 2019
@cangencer (Contributor)

@jbartok could you add description of what this fixes?

@jbartok (Contributor, Author) commented Dec 27, 2019

> @jbartok could you add description of what this fixes?

Added to PR description.

@mmedenjak (Contributor)

@ihsandemir are all review comments addressed?

@hazelcast deleted two comments from jbartok on Jan 7, 2020
@mmedenjak merged commit f5328fc into hazelcast:master on Jan 7, 2020
@mmedenjak (Contributor)

Thank you for the PR and the reviews, everyone!

@jbartok deleted the ringbuffer_improvement branch on January 7, 2020
mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this pull request on Jan 8, 2020
@mmedenjak mentioned this pull request on Jan 8, 2020
mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this pull request on Jan 9, 2020
mmedenjak added a commit that referenced this pull request on Jan 9, 2020

Fix LossToleranceTest

Because of the behaviour change introduced in #16303, when the requested sequence is larger than the largest sequence (tailSequence) + 1, we don't listen from the oldest sequence (headSequence) but rather from tailSequence + 1. Both approaches are valid, and each works better in some scenarios. Since the listener is loss tolerant, we can skip the items in headSequence..tailSequence+1 anyway.

Fixed the test to adhere to the new behaviour. We assume that an item, once published, will eventually reach the listener.

A better fix would be to introduce unique IDs per ringbuffer, which would let us distinguish between a completely lost ringbuffer and a ringbuffer that has not received the last few items, and reset the requested sequence to the headSequence or tailSequence accordingly.

Fixes: #16430
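For clarity, the reset behaviour described above amounts to something like the following sketch (illustrative only; the method name is made up and this is not the actual listener code):

```java
// Clamp a requested sequence into the valid range [head, tail + 1].
// Illustrative sketch, not the actual Hazelcast internals.
static long resetRequestedSequence(long requested, long head, long tail) {
    if (requested > tail + 1) {
        // Requested sequence is ahead of everything in the buffer (e.g. the
        // ringbuffer was lost and recreated): listen for new items only.
        return tail + 1;
    }
    if (requested < head) {
        // Requested items were already overwritten: skip ahead to the oldest
        // available item. A loss-tolerant listener can afford this.
        return head;
    }
    return requested;
}
```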
@mmedenjak added the "Source: Internal" label (PR or issue was opened by an employee) on Apr 13, 2020
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 15, 2021
@frant-hartm mentioned this pull request on Oct 15, 2021
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 15, 2021
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 18, 2021
frant-hartm added a commit that referenced this pull request on Oct 22, 2021

Fixes #19696 (ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved).
Created a simpler reproducer, RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved.
The cause was the recreation of the RingbufferContainer in ReadOneOperation.getWaitKey.
This also addresses a review comment from #19630.

Fixes #16469 (RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer).
The stress test was incorrect: since #16303, the ReadManyOperation doesn't throw StaleSequenceException when the head is stale; the overwritten items are simply missing from the result.

Also changed a HashMap to a ConcurrentHashMap in RingbufferService. This map is modified from an operation thread when the RingbufferContainer is created, and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
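The HashMap point is a general pattern: a registry mutated from both operation threads (container creation) and user threads (destroy) needs a concurrent map. A minimal sketch of the idea, with made-up names rather than the actual RingbufferService code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustrative sketch of the concurrency fix, not the real RingbufferService.
final class ContainerRegistry<C> {
    private final ConcurrentMap<String, C> containers = new ConcurrentHashMap<>();

    // Called from operation threads when a container is first needed.
    C getOrCreate(String name, Function<String, C> factory) {
        // computeIfAbsent creates the container at most once per key,
        // even under concurrent callers.
        return containers.computeIfAbsent(name, factory);
    }

    // May run directly on the user's thread when the ringbuffer is local,
    // which is why a plain HashMap would race with getOrCreate.
    void destroyContainer(String name) {
        containers.remove(name);
    }
}
```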
frant-hartm added a commit to frant-hartm/hazelcast that referenced this pull request on Oct 22, 2021 (backport of #19788)
frant-hartm added a commit that referenced this pull request on Oct 25, 2021

Fix Ringbuffer test failures [5.0.z]

Backport of #19788 (same commit message as above).