New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Ringbuffer not throw StaleSequenceException on read-many operation #16303
Conversation
readManyAsync javadoc would need updating also. |
hazelcast/src/main/java/com/hazelcast/topic/impl/reliable/MessageRunner.java
Outdated
Show resolved
Hide resolved
b731826
to
3c8aa0c
Compare
run-lab-run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, added some minor points for discussion and improvement.
hazelcast/src/main/java/com/hazelcast/topic/impl/reliable/MessageRunner.java
Outdated
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/topic/impl/reliable/MessageRunner.java
Outdated
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/ringbuffer/impl/operations/ReadManyOperation.java
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/ringbuffer/impl/operations/ReadManyOperation.java
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/topic/impl/reliable/MessageRunner.java
Show resolved
Hide resolved
@jbartok could you add description of what this fixes? |
hazelcast/src/main/java/com/hazelcast/ringbuffer/impl/operations/ReadManyOperation.java
Outdated
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/ringbuffer/impl/operations/ReadManyOperation.java
Show resolved
Hide resolved
hazelcast/src/main/java/com/hazelcast/topic/impl/reliable/MessageRunner.java
Show resolved
Hide resolved
hazelcast/src/test/java/com/hazelcast/client/ringbuffer/RingbufferTest.java
Outdated
Show resolved
Hide resolved
hazelcast/src/test/java/com/hazelcast/client/ringbuffer/RingbufferTest.java
Outdated
Show resolved
Hide resolved
Added to PR description. |
@ihsandemir are all review comments addressed? |
Thank you for the PR and the reviews, everyone! |
Because of the behaviour change introduced in hazelcast#16303, when the requested sequence is larger than the largest sequence (tailSequence) + 1, we don't listen from the oldest sequence (headSequence) but rather from the tailSequence + 1. Both approaches are fine and both approaches work better in some scenarios. Since the listener is loss tolerant, we can skip items from headSequence..tailSequence+1 anyway. Fixed the test to adhere to the new behaviour. We assume that eventually as we publish an item, it will reach the listener. A better fix would be to introduce unique IDs per ringbuffer, where we would then be able to distinguish between a completely lost ringbuffer and a ringbuffer which has not received the last few items and appropriately reset the requested sequence to the headSequence or tailSequence. Fixes: hazelcast#16430
Because of the behaviour change introduced in hazelcast#16303, when the requested sequence is larger than the largest sequence (tailSequence) + 1, we don't listen from the oldest sequence (headSequence) but rather from the tailSequence + 1. Both approaches are fine and both approaches work better in some scenarios. Since the listener is loss tolerant, we can skip items from headSequence..tailSequence+1 anyway. Fixed the test to adhere to the new behaviour. We assume that eventually as we publish an item, it will reach the listener. A better fix would be to introduce unique IDs per ringbuffer, where we would then be able to distinguish between a completely lost ringbuffer and a ringbuffer which has not received the last few items and appropriately reset the requested sequence to the headSequence or tailSequence. Fixes: hazelcast#16430
Fix LossToleranceTest Because of the behaviour change introduced in #16303, when the requested sequence is larger than the largest sequence (tailSequence) + 1, we don't listen from the oldest sequence (headSequence) but rather from the tailSequence + 1. Both approaches are fine and both approaches work better in some scenarios. Since the listener is loss tolerant, we can skip items from headSequence..tailSequence+1 anyway. Fixed the test to adhere to the new behaviour. We assume that eventually as we publish an item, it will reach the listener. A better fix would be to introduce unique IDs per ringbuffer, where we would then be able to distinguish between a completely lost ringbuffer and a ringbuffer which has not received the last few items and appropriately reset the requested sequence to the headSequence or tailSequence. Fixes: #16430
Fixes hazelcast#19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from hazelcast#19630. Fixes hazelcast#16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in hazelcast#16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
Fixes hazelcast#19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from hazelcast#19630. Fixes hazelcast#16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in hazelcast#16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
Fixes hazelcast#19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from hazelcast#19630. Fixes hazelcast#16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in hazelcast#16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
Fixes #19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from #19630. Fixes #16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in #16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member.
Fixes hazelcast#19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from hazelcast#19630. Fixes hazelcast#16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in hazelcast#16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member. Backport of hazelcast#19788
* Fix Ringbuffer test failures [5.0.z] Fixes #19696 ReliableTopicDestroyTest.whenDestroyedThenRingbufferRemoved Created a simpler reproducer RingbufferDestroyTest.whenDestroyAfterAdd_thenRingbufferRemoved The cause was recreation of the RingbufferContainer in ReadOneOperation.getWaitKey This also addresses review comment from #19630. Fixes #16469 RingbufferAddAllReadManyStressTest.whenShortTTLAndBigBuffer The stress test is incorrect, the ReadManyOperation doesn't throw the StaleSequenceException when head is stale and the items are just missing in the result. This was introduced in #16303. Also fixed HashMap->ConcurrentHashMap in RingbufferService. This Map is modified from operation thread when RingbufferContainer is created and also from destroyContainer, which may run directly on the user's thread when the ringbuffer is local on the member. Backport of #19788
The problem we want to fix is that when producers are fast, consumers can have a really hard time keeping- and catching up with them (they often fail to do so, even when they would be able to process fast enough). For example if your head is at 1, and you ask for 100 items from 0, the
Ringbuffer.readManyAsync
call just throws an exception, instead of returning what’s available. By the time the exception travels to the consumer, gets processed and a new request is made, the requested sequence number tends to be stale again and will result in yet another exception. Meanwhile data is being overwritten in theRingbuffer
and is lost to the consumer.The solution aims to remove the
StaleSequenceException
being thrown by theReadManyOperation
in such situations and just have it return the data that's available.We are able to do this for
Ringbuffer.readManyAsync
, because it returns aReadResultSet
carrying sequence numbers, thus allowing the client to notice potential sequence gaps and decide if it can tolerate them or not. On the other had we can't do the same forRingbuffer.readOne
because there is no way to observe sequence numbers in that case (it returns just an element from theRingbuffer
).