Event Journal consumer might never catch up #11895

Closed
cangencer opened this issue Nov 30, 2017 · 5 comments

@cangencer (Contributor) commented Nov 30, 2017

Currently, when reading from the event journal, the reader throws StaleSequenceException when the consumer has fallen behind the current head of the journal. The exception contains the new head of the journal; however, by the time the client reads again, this new head may itself be stale. As a result, the reader may never catch up once it falls behind.

Another mechanism should be in place to detect missing items and the reader should be able to catch up eventually.
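
To make the failure mode concrete, here is a minimal, self-contained simulation. It does not use Hazelcast's classes: `ToyJournal`, `StaleException`, and `countStaleRetries` are hypothetical names, and the journal is modelled as a fixed-size ring buffer. The point it illustrates is that once the buffer is full, every write overwrites the head, so a head reported in an exception is already stale after a single write.

```java
/** Toy model of a bounded journal; not Hazelcast's implementation. */
class ToyJournal {
    /** Hypothetical stand-in for StaleSequenceException; carries the current head. */
    static final class StaleException extends RuntimeException {
        final long headSeq;

        StaleException(long headSeq) {
            super("requested sequence is stale, head is " + headSeq);
            this.headSeq = headSeq;
        }
    }

    private final String[] buffer;
    private long tailSeq = 0; // sequence the next added item will receive

    ToyJournal(int capacity) {
        buffer = new String[capacity];
    }

    /** Sequence of the oldest item that has not been overwritten. */
    long headSeq() {
        return Math.max(0L, tailSeq - buffer.length);
    }

    void add(String item) {
        // When the buffer is full this overwrites the oldest slot.
        buffer[(int) (tailSeq % buffer.length)] = item;
        tailSeq++;
    }

    /** Strict read: throws if seq was overwritten, returns null if not written yet. */
    String readOne(long seq) {
        if (seq < headSeq()) {
            throw new StaleException(headSeq());
        }
        return seq < tailSeq ? buffer[(int) (seq % buffer.length)] : null;
    }

    /** Counts how many of the consumer's attempts end in a stale-sequence failure. */
    static int countStaleRetries(int capacity, int writesBetweenAttempts, int attempts) {
        ToyJournal journal = new ToyJournal(capacity);
        for (int i = 0; i <= capacity; i++) {
            journal.add("e" + i); // after this loop, sequence 0 is overwritten
        }
        long next = 0;
        int stale = 0;
        for (int a = 0; a < attempts; a++) {
            try {
                if (journal.readOne(next) != null) {
                    next++;
                }
            } catch (StaleException e) {
                stale++;
                next = e.headSeq; // retry from the head reported by the exception
            }
            // The producer keeps writing before the consumer's next attempt.
            for (int w = 0; w < writesBetweenAttempts; w++) {
                journal.add("w");
            }
        }
        return stale;
    }
}
```

In this model, `ToyJournal.countStaleRetries(4, 1, 5)` reports that all five attempts were stale, while with no writes between attempts (`countStaleRetries(4, 0, 5)`) only the first one is.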

@pveentjer (Member) commented Dec 1, 2017

It is probably best to add a new operation, 'continue listening from the most recent head', so that the head doesn't need to be passed explicitly.

@mmedenjak (Contributor) commented Dec 1, 2017

As it's a private API, the behaviour will be changed so that it does not throw an exception, or a boolean parameter will be added to request this. We can mostly keep the existing operation, as the response contains enough information to determine how many items were missed, which is relevant to the use case.
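
As a rough illustration of how a caller could derive the number of missed items from such a response, assume the response exposes the sequence of the first item it actually returned. That is an assumption made for this sketch, and the class and parameter names below are hypothetical rather than part of Hazelcast's API.

```java
/** Hypothetical helper; the parameter names are assumptions, not Hazelcast API. */
class LostItems {
    /**
     * Number of items skipped between the sequence the reader asked for and
     * the first sequence the journal could still return.
     */
    static long lostCount(long requestedSeq, long firstReturnedSeq) {
        // If the read started at or before the requested sequence, nothing was lost.
        return Math.max(0L, firstReturnedSeq - requestedSeq);
    }
}
```

For example, a reader that asked for sequence 10 and received items starting at sequence 25 would record 15 lost items.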

@cangencer (Contributor, Author) commented Dec 11, 2017

I noticed something else: if you try to subscribe with an offset that is greater than the current tail, you get an IllegalStateException. This can happen when a Jet job restarts after some of the partitions have been lost: it tries to resume from an offset which doesn't exist. What should happen in this case? Intuitively, I think it should start from the latest available offset.
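
One way to express "start from the latest available offset" is to clamp the requested sequence into the range the journal can serve. The sketch below is an illustration only: `StartSequence` and its parameters are hypothetical names, and resuming at the sequence after the newest item (rather than at the newest item itself) is one possible reading of the suggestion.

```java
/** Hypothetical helper; not part of Hazelcast's API. */
class StartSequence {
    /**
     * Clamps a requested start sequence into [oldestSeq, newestSeq + 1].
     * For an empty journal, pass newestSeq = oldestSeq - 1.
     */
    static long clamp(long requestedSeq, long oldestSeq, long newestSeq) {
        long nextToBeWritten = newestSeq + 1;
        if (requestedSeq < oldestSeq) {
            return oldestSeq;          // stale request: start at the oldest retained item
        }
        if (requestedSeq > nextToBeWritten) {
            return nextToBeWritten;    // future request: wait for the next item
        }
        return requestedSeq;           // request is already within range
    }
}
```

With a journal holding sequences 10 to 19, a restart that asks for sequence 100 would resume at 20, and one that asks for 5 would resume at 10.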

@mmedenjak (Contributor) commented Dec 12, 2017

@cangencer as it is a private API, we can alter it to whatever suits your needs. Jet does not support rolling upgrades, right? That is, it does not support a cluster of Hazelcast members running different versions?

@cangencer (Contributor, Author) commented Dec 14, 2017

Yes, there's no need to support rolling upgrades.

@taburet added Team: Core and removed Team: Core labels Jan 22, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Feb 26, 2018
Previously, when the reader requested an event from the event journal
which was already overwritten, the reader would get a
StaleSequenceException. Now we tolerate reading items with stale
sequences and return the oldest events instead, together with a
lostCount field indicating how many events were lost.
Also, allowed the reader to request a future, nonexistent sequence.
This can happen when some of the partitions have been lost and the
reader tries to read from a sequence which doesn't exist. In this case,
the read silently continues from the event following the newest event
(i.e. it always blocks and waits for the next event in the journal).
Also, reformatted some javadoc to adhere to a ~72 width limit and
removed some unnecessary rolling upgrade checks.

Fixes: hazelcast#11895
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Feb 28, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Feb 28, 2018
Previously, when the reader requested an event from the event journal
which was already overwritten, the reader would get a
StaleSequenceException. Now we tolerate reading items with stale
sequences and return the oldest events instead, together with a
nextSeq field indicating the sequence from which further reads can
continue.
Also, allowed the reader to request a future, nonexistent sequence.
This can happen when some of the partitions have been lost and the
reader tries to read from a sequence which doesn't exist. In this case,
the read silently continues from the event following the newest event
(i.e. it always blocks and waits for the next event in the journal).
Also, reformatted some javadoc to adhere to a ~72 width limit and
removed some unnecessary rolling upgrade checks.

Fixes: hazelcast#11895
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Feb 28, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Feb 28, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 8, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 8, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 8, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 8, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 8, 2018
mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Mar 12, 2018
mmedenjak added a commit that referenced this issue Mar 14, 2018
* Tolerate event journal reads of stale and future items

Previously, when the reader requested an event from the event journal
which was already overwritten, the reader would get a
StaleSequenceException. Now we tolerate reading items with stale
sequences and return the oldest events instead, together with a
lostCount field indicating how many events were lost.
Also, allowed the reader to request a future, nonexistent sequence.
This can happen when some of the partitions have been lost and the
reader tries to read from a sequence which doesn't exist. In this case,
the read silently continues from the event following the newest event
(i.e. it always blocks and waits for the next event in the journal).
Also, reformatted some javadoc to adhere to a ~72 width limit and
removed some unnecessary rolling upgrade checks.

Fixes: #11895
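
The behaviour described in the commit message can be sketched with a toy ring buffer. This is an illustration under stated assumptions, not the code from the commit: `TolerantJournal` and `ReadResult` are hypothetical names, the result carries both a lostCount and a nextSeq purely for demonstration, and the read returns an empty batch where the real operation would block and wait for the next event.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy journal with a tolerant read; an illustration, not Hazelcast's code. */
class TolerantJournal {
    /** Hypothetical result type: the items read plus bookkeeping for the caller. */
    static final class ReadResult {
        final List<String> items;
        final long lostCount; // items overwritten before they could be read
        final long nextSeq;   // sequence from which the next read should continue

        ReadResult(List<String> items, long lostCount, long nextSeq) {
            this.items = items;
            this.lostCount = lostCount;
            this.nextSeq = nextSeq;
        }
    }

    private final String[] buffer;
    private long tailSeq = 0; // sequence the next added item will receive

    TolerantJournal(int capacity) {
        buffer = new String[capacity];
    }

    void add(String item) {
        buffer[(int) (tailSeq % buffer.length)] = item;
        tailSeq++;
    }

    long oldestSeq() {
        return Math.max(0L, tailSeq - buffer.length);
    }

    /** Never throws for stale or future start sequences. */
    ReadResult read(long startSeq, int maxCount) {
        long oldest = oldestSeq();
        // Stale requests move up to the oldest item; future requests move
        // down to the sequence that the next written item will receive.
        long from = Math.min(Math.max(startSeq, oldest), tailSeq);
        long lost = Math.max(0L, oldest - startSeq);
        List<String> items = new ArrayList<>();
        long seq = from;
        while (seq < tailSeq && items.size() < maxCount) {
            items.add(buffer[(int) (seq % buffer.length)]);
            seq++;
        }
        return new ReadResult(items, lost, seq);
    }
}
```

A consumer built on such a read can simply loop on `result.nextSeq`: a stale request yields the oldest retained items plus a loss count instead of an exception, so each call makes progress regardless of how far behind the consumer is.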