Fix broken watermark counters in unordered mapUsingServiceAsync [HZ-1928] #23271
…d mapUsingServiceAsync hazelcast#23245
7b1146a
to
cad71e7
Compare
// but this is not required.
|| (!ordered && actual.equals(asList("a-1", "a-1", "a-1", wm(10))))
// after snapshot restore watermark may be duplicated
|| (!ordered && actual.equals(asList("a-1", "a-1", wm(10), "a-1", wm(10))))
@viliam-durina I think that watermark duplication is allowed. Is that true?
It's not allowed, they must be strictly monotonic.
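The strict-monotonicity requirement for a single processor instance can be sketched as a small check over emitted watermark timestamps. This is only an illustration: `WatermarkMonotonicity` and `isStrictlyMonotonic` are hypothetical names, not part of Jet's API, and watermarks are modelled as plain `long` timestamps.

```java
import java.util.List;

public class WatermarkMonotonicity {

    /**
     * Returns true if every timestamp is strictly greater than the one
     * before it. A repeated value (e.g. 10, 10) is rejected.
     * Hypothetical helper for illustration only.
     */
    static boolean isStrictlyMonotonic(List<Long> watermarkTimestamps) {
        boolean first = true;
        long previous = 0;
        for (long ts : watermarkTimestamps) {
            if (!first && ts <= previous) {
                return false;
            }
            previous = ts;
            first = false;
        }
        return true;
    }
}
```

Under this check, the sequence 5, 10 passes, while 10, 10 fails because the second watermark does not advance.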
Watermark order with regard to items is preserved.
After a lot of debugging I confirmed that WM duplication is partially caused by the fact that we do not store lastEmittedWms in the snapshot. So if at the time of taking the snapshot we have:
- a WM that was already emitted, with no new WM received
- an in-flight item
then, after restore we get:
- that old, already emitted WM in lastReceivedWms
- an in-flight item that assumes that it was received before any snapshot (Long.MIN_VALUE), even though to be strict it should be related to the WM (the lastReceivedWms field is updated after restored items are sent for processing)
- when the in-flight item completes, it triggers emission of the WM from lastReceivedWms (check in watermarkCount.isEmpty() && lastReceivedWms[i] > lastEmittedWms[i]), which seems to make sense at least for other cases
I think it may actually be by design that a given WM can be repeated after a snapshot because, as the comment says:
we restart at the oldest WM any instance was at the time of snapshot
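The scenario described above can be reproduced with a deliberately simplified model. The sketch below is not the processor's actual code: `RestoreDuplicateWmSketch` and `Model` are hypothetical names, a single in-flight counter stands in for the per-watermark counts, and only one watermark key is modelled. The one assumption taken from the discussion is that the last received WM is carried over on restore while the last emitted WM is not.

```java
import java.util.ArrayList;
import java.util.List;

public class RestoreDuplicateWmSketch {

    /** Simplified, single-key model of the watermark bookkeeping (assumption). */
    static class Model {
        long lastReceivedWm = Long.MIN_VALUE;
        long lastEmittedWm = Long.MIN_VALUE;
        int inFlight;
        final List<String> output = new ArrayList<>();

        void onItem() {
            inFlight++;
        }

        void onWatermark(long wm) {
            lastReceivedWm = wm;
            tryEmitWm();
        }

        void onItemCompleted(String result) {
            inFlight--;
            output.add(result);
            tryEmitWm();
        }

        // Mirrors the shape of the check quoted in the discussion:
        // nothing in flight and the received WM is ahead of the emitted one.
        private void tryEmitWm() {
            if (inFlight == 0 && lastReceivedWm > lastEmittedWm) {
                output.add("wm(" + lastReceivedWm + ")");
                lastEmittedWm = lastReceivedWm;
            }
        }

        /** Restores into a new instance; lastEmittedWm is intentionally not carried over. */
        Model restoreFromSnapshot() {
            Model restored = new Model();
            restored.lastReceivedWm = this.lastReceivedWm;
            restored.inFlight = this.inFlight; // in-flight items are replayed
            return restored;
        }
    }

    /** Runs the scenario and returns the combined output before and after restore. */
    static List<String> run() {
        Model before = new Model();
        before.onItem();
        before.onItemCompleted("a-1");
        before.onWatermark(10);          // emitted before the snapshot
        before.onItem();                 // still in flight when the snapshot is taken

        Model after = before.restoreFromSnapshot();
        after.onItemCompleted("a-1");    // replayed item completes, WM is emitted again

        List<String> all = new ArrayList<>(before.output);
        all.addAll(after.output);
        return all;
    }
}
```

In this model `run()` yields `a-1, wm(10), a-1, wm(10)`: each instance emits `wm(10)` only once, but the combined output across the restore contains it twice.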
Also, the duplication does not occur during a single execution; the duplicated WM is emitted after restore from a snapshot. But because of "we restart at the oldest WM any instance was at the time of snapshot", it can probably go backwards.
Still, emitting WM after unrelated item seems at least counter-intuitive.
I debugged it too and yes, the observed output is legal. The duplicate WMs aren't emitted from a single instance of a processor (that wouldn't be legal), but from the processor after restore.
We save a simplified state to the snapshot. Before saving, we know exactly which item was received before which WM. But we can't save it exactly like this, because after restore, the WM can go back in at-least-once mode, and items can be re-partitioned, and we can't re-partition the watermarkCounts. I think we don't even have to save lastReceivedWm at all, we can just rely on the new WMs received after restore, but it's not too important to change, I can be wrong here.
...ast/src/main/java/com/hazelcast/jet/impl/processor/AsyncTransformUsingServiceUnorderedP.java
for events received before it were already sent.

Snapshot contains in-flight elements at the time of taking the snapshot.
They are replayed when state is restored from the snapshot, so we get only at-least-once guarantee.
The guarantee is still exactly-once. Under the hood the processing is retried at least once, but that doesn't break the guarantee.
updated comment
] [5.2.z] (#23272)
mapUsingServiceAsync in unordered mode used wrong condition to check if the processing is complete and wrong number of currently processed items to propagate watermarks. This could lead to lost items when the pipeline was completed and to crashes during processing of streams containing the same item multiple times.
Fixes #23245
Backport of: #23271
Co-authored-by: Viliam Durina <viliam@hazelcast.com>
…928] (hazelcast#23271)
mapUsingServiceAsync in unordered mode used wrong condition to check if the processing is complete and wrong number of currently processed items to propagate watermarks. This could lead to lost items when the pipeline was completed and to crashes during processing of streams containing the same item multiple times.
Fixes hazelcast#23245
Co-authored-by: Viliam Durina <viliam@hazelcast.com>
(cherry picked from commit 2bec372)
mapUsingServiceAsync in unordered mode used the wrong condition to check whether processing is complete and the wrong count of currently processed items to propagate watermarks. This could lead to lost items when the pipeline completed and to crashes when processing streams that contain the same item multiple times.
Fixes #23245
Checklist:
- Labels (Team:, Type:, Source:, Module:) and Milestone set
- Label Add to Release Notes or Not Release Notes content set