-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed tracking expected last offset of a follower #13495
fixed tracking expected last offset of a follower #13495
Conversation
Signed-off-by: Michal Maslanka <michal@redpanda.com>
3838046
to
b3d8ba8
Compare
b3d8ba8
to
b80bb45
Compare
b80bb45
to
13d6188
Compare
* requests that were not yet replied by the follower. | ||
*/ | ||
idx.expected_log_end_offset = std::max( | ||
idx.last_dirty_log_index, idx.expected_log_end_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit suspicious that we are using the follower-side offset here without any term check (whereas in other places this is updated with a leader-side offset).
E.g. if both the follower and the leader have dirty_offset = 10 and term at offset 10 is different, but at offset 9 matches, and we send an append_entries with 9, the reply will be successful and expected_log_end_offset will be set to 10, but we are not really ready to send regular replicate append_entries yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done only for successful response here, in this case follower and leader logs up to the point indicated by append_entries_response
match perfectly
In Redpanda Raft implementation there may more than one `append_entries_request` dispatched to the follower at the same time. Leader tracks follower expected end offset to coordinate `recovery_stm` and `append_entries_stm` and prevent delivering the same batches twice. In classic raft implementation there is always only one append entries request pending to the follower hence it is enough to update follower state when processing append entries reply. We must track the expected follower end before receiving response as the requests may already be in flight. Signed-off-by: Michal Maslanka <michal@redpanda.com>
When replying to stale append entries request a request that was already delivered to the follower we must clamp returned dirty offset not to allow Raft group leader to reason about offsets which are not yet know to be matching between leader and followers. This fixes situation in which follower `match_index` may updated before its log actually matches leader. Example: (term,offset) - represent a single entry Leader log: ``` (1,0),(1,1),(1,2),(3,3),(3,4),(3,5) committed_offset: 2 ``` Follower log: ``` (1,0),(1,1),(1,2),(2,3),(2,4) committed_offset: 2 ``` There is a term inconsistency starting at offset `3` If follower would receive an append entries request with prev_log_index=1 prev_log_term=1 The request would result in a successful reply as `prev_log_term` and matches the entry at offset 1, however follower log can not be truncated so the follower will reply with success. The success reply will 'lie' to the leader that the follower log matches leader log. Signed-off-by: Michal Maslanka <michal@redpanda.com>
13d6188
to
d0e45c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirmed that the patch fixes the original issue
ci failure: #13491 |
/backport v23.2.x |
/backport v23.1.x |
/backport v22.3.x |
Failed to create a backport PR to v23.1.x branch. I tried:
|
Failed to create a backport PR to v23.2.x branch. I tried:
|
Failed to create a backport PR to v22.3.x branch. I tried:
|
[v23.2.x] Backport of #13495
In Redpanda Raft implementation there may more than one
append_entries_request
dispatched to the follower at the same time.Leader tracks follower expected end offset to coordinate
recovery_stm
and
append_entries_stm
and prevent delivering the same batches twice.In classic raft implementation there is always only one append entries
request pending to the follower hence it is enough to update follower
state when processing append entries reply. We must track the expected
follower end before receiving response as the requests may already be in
flight.
Fixes: https://github.com/redpanda-data/core-internal/issues/752
Backports Required
Release Notes
Bug Fixes