fixed tracking expected last offset of a follower #13495

Merged
mmaslankaprv merged 3 commits into redpanda-data:dev from fix-internal-752
Sep 20, 2023

Conversation

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Sep 18, 2023

In the Redpanda Raft implementation there may be more than one
append_entries_request dispatched to a follower at the same time.
The leader tracks the follower's expected end offset to coordinate
recovery_stm and append_entries_stm and to prevent delivering the same
batches twice. In a classic Raft implementation there is always at most one
append entries request pending to a follower, hence it is enough to update
the follower state when processing the append entries reply. Here we must
update the expected follower end offset before receiving the response, as
further requests may already be in flight.
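
The dispatch-time bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the struct `follower_sketch` and the helpers `on_dispatch` and `already_in_flight` are hypothetical names, and offsets are modeled as plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the per-follower state kept by the
// leader. Names are illustrative and are not Redpanda's actual API.
struct follower_sketch {
    int64_t last_dirty_log_index = -1;    // learned from replies
    int64_t expected_log_end_offset = -1; // advanced when a request is sent
};

// Record that a request carrying batches up to `last_offset` was dispatched.
// The expected end offset moves forward immediately, before any reply.
inline void on_dispatch(follower_sketch& f, int64_t last_offset) {
    assert(last_offset >= 0);
    if (last_offset > f.expected_log_end_offset) {
        f.expected_log_end_offset = last_offset;
    }
}

// A sender (for example a recovery loop) can consult the expected end offset
// to avoid delivering batches that another request already carries.
inline bool already_in_flight(const follower_sketch& f, int64_t batch_last_offset) {
    return batch_last_offset <= f.expected_log_end_offset;
}
```

With more than one request outstanding, a state updated only on reply would still show the old end offset while the first request is in flight, so a second sender could pick the same batches again.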

Fixes: https://github.com/redpanda-data/core-internal/issues/752

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Bug Fixes

  • Fixed a rare situation in which follower recovery got stuck because the follower state was incorrectly updated

@mmaslankaprv mmaslankaprv marked this pull request as ready for review September 18, 2023 18:06
src/v/cluster/cluster_utils.cc (outdated, resolved)
src/v/raft/recovery_stm.cc (outdated, resolved)
src/v/raft/recovery_stm.cc (outdated, resolved)
src/v/raft/replicate_entries_stm.cc (outdated, resolved)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv mmaslankaprv changed the title from "r/consensus: replace last_sent_offset with inflight_offset" to "fixed tracking expected last offset of a follower" Sep 19, 2023
src/v/raft/recovery_stm.cc (outdated, resolved)
* requests that were not yet replied by the follower.
*/
idx.expected_log_end_offset = std::max(
idx.last_dirty_log_index, idx.expected_log_end_offset);
Contributor

I find it a bit suspicious that we are using the follower-side offset here without any term check (whereas in other places this is updated with a leader-side offset).

E.g. if both the follower and the leader have dirty_offset = 10 and term at offset 10 is different, but at offset 9 matches, and we send an append_entries with 9, the reply will be successful and expected_log_end_offset will be set to 10, but we are not really ready to send regular replicate append_entries yet.

Member Author

This is done only for a successful response; in that case the follower and leader logs match perfectly up to the point indicated by the append_entries_response.
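
For readers following the thread, the update under discussion can be sketched in isolation. This is a hedged illustration: `follower_index_sketch` and `on_successful_reply` are hypothetical names, and the sketch assumes that `last_dirty_log_index` is taken from the follower's reply before the `std::max` line from the diff runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the leader's per-follower index entry.
struct follower_index_sketch {
    int64_t last_dirty_log_index = -1;
    int64_t expected_log_end_offset = -1;
};

// Apply a successful append entries reply (assumed flow, see lead-in).
inline void on_successful_reply(follower_index_sketch& idx, int64_t reply_dirty_offset) {
    assert(reply_dirty_offset >= -1);
    idx.last_dirty_log_index = reply_dirty_offset;
    // Never move the expected end backwards: requests that were not yet
    // replied to by the follower may already cover higher offsets.
    idx.expected_log_end_offset = std::max(
      idx.last_dirty_log_index, idx.expected_log_end_offset);
}
```

The `std::max` keeps an older reply from regressing the expected end offset while a later request is still in flight, and lets a reply that reports a higher dirty offset advance it.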

In the Redpanda Raft implementation there may be more than one
`append_entries_request` dispatched to a follower at the same time.
The leader tracks the follower's expected end offset to coordinate
`recovery_stm` and `append_entries_stm` and to prevent delivering the same
batches twice. In a classic Raft implementation there is always at most one
append entries request pending to a follower, hence it is enough to update
the follower state when processing the append entries reply. Here we must
update the expected follower end offset before receiving the response, as
further requests may already be in flight.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
When replying to a stale append entries request (a request that was already
delivered to the follower), we must clamp the returned dirty offset so that
the Raft group leader cannot reason about offsets which are not yet known
to match between the leader and the follower. This fixes a situation in
which the follower `match_index` may be updated before its log actually
matches the leader's.

Example:

Each (term,offset) pair represents a single entry.

Leader log:
```
(1,0),(1,1),(1,2),(3,3),(3,4),(3,5)

committed_offset: 2
```

Follower log:
```
(1,0),(1,1),(1,2),(2,3),(2,4)

committed_offset: 2
```

There is a term inconsistency starting at offset `3`.

If the follower received an append entries request with

prev_log_index=1
prev_log_term=1

The request would result in a successful reply, as `prev_log_term`
matches the entry at offset 1. However, the follower log cannot be
truncated, so the follower will still reply with success. That success
reply will 'lie' to the leader that the follower log matches the leader log.
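
The example can be replayed with a small model. This is a sketch under stated assumptions, not the patch itself: a log is modeled as a vector of terms indexed by offset, `last_matching_offset` and `clamped_dirty_offset` are hypothetical helpers, and the stale request is assumed to cover offsets up to `request_last_offset`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A log modeled as a vector of terms; the vector index is the offset.
using log_sketch = std::vector<int64_t>;

// Last offset up to which both logs hold entries with identical terms,
// or -1 if they differ already at offset 0.
inline int64_t last_matching_offset(const log_sketch& a, const log_sketch& b) {
    std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return static_cast<int64_t>(i) - 1;
}

// Dirty offset reported for a stale request: never beyond the last offset
// that the request itself covers (hypothetical parameter).
inline int64_t clamped_dirty_offset(const log_sketch& follower, int64_t request_last_offset) {
    assert(!follower.empty());
    int64_t dirty = static_cast<int64_t>(follower.size()) - 1;
    return std::min(dirty, request_last_offset);
}
```

For the logs above, the follower's unclamped dirty offset is 4 while the logs only match up to offset 2. A stale request that covers offsets up to 2 yields a clamped value of 2, so the leader is not told about offsets 3 and 4, where the terms differ.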

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Contributor

@ztlpn ztlpn left a comment

confirmed that the patch fixes the original issue

@mmaslankaprv
Member Author

ci failure: #13491

@mmaslankaprv mmaslankaprv merged commit 54c0d86 into redpanda-data:dev Sep 20, 2023
23 of 25 checks passed
@mmaslankaprv mmaslankaprv deleted the fix-internal-752 branch September 20, 2023 07:16
@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

/backport v22.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-13495-v23.1.x-197 remotes/upstream/v23.1.x
git cherry-pick -x fbf71007041db876fdd3f16096caf5a9ce2f6e76 299c32163da657d94fd48cb2050501fd2b370f52 d0e45c0baf88e2f0c3121a919d2051ef4e12fd8e

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-13495-v23.2.x-379 remotes/upstream/v23.2.x
git cherry-pick -x fbf71007041db876fdd3f16096caf5a9ce2f6e76 299c32163da657d94fd48cb2050501fd2b370f52 d0e45c0baf88e2f0c3121a919d2051ef4e12fd8e

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v22.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-13495-v22.3.x-107 remotes/upstream/v22.3.x
git cherry-pick -x fbf71007041db876fdd3f16096caf5a9ce2f6e76 299c32163da657d94fd48cb2050501fd2b370f52 d0e45c0baf88e2f0c3121a919d2051ef4e12fd8e

Workflow run logs.

3 participants