[v23.3.x] rm_stm/idempotency: fix the producer lock scope #16749

vbotbuildovich (Collaborator)

Backport of PR #16706

If replicating the current sequence number fails, the code issues a
manual leader step-down to prevent subsequent requests from making
progress, because that would violate idempotency guarantees.

This is done by holding a mutex while the request is in progress. However,
the mutex is incorrectly released before the step-down is issued, which
could theoretically let other requests make progress before the step-down
actually happens. The race sequence looks like this:

1. seq=5 hits a replication_error
2. seq=6 makes progress
3. seq=5 issues a step-down
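A minimal sketch of the corrected lock scope, assuming a simplified model: `partition_leader`, `produce_result`, and the injected `replicate` callback are hypothetical names, not Redpanda's rm_stm API. Redpanda itself is asynchronous (Seastar), so the real window opens across suspension points; here a synchronous `std::mutex` stands in for the producer lock, and in-flight sequence tracking is omitted. The point shown is that the lock guard stays alive until after `step_down()`, so no later request can acquire the lock while this node still believes it is leader.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical result codes; not Redpanda's actual error enum.
enum class produce_result { ok, replication_error, not_leader };

// Hypothetical leader-side state for a single idempotent producer.
class partition_leader {
public:
    // `replicate` stands in for real replication; it returns false on failure.
    explicit partition_leader(std::function<bool(std::int64_t)> replicate)
      : replicate_(std::move(replicate)) {}

    produce_result produce(std::int64_t seq) {
        // Hold the producer lock for the whole request, including any
        // step-down. Releasing it before step_down() would let another
        // request (e.g. seq=6) run while this node still acts as leader.
        std::lock_guard<std::mutex> guard(producer_lock_);
        if (!is_leader_) {
            return produce_result::not_leader;
        }
        if (!replicate_(seq)) {
            step_down(); // still under the lock
            return produce_result::replication_error;
        }
        last_seq_ = seq;
        return produce_result::ok;
    }

    bool is_leader() const {
        std::lock_guard<std::mutex> guard(producer_lock_);
        return is_leader_;
    }

    std::int64_t last_seq() const {
        std::lock_guard<std::mutex> guard(producer_lock_);
        return last_seq_;
    }

private:
    // Caller must hold producer_lock_.
    void step_down() { is_leader_ = false; }

    mutable std::mutex producer_lock_;
    std::function<bool(std::int64_t)> replicate_;
    bool is_leader_ = true;
    std::int64_t last_seq_ = -1; // no sequence accepted yet
};
```

With a `replicate` callback that fails only for seq=5, producing seq=4 succeeds, seq=5 returns `replication_error` and steps down, and seq=6 is then rejected with `not_leader` instead of making progress.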

This bug was found by inspecting the code and could not be confirmed,
because trace logs were not available from the many-partitions test. It
seems worth tightening regardless.

Deployed the patch on a 3-node cluster under a 500 MB/s OMB
(OpenMessaging Benchmark) run; no noticeable performance change.

(cherry picked from commit def3776)
vbotbuildovich added this to the v23.3.x-next milestone on Feb 27, 2024.

vbotbuildovich added the kind/backport label (PRs targeting a stable branch) on Feb 27, 2024.
piyushredpanda (Contributor)

Failures: 14139

Labels: area/redpanda, kind/backport (PRs targeting a stable branch)

4 participants