[v24.2.x] rm_stm: fix a race during partition shutdown #24938
Merged
lf-rep merged 2 commits into redpanda-data:v24.2.x on Jan 27, 2025
Currently, the apply fiber can continue to run (and possibly add new
producers to the _producers map) while the state machine is shutting down.
This can manifest as unexpected crashes, because cleanup destroys
_producers without deregistering them properly.
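The race can be illustrated with a deterministic, single-threaded model. This is a minimal sketch under stated assumptions, not Redpanda code: toy_scheduler, toy_stm, apply_stopped, and producers_after_stop are hypothetical names, a queued task stands in for the apply fiber resuming after a scheduling point, and a boolean flag stands in for whatever mechanism actually stops the apply fiber (in Seastar this would more likely mean waiting on a gate or on the fiber's future).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model, not Redpanda code. Seastar fibers interleave only at
// scheduling points; here that is modeled as draining a queue of tasks.
struct toy_scheduler {
    std::deque<std::function<void()>> tasks;
    void post(std::function<void()> t) { tasks.push_back(std::move(t)); }
    void run_all() {
        while (!tasks.empty()) {
            auto t = std::move(tasks.front());
            tasks.pop_front();
            t();
        }
    }
};

struct toy_stm {
    std::map<int64_t, std::string> producers;
    bool apply_stopped = false; // assumption: stands in for stopping the apply fiber
    bool guard_apply;           // whether apply() honors apply_stopped

    explicit toy_stm(bool guard) : guard_apply(guard) {}

    // Apply path: may add a producer to the map.
    void apply(int64_t id) {
        if (guard_apply && apply_stopped) {
            return; // apply fenced off once shutdown has begun
        }
        producers.emplace(id, "producer");
    }

    // Shutdown: mark apply as stopped, then clean up (like reset_producers()).
    void stop() {
        apply_stopped = true;
        producers.clear();
    }
};

// Apply work is queued before stop() and only runs afterwards.
// Returns how many producers exist after cleanup has finished.
std::size_t producers_after_stop(bool guard_apply) {
    toy_scheduler sched;
    toy_stm stm(guard_apply);
    sched.post([&] { stm.apply(1); });
    stm.stop();
    sched.run_all(); // the "apply fiber" resumes after cleanup
    return stm.producers.size();
}
```

Without the guard, the producer queued before shutdown is inserted after cleanup and outlives it; with the guard, nothing is added once shutdown has begun.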
First manifestation
Iterator invalidation in reset_producers(): it loops through _producers
with scheduling points while the state machine apply fiber adds new producers.
```cpp
future<> rm_stm::stop() {
    // ...
    co_await _gate.close();
    co_await reset_producers(); // <---- interferes with state machine apply
    _metrics.clear();
    co_await raft::persisted_stm<>::stop();
    // ...
}
```
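One general way to keep a loop with scheduling points safe from concurrent inserts is to detach the container before iterating. This is a sketch under assumptions, not necessarily how the fix was implemented: std::unordered_map stands in for _producers (its insertions may rehash and invalidate all iterators), and reset_producers_detached and the yield callback are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for _producers. Inserting into a std::unordered_map
// may rehash and invalidate every iterator, so a loop that yields mid-way
// must not iterate the live map.
using producer_map = std::unordered_map<int64_t, std::string>;

// `yield` models a scheduling point: other fibers (e.g. the apply fiber)
// may run inside it and insert into `live`.
std::vector<int64_t> reset_producers_detached(
  producer_map& live, const std::function<void()>& yield) {
    // Move the current contents out; inserts made during the loop land in
    // the now-empty live map and cannot invalidate the iterators used here.
    producer_map detached = std::exchange(live, producer_map{});
    std::vector<int64_t> cleaned;
    for (const auto& entry : detached) {
        cleaned.push_back(entry.first); // per-producer cleanup would go here
        yield();
    }
    return cleaned;
}
```

Detaching only protects the loop itself: producers inserted during it still land in the live map and are never cleaned up, which leads directly to the second manifestation. The apply path still has to be stopped before cleanup.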
Second manifestation
Crashes: every newly created producer registers itself in an intrusive list in
producer_state_manager using a safe link. If a new producer is registered
after reset_producers(), the map is destroyed in the state machine
destructor without the producer being unlinked from producer_state_manager,
and the safe_link fires an assertion.
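A safe-mode intrusive hook (such as Boost.Intrusive's safe_link mode) asserts if it is destroyed while still linked. The toy model below replaces that assertion with a counter so the behavior can be checked; toy_hook, toy_producer_manager, register_producer, reset_producers, and simulate are hypothetical names, not Redpanda or Boost APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical counter standing in for the safe_link assertion.
inline int g_destroyed_while_linked = 0;

// Toy intrusive hook; a real safe-mode hook would assert in its destructor.
struct toy_hook {
    toy_hook* prev = nullptr;
    toy_hook* next = nullptr;
    bool linked = false;
    ~toy_hook() {
        if (linked) ++g_destroyed_while_linked; // safe_link would assert here
    }
};

// Hypothetical manager keeping every live producer in an intrusive list.
struct toy_producer_manager {
    toy_hook head; // sentinel node, never marked linked
    toy_producer_manager() { head.prev = head.next = &head; }
    toy_producer_manager(const toy_producer_manager&) = delete;
    void link(toy_hook& h) {
        h.prev = head.prev;
        h.next = &head;
        head.prev->next = &h;
        head.prev = &h;
        h.linked = true;
    }
    void unlink(toy_hook& h) {
        if (!h.linked) return;
        h.prev->next = h.next;
        h.next->prev = h.prev;
        h.prev = h.next = nullptr;
        h.linked = false;
    }
};

struct toy_producer {
    toy_hook hook;
};
using producer_map = std::map<int64_t, std::unique_ptr<toy_producer>>;

void register_producer(toy_producer_manager& mgr, producer_map& m, int64_t id) {
    auto p = std::make_unique<toy_producer>();
    mgr.link(p->hook);
    m.emplace(id, std::move(p));
}

// Deregister every producer before destroying it.
void reset_producers(toy_producer_manager& mgr, producer_map& m) {
    for (auto& entry : m) mgr.unlink(entry.second->hook);
    m.clear();
}

// Returns how many hooks were destroyed while still linked.
int simulate(bool register_after_reset) {
    g_destroyed_while_linked = 0;
    toy_producer_manager mgr;
    {
        producer_map producers;
        register_producer(mgr, producers, 1);
        reset_producers(mgr, producers);
        if (register_after_reset) {
            register_producer(mgr, producers, 2); // late insert by the apply path
        }
    } // map destroyed here, as in the state machine destructor
    return g_destroyed_while_linked;
}
```

A producer registered after cleanup is destroyed together with the map while its hook is still linked, which is the point where a real safe-mode hook would fire its assertion.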
From what I can tell, this bug has been there forever; it was perhaps made
worse by recent changes that added more scheduling points in the
surrounding code.
(cherry picked from commit fb57ccd)
(cherry picked from commit 873b282)
bashtanov approved these changes on Jan 27, 2025
Retry command for Build #61213: please wait until all jobs are finished before running the slash command.
CI test results on build #61213
Backport of PR #24936