Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r/consensus: fixed handling replicate result when term changed #3003

Merged

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Nov 18, 2021

Fixed problem that may lead to situation in which replication might be
claimed successful but replicated entries were truncated. This might
lead to consistency violation.

Consider an example:

F1 - fiber processing replicate in replicate entries stm
F2 - fiber processing append entries request

F1 - append to leader log
F1 - dispatch requests to followers and flush on leader
F1 - wait for dispatches and flush
F2 - processing new append entries with term greater than current
F2 - grab oplock
F2 - new term detected, step down, update term
F1 - check stop condition, detected term update
F1 - check term of self appended entry matches stored term -> TRUE
F1 - claim successful replication
F2 - truncate log, new leader log doesn't match <- violation

The problem was caused by the fact that raft::replicate_entries_stm did not wait for committed index update before evaluation replication results. This way a result might have been calculated based on the incorrect assumptions. Waiting for actual update of committed index before proceeding to result calculation give us a guarantee that all the state was update to decide if replication was successful.

Signed-off-by: Michal Maslanka michal@vectorized.io

Fixed problem that may lead to situation in which replication might be
claimed successful but replicated entries were truncated. This might
lead to consistency violation.

Consider an example:
```
F1 - fiber processing replicate in replicate entries STM
F2 - fiber processing append entries request

F1 - append to leader log
F1 - dispatch requests to followers and flush on leader
F1 - wait for dispatches and flush
F2 - processing new append entries with term greater than current
F2 - grab oplock
F2 - new term detected, step down, update term
F1 - check stop condition, detected term update
F1 - check term of self appended entry matches stored term -> TRUE
F1 - claim successful replication
F2 - truncate log, new leader log doesn't match <- violation
```

The problem was caused by the fact that `raft::replicate_entries_stm`
did not wait for committed index update before evaluation replication
results. This way a result might have been calculated based on the
incorrect assumptions. Waiting for actual update of committed index
before proceeding to result calculation give us a guarantee that all the
state was update to decide if replication was successful.

Signed-off-by: Michal Maslanka <michal@vectorized.io>
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. would be good to get a review from @rystsov

Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@mmaslankaprv mmaslankaprv merged commit 77fa757 into redpanda-data:dev Nov 18, 2021
@mmaslankaprv mmaslankaprv mentioned this pull request Nov 18, 2021
mmaslankaprv added a commit that referenced this pull request Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants