[v23.1.x] Fix aborted txs being visible #11309

Merged (3 commits), Jun 14, 2023

rystsov (Contributor) commented on Jun 8, 2023

Backport of PR #10671

Fixes #10819

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

rystsov added 3 commits June 8, 2023 13:52
When we treat last_visible_index as the LSO, there is a chance that it
is ahead of the last applied offset, so we can't conclude that the stm
has no ongoing txs.

fixes redpanda-data#10097

(cherry picked from commit 166119f)
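The reasoning in the first commit can be made concrete with a minimal Python sketch. It is a hypothetical model, not rm_stm's code: `naive_lso`, `safe_lso`, and their parameters are illustrative names, and it assumes the Kafka convention that records at offsets strictly below the LSO are readable by `read_committed` consumers. The stm only knows about transactions in records it has applied, so an empty set of ongoing transactions proves nothing about offsets past `last_applied`.

```python
from typing import Iterable


def naive_lso(last_visible_index: int, ongoing_tx_starts: Iterable[int]) -> int:
    """Hypothetical pre-fix logic: trusts last_visible_index when no tx is open."""
    starts = list(ongoing_tx_starts)
    if starts:
        # The earliest open transaction pins the LSO at its first offset.
        return min(starts)
    # Bug: the stm may not have applied records up to last_visible_index yet,
    # so one of them could open a transaction the stm has never seen.
    return last_visible_index + 1


def safe_lso(
    last_visible_index: int, last_applied: int, ongoing_tx_starts: Iterable[int]
) -> int:
    """Hypothetical fixed logic: never trust offsets the stm has not applied."""
    # "No ongoing txs" is only known to hold up to last_applied, so the LSO
    # may not advance past last_applied + 1 (nor past last_visible_index + 1).
    bound = min(last_visible_index, last_applied) + 1
    # Open transactions already seen by the stm can only lower the LSO further.
    return min([bound, *ongoing_tx_starts])
```

With `last_visible_index=10` and `last_applied=5`, `naive_lso(10, [])` returns 11 while `safe_lso(10, 5, [])` returns 6. In this model, a transaction that starts anywhere in offsets 6..10 and later aborts would have had its records exposed by the naive version.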
Transactions in the Kafka protocol are stateful: how a request is
processed depends on the commands previously executed by the same or
even a different producer. This makes situations where replication
fails with an indecisive error, such as a timeout, dangerous because
the true state is unknown.

At the same time, we have a fundamental invariant: a new leader starts
processing requests only after it has seen all the messages written by
the previous leader.

Switch rm_stm to rely on this invariant to handle replication
uncertainty by forcing a step down or returning an error.

(cherry picked from commit 996c4c7)
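The second commit's approach can be sketched in Python as follows. This is a hypothetical model, not rm_stm's API: `ReplicateResult`, `NotLeaderError`, `LeaderSketch`, and the replicate callback are illustrative names. Mapping a definite failure to an error and an indecisive one (such as a timeout) to a forced step down is an assumed reading of "a forced step down or errors". The point it shows is that the old leader never guesses the outcome; it steps down, and the invariant guarantees that the next leader has seen every message the old one wrote before it serves requests.

```python
import enum


class ReplicateResult(enum.Enum):
    SUCCESS = "success"
    REJECTED = "rejected"  # definitely not replicated
    TIMEOUT = "timeout"    # indecisive: may or may not have been replicated


class NotLeaderError(Exception):
    """Hypothetical error: the client is expected to retry on the new leader."""


class LeaderSketch:
    """Hypothetical leader wrapping a replicate callback for tx commands."""

    def __init__(self, replicate_fn):
        self._replicate = replicate_fn
        self.is_leader = True

    def replicate_tx_command(self, cmd):
        if not self.is_leader:
            raise NotLeaderError("not leader")
        result = self._replicate(cmd)
        if result is ReplicateResult.SUCCESS:
            return "ok"
        if result is ReplicateResult.REJECTED:
            # The outcome is known: the command is not in the log, so the
            # tx state is unchanged and a plain error is safe.
            return "error"
        # Indecisive outcome: the command may or may not be in the log.
        # Instead of guessing, give up leadership. The next leader only
        # serves requests after it has seen everything this leader wrote,
        # so its tx state reflects the true outcome.
        self.is_leader = False
        raise NotLeaderError("forced step down after indecisive replication error")
```

A design consequence worth noting: the step down trades a brief unavailability for correctness, since no replica ever has to act on a tx state it cannot confirm.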
In the general case, the LSO can't go beyond the applied offset, but
this hurts e2e latency in acks=1 scenarios: a record only becomes
visible when its offset is behind the LSO, the LSO depends on the
applied offset, and the applied offset depends on the record being
fully replicated. As a result, reading acks=1 data is as slow as
reading acks=all data.

A leader knows about all in-flight replication requests, so it may
return an LSO beyond the applied offset when it knows the LSO can't be
affected by those in-flight requests.

(cherry picked from commit 3d1ef4e)
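The third commit's idea can be illustrated with a minimal Python sketch under stated assumptions; it is a hypothetical model, not rm_stm's implementation. `InflightRequest`, `leader_lso`, and the `transactional` flag are illustrative names. It assumes that a leader's in-flight set covers every offset between `last_applied` and `last_visible_index` (which the leadership invariant above is meant to provide), and that only transactional batches can open or change a transaction. As in Kafka, records at offsets strictly below the LSO are treated as readable.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InflightRequest:
    """Hypothetical replication request issued by this leader but not yet
    applied by the stm."""
    base_offset: int
    transactional: bool  # True if applying the batch could open or change a tx


def leader_lso(
    is_leader: bool,
    last_visible_index: int,
    last_applied: int,
    ongoing_tx_starts: Iterable[int],
    inflight: Iterable[InflightRequest],
) -> int:
    starts = list(ongoing_tx_starts)
    if not is_leader:
        # Without knowledge of in-flight requests, fall back to the
        # conservative rule: never go past the applied offset.
        return min([min(last_visible_index, last_applied) + 1, *starts])
    # A leader knows every request it has in flight, so it can look past
    # last_applied: only transactional batches could change the tx state.
    bound = last_visible_index + 1
    for req in inflight:
        if req.transactional:
            bound = min(bound, req.base_offset)
    # Open transactions the stm has already applied still pin the LSO.
    return min([bound, *starts])
```

In this model, with `last_applied=5`, `last_visible_index=10`, and only non-transactional requests in flight, `leader_lso` returns 11 instead of 6, so acks=1 readers no longer wait for those batches to be fully replicated and applied. A single transactional request at offset 8 would cap the LSO at 8.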
rystsov merged commit 7425723 into redpanda-data:v23.1.x on Jun 14, 2023
BenPope added this to the v23.1.13 milestone on Aug 7, 2023