-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rm_stm: remove mem_state::last_end_tx #16285
Conversation
It looks like last_end_tx was added to fix a violation when the code had lazy catchup on an abort transaction. This field was added in commit [1] when the abort code looked like this [2] ``` if (_mem_state.last_end_tx < r.value().last_offset) { _mem_state.last_end_tx = r.value().last_offset; } // don't need to wait for apply because tx is already aborted on the // coordinator level - nothing can go wrong co_return tx_errc::none; } ``` This sort of seemed like an optimization where we could skip the wait and defer it to next command if needed. However the code changed later in this commit [3] where an explicit wait after abort added thus obviating the need for this field. Just looking at the code anywhere this field is updated, we immediately have a wait for state machine to catchup until that offset which means any subsequent wait for it will anyway be a no-op, hence removing it. This is a part of clean up before moving fields to producer_state. [1] redpanda-data@51f42c1 [2] https://github.com/rystsov/redpanda/blob/51f42c1ff6574a664f13f477be489387180c7c57/src/v/cluster/rm_stm.cc#L656 [3] redpanda-data@d8c5254
// catching up with all previous end_tx operations (commit | abort) | ||
// to avoid writing the same commit | abort marker twice | ||
if (_mem_state.last_end_tx >= model::offset{0}) { | ||
if (!co_await wait_no_throw( | ||
_mem_state.last_end_tx, model::timeout_clock::now() + timeout)) { | ||
vlog( | ||
_ctx_log.trace, | ||
"Can't catch up to abort pid:{} tx_seq:{}", | ||
pid, | ||
tx_seq.value_or(model::tx_seq(-1))); | ||
co_return tx_errc::stale; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if it is possible that the abort or commit control batch was replicated but not yet applied we update the _mem_state.last_end_tx
after call to raft::replicate
is finished but before we wait for the offset to be applied. I see that there is locking in place i.e. we grab tx_lock
whenever something is being done with producer transactions. In this case the wait is global i.e. it provides ordering between different producers. I do not know if that is desired and required, this is just an observation. If this ordering between producers is not relevant this is LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the ordering between producers doesn't matter because the state is segmented by producer, a producer and it's transaction lifecycle only cares about its state. The shared state is mostly tracking LSO, aborted transactions etc but those are not affected by this situation IMO.
It looks like last_end_tx was added to fix a violation when the code had lazy catchup on an abort transaction.
This field was added in commit [1] when the abort code looked like this [2]
This sort of seemed like an optimization where we could skip the wait and defer it to next command if needed.
However the code changed later in this commit [3] where an explicit wait after abort added thus obviating the need for this field.
Just looking at the code, anywhere this field is updated, we immediately have a wait for state machine to catchup until that offset which means any subsequent wait for it will anyway be a no-op, hence removing it.
This is a part of clean up before moving fields to producer_state.
[1] 51f42c1
[2] https://github.com/rystsov/redpanda/blob/51f42c1ff6574a664f13f477be489387180c7c57/src/v/cluster/rm_stm.cc#L656
[3] d8c5254
Backports Required
Release Notes