The following deadlock can happen if a partition has at least one `persisted_stm` (i.e. when transactions or shadow indexing is enabled) and log eviction happens concurrently with creation of a new segment during append:
fiber1:

1. Before log prefix truncation, the `log_eviction_stm` fiber waits for the `persisted_stm`s to ensure that snapshots exist at least at the truncation offset.
2. Before creating snapshots, the state machines wait until records up to the truncation offset get applied.
3. The state machine apply fibers in turn wait for raft committed index notifications.
fiber2:

1. The `append_entries` fiber appends new entries to the log.
2. A log roll happens and a new segment gets created.
3. `persisted_stm::make_snapshot()` is called.
4. It tries to take `persisted_stm::_op_lock`.
The deadlock happens when `persisted_stm::_op_lock` is held by `persisted_stm::ensure_snapshot_exists`, which waits for notifications that never arrive.
There are several ways "applied to state machine" or "committed index increased" notifications can be lost:
- There is no notification in `do_hydrate_snapshot` if `_commit_index` increases due to an installed snapshot.
- There is no notification if the state machine offset increases due to `state_machine::handle_eviction`.
- A `_commit_index` notification may get lost in `consensus::maybe_update_follower_commit_idx` if `_flushed_offset` hasn't yet reached the new `_commit_index` (we do call `consensus::refresh_commit_index` before `persisted_stm::ensure_snapshot_exists`, but it is a no-op for followers).
- Possibly others.
The problem is easily reproduced with the following setup:
1. Create a 3-node redpanda cluster.
2. Create a topic with a small max segment size and a small `retention.bytes` setting: `rpk topic create foo -p 1 -r 3 -c 'retention.bytes=1500000' -c 'segment.bytes=1000000'`
3. Apply produce load.
4. Start/stop a node several times.
To fix the problem we could patch the missed notifications one by one, but a more robust fix would be to remove `persisted_stm::make_snapshot` from the critical append path. It is strictly an optimization and can be done either in the `state_machine::apply` fiber or in its own fiber after the state machine has processed enough records.

cc @rystsov
One of the points about lost committed index update notifications doesn't seem right, as the notification will be dispatched during a subsequent heartbeat, i.e.:
> A `_commit_index` notification may get lost in `consensus::maybe_update_follower_commit_idx` if `_flushed_offset` hasn't yet reached the new `_commit_index` (we do call `consensus::refresh_commit_index` before `persisted_stm::ensure_snapshot_exists`, but it is a no-op for followers).
This won't happen if the `append_entries` fiber is blocked (and thus unable to process heartbeats) :( That's the reason I think it is important to remove the `persisted_stm::make_snapshot` calls from the `append_entries` fiber, so that it can make progress and dispatch the notifications even if they get lost at some point.