Skip to content

Commit

Permalink
raft: drop assert in server_impl::apply_snapshot for a condition that…
Browse files Browse the repository at this point in the history
… may happen

server_impl::apply_snapshot() assumes that it cannot receive a snapshots
from the same host until the previous one is handled and usually this is
true since a leader will not send another snapshot until it gets
response to a previous one. But it may happens that snapshot sending
RPC fails after the snapshot was sent, but before reply is received
because of connection disconnect. In this case the leader may send
another snapshot and there is no guaranty that the previous one was
already handled, so the assumption may break.

Drop the assert that verifies the assumption and return an error in this
case instead.

Fixes: #15222

Message-ID: <ZO9JoEiHg+nIdavS@scylladb.com>
  • Loading branch information
Gleb Natapov authored and denesb committed Sep 1, 2023
1 parent 91cc544 commit 55f047f
Showing 1 changed file with 12 additions and 7 deletions.
19 changes: 12 additions & 7 deletions raft/server.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1149,14 +1149,19 @@ void server_impl::send_snapshot(server_id dst, install_snapshot&& snp) {
}

future<snapshot_reply> server_impl::apply_snapshot(server_id from, install_snapshot snp) {
_fsm->step(from, std::move(snp));
// Only one snapshot can be received at a time from each node
assert(! _snapshot_application_done.contains(from));
snapshot_reply reply{_fsm->get_current_term(), false};
try {
reply = co_await _snapshot_application_done[from].get_future();
} catch (...) {
logger.error("apply_snapshot[{}] failed with {}", _id, std::current_exception());
// Previous snapshot processing may still be running if a connection from the leader was broken
// after it sent install_snapshot but before it got a reply. It may case the snapshot to be resent
// and it may arrive before the previous one is processed. In this rare case we return error and the leader
// will try again later (or may be not if the snapshot that is been applied is recent enough)
if (!_snapshot_application_done.contains(from)) {
_fsm->step(from, std::move(snp));

try {
reply = co_await _snapshot_application_done[from].get_future();
} catch (...) {
logger.error("apply_snapshot[{}] failed with {}", _id, std::current_exception());
}
}
co_return reply;
}
Expand Down

0 comments on commit 55f047f

Please sign in to comment.