Commit
storage_service: wait for NORMAL state handler before setup_group0()
`handle_state_normal` may drop connections to the handled node. This causes spurious failures if there's an ongoing concurrent operation. This problem was already solved twice in the past in different contexts: first in 5363616, then in 79ee381. Time to fix it for the third time. Now we do this right after enabling gossiping, so hopefully it's the last time.

This time it's causing snapshot transfer failures in group 0. Although the transfer is retried and eventually succeeds, the failed transfer is wasted work and causes an annoying ERROR message in the log which dtests, SCT, and I don't like.

The fix is done by moving the `wait_for_normal_state_handled_on_boot()` call before `setup_group0()`. But for the wait to work correctly we must first ensure that gossiper sees an alive node, so we precede it with `wait_for_live_node_to_show_up()` (before this commit, the call site of `wait_for_normal_state_handled_on_boot` was already after this wait).

Also ensure that `wait_for_normal_state_handled_on_boot()` doesn't hang in raft-topology mode by adding `_normal_state_handled_on_boot.insert(endpoint);` in the raft-topology branch of `handle_state_normal`.

Fixes: scylladb#12972
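
The ordering described above can be illustrated with a small synchronous toy model. This is only a sketch, not the actual Scylla code: `toy_node`, `event`, `pending`, `trace`, and the way gossip events are drained are all hypothetical, and the real functions are asynchronous. It assumes that the wait checks NORMAL handling only for nodes currently seen as alive, which is one plausible reason the live-node wait has to come first (with no live nodes, the check would pass vacuously).

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical gossip events for the toy model.
enum class event_kind { alive, normal };
struct event { event_kind kind; std::string endpoint; };

// Toy stand-in for the storage service; names mirror the commit message,
// but the bodies are illustrative assumptions only.
struct toy_node {
    std::deque<event> pending;            // gossip events not yet processed
    std::set<std::string> live;           // endpoints seen as alive
    std::set<std::string> normal_handled; // models _normal_state_handled_on_boot
    bool raft_topology = false;
    std::vector<std::string> trace;       // observable side effects, in order

    void handle_state_normal(const std::string& endpoint) {
        if (!raft_topology) {
            // Legacy path: may drop connections to the handled node.
            trace.push_back("drop_connections " + endpoint);
        }
        // Recorded in both modes, so the wait below cannot hang
        // in raft-topology mode.
        normal_handled.insert(endpoint);
    }

    void process_one() {
        assert(!pending.empty());
        event e = pending.front();
        pending.pop_front();
        if (e.kind == event_kind::alive) {
            live.insert(e.endpoint);
        } else {
            handle_state_normal(e.endpoint);
        }
    }

    void wait_for_live_node_to_show_up() {
        while (live.empty() && !pending.empty()) {
            process_one();
        }
    }

    void wait_for_normal_state_handled_on_boot() {
        // True once every live endpoint has had its NORMAL state handled.
        auto all_handled = [this] {
            return std::includes(normal_handled.begin(), normal_handled.end(),
                                 live.begin(), live.end());
        };
        while (!all_handled() && !pending.empty()) {
            process_one();
        }
    }

    void setup_group0() { trace.push_back("setup_group0"); }

    // Order after the fix: both waits complete before group 0 is set up,
    // so connection drops cannot interrupt a group 0 snapshot transfer.
    void boot() {
        wait_for_live_node_to_show_up();
        wait_for_normal_state_handled_on_boot();
        setup_group0();
    }
};
```

In this model, any `drop_connections` entry always appears in `trace` before `setup_group0`, and in raft-topology mode the endpoint is still recorded in `normal_handled` even though no connections are dropped.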