Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
gossiper: call
on_remove
subscriptions in the foreground in `remove…
…_endpoint` `gossiper::remove_endpoint` performs `on_remove` callbacks for all endpoint change subscribers. This was done in the background (with a discarded future) due the following reason: ``` // We can not run on_remove callbacks here because on_remove in // storage_service might take the gossiper::timer_callback_lock ``` however, `gossiper::timer_callback_lock` no longer exists, it was removed in 19e8c14. Today it is safe to perform the `storage_service::on_remove` callback in the foreground -- it's only taking the token metadata lock, which is also taken and then released earlier by the same fiber that calls `remove_endpoint` (i.e. `storage_service::handle_state_normal`). Furthermore, we want to perform it in the foreground. First, there already was a comment saying: ``` // do subscribers first so anything in the subscriber that depends on gossiper state won't get confused ``` it's not too precise, but it argues that subscriber callbacks should be serialized with the rest of `remove_endpoint`, not done concurrently with it. Second, we now have a concrete reason to do them in the foreground. In issue #14646 we observed that the subcriber callbacks are racing with the bootstrap procedure. Depending on scheduling order, if `storage_service::on_remove` is called too late, a bootstrapping node may try to wait for a node that was earlier replaced to become UP which is incorrect. By putting the `on_remove` call into foreground of `remove_endpoint`, we ensure that a node that was replaced earlier will not be included in the set of nodes that the bootstrapping node waits for (because `storage_service::on_remove` will clear it from `token_metadata` which we use to calculate this set of nodes). We also get rid of an unnecessary `seastar::async` call. Fixes #14646 Closes #14741
- Loading branch information