Skip to content

Commit

Permalink
gossiper: call on_remove subscriptions in the foreground in `remove…
Browse files Browse the repository at this point in the history
…_endpoint`

`gossiper::remove_endpoint` performs `on_remove` callbacks for all
endpoint change subscribers. This was done in the background (with a
discarded future) due the following reason:
```
    // We can not run on_remove callbacks here because on_remove in
    // storage_service might take the gossiper::timer_callback_lock
```
however, `gossiper::timer_callback_lock` no longer exists, it was
removed in 19e8c14.

Today it is safe to perform the `storage_service::on_remove` callback in
the foreground -- it's only taking the token metadata lock, which is
also taken and then released earlier by the same fiber that calls
`remove_endpoint` (i.e.  `storage_service::handle_state_normal`).

Furthermore, we want to perform it in the foreground. First, there
already was a comment saying:
```
     // do subscribers first so anything in the subscriber that depends on gossiper state won't get confused
```
it's not too precise, but it argues that subscriber callbacks should be
serialized with the rest of `remove_endpoint`, not done concurrently
with it.

Second, we now have a concrete reason to do them in the foreground. In
issue #14646 we observed that the subcriber callbacks are racing with
the bootstrap procedure. Depending on scheduling order, if
`storage_service::on_remove` is called too late, a bootstrapping node
may try to wait for a node that was earlier replaced to become UP which
is incorrect. By putting the `on_remove` call into foreground of
`remove_endpoint`, we ensure that a node that was replaced earlier will
not be included in the set of nodes that the bootstrapping node waits
for (because `storage_service::on_remove` will clear it from
`token_metadata` which we use to calculate this set of nodes).

We also get rid of an unnecessary `seastar::async` call.

Fixes #14646

Closes #14741
  • Loading branch information
kbr-scylla authored and tgrabiec committed Jul 18, 2023
1 parent 8bc42f5 commit bfaac51
Showing 1 changed file with 6 additions and 8 deletions.
14 changes: 6 additions & 8 deletions gms/gossiper.cc
Expand Up @@ -649,15 +649,13 @@ future<> gossiper::force_remove_endpoint(inet_address endpoint) {

future<> gossiper::remove_endpoint(inet_address endpoint) {
// do subscribers first so anything in the subscriber that depends on gossiper state won't get confused
// We can not run on_remove callbacks here becasue on_remove in
// storage_service might take the gossiper::timer_callback_lock
(void)seastar::async([this, endpoint] {
_subscribers.for_each([endpoint] (shared_ptr<i_endpoint_state_change_subscriber> subscriber) {
try {
co_await _subscribers.for_each([endpoint] (shared_ptr<i_endpoint_state_change_subscriber> subscriber) {
return subscriber->on_remove(endpoint);
}).get();
}).handle_exception([] (auto ep) {
logger.warn("Fail to call on_remove callback: {}", ep);
});
});
} catch (...) {
logger.warn("Fail to call on_remove callback: {}", std::current_exception());
}

if(_seeds.contains(endpoint)) {
build_seeds_list();
Expand Down

0 comments on commit bfaac51

Please sign in to comment.