Skip to content

Commit

Permalink
message: messaging_service: fix topology_ignored for pending endpoint…
Browse files Browse the repository at this point in the history
…s in get_rpc_client

`get_rpc_client` calculates a `topology_ignored` field when creating a
client which says whether the client's endpoint had topology information
when topology was created. This is later used to check if that client
needs to be dropped and replaced with a new client which uses the
correct topology information.

The `topology_ignored` field was incorrectly calculated as `true` for
pending endpoints even though we had topology information for them. This
would lead to unnecessary drops of RPC clients later. Fix this.

Remove the default parameter for `with_pending` from
`topology::has_endpoint` to avoid similar bugs in the future.

Apparently this fixes scylladb#11780. The verbs used by decommission operation
use RPC client index 1 (see `do_get_rpc_client_idx` in
message/messaging_service.cc). From local testing with additional
logging I found that by the time this client is created (i.e. the first
verb in this group is used), we already know the topology. The node is
pending at that point - hence the bug would cause us to assume we don't
know the topology, leading us to dropping the RPC client later, possibly
in the middle of a decommission operation.

Fixes: scylladb#11780
  • Loading branch information
kbr-scylla committed Nov 16, 2022
1 parent 840be34 commit 1bd2471
Show file tree
Hide file tree
Showing 3 changed files with 7 additions and 4 deletions.
2 changes: 1 addition & 1 deletion locator/token_metadata.cc
Expand Up @@ -562,7 +562,7 @@ const std::unordered_map<inet_address, host_id>& token_metadata_impl::get_endpoi
}

bool token_metadata_impl::is_member(inet_address endpoint) const {
return _topology.has_endpoint(endpoint);
return _topology.has_endpoint(endpoint, topology::pending::no);
}

void token_metadata_impl::add_bootstrap_token(token t, inet_address endpoint) {
Expand Down
5 changes: 3 additions & 2 deletions locator/token_metadata.hh
Expand Up @@ -67,9 +67,10 @@ public:
void remove_endpoint(inet_address ep);

/**
* Returns true iff contains given endpoint
* Returns true iff contains given endpoint.
* Excludes pending endpoints if `with_pending == pending::no`.
*/
bool has_endpoint(inet_address, pending with_pending = pending::no) const;
bool has_endpoint(inet_address, pending with_pending) const;

const std::unordered_map<sstring,
std::unordered_set<inet_address>>&
Expand Down
4 changes: 3 additions & 1 deletion message/messaging_service.cc
Expand Up @@ -731,7 +731,9 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
std::optional<bool> topology_status;
auto has_topology = [&] {
if (!topology_status.has_value()) {
topology_status = _token_metadata ? _token_metadata->get()->get_topology().has_endpoint(id.addr) : false;
topology_status = _token_metadata
? _token_metadata->get()->get_topology().has_endpoint(id.addr, locator::topology::pending::yes)
: false;
}
return *topology_status;
};
Expand Down

0 comments on commit 1bd2471

Please sign in to comment.