token_metadata: switch to host_id #15903
Conversation
locator/token_metadata.cc (Outdated)

```diff
@@ -95,9 +95,9 @@ class token_metadata_impl final {
     struct shallow_copy {};
 public:
     token_metadata_impl(shallow_copy, const token_metadata_impl& o) noexcept
-        : _topology(topology::config{})
+        : _topology(topology::config{}, topology::key_kind::inet_address)
```
Please add contents to the commit message (`locator/topology: add key_kind parameter`) about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id, or is it meant to stay?
> Please add contents to the commit message

Sure, I'll add the proper commit messages later; I just skipped them for this early, highly experimental stage.

> about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id or is it meant to stay?

It's solely for the transition. When this PR is finished the key kind will be `key_kind::host_id` for everything and I'll just drop it.
locator/topology.hh (Outdated)

```diff
@@ -167,7 +172,7 @@ public:

     bool operator==(const config&) const = default;
 };
-    topology(config cfg);
+    topology(config cfg, key_kind k);
```
why not make it part of `topology::config`?
it's temporary anyway
locator/token_metadata.cc (Outdated)

```diff
@@ -237,7 +237,7 @@ class token_metadata_impl final {
     static range<dht::token> interval_to_range(boost::icl::interval<token>::interval_type i);

 public:
-    future<> update_topology_change_info(dc_rack_fn& get_dc_rack);
+    future<> update_topology_change_info(dc_rack_fn<gms::inet_address>& get_dc_rack);
```
mental note: consider passing a `variant<gms::inet_address, locator::host_id>` as an alternative. I'm not sure if that would be simpler overall, but it seems more straightforward. Another alternative is a class instead of a function, with `get_dc_rack(gms::inet_address)` and `get_dc_rack(locator::host_id)` methods.
The check `*ep == endpoint` is needed when a node changes its IP: on_change can be called by the gossiper for the old IP as part of its removal, after handle_state_normal has already been called for the new one. Without the check, the do_update_system_peers_table call overwrites the IP back to its old value. Previously token_metadata used the endpoint as the key, and the `*ep == endpoint` condition followed from the is_normal_token_owner check. Now with host_id-s we have an additional layer of indirection, and we need the explicit `*ep == endpoint` check to get the same end condition. This case was revealed by the dtest update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip.
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map method was used only here. It's inlined in this commit since it's too specific and incurs the overhead of creating an intermediate map.
In this commit we change the return type of storage_service::get_token_metadata_ptr() to token_metadata2_ptr and fix whatever breaks. All the boost and topology tests pass with this change.
In this commit we replace token_metadata with token_metadata2 in the erm interface and field types. To accommodate the change, some of the strategy-related methods are also updated. All the boost and topology tests pass with this change.
database::get_token_metadata() is switched to token_metadata2. A get_all_ips method is added to the host_id-based token_metadata, since it's convenient and will be used in several places. It returns all current nodes converted to inet_address by means of the topology contained within token_metadata. hint_sender::can_send: if the node has already left the cluster we may not find its host_id. This case is handled in the same way as if it's not a normal token owner - we simply send a hint to all replicas.
Replace token_metadata2 -> token_metadata, make token_metadata non-template again. No behavior changes, just compilation fixes.
…self This used to work before in the replace-with-same-ip scenario, but with host_id-s it's no longer relevant. base_token_metadata has been removed from topology_change_info because the conditions needed for its creation are no longer met.
Make host_id parameter non-optional and move it to the beginning of the arguments list. Delete unused overloads of add_or_update_endpoint. Delete unused overload of token_metadata::update_topology with inet_address argument.
The overload was used only in tests.
The cover letter has been updated. Queued.
Idea: formalize the release notes tag, and use it to generate a skeleton for the release notes.
…the node if it's already removed This is a regression after #15903. Before these changes del_leaving_endpoint took an IP as a parameter and did nothing if it was called with a non-existent IP. The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was flaky since in most cases the node died before the gossiper notification reached all the other nodes. To make it fail consistently and reproduce the problem, one can move the info log 'Announcing that I have' after the sleep and add an additional sleep after it in the storage_service::leave_ring function. Fixes #16466 Closes #16508
The HOST_ID has been written to system.peers pretty much since inception (see #16376 (comment) for details). However, it is written to the table using an individual CQL query, so it is not set atomically with the other columns. If scylla crashes or even hits an exception before updating the host_id, system.peers might be left in an inconsistent state, in particular without a HOST_ID value. This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes. On the read side, skip rows that have no HOST_ID in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted. With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id. Refs #15903 Closes #16376

* github.com:scylladb/scylladb:
  - gms: endpoint_state: change application_state_map to std::unordered_map
  - system_keyspace: update_peer_info: drop single-column overloads
  - storage_service: drop do_update_system_peers_table
  - storage_service: on_change: fixup indentation
  - endpoint_state subscriptions: batch on_change notification
  - everywhere: drop before_change subscription
  - system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
  - system_keyspace: drop update_tokens(endpoint, tokens) overload
  - storage_service: seal peer info with host_id
  - storage_service: update_peer_info: pass peer_info to sys_ks
  - gms: endpoint_state: define application_state_map
  - system_keyspace: update_peer_info: use struct peer_info for all optional values
  - query_processor: execute_internal: support unset values
  - types: add data_value_list
  - system_keyspace: get rid of update_cached_values
  - storage_service: do not update peer info for this node
In this PR we refactor token_metadata to use locator::host_id instead of gms::inet_address for node identification in its internal data structures. The main motivation for these changes is to make the raft state machine deterministic. The use of IPs is a problem since they are distributed through the gossiper and can't be used reliably. One specific scenario is outlined in this comment: storage_service::topology_state_load can't resolve host_id to IP when we are applying old raft log entries containing host_id-s of long-gone nodes.

The refactoring is structured as follows:

1. Turn token_metadata into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a get_new() method, which can be used to access the new version.
2. Switch the readers to get_new(). When this stage is finished we can use any version of the token_metadata for reading.
3. Find the writers of token_metadata and switch them to the new version.
4. Make the host_id-based token_metadata the default, drop the inet_address-based version, change token_metadata back to non-template.

Release notes
This series depends on the sender host_id being present in RPC client_info for the bootstrap and replace node_ops commands. This feature was added in this commit and released in 5.4. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to 5.4 (or the corresponding Enterprise version) and then to the version with these changes (5.5 or 6.0), should be fine. If for some reason they upgrade from a version without host_id in RPC client_info to the version with these changes, and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with the error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade to version 5.4 or later first, or start bootstrap/replace from an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started in the middle of an upgrade from any node.