token_metadata: switch to host_id #15903

gusev-p · 2023-11-01T07:31:39Z · edited by kbr-scylla

In this PR we refactor token_metadata to use locator::host_id instead of gms::inet_address for node identification in its internal data structures. The main motivation for these changes is to make the raft state machine deterministic. Using IPs is a problem because they are distributed through the gossiper and can't be relied upon. One specific scenario is outlined in this comment: storage_service::topology_state_load can't resolve a host_id to an IP when applying old raft log entries that contain host_ids of long-gone nodes.

The refactoring is structured as follows:

1. Turn token_metadata into a template so that it can be used with either host_id or inet_address as the node key. The inet_address-based version (the current one) provides a get_new() method, which can be used to access the new version.
2. Go over all the places that write to the old version and make the corresponding writes to the new version through get_new(). Once this stage is finished, either version of token_metadata can be used for reading.
3. Go over all the places that read token_metadata and switch them to the new version.
4. Make the host_id-based token_metadata the default, drop the inet_address-based version, and change token_metadata back to a non-template.
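The first two stages can be illustrated with a minimal C++ sketch. It is not the actual Scylla code: inet_address, host_id and token are simplified stand-ins, and generic_token_metadata, update_normal_token and get_endpoint are hypothetical names. It shows one class template keyed by the node identifier, a get_new() accessor that exposes the host_id-keyed copy, and a write path that mirrors every update into the new version.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins for gms::inet_address and locator::host_id.
struct inet_address { std::string ip; };
struct host_id { std::string uuid; };

using token = std::int64_t;

// Stage 1: a single class template, parameterized by the node key type.
template <typename NodeId>
class generic_token_metadata {
    std::map<token, NodeId> _token_to_endpoint;
    // Only the inet_address flavour owns a host_id-keyed copy (the "new" version).
    std::unique_ptr<generic_token_metadata<host_id>> _new;

public:
    generic_token_metadata() {
        if constexpr (std::is_same_v<NodeId, inet_address>) {
            _new = std::make_unique<generic_token_metadata<host_id>>();
        }
    }

    void update_normal_token(token t, NodeId node) {
        _token_to_endpoint.insert_or_assign(t, std::move(node));
    }

    std::optional<NodeId> get_endpoint(token t) const {
        auto it = _token_to_endpoint.find(t);
        if (it == _token_to_endpoint.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Access to the host_id-keyed version; only valid on the inet_address flavour.
    generic_token_metadata<host_id>& get_new() {
        static_assert(std::is_same_v<NodeId, inet_address>,
                      "only the inet_address-based version owns a new version");
        return *_new;
    }
};

using token_metadata = generic_token_metadata<inet_address>;  // old, IP-keyed
using token_metadata2 = generic_token_metadata<host_id>;      // new, host_id-keyed

// Stage 2: every write to the old version is mirrored into the new one,
// after which either version can serve reads.
void update_normal_token(token_metadata& tm, token t, const inet_address& ip, const host_id& id) {
    tm.update_normal_token(t, ip);
    tm.get_new().update_normal_token(t, id);
}
```

In this model, stage 3 amounts to replacing reads such as tm.get_endpoint(t) with tm.get_new().get_endpoint(t), and stage 4 to deleting the inet_address instantiation.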

Release notes

This series depends on the RPC sender's host_id being present in the RPC client_info for the bootstrap and replace node_ops commands. This feature was added in this commit and released in 5.4. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to 5.4 (or the corresponding Enterprise version) and then to the version with these changes (5.5 or 6.0), should be fine. If for some reason they upgrade directly from a version without host_id in the RPC client_info to the version with these changes, and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with the error "Coordinator host_id not found" if some nodes are already upgraded and the node that started the node_ops command is not yet upgraded. In this case the user can either finish the upgrade to version 5.4 or later first, or start bootstrap/replace from an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started from any node in the middle of the upgrade.
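Which commands are affected can be sketched as follows. This is a hypothetical model, not Scylla's RPC or node_ops code: rpc_client_info, sender_host_id and node_ops_cmd are illustrative names, and the coordinator_host_id function only borrows its name from the later commit prefix; its signature is an assumption. Only the error text comes from the notes above. The sketch assumes, as in the later revision that pushes the check/throw down to the concrete commands, that only bootstrap and replace fail when a not-yet-upgraded coordinator sends no host_id.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical, simplified types; not the real RPC or node_ops API.
struct host_id { std::string uuid; };

struct rpc_client_info {
    // Filled in only by senders that already include the 5.4 change.
    std::optional<host_id> sender_host_id;
};

enum class node_ops_cmd { bootstrap, replace, removenode, decommission };

// Only bootstrap and replace need to know which node coordinates them.
bool needs_coordinator_host_id(node_ops_cmd cmd) {
    return cmd == node_ops_cmd::bootstrap || cmd == node_ops_cmd::replace;
}

// The check is done per command, so commands that do not need the
// coordinator host_id keep working during a mixed-version upgrade.
std::optional<host_id> coordinator_host_id(node_ops_cmd cmd, const rpc_client_info& info) {
    if (!needs_coordinator_host_id(cmd)) {
        return std::nullopt;
    }
    if (!info.sender_host_id) {
        throw std::runtime_error("Coordinator host_id not found");
    }
    return info.sender_host_id;
}
```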

locator/abstract_replication_strategy.cc

bhalevy · 2023-11-01T08:07:32Z

locator/token_metadata.cc

@@ -95,9 +95,9 @@ class token_metadata_impl final {
    struct shallow_copy {};
 public:
    token_metadata_impl(shallow_copy, const token_metadata_impl& o) noexcept
-        : _topology(topology::config{})
+        : _topology(topology::config{}, topology::key_kind::inet_address)


Please add contents to the commit message (locator/topology: add key_kind parameter) about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id or is it meant to stay?

> Please add contents to the commit message

Sure, I'll add proper commit messages later; I just skipped them at this early, highly experimental stage.

> about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id or is it meant to stay?

It's solely for the transition. When this PR is finished, the key kind will be key_kind::host_id for everything, and I'll just drop it.

bhalevy · 2023-11-01T08:08:13Z

locator/topology.hh

@@ -167,7 +172,7 @@ public:

        bool operator==(const config&) const = default;
    };
-    topology(config cfg);
+    topology(config cfg, key_kind k);


why not make it part of topology::config?

it's temporary anyway
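One reading of the exchange above is that key_kind stays a separate constructor argument so that topology::config only carries settings meant to outlive the transition. The sketch below illustrates that shape under that assumption; node, add_or_update_node, find_node and is_local are hypothetical, and both identifiers are plain strings for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified node record; both identifiers are plain strings here.
struct node {
    std::string host_id;
    std::string ip;
    std::string dc;
};

class topology {
public:
    // Transitional only: selects which identifier keys the node index.
    enum class key_kind { inet_address, host_id };

    // Settings meant to outlive the transition stay in config.
    struct config {
        std::string local_dc;
    };

    topology(config cfg, key_kind k) : _cfg(std::move(cfg)), _kind(k) {}

    void add_or_update_node(const node& n) { _nodes[key_of(n)] = n; }

    const node* find_node(const std::string& key) const {
        auto it = _nodes.find(key);
        return it == _nodes.end() ? nullptr : &it->second;
    }

    bool is_local(const std::string& key) const {
        const node* n = find_node(key);
        return n != nullptr && n->dc == _cfg.local_dc;
    }

private:
    std::string key_of(const node& n) const {
        return _kind == key_kind::host_id ? n.host_id : n.ip;
    }

    config _cfg;
    key_kind _kind;
    std::map<std::string, node> _nodes;
};
```

Once everything is keyed by host_id, dropping the parameter leaves config untouched.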

bhalevy · 2023-11-02T09:06:02Z

locator/token_metadata.cc

@@ -237,7 +237,7 @@ class token_metadata_impl final {
    static range<dht::token> interval_to_range(boost::icl::interval<token>::interval_type i);

 public:
-    future<> update_topology_change_info(dc_rack_fn& get_dc_rack);
+    future<> update_topology_change_info(dc_rack_fn<gms::inet_address>& get_dc_rack);


mental note: consider passing a variant<gms::inet_address, locator::host_id> as an alternative.
I'm not sure if that would be simpler overall, but it seems more straightforward.
Another alternative is a class instead of a function, with get_dc_rack(gms::inet_address) and get_dc_rack(locator::host_id) methods.
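The two alternatives from the mental note can be compared in a short sketch. All types here are simplified stand-ins, and node_key, dc_rack_resolver, map_resolver and make_dc_rack_fn are hypothetical names; the dc_rack_fn alias below is not the real Scylla declaration. std::visit bridges the two shapes, which suggests either one could replace the templated callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, simplified stand-ins.
struct inet_address { std::string ip; };
struct host_id { std::string uuid; };
struct endpoint_dc_rack { std::string dc; std::string rack; };

// Alternative 1: a single non-template callback taking either identifier.
using node_key = std::variant<inet_address, host_id>;
using dc_rack_fn = std::function<std::optional<endpoint_dc_rack>(const node_key&)>;

// Alternative 2: a class with one get_dc_rack overload per identifier type.
class dc_rack_resolver {
public:
    virtual ~dc_rack_resolver() = default;
    virtual std::optional<endpoint_dc_rack> get_dc_rack(const inet_address&) const = 0;
    virtual std::optional<endpoint_dc_rack> get_dc_rack(const host_id&) const = 0;
};

// A resolver backed by two lookup tables.
class map_resolver : public dc_rack_resolver {
    std::map<std::string, endpoint_dc_rack> _by_ip;
    std::map<std::string, endpoint_dc_rack> _by_id;

    static std::optional<endpoint_dc_rack> lookup(const std::map<std::string, endpoint_dc_rack>& m,
                                                  const std::string& key) {
        auto it = m.find(key);
        if (it == m.end()) {
            return std::nullopt;
        }
        return it->second;
    }

public:
    void add(const inet_address& ip, const host_id& id, endpoint_dc_rack dr) {
        _by_ip[ip.ip] = dr;
        _by_id[id.uuid] = dr;
    }
    std::optional<endpoint_dc_rack> get_dc_rack(const inet_address& ip) const override {
        return lookup(_by_ip, ip.ip);
    }
    std::optional<endpoint_dc_rack> get_dc_rack(const host_id& id) const override {
        return lookup(_by_id, id.uuid);
    }
};

// Adapts alternative 2 to alternative 1: std::visit dispatches the variant
// to the matching overload. The resolver must outlive the returned function.
dc_rack_fn make_dc_rack_fn(const dc_rack_resolver& r) {
    return [&r](const node_key& key) {
        return std::visit([&r](const auto& k) { return r.get_dc_rack(k); }, key);
    };
}
```

The variant keeps a single non-template callback type; the class makes each overload visible in the interface at the cost of a small class hierarchy.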

scylladb-promoter · 2023-11-02T10:39:33Z

🔴 CI State: FAILURE

❌ - Build

Build Details:

Build URL: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4487/
Duration: 36 min
Builder: spider7.cloudius-systems.com

scylladb-promoter · 2023-11-07T14:26:33Z

🔴 CI State: FAILURE

❌ - Build

Build Details:

Build URL: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4576/
Duration: 39 min
Builder: spider1.cloudius-systems.com

scylladb-promoter · 2023-11-07T16:27:21Z

🔴 CI State: FAILURE

❌ - Build

Build Details:

Build URL: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4582/
Duration: 44 min
Builder: spider3.cloudius-systems.com

scylladb-promoter · 2023-11-08T06:47:57Z

🔴 CI State: ABORTED

✅ - Build
❌ - Unit Tests
❌ - Sanity Tests

Failed Tests (871/21438):

Build Details:

Build URL: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4589/
Duration: 5 hr 22 min
Builder: spider5.cloudius-systems.com

gusev-p · 2023-11-21T07:51:25Z

new version:

rebased on master
fixed the tests (no big deal, just typos here and there)

The check *ep == endpoint is needed when a node changes its IP: the gossiper can call on_change for the old IP as part of its removal, after handle_state_normal has already been called for the new one. Without the check, the do_update_system_peers_table call overwrites the IP back to its old value. Previously token_metadata used the endpoint as the key, so the *ep == endpoint condition followed from the is_normal_token_owner check. Now, with host_ids, there is an additional layer of indirection, and we need an explicit *ep == endpoint check to get the same end condition. This case was revealed by the dtest update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip.
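The scenario can be reproduced in a simplified model. cluster_view, get_endpoint_for_host_id and this on_change signature are hypothetical; the real code lives in storage_service and also checks is_normal_token_owner, which the sketch approximates by the host_id's presence in endpoint_for_host.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified state; identifiers are plain strings.
struct cluster_view {
    // host_id -> IP, as recorded in the host_id-based token_metadata.
    std::map<std::string, std::string> endpoint_for_host;
    // Stand-in for system.peers: host_id -> IP last written for that peer.
    std::map<std::string, std::string> peers_table;
};

std::optional<std::string> get_endpoint_for_host_id(const cluster_view& v, const std::string& id) {
    auto it = v.endpoint_for_host.find(id);
    if (it == v.endpoint_for_host.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Called when gossip reports a change for `endpoint`, whose state carries `id`.
void on_change(cluster_view& v, const std::string& endpoint, const std::string& id) {
    auto ep = get_endpoint_for_host_id(v, id);
    // A late notification for a node's old IP still resolves to the same
    // host_id; the *ep == endpoint check stops it from writing the old IP back.
    if (ep && *ep == endpoint) {
        v.peers_table[id] = endpoint;
    }
}
```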


database::get_token_metadata() is switched to token_metadata2. A get_all_ips method is added to the host_id-based token_metadata, since it's convenient and will be used in several places. It returns all current nodes converted to inet_address by means of the topology contained within token_metadata. hint_sender::can_send: if the node has already left the cluster, we may not find its host_id. This case is handled in the same way as if it's not a normal token owner: we simply send a hint to all replicas.
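A simplified model of get_all_ips and of the can_send fallback. token_metadata_sketch, choose_hint_target and the ip_to_host map are hypothetical; the sketch also assumes that "all current nodes" means the normal token owners, which may differ from the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using token = std::int64_t;

// Hypothetical, simplified host_id-based token_metadata.
struct token_metadata_sketch {
    std::map<token, std::string> token_to_host;              // token -> host_id
    std::map<std::string, std::string> topology_host_to_ip;  // host_id -> IP

    bool is_normal_token_owner(const std::string& id) const {
        for (const auto& entry : token_to_host) {
            if (entry.second == id) {
                return true;
            }
        }
        return false;
    }

    // Converts every token owner's host_id to its IP through the topology.
    std::set<std::string> get_all_ips() const {
        std::set<std::string> ips;
        for (const auto& entry : token_to_host) {
            ips.insert(topology_host_to_ip.at(entry.second));
        }
        return ips;
    }
};

enum class hint_target { original_endpoint, all_replicas };

// If the target's host_id can't be found (e.g. it already left the cluster),
// take the same path as for a non-owner: send the hint to all replicas.
hint_target choose_hint_target(const token_metadata_sketch& tm,
                               const std::map<std::string, std::string>& ip_to_host,
                               const std::string& ip) {
    auto it = ip_to_host.find(ip);
    if (it == ip_to_host.end() || !tm.is_normal_token_owner(it->second)) {
        return hint_target::all_replicas;
    }
    return hint_target::original_endpoint;
}
```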


…self This used to work before in the replace-with-same-ip scenario, but with host_ids it's no longer relevant. base_token_metadata has been removed from topology_change_info because the conditions needed for its creation are no longer met.


gusev-p · 2023-12-12T19:33:46Z

new version:

coordinator_host_id: push the check/throw down to the concrete commands (https://github.com/scylladb/scylladb/compare/e92af35e4264f9d727c45e0032c7ee23f5c69942..9d93a518ac3c0d57aaa8eede33b0f3652e3ca690)

scylladb-promoter · 2023-12-12T23:50:28Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests
✅ - dtest

Build Details:

Duration: 4 hr 19 min
Builder: spider4.cloudius-systems.com

kbr-scylla · 2023-12-13T15:36:17Z

The cover letter has been updated.

Queued.

avikivity · 2023-12-13T15:39:09Z

Idea: formalize the release notes tag, and use it to generate a skeleton for the release notes.


…the node if it's already removed This is a regression after #15903. Before these changes, del_leaving_endpoint took an IP as a parameter and did nothing if it was called with a non-existent IP. The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was flaky because in most cases the node died before the gossiper notification reached all the other nodes. To make it fail consistently and reproduce the problem, one can move the info log 'Announcing that I have' after the sleep and add an additional sleep after it in the storage_service::leave_ring function. Fixes #16466 Closes #16508
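The shape of the fix can be sketched as a tolerant lookup in the rollback path. All names here (token_metadata_sketch, get_host_id, find_host_id, rollback_decommission) are hypothetical; the sketch only illustrates that a node already removed from the cluster should make the rollback a no-op rather than an error, matching the old IP-keyed behaviour.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, simplified state; identifiers are plain strings.
struct token_metadata_sketch {
    std::map<std::string, std::string> ip_to_host;  // IP -> host_id
    std::set<std::string> leaving_endpoints;        // host_ids
};

// A strict lookup: throws if the IP no longer maps to any host_id.
std::string get_host_id(const token_metadata_sketch& tm, const std::string& ip) {
    auto it = tm.ip_to_host.find(ip);
    if (it == tm.ip_to_host.end()) {
        throw std::runtime_error("host_id not found for " + ip);
    }
    return it->second;
}

// A tolerant lookup for paths where the node may already be gone.
std::optional<std::string> find_host_id(const token_metadata_sketch& tm, const std::string& ip) {
    auto it = tm.ip_to_host.find(ip);
    if (it == tm.ip_to_host.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Decommission rollback: does nothing for a node that is no longer known,
// instead of failing on the lookup.
void rollback_decommission(token_metadata_sketch& tm, const std::string& ip) {
    if (auto id = find_host_id(tm, ip)) {
        tm.leaving_endpoints.erase(*id);
    }
}
```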

HOST_ID has been written to system.peers pretty much since inception (see #16376 (comment) for details). However, it is written to the table using an individual CQL query, so it is not set atomically with the other columns. If scylla crashes or even hits an exception before updating the host_id, system.peers might be left in an inconsistent state, in particular without a HOST_ID value. This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes. On the read side, rows that have no HOST_ID in system.peers are skipped, on the assumption that they are incomplete (i.e. scylla got an exception or crashed while writing them) and so can't be trusted. With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id. Refs #15903 Closes #16376

* github.com:scylladb/scylladb:
  gms: endpoint_state: change application_state_map to std::unordered_map
  system_keyspace: update_peer_info: drop single-column overloads
  storage_service: drop do_update_system_peers_table
  storage_service: on_change: fixup indentation
  endpoint_state subscriptions: batch on_change notification
  everywhere: drop before_change subscription
  system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
  system_keyspace: drop update_tokens(endpoint, tokens) overload
  storage_service: seal peer info with host_id
  storage_service: update_peer_info: pass peer_info to sys_ks
  gms: endpoint_state: define application_state_map
  system_keyspace: update_peer_info: use struct peer_info for all optional values
  query_processor: execute_internal: support unset values
  types: add data_value_list
  system_keyspace: get rid of update_cached_values
  storage_service: do not update peer info for this node
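The write-side and read-side rules can be modelled in memory. peer_row, peers_table, upsert_peer and load_peers are hypothetical, and the in-memory update merely stands in for the single CQL BATCH; nullopt fields play the role of the unset values that the series adds support for in execute_internal.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified system.peers row keyed by the peer's IP.
struct peer_row {
    std::string peer;
    std::optional<std::string> host_id;
    std::optional<std::string> data_center;
    std::optional<std::string> rack;
};

using peers_table = std::map<std::string, peer_row>;

// Write side: all columns of one update are applied together, as a single
// BATCH would do. Unset (nullopt) fields leave existing values untouched.
void upsert_peer(peers_table& table, const peer_row& update) {
    peer_row& row = table[update.peer];
    row.peer = update.peer;
    if (update.host_id) { row.host_id = update.host_id; }
    if (update.data_center) { row.data_center = update.data_center; }
    if (update.rack) { row.rack = update.rack; }
}

// Read side: a row without host_id is treated as incomplete and skipped.
std::vector<peer_row> load_peers(const peers_table& table) {
    std::vector<peer_row> result;
    for (const auto& entry : table) {
        if (entry.second.host_id) {
            result.push_back(entry.second);
        }
    }
    return result;
}
```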


gusev-p requested review from tgrabiec and nyh as code owners November 1, 2023 07:31

gusev-p requested review from bhalevy and kbr-scylla and removed request for tgrabiec, nyh and kbr-scylla November 1, 2023 07:31


bhalevy reviewed Nov 1, 2023


locator/abstract_replication_strategy.cc (resolved)

bhalevy reviewed Nov 1, 2023


bhalevy reviewed Nov 2, 2023


gusev-p force-pushed the token_metadata_host_id branch from 4026819 to 9c38bcc November 2, 2023 10:02


gusev-p force-pushed the token_metadata_host_id branch from 9c38bcc to 56f7e7b November 7, 2023 13:08


gusev-p force-pushed the token_metadata_host_id branch from 0ac9f39 to ec2358a November 7, 2023 18:25


mykaul added this to the 6.0 milestone Nov 20, 2023

mykaul added area/topology changes P1 Urgent labels Nov 20, 2023

gusev-p force-pushed the token_metadata_host_id branch from ec2358a to 86c1634 November 21, 2023 07:50

Petr Gusev added 14 commits December 12, 2023 23:19

api/token_metadata: switch to new version

0e4c90d

storage_service: get_token_to_endpoint_map: use new token_metadata

f53f34f

The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map method was used only here. It's inlined in this commit since it's too specific and incurs the overhead of creating an intermediate map.
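A sketch of building the token-to-endpoint map in a single pass over normal and bootstrapping tokens, translating host_ids to IPs on the fly instead of materializing an intermediate map. token_metadata_sketch and its fields are hypothetical simplifications of the host_id-based token_metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using token = std::int64_t;

// Hypothetical, simplified stand-in for the host_id-based token_metadata.
struct token_metadata_sketch {
    std::map<token, std::string> normal_tokens;     // token -> host_id
    std::map<token, std::string> bootstrap_tokens;  // token -> host_id
    std::map<std::string, std::string> host_to_ip;  // topology: host_id -> IP
};

// Builds token -> IP directly, without an intermediate token -> host_id map.
std::map<token, std::string> get_token_to_endpoint_map(const token_metadata_sketch& tm) {
    std::map<token, std::string> result;
    for (const auto* tokens : {&tm.normal_tokens, &tm.bootstrap_tokens}) {
        for (const auto& entry : *tokens) {
            result.emplace(entry.first, tm.host_to_ip.at(entry.second));
        }
    }
    return result;
}
```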

storage_service: get_token_metadata -> token_metadata2

309e08e

In this commit we change the return type of storage_service::get_token_metadata_ptr() to token_metadata2_ptr and fix whatever breaks. All the boost and topology tests pass with this change.

erm: switch to the new token_metadata

11cc21d

In this commit we replace token_metadata with token_metadata2 in the erm interface and field types. To accommodate the change some of strategy-related methods are also updated. All the boost and topology tests pass with this change.

gossiper: use new token_metadata

c7314aa

shared_token_metadata: switch to the new token_metadata

799f747

token_metadata: drop the template

7b55ccb

Replace token_metadata2 -> token_metadata and make token_metadata a non-template again. No behavior changes, just compilation fixes.

dc_rack_fn: make it non-template

8c551f9

topology: drop key_kind, host_id is now the primary key

3b59919

token_metadata: topology: cleanup add_or_update_endpoint

fbf507b

Make host_id parameter non-optional and move it to the beginning of the arguments list. Delete unused overloads of add_or_update_endpoint. Delete unused overload of token_metadata::update_topology with inet_address argument.

topology: remove_endpoint: remove inet_address overload

9d93a51

The overload was used only in tests.

gusev-p force-pushed the token_metadata_host_id branch from e92af35 to 9d93a51 December 12, 2023 19:30

scylladb-promoter closed this in 26cbd28 Dec 13, 2023

scylladb-promoter merged commit 26cbd28 into scylladb:master Dec 13, 2023
3 checks passed

This was referenced Dec 21, 2023

storage_service: node_ops_cmd_handler: decommission rollback fix #16508

Closed

token_metadata should map tokens to hosts rather than to endpoints #12279

Closed

gusev-p mentioned this pull request Jan 7, 2024

Multiple node core dump during decommission operation of other node (conversion to host ID related) #16668

Closed


dawmd mentioned this pull request Feb 15, 2024

topology: Extend the lifetime of IP–host ID mappings until they're not needed anymore #15968

Closed
