Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

token_metadata: switch to host_id #15903

Merged
merged 51 commits into from Dec 13, 2023

Conversation

gusev-p
Copy link

@gusev-p gusev-p commented Nov 1, 2023

In this PR we refactor token_metadata to use locator::host_id instead of gms::inet_address for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined in this comment - storage_service::topology_state_load can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.

The refactoring is structured as follows:

  • Turn token_metadata into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a get_new() method, which can be used to access the new version.
  • Go over all places which write to the old version and make the corresponding writes to the new version through get_new(). When this stage is finished we can use any version of the token_metadata for reading.
  • Go over all the places which read token_metadata and switch them to the new version.
  • Make host_id-based token_metadata default, drop inet_address-based version, change token_metadata back to non-template.

Release notes

These series depends on RPC sender host_id being present in RPC clent_info for bootstrap and replace node_ops commands. This feature was added in this commit and released in 5.4. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to 5.4 (or the corresponding Enterprise version) then to the version with these changes (5.5 or 6.0) should be fine. If for some reason they upgrade from a version without host_id in RPC clent_info to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error Coordinator host_id not found if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.

@gusev-p gusev-p requested review from bhalevy and kbr-scylla and removed request for tgrabiec, nyh and kbr-scylla November 1, 2023 07:31
@github-actions github-actions bot deleted a comment from aws-amplify-us-east-2 bot Nov 1, 2023
@@ -95,9 +95,9 @@ class token_metadata_impl final {
struct shallow_copy {};
public:
token_metadata_impl(shallow_copy, const token_metadata_impl& o) noexcept
: _topology(topology::config{})
: _topology(topology::config{}, topology::key_kind::inet_address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add contents to the commit message (locator/topology: add key_kind parameter)
about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id or is it meant to stay?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add contents to the commit message

Sure, I'll add the proper commit messages later, I just skipped them for this early highly experimental stage.

about the motivation and plans for this feature - is it transient, only for the purpose of transitioning token_metadata to using host_id or is it meant to stay?

It's solely for the transitioning, when this PR is finished the key kind would be key_kind::host_id for everything and I'll just drop it.

@@ -167,7 +172,7 @@ public:

bool operator==(const config&) const = default;
};
topology(config cfg);
topology(config cfg, key_kind k);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make it part of topology::config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's temporary anyway

@@ -237,7 +237,7 @@ class token_metadata_impl final {
static range<dht::token> interval_to_range(boost::icl::interval<token>::interval_type i);

public:
future<> update_topology_change_info(dc_rack_fn& get_dc_rack);
future<> update_topology_change_info(dc_rack_fn<gms::inet_address>& get_dc_rack);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mental note: consider passing a variant<gms::inet_address, locator::host_id> as an alternative.
I'm not sure if that would be simpler overall, but it seems more straight forward.
Another alternative is a class instead of a function, with get_dc_rack(gms::inet_address) and get_dc_rack(locator::host_id) methods.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

❌ - Build

Build Details:

@github-actions github-actions bot deleted a comment from aws-amplify-us-east-2 bot Nov 7, 2023
@github-actions github-actions bot deleted a comment from aws-amplify-us-east-2 bot Nov 7, 2023
@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

❌ - Build

Build Details:

@github-actions github-actions bot deleted a comment from aws-amplify-us-east-2 bot Nov 7, 2023
@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

❌ - Build

Build Details:

@mykaul mykaul added this to the 6.0 milestone Nov 20, 2023
@gusev-p
Copy link
Author

gusev-p commented Nov 21, 2023

new version:

  • rebased on master
  • fixed the tests (no big deal, just typos here and there)

Petr Gusev added 14 commits December 12, 2023 23:19
The check *ep == endpoint is needed when a node
changes its IP - on_change can be called by the
gossiper for old IP as part of its removal, after
handle_state_normal has already been called for
the new one. Without the check, the
do_update_system_peers_table call overwrites the IP
back to its old value.

Previously token_metadata used endpoint as the key
and the *ep == endpoint condition was followed from the
is_normal_token_owner check. Now with host_id-s we have
an additional layer of indirection, and we need
*ep == endpoint check to get the same end condition.

This case was revealed by the dtest
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
In this commit we change the return type of
storage_service::get_token_metadata_ptr() to
token_metadata2_ptr and fix whatever breaks.

All the boost and topology tests pass with this change.
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of strategy-related methods are also updated.

All the boost and topology tests pass with this change.
database::get_token_metadata() is switched to token_metadata2.

get_all_ips method is added to the host_id-based token_metadata, since
its convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.

hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
…self

This used to work before in replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.

base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
Make host_id parameter non-optional and
move it to the beginning of the arguments list.

Delete unused overloads of add_or_update_endpoint.

Delete unused overload of token_metadata::update_topology
with inet_address argument.
The overload was used only in tests.
@gusev-p
Copy link
Author

gusev-p commented Dec 12, 2023

new version:

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests
✅ - dtest

Build Details:

  • Duration: 4 hr 19 min
  • Builder: spider4.cloudius-systems.com

@kbr-scylla
Copy link
Contributor

The cover letter has been updated.

Queued.

@avikivity
Copy link
Member

Idea: formalize the release notes tag, and use it to generate a skeleton for the release notes.

@scylladb-promoter scylladb-promoter merged commit 26cbd28 into scylladb:master Dec 13, 2023
3 checks passed
gusev-p pushed a commit to gusev-p/scylla that referenced this pull request Dec 21, 2023
…the node if's already removed

This is a regression after scylladb#15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.
gusev-p pushed a commit to gusev-p/scylla that referenced this pull request Dec 21, 2023
…the node if's already removed

This is a regression after scylladb#15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.

Fixes scylladb#16466
tgrabiec pushed a commit that referenced this pull request Dec 22, 2023
…the node if's already removed

This is a regression after #15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.

Fixes #16466

Closes #16508
avikivity added a commit that referenced this pull request Dec 31, 2023
The HOST_ID is already written to system.peers since inception pretty much (See #16376 (comment) for details).

However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without no HOST_ID value.

This series makes sure that HOST_ID is written to system.peers and use it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.

On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.

With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.

Refs #15903

Closes #16376

* github.com:scylladb/scylladb:
  gms: endpoint_state: change application_state_map to std::unordered_map
  system_keyspace: update_peer_info: drop single-column overloads
  storage_service: drop do_update_system_peers_table
  storage_service: on_change: fixup indentation
  endpoint_state subscriptions: batch on_change notification
  everywhere: drop before_change subscription
  system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
  system_keyspace: drop update_tokens(endpoint, tokens) overload
  storage_service: seal peer info with host_id
  storage_service: update_peer_info: pass peer_info to sys_ks
  gms: endpoint_state: define application_state_map
  system_keyspace: update_peer_info: use struct peer_info for all optional values
  query_processor: execute_internal: support unset values
  types: add data_value_list
  system_keyspace: get rid of update_cached_values
  storage_service: do not update peer info for this node
dgarcia360 pushed a commit to dgarcia360/scylla that referenced this pull request Apr 30, 2024
…the node if's already removed

This is a regression after scylladb#15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.

Fixes scylladb#16466

Closes scylladb#16508
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants