
Cannot replace a node after changing its IP address #13775

Open
bhalevy opened this issue May 4, 2023 · 16 comments · May be fixed by #14471
Comments

@bhalevy
Member

bhalevy commented May 4, 2023

This is a follow up on #13066

The following dtest reproduces the issue:

    def test_replace_after_changing_node_ip(self):
        """Replace a node after changing its IP address."""

        cluster = self.cluster
        logger.info("starting cluster")
        cluster.populate(3).start(wait_for_binary_proto=True, wait_other_notice=True)

        logger.info("stopping node3")
        node1, node2, node3 = cluster.nodelist()
        node3_host_id = node3.hostid()
        node3.stop(gently=True)

        logger.info("replace node3 address")
        old_ip3 = node3.address()
        ip3 = f'{old_ip3}3'  # e.g. 127.0.64.3 -> 127.0.64.33
        node3.set_configuration_options(values={'listen_address': ip3, 'rpc_address': ip3, 'api_address': ip3})
        node3.network_interfaces = {k: (ip3, v[1]) for k, v in node3.network_interfaces.items()}
        node3.start(wait_for_binary_proto=True, wait_other_notice=True)

        logger.info("stop node3")
        node3.stop(wait_other_notice=True)

        def is_shutdown(endpoint=old_ip3):
            found = False
            for node in [node1, node2]:
                gs = nodetool_gossipinfo(node)
                if endpoint in gs:
                    logger.debug(gs[endpoint])
                    if "shutdown" not in gs[endpoint]['STATUS']:
                        found = True
            return not found

        logger.info(f"Waiting for {old_ip3} gossip status=shutdown")
        timeout = self.cql_timeout(120)
        wait_for(is_shutdown, step=10, timeout=timeout)

        logger.info("Replace node3 with node4")
        node4 = new_node(cluster, bootstrap=True, token=None, remote_debug_port='0', data_center=None)
        node4.start(wait_for_binary_proto=True, replace_node_host_id=node3_host_id)

node4 fails to start, reporting:

INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - entering STARTING mode
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Loading persisted ring state
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - initial_contact_nodes={127.0.64.33, 127.0.64.2, 127.0.64.1}, loaded_endpoints={}, loaded_peer_features=0
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Gathering node replacement information for 289ad118-d113-4452-be68-e177823c57e5/0000:0000:0000:0000:0000:0000:0000:0000
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Checking remote features with gossip
INFO  2023-05-04 17:25:09,215 [shard 0] gossip - Gossip shadow round started with nodes={127.0.64.33, 127.0.64.2, 127.0.64.1}
WARN  2023-05-04 17:25:09,216 [shard 0] gossip - Node 127.0.64.33 is down for get_endpoint_states verb
INFO  2023-05-04 17:25:09,216 [shard 0] gossip - Gossip shadow round finished with nodes_talked={127.0.64.1, 127.0.64.2}
INFO  2023-05-04 17:25:09,216 [shard 0] gossip - Feature check passed. Local node 127.0.64.4 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, TOMBSTONE_GC_OPTIONS, 
TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2023-05-04 17:25:09,216 [shard 0] init - Shutting down group 0 service
...
ERROR 2023-05-04 17:25:09,233 [shard 0] init - Startup failed: std::runtime_error (Found multiple nodes with Host ID 289ad118-d113-4452-be68-e177823c57e5: {127.0.64.3, 127.0.64.33})
@bhalevy
Member Author

bhalevy commented May 4, 2023

I believe that the issue is pre-existing.
We need to test this against old branches to see when it was introduced (if at all; it could have been there forever).

@bhalevy
Member Author

bhalevy commented May 4, 2023

Also, we need to test this with raft topology operations enabled.

@mykaul mykaul added this to the 5.3 milestone May 8, 2023
@mykaul
Contributor

mykaul commented Jun 26, 2023

should @kostja and team own this?

@bhalevy
Member Author

bhalevy commented Jun 26, 2023

@kostja can you guys please own this issue?

@kostja kostja self-assigned this Jun 26, 2023
@kostja
Contributor

kostja commented Jun 26, 2023

@bhalevy is dtest running with consistent_cluster_management: true?

@bhalevy
Member Author

bhalevy commented Jun 26, 2023

@bhalevy is dtest running with consistent_cluster_management: true?

@kostja I ran the test above without consistent cluster management, but it would be interesting to test it with it.
Hopefully it will magically fix the problem.

@kbr-scylla
Contributor

Copying my comment from the other PR:

The exception comes from prepare_replacement_info, which is called before we even start gossiping. It uses information from the shadow round directly.

I guess it regressed when we started reusing Host ID of replaced node.

A potential fix would be to resolve the Host ID conflict inside prepare_replacement_info based on generation numbers, the same as we do when calculating token_metadata.
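The generation-number resolution described above could be sketched roughly like this (hypothetical Python illustrating the idea only; the real logic lives in Scylla's C++ `prepare_replacement_info`, and the shape of `shadow_round_states` here is an assumption):

```python
def resolve_host_id_conflicts(shadow_round_states):
    """For each host ID, keep only the endpoint whose gossip state has the
    highest generation number, i.e. the most recently (re)started instance
    of that host. Sketch of the idea from this thread, not Scylla code.

    shadow_round_states: dict mapping endpoint (IP string) to a dict with
    at least 'host_id' and 'generation' keys (assumed shape).
    """
    latest = {}  # host_id -> (generation, endpoint)
    for endpoint, state in shadow_round_states.items():
        host_id = state["host_id"]
        generation = state["generation"]
        if host_id not in latest or generation > latest[host_id][0]:
            latest[host_id] = (generation, endpoint)
    # Map each host ID to its single surviving endpoint.
    return {hid: ep for hid, (_, ep) in latest.items()}
```

The endpoint with the highest generation number is the most recently restarted instance of the host, so it is the one that should win the conflict.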

@bhalevy
Member Author

bhalevy commented Jul 7, 2023

I guess it regressed when we started reusing Host ID of replaced node.

But we stopped reusing the replaced node id.
We used to in the past, but we no longer do that.

@bhalevy
Member Author

bhalevy commented Jul 7, 2023

We do need to resolve the conflict by dropping the endpoint state of the node that changed its IP address.
We can do that safely now because we no longer reuse the host_id on replace.
This way, if two nodes have the same host_id, we know they refer to the same host, which changed its IP address.
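A minimal sketch of that rule (hypothetical Python; the `endpoint_states` map keyed by IP with `host_id`/`generation` fields is an assumption for illustration, not Scylla's actual data structure):

```python
def drop_stale_endpoints(endpoint_states):
    """When two endpoints carry the same host_id, they must be the same
    host before and after an IP change (host IDs are no longer reused on
    replace), so force-remove the endpoint state with the older
    generation. Hypothetical sketch of the rule described above.
    """
    newest = {}  # host_id -> endpoint with the highest generation so far
    for ep, st in list(endpoint_states.items()):
        hid = st["host_id"]
        if hid in newest:
            old_ep = newest[hid]
            # Keep whichever endpoint restarted more recently.
            if st["generation"] > endpoint_states[old_ep]["generation"]:
                del endpoint_states[old_ep]
                newest[hid] = ep
            else:
                del endpoint_states[ep]
        else:
            newest[hid] = ep
    return endpoint_states
```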

@kbr-scylla
Contributor

But we stopped reusing the replaced node id.
We used to in the past, but we no longer do that.

Right, sorry, brain fart.

bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 6, 2023
When a host changes its IP address, we should force-remove
the previous endpoint state, since we want only one endpoint
to refer to this host_id.

If the node that changed its IP address is decommissioned,
the previous endpoint appears as a normal token owner, just in
shutdown status, but it is no longer in the cluster.

Refs scylladb#14468
Fixes scylladb#13775

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@DoronArazii DoronArazii modified the milestones: 5.3, 5.4 Aug 6, 2023
@mykaul mykaul modified the milestones: 5.4, 6.0 Oct 29, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Jan 15, 2024
When a host changes its IP address, we should force-remove
the previous endpoint state, since we want only one endpoint
to refer to this host_id.

If the node that changed its IP address is decommissioned,
the previous endpoint appears as a normal token owner, just in
shutdown status, but it is no longer in the cluster.

Fixes scylladb#13775

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@gusev-p gusev-p self-assigned this Feb 15, 2024
@gusev-p

gusev-p commented Feb 15, 2024

With this PR, the test passes with --experimental-features consistent-topology-changes and fails without it.

@kbr-scylla
Contributor

Nice.

I'm moving this issue out of the raft-topology-required backlog then, as the issue remains in gossiper mode but is not a blocker for raft topology.

@kbr-scylla kbr-scylla assigned kbr-scylla and kostja and unassigned kostja and gusev-p Feb 15, 2024
@bhalevy
Member Author

bhalevy commented Feb 16, 2024

What about the dtest for this issue?
It is currently skipped with a require marker (see scylladb/scylla-dtest@6cdce217ca17241dbefe96d0dd128a3b37ab82e).
But we'd better run it with consistent topology changes; can you please change the marker to @pytest.mark.required_features("consistent-topology-changes")?

@kbr-scylla
Contributor

But we better run it with consistent topology changes, can you please change the marker to @pytest.mark.required_features("consistent-topology-changes")?

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

@kbr-scylla
Contributor

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

ping

@temichus
Contributor

temichus commented Mar 8, 2024

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

ping

I've created pr https://github.com/scylladb/scylla-dtest/pull/4043

@kbr-scylla kbr-scylla modified the milestones: 6.0, 6.0.1 May 27, 2024