
Cannot replace a node after changing its IP address #13775

Open
bhalevy opened this issue May 4, 2023 · 16 comments · May be fixed by #14471
Comments

@bhalevy
Member

bhalevy commented May 4, 2023

This is a follow up on #13066

The following dtest reproduces the issue:

    def test_replace_after_changing_node_ip(self):
        """Replace a node after changing its IP address."""

        cluster = self.cluster
        logger.info("starting cluster")
        cluster.populate(3).start(wait_for_binary_proto=True, wait_other_notice=True)

        logger.info("stopping node3")
        node1, node2, node3 = cluster.nodelist()
        node3_host_id = node3.hostid()
        node3.stop(gently=True)

        logger.info("replace node3 address")
        old_ip3 = node3.address()
        ip3 = f'{old_ip3}3'  # e.g. 127.0.64.3 -> 127.0.64.33
        node3.set_configuration_options(values={'listen_address': ip3, 'rpc_address': ip3, 'api_address': ip3})
        node3.network_interfaces = {k: (ip3, v[1]) for k, v in node3.network_interfaces.items()}
        node3.start(wait_for_binary_proto=True, wait_other_notice=True)

        logger.info("stop node3")
        node3.stop(wait_other_notice=True)

        def is_shutdown(endpoint=old_ip3):
            found = False
            for node in [node1, node2]:
                gs = nodetool_gossipinfo(node)
                if endpoint in gs:
                    logger.debug(gs[endpoint])
                    if "shutdown" not in gs[endpoint]['STATUS']:
                        found = True
            return not found

        logger.info(f"Waiting for {old_ip3} gossip status=shutdown")
        timeout = self.cql_timeout(120)
        wait_for(is_shutdown, step=10, timeout=timeout)

        logger.info("Replace node3 with node4")
        node4 = new_node(cluster, bootstrap=True, token=None, remote_debug_port='0', data_center=None)
        node4.start(wait_for_binary_proto=True, replace_node_host_id=node3_host_id)

node4 fails to start, reporting:

INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - entering STARTING mode
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Loading persisted ring state
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - initial_contact_nodes={127.0.64.33, 127.0.64.2, 127.0.64.1}, loaded_endpoints={}, loaded_peer_features=0
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Gathering node replacement information for 289ad118-d113-4452-be68-e177823c57e5/0000:0000:0000:0000:0000:0000:0000:0000
INFO  2023-05-04 17:25:09,215 [shard 0] storage_service - Checking remote features with gossip
INFO  2023-05-04 17:25:09,215 [shard 0] gossip - Gossip shadow round started with nodes={127.0.64.33, 127.0.64.2, 127.0.64.1}
WARN  2023-05-04 17:25:09,216 [shard 0] gossip - Node 127.0.64.33 is down for get_endpoint_states verb
INFO  2023-05-04 17:25:09,216 [shard 0] gossip - Gossip shadow round finished with nodes_talked={127.0.64.1, 127.0.64.2}
INFO  2023-05-04 17:25:09,216 [shard 0] gossip - Feature check passed. Local node 127.0.64.4 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, TOMBSTONE_GC_OPTIONS, 
TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2023-05-04 17:25:09,216 [shard 0] init - Shutting down group 0 service
...
ERROR 2023-05-04 17:25:09,233 [shard 0] init - Startup failed: std::runtime_error (Found multiple nodes with Host ID 289ad118-d113-4452-be68-e177823c57e5: {127.0.64.3, 127.0.64.33})
@bhalevy
Member Author

bhalevy commented May 4, 2023

I believe that the issue is pre-existing.
We need to test this against old branches to see when it was introduced (if at all; it could have been there forever).

@bhalevy
Member Author

bhalevy commented May 4, 2023

Also, we need to test this with raft topology operations enabled.

@mykaul mykaul added this to the 5.3 milestone May 8, 2023
@mykaul
Contributor

mykaul commented Jun 26, 2023

should @kostja and team own this?

@bhalevy
Member Author

bhalevy commented Jun 26, 2023

@kostja can you guys please own this issue?

@kostja kostja self-assigned this Jun 26, 2023
@kostja
Contributor

kostja commented Jun 26, 2023

@bhalevy is dtest running with consistent_cluster_management: true?

@bhalevy
Member Author

bhalevy commented Jun 26, 2023

@bhalevy is dtest running with consistent_cluster_management: true?

@kostja I ran the test above without consistent cluster management, but it would be interesting to test it with it.
Hopefully it will magically fix the problem.

@kbr-scylla
Contributor

Copying my comment from the other PR:

The exception comes from prepare_replacement_info, which is called before we even start gossiping. It uses information from the shadow round directly.

I guess it regressed when we started reusing Host ID of replaced node.

A potential fix would be to resolve the Host ID conflict inside prepare_replacement_info based on generation numbers, the same as we do when calculating token_metadata.
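The generation-number resolution described above could be sketched roughly like this (hypothetical Python illustrating the idea only; the real logic lives in Scylla's C++ `prepare_replacement_info`, and the shape of `shadow_round_states` here is an assumption):

```python
def resolve_host_id_conflicts(shadow_round_states):
    """For each host ID, keep only the endpoint whose gossip state has the
    highest generation number, i.e. the most recently (re)started instance
    of that host. Sketch of the idea from this thread, not Scylla code.

    shadow_round_states: dict mapping endpoint (IP string) to a dict with
    at least 'host_id' and 'generation' keys (assumed shape).
    """
    latest = {}  # host_id -> (generation, endpoint)
    for endpoint, state in shadow_round_states.items():
        host_id = state["host_id"]
        generation = state["generation"]
        if host_id not in latest or generation > latest[host_id][0]:
            latest[host_id] = (generation, endpoint)
    # Map each host ID to its single surviving endpoint.
    return {hid: ep for hid, (_, ep) in latest.items()}
```

The endpoint with the highest generation number is the most recently restarted instance of the host, so it is the one that should win the conflict.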

@bhalevy
Member Author

bhalevy commented Jul 7, 2023

I guess it regressed when we started reusing Host ID of replaced node.

But we stopped reusing the replaced node id.
We used to in the past, but we no longer do that.

@bhalevy
Member Author

bhalevy commented Jul 7, 2023

We do need to resolve the conflict by dropping the endpoint state of the node that changed its IP address.
We can do that safely now because we no longer reuse the host_id on replace.
This way, if two nodes have the same host_id, we know they refer to the same host, which changed its IP address.
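A minimal sketch of that rule (hypothetical Python; the `endpoint_states` map keyed by IP with `host_id`/`generation` fields is an assumption for illustration, not Scylla's actual data structure):

```python
def drop_stale_endpoints(endpoint_states):
    """When two endpoints carry the same host_id, they must be the same
    host before and after an IP change (host IDs are no longer reused on
    replace), so force-remove the endpoint state with the older
    generation. Hypothetical sketch of the rule described above.
    """
    newest = {}  # host_id -> endpoint with the highest generation so far
    for ep, st in list(endpoint_states.items()):
        hid = st["host_id"]
        if hid in newest:
            old_ep = newest[hid]
            # Keep whichever endpoint restarted more recently.
            if st["generation"] > endpoint_states[old_ep]["generation"]:
                del endpoint_states[old_ep]
                newest[hid] = ep
            else:
                del endpoint_states[ep]
        else:
            newest[hid] = ep
    return endpoint_states
```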

@kbr-scylla
Contributor

But we stopped reusing the replaced node id.
We used to in the past, but we no longer do that.

Right, sorry, brain fart.

bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 6, 2023
When a host changes its IP address, we should force-remove
the previous endpoint state, since we want only one endpoint
to refer to this host_id.

If the node that changed its IP address is decommissioned,
the previous endpoint appears as a normal token owner, just in
shutdown status, but it is no longer in the cluster.

Refs scylladb#14468
Fixes scylladb#13775

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@DoronArazii DoronArazii modified the milestones: 5.3, 5.4 Aug 6, 2023
@mykaul mykaul modified the milestones: 5.4, 6.0 Oct 29, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Jan 15, 2024
When a host changes its IP address, we should force-remove
the previous endpoint state, since we want only one endpoint
to refer to this host_id.

If the node that changed its IP address is decommissioned,
the previous endpoint appears as a normal token owner, just in
shutdown status, but it is no longer in the cluster.

Fixes scylladb#13775

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@gusev-p gusev-p self-assigned this Feb 15, 2024
@gusev-p

gusev-p commented Feb 15, 2024

With this PR, the test passes with --experimental-features consistent-topology-changes and fails without it.

@kbr-scylla
Contributor

Nice.

I'm moving this issue out of the raft-topology-required backlog then, as the issue remains in gossiper mode but is not a blocker for raft topology.

@kbr-scylla kbr-scylla assigned kbr-scylla and kostja and unassigned kostja and gusev-p Feb 15, 2024
@bhalevy
Member Author

bhalevy commented Feb 16, 2024

What about the dtest for this issue?
It is currently skipped with a require marker (see scylladb/scylla-dtest@6cdce217ca17241dbefe96d0dd128a3b37ab82e).
But we'd better run it with consistent topology changes; can you please change the marker to @pytest.mark.required_features("consistent-topology-changes")?

@kbr-scylla
Contributor

But we better run it with consistent topology changes, can you please change the marker to @pytest.mark.required_features("consistent-topology-changes")?

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

@kbr-scylla
Contributor

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

ping

@temichus
Contributor

temichus commented Mar 8, 2024

cc @temichus @aleksbykov
can you please add the marker?
and a comment referencing this issue.

ping

I've created pr https://github.com/scylladb/scylla-dtest/pull/4043

@kbr-scylla kbr-scylla modified the milestones: 6.0, 6.0.1 May 27, 2024