Verify that the replacing node is in the same DC/RACK as the node being replaced when needed #18139
Comments
One way to implement such a safeguard: on the replacing node, after it fetches from the cluster the application state of the node being replaced and the schema:
cc @mkeeneyj
I kinda remember @bhalevy raising a similar concern.
#16858 more accurately.
With tablets, technically the new node only has to be in the same DC to comply with the network topology replication strategy configuration, but it had better be on the same rack as well so as not to screw up load balancing across racks. To keep it simple, we should require keeping both DC and rack on replace, and if there is a need for renaming, it should be done separately using a different mechanism, like a snitch change.
The description suggests that this one is a duplicate of the above. I can add it to a FE Hackathon list for this R&D Summit maybe...
Wait a second! There is even a patch already: bhalevy@1dc449c. Why wasn't it merged, @bhalevy?
It had a dependency on another issue, and that was fixed relatively recently.
Installation details
HEAD: ce17841
Description
When one replaces a node using the `replace_address_xxx`/`replace_node_xxx` parameters and by mistake uses a node that will belong to a different rack or even a different DC, `scylla` will just allow that.
As a result, if `NetworkTopologyStrategy` is used, the ownership is going to change on nodes unrelated to the node being replaced; hence, even if RBNO is used, the data on those nodes is not going to be streamed to reflect the new ownership. We will then end up with inconsistent data and will need to run a full repair to fix that.
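To make the ownership change concrete, here is a toy model of rack-aware replica placement in a single DC. It is a simplified sketch, not Scylla's actual `NetworkTopologyStrategy` code: the `Node` type, the `replicas` function, the one-token-per-node ring, and the node names are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    token: int
    rack: str


def replicas(ring: List[Node], key_token: int, rf: int) -> List[str]:
    """Toy single-DC placement: walk the ring clockwise from the key's token,
    preferring nodes whose rack has not been used yet."""
    ordered = sorted(ring, key=lambda n: n.token)
    # First node whose token is >= key_token; wrap to index 0 if none.
    start = next((i for i, n in enumerate(ordered) if n.token >= key_token), 0)
    walk = ordered[start:] + ordered[:start]

    chosen: List[Node] = []
    seen_racks = set()
    skipped: List[Node] = []
    for node in walk:
        if len(chosen) == rf:
            break
        if node.rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(node.rack)
        else:
            skipped.append(node)
    # Fewer racks than rf: fill the remainder from the skipped nodes.
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return [n.name for n in chosen]


# Hypothetical cluster: A and B share rack r1, C is in r2, D is in r3.
before = [Node("A", 0, "r1"), Node("B", 10, "r1"),
          Node("C", 20, "r2"), Node("D", 30, "r3")]
# A is replaced by A2 with the same token but, by mistake, in rack r2.
after = [Node("A2", 0, "r2")] + before[1:]

print(replicas(before, 0, 2))
print(replicas(after, 0, 2))
```

In this toy ring the replica set for token 0 moves from A and C to A2 and B. B was not part of the replace operation, yet it now owns a range it holds no data for, which is the inconsistency described above.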
This is obviously not a desired behavior and such mistakes should be prevented.
We need a "safeguard" that would fail such an attempt.
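A minimal sketch of the kind of check being asked for, assuming the replacing node already knows the DC and rack that the replaced node advertised. The `Location` type, the `ReplaceTopologyMismatch` exception, and `verify_replace_location` are hypothetical names for illustration and do not correspond to Scylla's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    dc: str
    rack: str


class ReplaceTopologyMismatch(Exception):
    """Raised when the replacing node's DC/rack differs from the replaced node's."""


def verify_replace_location(replaced: Location, replacing: Location) -> None:
    """Fail the replace attempt unless both DC and rack match."""
    mismatches = []
    if replaced.dc != replacing.dc:
        mismatches.append(f"dc {replacing.dc!r} != {replaced.dc!r}")
    if replaced.rack != replacing.rack:
        mismatches.append(f"rack {replacing.rack!r} != {replaced.rack!r}")
    if mismatches:
        raise ReplaceTopologyMismatch(
            "replacing node must keep the location of the replaced node: "
            + "; ".join(mismatches)
        )


# Same DC and rack: the check passes silently.
verify_replace_location(Location("dc1", "r1"), Location("dc1", "r1"))

# Different rack: the replace attempt is rejected.
try:
    verify_replace_location(Location("dc1", "r1"), Location("dc1", "r2"))
except ReplaceTopologyMismatch as e:
    print(f"replace refused: {e}")
```

Checking both fields, rather than only the DC, follows the suggestion in the comments to keep the rule simple and leave renames to a separate mechanism.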