New node won't join the cluster and replace the lost node automatically #114

Closed
vladiceanu opened this issue Jun 2, 2020 · 5 comments · Fixed by #258
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@vladiceanu

Describe the bug
We tried to simulate a node failure where a node gets forcefully removed and a new node is provisioned. But when the new node was trying to join the cluster, we saw the following error message:

INFO  2020-05-15 10:24:36,589 [shard 0] init - Shutdown database started
INFO  2020-05-15 10:24:36,589 [shard 0] compaction_manager - Asked to stop
INFO  2020-05-15 10:24:36,589 [shard 0] compaction_manager - Stopped
INFO  2020-05-15 10:24:36,691 [shard 0] init - Shutdown database finished
INFO  2020-05-15 10:24:36,691 [shard 0] init - stopping prometheus API server
INFO  2020-05-15 10:24:36,691 [shard 0] init - Startup failed: std::runtime_error (A node with address 10.100.136.228 already exists, cancelling join. Use replace_address if you want to replace this node.)

where the address 10.100.136.228 is the old IP.

After running kubectl exec into a healthy pod and executing nodetool removenode <node_id> (sketched after the output below), the new node was able to join the cluster, but nodetool gossipinfo now shows the following:

/10.100.136.228 < ---- this is the node name? 
  RPC_ADDRESS:10.100.136.228
 ...
  INTERNAL_IP:172.26.29.218 <---- this is the real new IP
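
For reference, a minimal sketch of that manual workaround, assuming a cluster deployed by the scylla-operator in namespace scylla (the pod name below is hypothetical):

  # Exec into a healthy Scylla pod (pod and namespace names are hypothetical)
  kubectl -n scylla exec -it simple-cluster-us-east-1-us-east-1a-0 -c scylla -- bash

  # Inside the pod: find the Host ID of the dead member (status DN),
  # then remove it so the new node can join
  nodetool status
  nodetool removenode <host_id_of_dead_node>

  # Verify the ring state afterwards
  nodetool gossipinfo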

To Reproduce
Steps to reproduce the behavior:

  1. Create a Scylla cluster using the scylla-operator;
  2. Remove a node (a Kubernetes node in this case) and provide a new, empty node (usually the Autoscaler will do that; one way to simulate this is sketched after this list);
  3. See the error logs in the new pod;
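
As an illustration of step 2 on EKS, a rough sketch of forcing the failure by terminating the backing EC2 instance (the instance ID is hypothetical; the autoscaler then provisions a replacement):

  # Terminate the EC2 instance backing the Kubernetes node (hypothetical ID)
  aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
  # The autoscaler brings up a fresh, empty node shortly afterwards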

Expected behavior
A (VM) node is removed from the cluster, a new empty node becomes available to host the Scylla Pod, and the new Pod/node joins the cluster automatically, with no user action required.

Config Files
Default;

Logs
(see the description above; please let me know if additional logs are required)

Environment:

  • Platform: EKS
  • Kubernetes version: 1.15.11
  • Scylla version: 3.2.1
  • Scylla-operator version: v0.1.6
@vladiceanu vladiceanu added the kind/bug Categorizes issue or PR as related to a bug. label Jun 2, 2020
@dahankzter
Contributor

This is not implemented yet, but there is an issue for it: #48. Scylla requires special handling if you want to reuse the IP address.

@vladiceanu
Author

Scylla requires special handling if you want to reuse the IP address.

The thing is that we don't really want to reuse the same IP address; we just want the new Pod to join the cluster and replace the old Pod that was on the dead node. The error is misleading because the new Pod has a different IP, whereas 10.100.136.228 is the IP of the old Pod.

@mmatczuk mmatczuk added this to the 1.0 milestone Sep 22, 2020
@jkarjala

I ran into a related problem with the latest scylla-operator and the example EKS cluster configuration.

I am simulating a failure by terminating one Kubernetes node hosting a Scylla node via the AWS API. The AWS autoscaler brings up a new node, and Kubernetes adopts it fine.

However, the replacement Scylla pod cannot be scheduled to the new node because the old pod's PVC still points to the PV on the lost Kubernetes node. Manually deleting the PVC resets the situation: the new pod gets scheduled, joins the Scylla cluster, and nodetool status shows it as UP.

According to kubernetes/kubernetes#61620, this situation should be managed by the (scylla) operator. Is this going to be fixed as part of this ticket (or #48), or is this a separate issue?

@zimnx
Collaborator

zimnx commented Oct 21, 2020

@jkarjala do you have Operator logs from this situation?

@jkarjala

The operator log tail is attached (the EC2 instance was terminated around 12:30).

scylla-op-40.log

The output of kubectl describe pod for the pending Scylla pod shows these Events:

Warning FailedScheduling 2m7s default-scheduler 0/6 nodes are available: 6 Insufficient cpu.

Warning FailedScheduling 49s (x4 over 99s) default-scheduler 0/7 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 6 Insufficient cpu.

Warning FailedScheduling 44s (x3 over 47s) default-scheduler 0/7 nodes are available: 1 node(s) had volume node affinity conflict, 6 Insufficient cpu.

Once I delete the PVC for that pod, as well as the pod itself, the new pod gets scheduled with a new PVC (pointing to the new local PV on the new node). According to the Kubernetes issue above, the scylla-operator should take care of this in case of node failure.
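
For completeness, a rough sketch of that manual cleanup (the PVC and pod names are hypothetical and depend on the cluster name):

  # Delete the PVC that still has node affinity to the lost node,
  # then delete the pending pod so it is recreated with a fresh PVC
  kubectl -n scylla delete pvc data-simple-cluster-us-east-1-us-east-1a-1
  kubectl -n scylla delete pod simple-cluster-us-east-1-us-east-1a-1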

--

My previous comment claiming that the new pod joins the cluster was actually wrong.
Scylla in the new pod exits due to an IP address conflict, which seems to be the topic of #48:

ERROR 2020-10-21 13:03:17,613 [shard 0] init - Startup failed: std::runtime_error (A node with address 10.100.0.108 already exists, cancelling join. Use replace_address if you want to replace this node.)

The new EC2 node has a new IP address, but it seems that at the Kubernetes level the pod still gets an IP that already exists in the Scylla cluster.

zimnx added a commit that referenced this issue Nov 20, 2020
When k8s node is gone, PVC might still have node affinity pointing
to lost node. In this situation, PVC is deleted by the Operator
and node replacement logic is triggered to restore cluster RF.

Fixes #215
Fixes #114