Deleted node resurrected and delete button is then disabled #25242
Comments
I was able to delete the node by clicking the 3-dots menu > Open in API > Delete. Still no idea why the "Delete" button in the UI was disabled for that node while removing via the API worked... After that, 5 Rancher containers were still running on the node (so it's a reproducible problem):
(the duplicate rancher-agent is perhaps another problem; I didn't notice when it appeared, but based on the "9 minutes" age it was apparently created after I deleted all the containers)
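For anyone who prefers the command line over the UI's API view, the same deletion can be done with a plain HTTP call against the Rancher v3 API. This is only a sketch; the URL, API token, and node ID are placeholders for whatever your own API view shows:

```sh
# Delete the stuck node object directly through the Rancher v3 API.
# rancher.example.com, the token, and the node ID are placeholders.
curl -sk -u "token-xxxxx:<secret>" \
  -X DELETE "https://rancher.example.com/v3/nodes/<node-id>"
```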
With rancher/rancher:v2.3.3 I was able to reproduce this.
With rancher/rancher:v2.4.5 I was also able to reproduce this, but this time I removed an etcd node. Steps: deploy Rancher v2.4.5, ...
@IlyaSemenov thank you very much for your workaround using the API. I have almost exactly the same problem as you:
This happened 2 times in our setup and we couldn't find an explanation/pattern for it until now. I can confirm that the "..." menu -> Open in API -> Delete method works.
We just experienced the same issue. The workaround using the API directly worked as well.
There is a technical reason as to why this happens. We have a node syncer controller that keeps Rancher's node objects in sync with the nodes registered in the downstream Kubernetes cluster. When you delete a node from the cluster but that node isn't properly cleaned (whether due to a Rancher bug or some other reason, i.e. split brain), it is still running a kubelet that can re-register with the Kubernetes cluster (creating a new Node object), which the syncer then turns back into a Rancher node. The actual code for this can be found here: https://github.com/rancher/rancher/blob/v2.3.8/pkg/controllers/user/nodesyncer/nodessyncer.go#L617
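A quick, illustrative way to confirm this mechanism (the commands and names are not from the thread): check whether the "deleted" node still runs a kubelet container, then watch the downstream cluster and see the node come back.

```sh
# On the node that was just deleted from Rancher: is a kubelet still running?
docker ps --filter name=kubelet

# Against the downstream cluster: if the kubelet is still up,
# the node typically re-registers shortly after deletion.
kubectl get nodes -w
```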
@Oats87 So as a workaround, for a node that gets deleted from the cluster, would nuking the kubelet container be sufficient to prevent this from happening, since it would stop the kubelet from creating a new Node object?

Superficially, I don't think it's a particularly bad thing for the Rancher UI to "resurrect" a deleted node, particularly if the reason it's doing that is simply to stay in sync with the underlying K8s state. What is bad is that when this happens, it creates a Node (v3) object that can't be deleted in the UI. So whatever is auto-vivifying the Node (v3) object must be doing something differently from what happens when a node is added via the UI. I imagine that if those two processes were reconciled, the "resurrected" node would be deletable in the UI.
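If that is indeed sufficient, the minimal version of it on the node itself would be something like the sketch below; the container names assume RKE defaults, and this only stops the re-registration, it does not clean the node:

```sh
# Stop the kubelet (and kube-proxy) so the node can no longer
# re-register a new Node object with the cluster.
docker rm -f kubelet kube-proxy
```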
I'd actually take it a step further and "clean" the node; we have some documentation around that here: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cleaning-cluster-nodes/ We have an open issue around this here: #26545
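A condensed sketch of that cleanup, run on the node itself. It is destructive, and the container names, mounts, and paths should be verified against the linked docs for the Rancher version in use before running anything:

```sh
# Remove all containers and volumes left by Rancher/RKE on this node.
docker rm -f $(docker ps -qa)
docker volume rm $(docker volume ls -q)

# Unmount kubelet-managed mounts, then remove Kubernetes/Rancher state.
for m in $(mount | grep '/var/lib/kubelet' | awk '{print $3}'); do umount "$m"; done
rm -rf /etc/cni /etc/kubernetes /opt/cni /opt/rke \
       /var/lib/calico /var/lib/cni /var/lib/etcd /var/lib/kubelet \
       /var/lib/rancher/rke /var/run/calico
```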
I tried to delete the nodes with some external scripts; when that happened, Rancher got lost and I was not able to drain, cordon, or delete the nodes. The solution was to use the CLI, connect to the cluster, and perform a kubectl delete node.
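For reference, that CLI path is roughly the following; the node name is a placeholder and the drain flags vary slightly between kubectl versions:

```sh
# Against the downstream cluster (kubeconfig downloaded from the UI or via the Rancher CLI).
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl delete node <node-name>
```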
Just happened to me on 2.5.5; the solution was to use the API.
Closing in favor of #26545
Just hit the same issue.
What kind of request is this (question/bug/enhancement/feature request): bug?
Steps to reproduce (least amount of steps as possible):
Result:
After step 3, the "deleted" node still runs a few rancher-related containers, namely:
After step 4 (rebooting the node):
Other details that may be helpful:
I am somewhat "okay" with the node failing to remove itself completely. I could trash the remaining containers myself to prevent it from rising from the dead; no big deal. But I can't even delete the node again anymore! The delete button is disabled for whatever reason.
Environment information
Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.3.3 (when the issue happened), 2.3.5 (now - I upgraded to see if anything changes)

Cluster information
Kubernetes version (kubectl version):
Docker version (docker version):