
Node states do not change to cordoned and removing during scale down when drain on delete is enabled #5220

Closed
anupama2501 opened this issue Feb 25, 2022 · 4 comments

@anupama2501

Setup

Rancher version: 2.6-head 8c785a1
Browser type & version: Chrome

Describe the bug
Node status should change from removing >> cordoned >> deleted when the drain on delete option is enabled on the node driver pools. Instead, the node status changed from active >> deleted, and the node was deleted in both the cluster management page and the cluster explorer >> cluster >> nodes page.

To Reproduce

  • Create an RKE2 node driver cluster with 3 node pools, one per role: 3 worker, 3 etcd, and 2 control plane nodes.
  • Enable the drain on delete feature for each of the node pools (see the sketch after this list).
  • From cluster management >> machine pools >> select the worker node w1 >> scale down.
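
For reference, the same setting can be checked outside the UI. A minimal sketch, assuming a downstream cluster named test-cluster whose provisioning.cattle.io/v1 object lives in the fleet-default namespace of the Rancher local cluster; the drainBeforeDelete field name is an assumption based on the machine pool spec, not confirmed in this issue:

```bash
# Run against the Rancher local (management) cluster.
# Print each machine pool's name alongside its drain-before-delete flag.
# "test-cluster" and the drainBeforeDelete field name are assumptions:
kubectl get clusters.provisioning.cattle.io test-cluster -n fleet-default \
  -o jsonpath='{range .spec.rkeConfig.machinePools[*]}{.name}{"\t"}{.drainBeforeDelete}{"\n"}{end}'
```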

Result
The node's state changes directly from active to deleted, and the node is removed.

Expected Result
From either the cluster management page or the explorer >> cluster >> nodes page:
The node's state is expected to change from active >> removing >> cordoned >> deleted.
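
As a cross-check outside the UI, the cordoned stage should also be visible on the kube node itself. A minimal sketch, assuming a kubeconfig pointed at the downstream cluster:

```bash
# Watch node states during the scale-down; a node that is being drained
# first reports STATUS "Ready,SchedulingDisabled" (i.e. cordoned), then
# drops out of the list once it is deleted:
kubectl get nodes --watch
```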

Additional context
Related issue: rancher/rancher#36782

@richard-cox
Member

Given that we've restricted the cordoning / draining state to the kube node list (cluster explorer / cluster / node page), those states should only show up there.

I saw some interesting behaviour when enabling Drain Before Delete for the machine pool containing the worker machines, though. It seemed to replace them sequentially, one by one. After waiting for this process to finish (all three replaced), I could scale down one of the new machines (via the machine's action menu on the right, not the pool scale buttons):

  1. The machine went into a Deleting state and its associated kube node in the cluster explorer / cluster / node page went into a Cordoned state
  2. After about 10 seconds, both the machine and the kube node were removed
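
For reference, both states can be watched outside the UI as well. A rough sketch, assuming the CAPI Machine objects live in the fleet-default namespace of the Rancher local cluster:

```bash
# From the Rancher local cluster: the Machine's PHASE column shows
# "Deleting" while its kube node is being drained and removed:
kubectl get machines.cluster.x-k8s.io -n fleet-default --watch
```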

This behaviour seems correct. It doesn't quite go through all the stages in the issue description, but some may not be visible due to speed (deleting --> deleted --> removed).

@anupama2501 Could the issue you're seeing be related to scaling down nodes that weren't recreated following the change to the machine pool's Drain Before Delete setting?

@anupama2501
Author

anupama2501 commented Mar 10, 2022

Hi @richard-cox, thank you for the detailed write-up.

I saw some interesting behaviour when enabling Drain Before Delete for the machine pool containing the worker machines, though. It seemed to replace them sequentially, one by one. After waiting for this process to finish (all three replaced), I could scale down one of the new machines (via the machine's action menu on the right, not the pool scale buttons)

This is the expected behavior, per this comment: rancher/rancher#35274 (comment)

Could the issue you're seeing be related to scaling down nodes that weren't recreated following the change to the machine pool's Drain Before Delete setting?

Could you elaborate on the "following the change" part?

I retried the scenario with the drain on delete option enabled while creating the cluster. I do see the nodes go into a cordoned state, and the machines were then scaled down.

@richard-cox
Member

@anupama2501 I wondered if the original issue of the nodes not showing as cordoned might be due to:

  • creating the cluster first without drain on delete enabled
  • enabling drain on delete after the cluster has come up
  • (nodes will start to be recreated sequentially)
  • manually scaling down a specific node that has not yet been recreated

If you're seeing this working now when drain on delete is set at cluster creation, then all is good. Also, thanks for clarifying the expected behaviour on changing a deployment; very helpful.

@anupama2501
Author

Closing, as I see the expected behavior noted in this comment: #5220 (comment)
