"Auto Replace" option support for RKE2 machine pools #4449

Closed
Tracked by #3346
snasovich opened this issue Oct 26, 2021 · 9 comments

@snasovich
Contributor

Detailed Description
For RKE1-provisioned clusters, there is currently an option to specify the threshold on how many minutes a node can be unreachable in a node pool before it's automatically replaced:
image

For RKE2-provisioned clusters, we want the same functionality to be available, probably by adding a checkbox somewhere on the base pool details section highlighted below:
image

Context
This is needed for RKE2 provisioning parity with RKE1

Additional Details
Backend support is needed: rancher/rancher#35275

@paynejacob
Contributor

ref rancher/rancher#35275

@paynejacob
Contributor

See rancher/rancher#35916 (comment) for implementation

Support for self-healing node pools was added with the PR above. The following fields were added to support this.

related: https://cluster-api.sigs.k8s.io/tasks/healthcheck.html

  • NodeStartupTimeout *metav1.Duration (How long a node is given to initially become ready before it is replaced.)
  • UnhealthyNodeTimeout *metav1.Duration (How long a node can be unhealthy before it is replaced.)
  • MaxUnhealthy *intstr.IntOrString (The maximum number of nodes that can be unhealthy in a pool. If this is exceeded, no nodes are replaced.)
  • UnhealthyRange *string (A range of unhealthy node counts within which nodes are replaced. See https://cluster-api.sigs.k8s.io/tasks/healthcheck.html#unhealthy-range for formatting.)
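
For orientation, the four fields above could be modeled roughly as follows. This is only a sketch, not the code from the PR: the struct name `MachinePoolHealth`, the lowercase JSON names, and the sample values are assumptions for illustration, and plain strings stand in for `*metav1.Duration` and `*intstr.IntOrString` so that it compiles with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MachinePoolHealth is a hypothetical stand-in for the machine pool fields
// listed above. The real fields use *metav1.Duration and *intstr.IntOrString;
// strings are used here only to keep the sketch dependency-free.
type MachinePoolHealth struct {
	NodeStartupTimeout   *string `json:"nodeStartupTimeout,omitempty"`   // assumed JSON name
	UnhealthyNodeTimeout *string `json:"unhealthyNodeTimeout,omitempty"` // assumed JSON name
	MaxUnhealthy         *string `json:"maxUnhealthy,omitempty"`         // assumed JSON name
	UnhealthyRange       *string `json:"unhealthyRange,omitempty"`       // assumed JSON name
}

// strPtr returns a pointer to s, which keeps optional fields easy to set.
func strPtr(s string) *string { return &s }

// SamplePoolJSON serializes a pool with illustrative (made-up) values.
func SamplePoolJSON() (string, error) {
	pool := MachinePoolHealth{
		NodeStartupTimeout:   strPtr("10m"),
		UnhealthyNodeTimeout: strPtr("5m"),
		MaxUnhealthy:         strPtr("40%"),
		UnhealthyRange:       strPtr("[1-3]"),
	}
	b, err := json.Marshal(pool)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := SamplePoolJSON()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

All four fields are pointers, so a pool that leaves them unset serializes without them.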

@gaktive
Member

gaktive commented Jan 25, 2022

Thanks @paynejacob -- does this mean that UI can begin work on this as QA tests the backend?

@paynejacob
Contributor

@gaktive yes qa can test backend and the ui work can start. Let me know if you have any questions about the api.

@catherineluse
Contributor

catherineluse commented Feb 4, 2022

@paynejacob It looks like the UI just needs to include the UnhealthyNodeTimeout property when creating the node pool. That isn't case sensitive, right? I noticed that we don't capitalize the other properties that we send when creating the node pool. https://github.com/rancher/rancher/pull/35916/files#diff-e05181e81036cad4014bb4843ea91837fa7dd238dc3434bc4fcc598c56db6428R26

And just to confirm, should we still take the unit from the user in minutes, then convert it to seconds as the Ember UI did?
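
To make the minutes-to-seconds question concrete, the conversion could look something like the sketch below. It assumes the backend accepts a duration string such as `300s` (the format `time.ParseDuration` reads); the function name `MinutesToTimeout` is hypothetical, and the thread does not confirm which unit the API expects.

```go
package main

import (
	"fmt"
	"time"
)

// MinutesToTimeout converts a user-entered number of minutes into a duration
// string expressed in seconds, e.g. 5 -> "300s". Hypothetical helper; whether
// the API wants seconds is an assumption, not something the thread confirms.
func MinutesToTimeout(minutes int) (string, error) {
	if minutes < 0 {
		return "", fmt.Errorf("minutes must be non-negative, got %d", minutes)
	}
	return fmt.Sprintf("%ds", minutes*60), nil
}

func main() {
	s, err := MinutesToTimeout(5)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	// Round-trip through the standard parser to show the string is a valid duration.
	d, err := time.ParseDuration(s)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(s, d)
}
```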

@paynejacob
Contributor

@catherineluse it looks like I accidentally capitalized it. I will have a PR up soon to make it lowercase.

@paynejacob
Contributor

@catherineluse fixed

@Auston-Ivison-Suse

Setup For Feature Testing
Rancher Version: v2.6-head(e55a04c)

**Steps for Reproduction:**

  1. Start creating a node driver RKE2 cluster.
  2. Go to the advanced settings, where the option appears as in the screenshot below:

AutoReplace.png

Result
Was able to successfully view the noted amount post provisioning.

Sanity Checks

  • successfully provisioned cluster
  • successfully edited the yaml to change the values before provisioning.
  • successfully edited the values post provisioning via ui
  • successfully edited the values post provisioning via yaml

@jtravee

jtravee commented Mar 16, 2022

Confirmed with @catherineluse and @gaktive to add release note label.
