Unregister node from RKE2 after agent deletion #12

Closed
zifeo opened this issue Oct 8, 2021 · 11 comments


zifeo commented Oct 8, 2021

Currently, after an agent is deleted, RKE2 keeps its node registered as NotReady:

NAME                     STATUS     ROLES                       AGE   VERSION
rke-cluster-blue-001     Ready      <none>                      65m   v1.21.5+rke2r2
rke-cluster-green-001    NotReady   <none>                      65m   v1.21.5+rke2r2

remche commented Oct 11, 2021

As stated in the documentation, you need to manually drain and remove the node before downscaling a node pool.
I did not find a clean way to automate it; using SSH on the server node led to chaotic behavior.
I will be really happy if a clean implementation is proposed though ;)
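
For reference, the manual procedure described in the documentation amounts to something like the following, assuming kubectl access to the cluster (the node name is taken from the listing above):

kubectl drain rke-cluster-green-001 --ignore-daemonsets --delete-emptydir-data
kubectl delete node rke-cluster-green-001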


zifeo commented Oct 11, 2021

@remche Yes, I've seen this. The issue is that even upgrades are not stable if there is no volume on the agent nodes. Is there a reason/use case for using a server or node without a volume?


remche commented Oct 14, 2021

@zifeo I did not manage to reproduce this issue. Can you provide a sanitized configuration?

Is there a reason/use case for using a server or node without a volume?

In my use case I only use ephemeral volumes for VMs, without any problem.


zifeo commented Oct 14, 2021

@remche I have been experimenting with a different setup here; I will update if I find something stable and portable.


remche commented Oct 14, 2021

@zifeo Nice, do not hesitate to contribute back ;)
Would be very happy to find a clean autoscaling method!


zifeo commented Oct 14, 2021

As stated in the documentation, you need to manually drain and remove the node before downscaling a node pool.

This seems related to k3s-io/k3s#1264.

As for autoscaling, I would suggest something like orchestration_stack_v1 to keep a coherent Terraform state. This should work for simple autoscaling behaviours, but a custom/vanilla cloud provider could be written for advanced use cases.
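
For illustration, a minimal sketch of that idea with the openstack_orchestration_stack_v1 resource from terraform-provider-openstack could look like this (stack name, group sizes, flavor and image are hypothetical; network and key pair wiring is omitted for brevity):

resource "openstack_orchestration_stack_v1" "agent_pool" {
  name             = "rke2-agent-pool" # hypothetical stack name
  disable_rollback = true

  # Inline Heat template defining an autoscaling group of agent VMs.
  template_opts = {
    Bin = <<-EOT
      heat_template_version: 2016-10-14
      resources:
        agents:
          type: OS::Heat::AutoScalingGroup
          properties:
            min_size: 1
            max_size: 5
            resource:
              type: OS::Nova::Server
              properties:
                flavor: m1.small    # hypothetical flavor
                image: ubuntu-20.04 # hypothetical image
    EOT
  }

  # Optional empty Heat environment, as in the provider's own example.
  environment_opts = {
    Bin = "\n"
  }
}

Heat then owns the group membership, so the Terraform state only tracks the stack itself while the group scales.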


remche commented Oct 15, 2021

As for autoscaling, I would suggest something like orchestration_stack_v1 to keep a coherent Terraform state. This should work for simple autoscaling behaviours, but a custom/vanilla cloud provider could be written for advanced use cases.

I came to the same conclusion. Using a Heat stack seems pretty hacky to me; I would prefer a custom cluster-autoscaler, but it's more work :)


zifeo commented Oct 15, 2021

@remche I will build a PoC later to see how stable it could be. The issue with a custom autoscaler is compatibility with the Terraform state. A remote & shared backend, maybe, but this seems even more hacky.


remche commented Oct 15, 2021

My first thought would be to use a remote state backend that supports locking. But I'm not sure there is a way to retrieve the current backend configuration from data sources...
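
For reference, a minimal sketch of reading such a state from an external component via the terraform_remote_state data source (backend type, bucket and key are hypothetical); as noted above, the backend configuration has to be repeated by hand rather than discovered:

data "terraform_remote_state" "rke2" {
  backend = "s3" # any backend that supports locking, e.g. S3 with a DynamoDB lock table

  config = {
    bucket = "tf-state"               # hypothetical bucket
    key    = "rke2/terraform.tfstate" # hypothetical key
    region = "eu-west-1"
  }
}

Root-level outputs of the RKE2 configuration are then reachable as data.terraform_remote_state.rke2.outputs.<name>.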


stale bot commented Dec 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Dec 14, 2021
stale bot closed this as completed Dec 28, 2021

zifeo commented Jan 28, 2022

@zifeo Nice, do not hesitate to contribute back ;)
Would be very happy to find a clean autoscaling method!

Node deletion seems stable so far. I am happy to bring https://github.com/zifeo/terraform-openstack-rke2 over (merge it all here), but the exposed module interface is rather different. What is your point of view on this? This is why I chose to start from scratch originally.
