Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove and reset nodes during apply by setting reset: true #417

Merged
merged 9 commits into from
Sep 27, 2022
Merged

Remove and reset nodes during apply by setting reset: true #417

merged 9 commits into from
Sep 27, 2022

Conversation

0SkillAllLuck
Copy link
Contributor

@0SkillAllLuck 0SkillAllLuck commented Sep 22, 2022

Closes #396
Fixes #383

This adds the ability to uninstall single or multiple nodes from the cluster.

This is achieved by adding a reset field to the host inside of k0sctl.yaml. By default this value is set to false and can be omitted.

Controllers will be removed from Kubernetes/etcd and reset one by one. This also applies to Controller+Worker nodes.
Workers will be removed from Kubernetes and reset all at once.

@0SkillAllLuck 0SkillAllLuck changed the title Feat add uninstall Add functionality to remove nodes Sep 22, 2022
cmd/apply.go Outdated Show resolved Hide resolved
phase/uninstall_controllers.go Outdated Show resolved Hide resolved
@kke
Copy link
Contributor

kke commented Sep 23, 2022

I think this is a good alternative to #396

@0SkillAllLuck
Copy link
Contributor Author

Wanted to ask about opinions between uninstall and remove as the name for the phase and also the field in the yaml.

@kke
Copy link
Contributor

kke commented Sep 23, 2022

Maybe connect phase should ignore errors from nodes with uninstall: true and warn+remove them from config.Spec.Hosts 🤔

@0SkillAllLuck
Copy link
Contributor Author

Maybe connect phase should ignore errors from nodes with uninstall: true and warn+remove them from config.Spec.Hosts 🤔

That is actually a great idea yes

@kke
Copy link
Contributor

kke commented Sep 23, 2022

Wanted to ask about opinions between uninstall and remove as the name for the phase and also the field in the yaml.

prune and reset are also options :). Maybe leaning towards reset to match k0sctl reset and k0s reset. The phase name is a bit problematic, but I guess you could just replace the phase/reset.go with phase/reset_controllers and reset_workers and modify k0sctl reset to utilize those 🤔

Maybe k0sctl reset could skip hosts that have explicitly set reset: false (i guess this would require making the reset *bool to know if it was set or not)

@kke kke linked an issue Sep 23, 2022 that may be closed by this pull request
@kke
Copy link
Contributor

kke commented Sep 23, 2022

That is actually a great idea yes

Hmm, actually, I think the hosts still need to be drained/removed from kube even though they can't be connected for doing the k0s reset and cleanups.

@0SkillAllLuck
Copy link
Contributor Author

That is actually a great idea yes

Hmm, actually, I think the hosts still need to be drained/removed from kube even though they can't be connected for doing the k0s reset and cleanups.

I guess ignoring error and only warning about them like it does currently is good then.

@0SkillAllLuck
Copy link
Contributor Author

Wanted to ask about opinions between uninstall and remove as the name for the phase and also the field in the yaml.

prune and reset are also options :). Maybe leaning towards reset to match k0sctl reset and k0s reset. The phase name is a bit problematic, but I guess you could just replace the phase/reset.go with phase/reset_controllers and reset_workers and modify k0sctl reset to utilize those 🤔

Maybe k0sctl reset could skip hosts that have explicitly set reset: false (i guess this would require making the reset *bool to know if it was set or not)

This would also remove some redundant code. But does it make sense to even look at the reset field in the k0sctl reset command? As it's meant to destroy/reset the cluster.

@kke
Copy link
Contributor

kke commented Sep 23, 2022

I guess ignoring error and only warning about them like it does currently is good then.

Yeah but you need to ignore that error in a lot of places :( But I think it's pretty much necessary, otherwise you're going to have problems getting rid of hosts that have actually gone missing or something. Not sure if not being able to connect will make the draining + removing impossible, because the hostname isn't known.

Getting rid of a dead worker node is addressed in #396

This would also remove some redundant code. But does it make sense to even look at the reset field in the k0sctl reset command? As it's meant to destroy/reset the cluster.

Yeah not really. Not worth the trouble.

@0SkillAllLuck
Copy link
Contributor Author

Yeah but you need to ignore that error in a lot of places :( But I think it's pretty much necessary, otherwise you're going to have problems getting rid of hosts that have actually gone missing or something. Not sure if not being able to connect will make the draining + removing impossible, because the hostname isn't known.

I already ignore errors during draining and all commands. So draining and all ssh based commands would fail, but etcd leave and also cube delete node would still be executed, which would certainly leave the cluster in a better state then when those wouldn't run.

Yeah not really. Not worth the trouble.

So I'll just make reset phase set the reset flag to true for all nodes. And at the end explicitly delete the leader node by just doing what the reset phase currently does.

@kke
Copy link
Contributor

kke commented Sep 23, 2022

Sounds good to me

@0SkillAllLuck
Copy link
Contributor Author

I also noticed that certain k0s files seem to be left over on the hosts:
image

I guess this shouldn't be the case and those should be removed?

@kke
Copy link
Contributor

kke commented Sep 23, 2022

  • /usr/local/bin/k0s I don't think so, people do k0s reset && k0s install type of dance.
  • /etc/k0s/k0stoken I think that one can go
  • /etc/k0s yeah if it's empty.

@kke
Copy link
Contributor

kke commented Sep 23, 2022

The connection library also leaves behind an empty dir after service cleanup but I never got around to fixing that k0sproject/rig#42

@0SkillAllLuck
Copy link
Contributor Author

0SkillAllLuck commented Sep 23, 2022

Maybe having a flag like --clean-hosts that removes those leftover files would make sense?
But that should probably be done in a separate PR or Issue

@kke
Copy link
Contributor

kke commented Sep 23, 2022

Better not add too much complexity. Just decide to clean or not clean. I think leaving behind stuff that affects/breaks a new install is unacceptable, but leaving behind empty dirs and such is just a bit unsanitary but doesn't do any real harm.

When you uninstall something like nginx via apt remove, I think it will leave behind anything you have laying around in /etc/nginx/ and wherever, since those could be of some importance to the user and the installation didn't put them there.

@0SkillAllLuck
Copy link
Contributor Author

I think if this PR is merged this note can be remove from the README: Nodes can't be removed

@0SkillAllLuck
Copy link
Contributor Author

Better not add too much complexity. Just decide to clean or not clean. I think leaving behind stuff that affects/breaks a new install is unacceptable, but leaving behind empty dirs and such is just a bit unsanitary but doesn't do any real harm.

When you uninstall something like nginx via apt remove, I think it will leave behind anything you have laying around in /etc/nginx/ and wherever, since those could be of some importance to the user and the installation didn't put them there.

Yes, also if you're reseting a host you probably gonna reinstall or destroy it if necessary which takes care of left over files.

README.md Outdated Show resolved Hide resolved
@kke kke changed the title Add functionality to remove nodes Remove and reset nodes during apply by setting reset: true Sep 27, 2022
@kke kke merged commit 5eae4c5 into k0sproject:main Sep 27, 2022
@0SkillAllLuck 0SkillAllLuck deleted the feat-add-uninstall branch September 27, 2022 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove nodes using k0sctl
2 participants