
Support graceful kube-apiserver shutdown as part of "kubeadm reset" to better support removing control plane nodes from a cluster #2978

Closed
invidian opened this issue Nov 29, 2023 · 6 comments · Fixed by kubernetes/website#45146
Labels
area/controlplane · kind/feature · priority/awaiting-more-evidence

@invidian
Member

Right now, kubeadm provides a reset subcommand which, among other things, removes Kubernetes control plane artifacts from a machine, but there is no specific documentation on how to remove control plane nodes from a cluster. One may assume that kubeadm reset is suitable for this task, since it also, for example, attempts to remove the machine being reset from the cluster's etcd cluster.

If kubeadm reset is intended to be used for control plane node removal, it could do a better job of gracefully shutting down kube-apiserver itself, as is already done for the etcd member, making it more suitable for production environments.

kube-apiserver itself supports graceful shutdown via flags like --shutdown-delay-duration, --shutdown-send-retry-after and --shutdown-watch-termination-grace-period: once kube-apiserver receives a termination signal, the /readyz probe starts responding with a 500 error, which load balancers can use as a trigger to remove the machine from the load-balancing pool.
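
For illustration, a minimal sketch of how this behaviour can be observed on a node (not something kubeadm does today; the node address is a placeholder, and anonymous access to /readyz via the default system:public-info-viewer RBAC binding is assumed):

```sh
# Assumption: default secure port 6443 and anonymous access to /readyz
# (granted by the default system:public-info-viewer RBAC binding).
NODE_IP=10.0.0.10   # placeholder address of the control plane node

# While kube-apiserver is healthy, /readyz returns HTTP 200.
curl -k -s -o /dev/null -w '%{http_code}\n' "https://${NODE_IP}:6443/readyz"

# Once kube-apiserver receives SIGTERM and --shutdown-delay-duration is
# counting down, /readyz switches to a non-200 status while in-flight and
# new requests are still served, giving the load balancer's health probe
# time to take this backend out of rotation.
curl -k -s "https://${NODE_IP}:6443/readyz?verbose" | tail -n 5
```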

However, combining those flags with kubeadm reset is currently not trivial; at least one way to make it work is the following:

  1. Patch the /etc/kubernetes/manifests/kube-apiserver.yaml file, e.g. remove all command arguments from the kube-apiserver container, to trigger a graceful shutdown and prevent kube-apiserver from starting again.
  2. Wait until the kube-apiserver process shuts down.
  3. Run kubeadm reset.

Perhaps kubeadm could do the patching and waiting part?
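
For reference, a rough sketch of the manual workaround above, assuming a standard kubeadm layout. It moves the static pod manifest out of the manifests directory rather than patching the container command (a different but equivalent way to stop the pod and keep it from restarting); the wait loop and timeout are illustrative only:

```sh
# 1. Move the manifest away so the kubelet terminates the kube-apiserver
#    static pod (SIGTERM triggers the graceful shutdown flags) and does
#    not restart it. Note: the pod's terminationGracePeriodSeconds must
#    be long enough to cover --shutdown-delay-duration.
mv /etc/kubernetes/manifests/kube-apiserver.yaml /root/kube-apiserver.yaml.bak

# 2. Wait (up to an arbitrary 120s here) for the process to exit.
for _ in $(seq 1 120); do
  pgrep -x kube-apiserver >/dev/null || break
  sleep 1
done

# 3. Clean up the rest of the node; add -f/--force to skip the prompt.
kubeadm reset
```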

@neolit123 added the kind/feature and priority/awaiting-more-evidence labels on Nov 29, 2023
@neolit123 added this to the v1.30 milestone on Nov 29, 2023
@neolit123
Member

neolit123 commented Nov 29, 2023

yes, reset is intended to stop all running containers on a node and clean it up.
i haven't seen other requests for such a feature before.

> at least one way to make it work is the following

based on demand, my take right now would be that users who want the graceful shutdown can prepare the node using any means necessary (i.e. we don't need to apply this change to "reset")

let's see if others have any comments on this topic.

@neolit123
Member

@SataQiu @pacoxu @chendave
do you agree with my take ^ or have different comments?

@pacoxu
Member

pacoxu commented Dec 14, 2023

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/ is quite simple now. I would prefer to add some documentation about best practices for removing a control plane node in an HA setup to https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/.

> i haven't seen other requests for such a feature before.

+1
@invidian what is the current risk you have run into? And what problem does the current process cause? I think the apiserver will not be ready once the etcd container is down or the remove-etcd-member step completes.

@invidian
Member Author

With the current process, with Azure LB probes running every 5 seconds, we run kubeadm reset to remove a machine gracefully from the control plane before shutting it down. Shutting down the API server this way causes the rate of failing requests to increase, despite having redundant API servers, to the point that we see random cluster nodes being reported as NotReady, which causes further disruption to the cluster. With graceful shutdown, as expected, we see much less disruption overall.
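
To make the numbers concrete, a back-of-the-envelope sketch of how the shutdown delay relates to the probe interval (the unhealthy threshold and buffer below are assumptions for illustration, not values from our setup):

```sh
PROBE_INTERVAL=5      # seconds, the Azure LB probe interval mentioned above
UNHEALTHY_THRESHOLD=2 # assumed failed probes before the backend is removed
BUFFER=5              # assumed extra margin

# 5 * 2 + 5 = 15: the LB needs roughly 10s to notice /readyz failing, so a
# shutdown delay of at least ~15s keeps the apiserver serving while the
# pool drains instead of dropping requests the moment the process exits.
echo "--shutdown-delay-duration=$(( PROBE_INTERVAL * UNHEALTHY_THRESHOLD + BUFFER ))s"
```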

@neolit123
Member

it seems that we don't have agreement to add this change to reset,
maybe it makes sense to have the graceful steps outlined in the docs here:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/

(new task page)

@invidian
WDYT and would you like to help us with such a docs PR?

@invidian
Member Author

> WDYT and would you like to help us with such a docs PR?

Sounds good I think. Let me open a PR.

invidian added a commit to invidian/website that referenced this issue Feb 15, 2024
invidian added a commit to invidian/website that referenced this issue Feb 19, 2024
To address kubernetes/kubeadm#2978.

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
Co-authored-by: Tim Bannister <tim@scalefactory.com>
Andygol pushed a commit to Andygol/k8s-website that referenced this issue Mar 12, 2024
To address kubernetes/kubeadm#2978.

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
Co-authored-by: Tim Bannister <tim@scalefactory.com>