
kubeadm reset takes more than 50 seconds to retry deleting the last etcd member #91143

Closed
tnqn opened this issue May 15, 2020 · 4 comments
Labels
area/kubeadm  kind/bug  sig/cluster-lifecycle

Comments

@tnqn
Member

tnqn commented May 15, 2020

What happened:
When using kubeadm reset to delete the last control-plane node of a highly available Kubernetes cluster, or the only node of a single control-plane cluster, it always takes more than 50 seconds in the remove-etcd-member phase and logs many "Failed to remove etcd member" errors.
It is impossible to remove the last/only member of an etcd cluster, so there is no need to try (and retry) removing it: the etcd cluster will be destroyed in the next phase anyway. Skipping the removal would let a reset operation finish in a few seconds instead of nearly one minute.

I0515 04:22:38.005689    2780 etcd.go:250] etcd endpoints read from etcd: https://192.168.10.12:2379
I0515 04:22:38.005712    2780 etcd.go:120] update etcd endpoints: https://192.168.10.12:2379
I0515 04:22:38.005721    2780 local.go:109] [etcd] get the member id from peer: https://192.168.10.12:2380
I0515 04:22:38.013476    2780 local.go:115] [etcd] removing etcd member: https://192.168.10.12:2380, id: 11188237867079222133
{"level":"warn","ts":"2020-05-15T04:22:38.022-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:38.022187    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:38.075-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:38.075861    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:38.185-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:38.185952    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:38.399-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:38.399959    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:38.818-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:38.818367    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:39.652-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:39.653052    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:41.363-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:41.363659    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:44.585-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:44.585459    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:22:51.086-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:22:51.086407    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:23:04.011-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:23:04.011372    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-05-15T04:23:30.382-0700","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-834a096e-d413-4552-b508-f192cb26956e/192.168.10.12:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
I0515 04:23:30.382526    2780 etcd.go:329] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
W0515 04:23:30.382566    2780 removeetcdmember.go:61] [reset] failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members.Please manually remove this etcd member using etcdctl
I0515 04:23:30.382577    2780 cleanupnode.go:57] [reset] Getting init system

What you expected to happen:
kubeadm reset should skip trying (and retrying) to remove the only member of an etcd cluster, avoiding the unnecessary errors and reducing the execution time from nearly one minute to a few seconds.

How to reproduce it (as minimally and precisely as possible):

  1. Create a single control-plane node with kubeadm init
  2. Delete the node with kubeadm reset
     It will take more than 50 seconds in the remove-etcd-member phase.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.18.2
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@tnqn tnqn added the kind/bug label May 15, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig label May 15, 2020
@tnqn
Member Author

tnqn commented May 15, 2020

/sig cluster-lifecycle
/area kubeadm

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle and area/kubeadm labels and removed the needs-sig label May 15, 2020
@rosti
Contributor

rosti commented May 15, 2020

Thanks for filing this issue @tnqn !
kubeadm tracks its issues in a separate issue tracker.
I'll move this one over there for greater visibility and we can continue the discussion there.

@neolit123
Member

/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
