Node that is restarted never reconnects to cluster #45753

sjezewski · 2017-05-12T22:47:08Z

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): restart node

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release):

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Kernel (e.g. uname -a): Linux ip-172-20-34-246 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux
Install tools: kops Version 1.6.0-beta.1 (git-77f222d)
Others:

What happened:

As mentioned here I need to restart a node as part of the GPU nvidia driver installation process.

However, when doing a restart (either via /sbin/shutdown -r or via the AWS UI), the node never seems to come back into the k8s cluster (it never shows up in the output of kubectl get nodes) ... UNLESS ... I kill the API server pod, e.g.:

$ kubectl --namespace=kube-system delete po/kube-apiserver-ip-1-2-3-4.us-west-2.compute.internal

It takes ~2-3 min for the node to show up again ... but it does show up under the output of kubectl get nodes
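To make the "does the node show up" check repeatable, a small helper along these lines could be used. This is only a sketch: the function name node_status is made up for illustration, it assumes the STATUS value is the second column of kubectl get nodes --no-headers output, and the node name in the usage comment is a placeholder.

```shell
#!/bin/sh
# node_status NODE
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# STATUS column for NODE, or "Missing" if the node is not listed at all.
# (Hypothetical helper; assumes STATUS is the second column.)
node_status() {
    awk -v node="$1" '
        $1 == node { print $2; found = 1 }
        END { if (!found) print "Missing" }
    '
}

# Usage against a live cluster (placeholder node name):
#   kubectl get nodes --no-headers | node_status ip-1-2-3-4.us-west-2.compute.internal
```

Running it before and after deleting the API server pod would show whether the node goes from Missing to a listed status.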

I don't think it's just a matter of waiting. I've waited an hour after a restart and the node never reappeared. It seems I must kill the API server pod for the node to get detected again.

What you expected to happen:

After a node restart, the node would appear ready and part of the k8s cluster according to kubectl get nodes

How to reproduce it (as minimally and precisely as possible):

I believe it's a matter of just restarting any VM. I've only tested on AWS though.

Anything else we need to know:

huangjiasingle · 2017-05-16T09:17:35Z

@sjezewski please use the same version for the client and the server

0xmichalis · 2017-05-21T10:33:21Z

@kubernetes/sig-node-bugs

resouer · 2017-05-24T08:27:08Z

Can you make sure kubelet is auto started after your VM is rebooted?
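One way to check this on the rebooted node, as a rough sketch: it assumes the node uses systemd and that the kubelet runs as a unit named kubelet (neither is confirmed in this report), and the helper name kubelet_autostart is made up for illustration.

```shell
#!/bin/sh
# kubelet_autostart STATE
# Interprets the output of `systemctl is-enabled kubelet` and prints "yes"
# only when the unit is persistently enabled; any other state (disabled,
# static, masked, empty, ...) prints "no" and is worth a closer look.
# (Hypothetical helper; the unit name "kubelet" is an assumption.)
kubelet_autostart() {
    case "$1" in
        enabled) echo yes ;;
        *)       echo no ;;
    esac
}

# Usage on the node:
#   kubelet_autostart "$(systemctl is-enabled kubelet 2>/dev/null)"
#   systemctl status kubelet    # is it actually running after the reboot?
```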

fejta-bot · 2017-12-25T02:40:39Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot · 2018-01-24T03:28:26Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-02-23T03:34:49Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 21, 2017

resouer added the area/kubelet label May 21, 2017

sjezewski mentioned this issue Aug 4, 2017

GPU bootstrap method not setting capacity kubernetes/kops#2493

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2018

k8s-ci-robot closed this as completed Feb 23, 2018
