Node that is restarted never reconnects to cluster #45753

Closed
sjezewski opened this issue May 12, 2017 · 6 comments
Labels: area/kubelet, lifecycle/rotten, sig/node

Comments

@sjezewski

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): restart node


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a): Linux ip-172-20-34-246 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux
  • Install tools: kops Version 1.6.0-beta.1 (git-77f222d)
  • Others:

What happened:

As mentioned here I need to restart a node as part of the GPU nvidia driver installation process.

However, after a restart (either via /sbin/shutdown -r or via the AWS UI), the node never rejoins the k8s cluster (it never shows up in the output of kubectl get nodes) unless I kill the API server pod, e.g.:

$ kubectl --namespace=kube-system delete po/kube-apiserver-ip-1-2-3-4.us-west-2.compute.internal

After that, it takes ~2-3 min for the node to show up again, but it does show up in the output of kubectl get nodes.

I don't think it's just a matter of waiting. I've waited an hour after a restart and the node never reappeared. It seems I must kill the API server pod for the node to be detected again.

What you expected to happen:

After a node restart, the node would appear ready and part of the k8s cluster according to kubectl get nodes
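
To make "never shows up" versus "shows up after a few minutes" easier to measure, a polling loop along these lines could be used. This is only a sketch: the function names node_status and wait_for_node, the default timeout, and the polling interval are hypothetical choices, and it assumes STATUS is the second column of kubectl get nodes output.

```shell
# Sketch only: node_status and wait_for_node are hypothetical helper names.
# Assumes STATUS is the second column of `kubectl get nodes --no-headers`.

# Print the STATUS of the named node, or nothing if the node is not listed.
node_status() {
  kubectl get nodes --no-headers 2>/dev/null | awk -v n="$1" '$1 == n { print $2 }'
}

# Poll until the node reports Ready, or give up after a timeout (seconds).
wait_for_node() {
  node="$1"
  timeout="${2:-300}"   # assumed default: 5 minutes
  interval="${3:-10}"   # assumed default: poll every 10 seconds
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status="$(node_status "$node")"
    case "$status" in
      # Matches "Ready" and "Ready,SchedulingDisabled", but not "NotReady".
      Ready*) echo "$node is Ready after ${elapsed}s"; return 0 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$node not Ready after ${timeout}s (last status: ${status:-absent})" >&2
  return 1
}
```

For example, wait_for_node ip-1-2-3-4.us-west-2.compute.internal 3600 60 would poll once a minute for an hour, which matches the wait described above.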

How to reproduce it (as minimally and precisely as possible):

I believe it's a matter of just restarting any VM. I've only tested on AWS though.

Anything else we need to know:

@huangjiasingle

@sjezewski please use the same version for the client and the server.

@0xmichalis
Contributor

@kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 21, 2017
@resouer
Contributor

resouer commented May 24, 2017

Can you make sure kubelet is auto started after your VM is rebooted?
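
One way to check this on the rebooted node, assuming the kubelet runs as a systemd unit named kubelet (an assumption; the report does not say how it is managed on this kops install), is a small helper like the following. The function name check_unit is hypothetical.

```shell
# Sketch only: check_unit is a hypothetical helper name.
# Assumes the kubelet is managed by systemd under the unit name "kubelet".

# Report whether a unit is enabled at boot and currently running.
# Returns 0 only when both hold.
check_unit() {
  unit="${1:-kubelet}"
  # Both subcommands exit non-zero for "disabled"/"inactive", so ignore the exit code.
  enabled="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
  active="$(systemctl is-active "$unit" 2>/dev/null || true)"
  echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
  [ "$enabled" = "enabled" ] && [ "$active" = "active" ]
}
```

If check_unit kubelet fails, journalctl -u kubelet -b --no-pager shows the kubelet logs since the last boot, and systemctl enable kubelet makes the unit start on the next reboot.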

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


6 participants