kops-controller stuck - node labels missing #8861

Closed
zetaab opened this issue Apr 7, 2020 · 0 comments · Fixed by #8862
Labels
kind/bug Categorizes issue or PR as related to a bug.

zetaab commented Apr 7, 2020

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.17.0 alpha 4

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.17.3

3. What cloud provider are you using?

AWS and OpenStack

I run a fairly large number of kops clusters, and at seemingly random times I see that kops-controller is stuck. As a result, new worker nodes created by a rolling update do not get the correct node labels:

% kubectl get nodes
NAME                                STATUS   ROLES    AGE     VERSION
master-zone-1-1-1-rofa1-k8s-local   Ready    master   7d19h   v1.17.3
master-zone-2-1-1-rofa1-k8s-local   Ready    master   4d21h   v1.17.3
master-zone-3-1-1-rofa1-k8s-local   Ready    master   14h     v1.17.3
nodes-1-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
nodes-2-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
nodes-3-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
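
The <none> entries in the ROLES column are the visible symptom here: kubectl fills that column from node-role.kubernetes.io/<role> labels, which suggests the role label was never applied to the three worker nodes. A minimal sketch for spotting such nodes in saved `kubectl get nodes` output could look like the following. The function name `nodes_without_role` is made up for this illustration, and the parsing assumes the default column layout with no spaces inside any field.

```python
from typing import List


def nodes_without_role(kubectl_output: str) -> List[str]:
    """Return the names of nodes whose ROLES column reads "<none>"."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    roles_col = header.index("ROLES")

    missing = []
    for line in lines[1:]:
        fields = line.split()
        if fields[roles_col] == "<none>":
            missing.append(fields[name_col])
    return missing


# Output copied from this report.
SAMPLE = """\
NAME                                STATUS   ROLES    AGE     VERSION
master-zone-1-1-1-rofa1-k8s-local   Ready    master   7d19h   v1.17.3
master-zone-2-1-1-rofa1-k8s-local   Ready    master   4d21h   v1.17.3
master-zone-3-1-1-rofa1-k8s-local   Ready    master   14h     v1.17.3
nodes-1-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
nodes-2-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
nodes-3-rofa1-k8s-local             Ready    <none>   14h     v1.17.3
"""

for name in nodes_without_role(SAMPLE):
    print(name)
```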

I would expect kops-controller to have working liveness and readiness probes that verify it is actually healthy.
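
As a rough illustration of what such a probe could check, the sketch below tracks consecutive reconcile failures per node and reports unhealthy once any node crosses a threshold. This is not how kops-controller works and not the fix that was merged. The class `ReconcileHealth`, its methods, and the threshold of 3 are all assumptions made up for this example, and it is written in Python for brevity although kops-controller itself is written in Go. Tracking per node matters because, in the log excerpt in this report, one node keeps reconciling successfully while the others fail, so a single shared counter would keep being reset.

```python
from collections import defaultdict


class ReconcileHealth:
    """Hypothetical health tracker: unhealthy when any node keeps failing."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self._failures = defaultdict(int)  # node name -> consecutive failures

    def record(self, node: str, ok: bool) -> None:
        """Record one reconcile outcome; a success resets that node's streak."""
        if ok:
            self._failures[node] = 0
        else:
            self._failures[node] += 1

    def healthy(self) -> bool:
        """True while every node is below the failure threshold."""
        return all(
            count < self.max_consecutive_failures
            for count in self._failures.values()
        )


health = ReconcileHealth(max_consecutive_failures=3)
health.record("master-zone-1-1-1-rofa1-k8s-local", ok=True)
for _ in range(3):
    health.record("nodes-1-rofa1-k8s-local", ok=False)

print(health.healthy())  # False: nodes-1 has failed three times in a row
```

A liveness endpoint backed by a check like `healthy()` would let the kubelet restart the pod automatically, which is the manual workaround described at the end of this report.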

The logs of the active kops-controller leader show the following:

E0407 04:36:27.040917       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-1-rofa1-k8s-local: error identifying node \"nodes-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-1-rofa1-k8s-local"}
E0407 04:36:28.062881       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-2-1-1-rofa1-k8s-local: error identifying node \"master-zone-2-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-2-1-1-rofa1-k8s-local"}
E0407 04:36:57.079896       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-3-rofa1-k8s-local: error identifying node \"nodes-3-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-3-rofa1-k8s-local"}
I0407 04:37:15.906681       1 controller.go:242] controller-runtime/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-1-1-1-rofa1-k8s-local"}
E0407 04:37:45.394319       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-2-rofa1-k8s-local: error identifying node \"nodes-2-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-2-rofa1-k8s-local"}
E0407 04:37:46.418744       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-2-1-1-rofa1-k8s-local: error identifying node \"master-zone-2-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-2-1-1-rofa1-k8s-local"}
E0407 04:38:25.014021       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-1-rofa1-k8s-local: error identifying node \"nodes-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-1-rofa1-k8s-local"}
E0407 04:41:01.864712       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-3-1-1-rofa1-k8s-local: error identifying node \"master-zone-3-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-3-1-1-rofa1-k8s-local"}
E0407 04:41:16.665254       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-3-rofa1-k8s-local: error identifying node \"nodes-3-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-3-rofa1-k8s-local"}
E0407 04:41:28.380557       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-1-rofa1-k8s-local: error identifying node \"nodes-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-1-rofa1-k8s-local"}
E0407 04:41:29.398740       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-2-1-1-rofa1-k8s-local: error identifying node \"master-zone-2-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-2-1-1-rofa1-k8s-local"}
E0407 04:41:59.333181       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-3-rofa1-k8s-local: error identifying node \"nodes-3-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-3-rofa1-k8s-local"}
I0407 04:42:17.355032       1 controller.go:242] controller-runtime/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-1-1-1-rofa1-k8s-local"}
E0407 04:42:47.103931       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-2-rofa1-k8s-local: error identifying node \"nodes-2-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-2-rofa1-k8s-local"}
E0407 04:42:59.379186       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-2-rofa1-k8s-local: error identifying node \"nodes-2-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-2-rofa1-k8s-local"}
E0407 04:44:07.741343       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-3-1-1-rofa1-k8s-local: error identifying node \"master-zone-3-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-3-1-1-rofa1-k8s-local"}
E0407 04:46:04.276976       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-3-1-1-rofa1-k8s-local: error identifying node \"master-zone-3-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-3-1-1-rofa1-k8s-local"}
E0407 04:46:29.645245       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-1-rofa1-k8s-local: error identifying node \"nodes-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-1-rofa1-k8s-local"}
E0407 04:46:31.241729       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-2-1-1-rofa1-k8s-local: error identifying node \"master-zone-2-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-2-1-1-rofa1-k8s-local"}
E0407 04:47:00.475174       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-3-rofa1-k8s-local: error identifying node \"nodes-3-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-3-rofa1-k8s-local"}
I0407 04:47:18.835877       1 controller.go:242] controller-runtime/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-1-1-1-rofa1-k8s-local"}
E0407 04:47:48.968532       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-2-rofa1-k8s-local: error identifying node \"nodes-2-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-2-rofa1-k8s-local"}
E0407 04:51:05.104917       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-3-1-1-rofa1-k8s-local: error identifying node \"master-zone-3-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-3-1-1-rofa1-k8s-local"}
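
To make the pattern in these logs easier to see, a small script can tally reconcile errors and successes per node. The sketch below is only a triage aid; the function name `tally` and the regular expression are assumptions based on the line format shown above, and the sample contains three lines copied verbatim from this report.

```python
import re
from collections import Counter

# Extracts the node name from the "request" field of a controller-runtime line.
REQUEST_NAME = re.compile(
    r'"request"=\{"Namespace":"[^"]*","Name":"([^"]+)"\}'
)


def tally(log_text):
    """Return (errors, successes) counters keyed by node name."""
    errors, successes = Counter(), Counter()
    for line in log_text.splitlines():
        match = REQUEST_NAME.search(line)
        if not match:
            continue
        node = match.group(1)
        if '"msg"="Reconciler error"' in line:
            errors[node] += 1
        elif '"msg"="Successfully Reconciled"' in line:
            successes[node] += 1
    return errors, successes


# Raw string so the escaped quotes in the log lines are kept as-is.
SAMPLE_LOG = r'''
E0407 04:36:27.040917       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node nodes-1-rofa1-k8s-local: error identifying node \"nodes-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"nodes-1-rofa1-k8s-local"}
E0407 04:36:28.062881       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load instance group object for node master-zone-2-1-1-rofa1-k8s-local: error identifying node \"master-zone-2-1-1-rofa1-k8s-local\": Authentication failed"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-2-1-1-rofa1-k8s-local"}
I0407 04:37:15.906681       1 controller.go:242] controller-runtime/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="node" "request"={"Namespace":"","Name":"master-zone-1-1-1-rofa1-k8s-local"}
'''

errors, successes = tally(SAMPLE_LOG)
for node, count in sorted(errors.items()):
    print(f"{node}: {count} error(s)")
for node, count in sorted(successes.items()):
    print(f"{node}: {count} success(es)")
```

Run against the full excerpt, a tally like this shows that only master-zone-1-1-1-rofa1-k8s-local is ever reconciled successfully, while every other node fails with "Authentication failed".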

If I restart this pod, everything starts working again.

cc @justinsb

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 7, 2020