
kubelet not restarting pod from static manifest (v1.5.2) #40331

Closed
colemickens opened this issue Jan 23, 2017 · 19 comments
Labels: area/usability, sig/node

@colemickens (Contributor)

Kubernetes 1.5.2. kubelet runs under docker with static manifests for kube-{apiserver,controller-manager,scheduler}.

I just tweaked the --v flag on my controller-manager manifest:

  • kubelet killed the running instance
  • kubelet failed to start a new instance

Even after sudo systemctl restart kubelet, I can see kubelet die and come back up in a new container, but it still doesn't restart kube-controller-manager.

It seems like it might be getting confused by a dotfile created by vim, but I don't understand why it would fail to parse the swap file as a manifest and then proceed to tear down the controller-manager container it was running...

Jan 23 23:26:04 k8s-master-06481525-0 docker[45689]: E0123 23:26:04.872921   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp": /etc/kubernetes/manifests/.kube-controller-manager.yaml.swp: read 'b0VIM 7.4
Jan 23 23:26:04 k8s-master-06481525-0 docker[45689]: E0123 23:26:04.872949   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swx": open /etc/kubernetes/manifests/.kube-controller-manager.yaml.swx: no such file or directory
Jan 23 23:26:04 k8s-master-06481525-0 docker[45689]: E0123 23:26:04.872994   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp": /etc/kubernetes/manifests/.kube-controller-manager.yaml.swp: read 'b0VIM 7.4
Jan 23 23:26:04 k8s-master-06481525-0 docker[45689]: E0123 23:26:04.873027   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp": /etc/kubernetes/manifests/.kube-controller-manager.yaml.swp: read 'b0VIM 7.4
Jan 23 23:26:05 k8s-master-06481525-0 docker[45689]: I0123 23:26:05.028598   45713 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
Jan 23 23:26:05 k8s-master-06481525-0 docker[45689]: I0123 23:26:05.028628   45713 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
Jan 23 23:26:05 k8s-master-06481525-0 docker[45689]: I0123 23:26:05.050341   45713 server.go:740] GET /stats/summary/: (36.197392ms) 200 [[Go-http-client/1.1] 10.244.1.2:54538]
Jan 23 23:26:07 k8s-master-06481525-0 docker[45689]: I0123 23:26:07.125959   45713 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
Jan 23 23:26:07 k8s-master-06481525-0 docker[45689]: I0123 23:26:07.125979   45713 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
Jan 23 23:26:08 k8s-master-06481525-0 docker[45689]: E0123 23:26:08.529693   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp": /etc/kubernetes/manifests/.kube-controller-manager.yaml.swp: read 'b0VIM 7.4
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: E0123 23:26:10.112774   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/4913": open /etc/kubernetes/manifests/4913: no such file or directory
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.113201   45713 kubelet.go:1791] SyncLoop (REMOVE, "file"): "kube-controller-manager-k8s-master-06481525-0_kube-system(eba7e523b11000e345f2732f3fbece65)"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.116086   45713 kubelet.go:1781] SyncLoop (ADD, "file"): "kube-controller-manager-k8s-master-06481525-0_kube-system(eba7e523b11000e345f2732f3fbece65)"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.116180   45713 kubelet_pods.go:785] Killing unwanted pod "kube-controller-manager-k8s-master-06481525-0"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.116265   45713 docker_manager.go:1564] Killing container "d31a249743c02b04f4d75df0f5f92e2d6e6b05716fca896638d474081d0ff07b kube-controller-manager kube-system/kube-controller-manager-k8s-master-06481525-0" with 30 second grace period
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.116879   45713 kubelet.go:1788] SyncLoop (UPDATE, "file"): "kube-controller-manager-k8s-master-06481525-0_kube-system(eba7e523b11000e345f2732f3fbece65)"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: E0123 23:26:10.132943   45713 file_linux.go:107] can't process config file "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp": /etc/kubernetes/manifests/.kube-controller-manager.yaml.swp: read 'b0VIM 7.4
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.133100   45713 kubelet.go:1791] SyncLoop (REMOVE, "file"): "kube-controller-manager-k8s-master-06481525-0_kube-system(eba7e523b11000e345f2732f3fbece65)"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.195920   45713 docker_manager.go:1605] Container "d31a249743c02b04f4d75df0f5f92e2d6e6b05716fca896638d474081d0ff07b kube-controller-manager kube-system/kube-controller-manager-k8s-master-06481525-0" exited after 79.625032ms
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.197874   45713 docker_manager.go:1564] Killing container "4cfb50357530c73c278b9f58b96a9947d0abf86ba97f1154bd5be175701e8008 kube-system/kube-controller-manager-k8s-master-06481525-0" with 30 second grace period
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.430090   45713 docker_manager.go:1605] Container "4cfb50357530c73c278b9f58b96a9947d0abf86ba97f1154bd5be175701e8008 kube-system/kube-controller-manager-k8s-master-06481525-0" exited after 232.150122ms
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.431798   45713 mirror_client.go:84] Deleting a mirror pod "kube-controller-manager-k8s-master-06481525-0_kube-system"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.462535   45713 kubelet.go:1797] SyncLoop (DELETE, "api"): "kube-controller-manager-k8s-master-06481525-0_kube-system(f9e774b7-e1c2-11e6-a240-000d3a309df9)"
Jan 23 23:26:10 k8s-master-06481525-0 docker[45689]: I0123 23:26:10.463488   45713 kubelet.go:1791] SyncLoop (REMOVE, "api"): "kube-controller-manager-k8s-master-06481525-0_kube-system(f9e774b7-e1c2-11e6-a240-000d3a309df9)"
Jan 23 23:26:11 k8s-master-06481525-0 docker[45689]: I0123 23:26:11.392318   45713 kubelet_node_status.go:412] Using Node Hostname from cloudprovider: "k8s-master-06481525-0"
Jan 23 23:26:12 k8s-master-06481525-0 docker[45689]: I0123 23:26:12.230654   45713 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/host-path/eba7e523b11000e345f2732f3fbece65-var-lib-kubelet" (spec.Name: "var-lib-kubelet") from pod "eba7e523b11000e345f2732f3fbece65" (UID: "eba7e523b11000e345f2732f3fbece65").
Jan 23 23:26:12 k8s-master-06481525-0 docker[45689]: I0123 23:26:12.230728   45713 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/host-path/eba7e523b11000e345f2732f3fbece65-etc-kubernetes" (spec.Name: "etc-kubernetes") from pod "eba7e523b11000e345f2732f3fbece65" (UID: "eba7e523b11000e345f2732f3fbece65").
Jan 23 23:26:12 k8s-master-06481525-0 docker[45689]: I0123 23:26:12.230803   45713 operation_executor.go:992] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/eba7e523b11000e345f2732f3fbece65-var-lib-kubelet" (OuterVolumeSpecName: "var-lib-kubelet") pod "eba7e523b11000e345f2732f3fbece65" (UID: "eba7e523b11000e345f2732f3fbece65"). InnerVolumeSpecName "var-lib-kubelet". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Jan 23 23:26:12 k8s-master-06481525-0 docker[45689]: I0123 23:26:12.230834   45713 operation_executor.go:992] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/eba7e523b11000e345f2732f3fbece65-etc-kubernetes" (OuterVolumeSpecName: "etc-kubernetes") pod "eba7e523b11000e345f2732f3fbece65" (UID: "eba7e523b11000e345f2732f3fbece65"). InnerVolumeSpecName "etc-kubernetes". PluginName "kubernetes.io/host-path", VolumeGidValue ""

Full kubelet log including a couple different attempts to modify the manifest and/or restart kubelet: http://termbin.com/xr12

@colemickens changed the title from "kubelet not restarting pod from static manifest" to "kubelet not restarting pod from static manifest (v1.5.2)" on Jan 23, 2017
@nfons commented Jan 25, 2017

+1 to this, as noted in #40109
and kubernetes/kubeadm#106

@colemickens (Contributor, Author)

I've now seen this happen repeatedly, and have had colleagues report it as well.

@nfons commented Feb 3, 2017

@colemickens hopefully you saw my "fix"

@alexbrand (Contributor) commented Feb 21, 2017

+1. Just ran into this as well. Restarting the kubelet seems to get around it.

@keontang (Contributor) commented Mar 2, 2017

I have also encountered the same problem on Azure and Aliyun cloud.

When I added

    - --cloud-provider=azure
    - --cloud-config=/etc/kubernetes/cloud-config.json

into hyperkube-controller.yaml with vim, the controller pod did not come back up until I restarted kubelet.

@smarterclayton (Contributor)

I've seen this as well.

E0506 22:15:17.842608   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": open /etc/origin/manifests/.controller.yaml.swp: no such file or directory
E0506 22:15:17.842649   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swx": open /etc/origin/manifests/.controller.yaml.swx: no such file or directory
E0506 22:15:17.842776   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.ya3210#"! U', but couldn't parse as pod(yaml: control characters are not allowed)./controller.yaml
E0506 22:15:17.843270   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.ya3210#"! U', but couldn't parse as pod(yaml: control characters are not allowed)./controller.yaml
E0506 22:15:21.173467   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.yaU3210#"! U', but couldn't parse as pod(yaml: control characters are not allowed).controller.yamlutf-8
E0506 22:15:31.290957   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.ya�������uk_R@$�������{-          name: config      path: /data/src/github.com/openshift/origin/openshift.local.test/master  - hostPath:  volumes:    -     ports:      mountPath: /etc/origin/master    - name: config    volumeMounts:    - --config=/etc/origin/master/master-config.yaml    - --listen=0.0.0.0:8444    - controllers    - master    - start    args:    image: openshift/origin:v3.6.0-alpha.1  - name: controller  containers:spec:  name: controllersmetadata:kind: PodapiVersion: v1', but couldn't parse as pod(yaml: control characters are not allowed).
E0506 22:16:02.035084   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/4913": open /etc/origin/manifests/4913: no such file or directory
E0506 22:16:02.040717   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.ya�������uk_R@$�������{-          name: config      path: /data/src/github.com/openshift/origin/openshift.local.test/master  - hostPath:  volumes:    -     ports:      mountPath: /etc/origin/master    - name: config    volumeMounts:    - --config=/etc/origin/master/master-config.yaml    - --listen=0.0.0.0:8444    - controllers    - master    - start    args:    image: openshift/origin:v3.6.0-alpha.1  - name: controller  containers:spec:  name: controllersmetadata:kind: PodapiVersion: v1', but couldn't parse as pod(yaml: control characters are not allowed).
W0506 22:16:02.059635   15282 kubelet.go:1529] Deleting mirror pod "controllers-localhost.localdomain_default(62ca911f-32a9-11e7-9467-080027893417)" because it is outdated
W

I think there's a bug where we exit early from the manifest loop and then the entire goroutine wedges or dies.
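
A minimal sketch of the failure mode being described, assuming a simplified file-source loop (this is not the kubelet's actual source; the directory path, poll interval, and function names are made up for the example). If the scan returns on the first unparsable file, every later manifest in the directory is silently dropped on that tick; logging and continuing keeps the loop alive.

```go
// Illustrative only: a static-manifest scan loop. The "buggy" variant would
// bail out of the whole scan as soon as it hits a file it cannot parse
// (such as a vim .swp file), so the remaining manifests never get processed.
package main

import (
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// processManifest stands in for YAML decoding; hidden editor files fail here.
func processManifest(path string) error {
	if strings.HasPrefix(filepath.Base(path), ".") {
		return os.ErrInvalid
	}
	return nil
}

func scanManifests(dir string) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		log.Printf("can't read manifest dir %q: %v", dir, err)
		return
	}
	for _, e := range entries {
		path := filepath.Join(dir, e.Name())
		if err := processManifest(path); err != nil {
			// Buggy variant: a bare `return` here would abort the rest of
			// the scan and, if the caller gives up, wedge the goroutine.
			log.Printf("can't process config file %q: %v", path, err)
			continue
		}
	}
}

func main() {
	// Re-scan the directory periodically, like a file-based pod source.
	for {
		scanManifests("/etc/kubernetes/manifests")
		time.Sleep(20 * time.Second)
	}
}
```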

@smarterclayton added the area/usability and sig/node labels on May 6, 2017
@smarterclayton (Contributor)

Makes it hard to debug static pods.

@ericchiang (Contributor)

E0506 22:15:17.842608   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": open /etc/origin/manifests/.controller.yaml.swp: no such file or directory
E0506 22:15:17.842776   15282 file_linux.go:113] can't process config file "/etc/origin/manifests/.controller.yaml.swp": /etc/origin/manifests/.controller.ya3210#"! U', but couldn't parse as pod(yaml: control characters are not allowed)./controller.yaml

The kubelet should be ignoring files prefixed with a dot. Seems like this might be a bug? cc @kubernetes/sig-node-bugs
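
For illustration, the kind of filtering being suggested might look like the sketch below. This is only a rough approximation, assuming the filter runs before any parse attempt; it is not the code from #45111, and the function name and extension whitelist are invented for the example.

```go
// Sketch: skip hidden files and common editor artifacts in the static-pod
// directory before trying to parse them as manifests.
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldIgnoreManifest is a hypothetical helper, not a kubelet function.
func shouldIgnoreManifest(path string) bool {
	name := filepath.Base(path)
	switch {
	case strings.HasPrefix(name, "."): // vim swap files: .foo.yaml.swp, .foo.yaml.swx
		return true
	case strings.HasSuffix(name, "~"): // editor backup files
		return true
	default:
		// Only accept extensions we expect to contain pod manifests.
		ext := filepath.Ext(name)
		return ext != ".yaml" && ext != ".yml" && ext != ".json"
	}
}

func main() {
	for _, p := range []string{
		"/etc/kubernetes/manifests/kube-controller-manager.yaml",
		"/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp",
		"/etc/kubernetes/manifests/4913", // vim's write-permission test file
	} {
		fmt.Printf("%-60s ignore=%v\n", p, shouldIgnoreManifest(p))
	}
}
```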

@xiangpengzhao (Contributor)

I remember there was a PR that fixed this. Let me find it.

@xiangpengzhao (Contributor)

Similar issue: #44450. Fixed in #45111, which has the 1.7 milestone.

@xiangpengzhao (Contributor)

Another related issue: #46171.

@yujuhong (Contributor) commented Jun 5, 2017

#45111 fixed the bug where the kubelet did not properly ignore the "." files. That PR was not cherry-picked to 1.6.

By the way, the kubelet does not include the filename or source when generating the pod UID, so the temporary file generated by the editor can have the same UID as the original pod. Removing the temp file then causes the pod to be removed as well. This is part of what causes the confusion. I have a fix for this (#43420), but it's a pre-1.0 bug. Fixing it would cause pods to be restarted when upgrading the kubelet in place, and I'm not sure how much value it adds since the dot-file issue has been fixed.
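
To illustrate the UID collision described above: if the UID is derived only from the manifest content, an editor's copy with the same bytes hashes to the same UID as the original, so a REMOVE event for the copy also matches the real static pod, while mixing the source path into the hash keeps them distinct. The hashing scheme below is an assumption made for the example, not the kubelet's actual implementation.

```go
// Sketch of content-only vs. content+source UIDs for static pods.
package main

import (
	"crypto/sha256"
	"fmt"
)

// uidFromContent derives a UID from the manifest bytes alone (illustrative).
func uidFromContent(manifest []byte) string {
	sum := sha256.Sum256(manifest)
	return fmt.Sprintf("%x", sum[:8])
}

// uidFromContentAndSource also mixes in the file path the manifest came from.
func uidFromContentAndSource(manifest []byte, source string) string {
	h := sha256.New()
	h.Write([]byte(source))
	h.Write(manifest)
	return fmt.Sprintf("%x", h.Sum(nil)[:8])
}

func main() {
	original := []byte("apiVersion: v1\nkind: Pod\nmetadata:\n  name: kube-controller-manager\n")
	editorCopy := append([]byte(nil), original...) // temp file with identical bytes

	// Content-only UIDs collide, so removing the temp file looks like
	// removing the real pod.
	fmt.Println(uidFromContent(original) == uidFromContent(editorCopy)) // true

	// Including the source path keeps the two distinct.
	a := uidFromContentAndSource(original, "/etc/kubernetes/manifests/kube-controller-manager.yaml")
	b := uidFromContentAndSource(editorCopy, "/etc/kubernetes/manifests/.kube-controller-manager.yaml.swp")
	fmt.Println(a == b) // false
}
```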

@xiangpengzhao (Contributor)

ping @thockin for closing this fixed issue.

@yujuhong (Contributor)

ping @thockin for closing this fixed issue.

Uh...why @thockin out of the blue...

@xiangpengzhao (Contributor)

When I helped close some other stale issues a few days ago, I pinged Tim to batch-close them, so I pinged him again here :)

@yujuhong (Contributor)

When I helped close some other stale issues a few days ago, I pinged Tim to batch-close them, so I pinged him again here :)

I think it makes more sense to ping the people who are already involved (e.g., on this thread) to close the issue. I don't think we should treat @thockin like a bot ;)

@xiangpengzhao (Contributor)

:D

@yujuhong (Contributor)

:D

Oh I checked with @thockin and he did sign up to help close issues like this to encourage people to triage issues :-|

@xiangpengzhao (Contributor)

Thanks Yuju!
