kubelet SIGSEGV: panic: runtime error: invalid memory address or nil pointer dereference #59969

Closed
jforman opened this Issue Feb 16, 2018 · 7 comments

jforman commented Feb 16, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

A completely new k8s v1.9.3 cluster (3 controller nodes, 5 worker nodes), built following kelseyhightower's kubernetes-the-hard-way tutorial and running on CoreOS Beta, shows the following kubelet crash on all workers. The container runtime is rkt (using the rktlet implementation from [1]). No pods have been installed.

1: https://github.com/kubernetes-incubator/rktlet

Feb 16 11:43:48 corea-worker0.obfuscated.doman.net kubelet[8851]: I0216 11:43:48.995028 8851 manager.go:1027] Destroyed container: "/system.slice/etcd-member.service" (aliases: [], namespace: "")
Feb 16 11:43:48 corea-worker0.obfuscated.doman.net kubelet[8851]: I0216 11:43:48.996088 8851 handler.go:325] Added event &{/system.slice/etcd-member.service 2018-02-16 11:43:48.996072745 +0000 UTC m=+12.554381793 containerDeletion {}}
Feb 16 11:43:48 corea-worker0.obfuscated.doman.net kubelet[8851]: I0216 11:43:48.997557 8851 factory.go:112] Using factory "rkt" for container "/system.slice/etcd-member.service"
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: I0216 11:43:49.000003 8851 manager.go:970] Added container: "/system.slice/etcd-member.service" (aliases: [ rkt://055ea29c-6312-4c9f-9b9f-dae188494863], namespace: "rkt")
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: I0216 11:43:49.000422 8851 handler.go:325] Added event &{/system.slice/etcd-member.service 2018-02-16 11:41:17.261966082 +0000 UTC containerCreation {}}
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: panic: runtime error: invalid memory address or nil pointer dereference
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1906a40]
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: goroutine 317 [running]:
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/rkt.(*rktContainerHandler).Start(0xc4203b75e0)
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/rkt/handler.go:186 +0x30
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).housekeeping(0xc4204ab680)
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:429 +0x4b
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: created by k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).Start
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net kubelet[8851]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:106 +0x3f
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net rkt[727]: 2018/02/16 11:43:49 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:15441->127.0.0.1:47244: read: connection reset by peer
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net systemd[1]: kubelet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 16 11:43:49 corea-worker0.obfuscated.doman.net systemd[1]: kubelet.service: Failed with result 'exit-code'.
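
The trace points at (*rktContainerHandler).Start in cadvisor's rkt code (handler.go:186), called from the housekeeping goroutine; a fault address of 0x20 is typical of reading a field through a nil pointer. A minimal sketch of that failure pattern, using hypothetical types rather than the real cadvisor code, reproduces the same panic and exit behaviour:

```go
package main

// Hypothetical types sketching the failure pattern suggested by the trace:
// a handler is built without one of its collaborators, and the first method
// call on that nil field, made from a background goroutine, brings the whole
// process down. This is NOT the actual cadvisor rkt handler.

type cgroupWatcher struct {
	path string
}

// Watch reads a field through the receiver; calling it on a nil
// *cgroupWatcher panics with a nil pointer dereference.
func (w *cgroupWatcher) Watch() string {
	return w.path
}

type rktHandler struct {
	watcher *cgroupWatcher // never assigned on this code path
}

// Start mirrors the shape of handler.Start() in the stack trace.
func (h *rktHandler) Start() {
	h.watcher.Watch() // panic: runtime error: invalid memory address or nil pointer dereference
}

func main() {
	h := &rktHandler{} // watcher left nil
	done := make(chan struct{})
	go func() { // analogous to the housekeeping goroutine in the trace
		h.Start() // panics; the whole process exits before done is closed
		close(done)
	}()
	<-done
}
```

Run as-is, the sketch exits with status 2 and prints the same "panic: runtime error: invalid memory address or nil pointer dereference" and "[signal SIGSEGV: segmentation violation ...]" lines, matching what systemd reports for kubelet.service above.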

What you expected to happen:

The kubelet should not crash and should stay running.

How to reproduce it (as minimally and precisely as possible):

Re-install a fresh k8s cluster. No pods, nothing fancy.

Anything else we need to know?:

core@corea-worker0 ~ $ systemctl cat kubelet.service

/etc/systemd/system/kubelet.service

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/opt/kubernetes/bin/kubelet \
  --alsologtostderr \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --cgroup-driver=systemd \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.122.0.10 \
  --cluster-domain=cluster.local \
  --cni-conf-dir=/etc/rkt/net.d/ \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/rktlet.sock \
  --image-pull-progress-deadline=2m \
  --image-service-endpoint=unix:///var/run/rktlet.sock \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --pod-cidr=10.123.58.0/24 \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/corea-worker0.obfuscated.doman.net.pem \
  --tls-private-key-file=/var/lib/kubelet/corea-worker0.obfuscated.doman.net-key.pem \
  --v=4
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
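
As a side note for anyone reproducing this: both --container-runtime-endpoint and --image-service-endpoint above point at /var/run/rktlet.sock. A quick way to confirm, independently of the kubelet, that rktlet is actually listening on that socket is a plain unix-socket dial; this is only a sketch using the standard library, with the socket path taken from the unit above:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Socket path taken from --container-runtime-endpoint in the unit above.
	const sock = "/var/run/rktlet.sock"

	conn, err := net.DialTimeout("unix", sock, 2*time.Second)
	if err != nil {
		fmt.Fprintf(os.Stderr, "rktlet not reachable at %s: %v\n", sock, err)
		os.Exit(1)
	}
	defer conn.Close()
	fmt.Printf("rktlet is listening on %s\n", sock)
}
```

This only checks that something is accepting connections on the socket; it does not exercise the CRI API itself.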

Environment:

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: CoreOS Beta channel running as VMs via libvirt on an Ubuntu LTS amd64 host.

  • OS (e.g. from /etc/os-release):
    core@corea-worker0 ~ $ cat /etc/os-release
    NAME="Container Linux by CoreOS"
    ID=coreos
    VERSION=1662.1.0
    VERSION_ID=1662.1.0
    BUILD_ID=2018-02-01-0107
    PRETTY_NAME="Container Linux by CoreOS 1662.1.0 (Ladybug)"
    ANSI_COLOR="38;5;75"
    HOME_URL="https://coreos.com/"
    BUG_REPORT_URL="https://issues.coreos.com"
    COREOS_BOARD="amd64-usr"

  • Kernel (e.g. uname -a):
    core@corea-worker0 ~ $ uname -a
    Linux corea-worker0.obfuscated.domain.net 4.14.15-coreos #1 SMP Thu Feb 1 00:43:57 UTC 2018 x86_64 Intel Core Processor (Haswell, no TSX) GenuineIntel GNU/Linux

  • Install tools:

  • Others:

jforman commented Feb 16, 2018

@kubernetes/sig-node-kubelet

jforman commented Feb 16, 2018

@kubernetes/sig-node-bugs

k8s-ci-robot added sig/node and removed needs-sig labels Feb 16, 2018

k8s-ci-robot commented Feb 16, 2018

@jforman: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

@kubernetes/sig-node-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jforman commented Feb 16, 2018

I think the nil reference (seen here):
Feb 16 13:47:01 corea-worker0.obfuscated.domain.net kubelet[11348]: I0216 13:47:01.505887 11348 handler.go:325] Added event &{/system.slice/etcd-member.service 2018-02-16 13:14:38.244696583 +0000 UTC containerCreation {}}

is coming from

glog.V(4).Infof("Added event %v", e)
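
For reference on what that log line shows: with Go's %v verb, a nil pointer prints as <nil> while an empty or zero-valued struct prints as {}. A tiny standalone sketch with a hypothetical event struct (mirroring the fields visible in the log line, not the real cadvisor type) produces the same shape of output:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-in for the event struct visible in the "Added event"
// log line; not the real cadvisor type.
type containerEvent struct {
	ContainerName string
	Timestamp     time.Time
	EventType     string
	EventData     struct{} // zero-valued struct field: %v renders it as {}
}

func main() {
	e := &containerEvent{
		ContainerName: "/system.slice/etcd-member.service",
		Timestamp:     time.Now(),
		EventType:     "containerCreation",
	}
	// Same formatting verb the glog call above uses.
	fmt.Printf("Added event %v\n", e)
	// Output shape: Added event &{/system.slice/etcd-member.service <timestamp> containerCreation {}}
}
```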

fejta-bot commented May 17, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot commented Jun 16, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot commented Jul 16, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
