kubelet - 1.10.3-00 - segmentation violation code=0x2 addr=0x14c7ba40 pc=0x14c7ba40 #64234

Closed
jaredledvina opened this Issue May 23, 2018 · 18 comments

jaredledvina commented May 23, 2018

/kind bug

What happened:
Brand new HypriotOS 1.9.0 install on a Raspberry Pi 3 Model B. Installed kubeadm, kubectl, and kubelet per https://kubernetes.io/docs/tasks/tools/install-kubeadm/. Running kubelet segfaults immediately with:

# kubelet
unexpected fault address 0x14c7ba40
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x14c7ba40 pc=0x14c7ba40]

goroutine 1 [running, locked to thread]:
runtime.throw(0x2a84a9e, 0x5)
    /usr/local/go/src/runtime/panic.go:605 +0x70 fp=0x15401e98 sp=0x15401e8c pc=0x3efa4
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:374 +0x1cc fp=0x15401ebc sp=0x15401e98 pc=0x5517c
k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.SemVer.Empty(...)
    /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:68
k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.NewSemVer(0x1501ad90, 0x20945b4, 0x2a8fbcf, 0xb, 0x14c75860)
    /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:41 +0x90 fp=0x15401f58 sp=0x15401ec0 pc=0x206c8d8

goroutine 19 [chan receive]:
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x4551f48)
    /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:879 +0x70
created by k8s.io/kubernetes/vendor/github.com/golang/glog.init.0
    /workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:410 +0x1a0

goroutine 74 [syscall]:
os/signal.signal_recv(0x2bd146c)
    /usr/local/go/src/runtime/sigqueue.go:131 +0x134
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x30
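
A note on reading the signal line in the trace above: addr and pc are identical, which means the fault was raised while fetching an instruction at that address, not while loading or storing data. On Linux, si_code 0x2 for SIGSEGV is SEGV_ACCERR (invalid permissions for a mapped object). Taken together, this suggests the process jumped to a mapped but non-executable address; it does not by itself identify the cause. Below is a minimal Go sketch for pulling those fields out of such a line. The names faultInfo, parseFault, and fetchFault are hypothetical helpers made up for this illustration and are not part of kubelet or the Go runtime.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// faultInfo holds the fields of a Go runtime "[signal ...]" crash line.
// Hypothetical type, used only for this illustration.
type faultInfo struct {
	Signal string
	Code   uint64 // si_code reported by the kernel
	Addr   uint64 // faulting address
	PC     uint64 // program counter at the time of the fault
}

// signalLine matches lines such as:
// [signal SIGSEGV: segmentation violation code=0x2 addr=0x14c7ba40 pc=0x14c7ba40]
var signalLine = regexp.MustCompile(
	`\[signal (\w+): .* code=(0x[0-9a-f]+) addr=(0x[0-9a-f]+) pc=(0x[0-9a-f]+)\]`)

// parseFault extracts the signal name, code, addr, and pc from a crash line.
func parseFault(line string) (faultInfo, error) {
	m := signalLine.FindStringSubmatch(line)
	if m == nil {
		return faultInfo{}, fmt.Errorf("not a signal line: %q", line)
	}
	var vals [3]uint64
	for i, s := range m[2:] {
		// Base 0 lets ParseUint honor the "0x" prefix.
		v, err := strconv.ParseUint(s, 0, 64)
		if err != nil {
			return faultInfo{}, err
		}
		vals[i] = v
	}
	return faultInfo{Signal: m[1], Code: vals[0], Addr: vals[1], PC: vals[2]}, nil
}

// fetchFault reports whether the faulting address equals the program
// counter, i.e. the fault happened on an instruction fetch.
func (f faultInfo) fetchFault() bool { return f.Addr == f.PC }

func main() {
	line := "[signal SIGSEGV: segmentation violation code=0x2 addr=0x14c7ba40 pc=0x14c7ba40]"
	f, err := parseFault(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("signal=%s code=%#x fetch_fault=%t\n", f.Signal, f.Code, f.fetchFault())
}
```

The same check applies to the second trace posted later in this thread, where addr and pc are again equal.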

What you expected to happen:
kubelet should start without segfaulting.

How to reproduce it (as minimally and precisely as possible):
I believe any new install of kubelet 1.10.3-00 on an ARMv7 host will hit this segfault.

Anything else we need to know?:
Downgrading to kubelet-1.10.2-00 resolves this issue:

HypriotOS/armv7: root@node-01 in ~
# aptitude install kubelet=1.10.2-00
The following packages will be DOWNGRADED:
  kubelet
0 packages upgraded, 0 newly installed, 1 downgraded, 0 to remove and 8 not upgraded.
Need to get 18.8 MB of archives. After unpacking 17.4 kB will be freed.
Get: 1 https://packages.cloud.google.com/apt kubernetes-xenial/main armhf kubelet armhf 1.10.2-00 [18.8 MB]
Fetched 18.8 MB in 4s (4,217 kB/s)
dpkg: warning: downgrading kubelet from 1.10.3-00 to 1.10.2-00
(Reading database ... 28358 files and directories currently installed.)
Preparing to unpack .../kubelet_1.10.2-00_armhf.deb ...
Unpacking kubelet (1.10.2-00) over (1.10.3-00) ...
Setting up kubelet (1.10.2-00) ...

Current status: 9 (+1) upgradable.
HypriotOS/armv7: root@node-01 in ~
# kubelet version
I0523 21:57:31.150300   30733 feature_gate.go:226] feature gates: &{{} map[]}
W0523 21:57:31.178232   30733 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
W0523 21:57:31.201477   30733 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
I0523 21:57:31.201662   30733 server.go:376] Version: v1.10.2
I0523 21:57:31.201928   30733 feature_gate.go:226] feature gates: &{{} map[]}
I0523 21:57:31.202449   30733 plugins.go:89] No cloud provider specified.
W0523 21:57:31.202662   30733 server.go:517] standalone mode, no API client
E0523 21:57:32.528095   30733 machine.go:194] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
W0523 21:57:32.534699   30733 server.go:433] No api server defined - no events will be sent to API server.
I0523 21:57:32.534825   30733 server.go:613] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
I0523 21:57:32.536163   30733 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
I0523 21:57:32.536235   30733 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true}
I0523 21:57:32.536786   30733 container_manager_linux.go:266] Creating device plugin manager: true
I0523 21:57:32.536935   30733 state_mem.go:36] [cpumanager] initializing new in-memory state store
I0523 21:57:32.537355   30733 state_mem.go:84] [cpumanager] updated default cpuset: ""
I0523 21:57:32.537417   30733 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
W0523 21:57:32.552003   30733 kubelet_network.go:139] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0523 21:57:32.552113   30733 kubelet.go:556] Hairpin mode set to "hairpin-veth"
I0523 21:57:32.558134   30733 client.go:75] Connecting to docker on unix:///var/run/docker.sock
I0523 21:57:32.558253   30733 client.go:104] Start docker client with request timeout=2m0s
W0523 21:57:32.563692   30733 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
W0523 21:57:32.578788   30733 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
I0523 21:57:32.585451   30733 docker_service.go:244] Docker cri networking managed by kubernetes.io/no-op
I0523 21:57:32.841211   30733 docker_service.go:249] Docker Info: &{ID:WVMA:DPEU:ROCE:6ZVD:B33N:3FEE:6SQ4:F3ZP:HHL7:FYAN:SJ45:VED6 Containers:4 ContainersRunning:0 ContainersPaused:0 ContainersStopped:4 Images:1 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:29 OomKillDisable:true NGoroutines:46 SystemTime:2018-05-23T21:57:32.805853873Z LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:4.14.34-hypriotos-v7+ OperatingSystem:Raspbian GNU/Linux 9 (stretch) OSType:linux Architecture:armv7l IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0x16244d00 NCPU:4 MemTotal:1024184320 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:node-01 Labels:[] ExperimentalBuild:false ServerVersion:18.04.0-ce ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil>} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:773c489c9c1b21a6d78b5c538cd395416ec50f88 Expected:773c489c9c1b21a6d78b5c538cd395416ec50f88} RuncCommit:{ID:4fc53a81fb7c994640722ac585fa9ca548971871 Expected:4fc53a81fb7c994640722ac585fa9ca548971871} InitCommit:{ID:949e6fa Expected:949e6fa} SecurityOptions:[name=seccomp,profile=default]}
I0523 21:57:32.841659   30733 docker_service.go:262] Setting cgroupDriver to cgroupfs
I0523 21:57:32.962625   30733 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
I0523 21:57:32.968666   30733 kuberuntime_manager.go:186] Container runtime docker initialized, version: 18.04.0-ce, apiVersion: 1.37.0
I0523 21:57:32.970512   30733 csi_plugin.go:61] kubernetes.io/csi: plugin initializing...
F0523 21:57:32.977761   30733 server.go:157] listen tcp 0.0.0.0:10255: bind: address already in use

Environment:

  • Kubernetes version: Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/arm"}
  • Cloud provider or hardware configuration: RaspberryPi 3 Model B
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
HYPRIOT_OS="HypriotOS/armhf"
HYPRIOT_OS_VERSION="v2.0.1"
HYPRIOT_DEVICE="Raspberry Pi"
HYPRIOT_IMAGE_VERSION="v1.9.0"
  • Kernel (e.g. uname -a): Linux node-01 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux

Happy to provide any additional information that can help, and I'm game to test fixes as well. All 5 of my Raspberry Pis hit this, and simply downgrading to 1.10.2 works, so I assume it's something funky with the 1.10.3 ARM package.

jaredledvina commented May 23, 2018

To add: the error at the end of the log above, F0523 21:57:32.977761 30733 server.go:157] listen tcp 0.0.0.0:10255: bind: address already in use, is expected. I included that output to show that, after downgrading to 1.10.2, kubelet starts and no longer segfaults immediately. Hopefully that clears things up.

jaredledvina commented May 23, 2018

Based on https://github.com/kubernetes/community/blob/master/sig-node/README.md, I think
@kubernetes/sig-node-bugs is the right SIG for this.

k8s-ci-robot added sig/node and removed needs-sig labels May 23, 2018

Contributor

k8s-ci-robot commented May 23, 2018

@jaredledvina: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

Based on https://github.com/kubernetes/community/blob/master/sig-node/README.md, I think
@kubernetes/sig-node-bugs is the right SIG for this.

edernucci commented May 26, 2018

Same here on Raspbian: kubelet crashes instantly on 1.10.3-00 armhf.

root@ryuk:~# kubelet 
unexpected fault address 0x15371620
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x15371620 pc=0x15371620]

goroutine 1 [running, locked to thread]:
runtime.throw(0x2a84a9e, 0x5)
	/usr/local/go/src/runtime/panic.go:605 +0x70 fp=0x15b3be98 sp=0x15b3be8c pc=0x3efa4
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:374 +0x1cc fp=0x15b3bebc sp=0x15b3be98 pc=0x5517c
k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.SemVer.Empty(...)
	/workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:68
k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types.NewSemVer(0x1530f318, 0x20945b4, 0x2a8fbcf, 0xb, 0x1536ec60)
	/workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/appc/spec/schema/types/semver.go:41 +0x90 fp=0x15b3bf58 sp=0x15b3bec0 pc=0x206c8d8

goroutine 5 [chan receive]:
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x4551f48)
	/workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:879 +0x70
created by k8s.io/kubernetes/vendor/github.com/golang/glog.init.0
	/workspace/anago-v1.10.3-beta.0.74+2bba0127d85d5a/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:410 +0x1a0

goroutine 54 [syscall]:
os/signal.signal_recv(0x2bd146c)
	/usr/local/go/src/runtime/sigqueue.go:131 +0x134
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x30

tvinhas commented May 27, 2018

Same on Ubuntu 16.04 on either Raspberry Pi 3 or oDroid XU4

darrenstarr commented May 28, 2018

Same problem here on

  • Raspberry Pi 3
  • Raspbian Stretch Lite 2018-04-18

I've tested on 4 Pis, same problem on each.

Relevant output from "apt list --installed":

docker-ce/stretch,now 18.05.0ce3-0~raspbian armhf [installed]
kubeadm/kubernetes-xenial,now 1.10.3-00 armhf [installed]
kubectl/kubernetes-xenial,now 1.10.3-00 armhf [installed]
kubelet/kubernetes-xenial,now 1.10.3-00 armhf [installed]
kubernetes-cni/kubernetes-xenial,now 0.6.0-00 armhf [installed,automatic]

I have not repeated the error message from above because it is exactly the same.

DanielRamosAcosta commented May 28, 2018

I'm having the same problem 😢, more details in issue #64409

abdennour commented Jun 2, 2018

same problem so far.

Contributor

brendandburns commented Jun 3, 2018

+1, I'm seeing this as well...

Downgrading to 1.10.2 fixes things:

sudo apt-get install kubelet=1.10.2-00

Contributor

brendandburns commented Jun 3, 2018

7rouz added a commit to 7rouz/k8s-pi-cluster-ansible that referenced this issue Jun 3, 2018

update role k8s-pi
install version 1.10.2 instead of 1.10.3 due to kubelet segfault issue
kubernetes/kubernetes#64234

Member

luxas commented Jun 4, 2018

Thanks for the ping. I don't have much time to debug this, but here are some possible clues on where to dig:

  • v1.10.2...v1.10.3
    • Shows that something changed in kubelet wrt mount propagation (e.g. #62633)
    • Some of these changes use sync/atomic functions in the testing code, which is always suspicious on 32-bit platforms. Maybe something there broke it?
  • I'd also check whether something in github.com/appc/spec/schema/types/semver.go:68 could cause the problem. kubelet --version panicking is kinda strange 🤔
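
For context on why sync/atomic is a usual suspect on 32-bit platforms: the Go sync/atomic documentation states that on ARM, 386, and 32-bit MIPS the caller is responsible for 64-bit alignment of 64-bit words accessed atomically, and that only the first word of an allocated struct, array, or slice can be relied upon to be aligned. Nothing in this thread confirms that this is what broke kubelet 1.10.3; the sketch below only illustrates the general hazard. The struct and function names (risky, safe, riskyCountOffset, safeCountOffset, bump) are hypothetical and do not come from the Kubernetes source.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// risky places a 64-bit counter after a 32-bit field. On 32-bit targets
// such as linux/arm the counter typically lands at offset 4, which is not
// 64-bit aligned; on 64-bit targets padding moves it to offset 8.
type risky struct {
	flag  uint32
	count int64
}

// safe keeps the 64-bit counter as the first word of the struct, the
// layout the sync/atomic documentation says can be relied upon.
type safe struct {
	count int64
	flag  uint32
}

// riskyCountOffset returns the platform-dependent offset of risky.count.
func riskyCountOffset() uintptr {
	var r risky
	return unsafe.Offsetof(r.count)
}

// safeCountOffset returns the offset of safe.count, which is 0 everywhere.
func safeCountOffset() uintptr {
	var s safe
	return unsafe.Offsetof(s.count)
}

// bump atomically increments the counter of a safe value and returns
// the new count.
func bump(s *safe) int64 {
	return atomic.AddInt64(&s.count, 1)
}

func main() {
	fmt.Println("risky.count offset:", riskyCountOffset())
	fmt.Println("safe.count offset: ", safeCountOffset())

	s := new(safe)
	fmt.Println("count after bump:  ", bump(s))
}
```

Cross-compiling the sketch with GOARCH=arm and comparing the printed offsets against a 64-bit build shows the layout difference without needing a Pi.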

Pinaute commented Jun 9, 2018

I have the same problem. The downgrade works on HypriotOS, but kubelet still crashes on Raspbian.

ab7 commented Jun 10, 2018

@Pinaute - Check out the solution in this thread. Downgrading to 1.9.7 worked for me on Raspbian Stretch.

sysexcontrol commented Jun 17, 2018

Same on Odroid HC1: kubelet --version crashes on Ubuntu 16.04.
Downgrading with sudo apt-get install kubelet=1.10.2-00 helped here as well.

keslerm commented Jun 19, 2018

Ran into the same issue on HypriotOS and a Pi 3. Downgrading kubelet to 1.10.2 worked fine for me.

vsliouniaev commented Jun 19, 2018

+1, downgrade to 1.10.2 works

Member

detiber commented Jun 22, 2018

I just tested with 1.10.5 and this issue appears to now be resolved.

jaredledvina commented Jul 1, 2018

Yep, looks like we're good to go here.
