Recommended Kernel and Docker Storage for 1.4 / 1.5 #874

Closed
chrislovecnm opened this issue Nov 12, 2016 · 38 comments

Labels: focus/stability, lifecycle/rotten, P0

Comments

@chrislovecnm
Contributor

We need to document and validate that this is resolved: kubernetes/kubernetes#30706

  1. What kernel is recommended?
  2. What docker storage is recommended?
  3. How do we test this?
@chrislovecnm
Contributor Author

@dchen1107 / @justinsb / @zmerlynn how do we get this documented and tested?

@jaygorrell

@chrislovecnm posted this after discussing things with me in Slack.

Basically, where I'm at is that I have a kops build (git-a09f3a9) that was originally managing 1.3.x clusters, which have since been bumped to 1.4. My understanding is that the panic issue in kubernetes/kubernetes#30706 for m4 instance types is resolved in the 4.4 kernel, so I'm trying to understand the best upgrade path.

kops has an updated AMI/kernel for 4.4, but I'm not sure whether that has been released yet on a stable channel. Should I be doing a cluster upgrade with my current kops version and editing the image? Should I update kops and just do an upgrade from there -- and if so, should we build from master or use kops 1.4.1?

The scope of this issue is a bit larger (documentation and testing), but those are the root questions from my perspective.

@chrislovecnm
Contributor Author

Yah ... @jaygorrell, you covered about 42+ things that should each be created as an issue. I jest -- I just created a cluster with 42 in it, and if you don't know the significance of 42, I have a book for you.

  1. Use kops 1.4.1, as it is stable (a rough upgrade sketch follows this list).
  2. We need to validate which AMI is used by 1.4.1 ~ @justinsb
  3. You need to install a 1.4.4+ K8s version.
  4. Do not use master, as it requires a new version of nodeup.
  5. We are improving the release process to help with some of these questions.
  6. We are improving the release notes in order to help with some of these questions.
  7. We need to test, validate, and firm up exactly what the community AMI consists of.

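A rough sketch of that upgrade flow with kops 1.4.1 (the cluster name is a placeholder, and exact subcommand behavior can differ between kops releases, so treat this as illustrative rather than authoritative):

# Placeholder cluster name; substitute your own and make sure KOPS_STATE_STORE is set.
export NAME=mycluster.example.com

kops upgrade cluster $NAME                 # preview the recommended version/image bumps
kops upgrade cluster $NAME --yes           # apply them to the cluster spec
kops update cluster $NAME --yes            # push the updated configuration to AWS
kops rolling-update cluster $NAME --yes    # replace nodes so they pick up the new AMI/kernel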
Let me ask you.

What would be AMAZING for you? What would have helped answer all of these questions in the best OSS project you have ever seen?

@chrislovecnm
Contributor Author

Also, we need some more toys on the base image ... kubernetes-retired/kube-deploy#255

@jaygorrell

That was very helpful @chrislovecnm, thank you!

As for what would make this all amazing? Simply having what you outlined readily available! We have kops versioning that almost aligns with Kubernetes versioning, which makes things a bit confusing. The lack of clarity around which component dictates the AMI you receive for the 4.4 kernel is a great example of the problems around this.

Perhaps all we need is a sort of table/matrix to outline the variables related to a version of kops. It could show the default AMI, networking layer, and other settings... as well as the supported Kubernetes versions.

Ideally, someone with kops 1.4.1 should be able to reference it to easily see which versions of Kubernetes they're able to install and which AMI would be used. Similarly, if they know they want Kubernetes 1.4.4 for security reasons, they should be able to see which versions of kops can provide that.

@krisnova
Contributor

@chrislovecnm - Can we touch base on this tomorrow? Maybe after our morning call - I'd like to know where we stand.

Cheers

@chrislovecnm
Contributor Author

https://docs.docker.com/engine/userguide/storagedriver/selectadriver/
~ overlayfs is still the recommended and more tested driver per @justinsb

@jkemp101

I'm eager to try overlay2. I regularly deploy a Prometheus container that is based on busybox. Busybox uses thousands of hardlinks, which causes deployment to fail with "Error trying v2 registry: failed to register layer: link ... too many links" after a handful of deployments of the same container. You quickly reach ext4's 65,000-hardlink limit. I have to manually run a docker rmi to remove some of the hardlinks so the docker pull will complete. So unless I'm missing something, you can't deploy a busybox-based container more than about 6 times with overlayfs before it starts failing.
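For anyone who wants to check how close an overlay-on-ext4 node is to that limit, here's a minimal sketch (paths assume the default Docker root under /var/lib/docker; adjust if yours differs):

docker info | grep -i 'storage driver'      # confirm the node is on the overlay driver
# ext4's per-inode hardlink cap is 65,000; list any layer files already close
# to it (the link count is the first field printed).
sudo find /var/lib/docker/overlay -type f -links +60000 -printf '%n %p\n' | head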

@justinsb justinsb added this to the 1.5.0 milestone Dec 28, 2016
@justinsb
Member

Here is what is validated with k8s 1.5:

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#external-dependency-version-information

The reason for the 4.4 kernel: kubernetes/kubernetes#30706

We are figuring out what to do about the recent docker security hole: kubernetes/kubernetes#40061

Also, work on validating overlay2: kubernetes/kubernetes#32536

Until then, I think the recommendation is:

  • 4.4 kernel
  • docker 1.12.3
  • overlay filesystem

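For reference, a quick sketch of checking a node against that list (output formats vary slightly across docker versions):

uname -r                                          # expect a 4.4.x kernel
docker version --format '{{.Server.Version}}'     # expect 1.12.3
docker info | grep -i 'storage driver'            # expect the overlay driver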
@justinsb justinsb modified the milestones: 1.5.1, 1.5.0 Jan 19, 2017
@lcjlcj

lcjlcj commented Feb 9, 2017

@justinsb We were seeing minions crashing about once a day. We were on jessie 3.16, docker 1.11.2, aufs, kube 1.4.3. After seeing this issue, we upgraded to jessie 4.4, docker 1.12.3, overlay, kube 1.4.3. Now a crash happens every 5 minutes. We also noticed cbr0 has a lot of churn; eth0 keeps going offline and online every half minute to minute. This is on AWS m3.2xlarge.

@lcjlcj

lcjlcj commented Feb 10, 2017

Finally captured a stacktrace here.
[50377.745293] IPv6: ADDRCONF(NETDEV_UP): vethc32763d: link is not ready
[50378.138431] eth0: renamed from vethc1d7543
[50378.156526] BUG: unable to handle kernel [50378.158396] IPv6: ADDRCONF(NETDEV_CHANGE): vethc32763d: link becomes ready
[50378.158423] docker0: port 9(vethc32763d) entered forwarding state
[50378.158438] docker0: port 9(vethc32763d) entered forwarding state
[50378.160476] NULL pointer dereference at 0000000000000050
[50378.160476] IP: [] wakeup_preempt_entity.isra.62+0x9/0x50
[50378.160476] PGD 1d7b98067 PUD 1e540c067 PMD 0
[50378.160476] Oops: 0000 [#1] SMP
[50378.160476] Modules linked in: xt_statistic(E) sch_htb(E) ebt_ip(E) ebtable_filter(E) ebtables(E) veth(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_nat(E) xt_tcpudp(E) xt_recent(E) xt_mark(E) xt_comment(E) binfmt_misc(E) overlay(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) xfrm_user(E) xfrm_algo(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E) nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) crc32c_generic(E) loop(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) crct10dif_pclmul(E) crc32_pclmul(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) parport_pc(E) 8250_fintek(E) ablk_helper(E) cryptd(E) evdev(E) parport(E) acpi_cpufreq(E) snd_pcsp(E) tpm_tis(E) tpm(E) snd_pcm(E) snd_timer(E) snd(E) cirrus(E) ttm(E) drm_kms_helper(E) drm(E) i2c_piix4(E) soundcore(E) processor(E) button(E) serio_raw(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) xor(E) raid6_pq(E) dm_mod(E) ata_generic(E) xen_netfront(E) xen_blkfront(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) scsi_mod(E) fjes(E)
[50378.160476] CPU: 1 PID: 2551 Comm: docker-containe Tainted: G E 4.4.41-k8s #1
[50378.160476] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[50378.160476] task: ffff88013213ee40 ti: ffff8801e55bc000 task.ti: ffff8801e55bc000
[50378.160476] RIP: 0010:[] [] wakeup_preempt_entity.isra.62+0x9/0x50
[50378.160476] RSP: 0018:ffff8801e55bf828 EFLAGS: 00010086
[50378.160476] RAX: ffff880137528100 RBX: 0000a0ca01034375 RCX: ffff8800eaffa4c0
[50378.160476] RDX: ffff8801efc35e30 RSI: 0000000000000000 RDI: 0000a0ca01034375
[50378.160476] RBP: 0000000000000000 R08: 0000000000004000 R09: ffff8800eaeb1e01
[50378.160476] R10: 00000000000043dc R11: 0000000000000000 R12: 0000000000000000
[50378.160476] R13: 0000000000000000 R14: ffff8801efc35dc0 R15: 0000000000000001
[50378.160476] FS: 00007fc2ef475700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000
[50378.160476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50378.160476] CR2: 0000000000000050 CR3: 00000001cf57c000 CR4: 00000000001406e0
[50378.160476] Stack:
[50378.160476] ffff8800eafa4a00 ffffffff810aa3b0 ffff8800eafa4a00 0000000000000000
[50378.160476] 0000000000015dc0 0000000000000000 ffffffff810b332f ffff88013213ee40
[50378.160476] ffff8801efc35dc0 0000000000016840 0000000000015dc0 ffff8801efc35e30
[50378.160476] Call Trace:
[50378.160476] [] ? pick_next_entity+0x70/0x140
[50378.160476] [] ? pick_next_task_fair+0x30f/0x4a0
[50378.160476] [] ? __schedule+0xdf/0x960
[50378.160476] [] ? schedule+0x31/0x80
[50378.160476] [] ? schedule_hrtimeout_range_clock+0xa1/0x120
[50378.160476] [] ? hrtimer_init+0x100/0x100
[50378.160476] [] ? schedule_hrtimeout_range_clock+0x94/0x120
[50378.160476] [] ? poll_schedule_timeout+0x45/0x60
[50378.160476] [] ? do_select+0x57b/0x7d0
[50378.160476] [] ? find_busiest_group+0x3e/0x4f0
[50378.160476] [] ? cpumask_next_and+0x2a/0x40
[50378.160476] [] ? update_curr+0xba/0x130
[50378.160476] [] ? set_next_entity+0x71/0x7d0
[50378.160476] [] ? update_curr+0x55/0x130
[50378.160476] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[50378.160476] [] ? finish_task_switch+0x6d/0x230
[50378.160476] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[50378.160476] [] ? hrtimer_try_to_cancel+0xc7/0x120
[50378.160476] [] ? hrtimer_cancel+0x15/0x20
[50378.160476] [] ? futex_wait+0x1e6/0x260
[50378.160476] [] ? hrtimer_init+0x100/0x100
[50378.160476] [] ? core_sys_select+0x19c/0x2a0
[50378.160476] [] ? do_futex+0x110/0xb50
[50378.160476] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[50378.160476] [] ? xen_clocksource_get_cycles+0x11/0x20
[50378.160476] [] ? ktime_get_ts64+0x3f/0xf0
[50378.160476] [] ? xen_clocksource_get_cycles+0x11/0x20
[50378.160476] [] ? ktime_get_ts64+0x3f/0xf0
[50378.160476] [] ? SyS_select+0xba/0x110
[50378.160476] [] ? entry_SYSCALL_64_fastpath+0x16/0x75
[50378.160476] Code: 5b e9 3c f2 ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f7 e9 e3 fb ff ff 0f 1f 00 0f 1f 44 00 00 53 48 89 fb <48> 2b 5e 50 48 85 db 7e 2c 48 81 3e 00 04 00 00 8b 05 91 88 9a
[50378.160476] RIP [] wakeup_preempt_entity.isra.62+0x9/0x50
[50378.160476] RSP
[50378.160476] CR2: 0000000000000050
[50378.160476] ---[ end trace 6910e79e3636a2b9 ]---
[50378.160476] Kernel panic - not syncing: Fatal exception
[50378.160476] Shutting down cpus with NMI
[50378.160476] Kernel Offset: disabled

@chrislovecnm
Contributor Author

@justinsb any ideas on this?

@justinsb
Member

Sorry about the trouble, @lcjlcj:

  1. Are you capturing the panics from aws ec2 get-console-output? That tends to have them very reliably, whereas systemd/journald is not very good at capturing them. For example:
    aws ec2 get-console-output --instance-id i-0d91498bdd6160a51 --query Output --output text | less

  2. Which version of kops are you running? (Though I don't think we fixed anything here). Also technically docker 1.12 is not validated with kube 1.4, so I'm wondering how you got kops to do this?

  3. You said the machines are crashing every five minutes. Is that across your whole cluster, or per instance (the uptime you posted was ~14 hours, I think)? And then are the crashes happening on a particular instance or on multiple instances?

  4. Are you doing anything unusual? Let's assume that other people are not hitting the same problem (it hasn't been reported, though that doesn't mean they are not!). What would your guesses be as to how you are different? For example, rapidly churning pods, etc.

@zhan849

zhan849 commented Feb 10, 2017

@justinsb I was with @lcjlcj, and we can reliably reproduce the same AWS console output simply by submitting short-running jobs to Kubernetes.
For your questions:

  1. Yes. We are doing continuous monitoring of the entire cluster, all nodes.

  2. We are still using kube-up, with some of our own patches (none of which change the way the node is bootstrapped). What are the major differences in node setup (i.e. docker, networking, etc.) in kops? And since, as you said, docker 1.12 is not validated with kube 1.4, is there a major difference in how kubernetes/docker/kernel interact between 1.4 and 1.5+?

  3. @lcjlcj would have better answers, as the five-minute frequency happened on one of his test clusters two days ago, but we do see the same kernel panic trace here and there.

  4. Yes, we are rapidly churning pods (i.e. using short-running jobs): we are a startup and our product automates an end-to-end DevOps checkout-test-build-release-deployment pipeline. One thing you might also be interested in: when pods go up and down frequently, or when we are actively submitting jobs, kube-apiserver can get OOMKilled almost immediately (dstat on the master shows memory usage bursting from ~5G to >50G).
    Is kubernetes designed more to serve long-running services?

BTW, here is the kernel log of the frequent bridge up/down events (anything to do with kube-proxy?).
This snapshot shows the kernel suddenly getting into this bridge/port up-down churn and crashing shortly after:

[   42.963853] ip_tables: (C) 2000-2006 Netfilter Core Team


Debian GNU/Linux 8 ip-10-144-6-157 ttyS0

ip-10-144-6-157 login: [   43.132533] Initializing XFRM netlink socket
[   43.163426] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[   48.410844] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[  142.049429] nr_pdflush_threads exported in /proc is scheduled for removal
[ 3821.872597] cbr0: port 13(veth00981fee) entered disabled state
[ 3821.877896] device veth00981fee left promiscuous mode
[ 3821.881761] cbr0: port 13(veth00981fee) entered disabled state
[ 3893.414006] device vethef136ae entered promiscuous mode
[ 3893.416804] IPv6: ADDRCONF(NETDEV_UP): vethef136ae: link is not ready
[ 3894.636325] eth0: renamed from veth7c38c0b
[ 3894.700455] IPv6: ADDRCONF(NETDEV_CHANGE): vethef136ae: link becomes ready
[ 3894.704328] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3894.707982] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3894.711244] IPv6: ADDRCONF(NETDEV_CHANGE): br-30b3057eb136: link becomes ready
[ 3895.700560] device veth49ea6ee entered promiscuous mode
[ 3895.703818] IPv6: ADDRCONF(NETDEV_UP): veth49ea6ee: link is not ready
[ 3898.132340] eth0: renamed from veth78dc38f
[ 3898.200528] IPv6: ADDRCONF(NETDEV_CHANGE): veth49ea6ee: link becomes ready
[ 3898.204388] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3898.208144] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3899.552897] device veth59d804a entered promiscuous mode
[ 3899.555529] IPv6: ADDRCONF(NETDEV_UP): veth59d804a: link is not ready
[ 3901.328348] eth0: renamed from veth114ad86
[ 3901.400239] IPv6: ADDRCONF(NETDEV_CHANGE): veth59d804a: link becomes ready
[ 3901.404335] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3901.408387] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3909.724068] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3913.244064] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3916.444068] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3931.402774] device veth0cefb44 entered promiscuous mode
[ 3931.405833] IPv6: ADDRCONF(NETDEV_UP): veth0cefb44: link is not ready
[ 3932.712422] eth0: renamed from veth73a5174
[ 3932.728364] IPv6: ADDRCONF(NETDEV_CHANGE): veth0cefb44: link becomes ready
[ 3932.731786] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3932.735132] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3932.738384] IPv6: ADDRCONF(NETDEV_CHANGE): br-85515d4f0886: link becomes ready
[ 3933.404302] device vethbfc4168 entered promiscuous mode
[ 3933.406817] IPv6: ADDRCONF(NETDEV_UP): vethbfc4168: link is not ready
[ 3933.409818] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3933.413014] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3933.728096] br-85515d4f0886: port 2(vethbfc4168) entered disabled state
[ 3936.332373] eth0: renamed from veth2193da6
[ 3936.500381] IPv6: ADDRCONF(NETDEV_CHANGE): vethbfc4168: link becomes ready
[ 3936.505944] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3936.510545] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3937.604774] device vethb6fc8f9 entered promiscuous mode
[ 3937.607405] IPv6: ADDRCONF(NETDEV_UP): vethb6fc8f9: link is not ready
[ 3939.436406] eth0: renamed from veth2ab24da
[ 3939.452349] IPv6: ADDRCONF(NETDEV_CHANGE): vethb6fc8f9: link becomes ready
[ 3939.455583] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3939.458692] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3947.740069] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3951.516072] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3954.460069] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3954.869277] cbr0: port 3(veth8dd8f8b7) entered disabled state
[ 3954.875749] device veth8dd8f8b7 left promiscuous mode
[ 3954.878700] cbr0: port 3(veth8dd8f8b7) entered disabled state
[ 3960.004144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 3960.008059] IP: [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059] PGD 6e7bd7067 PUD 72813c067 PMD 0 
[ 3960.008059] Oops: 0000 [#1] SMP 
[ 3960.008059] Modules linked in: xt_statistic(E) xt_nat(E) xt_recent(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) sch_htb(E) ebt_ip(E) ebtable_filter(E) ebtables(E) veth(E) xt_mark(E) xt_comment(E) binfmt_misc(E) overlay(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) xfrm_user(E) xfrm_algo(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E) nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) crc32c_generic(E) loop(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) crct10dif_pclmul(E) crc32_pclmul(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ablk_helper(E) cryptd(E) evdev(E) cirrus(E) ttm(E) snd_pcsp(E) drm_kms_helper(E) snd_pcm(E) acpi_cpufreq(E) snd_timer(E) parport_pc(E) tpm_tis(E) 8250_fintek(E) snd(E) i2c_piix4(E) parport(E) tpm(E) soundcore(E) drm(E) serio_raw(E) processor(E) button(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) xor(E) raid6_pq(E) dm_mod(E) ata_generic(E) xen_netfront(E) xen_blkfront(E) ata_piix(E) libata(E) crc32c_intel(E) psmouse(E) scsi_mod(E) fjes(E)
[ 3960.008059] CPU: 4 PID: 10158 Comm: mysql_tzinfo_to Tainted: G            E   4.4.41-k8s #1
[ 3960.008059] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[ 3960.008059] task: ffff8807578fae00 ti: ffff88075f028000 task.ti: ffff88075f028000
[ 3960.008059] RIP: 0010:[<ffffffff810b332f>]  [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059] RSP: 0018:ffff88075f02be38  EFLAGS: 00010046
[ 3960.008059] RAX: 0000000000000000 RBX: ffff8807250ff400 RCX: 0000000000000000
[ 3960.008059] RDX: ffff88078fc95e30 RSI: 0000000000000000 RDI: ffff8807250ff400
[ 3960.008059] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88076bc13700
[ 3960.008059] R10: 0000000000001cf7 R11: ffffea001c98a100 R12: 0000000000015dc0
[ 3960.008059] R13: 0000000000000000 R14: ffff88078fc95dc0 R15: 0000000000000004
[ 3960.008059] FS:  00007fa34b7f6740(0000) GS:ffff88078fc80000(0000) knlGS:0000000000000000
[ 3960.008059] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3960.008059] CR2: 0000000000000080 CR3: 000000067762d000 CR4: 00000000001406e0
[ 3960.008059] Stack:
[ 3960.008059]  ffff8807578fae00 0000000000001000 0000000200000000 0000000000015dc0
[ 3960.008059]  ffff88078fc95e30 00007fa34b7fc000 000000005ef04228 ffff88078fc95dc0
[ 3960.008059]  ffff8807578fae00 0000000000015dc0 0000000000000000 ffff8807578fb2a0
[ 3960.008059] Call Trace:
[ 3960.008059]  [<ffffffff8159cd1f>] ? __schedule+0xdf/0x960
[ 3960.008059]  [<ffffffff8159d5d1>] ? schedule+0x31/0x80
[ 3960.008059]  [<ffffffff810031cb>] ? exit_to_usermode_loop+0x6b/0xc0
[ 3960.008059]  [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
[ 3960.008059]  [<ffffffff815a1518>] ? int_ret_from_sys_call+0x25/0x8f
[ 3960.008059] Code: c6 44 24 17 00 eb 4d 48 8b 5c 24 20 eb 29 31 ed 48 89 df e8 04 a2 ff ff 84 c0 0f 85 99 fd ff ff 48 89 df 48 89 ee e8 11 70 ff ff <48> 8b 98 80 00 00 00 48 85 db 74 57 48 8b 6b 38 48 85 ed 74 e0 
[ 3960.008059] RIP  [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059]  RSP <ffff88075f02be38>
[ 3960.008059] CR2: 0000000000000080
[ 3960.008059] ---[ end trace e1b9f0775b83e8e3 ]---
[ 3960.008059] Kernel panic - not syncing: Fatal exception
[ 3960.008059] Shutting down cpus with NMI
[ 3960.008059] Kernel Offset: disabled
[    0.000000] Initializing cgroup subsys cpuset

Thanks

@lcjlcj

lcjlcj commented Feb 11, 2017

@justinsb Thanks for the quick response. We were trying to debug the problem and finally determined that it's most likely a kernel issue. We saw many kernel backtraces; most of them were in the CFS scheduler area, in pick_next_task_fair+0x30f/0x4a0 or wakeup_preempt_entity.isra.62+0x9/0x50. We had specified CPU resource limits for quite a few pods, and those limits are small (0.1 - 0.2 cores per pod). This probably caused some complication between CFS and the cgroup limits.
We disabled the CPU limits temporarily and the system is stable now.
I believe kubernetes maps 0.1 cores to {cfs_period_us: 100000, cfs_quota_us: 10000}. With a moderate number of pods (fewer than 100 pods per minion on an AWS m3.2xlarge), the kernel would panic consistently within minutes. We will file a bug against the kernel. In the meantime, it might be good to document Kubernetes CPU resource limit best practices.
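For anyone who wants to confirm that mapping on a node, a minimal sketch assuming the legacy cgroupfs (cgroup v1) layout -- paths vary by kubelet cgroup driver and Kubernetes version, and the PID below is a placeholder:

# Placeholder PID: the container's main process, e.g. from
#   docker inspect -f '{{.State.Pid}}' <container-id>
PID=12345
CG=$(awk -F: '/cpu,cpuacct/ {print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup/cpu,cpuacct$CG/cpu.cfs_period_us   # expect 100000
cat /sys/fs/cgroup/cpu,cpuacct$CG/cpu.cfs_quota_us    # expect 10000 for a 100m (0.1 core) limit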

@justinsb
Member

justinsb commented Feb 11, 2017

Hi - I was pondering this. AFAIK there's no known issue with what you're doing - I was just guessing at the high rate of pod churn because, well, the panic is in the scheduler!

So we have some great leads:

  • CFS / cgroup limits (much stronger than my guesses!)
  • Some configuration difference between kube-up & kops (there definitely are differences, for example we set different sysctls, but I don't think they would make a difference)
  • Something in docker 1.12 with k8s 1.4 (but kernel panics should never be possible)

I agree that we're triggering a kernel bug. I didn't see any known issues in the same function, but I can also build a newer kernel, though it would probably be purely speculative.

I'm going to cc @kubernetes/sig-node-bugs as this could well be kops/aws specific, but it does seem like it is related to resource limits. @dchen1107 let us know if we should copy this bug into kubernetes/kubernetes!

On the apiserver memory ballooning, I would definitely open an issue on kubernetes/kubernetes. The apiserver will use more memory if there are more pods, and we do retain recent pods for a period of time, so I believe it follows that high churn of pods => more memory. But I honestly just don't know enough of the details here to say if what you are seeing is "expected" - it feels excessive, but I could well be wrong.

@zhan849

zhan849 commented Feb 11, 2017

@justinsb thanks for filing the bug about apiserver memory usage. I can provide more details about our workload and apiserver configuration. (We took the numbers provided in the kubernetes source code into consideration, i.e., set target-ram-mb to 60MB per 30 pods.)
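For reference, a tiny sizing sketch following that "60MB per 30 pods" heuristic (EXPECTED_PODS is an assumption you supply; this is just the arithmetic, not an official formula):

EXPECTED_PODS=3000
echo $(( EXPECTED_PODS / 30 * 60 ))   # suggested value for the apiserver's target-ram-mb setting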

@zytek
Contributor

zytek commented Feb 15, 2017

@chrislovecnm

https://docs.docker.com/engine/userguide/storagedriver/selectadriver/
~ overlayfs is still the recommended and more tested driver per @justinsb

How does this square with the fact that kops-provisioned kubernetes clusters use devicemapper by default?
Probably related to #1731.

@mutemule

We've been running into a kernel oops that's fairly similar to #874 (comment):

[61778.046591] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[61778.054546] IP: [<ffffffff810b84a2>] wakeup_preempt_entity.isra.55+0x12/0x60
[61778.061672] PGD 2035f76067 PUD 2035f77067 PMD 0
[61778.066368] Oops: 0000 [#1] SMP
[61778.069644] Modules linked in: binfmt_misc nf_conntrack_netlink veth xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_raw xt_multiport ip_set_hash_net ip_set nfnetlink xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay intel_rapl x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul mei_me glue_helper sb_edac ablk_helper joydev input_leds cryptd mei lpc_ich ioatdma edac_core shpchp 8250_fintek ipmi_ssif acpi_power_meter mac_hid ipmi_si ipmi_devintf ipmi_msghandler coretemp bonding autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 ixgbe ast i2c_algo_bit dca ttm vxlan ip6_udp_tunnel drm_kms_helper udp_tunnel syscopyarea ptp sysfillrect hid_generic sysimgblt fb_sys_fops usbhid ahci pps_core drm hid libahci mdio wmi fjes
[61778.167824] CPU: 19 PID: 104 Comm: migration/19 Tainted: G        W       4.4.59-1-custom #1
[61778.176917] Hardware name: Supermicro SYS-2028TP-HTTR/X10DRT-PT
[61778.184705] task: ffff881038d78000 ti: ffff881038d78318 task.ti: ffff881038d78318
[61778.192223] RIP: 0010:[<ffffffff810b84a2>]  [<ffffffff810b84a2>] wakeup_preempt_entity.isra.55+0x12/0x60
[61778.201770] RSP: 0000:ffffc9000c9ebd68  EFLAGS: 00010086
[61778.207111] RAX: ffff88102410b000 RBX: 000002b26d33296d RCX: 0000000000146d08
[61778.214280] RDX: 0000002b89334993 RSI: 0000000000000000 RDI: 000002b26d33296d
[61778.221450] RBP: ffffc9000c9ebd78 R08: 0000000000000013 R09: 0000000000000000
[61778.228617] R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
[61778.235786] R13: 0000000000000000 R14: ffff88103fd73940 R15: ffff88103fd73940
[61778.242955] FS:  0000000000000000(0000) GS:ffff88103fd60000(0000) knlGS:0000000000000000
[61778.251082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[61778.256857] CR2: 0000000000000050 CR3: 0000002036b76000 CR4: 00000000003606f0
[61778.264025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[61778.271192] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[61778.278361] Stack:
[61778.280384]  ffff88102a6c5200 0000000000000000 ffffc9000c9ebda8 ffffffff810b8562
[61778.287887]  ffff88102a6c5400 ffff88102a6c5200 ffffffff81a18c20 ffff88103fd73940
[61778.295383]  ffffc9000c9ebe18 ffffffff810c08cf ffff88103fd739b0 0000000000013940
[61778.302881] Call Trace:
[61778.305348]  [<ffffffff810b8562>] pick_next_entity+0x72/0x130
[61778.311129]  [<ffffffff810c08cf>] pick_next_task_fair+0x7f/0x500
[61778.317173]  [<ffffffff8188af63>] __schedule+0x473/0x7d0
[61778.322516]  [<ffffffff8188b2f9>] schedule+0x39/0x80
[61778.327508]  [<ffffffff810a5e0d>] smpboot_thread_fn+0xcd/0x180
[61778.333370]  [<ffffffff810a5d40>] ? sort_range+0x30/0x30
[61778.338715]  [<ffffffff810a20b9>] kthread+0xd9/0xf0
[61778.343619]  [<ffffffff810a1fe0>] ? kthread_create_on_node+0x190/0x190
[61778.350181]  [<ffffffff8188fa6e>] ret_from_fork+0x3e/0x70
[61778.355609]  [<ffffffff810a1fe0>] ? kthread_create_on_node+0x190/0x190
[61778.362174] Code: 89 e5 53 48 89 f3 48 89 df e8 7b fb ff ff 5b 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 f4 53 48 89 fb <49> 2b 5c 24 50 48 85 db 7e 30 49 81 3c 24 00 04 00 00 8b 05 46
[61778.383928] RIP  [<ffffffff810b84a2>] wakeup_preempt_entity.isra.55+0x12/0x60
[61778.392892]  RSP <ffffc9000c9ebd68>
[61778.398176] CR2: 0000000000000050
[61778.409187] ---[ end trace bd67486fd1f7583f ]---

This is a custom kernel based on 4.4.59, but the issue traces back to wakeup_preempt_entity in kernel/sched/fair.c. This is called from pick_next_entity, which in turn is called from pick_next_task_fair.

I haven't been able to prove this, but my initial hunch is that in pick_next_entity, __pick_first_entity(cfs_rq) returns NULL, so the attempt to ensure left is assigned doesn't achieve the desired effect. I've never mucked about in the scheduler before, so there's a very good chance I'm way off.

@daxtens

daxtens commented May 4, 2017

I've seen similar problems, eventually traced to the 4.4 kernel missing a couple of patches. My results are at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1687512

@chrislovecnm
Contributor Author

Which patches?

@justinsb
Member

justinsb commented May 4, 2017

Can we copy this into a bug in kubernetes/kubernetes? @kubernetes/sig-node-bugs will want to validate.

@k8s-ci-robot
Contributor

@justinsb: These labels do not exist in this repository: sig/node.

In response to this:

Can we copy this into a bug in kubernetes/kubernetes? @kubernetes/sig-node-bugs will want to validate.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@daxtens

daxtens commented May 4, 2017

Hi @chrislovecnm,

Yes - these patches are in every kernel since 4.7 (see git tag --contains 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7).

They're not currently in the Ubuntu Xenial kernel (which is 4.4-based), but they will be eventually. You could try the HWE kernel, which is more recent.

@bpineau

bpineau commented May 28, 2017

For reference, the two scheduler fixes (754bd598 and 094f4691) also made their way into the just-released 4.4.70 kernel.

So 4.4 (at 4.4.70 or later) should be good to go.
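A quick sketch (assuming GNU coreutils, i.e. sort -V is available) to check whether a node's running 4.4 kernel is at or past 4.4.70:

current=$(uname -r | cut -d- -f1)   # e.g. "4.4.41" from "4.4.41-k8s"
if [ "$(printf '%s\n' 4.4.70 "$current" | sort -V | head -n1)" = "4.4.70" ]; then
  echo "OK: kernel $current is 4.4.70 or newer"
else
  echo "WARNING: kernel $current predates the 4.4.70 scheduler fixes"
fi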

@chrislovecnm
Contributor Author

Did anyone copy this bug over to kubernetes/kubernetes?

@chrislovecnm
Contributor Author

cc @dchen1107

Dawn, we are noticing that a couple of scheduler fixes have helped with k8s kernel panics. Do we already have an issue open in core?

@zhan849

zhan849 commented Jun 27, 2017

@justinsb @chrislovecnm

We observed a different kernel crash message on 4.4.41 after running a Kubernetes cluster for 2 months; it is preventing docker from starting up. Let me know if there is a better place for kernel-related bugs. (A similar issue is mentioned in kubernetes/kubernetes#23253, but that issue has been closed.)

Kernel Version:

root@ip-172-20-0-9:/home/admin# uname -a
Linux ip-172-20-0-9 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux

Kernel crash message:

[5349238.836307] Call Trace:
[5349238.836408]  [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349238.836415]  [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349238.836416]  [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349238.836433]  [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349238.836454]  [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349238.836506]  [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349238.836515]  [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349238.836527]  [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349238.836529]  [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349238.836534]  [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349238.836574]  [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349238.836637]  [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349238.837187]  [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349238.837210]  [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349238.837236]  [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349238.837257]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349238.837285]  [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349238.837307]  [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349238.837338]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349238.837347]  [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349238.837348]  [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349238.837349]  [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349238.837362]  [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349301.856150] INFO: rcu_sched detected stalls on CPUs/tasks:

[5349301.856207] 	7-...: (621 ticks this GP) idle=db7/140000000000000/0 softirq=291405734/291405734 fqs=1516663 
[5349301.856213] 	
[5349301.856223] (detected by 2, t=1643772 jiffies, g=333398020, c=333398019, q=2185604)
[5349301.856227] Task dump for CPU 7:
[5349301.856236] dockerd         R
[5349301.856240]   running task    
[5349301.856245]     0  1206      1 0x0000000c
[5349301.856268]  ffff00066c0a1000
[5349301.856269]  00000000400d65da
[5349301.856270]  ffffffff813ed5d0
[5349301.856270]  ffffffff81d27a00

[5349301.856271]  0000000000000030
[5349301.856273]  ffffffff81d27a00
[5349301.856276]  ffffffff813ed646
[5349301.856277]  ffffffff81cbcec0

[5349301.856277]  ffffffff813ed630
[5349301.856280]  ffffffff813e733e
[5349301.856280]  ffffffff81d27a00
[5349301.856282]  0000000000000001

[5349301.856290] Call Trace:
[5349301.856387]  [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349301.856397]  [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349301.856398]  [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349301.856416]  [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349301.856431]  [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349301.856487]  [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349301.856493]  [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349301.856505]  [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349301.856511]  [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349301.856513]  [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349301.856549]  [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349301.856590]  [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349301.857035]  [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349301.857058]  [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349301.857083]  [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349301.857099]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349301.857111]  [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349301.857123]  [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349301.857134]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349301.857146]  [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349301.857155]  [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349301.857156]  [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349301.857179]  [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349364.876077] INFO: rcu_sched detected stalls on CPUs/tasks:

[5349364.876089] 	7-...: (1922 ticks this GP) idle=db7/140000000000000/0 softirq=291405734/291405734 fqs=1531627 
[5349364.876089] 	
[5349364.876095] (detected by 5, t=1659527 jiffies, g=333398020, c=333398019, q=2203695)
[5349364.876097] Task dump for CPU 7:
[5349364.876098] dockerd         R
[5349364.876099]   running task    
[5349364.876101]     0  1206      1 0x0000000c
[5349364.876109]  ffff00066c0a1000
[5349364.876109]  00000000400d65da
[5349364.876110]  ffffffff813ed5d0
[5349364.876110]  ffffffff81d27a00

[5349364.876111]  0000000000000072
[5349364.876111]  ffffffff81d27a00
[5349364.876111]  ffffffff813ed646
[5349364.876111]  ffffffff81cbceb4

[5349364.876112]  ffffffff813ed630
[5349364.876113]  ffffffff813e733e
[5349364.876114]  ffffffff81d27a00
[5349364.876116]  0000000000000001

[5349364.876118] Call Trace:
[5349364.876150]  [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349364.876155]  [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349364.876157]  [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349364.876168]  [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349364.876176]  [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349364.876192]  [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349364.876196]  [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349364.876199]  [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349364.876201]  [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349364.876204]  [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349364.876221]  [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349364.876242]  [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349364.876557]  [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349364.876566]  [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349364.876581]  [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349364.876595]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349364.876607]  [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349364.876619]  [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349364.876628]  [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349364.876633]  [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349364.876634]  [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349364.876636]  [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349364.876640]  [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349378.904655] Detected aborted journal

[5349378.908433] systemd-journald[7132]: /dev/kmsg buffer overrun, some messages lost.
[5349380.045882] cbr0: port 1(veth924e7693) entered disabled state
[5349380.062306] device veth924e7693 left promiscuous mode
[5349380.066332] cbr0: port 1(veth924e7693) entered disabled state
[5363766.236135] EXT4-fs (dm-0): error count since last fsck: 32
[5363766.240072] EXT4-fs (dm-0): initial error at time 1492716166: ext4_do_update_inode:4652
[5363766.248073] EXT4-fs (dm-0): last error at time 1498397621: ext4_find_entry:1450: inode 9444726
[5450273.756116] EXT4-fs (dm-0): error count since last fsck: 32
[5450273.759960] EXT4-fs (dm-0): initial error at time 1492716166: ext4_do_update_inode:4652
[5450273.760604] EXT4-fs (dm-0): last error at time 1498397621: ext4_find_entry:1450: inode 9444726

@daxtens

daxtens commented Jun 27, 2017 via email

@zhan849

zhan849 commented Jun 28, 2017

@daxtens it looks like it. I found someone else who posted similar problems and suspected it was a hypervisor problem, but there was no further discussion.

@pierreozoux
Contributor

Should we switch to overlay2, as discussed here?
We also see this on our staging cluster running kops 1.7.0 with k8s 1.7.2:

failed to register layer: link /var/lib/docker/overlay/be836e6250911702549cdc77fbb598aa738f223516d0bb0872a8f0a860250edf/root/var/lib/yum/yumdb/l/de98a95d9c8b6c75f9c747096419d5e3c5017e0d-libdb-5.3.21-19.el7-x86_64/checksum_type /var/lib/docker/overlay/d4446fe19efb4a67daa5d5bd038ce29ac793baa28772d9562a989ba92442257a/tmproot592051146/var/lib/yum/yumdb/l/de98a95d9c8b6c75f9c747096419d5e3c5017e0d-libdb-5.3.21-19.el7-x86_64/checksum_type: too many links
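In case it helps, a minimal sketch of switching a single node's daemon to overlay2 (assuming Docker 1.12+ with daemon.json support on a 4.x kernel and an ext4-backed /var/lib/docker; note that changing the storage driver hides existing images/containers until they are re-pulled, and on a kops-managed node the driver is normally set through the cluster spec rather than by hand-editing the node, so this only illustrates the daemon-level setting):

sudo systemctl stop docker
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
EOF
sudo systemctl start docker
docker info | grep -i 'storage driver'   # should now report overlay2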

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 9, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@Cryptophobia
Contributor

Any chance of a Debian image that uses a newer kernel (> 4.13)? We are running into obscure docker overlayfs bugs that are related to older kernels. We don't want to switch docker to the aufs storage driver, as that is older technology.

moby/moby#19647

@Cryptophobia
Contributor

/reopen

@k8s-ci-robot
Contributor

@Cryptophobia: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
