Recommended Kernel and Docker Storage for 1.4 / 1.5 #874
Comments
@dchen1107 / @justinsb / @zmerlynn how do we get this documented and tested? |
@chrislovecnm posted this after discussing things with me in Slack. Basically where I'm at is: kops has an updated AMI/kernel with 4.4, but I'm not sure whether that has been released yet on a stable channel. Should I be doing a cluster upgrade with my current kops version and editing the image? Or should I update kops and do an upgrade from there -- and if so, should we build from master or use kops 1.4.1? The scope of this issue is a bit larger (documentation and testing), but those are the root questions from my perspective. |
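For anyone in the same spot, here is a minimal sketch of the two routes being asked about, using hypothetical cluster and state-store names; this is not a recommendation of one route over the other:

```sh
# Hypothetical names; substitute your own cluster and state store.
export NAME=example.cluster.k8s.local
export KOPS_STATE_STORE=s3://example-kops-state

# See which image (AMI) the nodes instance group is currently configured with.
kops get ig nodes --name $NAME -o yaml | grep image

# Route 1: let a newer kops release propose its defaults (new AMI, versions, etc.).
kops upgrade cluster --name $NAME          # preview the proposed changes
kops upgrade cluster --name $NAME --yes    # apply them to the cluster spec

# Route 2: keep the current kops version and set the image explicitly.
kops edit ig nodes --name $NAME            # change spec.image to the desired AMI

# Either way, push the spec and roll the nodes so they boot the new AMI/kernel.
kops update cluster --name $NAME --yes
kops rolling-update cluster --name $NAME --yes
```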
yah ... @jaygorrell you covered 42+ things that should each be created as an issue. I jest, as I just created a cluster with 42 in it -- and if you don't know the significance of 42, I have a book for you.
Let me ask you. What would be AMAZING for you? What would have helped answer all of these questions in the best OSS project you have ever seen? |
Also, we need some more toys on the base image ... kubernetes-retired/kube-deploy#255 |
That was very helpful @chrislovecnm, thank you! As for what would make this all amazing? Simply having what you outlined readily available! We have kops versioning that almost aligns with Kubernetes versioning, which makes things a bit confusing. The lack of clarity around which component dictates the AMI you get for the 4.4 kernel is a great example of the problem. Perhaps all we need is a sort of table/matrix outlining the variables tied to a version of kops. It could show the default AMI, networking layer, and other settings... as well as the supported Kubernetes versions. Ideally, someone with kops 1.4.1 should be able to look it up and easily see which versions of Kubernetes they're able to install and which AMI would be used. Similarly, if they know they want Kubernetes 1.4.4 for security reasons, they should be able to see which versions of kops can get them there. |
@chrislovecnm - Can we touch base on this tomorrow? Maybe after our morning call - Would like to know where we stand. Cheers |
https://docs.docker.com/engine/userguide/storagedriver/selectadriver/ |
I'm eager to try overlay2. I regularly deploy a Prometheus container that is based on busybox. Busybox uses thousands of hardlinks which causes deployment to fail with |
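For reference, a minimal sketch of switching a single node's Docker daemon to overlay2 to try it out, assuming a Docker version that reads /etc/docker/daemon.json (older daemons take --storage-driver=overlay2 on the command line instead) and a 4.x kernel with the overlay module available:

```sh
# Make sure the overlay module is available on this kernel.
lsmod | grep -q overlay || sudo modprobe overlay

# Point the daemon at overlay2. Existing images/containers created under the
# old storage driver will not be visible and need to be re-pulled/re-created.
sudo mkdir -p /etc/docker
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
EOF

sudo systemctl restart docker           # assumes a systemd-managed docker unit
docker info | grep -i 'storage driver'  # should now report overlay2
```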
Here is what is validated with k8s 1.5:

- The reason for the 4.4 kernel: kubernetes/kubernetes#30706
- We are figuring out what to do about the recent docker security hole: kubernetes/kubernetes#40061
- Also, work on validating overlay2: kubernetes/kubernetes#32536

Until then, I think recommended is:
|
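Whatever the final recommendation turns out to be, a quick way to see what a given node is actually running (plain shell; nothing kops-specific assumed):

```sh
uname -r                                            # kernel the node booted (e.g. 4.4.x)
docker version --format '{{.Server.Version}}'       # engine version, relevant to the CVE above
docker info 2>/dev/null | grep -i 'storage driver'  # aufs / overlay / overlay2 / devicemapper
```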
@justinsb We were seeing minions crashing about once a day. We were on jessie 3.16, docker 1.11.2, aufs, kube 1.4.3. After seeing this issue, we upgraded to jessie 4.4, docker 1.12.3, overlay, kube 1.4.3. Now the crash happens every 5 minutes. We also noticed cbr0 has lots of churn: eth0 keeps going offline and online every half minute to minute. This is on AWS m3.2xlarge. |
Finally captured a stacktrace here. |
@justinsb any ideas on this? |
Sorry about the trouble @lcjlcj :
|
@justinsb I was with @lcjlcj, and we can reliably reproduce the same AWS console output simply by submitting short-running jobs to Kubernetes.
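To make that concrete, here is a minimal sketch of that kind of short-running-job churn; the image, counts, and limits are illustrative only (not the original workload), and it assumes a kubectl of that era where --restart=OnFailure creates a Job:

```sh
# Submit a batch of short-lived jobs with small CPU limits to churn pods quickly.
for i in $(seq 1 50); do
  kubectl run churn-$i \
    --image=busybox \
    --restart=OnFailure \
    --requests='cpu=100m,memory=32Mi' \
    --limits='cpu=200m,memory=64Mi' \
    -- sh -c 'sleep 5'
done

# Watch pods start and terminate on the affected node.
kubectl get pods -o wide -w
```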
BTW, kernel log about frequent bridge up-down (anything to do with kube-proxy?):
[ 42.963853] ip_tables: (C) 2000-2006 Netfilter Core Team
Debian GNU/Linux 8 ip-10-144-6-157 ttyS0
ip-10-144-6-157 login: [ 43.132533] Initializing XFRM netlink socket
[ 43.163426] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[ 48.410844] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 142.049429] nr_pdflush_threads exported in /proc is scheduled for removal
[ 3821.872597] cbr0: port 13(veth00981fee) entered disabled state
[ 3821.877896] device veth00981fee left promiscuous mode
[ 3821.881761] cbr0: port 13(veth00981fee) entered disabled state
[ 3893.414006] device vethef136ae entered promiscuous mode
[ 3893.416804] IPv6: ADDRCONF(NETDEV_UP): vethef136ae: link is not ready
[ 3894.636325] eth0: renamed from veth7c38c0b
[ 3894.700455] IPv6: ADDRCONF(NETDEV_CHANGE): vethef136ae: link becomes ready
[ 3894.704328] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3894.707982] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3894.711244] IPv6: ADDRCONF(NETDEV_CHANGE): br-30b3057eb136: link becomes ready
[ 3895.700560] device veth49ea6ee entered promiscuous mode
[ 3895.703818] IPv6: ADDRCONF(NETDEV_UP): veth49ea6ee: link is not ready
[ 3898.132340] eth0: renamed from veth78dc38f
[ 3898.200528] IPv6: ADDRCONF(NETDEV_CHANGE): veth49ea6ee: link becomes ready
[ 3898.204388] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3898.208144] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3899.552897] device veth59d804a entered promiscuous mode
[ 3899.555529] IPv6: ADDRCONF(NETDEV_UP): veth59d804a: link is not ready
[ 3901.328348] eth0: renamed from veth114ad86
[ 3901.400239] IPv6: ADDRCONF(NETDEV_CHANGE): veth59d804a: link becomes ready
[ 3901.404335] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3901.408387] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3909.724068] br-30b3057eb136: port 1(vethef136ae) entered forwarding state
[ 3913.244064] br-30b3057eb136: port 2(veth49ea6ee) entered forwarding state
[ 3916.444068] br-30b3057eb136: port 3(veth59d804a) entered forwarding state
[ 3931.402774] device veth0cefb44 entered promiscuous mode
[ 3931.405833] IPv6: ADDRCONF(NETDEV_UP): veth0cefb44: link is not ready
[ 3932.712422] eth0: renamed from veth73a5174
[ 3932.728364] IPv6: ADDRCONF(NETDEV_CHANGE): veth0cefb44: link becomes ready
[ 3932.731786] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3932.735132] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3932.738384] IPv6: ADDRCONF(NETDEV_CHANGE): br-85515d4f0886: link becomes ready
[ 3933.404302] device vethbfc4168 entered promiscuous mode
[ 3933.406817] IPv6: ADDRCONF(NETDEV_UP): vethbfc4168: link is not ready
[ 3933.409818] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3933.413014] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3933.728096] br-85515d4f0886: port 2(vethbfc4168) entered disabled state
[ 3936.332373] eth0: renamed from veth2193da6
[ 3936.500381] IPv6: ADDRCONF(NETDEV_CHANGE): vethbfc4168: link becomes ready
[ 3936.505944] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3936.510545] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3937.604774] device vethb6fc8f9 entered promiscuous mode
[ 3937.607405] IPv6: ADDRCONF(NETDEV_UP): vethb6fc8f9: link is not ready
[ 3939.436406] eth0: renamed from veth2ab24da
[ 3939.452349] IPv6: ADDRCONF(NETDEV_CHANGE): vethb6fc8f9: link becomes ready
[ 3939.455583] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3939.458692] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3947.740069] br-85515d4f0886: port 1(veth0cefb44) entered forwarding state
[ 3951.516072] br-85515d4f0886: port 2(vethbfc4168) entered forwarding state
[ 3954.460069] br-85515d4f0886: port 3(vethb6fc8f9) entered forwarding state
[ 3954.869277] cbr0: port 3(veth8dd8f8b7) entered disabled state
[ 3954.875749] device veth8dd8f8b7 left promiscuous mode
[ 3954.878700] cbr0: port 3(veth8dd8f8b7) entered disabled state
[ 3960.004144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 3960.008059] IP: [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059] PGD 6e7bd7067 PUD 72813c067 PMD 0
[ 3960.008059] Oops: 0000 [#1] SMP
[ 3960.008059] Modules linked in: xt_statistic(E) xt_nat(E) xt_recent(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) sch_htb(E) ebt_ip(E) ebtable_filter(E) ebtables(E) veth(E) xt_mark(E) xt_comment(E) binfmt_misc(E) overlay(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) xfrm_user(E) xfrm_algo(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E) nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) crc32c_generic(E) loop(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) crct10dif_pclmul(E) crc32_pclmul(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ablk_helper(E) cryptd(E) evdev(E) cirrus(E) ttm(E) snd_pcsp(E) drm_kms_helper(E) snd_pcm(E) acpi_cpufreq(E) snd_timer(E) parport_pc(E) tpm_tis(E) 8250_fintek(E) snd(E) i2c_piix4(E) parport(E) tpm(E) soundcore(E) drm(E) serio_raw(E) processor(E) button(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) xor(E) raid6_pq(E) dm_mod(E) ata_generic(E) xen_netfront(E) xen_blkfront(E) ata_piix(E) libata(E) crc32c_intel(E) psmouse(E) scsi_mod(E) fjes(E)
[ 3960.008059] CPU: 4 PID: 10158 Comm: mysql_tzinfo_to Tainted: G E 4.4.41-k8s #1
[ 3960.008059] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[ 3960.008059] task: ffff8807578fae00 ti: ffff88075f028000 task.ti: ffff88075f028000
[ 3960.008059] RIP: 0010:[<ffffffff810b332f>] [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059] RSP: 0018:ffff88075f02be38 EFLAGS: 00010046
[ 3960.008059] RAX: 0000000000000000 RBX: ffff8807250ff400 RCX: 0000000000000000
[ 3960.008059] RDX: ffff88078fc95e30 RSI: 0000000000000000 RDI: ffff8807250ff400
[ 3960.008059] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88076bc13700
[ 3960.008059] R10: 0000000000001cf7 R11: ffffea001c98a100 R12: 0000000000015dc0
[ 3960.008059] R13: 0000000000000000 R14: ffff88078fc95dc0 R15: 0000000000000004
[ 3960.008059] FS: 00007fa34b7f6740(0000) GS:ffff88078fc80000(0000) knlGS:0000000000000000
[ 3960.008059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3960.008059] CR2: 0000000000000080 CR3: 000000067762d000 CR4: 00000000001406e0
[ 3960.008059] Stack:
[ 3960.008059] ffff8807578fae00 0000000000001000 0000000200000000 0000000000015dc0
[ 3960.008059] ffff88078fc95e30 00007fa34b7fc000 000000005ef04228 ffff88078fc95dc0
[ 3960.008059] ffff8807578fae00 0000000000015dc0 0000000000000000 ffff8807578fb2a0
[ 3960.008059] Call Trace:
[ 3960.008059] [<ffffffff8159cd1f>] ? __schedule+0xdf/0x960
[ 3960.008059] [<ffffffff8159d5d1>] ? schedule+0x31/0x80
[ 3960.008059] [<ffffffff810031cb>] ? exit_to_usermode_loop+0x6b/0xc0
[ 3960.008059] [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
[ 3960.008059] [<ffffffff815a1518>] ? int_ret_from_sys_call+0x25/0x8f
[ 3960.008059] Code: c6 44 24 17 00 eb 4d 48 8b 5c 24 20 eb 29 31 ed 48 89 df e8 04 a2 ff ff 84 c0 0f 85 99 fd ff ff 48 89 df 48 89 ee e8 11 70 ff ff <48> 8b 98 80 00 00 00 48 85 db 74 57 48 8b 6b 38 48 85 ed 74 e0
[ 3960.008059] RIP [<ffffffff810b332f>] pick_next_task_fair+0x30f/0x4a0
[ 3960.008059] RSP <ffff88075f02be38>
[ 3960.008059] CR2: 0000000000000080
[ 3960.008059] ---[ end trace e1b9f0775b83e8e3 ]---
[ 3960.008059] Kernel panic - not syncing: Fatal exception
[ 3960.008059] Shutting down cpus with NMI
[ 3960.008059] Kernel Offset: disabled
[ 0.000000] Initializing cgroup subsys cpuset
Thanks |
@justinsb Thanks for the quick response. We were trying to debug the problem and finally determined that it's more likely a kernel issue. We saw many kernel backtraces; most of them were in the CFS scheduler area, in pick_next_task_fair+0x30f/0x4a0 or wakeup_preempt_entity.isra.62+0x9/0x50. We specified CPU resource limits for quite a few pods, and those limits are small (0.1 - 0.2 cores per pod). This probably caused some complication between CFS and the cgroup limits. |
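For context, a 0.1-core limit becomes a very small CFS quota on the node, which can be inspected directly; a sketch assuming the Docker cgroupfs driver layout (nodes using the systemd cgroup driver have different paths):

```sh
# Pick a running container on the node and look at its CFS bandwidth settings.
CID=$(docker ps -q | head -n1)
CG=$(echo /sys/fs/cgroup/cpu/docker/${CID}*)   # expands the short ID to the full cgroup dir

cat "$CG/cpu.cfs_period_us"   # typically 100000 (100ms)
cat "$CG/cpu.cfs_quota_us"    # 10000 for a 100m limit, 20000 for 200m, -1 if unlimited
```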
Hi - I was pondering this. AFAIK there's no known issue with what you're doing - I was just guessing at the high rate of pod churn because, well, the panic is in the scheduler! So we have some great leads:
I agree that we're triggering a kernel bug. I didn't see any known issues in the same function, but I can also build a newer kernel, though it would probably be purely speculative. I'm going to cc @kubernetes/sig-node-bugs as this could well be kops/aws specific, but it does seem like it is related to resource limits. @dchen1107 let us know if we should copy this bug into kubernetes/kubernetes! On the apiserver memory ballooning, I would definitely open an issue on kubernetes/kubernetes. The apiserver will use more memory if there are more pods, and we do retain recent pods for a period of time, so I believe it follows that high churn of pods => more memory. But I honestly just don't know enough of the details here to say if what you are seeing is "expected" - it feels excessive, but I could well be wrong. |
@justinsb thanks for filing the bug about apiserver memory usage. I can provide more details about our workload and apiserver configuration. (We took the numbers provided in the Kubernetes source code into consideration, i.e., set --target-ram-mb to 60MB per 30 pods.) |
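For anyone tuning the same knob, that sizing corresponds to the apiserver's --target-ram-mb flag; a minimal sketch of the arithmetic (the pod count is illustrative, and how the flag reaches the apiserver manifest depends on your setup):

```sh
# Rule of thumb quoted above: ~60 MB per 30 pods, i.e. ~2 MB per pod.
EXPECTED_PODS=300                              # illustrative cluster size
TARGET_RAM_MB=$((EXPECTED_PODS * 60 / 30))

# Pass the result to kube-apiserver, e.g. as an extra flag in its manifest:
echo "--target-ram-mb=$TARGET_RAM_MB"
```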
How does this compare to the fact that kops-provisioned Kubernetes clusters use devicemapper by default? |
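A quick way to check what a given cluster's nodes actually ended up with, and whether a devicemapper setup is the slower loopback flavor (paths are the stock Docker defaults):

```sh
# Which storage driver is dockerd using on this node, and with what backing?
docker info 2>/dev/null | grep -i -A3 'storage driver'

# With devicemapper, sparse data/metadata files here indicate the loopback
# (loop-lvm) setup rather than a dedicated block device (direct-lvm).
ls -lh /var/lib/docker/devicemapper/devicemapper/ 2>/dev/null
```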
We've been running into a kernel oops that's fairly similar to #874 (comment):
This is a custom kernel based on 4.4.59, but the issue traces back to […]. I haven't been able to prove this, but my initial hunch is that in […] |
I've seen similar problems, eventually traced to the 4.4 kernel missing a couple of patches. My results are at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1687512 |
Which patches? |
Can we copy this into a bug in kubernetes/kubernetes? @kubernetes/sig-node-bugs will want to validate. |
@justinsb: These labels do not exist in this repository: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Hi @chrislovecnm, Yes, these are in every kernel since 4.7 (see They're not currently in the Ubuntu Xenial kernel (which is 4.4 based), but they will be eventually. You could try the HWE kernel which will be more recent. |
For reference, the two scheduler fixes (754bd598 and 094f4691) also made their way into the just-released 4.4.70 kernel. So 4.4 should be good to go. |
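A quick check of whether a node's kernel is new enough to carry those fixes, purely by version comparison against 4.4.70 (it cannot detect backports into vendor kernels):

```sh
required=4.4.70
current=$(uname -r | cut -d- -f1)   # strip suffixes like -k8s or -generic

# sort -V is a version-aware sort; if the required version sorts first,
# the running kernel is at least 4.4.70.
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "kernel $current is >= $required; the 754bd598/094f4691 fixes should be included"
else
  echo "kernel $current predates $required; the scheduler fixes are likely missing"
fi
```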
Did anyone copy this bug over to Kubernetes/kubernetes? |
cc @dchen1107 Dawn we are noticing that a couple of scheduler fixes have helped with k8s kernel panics. Do we have an issue open in core already? |
We observed a different kernel crash message on 4.4.41 after running a Kubernetes cluster for 2 months; it is preventing Docker from starting up. Let me know if there is a better place for kernel-related bugs. (A similar issue is mentioned in kubernetes/kubernetes#23253, but that issue has been closed.) Kernel Version:
Kernel crash message:
|
That looks like an ext4 error - "[5349378.904655] Detected aborted journal"
and other EXT4 errors in the logs. Is your filesystem broken?
…On Tue, Jun 27, 2017 at 10:26 AM, Harry Zhang ***@***.***> wrote:
@justinsb <https://github.com/justinsb> @chrislovecnm
<https://github.com/chrislovecnm>
We observed different (from all what have been mentioned above) kernel
crash message on 4.4.41 after running Kubernetes cluster for 2 month, this
is preventing docker from start up. lemme know if there is a better place
for kernel related bugs
Kernel Version:
***@***.***:/home/admin# uname -a
Linux ip-172-20-0-9 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux
Kernel crash message:
[5349238.836307] Call Trace:
[5349238.836408] [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349238.836415] [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349238.836416] [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349238.836433] [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349238.836454] [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349238.836506] [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349238.836515] [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349238.836527] [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349238.836529] [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349238.836534] [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349238.836574] [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349238.836637] [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349238.837187] [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349238.837210] [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349238.837236] [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349238.837257] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349238.837285] [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349238.837307] [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349238.837338] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349238.837347] [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349238.837348] [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349238.837349] [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349238.837362] [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349301.856150] INFO: rcu_sched detected stalls on CPUs/tasks:
[5349301.856207] 7-...: (621 ticks this GP) idle=db7/140000000000000/0 softirq=291405734/291405734 fqs=1516663
[5349301.856213]
[5349301.856223] (detected by 2, t=1643772 jiffies, g=333398020, c=333398019, q=2185604)
[5349301.856227] Task dump for CPU 7:
[5349301.856236] dockerd R
[5349301.856240] running task
[5349301.856245] 0 1206 1 0x0000000c
[5349301.856268] ffff00066c0a1000
[5349301.856269] 00000000400d65da
[5349301.856270] ffffffff813ed5d0
[5349301.856270] ffffffff81d27a00
[5349301.856271] 0000000000000030
[5349301.856273] ffffffff81d27a00
[5349301.856276] ffffffff813ed646
[5349301.856277] ffffffff81cbcec0
[5349301.856277] ffffffff813ed630
[5349301.856280] ffffffff813e733e
[5349301.856280] ffffffff81d27a00
[5349301.856282] 0000000000000001
[5349301.856290] Call Trace:
[5349301.856387] [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349301.856397] [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349301.856398] [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349301.856416] [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349301.856431] [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349301.856487] [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349301.856493] [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349301.856505] [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349301.856511] [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349301.856513] [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349301.856549] [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349301.856590] [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349301.857035] [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349301.857058] [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349301.857083] [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349301.857099] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349301.857111] [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349301.857123] [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349301.857134] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349301.857146] [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349301.857155] [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349301.857156] [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349301.857179] [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349364.876077] INFO: rcu_sched detected stalls on CPUs/tasks:
[5349364.876089] 7-...: (1922 ticks this GP) idle=db7/140000000000000/0 softirq=291405734/291405734 fqs=1531627
[5349364.876089]
[5349364.876095] (detected by 5, t=1659527 jiffies, g=333398020, c=333398019, q=2203695)
[5349364.876097] Task dump for CPU 7:
[5349364.876098] dockerd R
[5349364.876099] running task
[5349364.876101] 0 1206 1 0x0000000c
[5349364.876109] ffff00066c0a1000
[5349364.876109] 00000000400d65da
[5349364.876110] ffffffff813ed5d0
[5349364.876110] ffffffff81d27a00
[5349364.876111] 0000000000000072
[5349364.876111] ffffffff81d27a00
[5349364.876111] ffffffff813ed646
[5349364.876111] ffffffff81cbceb4
[5349364.876112] ffffffff813ed630
[5349364.876113] ffffffff813e733e
[5349364.876114] ffffffff81d27a00
[5349364.876116] 0000000000000001
[5349364.876118] Call Trace:
[5349364.876150] [<ffffffff813ed5d0>] ? wait_for_xmitr+0x30/0x90
[5349364.876155] [<ffffffff813ed646>] ? serial8250_console_putchar+0x16/0x30
[5349364.876157] [<ffffffff813ed630>] ? wait_for_xmitr+0x90/0x90
[5349364.876168] [<ffffffff813e733e>] ? uart_console_write+0x2e/0x50
[5349364.876176] [<ffffffff813f011c>] ? serial8250_console_write+0xcc/0x2a0
[5349364.876192] [<ffffffff810c9238>] ? print_time.part.10+0x68/0x90
[5349364.876196] [<ffffffff810c92c2>] ? print_prefix+0x62/0x90
[5349364.876199] [<ffffffff810c9afe>] ? call_console_drivers.constprop.24+0xfe/0x110
[5349364.876201] [<ffffffff810cadc0>] ? console_unlock+0x2f0/0x4d0
[5349364.876204] [<ffffffff810cb356>] ? vprintk_emit+0x3b6/0x540
[5349364.876221] [<ffffffff8159d9fe>] ? out_of_line_wait_on_bit+0x7e/0xa0
[5349364.876242] [<ffffffff81168cfc>] ? printk+0x57/0x73
[5349364.876557] [<ffffffffc028c7f1>] ? ext4_commit_super+0x1d1/0x270 [ext4]
[5349364.876566] [<ffffffffc028d021>] ? __ext4_abort+0x81/0x170 [ext4]
[5349364.876581] [<ffffffff811eadfb>] ? filename_parentat+0x10b/0x190
[5349364.876595] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349364.876607] [<ffffffffc02a043e>] ? ext4_journal_check_start+0x6e/0x80 [ext4]
[5349364.876619] [<ffffffffc02a0574>] ? __ext4_journal_start_sb+0x34/0x100 [ext4]
[5349364.876628] [<ffffffffc027d40a>] ? ext4_unlink+0x1aa/0x380 [ext4]
[5349364.876633] [<ffffffff811e6d94>] ? __inode_permission+0x24/0xa0
[5349364.876634] [<ffffffff811e8227>] ? vfs_unlink+0xe7/0x180
[5349364.876636] [<ffffffff811eb8b9>] ? do_unlinkat+0x289/0x300
[5349364.876640] [<ffffffff815a13b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[5349378.904655] Detected aborted journal
[5349378.908433] systemd-journald[7132]: /dev/kmsg buffer overrun, some messages lost.
[5349380.045882] cbr0: port 1(veth924e7693) entered disabled state
[5349380.062306] device veth924e7693 left promiscuous mode
[5349380.066332] cbr0: port 1(veth924e7693) entered disabled state
[5363766.236135] EXT4-fs (dm-0): error count since last fsck: 32
[5363766.240072] EXT4-fs (dm-0): initial error at time 1492716166: ext4_do_update_inode:4652
[5363766.248073] EXT4-fs (dm-0): last error at time 1498397621: ext4_find_entry:1450: inode 9444726
[5450273.756116] EXT4-fs (dm-0): error count since last fsck: 32
[5450273.759960] EXT4-fs (dm-0): initial error at time 1492716166: ext4_do_update_inode:4652
[5450273.760604] EXT4-fs (dm-0): last error at time 1498397621: ext4_find_entry:1450: inode 9444726
|
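If the aborted journal above is the root cause, this is filesystem damage rather than a Kubernetes or Docker bug; a minimal sketch of how one might confirm and repair it (the device name is taken from the dm-0 in the log and may differ on your node):

```sh
DEV=/dev/dm-0   # device named in the EXT4 errors above; substitute your own

# ext4 records error counters in the superblock; non-zero counts confirm past corruption.
sudo tune2fs -l "$DEV" | grep -iE 'state|error'

# Repair requires the filesystem to be unmounted, typically from a rescue
# environment or after attaching the volume to another instance.
sudo umount "$DEV" 2>/dev/null
sudo e2fsck -f "$DEV"
```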
@daxtens looks like so. I found someone else who posted similar problems; he suspected it was a hypervisor problem, but there was no further discussion. |
Should we switch to overlay2 as discussed here?
|
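If the answer is yes, one way to roll that out across a kops cluster is via the docker section of the cluster spec rather than hand-editing nodes; a sketch that assumes the spec.docker.storage field (support and defaults vary by kops version) and reuses the $NAME convention from the earlier sketch:

```sh
# Open the cluster spec for editing...
kops edit cluster --name $NAME

# ...and set the preferred storage driver(s) under the docker section, e.g.:
#
#   spec:
#     docker:
#       storage: overlay2
#
# Then push the change and roll the nodes so dockerd restarts with the new driver.
kops update cluster --name $NAME --yes
kops rolling-update cluster --name $NAME --yes
```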
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Any chance of a Debian image that uses a newer kernel (> 4.13)? We are running into obscure Docker overlayfs bugs that are related to older kernels. We don't want to switch Docker to the aufs storage driver, as that is older technology. |
/reopen |
@Cryptophobia: you can't re-open an issue/PR unless you authored it or you are assigned to it. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
We need to document and validate that this is resolved. kubernetes/kubernetes#30706