Kernel panic on Debian 8 with Docker 1.12 #29397

jmcollin78 opened this issue Dec 14, 2016 · 15 comments

@jmcollin78 commented Dec 14, 2016


I have a Debian 8 box on an OpenStack Juno cloud with Docker 1.12 installed, running a PostgreSQL image. This instance regularly crashes with a kernel panic.

Steps to reproduce the issue:

  1. Create an instance on an OpenStack Juno cloud.
  2. Start the Docker image paunin/postgresql-cluster-pgsql:latest.
  3. Wait for the crash.

Describe the results you received:
The instance crashes with a kernel panic and these logs:

[11176.153139] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[11176.155778] IP: [<ffffffff810a10d4>] check_preempt_wakeup+0xd4/0x1d0
[11176.157013] PGD bb2d8067 PUD b9ac2067 PMD 0 
[11176.157013] Oops: 0000 [#1] SMP 
[11176.157013] Modules linked in: ipt_MASQUERADE xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs(C) joydev hid_generic usbhid hid ppdev crc32_pclmul aesni_intel evdev aes_x86_64 lrw gf128mul glue_helper ablk_helper ttm cryptd drm_kms_helper serio_raw virtio_balloon parport_pc drm pvpanic parport processor i2c_piix4 thermal_sys i2c_core button autofs4 ext4 crc16 mbcache jbd2 ata_generic virtio_blk virtio_net ata_piix uhci_hcd crct10dif_pclmul crct10dif_common ehci_hcd crc32c_intel psmouse libata virtio_pci usbcore virtio_ring scsi_mod virtio usb_common floppy
[11176.157013] CPU: 0 PID: 23662 Comm: exe Tainted: G         C    3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
[11176.157013] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[11176.157013] task: ffff8800bad30210 ti: ffff8800badc4000 task.ti: ffff8800badc4000
[11176.157013] RIP: 0010:[<ffffffff810a10d4>]  [<ffffffff810a10d4>] check_preempt_wakeup+0xd4/0x1d0
[11176.157013] RSP: 0000:ffff88013fc03e58  EFLAGS: 00010097
[11176.157013] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
[11176.157013] RDX: 0000000000000000 RSI: ffff88013a87e210 RDI: ffff8800bb75e800
[11176.157013] RBP: ffff88007f8f0340 R08: ffffffff816108c0 R09: 0000000000000001
[11176.157013] R10: 0000000000020022 R11: 0000000000000010 R12: ffff8800bad30210
[11176.157013] R13: ffff88013fc12f40 R14: 0000000000000000 R15: 0000000000000000
[11176.157013] FS:  00007fe9bdc48700(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[11176.157013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11176.157013] CR2: 0000000000000078 CR3: 00000000bad3e000 CR4: 00000000000406f0
[11176.157013] Stack:
[11176.157013]  ffffffff8109ffe2 ffff88013fc12f40 ffff88013a87e210 ffff88013fc12f40
[11176.157013]  0000000000000046 0000000000000000 0000000000000000 ffffffff81095b75
[11176.157013]  ffff88013a87e210 ffffffff81095ba4 ffff88013a87e210 ffff88013fc12f40
[11176.157013] Call Trace:
[11176.157013]  <IRQ> 
[11176.157013]  [<ffffffff8109ffe2>] ? enqueue_task_fair+0x7f2/0xe20
[11176.157013]  [<ffffffff81095b75>] ? check_preempt_curr+0x85/0xa0
[11176.157013]  [<ffffffff81095ba4>] ? ttwu_do_wakeup+0x14/0xf0
[11176.157013]  [<ffffffff81098176>] ? try_to_wake_up+0x1b6/0x2f0
[11176.157013]  [<ffffffff8108bfe0>] ? hrtimer_get_res+0x50/0x50
[11176.157013]  [<ffffffff8108bffe>] ? hrtimer_wakeup+0x1e/0x30
[11176.157013]  [<ffffffff8108c667>] ? __run_hrtimer+0x67/0x210
[11176.157013]  [<ffffffff8108ca69>] ? hrtimer_interrupt+0xe9/0x220
[11176.157013]  [<ffffffff8151b46b>] ? smp_apic_timer_interrupt+0x3b/0x50
[11176.157013]  [<ffffffff815194fd>] ? apic_timer_interrupt+0x6d/0x80
[11176.157013]  <EOI> 
[11176.157013] Code: 0f 1f 80 00 00 00 00 83 e8 01 48 8b 5b 70 39 d0 75 f5 48 8b 7d 78 48 3b 7b 78 74 15 0f 1f 00 48 8b 6d 70 48 8b 5b 70 48 8b 7d 78 <48> 3b 7b 78 75 ee 48 85 ff 74 e9 e8 8c cb ff ff 48 85 db 0f 84 
[11176.157013] RIP  [<ffffffff810a10d4>] check_preempt_wakeup+0xd4/0x1d0
[11176.157013]  RSP <ffff88013fc03e58>
[11176.157013] CR2: 0000000000000078
[11176.157013] ---[ end trace 90a4d010673f1243 ]---
[11176.157013] Kernel panic - not syncing: Fatal exception in interrupt
[11176.157013] Shutting down cpus with NMI
[11176.157013] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[11176.157013] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Describe the results you expected:
No crash

Additional information you deem important (e.g. issue happens only occasionally):

root$ uname -a 
Linux etg-dbs-temp-pgcluster-0 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

Output of docker version:

root$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:39:14 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:39:14 2016
 OS/Arch:      linux/amd64

Output of docker info:

root$ docker info 
Containers: 4
 Running: 3
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 1.12.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 34
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.873 GiB
Name: xxxxxxx
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy:
No Proxy: localhost,,
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:

Additional environment details (AWS, VirtualBox, physical, etc.):
OpenStack Juno environment.

@justincormack commented Dec 16, 2016

A kernel crash is not a problem we can fix in Docker; it means the kernel needs fixing, or possibly the virtualisation, since this is an emulated OpenStack machine. I recommend testing the same kernel on a different VM and/or physical hardware, and also testing a more recent kernel (e.g. the one from Debian backports).

@jmcollin78 commented Dec 16, 2016

I understand your point of view, but only the instances running Docker get these kernel panics. Identical instances in the same cloud with the same OS but without Docker don't crash. So I guess something in Docker causes this kernel panic, and it should be fixed or worked around.

@cpuguy83 commented Dec 16, 2016

@jmcollin78 Sure, something Docker is doing may be triggering the issue, but that doesn't mean docker is the cause or even that docker could work around it.

From the stack trace it looks like a hardware (virtualized or otherwise) issue, but @justincormack would probably know more about that (being a Xen maintainer).

@hdimitriou commented Jan 23, 2017

@cpuguy83, @justincormack I understand your point of view, but if kernel 3.16 crashes with Docker and you cannot do anything about it, then just state that Docker is not compatible with it.
By not saying anything about it, many people end up using it, and when they try to scale they hit a wall they cannot climb.
The whole Docker approach to this subject is flawed, including commercial support: Debian is not listed among the supported systems, but nowhere in the official documentation can one find the reason why. Instead you find a page on how to install on Debian.

Really, for a non-startup company it's much easier to stop using Docker than to change distribution, and that is plain sad for both your effort and ours.

@ijc commented Jan 23, 2017

The stack trace and kernel version here look identical to Debian bug #847360 to me. I'd suggest subscribing to that bug and perhaps posting there about your use case and reproduction steps (since that bug seems rather light on those).
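Matching a crash against an existing bug report mostly comes down to comparing the faulting function and the kernel release. The following is a minimal sketch, not a Debian or kernel tool: `summarize_oops` is a hypothetical helper that assumes the x86 oops format seen in this thread (an `IP: [<addr>] func+0xoff/0xlen` line and a `CPU: ... PID: ...` line ending in the release and `#<build>`).

```python
import re

# Faulting-instruction line of an x86 oops, e.g.
# "IP: [<ffffffff810a10d4>] check_preempt_wakeup+0xd4/0x1d0"
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\] ([\w.]+)\+0x([0-9a-f]+)/0x[0-9a-f]+")

# "CPU: ... PID: ... Comm: ..." line; the kernel release is the token before " #<n>"
KREL_RE = re.compile(r"CPU: \d+ PID: \d+ .*?(\d+\.\d+\.\d+\S*) #\d+")


def summarize_oops(log: str) -> dict:
    """Extract the faulting function, its offset and the kernel release.

    Fields that cannot be found (e.g. when only part of a trace was pasted)
    are returned as None.
    """
    summary: dict = {"function": None, "offset": None, "kernel": None}
    ip = IP_RE.search(log)
    if ip:
        summary["function"] = ip.group(1)
        summary["offset"] = int(ip.group(2), 16)  # offset into the function
    rel = KREL_RE.search(log)
    if rel:
        summary["kernel"] = rel.group(1)
    return summary
```

Fed the original report's log, this sketch yields `check_preempt_wakeup` on `3.16.0-4-amd64`, which is the pair to compare against the Debian bug. The two regexes are deliberately narrow; other architectures or newer kernels (which print `RIP:` with a different layout) would need their own patterns.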

@cpuguy83 commented Jan 23, 2017

@hdimitriou You say "regularly" in the original post. What does this mean? Does the kernel panic after the container has been running for some time, or exactly when the container starts?

I understand it's frustrating to run into an issue like this that seemingly blocks everything you are trying to do.

@hdimitriou commented Jan 23, 2017

@cpuguy83 I did not write the original ticket and I do not want to hijack it. I have just noticed a significant number of tickets that refer to panics with kernel 3.16, after suffering from such an issue repeatedly myself. I haven't seen any of those tickets resolved without moving to a newer kernel.
As a result, I wonder whether you should document somewhere that there are unsolved issues when running Docker on the 3.16 kernel, for people considering production use.

Sorry again for drawing attention away from the original issue.

@jmcollin78 commented Jan 23, 2017

Hi @hdimitriou, @cpuguy83, @justincormack, thanks for your efforts to help with this issue. I have subscribed to the Debian kernel bug as mentioned. "Regularly" means that after some time (not at startup), the VM is randomly stopped by this kernel panic.
A VM can run for 10 days without trouble and then be stopped one day. Nothing unusual is noticed on those days.
The crashing VM may be running the MongoDB container, the PostgreSQL container, or another container.
The only common factor I notice is that all the crashing VMs have an OpenStack volume mounted into the container.
Other containers without a mounted volume don't crash (as far as I can see).

@cpuguy83 commented Jan 24, 2017

Also note, I found this: kubernetes/kubernetes#23253 (comment)

@jmcollin78 commented Feb 2, 2017

Maybe this will also help: #13940
I will try upgrading the Linux kernel to 3.19.

@thiesschneider commented Mar 3, 2017

Did it solve your problem?

@jmcollin78 commented Mar 4, 2017

I upgraded to kernel 4.9 and it solved my problem.
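For hosts still on 3.16, a preflight check can flag kernels older than the release that resolved the crashes here. This is a minimal sketch under a stated assumption: the 4.9 threshold comes only from this thread, not from any official Docker requirement, and `kernel_tuple` / `kernel_is_older_than` are hypothetical helper names.

```python
import platform
import re

# Assumption: 4.9 is used only because upgrading to 4.9 stopped the crashes
# for the reporter of this issue; it is not an official Docker requirement.
MIN_KERNEL = (4, 9)


def kernel_tuple(release: str) -> tuple:
    """Return (major, minor) from a release string such as '3.16.0-4-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_is_older_than(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    # Numeric tuple comparison sorts 4.10 after 4.9, which a plain
    # string comparison would get wrong.
    return kernel_tuple(release) < minimum


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string `uname -r` prints
    if kernel_is_older_than(release):
        print(f"warning: kernel {release} is older than {MIN_KERNEL[0]}.{MIN_KERNEL[1]}")
    else:
        print(f"kernel {release} meets the chosen threshold")
```

Comparing parsed (major, minor) tuples rather than raw strings is the main design point; distribution suffixes such as `-4-amd64` are simply ignored.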

@c0deright commented Jul 14, 2017

I had a similar kernel panic:

IP: [<ffffffffc06a1a2b>] au_write_pre+0x8b/0x110 [aufs]
Call Trace:
 [<ffffffffc06a229c>] aufs_write_iter+0x4c/0x100 [aufs]
 [<ffffffffc06a2250>] ? aufs_splice_write+0x110/0x110 [aufs]
 [<ffffffff8125fa7a>] aio_run_iocb+0x26a/0x2d0
 [<ffffffff812f644c>] ? jbd2_complete_transaction+0x5c/0xa0
 [<ffffffff811b4ecd>] ? kzfree+0x2d/0x40
 [<ffffffff811ee2ba>] ? kfree+0x13a/0x150
 [<ffffffff8126088b>] ? do_io_submit+0x19b/0x500
 [<ffffffff8126094f>] do_io_submit+0x25f/0x500
 [<ffffffff81210fe0>] ? __fput+0x190/0x220
 [<ffffffff81260c00>] SyS_io_submit+0x10/0x20
 [<ffffffff81840b72>] entry_SYSCALL_64_fastpath+0x16/0x71


NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [mysqld:10353]

which I was able to reproduce on Ubuntu 16.04 LTS running a dockerized percona-server-5.6 (MySQL) whenever the my.cnf setting innodb_flush_method = O_DIRECT was active.

The kernel panic and docker crashes went away the second I disabled the innodb_flush_method option.

Might be related?

@cpuguy83 commented Jul 14, 2017

Could be related, but I would make sure percona is not writing to aufs (or overlayfs).
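One way to check which filesystem a path like the MySQL datadir actually lives on is to find the longest mount point in `/proc/mounts` that contains it. This is a minimal sketch, assuming you pass in the contents of `/proc/mounts` as read inside the container; `fs_type_for_path` is a hypothetical helper, and it does not resolve symlinks or decode the octal escapes (`\040` for a space) that `/proc/mounts` uses in paths.

```python
import posixpath
from typing import Optional


def fs_type_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the longest mount point that contains `path`.

    `mounts_text` is /proc/mounts content: "device mountpoint fstype options 0 0"
    per line. Returns None if no mount point matches.
    """
    path = posixpath.normpath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mnt, fstype = fields[1], fields[2]
        # Match whole path components so /var/lib/mysql does not cover /var/lib/mysqlx.
        covers = mnt == "/" or path == mnt or path.startswith(mnt.rstrip("/") + "/")
        # ">=" lets a later mount on the same point win, as it would in the kernel.
        if covers and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type
```

In a setup like the one in this thread, a datadir left inside the container would resolve to the root filesystem's type (aufs with this storage driver), while a path backed by a volume would report the volume's own filesystem.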

@c0deright commented Jul 14, 2017

Sorry, forgot to mention that I tried to test the concept outlined at with a 20GB dataset.

I intentionally modified the Percona image as outlined in the article so that the MySQL datadir was not moved out into a Docker volume (/var/lib/mysql) but remained inside the container (/data).

aufs doesn't seem to play nicely with large amounts of data written with O_DIRECT.

8 participants