kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" #5618
Comments
drpancake
May 23, 2014
I'm seeing a very similar issue for eth0. Ubuntu 12.04 also.
I have to power cycle the machine. From /var/log/kern.log:
May 22 19:26:08 box kernel: [596765.670275] device veth5070 entered promiscuous mode
May 22 19:26:08 box kernel: [596765.680630] IPv6: ADDRCONF(NETDEV_UP): veth5070: link is not ready
May 22 19:26:08 box kernel: [596765.700561] IPv6: ADDRCONF(NETDEV_CHANGE): veth5070: link becomes ready
May 22 19:26:08 box kernel: [596765.700628] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:08 box kernel: [596765.700638] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:19 box kernel: [596777.386084] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=170 DF PROTO=TCP SPT=51615 DPT=13162 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:21 box kernel: [596779.371993] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=549 DF PROTO=TCP SPT=46878 DPT=12518 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:23 box kernel: [596780.704031] docker0: port 7(veth5070) entered forwarding state
May 22 19:27:13 box kernel: [596831.359999] docker0: port 7(veth5070) entered disabled state
May 22 19:27:13 box kernel: [596831.361329] device veth5070 left promiscuous mode
May 22 19:27:13 box kernel: [596831.361333] docker0: port 7(veth5070) entered disabled state
May 22 19:27:24 box kernel: [596841.516039] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:34 box kernel: [596851.756060] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:44 box kernel: [596861.772101] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
egasimus
Jun 4, 2014
Hey, this just started happening for me as well.
Docker version:
Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1
Kernel log: http://pastebin.com/TubCy1tG
System details:
Running Ubuntu 14.04 LTS with a patched kernel (3.14.3-rt4). I have yet to see it happen with the default linux-3.13.0-27-generic kernel. What's funny, though, is that when this happens, all my terminal windows freeze, letting me type a few characters at most before that. The same fate befalls any new ones I open, too, and I end up needing to power cycle my poor laptop just like the good doctor above. For the record, I'm running fish shell in urxvt or xterm in xmonad. Haven't checked if it affects plain bash.
egasimus
Jun 5, 2014
This might be relevant:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065434#yui_3_10_3_1_1401948176063_2050
Copying a fairly large amount of data over the network inside a container
and then exiting the container can trigger a missing decrement in the per
cpu reference count on a network device.
Sure enough, one of the times this happened for me was right after apt-getting a package with a ton of dependencies.
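To make that concrete, the kind of sequence described there would look roughly like the sketch below (image, package and loop count are placeholders, not a confirmed reproducer):
# pull a fairly large amount of data over the network inside a container,
# then let the container exit, and repeat a few times
for i in $(seq 1 10); do
  docker run --rm ubuntu bash -c 'apt-get update && apt-get install -y build-essential'
done
# then watch the kernel log for the message
dmesg | grep unregister_netdevice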
drpancake
Jun 5, 2014
Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes.
unclejack added the kernel label on Jul 16, 2014
csabahenk
commented
Jul 22, 2014
I experience this on RHEL7, 3.10.0-123.4.2.el7.x86_64
egasimus
Jul 22, 2014
I've noticed the same thing happening with my VirtualBox virtual network interfaces when I'm running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or something.
spiffytech
Jul 25, 2014
@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error.
spiffytech
Jul 25, 2014
I upgraded to Debian kernel 3.14 and the problem appears to have gone away. Looks like the problem existed in some kernels < 3.5, was fixed in 3.5, regressed in 3.6, and was patched somewhere between 3.12 and 3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394
egasimus
Jul 27, 2014
@spiffytech Do you have any idea where I can report this regarding the realtime kernel flavour? I think they're only releasing a RT patch for every other version, and would really hate to see 3.16-rt come out with this still broken. :/
EDIT: Filed it at kernel.org.
ibuildthecloud
Dec 22, 2014
Contributor
I'm getting this on Ubuntu 14.10 running a 3.18.1 kernel. The kernel log shows:
Dec 21 22:49:31 inotmac kernel: [15225.866600] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:40 inotmac kernel: [15235.179263] INFO: task docker:19599 blocked for more than 120 seconds.
Dec 21 22:49:40 inotmac kernel: [15235.179268] Tainted: G OE 3.18.1-031801-generic #201412170637
Dec 21 22:49:40 inotmac kernel: [15235.179269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 22:49:40 inotmac kernel: [15235.179271] docker D 0000000000000001 0 19599 1 0x00000000
Dec 21 22:49:40 inotmac kernel: [15235.179275] ffff8802082abcc0 0000000000000086 ffff880235c3b700 00000000ffffffff
Dec 21 22:49:40 inotmac kernel: [15235.179277] ffff8802082abfd8 0000000000013640 ffff8800288f2300 0000000000013640
Dec 21 22:49:40 inotmac kernel: [15235.179280] ffff880232cf0000 ffff8801a467c600 ffffffff81f9d4b8 ffffffff81cd9c60
Dec 21 22:49:40 inotmac kernel: [15235.179282] Call Trace:
Dec 21 22:49:40 inotmac kernel: [15235.179289] [<ffffffff817af549>] schedule+0x29/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179292] [<ffffffff817af88e>] schedule_preempt_disabled+0xe/0x10
Dec 21 22:49:40 inotmac kernel: [15235.179296] [<ffffffff817b1545>] __mutex_lock_slowpath+0x95/0x100
Dec 21 22:49:40 inotmac kernel: [15235.179299] [<ffffffff8168d5c9>] ? copy_net_ns+0x69/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179302] [<ffffffff817b15d3>] mutex_lock+0x23/0x37
Dec 21 22:49:40 inotmac kernel: [15235.179305] [<ffffffff8168d5f8>] copy_net_ns+0x98/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179308] [<ffffffff810941f1>] create_new_namespaces+0x101/0x1b0
Dec 21 22:49:40 inotmac kernel: [15235.179311] [<ffffffff8109432b>] copy_namespaces+0x8b/0xa0
Dec 21 22:49:40 inotmac kernel: [15235.179315] [<ffffffff81073458>] copy_process.part.28+0x828/0xed0
Dec 21 22:49:40 inotmac kernel: [15235.179318] [<ffffffff811f157f>] ? get_empty_filp+0xcf/0x1c0
Dec 21 22:49:40 inotmac kernel: [15235.179320] [<ffffffff81073b80>] copy_process+0x80/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179323] [<ffffffff81073ca2>] do_fork+0x62/0x280
Dec 21 22:49:40 inotmac kernel: [15235.179326] [<ffffffff8120cfc0>] ? get_unused_fd_flags+0x30/0x40
Dec 21 22:49:40 inotmac kernel: [15235.179329] [<ffffffff8120d028>] ? __fd_install+0x58/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179331] [<ffffffff81073f46>] SyS_clone+0x16/0x20
Dec 21 22:49:40 inotmac kernel: [15235.179334] [<ffffffff817b3ab9>] stub_clone+0x69/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179336] [<ffffffff817b376d>] ? system_call_fastpath+0x16/0x1b
Dec 21 22:49:41 inotmac kernel: [15235.950976] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:51 inotmac kernel: [15246.059346] unregister_netdevice: waiting for lo to become free. Usage count = 2
I'll send docker version/info once the system isn't frozen anymore :)
sbward
commented
Dec 23, 2014
We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic
jbalonso
Dec 29, 2014
On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic.
I'll add that, in our case, we sometimes see a negative usage count.
rsampaio
Jan 15, 2015
Contributor
FWIW, we hit this bug running LXC on the trusty kernel (3.13.0-40-generic #69-Ubuntu); the message appears in dmesg followed by this stack trace:
[27211131.602869] INFO: task lxc-start:26342 blocked for more than 120 seconds.
[27211131.602874] Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27211131.602881] lxc-start D 0000000000000001 0 26342 1 0x00000080
[27211131.602883] ffff88000d001d40 0000000000000282 ffff88001aa21800 ffff88000d001fd8
[27211131.602886] 0000000000014480 0000000000014480 ffff88001aa21800 ffffffff81cdb760
[27211131.602888] ffffffff81cdb764 ffff88001aa21800 00000000ffffffff ffffffff81cdb768
[27211131.602891] Call Trace:
[27211131.602894] [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
[27211131.602897] [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
[27211131.602900] [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
[27211131.602903] [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
[27211131.602905] [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
[27211131.602908] [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
[27211131.602910] [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
[27211131.602912] [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
[27211131.602915] [<ffffffff810669f5>] do_fork+0xd5/0x340
[27211131.602917] [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
[27211131.602919] [<ffffffff81066ce6>] SyS_clone+0x16/0x20
[27211131.602921] [<ffffffff81730089>] stub_clone+0x69/0x90
[27211131.602923] [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f
MrMMorris
Mar 16, 2015
Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x.
Docker command:
docker run -t -i -v /data/sitespeed.io:/sitespeed.io/results company/dockerfiles:sitespeed.io-latest --name "Superbrowse"
This seems like a pretty bad issue...
MrMMorris
Mar 17, 2015
@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs
rsampaio
Mar 17, 2015
Contributor
@MrMMorris could you share a reproducer script using publicly available images?
unclejack
Mar 18, 2015
Contributor
Everyone who's seeing this error on their system is running a package of the Linux kernel on their distribution that's far too old and lacks the fixes for this particular problem.
If you run into this problem, make sure you run apt-get update && apt-get dist-upgrade -y and reboot your system. If you're on Digital Ocean, you also need to select the kernel version which was just installed during the update because they don't use the latest kernel automatically (see https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/2814988-give-option-to-use-the-droplet-s-own-bootloader).
CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using yum update and reboot after installing the updates.
When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor.
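For Ubuntu/Debian that boils down to roughly the following (which kernel you end up on depends on what your distribution currently ships):
sudo apt-get update && sudo apt-get dist-upgrade -y
sudo reboot
# after the reboot, confirm which kernel is actually running
uname -r
# CentOS/RHEL/Fedora/Scientific Linux equivalent:
# sudo yum update && sudo reboot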
MrMMorris
Mar 18, 2015
I ran apt-get update && apt-get dist-upgrade -y
ubuntu 14.04 3.13.0-46-generic
Still get the error after only one docker run
I can create an AMI for reproducing if needed
unclejack
Mar 18, 2015
Contributor
@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04.
MrMMorris
commented
Mar 18, 2015
Anything else I can do to help, let me know!
rsampaio
Mar 18, 2015
Contributor
@MrMMorris if you can provide a reproducer there is a bug opened for Ubuntu and it will be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152
MrMMorris
commented
Mar 18, 2015
@rsampaio if I have time today, I will definitely get that for you!
fxposter
Mar 23, 2015
This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: docker#9605 (comment). Rebooting the server is the only way to fix this for now.
chrisjstevenson
Apr 27, 2015
Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some docker containers (not all containers)
kernel:unregister_netdevice: waiting for lo to become free. Usage count = -1
Again, rebooting the server seems to be the only solution at this time
popsikle referenced this issue in coreos/bugs on May 11, 2015 (closed): Docker completely stuck - unregister_netdevice: waiting for lo to become free. Usage count = 1 #254
popsikle
May 12, 2015
Also seeing this on CoreOS (647.0.0) with kernel 3.19.3.
Rebooting is also the only solution I have found.
fxposter
commented
May 20, 2015
Tested Debian jessie with sid's kernel (4.0.2) - the problem remains.
popsikle
commented
Jun 19, 2015
Anyone seeing this issue running non-ubuntu containers?
fxposter
Jun 19, 2015
Yes. Debian ones.
unclejack
Jun 20, 2015
Contributor
This is a kernel issue, not an image-related issue. Switching one image for another won't make this problem better or worse.
ibuildthecloud referenced this issue in rancher/rancher on Jul 17, 2015 (closed): Docker not responding after cloned host #1557
techniq
Jul 17, 2015
Experiencing issue on Debian Jessie on a BeagleBone Black running 4.1.2-bone12 kernel
igorastds
Jul 17, 2015
Experiencing this after switching from 4.1.2 to 4.2-rc2 (using a git build of 1.8.0).
Deleting /var/lib/docker/* doesn't solve the problem.
Switching back to 4.1.2 solves the problem.
Also, VirtualBox has the same issue and there's a patch for v5.0.0 (back-ported to v4) which supposedly does something in the kernel driver part; worth looking at to understand the problem.
fxposter
Jul 22, 2015
This is the fix in VirtualBox: https://www.virtualbox.org/attachment/ticket/12264/diff_unregister_netdev
They don't actually modify the kernel, just their kernel module.
nazar-pc
Jul 24, 2015
Also having this issue with 4.2-rc2:
unregister_netdevice: waiting for vethf1738d3 to become free. Usage count = 1
nazar-pc
commented
Jul 24, 2015
Just compiled 4.2-RC3, seems to work again
feisuzhu
Jul 30, 2015
Linux docker13 3.19.0-22-generic #22-Ubuntu SMP Tue Jun 16 17:15:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Kernel from Ubuntu 15.04, same issue
LK4D4
Jul 30, 2015
Contributor
I saw it with 4.2-rc3 as well. There is not one bug about device leakage :) I can reproduce it on any kernel >= 4.1 under high load.
sdenovan referenced this issue in lxc/lxc on Feb 20, 2018 (closed): Namespace clone results in hang: uninterruptible sleep #2141
4admin2root referenced this issue in rancher/rancher on Mar 4, 2018 (closed): when scale down a service and then scale up, cause kernel wall "unregister_netdevice: waiting for eth0 to become free. Usage count = 4" #11927
wuming5569
Mar 9, 2018
Still happened: "unregister_netdevice: waiting for eth0 to become free. Usage count = 1", although I've upgraded the kernel to 4.4.118 and Docker to 17.09.1-ce. Maybe I should try disabling ipv6 at the kernel level. Hope it could work.
scher200
commented
Mar 9, 2018
@wuming5569 please let me know if it worked for you with that version of linux
4admin2root
Mar 10, 2018
@wuming5569 maybe; upgrading to kernel 4.4.114 fixes "unregister_netdevice: waiting for lo to become free. Usage count = 1", but not "unregister_netdevice: waiting for eth0 to become free. Usage count = 1".
I tested it in production.
@ddstreet this is feedback, any help?
rn
Mar 10, 2018
Member
@wuming5569 as mentioned above, the messages themselves are benign, but they may eventually lead to the kernel hanging. Does your kernel hang, and if so, what is your network pattern, i.e. what type of networking do your containers do?
soglad
Mar 14, 2018
Experienced the same issue on CentOS. My kernel is 3.10.0-693.17.1.el7.x86_64, but I didn't get a similar stack trace in syslog.
vicary referenced this issue in rancher/rancher on Mar 24, 2018 (open): Connects to existing AWS host shouldn't fail. #12243
danielefranceschi
Mar 27, 2018
Same on Centos7 kernel 3.10.0-514.21.1.el7.x86_64 and docker 18.03.0-ce
alexhexabeam
Mar 27, 2018
@danielefranceschi I recommend you upgrade to the latest CentOS kernel (at least 3.10.0-693). It won't solve the issue, but it seems to be much less frequent. In kernels 3.10.0-327 and 3.10.0-514, we were seeing the stack trace, but by my memory, I don't think we've seen any of those in 3.10.0-693.
danielefranceschi
commented
Mar 28, 2018
@alexhexabeam 3.10.0-693 seems to work flawlessly, thanks :)
LeonanCarvalho
Apr 3, 2018
Same on CentOS7 kernel 4.16.0-1.el7.elrepo.x86_64 and docker 18.03.0-ce
It worked for weeks before the crash, and when I tried to bring it up again, it got completely stuck.
The problem also happened with kernel 3.10.0-693.21.1.el7
marckamerbeek
Apr 4, 2018
I can confirm it also happens on:
Linux 3.10.0-693.17.1.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
I can reproduce it by doing "service docker restart" while having a certain amount of load.
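For illustration, the kind of load loop meant here might look like the sketch below (image name, URL and counts are arbitrary placeholders, not the exact workload used):
# generate some container network load
for i in $(seq 1 20); do
  docker run -d --name load-$i busybox sh -c 'while true; do wget -q -O /dev/null http://example.com; done'
done
# restart the daemon while that load is running and watch the kernel log
sudo service docker restart
dmesg | grep unregister_netdevice
# cleanup
docker rm -f $(docker ps -aq --filter name=load-)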
Xuexiang825
Apr 11, 2018
@wuming5569 have you fixed this issue? What's your network type? We have been confused by this issue for weeks.
Do you have a WeChat account?
Sherweb-turing-pipeline
Apr 12, 2018
4admin2root, given the fix you mentioned, https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114,
is it safe to disable userland proxy for docker daemon, if proper recent kernel is installed? It is not very clear if it is from
Since both are older than the kernel fix
Thank you
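For reference, the userland proxy is a daemon-wide setting; a minimal sketch of turning it off is below (note this overwrites any existing /etc/docker/daemon.json, so merge by hand if you already have one), with no claim that doing so is safe with respect to this bug:
echo '{ "userland-proxy": false }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker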
sampsonhuo
Apr 18, 2018
We have been confused by this issue for weeks.
Linux 3.10.0-693.17.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
dElogics
May 4, 2018
Can anyone confirm whether the latest 4.14 kernel has this issue? It seems like it does not; no one around the Internet seems to have faced this issue with the 4.14 kernel.
dimm0
commented
May 4, 2018
I see this in 4.15.15-1 kernel, Centos7
dElogics
May 7, 2018
Looking at the change logs, https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.15.8 has a fix for SCTP, but not TCP. So you may like to try the latest 4.14.
zihaoyu referenced this issue in kubernetes/kubernetes on May 15, 2018 (open): Node flapping between Ready/NotReady with PLEG issues #45419
spronin-aurea
Jun 4, 2018
- even 4.15.18 does not help with this bug
- disabling ipv6 does not help as well
We have now upgraded to 4.16.13 and are observing. This bug was hitting us on one node only, approximately once per week.
qrpike
commented
Jun 4, 2018
Did you disable ipv6 in grub boot params or sysctl? Only boot params will work. Sysctl will not fix it.
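For anyone unsure what that means in practice, here is a sketch for GRUB-based distros (paths assume a stock Ubuntu or CentOS layout):
# add ipv6.disable=1 to the kernel command line
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="ipv6.disable=1 /' /etc/default/grub
sudo update-grub                                  # Debian/Ubuntu
# sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # CentOS/RHEL
sudo reboot
# verify after reboot that the parameter took effect
cat /proc/cmdline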
scher200
Jun 4, 2018
for me, most of the time the bug shows up after redeploying the same project/network again
spronin-aurea
Jun 4, 2018
@qrpike you are right, we tried only sysctl. Let me try with grub. Thanks!
AkihiroSuda referenced this issue on Jun 7, 2018 (closed): kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1 #37222
dElogics
commented
Jun 19, 2018
4.9.88 Debian kernel. Reproducible.
komljen
Jun 19, 2018
@qrpike you are right, we tried only sysctl. Let me try with grub. Thanks!
In my case disabling ipv6 didn't make any difference.
qrpike
commented
Jun 19, 2018
@spronin-aurea Did disabling ipv6 at the boot loader help?
komljen
Jun 19, 2018
@qrpike can you tell us about the nodes you are using if disabling ipv6 helped in your case? Kernel version, k8s version, CNI, docker version etc.
qrpike
Jun 19, 2018
@komljen I have been using CoreOS for the past 2 years without a single incident, since ~version 1000. I haven't tried it recently, but if I do not disable ipv6 the bug happens.
deimosfr
Jun 19, 2018
On my side, I'm using CoreOS too, ipv6 disabled with grub and still getting the issue
qrpike
Jun 19, 2018
@deimosfr I'm currently using PXE boot for all my nodes:
DEFAULT menu.c32
prompt 0
timeout 50
MENU TITLE PXE Boot Blade 1
label coreos
menu label CoreOS ( blade 1 )
kernel coreos/coreos_production_pxe.vmlinuz
append initrd=coreos/coreos_production_pxe_image.cpio.gz ipv6.disable=1 net.ifnames=1 biosdevname=0 elevator=deadline cloud-config-url=http://HOST_PRIV_IP:8888/coreos-cloud-config.yml?host=1 root=LABEL=ROOT rootflags=noatime,discard,rw,seclabel,nodiratime
However, my main node that is the PXE host is also CoreOS and boots from disk, and does not have the issue either.
dElogics
commented
Jun 19, 2018
What kernel versions are you guys running?
deimosfr
Jun 19, 2018
The ones where I got the issue were on 4.14.32-coreos and before. I have not encountered this issue yet on 4.14.42-coreos.
wallewuli
Jul 2, 2018
CentOS 7.5 with the 4.17.3-1 kernel, still got the issue.
Env:
kubernetes 1.10.4
Docker 13.1
with Flannel network plugin.
Log:
[ 89.790907] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 89.798523] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 89.799623] cni0: port 8(vethb8a93c6f) entered blocking state
[ 89.800547] cni0: port 8(vethb8a93c6f) entered disabled state
[ 89.801471] device vethb8a93c6f entered promiscuous mode
[ 89.802323] cni0: port 8(vethb8a93c6f) entered blocking state
[ 89.803200] cni0: port 8(vethb8a93c6f) entered forwarding state
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
Now:
The node IP can be reached, but it cannot use any network services, like ssh...
Blub
Jul 2, 2018
The symptoms here are similar to a lot of reports in various other places. All having to do with network namespaces. Could the people running into this please see if unshare -n hangs, and if so, from another terminal, do cat /proc/$pid/stack of the unshare process to see if it hangs in copy_net_ns()? This seems to be a common denominator for many of the issues including some backtraces found here. Between 4.16 and 4.18 there have been a number of patches by Kirill Tkhai refactoring the involved locking a lot. The affected distro/kernel package maintainers should probably look into applying/backporting them to stable kernels and see if that helps.
See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779678
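Spelled out, that check looks roughly like this ($PID stands for the PID of the hung unshare process):
# terminal 1: on a healthy system this returns immediately;
# if the bug is active it hangs
sudo unshare -n true
# terminal 2: if it hangs, find its PID and read the kernel stack
pgrep -f 'unshare -n'
sudo cat /proc/$PID/stack    # look for copy_net_ns() near the top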
cassiussa
Jul 3, 2018
sudo cat /proc/122355/stack
[<ffffffff8157f6e2>] copy_net_ns+0xa2/0x180
[<ffffffff810b7519>] create_new_namespaces+0xf9/0x180
[<ffffffff810b775a>] unshare_nsproxy_namespaces+0x5a/0xc0
[<ffffffff81088983>] SyS_unshare+0x193/0x300
[<ffffffff816b8c6b>] tracesys+0x97/0xbd
[<ffffffffffffffff>] 0xffffffffffffffff
Blub
Jul 4, 2018
Given the locking changes in 4.18, it would be good to test the current 4.18-rc, especially if you can trigger it more or less reliably; from what I've seen, for many people changing kernel versions also changed the likelihood of this happening a lot.
komljen
Jul 4, 2018
I had this issues with Kubernetes and after switching to latest CoreOS stable release - 1745.7.0 the issue is gone:
- kernel: 4.14.48
- docker: 18.03.1
PengBAI
Jul 5, 2018
same issue on CentOS 7
- kernel: 4.11.1-1.el7.elrepo.x86_64
- docker: 17.12.0-ce
Can anyone give us a working version, please!
tankywoo commented May 6, 2014
This happens when I log into the container, and I can't quit with Ctrl-C.
My system is Ubuntu 12.04, the kernel is 3.8.0-25-generic.
docker version:
I have used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check, and everything is all right.
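For reference, that check can be run directly like this (the script only inspects the kernel configuration; it doesn't change anything):
curl -sSL https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh | bash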
I watched the syslog and found this message:
After this happened, I opened another terminal to kill the process, and then restarted docker, but it hung.
I rebooted the host, and it still displayed those messages for some minutes during shutdown: