
kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" #5618

Open · tankywoo opened this issue May 6, 2014 · 536 comments

@tankywoo commented May 6, 2014

This happens when I log in to the container, and I can't quit it with Ctrl-C.

My system is Ubuntu 12.04, kernel is 3.8.0-25-generic.

docker version:

root@wutq-docker:~# docker version
Client version: 0.10.0
Client API version: 1.10
Go version (client): go1.2.1
Git commit (client): dc9c28f
Server version: 0.10.0
Server API version: 1.10
Git commit (server): dc9c28f
Go version (server): go1.2.1
Last stable version: 0.10.0

I have used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check my configuration, and everything looks fine.

I watched the syslog and found these messages:

May  6 11:30:33 wutq-docker kernel: [62365.889369] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:44 wutq-docker kernel: [62376.108277] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:54 wutq-docker kernel: [62386.327156] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:02 wutq-docker kernel: [62394.423920] INFO: task docker:1024 blocked for more than 120 seconds.
May  6 11:31:02 wutq-docker kernel: [62394.424175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  6 11:31:02 wutq-docker kernel: [62394.424505] docker          D 0000000000000001     0  1024      1 0x00000004
May  6 11:31:02 wutq-docker kernel: [62394.424511]  ffff880077793cb0 0000000000000082 ffffffffffffff04 ffffffff816df509
May  6 11:31:02 wutq-docker kernel: [62394.424517]  ffff880077793fd8 ffff880077793fd8 ffff880077793fd8 0000000000013f40
May  6 11:31:02 wutq-docker kernel: [62394.424521]  ffff88007c461740 ffff880076b1dd00 000080d081f06880 ffffffff81cbbda0
May  6 11:31:02 wutq-docker kernel: [62394.424526] Call Trace:                                                         
May  6 11:31:02 wutq-docker kernel: [62394.424668]  [<ffffffff816df509>] ? __slab_alloc+0x28a/0x2b2
May  6 11:31:02 wutq-docker kernel: [62394.424700]  [<ffffffff816f1849>] schedule+0x29/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424705]  [<ffffffff816f1afe>] schedule_preempt_disabled+0xe/0x10
May  6 11:31:02 wutq-docker kernel: [62394.424710]  [<ffffffff816f0777>] __mutex_lock_slowpath+0xd7/0x150
May  6 11:31:02 wutq-docker kernel: [62394.424715]  [<ffffffff815dc809>] ? copy_net_ns+0x69/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424719]  [<ffffffff815dc0b1>] ? net_alloc_generic+0x21/0x30
May  6 11:31:02 wutq-docker kernel: [62394.424724]  [<ffffffff816f038a>] mutex_lock+0x2a/0x50
May  6 11:31:02 wutq-docker kernel: [62394.424727]  [<ffffffff815dc82c>] copy_net_ns+0x8c/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424733]  [<ffffffff81084851>] create_new_namespaces+0x101/0x1b0
May  6 11:31:02 wutq-docker kernel: [62394.424737]  [<ffffffff81084a33>] copy_namespaces+0xa3/0xe0
May  6 11:31:02 wutq-docker kernel: [62394.424742]  [<ffffffff81057a60>] ? dup_mm+0x140/0x240
May  6 11:31:02 wutq-docker kernel: [62394.424746]  [<ffffffff81058294>] copy_process.part.22+0x6f4/0xe60
May  6 11:31:02 wutq-docker kernel: [62394.424752]  [<ffffffff812da406>] ? security_file_alloc+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424758]  [<ffffffff8119d118>] ? get_empty_filp+0x88/0x180
May  6 11:31:02 wutq-docker kernel: [62394.424762]  [<ffffffff81058a80>] copy_process+0x80/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424766]  [<ffffffff81058b7c>] do_fork+0x9c/0x230
May  6 11:31:02 wutq-docker kernel: [62394.424769]  [<ffffffff816f277e>] ? _raw_spin_lock+0xe/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424774]  [<ffffffff811b9185>] ? __fd_install+0x55/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424777]  [<ffffffff81058d96>] sys_clone+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424782]  [<ffffffff816fb939>] stub_clone+0x69/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424786]  [<ffffffff816fb5dd>] ? system_call_fastpath+0x1a/0x1f
May  6 11:31:04 wutq-docker kernel: [62396.466223] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:14 wutq-docker kernel: [62406.689132] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:25 wutq-docker kernel: [62416.908036] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:35 wutq-docker kernel: [62427.126927] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:45 wutq-docker kernel: [62437.345860] unregister_netdevice: waiting for lo to become free. Usage count = 3

After this happened, I opened another terminal and killed the process, then restarted Docker, but it hung.

I rebooted the host, and it still displayed those messages for several minutes during shutdown:

[screenshot: console during shutdown, 2014-05-06 11:49]

@drpancake commented May 23, 2014

I'm seeing a very similar issue for eth0, also on Ubuntu 12.04.

I have to power cycle the machine. From /var/log/kern.log:

May 22 19:26:08 box kernel: [596765.670275] device veth5070 entered promiscuous mode
May 22 19:26:08 box kernel: [596765.680630] IPv6: ADDRCONF(NETDEV_UP): veth5070: link is not ready
May 22 19:26:08 box kernel: [596765.700561] IPv6: ADDRCONF(NETDEV_CHANGE): veth5070: link becomes ready
May 22 19:26:08 box kernel: [596765.700628] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:08 box kernel: [596765.700638] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:19 box kernel: [596777.386084] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=170 DF PROTO=TCP SPT=51615 DPT=13162 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:21 box kernel: [596779.371993] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=549 DF PROTO=TCP SPT=46878 DPT=12518 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:23 box kernel: [596780.704031] docker0: port 7(veth5070) entered forwarding state
May 22 19:27:13 box kernel: [596831.359999] docker0: port 7(veth5070) entered disabled state
May 22 19:27:13 box kernel: [596831.361329] device veth5070 left promiscuous mode
May 22 19:27:13 box kernel: [596831.361333] docker0: port 7(veth5070) entered disabled state
May 22 19:27:24 box kernel: [596841.516039] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:34 box kernel: [596851.756060] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:44 box kernel: [596861.772101] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

@egasimus commented Jun 4, 2014

Hey, this just started happening for me as well.

Docker version:

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

Kernel log: http://pastebin.com/TubCy1tG

System details:
Running Ubuntu 14.04 LTS with a patched kernel (3.14.3-rt4). I have yet to see it happen with the default linux-3.13.0-27-generic kernel. What's funny, though, is that when this happens, all my terminal windows freeze, letting me type a few characters at most before they do; any new ones I open suffer the same fate, and I end up needing to power cycle my poor laptop, just like the good doctor above. For the record, I'm running fish shell in urxvt or xterm under xmonad. I haven't checked whether it affects plain bash.

@egasimus commented Jun 5, 2014

This might be relevant:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065434#yui_3_10_3_1_1401948176063_2050

> Copying a fairly large amount of data over the network inside a container and then exiting the container can trigger a missing decrement in the per-CPU reference count on a network device.

Sure enough, one of the times this happened for me was right after apt-getting a package with a ton of dependencies.
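
A minimal reproducer sketch along those lines (an assumption based on the description above, not a script from this thread; the image and URL are placeholders):

# Hypothetical reproducer sketch; image and URL are placeholders.
# Pull a large amount of data over the network inside a container,
# then let the container exit so its network namespace is torn down at once.
docker run --rm ubuntu:14.04 sh -c \
    'apt-get update && apt-get install -y wget && wget -O /dev/null http://example.com/large-file'
# If the bug triggers, dmesg starts repeating:
#   unregister_netdevice: waiting for eth0 to become free. Usage count = N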

@drpancake commented Jun 5, 2014

Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes.

@csabahenk commented Jul 22, 2014

I'm experiencing this on RHEL 7, kernel 3.10.0-123.4.2.el7.x86_64.

@egasimus commented Jul 22, 2014

I've noticed the same thing happening with my VirtualBox virtual network interfaces when running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or so.

@spiffytech commented Jul 25, 2014

@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error.

@spiffytech commented Jul 25, 2014

I upgraded to Debian kernel 3.14 and the problem appears to have gone away. It looks like the problem existed in some kernels before 3.5, was fixed in 3.5, regressed in 3.6, and was patched somewhere in 3.12–3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394

@egasimus commented Jul 27, 2014

@spiffytech Do you have any idea where I can report this for the realtime kernel flavour? I think they only release an RT patch for every other version, and I'd really hate to see 3.16-rt come out with this still broken. :/

EDIT: Filed it at kernel.org.

@ibuildthecloud (Contributor) commented Dec 22, 2014

I'm getting this on Ubuntu 14.10 running kernel 3.18.1. The kernel log shows:

Dec 21 22:49:31 inotmac kernel: [15225.866600] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:40 inotmac kernel: [15235.179263] INFO: task docker:19599 blocked for more than 120 seconds.
Dec 21 22:49:40 inotmac kernel: [15235.179268]       Tainted: G           OE  3.18.1-031801-generic #201412170637
Dec 21 22:49:40 inotmac kernel: [15235.179269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 22:49:40 inotmac kernel: [15235.179271] docker          D 0000000000000001     0 19599      1 0x00000000
Dec 21 22:49:40 inotmac kernel: [15235.179275]  ffff8802082abcc0 0000000000000086 ffff880235c3b700 00000000ffffffff
Dec 21 22:49:40 inotmac kernel: [15235.179277]  ffff8802082abfd8 0000000000013640 ffff8800288f2300 0000000000013640
Dec 21 22:49:40 inotmac kernel: [15235.179280]  ffff880232cf0000 ffff8801a467c600 ffffffff81f9d4b8 ffffffff81cd9c60
Dec 21 22:49:40 inotmac kernel: [15235.179282] Call Trace:
Dec 21 22:49:40 inotmac kernel: [15235.179289]  [<ffffffff817af549>] schedule+0x29/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179292]  [<ffffffff817af88e>] schedule_preempt_disabled+0xe/0x10
Dec 21 22:49:40 inotmac kernel: [15235.179296]  [<ffffffff817b1545>] __mutex_lock_slowpath+0x95/0x100
Dec 21 22:49:40 inotmac kernel: [15235.179299]  [<ffffffff8168d5c9>] ? copy_net_ns+0x69/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179302]  [<ffffffff817b15d3>] mutex_lock+0x23/0x37
Dec 21 22:49:40 inotmac kernel: [15235.179305]  [<ffffffff8168d5f8>] copy_net_ns+0x98/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179308]  [<ffffffff810941f1>] create_new_namespaces+0x101/0x1b0
Dec 21 22:49:40 inotmac kernel: [15235.179311]  [<ffffffff8109432b>] copy_namespaces+0x8b/0xa0
Dec 21 22:49:40 inotmac kernel: [15235.179315]  [<ffffffff81073458>] copy_process.part.28+0x828/0xed0
Dec 21 22:49:40 inotmac kernel: [15235.179318]  [<ffffffff811f157f>] ? get_empty_filp+0xcf/0x1c0
Dec 21 22:49:40 inotmac kernel: [15235.179320]  [<ffffffff81073b80>] copy_process+0x80/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179323]  [<ffffffff81073ca2>] do_fork+0x62/0x280
Dec 21 22:49:40 inotmac kernel: [15235.179326]  [<ffffffff8120cfc0>] ? get_unused_fd_flags+0x30/0x40
Dec 21 22:49:40 inotmac kernel: [15235.179329]  [<ffffffff8120d028>] ? __fd_install+0x58/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179331]  [<ffffffff81073f46>] SyS_clone+0x16/0x20
Dec 21 22:49:40 inotmac kernel: [15235.179334]  [<ffffffff817b3ab9>] stub_clone+0x69/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179336]  [<ffffffff817b376d>] ? system_call_fastpath+0x16/0x1b
Dec 21 22:49:41 inotmac kernel: [15235.950976] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:51 inotmac kernel: [15246.059346] unregister_netdevice: waiting for lo to become free. Usage count = 2

I'll send docker version/info once the system isn't frozen anymore :)

@sbward commented Dec 23, 2014

We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic

@jbalonso commented Dec 29, 2014

On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic.

I'll add that, in our case, we sometimes see a negative usage count.

@rsampaio (Contributor) commented Jan 15, 2015

FWIW, we hit this bug running LXC on the trusty kernel (3.13.0-40-generic #69-Ubuntu). The message appears in dmesg, followed by this stack trace:

[27211131.602869] INFO: task lxc-start:26342 blocked for more than 120 seconds.
[27211131.602874]       Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27211131.602881] lxc-start       D 0000000000000001     0 26342      1 0x00000080
[27211131.602883]  ffff88000d001d40 0000000000000282 ffff88001aa21800 ffff88000d001fd8
[27211131.602886]  0000000000014480 0000000000014480 ffff88001aa21800 ffffffff81cdb760
[27211131.602888]  ffffffff81cdb764 ffff88001aa21800 00000000ffffffff ffffffff81cdb768
[27211131.602891] Call Trace:
[27211131.602894]  [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
[27211131.602897]  [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
[27211131.602900]  [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
[27211131.602903]  [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
[27211131.602905]  [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
[27211131.602908]  [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
[27211131.602910]  [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
[27211131.602912]  [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
[27211131.602915]  [<ffffffff810669f5>] do_fork+0xd5/0x340
[27211131.602917]  [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
[27211131.602919]  [<ffffffff81066ce6>] SyS_clone+0x16/0x20
[27211131.602921]  [<ffffffff81730089>] stub_clone+0x69/0x90
[27211131.602923]  [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f

@MrMMorris commented Mar 16, 2015

Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x.

Docker command:

docker run -t -i -v /data/sitespeed.io:/sitespeed.io/results company/dockerfiles:sitespeed.io-latest --name "Superbrowse"

This seems like a pretty bad issue...

@MrMMorris commented Mar 17, 2015

@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs 😭

@rsampaio (Contributor) commented Mar 17, 2015

@MrMMorris could you share a reproducer script using publicly available images?

@unclejack (Contributor) commented Mar 18, 2015

Everyone who's seeing this error is running a distribution kernel package that's far too old and lacks the fixes for this particular problem.

If you run into this problem, make sure you run apt-get update && apt-get dist-upgrade -y and reboot your system. If you're on Digital Ocean, you also need to select the kernel version which was just installed during the update because they don't use the latest kernel automatically (see https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/2814988-give-option-to-use-the-droplet-s-own-bootloader).

CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using yum update and reboot after installing the updates.

When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor.
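
One quick way to confirm that a reboot actually picked up the new kernel (a generic sketch, not tied to any particular provider):

# The kernel you are running right now:
uname -r
# The kernel packages installed on disk (Debian/Ubuntu):
dpkg --list 'linux-image-*' | grep '^ii'
# If uname -r is older than the newest installed package, the host
# (or a provider-managed bootloader) is still booting the old kernel.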

@MrMMorris commented Mar 18, 2015

@unclejack

I ran apt-get update && apt-get dist-upgrade -y

Ubuntu 14.04, 3.13.0-46-generic

I still get the error after only one docker run.

I can create an AMI for reproducing it if needed.

@unclejack (Contributor) commented Mar 18, 2015

@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04.

@MrMMorris commented Mar 18, 2015

If there's anything else I can do to help, let me know! 😄

@rsampaio (Contributor) commented Mar 18, 2015

@MrMMorris if you can provide a reproducer, there is a bug open for Ubuntu where it would be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152

@MrMMorris commented Mar 18, 2015

@rsampaio if I have time today, I will definitely get that for you!

@fxposter commented Mar 23, 2015

This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: #9605 (comment). Rebooting the server is the only way to fix this for now.

@chrisjstevenson commented Apr 27, 2015

Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some Docker containers (not all of them):
kernel:unregister_netdevice: waiting for lo to become free. Usage count = -1

Again, rebooting the server seems to be the only solution at this time

@popsikle commented May 12, 2015

Also seeing this on CoreOS (647.0.0) with kernel 3.19.3.

Rebooting is also the only solution I have found.

@fxposter commented May 20, 2015

Tested Debian jessie with sid's kernel (4.0.2) - the problem remains.

@popsikle commented Jun 19, 2015

Anyone seeing this issue running non-ubuntu containers?

@fxposter commented Jun 19, 2015

Yes. Debian ones.

@steelcowboy commented Mar 21, 2020

Found out my Docker logs are also being spammed. Kernel 5.4.0, Docker 19.03.8:

Mar 21 18:46:14 host.mysite.com dockerd[16544]: time="2020-03-21T18:46:14.127275161Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:45:13 host.mysite.com dockerd[16544]: time="2020-03-21T18:45:13.642050333Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:44:13 host.mysite.com dockerd[16544]: time="2020-03-21T18:44:13.161364216Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:43:12 host.mysite.com dockerd[16544]: time="2020-03-21T18:43:12.714725302Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

@steelcowboy commented Mar 27, 2020

I finally found out how to suppress these messages, by the way. From this question on StackExchange, I commented out this line in /etc/rsyslog.conf:

# Everybody gets emergency messages
#*.emerg                    :omusrmsg:*

Very nuclear option, but at least now my system is usable again!

@SuperSandro2000 commented Sep 6, 2020

@steelcowboy You can configure rsyslog to discard only those annoying messages instead of all emergency messages, which is more desirable.

I wrote the following into /etc/rsyslog.d/40-unregister-netdevice.conf and restarted rsyslog with systemctl restart rsyslog:

# match frequent, irrelevant emergency messages generated by Docker when transferring large amounts of data through the network
:msg,contains,"unregister_netdevice: waiting for lo to become free. Usage count = 1" /dev/null

# discard matching messages
& stop
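
Note that the match string above is literal, so it only catches lo with a usage count of exactly 1. If you also want to drop the messages for other interfaces and counts (an assumption about your intent, not part of the original suggestion), a looser variant of the same filter would be:

# /etc/rsyslog.d/40-unregister-netdevice.conf -- looser match (sketch)
:msg,contains,"unregister_netdevice: waiting for" /dev/null
& stop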

@hedza06 commented Oct 20, 2020

Any news here?

@StruggleYang commented Dec 29, 2020

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Any news here?

@w-simon commented Dec 29, 2020

> kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> Any news here?

This patch fixes the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee60ad219f5c7c4fb2f047f88037770063ef785f

We analyzed it at the following link, where the problem can also be reproduced:
https://github.com/w-simon/kernel_debug_notes/blob/master/reproduce_a_leaked_dst_entry

@Xyaren commented Feb 5, 2021

This is still happening for me on Ubuntu kernel 5.8.0-41.46-generic (5.8.18).

@rseffner commented May 21, 2021

For me this first happened when going from kernel 5.10.37 to 5.10.38 on Debian 10.9 amd64, on different machines.

@tsjk commented May 22, 2021

I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.

I get

unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1

every 10 seconds or so.

@rantala commented May 24, 2021

> I saw this for the first time on a Gentoo system with kernel v5.4.120, just upgraded from kernel v5.4.117. Kernel sources used: sys-kernel/gentoo-sources.
>
> I get
>
> unregister_netdevice: waiting for ip6_vti0 to become free. Usage count = 1
>
> every 10 seconds or so.

Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

@nivseg2 commented Oct 6, 2021

> Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.

Hello, do you have any more info on which specific commits introduced and fixed the issue mentioned above?

@rantala commented Oct 6, 2021

> > Hi, this regression was introduced in 5.4.120, and is fixed in 5.4.121.
>
> Hello, do you have any more info on which specific commits introduced and fixed the issue mentioned above?

See the commits authored by Eric Dumazet:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.121

It was specifically about ip6_vti interfaces: 5.4.120 added https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.121&id=98ebeb87b2cf26663270e8e656fe599a32e4c96d which introduced the regression.

(If I remember right, the same issue was seen in some other stable/LTS kernel versions as well.)

@nivseg2 commented Oct 6, 2021

Ah, thanks!

@truman369 commented Dec 21, 2021

I have the same issue on kernels 5.10.70 and 5.14.9. It happens when I restart IPv6 containers.
Does anyone know of a solution other than suppressing the output?

@tconrado commented Dec 21, 2021

I have some reports that Ubuntu 20.04 LTS HWE does not exhibit this issue.

The workaround is to never close the namespace, or to guarantee that the namespace is free of network devices before closing it; see the sketch below.
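
A sketch of that second workaround with plain iproute2 (illustrative only; the namespace and device names are made up):

# Create a namespace and move a veth endpoint into it.
ip netns add demo
ip link add veth0 type veth peer name veth1
ip link set veth1 netns demo

# ... run the workload inside "demo" ...

# Delete the device *before* the namespace, so namespace teardown
# has no lingering device references to wait on.
ip netns exec demo ip link del veth1
ip netns del demo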

@coolljt0725 (Contributor) commented Dec 22, 2021

The Linux kernel is adding a reference-count tracking mechanism: https://lwn.net/ml/netdev/20211205042217.982127-1-eric.dumazet@gmail.com/. Hopefully this mechanism will make it easier to find and fix this kind of reference-counting bug in the future.
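
For anyone building their own kernel, the tracker sits behind config options; a sketch assuming the option names as they later landed in mainline (an assumption, not from this thread):

# Kernel .config fragment (assumed option names from the later mainline work)
CONFIG_REF_TRACKER=y
CONFIG_NET_DEV_REFCNT_TRACKER=y
CONFIG_NET_NS_REFCNT_TRACKER=y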

@nivseg2 commented Dec 23, 2021

I've been trying to test it.

Unfortunately, I don't have a reproducer yet.

@fserve commented Jan 8, 2022

I have this same problem. This VM runs Docker and IPv6; I have other VMs like it that don't hit the problem, so maybe it's related to a particular running container. It stops after a reboot but comes back after a few days.

unregister_netdevice: waiting for lo to become free. Usage count = 1

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~20.04.2
 Built:             Mon Nov  1 00:34:17 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~20.04.2
  Built:            Fri Oct 22 00:45:53 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~20.04.1
  GitCommit:
 runc:
  Version:          1.0.1-0ubuntu2~20.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

@tconrado commented Jan 8, 2022

@fserve commented Jan 11, 2022

Yes, Ubuntu focal 20.04 HWE, as you can see here:

Linux xxx 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

linux-headers-generic-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-headers-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-hwe-5.11-headers-5.11.0-43/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-43.47~20.04.2 all [installed,automatic]
linux-hwe-5.11-headers-5.11.0-44/focal-updates,focal-updates,focal-security,focal-security,now 5.11.0-44.48~20.04.2 all [installed,automatic]
linux-image-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed,automatic]
linux-virtual-hwe-20.04/focal-updates,focal-security,now 5.11.0.44.48~20.04.22 amd64 [installed]

@tconrado commented Jan 11, 2022

@truman369 commented Jan 23, 2022

It looks like the issue is resolved in kernel 5.15.5; in any case, I stopped getting the error messages after switching to that version.
