network: Many interfaces lead to "kernel receive buffer overrun" #14417
Comments
Could you give us a script or .netdev files to generate the interfaces to reproduce the issue? |
Attached is a It is also weird that, using that configuration, starting Even more weird: Running |
Hmm, I cannot reproduce the issue with your config. I've also tested with 2000 dummy devices and with 1000 veth interfaces, and both seem to work fine. Could you test the issue with v244? |
Yes, maybe I can test with a newer version on NixOS or in a VM. Did you only test via the service, or also by running |
I think this is a very complex situation. It would be more useful if you gave us that than for us to try to reproduce it ourselves. And it should not be restricted to one distribution. |
I did not say anything against that. It was a question about your past test. |
Surely not; we are just trying to get a reproducer so that we can help you solve your problem. Do you figure? |
I figure we should just set the socket buffer for the netlink socket ridiculously high, like udev does. Socket memory is nowadays properly accounted to the processes owning it, hence that's not even a bad idea. If we do that, the issue will not go away entirely, but it should become very hard to reproduce.
@haslersn I guess a workaround for this issue is extending receive buffer size in systemd-networkd.socket, e.g. by creating the following drop-in file:
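(The drop-in itself is not preserved in this copy of the thread. As an illustration only, a drop-in of this kind could look like the following. The file path and the 128M value are assumptions; `ReceiveBuffer=` is the socket unit setting discussed in the comments here.)

```ini
# /etc/systemd/system/systemd-networkd.socket.d/10-rcvbuf.conf  (hypothetical path)
[Socket]
# Assumed value; pick a size that fits the number of interfaces.
ReceiveBuffer=128M
```

After creating such a file, `systemctl daemon-reload` followed by a restart of systemd-networkd.socket and systemd-networkd.service should be needed for the new size to apply.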
|
I guess one host enables systemd-networkd.socket, and the other one disables it.
In that case, it is almost equivalent to .socket being disabled. So, it is reasonable. |
Bumping ReceiveBuffer= is something we definitely should do by default, btw. The workaround @yuwata suggested is not just a workaround, but part of the solution, I think. |
Starting |
I don't think this has been fixed (entirely). I am running into a very similar case with 200 VLAN interfaces. This is on v245.5 on both NixOS and Fedora 33 (fedora On the first start (during system bootup) I am observing timeouts and assertions: https://gist.githubusercontent.com/andir/616f937189d176da86dbe7aa0f65217e/raw/918c9926f66c5788d9de2795dee5200331daf8c5/l (as mentioned in #15669 (comment)) Manually restarting networkd (
Increasing the buffer size further (1024MB) didn't change that behavior. The networkd config files that I used are here: https://gist.github.com/andir/7963aa3edcd5b834224515f826f10579 You might have to change the |
@yuwata This should probably be re-opened, as the initial problem isn't gone but only worked around by activating the |
I spent a bit more time on this and this bug is indeed partially fixed by the increase of buffer size. At least the |
…ger than the kernel limit The commit 10ce2e0 inverts the order of SO_{RCV,SND}BUFFORCE and SO_{RCV,SND}BUF. However, setting the buffer size with SO_{RCV,SND}BUF does not fail even if the requested size is larger than the kernel limit. Hence, SO_{RCV,SND}BUFFORCE is never used anymore, and the buffer size is always capped by the kernel limit even if we have the privilege to ignore it. With this commit, the buffer size is checked after configuring it with SO_{RCV,SND}BUF, and if it is still not sufficient, we try to set it with the FORCE variant. As a result, if we have enough privilege, the requested buffer size is correctly set. Hopefully fixes systemd#14417.
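The order of operations described in that commit message can be sketched in a few lines. This is a rough Python illustration, not the systemd code (which is C): the helper name `set_rcvbuf` is made up, the numeric fallback `33` for `SO_RCVBUFFORCE` is an assumption that holds on common Linux architectures only, and the `2 * size` check relies on Linux reporting twice the requested value from `getsockopt`.

```python
import socket

# Python's socket module may not export SO_RCVBUFFORCE; 33 is its value on
# common Linux architectures (assumption, not valid everywhere).
SO_RCVBUFFORCE = getattr(socket, "SO_RCVBUFFORCE", 33)


def set_rcvbuf(sock, size):
    """Try SO_RCVBUF first, verify the result, then fall back to the FORCE variant.

    Returns the receive buffer size reported by the kernel afterwards.
    """
    # Step 1: the unprivileged request. This does not fail when `size`
    # exceeds net.core.rmem_max; the kernel silently clamps it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)

    # Step 2: check what we actually got. Linux doubles the stored value,
    # so a fully honoured request reads back as at least 2 * size.
    actual = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    if actual >= 2 * size:
        return actual

    # Step 3: still too small, so try to bypass the limit. This needs
    # CAP_NET_ADMIN and raises OSError (EPERM) without it.
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_RCVBUFFORCE, size)
    except OSError:
        pass  # keep whatever step 1 managed to set

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Calling `set_rcvbuf(sock, 128 * 1024 * 1024)` as an unprivileged user would typically return a value capped by `net.core.rmem_max`, which is the situation the commit message describes for the inverted order.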
By mistake, instead of creating the adapter and moving it to its namespace, we created 195 adapters on the host, plus 3 others that provide the connectivity, and we started to get |
Not really, as long as systemd-networkd doesn't support configuring network namespaces. |
@tconrado What version of systemd was this with? IIRC the latest release(s) have somewhat mitigated the issue. |
Same problem here with one "real" interface and some docker interfaces. ReceiveBuffer=512M did not help |
I just stumbled across this issue on Ubuntu 20.04 using Ubuntu's Netplan.io in version 0.102-0ubuntu1~20.04.2. Systemd is at version 245 (245.4-4ubuntu3.6). I was unable to create more than 21 bridged VLANs. Trying to create more bridged VLANs resulted in symptoms as described above. Enabling and activating systemd-networkd.socket seems to mitigate this issue. Initial testing suggests that this solution also survives restarts. |
I also think this should be rewritten or reopened. I had the same issues on several up-to-date machines last week. |
@ImmoWetzel and others, if you still have the issue with recent systemd (currently, v248, v249, or newer), then please open a new issue with debugging logs of networkd and your relevant configs (.network, .netdev, and so on). Also, please try to increase |
This commit fixes the intermittent `Recv() error Out of memory [-5]` crashes of the worker process by raising the netlink socket's receive buffer size to 1 MiB. A netlink socket's default receive buffer size is 208 KiB (taken from net.core.rmem_default). When incoming messages fill up the receive buffer, the recvmsg(2) call returns -ENOBUFS, which is then translated to the Out of memory error above by libnl. Both NetworkManager and systemd-networkd raise their netlink socket's buffer size to work around the same issue. More details on systemd-networkd's decision can be found at systemd/systemd#14417 and systemd/systemd#14434. Fixes cifsd-team#235. Signed-off-by: database64128 <free122448@hotmail.com>
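To make the failure mode in that message concrete, here is a small Python sketch of a netlink reader that raises its receive buffer to 1 MiB and treats ENOBUFS as a recoverable overrun. It is an illustration under assumptions, not the ksmbd-tools or libnl code: the function names are invented, and it is Linux-only because it uses `AF_NETLINK`.

```python
import errno
import socket

ONE_MIB = 1 << 20  # 1 MiB, the size mentioned in the commit message


def open_route_socket(rcvbuf=ONE_MIB):
    """Open a NETLINK_ROUTE socket and request a larger receive buffer (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    # The kernel may clamp this to net.core.rmem_max without reporting an error.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return sock


def recv_message(sock, bufsize=65536):
    """Return the next datagram, or None if the kernel reported a buffer overrun.

    ENOBUFS means messages were dropped because the receive buffer was full.
    The caller has lost events and should resynchronise its state instead of
    treating the error as fatal.
    """
    try:
        return sock.recv(bufsize)
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            return None
        raise
```

A larger buffer only makes the overrun less likely; a reader that can resynchronise after `recv_message` returns `None` stays correct even when a burst of events still overflows it.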
systemd version the issue has been seen with
241, but the relevant code has not changed since.
Used distribution
Debian 10
Expected behaviour you didn't see
systemd-networkd.service should start successfully.
Unexpected behaviour you saw
With about 200 interfaces (100 VLANs and a bridge on each of them), when starting systemd-networkd I get the following error in the journal:
Running sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd gives a bit more insight:
The buffer that is overrun is probably this one somewhere inside this function.
Steps to reproduce the problem
sudo systemctl restart systemd-networkd
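The reporter's actual configuration is not included here. For anyone who wants a comparable setup, the sketch below generates networkd files for N VLANs on one parent interface, each attached to its own bridge. Everything specific in it is an assumption for illustration: the parent name `eth0`, the VLAN ID range, the file naming scheme, and the helper names `generate_units` and `write_units`.

```python
from pathlib import Path


def generate_units(parent="eth0", count=100, first_id=100):
    """Return {filename: contents} for `count` VLANs on `parent`, one bridge per VLAN."""
    units = {}
    vlan_names = []
    for vid in range(first_id, first_id + count):
        vlan, bridge = f"vlan{vid}", f"br{vid}"
        vlan_names.append(vlan)
        # One VLAN netdev and one bridge netdev per ID: 2 * count interfaces.
        units[f"20-{vlan}.netdev"] = (
            f"[NetDev]\nName={vlan}\nKind=vlan\n\n[VLAN]\nId={vid}\n"
        )
        units[f"20-{bridge}.netdev"] = f"[NetDev]\nName={bridge}\nKind=bridge\n"
        # Attach the VLAN interface to its bridge.
        units[f"30-{vlan}.network"] = (
            f"[Match]\nName={vlan}\n\n[Network]\nBridge={bridge}\n"
        )
    # The parent interface lists every VLAN that should be created on it.
    vlan_lines = "".join(f"VLAN={name}\n" for name in vlan_names)
    units[f"10-{parent}.network"] = (
        f"[Match]\nName={parent}\n\n[Network]\n{vlan_lines}"
    )
    return units


def write_units(directory, units):
    """Write the generated files into `directory`, creating it if needed."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for name, contents in units.items():
        (target / name).write_text(contents)
```

Writing the output of `generate_units()` to a scratch directory first and copying it to `/etc/systemd/network` on a test machine keeps the experiment easy to undo.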