Failed to receive UDP traffic after container restart #8795

Closed
mpeterss opened this issue Oct 27, 2014 · 96 comments · Fixed by #32505 or #44742

Labels
area/networking exp/expert kind/bug

@mpeterss

I start a container and publish a port for UDP traffic like this:

docker run --rm -p 5060:5060/udp --name host1 -i -t ubuntu:14.04

Then in that container I wait for traffic with:

nc -u -l 5060

I then generate traffic from another machine:

nc -u <docker_host_ip> 5060

Then everything works fine and I can see that I receive the UDP traffic in the container.

But when I exit the container and do the same thing again, I can no longer receive UDP traffic in the Docker container.
If I wait about 5 minutes before I start sending, it works though. I have also noticed that it works if the sender changes the port it binds to locally. So there seems to be some mapping that is not deleted when the Docker container is removed.

@liyichao

This issue is due to conntrack. The Linux kernel keeps state for each connection. Even though UDP is connectionless, if you run

sudo cat /proc/net/ip_conntrack

you will see a lot of entries. The output shows that the container address is still the one from before the restart, and that stale state prevents packets from arriving at the new container. The reason is this:

For a connection, only the first packet goes through the iptables NAT table; that is where Docker routes the packet to its own chain and then on to the right container.

When you restart the container, the container's IP has changed, so the DNAT rule is updated to route to the new address. But the old connection's state in conntrack is not cleared, so when a packet arrives it does not go through the NAT table again, because it is not "the first" packet of its flow. So the solution is to clear the conntrack entries, which can be done as follows:

sudo conntrack -D -p udp

(you will need sudo apt-get install conntrack)

Looking forward to Docker's solution.

@ljakob

ljakob commented Dec 19, 2014

Same problem on my side (OpenVPN inside a container). I could resolve it temporarily with

iptables --table raw --append PREROUTING --protocol udp --source-port 4000 --destination-port 4000 --jump NOTRACK

run on the Docker host. It's ugly but gets the job done.

IMHO the correct solution would be to clean up the conntrack table after adjusting iptables.

@blalor

blalor commented Jan 5, 2015

Definitely looking forward to a fix for this one.

@LK4D4
Contributor

LK4D4 commented Jan 5, 2015

Seems to be working for me with the 3.18.0 kernel.

@erikh
Contributor

erikh commented Jan 5, 2015

The UDP proxy has always had issues with packet loss; we've never found a good answer for it.

-Erik


@blalor

blalor commented Jan 6, 2015

I'm using CentOS 6.6, kernel 2.6.32-504.1.3.el6.x86_64. Seems like Docker should be responsible for (or at least facilitate through configuration) expiring conntrack table entries.

@technolo-g

I too would like to see some real solution to this.

@nmarasoiu

Hi, we would also like to know when this issue makes progress. What are the impediments to fixing this bug? Can we help in any way with details? We run Consul and at some point (I guess after some restarts) the nodes start "suspecting each other" (per the gossip protocol); a node can receive the UDP message saying it is being suspected and tries to reply with "hey, I am alive", but the reply never reaches its destination.

Is this a priority? Is it hard to reproduce or debug? Can we help with more concrete data?
I reproduced it with kernel 3.13.

@spf13 spf13 added kind/bug status/help-wanted exp/expert and removed /project/helpwanted exp/expert labels Mar 21, 2015
@grimmy

grimmy commented May 7, 2015

Flushing the conntrack table worked for me, but I'm running on a dev machine, not prod. I'll have to give @liyichao's answer a go if/when we hit this in prod.

@grimmy

grimmy commented May 12, 2015

Is there any reason why the conntrack entries can't just be removed when Docker determines a container has stopped?

@ljakob

ljakob commented May 13, 2015

@grimmy No, the fix should not be too difficult to implement. After removing the iptables entries, just call conntrack --delete with the corresponding arguments (IP + port).

@grimmy

grimmy commented May 13, 2015

OK, that's what I figured. I'll see if I can find some time to put a pull request together unless someone else wants to jump on it.

@nmarasoiu

Hi,

I applied a patch in the cleanup callback of mapper.go, adding a conntrack delete for the container IP as the source IP in 3 places in mapper.go, including the Unmap and cleanup functions. It did not succeed: the serf gossip protocol, which I run over UDP, complains that packets do not make it across, and the nodes blacklist each other in their member lists. Either there are other places where this needs to be done, or it should also be done on the remote nodes.

Normally this should be done via accessible "objects", but I have not found a suitable one, either in Docker or as a Go import, so I started by calling a command in the OS (which of course is not a portable solution, but enough to check assumptions).

cleanup := func() error {
    // need to undo the iptables rules before we return
    if m.userlandProxy != nil {
        m.userlandProxy.Stop()
    }
    pm.forward(iptables.Delete, m.proto, hostIP, allocatedHostPort, containerIP.String(), containerPort)
    if err := pm.Allocator.ReleasePort(hostIP, m.proto, allocatedHostPort); err != nil {
        return err
    }
    // Experimental: delete conntrack entries whose source is the container IP,
    // then flush the whole table. Shelling out to the conntrack binary is not
    // portable; it is only here to check the assumption.
    exec.Command("/usr/sbin/conntrack", "-D", "-s", containerIP.String()).Run()
    return exec.Command("/usr/sbin/conntrack", "-F").Run()
}
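
For what it's worth, here is a more targeted variant of that cleanup, as a rough sketch only. It still shells out to the conntrack CLI (so conntrack must be installed on the host and the daemon must run as root, and it stays non-portable); the helper name, the package placement, and the idea of keying the deletion on the published host port via --orig-port-dst are my own assumptions, not anything that exists in Docker:

// Sketch only: a hypothetical helper, not part of mapper.go. It deletes just
// the UDP conntrack entries whose original destination port is the published
// host port, instead of flushing the whole table with "conntrack -F".
package portmapper

import (
    "fmt"
    "os/exec"
    "strconv"
)

func deleteUDPConntrackEntries(hostPort int) error {
    out, err := exec.Command("conntrack", "-D", "-p", "udp",
        "--orig-port-dst", strconv.Itoa(hostPort)).CombinedOutput()
    if err != nil {
        // conntrack may exit non-zero when there is nothing to delete;
        // surface the output so the caller can decide whether to ignore it.
        return fmt.Errorf("conntrack -D failed: %v: %s", err, out)
    }
    return nil
}

Calling something like this from the cleanup and Unmap paths for UDP mappings, keyed on allocatedHostPort, should be less disruptive than a full flush, though I have not verified it against the serf failure described above.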

@nmarasoiu

Hi, any feedback on my attempt to start fixing this?

@grimmy

grimmy commented May 27, 2015

I've been cheating locally when it happens by using "conntrack -F". Next time it happens I'll try with just the specific IP address.

@nmarasoiu

Hi,

But I called -F too, probably in the wrong place.

Surely only the local tables need to be flushed, not the remote ones, right?


@grimmy

grimmy commented May 27, 2015

I haven't had to do anything on the remote end. But I do have multiple containers talking to each other and to external devices over UDP. The first time this happened (and I discovered that it was conntrack), there was a conntrack entry for an external device pointing to an old container. Running "conntrack -F" cleared that, and the next packet from that external device made it to the correct container.

@berglh

berglh commented Jun 9, 2015

So we're running StatsD in a Docker container on RHEL 7 and ran into this problem when the Docker service is restarted, which in turn restarts the container. The UDP packets to StatsD were arriving on the interface but not making it through to the container, and iptables wasn't blocking them, which led us to this thread.

The solution for us was to use conntrack to delete only the states for the things that are not working, so that we have the least impact on existing states. In the systemd unit file that launches the Docker container for StatsD, running an ExecStartPre with conntrack to delete the UDP states for port 8125 has solved this problem for us. Running conntrack -F seemed a bit brute-force for our requirements:

# grep -B1 run /etc/systemd/system/statsd.service 
ExecStartPre=/sbin/conntrack -D -p udp --orig-port-dst 8125
ExecStart=/usr/bin/docker run -p 8125:8125/udp -p 8126:8126 \

@grimmy

grimmy commented Jun 9, 2015

Yes, the -F has only been performed on dev workstations and of course not in prod. This really just needs to be fixed in Docker, but @nmarasoiu hasn't had any success and I haven't had time to fix it either.

@levesquejf

levesquejf commented Jul 26, 2018

@thaJeztah @fcrisciani Is the referenced issue moby/libnetwork#2154 fixing the UDP packet loss issue after container restart?

@fcrisciani
Contributor

@levesquejf also this one is needed: moby/libnetwork#2243

@dpajin

dpajin commented Nov 20, 2019

I still see the same issue 5 years after it was opened, although I am using the latest Docker version, 19.03.5, in Docker Swarm, starting my services with docker stack.

It is exactly the same behavior as described in the initial post: when I first create the stack, it works. When I remove it and create it again, containers do not receive traffic for exactly 5 minutes (300 seconds), and then they start to receive it. Deleting connections with conntrack at any point does not help at all.

I have changed all networking-related sysctl parameters that had a timeout of 300 seconds down to 30 seconds, but that did not change the behavior.

Also, as mentioned in the first post, if I change the source port of the UDP sender, containers start to receive traffic.

Containers are deployed over 3 Docker Swarm nodes in global mode. I don't have any specific network configuration except publishing UDP ports, where the target and published ports are the same.

I don't believe this issue is actually fixed. Any ideas on what else I could try?

@mman

mman commented Nov 21, 2019

@dpajin I believe the issue is not fixed. I have settled on the following workaround, which has been stable for me for more than a year:

Somewhere in /etc/sysctl.conf I use a value of 10 seconds for super aggressive timeouts; choose a value appropriate for your setup.

net.netfilter.nf_conntrack_udp_timeout = 10
net.netfilter.nf_conntrack_udp_timeout_stream = 10

@dpajin

dpajin commented Nov 21, 2019

@mman, thank you for your reply and suggestion. Strangely, your change does not have any effect on the behavior in my setup. I will investigate this a bit more and eventually open a new issue.

@lucasbritos

lucasbritos commented Apr 20, 2020


Not working for me either, probably because I have a constant rate of UDP traffic, which restarts the timeout counters.

@hannip

hannip commented Jul 10, 2020

This is still happening with docker-ce-19.03.8-3.
Note that on RHEL 7.8 the command to see the conntrack entries is different:
cat /proc/net/nf_conntrack

I had to yum install conntrack and then issue
conntrack -D -p udp

to get UDP traffic flowing to the container again after a restart.

@jhmartin

How about Docker detecting that it has launched a container that maps a UDP port, and flushing the conntrack state for that port?
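
That is essentially what the later fixes in this thread do. For illustration only, here is a rough sketch of the idea, not the actual Docker/libnetwork code: it assumes the github.com/vishvananda/netlink package (and that the version you vendor exposes ConntrackFilter, ConntrackOrigDstIP and ConntrackDeleteFilter), and the function name and example address are made up:

package main

import (
    "fmt"
    "net"

    "github.com/vishvananda/netlink"
)

// flushConntrackFor deletes IPv4 conntrack entries whose original destination
// matches the given IP (for example, the address a UDP port was published on),
// forcing the next packet of any stale flow back through the iptables NAT
// table so the DNAT rule for the new container is applied.
func flushConntrackFor(ip net.IP) error {
    filter := &netlink.ConntrackFilter{}
    if err := filter.AddIP(netlink.ConntrackOrigDstIP, ip); err != nil {
        return err
    }
    deleted, err := netlink.ConntrackDeleteFilter(netlink.ConntrackTable, netlink.FAMILY_V4, filter)
    if err != nil {
        return err
    }
    fmt.Printf("deleted %d conntrack entries for %s\n", deleted, ip)
    return nil
}

func main() {
    // Needs root (CAP_NET_ADMIN); the address is just a placeholder.
    if err := flushConntrackFor(net.ParseIP("192.0.2.10")); err != nil {
        fmt.Println("conntrack flush failed:", err)
    }
}

Deleting by the flow's original destination rather than flushing the whole table keeps unrelated connections intact, which is what the conntrack -D invocations earlier in this thread approximate from the command line.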

trebonian pushed a commit to trebonian/docker that referenced this issue Jun 3, 2021
Flush all the endpoint flows when the external
connectivity is removed.
This will prevent issues where if there is a flow
in conntrack this will have precedence and will
let the packet skip the POSTROUTING chain.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
@markfqs

markfqs commented Oct 15, 2021

Could someone reopen this issue?
This issue still exists on engine version 20.10.9

Having low timeouts on the conntrack table doesn't fix the problem for my setup:

net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 120

The only way is to flush the table with:

conntrack -D -p udp

vincentbernat added a commit to vincentbernat/libnetwork that referenced this issue Mar 27, 2022
When a specific port is requested, we no longer start a dummy proxy to
mark the port as unavailable. Starting that proxy triggered numerous
problems, notably with UDP.

 - moby/moby#28589
 - moby/moby#8795
 - moby#2423

This change requires altering tests. It also becomes possible to map
to a port already used by the host, shadowing it. There is also the
possibility of using a port in the dynamic range which is already in
use by the host (but this case was not handled gracefully either,
making the whole allocation fail instead of trying the next port).

Alternatively, the change could be done only for UDP where the problem
is more apparent. Or it could be configurable.

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
thaJeztah pushed a commit that referenced this issue Jun 3, 2022
There is a race condition between the local proxy and iptables rule
setting. When we have a lot of UDP traffic, the kernel will create
conntrack entries to the local proxy and will ignore the iptables
rules set after that.

Related to PR #32505. Fix #8795.

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
akerouanton added a commit to akerouanton/docker that referenced this issue Jan 4, 2023
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.

When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such a case, when
dockerd runs with the userland proxy enabled, packets get routed to it
and the main symptom is a bad source IP address (as shown by moby#44688).

If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).

As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.

Fixes (at least) moby#44688, moby#8795, moby#16720, moby#7540.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
akerouanton added a commit to akerouanton/docker that referenced this issue Jan 5, 2023
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.

When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such a case, when
dockerd runs with the userland proxy enabled, packets get routed to it
and the main symptom is a bad source IP address (as shown by moby#44688).

If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).

As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.

- Fixes moby#44688
- Fixes moby#8795
- Fixes moby#16720
- Fixes moby#7540
- Fixes moby/libnetwork#2423
- and probably more.

As a precautionary measure, those conntrack entries are also flushed
when revoking external connectivity to avoid those entries to be reused
when a new sandbox is created (although the kernel should already
prevent such case).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
corhere pushed a commit to corhere/moby that referenced this issue Jan 5, 2023
(cherry picked from commit b37d343)
akerouanton added a commit to akerouanton/docker that referenced this issue Jan 5, 2023
(cherry picked from commit b37d343)
@SkySai1

SkySai1 commented Mar 31, 2024

Hello everyone, I can confirm the bug is still present in version 20.10 as of 2024. I banged my head against this while building a DNS load balancer and fixed it by updating Docker to 24.0.2.
