
Failed to receive UDP traffic after container restart #8795

Closed
mpeterss opened this issue Oct 27, 2014 · 94 comments


@mpeterss mpeterss commented Oct 27, 2014

I start a container and publish a port for UDP traffic like this:

docker run --rm -p 5060:5060/udp --name host1 -i -t ubuntu:14.04

Then in that container I wait for traffic with:

nc -u -l 5060

I then generate traffic from another machine:

nc -u <docker_host_ip> 5060

Then everything works fine and I can see that I receive the UDP traffic in the container.

But when I exit the container and do the same thing again, I can no longer receive UDP traffic in the Docker container.
If I wait about 5 minutes before I start sending, it works though. I have also noticed that it also works if the sender changes the port it binds to locally. So there seems to be some mapping that is not deleted when the Docker container is removed.


@liyichao liyichao commented Nov 29, 2014

This issue is due to conntrack. The Linux kernel keeps state for each connection. Even though
UDP is connectionless, if you run

sudo cat /proc/net/ip_conntrack

you will see a lot of entries. The output shows that the container address is still the one from before the restart, and that stale state prevents packets from arriving at the new container. The reason is this:

For a connection, only the first packet goes through the iptables NAT table, and that is where Docker routes the packet to its own chain and then to the right container.

When you restart the container, the container's IP has changed, so Docker installs a DNAT rule that routes to the new address. But the old connection's state in conntrack is not cleared, so when a packet arrives it does not go through the NAT table again, because it is not "the first" packet of its connection. The solution is clearing the conntrack entries, which can be done as follows:

sudo conntrack -D -p udp

(you will need sudo apt-get install conntrack)
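
If flushing every UDP entry is too broad for your setup, conntrack can also target just the published port. A minimal sketch, assuming the 5060/udp mapping from the original report:

# list the UDP conntrack entries whose original destination port is the published port
sudo conntrack -L -p udp --orig-port-dst 5060

# delete only those entries so the next packet traverses the NAT table again
sudo conntrack -D -p udp --orig-port-dst 5060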

Looking forward to Docker's solution.

@ljakob
Copy link

@ljakob ljakob commented Dec 19, 2014

Same problem on my side (OpenVPN inside a container). I could resolve it temporarily with

iptables --table raw --append PREROUTING --protocol udp --source-port 4000 --destination-port 4000 --jump NOTRACK

This runs on the Docker host. It's ugly but gets the job done.
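
If you use this workaround, the rule can be removed again later with the same match arguments once a proper fix is in place (a sketch, reusing the port 4000 example above):

iptables --table raw --delete PREROUTING --protocol udp --source-port 4000 --destination-port 4000 --jump NOTRACK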

IMHO the correct solution would be to clean up the conntrack table after adjusting iptables.


@blalor blalor commented Jan 5, 2015

Definitely looking forward to a fix for this one.

Contributor

@LK4D4 LK4D4 commented Jan 5, 2015

Seems to be working for me with the 3.18.0 kernel.

Contributor

@erikh erikh commented Jan 5, 2015

The UDP proxy has always had issues with packet loss; we've never found
a good answer for it.

-Erik



@blalor blalor commented Jan 6, 2015

I'm using CentOS 6.6, kernel 2.6.32-504.1.3.el6.x86_64. Seems like Docker should be responsible for (or at least facilitate through configuration) expiring conntrack table entries.


@technolo-g technolo-g commented Feb 3, 2015

I too would like to see some real solution to this.


@nmarasoiu nmarasoiu commented Mar 5, 2015

Hi, we would also like to know when this issue makes progress. What are the impediments to fixing this bug? Can we help in any way with details? We run Consul, and at some point (I guess after some restarts) the nodes start "suspecting each other" (per the gossip protocol); a node can receive the UDP message saying it is being suspected and tries to reply with "hey, I am alive", but the reply never reaches its destination.

Is this a priority? Is it hard to reproduce or debug? Can we help with more concrete data?
I reproduced it with kernel 3.13.


@grimmy grimmy commented May 7, 2015

Flushing the conntrack table worked for me, but I'm running on a dev machine and not prod; I'll have to give @liyichao's answer a go if/when we hit this in prod.


@grimmy grimmy commented May 12, 2015

Is there any reason why the conntrack entries can't just be removed when Docker determines a container has stopped?


@ljakob ljakob commented May 13, 2015

@grimmy No; the fix should not be too difficult to implement. After removing the iptables entries, just call conntrack --delete with similar arguments (IP + port).
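
A sketch of what that per-container cleanup could look like from the host, assuming a hypothetical old container IP of 172.17.0.2 and a published UDP port of 5060:

# delete only the entries that were DNATed to the stale container IP on the published port
conntrack --delete -p udp --orig-port-dst 5060 --reply-src 172.17.0.2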


@grimmy grimmy commented May 13, 2015

OK, that's what I figured. I'll see if I can find some time to put a pull request together unless someone else wants to jump on it.


@nmarasoiu nmarasoiu commented May 22, 2015

Hi,

I applied a patch in the cleanup callback of mapper.go, adding a conntrack delete with the container IP as the source IP in three places in mapper.go, including the Unmap and cleanup functions. It did not succeed: the Serf gossip protocol, which I run over UDP, complains that packets do not make it across and blacklists other nodes in its member list. Either there must be other places to do this, or it should also be done on the remote nodes.

Normally this should be done via accessible "objects", but I have not found a suitable one, either in Docker or as a Go import, so I started by calling a command in the OS (which of course is not a portable solution, but enough to check assumptions).

cleanup := func() error {
    // need to undo the iptables rules before we return
    if m.userlandProxy != nil {
        m.userlandProxy.Stop()
    }
    pm.forward(iptables.Delete, m.proto, hostIP, allocatedHostPort, containerIP.String(), containerPort)
    if err := pm.Allocator.ReleasePort(hostIP, m.proto, allocatedHostPort); err != nil {
        return err
    }
    // Experiment: drop conntrack entries whose source is the (now stale) container IP,
    // then flush the whole table as a fallback. The error from the first command is ignored.
    exec.Command("/usr/sbin/conntrack", "-D", "-s", containerIP.String()).Run()
    return exec.Command("/usr/sbin/conntrack", "-F").Run()
}
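
When experimenting with a patch like this, it can help to verify from the host that the stale entries actually disappear after the container is removed (a sketch, assuming a hypothetical container IP of 172.17.0.2):

# any surviving entries here mean the cleanup path missed something
sudo conntrack -L -p udp | grep 172.17.0.2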

@nmarasoiu nmarasoiu commented May 27, 2015

Hi, any feedback on my attempt to start fixing this?


@grimmy grimmy commented May 27, 2015

I've been cheating locally when it happens by using "conntrack -F"; next time it happens I'll try with just the specific IP address.


@nmarasoiu nmarasoiu commented May 27, 2015

Hi,

But I called -F too, probably in the wrong place.

For sure only the local tables need to be flushed, not the remote ones, right?



@grimmy grimmy commented May 27, 2015

I haven't had to do anything on the remote end. But I do have multiple containers talking to each other and to external devices over UDP. The first time this happened (and I discovered that it was conntrack), there was a conntrack entry for an external device pointing to an old container. Running "conntrack -F" cleared that, and the next packet from that external device made it to the correct container.


@berglh berglh commented Jun 9, 2015

So we're running StatsD in a Docker container on RHEL 7 and ran into this problem when the Docker service is restarted, which in turn restarts the StatsD container. The UDP packets to StatsD were arriving on the interface but not making it through to the container, and iptables wasn't blocking them, which led us to this thread.

The solution for us was to use conntrack to delete only the states for the things that are not working, so that we have the least impact on existing states. In the systemd unit file that launches the Docker container for StatsD, running an ExecStartPre with conntrack to delete the specific states that are UDP on port 8125 has solved this problem for us. Running conntrack -F seemed a bit brute-force for our requirements:

# grep -B1 run /etc/systemd/system/statsd.service 
ExecStartPre=/sbin/conntrack -D -p udp --orig-port-dst 8125
ExecStart=/usr/bin/docker run -p 8125:8125/udp -p 8126:8126 \

@grimmy grimmy commented Jun 9, 2015

Yes, the -F has only been performed on dev workstations and of course not in prod. This really just needs to be fixed in Docker, but @nmarasoiu hasn't had any success and I haven't had time to fix it either.

Member

@thaJeztah thaJeztah commented Jun 14, 2017

@Hermain Looking at the comment above, @fcrisciani was not able to reproduce; UDP traffic stopped when the destination container was killed, and started again when the destination container was started. Can you give more details? Exact steps to reproduce?

Contributor

@fcrisciani fcrisciani commented Jun 14, 2017

@Hermain the issue that you are experiencing is most likely the one fixed by this PR: moby/libnetwork#1792. 17.06-rc3 is out; you should try that image to confirm that the issue is fixed.

That 5-minute delay to recover matches exactly the expiration time of the MAC entry.


@Hermain Hermain commented Oct 4, 2017

The issue still persists. I have now used the same tools as @fcrisciani to reproduce it again, independently of Logstash.

I have a one-node Docker swarm and use nicolaka/netshoot, which I start with docker stack deploy and the following compose file:

version: "3.1"
services:
  udpReceiver:
    image: nicolaka/netshoot
    ports:
      - "127.0.0.1:12201:12201/udp"
    command: tcpdump -eni any udp and port 12201

If I now generate logs with docker run:

docker run --log-driver=gelf --log-opt gelf-address=udp://127.0.0.1:12201  ubuntu /bin/sh -c 'COUNTER=1;while true; do date "+%Y-%m-%d %H:%M:%S.%3N" | xargs printf "%s %s | 51c489da-2ba7-466e-abe1-14c236de54c5 | INFO | HostingLoggerExtensions.RequestFinished    | $COUNTER\n"; COUNTER=$((COUNTER+1)); sleep 1; done' 

and use docker logs on the netshoot container, I can see that it receives UDP packets as expected. I leave the log-generating container running and sending logs.

If I now docker kill ${netStatContainerId}, Docker Swarm will spin up a new netshoot container and the logs from the log generator will not reach it. (I test this by running docker logs on the new container and waiting for a minute --> nothing happens.)

If I stop the log generator and start a new one, those logs reach the application.

Looks like this bug, right?

Docker version returns:

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

@Hermain Hermain commented Oct 6, 2017

Or an even easier way to reproduce the problem:
Start the log generator from my last post, then start the log receiver --> the logs will never reach the receiver, as witnessed with docker service logs udpReceiver.
With 17.05 it at least recovered after 5 minutes; now it never recovers.
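
A condensed sketch of that ordering, reusing the compose file and log-generator command from the previous comment (the compose file name and stack name below are hypothetical, and docker stack deploy prefixes service names with the stack name):

# 1. start the UDP log generator first (the docker run --log-driver=gelf command above)
# 2. then deploy the receiver stack
docker stack deploy -c docker-compose.yml udpstack
# 3. watch the receiver; with the bug present, nothing ever arrives
docker service logs -f udpstack_udpReceiver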


@amithgeorge amithgeorge commented Jan 3, 2018

Any updates on this? We are seeing the same issue with web app containers sending UDP messages to an rsyslog container. If the rsyslog container is killed and started again, the web app containers also need to be killed and started again for the UDP messages to reach the rsyslog container. This is super weird. We are still on 17.05.0-ce, build 89658be. Unlike what @Hermain posted, it doesn't work even after 5 minutes; only a stop/start fixes it.


@levesquejf levesquejf commented Apr 30, 2018

I have the same issue with version 17.12.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3, running on AWS ECS. An outside host is sending UDP packets to the container every 500 ms. Before I restart the container, everything runs fine. Once the container is restarted, I see around 50% packet loss. When I have the issue, running tcpdump inside the container shows all the ingress and egress packets. However, when I run tcpdump on the Docker host, I see only half of the packets coming from the container.

The workaround conntrack -F is working for me.


@praseodym praseodym commented Apr 30, 2018

I see the same behaviour; the conntrack table is not flushed on a container restart, so packets are effectively blackholed until it is.

Only UDP ‘connections’ with the same source IP+port pair are affected, so not everyone will hit this bug.


@chrisxaustin chrisxaustin commented Apr 30, 2018

I saw this behaviour on an instance that received ~4k syslog messages per second.
It didn't only happen after a restart, though: the container would stop seeing traffic from some of the sources until I used conntrack to clear the table. Tcpdump on the host showed the traffic, but the container never saw it.

I've stopped using Docker for that particular service since I can't afford to lose logs and couldn't find a solution.


@levesquejf levesquejf commented May 1, 2018

@fcrisciani I understand you worked on this last year. Since this issue is currently closed but still present in 17.12.1, would it be better to have a new issue created? Is there any information you need to reproduce the issue? Let me know if I can help to get that fixed.


@mman mman commented May 3, 2018

Since this issue is real, I'm attaching the least invasive solution I have found to work rather reliably. Use nohup(1) or any other system-dependent mechanism to keep this script running in the background; it watches for the container starting up and cleans the conntrack entries corresponding to the given container name and exposed UDP port. Modify the script to adjust c and p appropriately.

#!/bin/bash

export PATH=/bin:/usr/bin:/sbin:/usr/sbin

# modify c and p to match your container name and UDP port
c=YOUR_CONTAINER_NAME
p=12345

# watch for container start events and purge stale UDP conntrack entries for the published port
docker events --filter type=container --filter event=start --filter container=$c | while read -r _
do
    logger "$c restarted"
    conntrack -D -p udp --orig-port-dst $p >/dev/null 2>&1
done
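
For example, a sketch of launching it in the background as suggested, assuming the script was saved as conntrack-watch.sh (a hypothetical name):

chmod +x conntrack-watch.sh
nohup ./conntrack-watch.sh >/var/log/conntrack-watch.log 2>&1 &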

Since Docker touches iptables on the Linux host when containers are created, I do believe that it should also properly clean up the conntrack mappings belonging to the container.

Contributor

@fcrisciani fcrisciani commented May 10, 2018

@Hermain @mman looking into this, as far as I can see the problem seems to be in the state maintained inside the IPVS connection table.
For now, if you enter the ingress sandbox on each node (nsenter --net=/var/run/docker/netns/ingress_sbox) and enable this knob: echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn, IPVS will automatically purge a connection once its backend is no longer available. With that, the reproduction mentioned in #8795 (comment) seems to work properly. I'm still working on making sure that is enough.
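
A sketch of running those two steps together on a node, using the sandbox path given above (the netns path can differ between Docker versions):

# enter the ingress sandbox netns and enable expiry of connections to vanished IPVS backends
nsenter --net=/var/run/docker/netns/ingress_sbox \
    sh -c 'echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn'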

Member

@thaJeztah thaJeztah commented May 14, 2018

@fcrisciani should we reopen this issue, or is this a different cause than the original one, and should we have a new issue for tracking?


@levesquejf levesquejf commented Jul 26, 2018

@thaJeztah @fcrisciani Is the referenced issue moby/libnetwork#2154 fixing the UDP packet loss issue after container restart?

Contributor

@fcrisciani fcrisciani commented Jul 26, 2018

@levesquejf also this one is needed: moby/libnetwork#2243


@dpajin dpajin commented Nov 20, 2019

I still see the same issue 5 years after it was opened, even though I am using the latest Docker version, 19.03.5, with Docker Swarm and starting my services with docker stack. Container is

It is exactly the same behavior as described in the initial post: when I first create the stack, it works. When I remove it and create it again, the containers do not receive traffic for exactly 5 minutes (300 seconds) and then they start to receive it. Deleting connections with conntrack at any point does not help at all.

I have changed all networking-related sysctl parameters that had a timeout or other value of 300 seconds down to 30 seconds, but that did not change the behavior.

Also, as mentioned in the first post, if I change the source port of the UDP sender, the containers start to receive traffic.

The containers are deployed over 3 Docker Swarm nodes in global mode. I don't have any specific network configuration other than exposing UDP ports, where the target and published ports are the same.

I don't believe this issue is actually fixed. Any ideas what more I could try?


@mman mman commented Nov 21, 2019

@dpajin I believe the issue is not fixed. I have settled on the following workaround, which has been stable for me for more than a year:

Somewhere in /etc/sysctl.conf I use a value of 10 seconds for super-aggressive timeouts; choose an appropriate value for your setup.

net.netfilter.nf_conntrack_udp_timeout = 10
net.netfilter.nf_conntrack_udp_timeout_stream = 10
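
These can also be applied at runtime without a reboot, assuming the nf_conntrack module is already loaded:

sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout=10
sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=10

# or, after editing /etc/sysctl.conf:
sudo sysctl -p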

@dpajin dpajin commented Nov 21, 2019

@mman, thank you for your reply and suggestion. Strangely, your change does not have any effect on the behavior in my setup. I will investigate this a bit more and eventually open a new issue.


@lucasbritos lucasbritos commented Apr 20, 2020

> @mman, thank you for your reply and suggestion. Strangely, your change does not have any effect on the behavior in my setup. I will investigate this a bit more and eventually open a new issue.

Not working for me either, probably because I have a constant rate of UDP traffic, which keeps resetting the timeout counters.


@hannip hannip commented Jul 10, 2020

This is still happening with docker-ce-19.03.8-3.
Note that on RHEL 7.8 the command to see the conntrack entries is different:
cat /proc/net/nf_conntrack

I had to yum install conntrack and then issue

conntrack -D -p udp

to get UDP traffic flowing to the container again after a restart.


@jhmartin jhmartin commented Apr 28, 2021

How about Docker detecting that it has launched a container that maps a UDP port, and flushing the conntrack state for that port?

trebonian pushed a commit to trebonian/docker that referenced this issue Jun 3, 2021
Flush all the endpoint flows when the external
connectivity is removed.
This will prevent issues where if there is a flow
in conntrack this will have precedence and will
let the packet skip the POSTROUTING chain.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>