docker-proxy processes UDP packets from a remote host #44688

Closed
PlushBeaver opened this issue Dec 22, 2022 · 2 comments · Fixed by #44742
Labels
area/networking  kind/bug  status/0-triage

Comments


PlushBeaver commented Dec 22, 2022

Description

A remote host (10.8.4.1) is sending a stream of UDP packets to the Docker host (10.8.4.2), e.g. 10.8.4.1:10000 → 10.8.4.2:9555, so that there is a conntrack record in the kernel. Then a container listening for this UDP stream starts. Despite originating from the remote host, the stream gets handled by docker-proxy. The source address of the packets seen in the container is replaced by the address of the Docker bridge. If conntrack records are flushed or the UDP stream is restarted, the packets start reaching the container with the original source address (10.8.4.1).

Reproduce

On the remote host (10.8.4.1), hping3 must be installed.
On the Docker host (10.8.4.2), conntrack, ss, and tcpdump must be installed.
Docker daemon settings are the default.

  1. On the remote host, start sending packets:
    sudo hping3 --udp --flood --keep --destport 9555 --data 1 10.8.4.2
  2. Check connection tracking state on the Docker host:
    conntrack -L -p 17
  3. Run the container:
    docker run --rm -d --name server -p 9555:9555/udp raesene/ncat -nul 0.0.0.0 9555
  4. Check that the container receives packets with the source address of the Docker bridge:
    nsenter -n -t $(docker inspect -f '{{ .State.Pid }}' server) tcpdump -nvi eth0 udp | head
  5. Check that docker-proxy is involved:
    ss -aunp | grep docker-proxy
  6. Remove tracked connections:
    conntrack -D -p 17 --dport 9555
  7. Check that the container now receives packets with the remote host source address:
    nsenter -n -t $(docker inspect -f '{{ .State.Pid }}' server) tcpdump -nvi eth0 udp | head
  8. Cleanup:
    docker stop server

Expected behavior

The container receives packets with the remote host source address from the start.

docker version

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.2
 Git commit:        20.10.12-0ubuntu2~20.04.1
 Built:             Wed Apr  6 02:14:38 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       20.10.12-0ubuntu2~20.04.1
  Built:            Thu Feb 10 15:03:35 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu1~20.04.6
  GitCommit:        
 runc:
  Version:          1.1.0-0ubuntu1~20.04.2
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-135-generic
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.839GiB
 Name: ubuntu2004
 ID: 7USK:YG7W:B6OA:NTQX:2HLQ:IHPA:SED4:XYGZ:PAR7:4PQ2:DNMU:AMJP
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  docker.netdike
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional Info

No response

PlushBeaver added the kind/bug and status/0-triage labels Dec 22, 2022

akerouanton commented Jan 2, 2023

I believe #8795 and #35135 are related to the root cause of this issue. I tried to reproduce this bug both with and without #43409; it still occurs even with that fix applied, so the fix is incomplete.

In my tests I used hping3 --fast instead of --flood; it sends 10 packets per second and shows the replies. This makes it even clearer that the kernel (on testvm2) creates a conntrack entry even when there is no socket listening on that UDP port:

vagrant@testvm1$ sudo hping3 --udp --fast --keep --destport 9555 --data 1 10.8.4.3
HPING 10.8.4.3 (eth1 10.8.4.3): udp mode set, 28 headers + 1 data bytes
ICMP Port Unreachable from ip=10.8.4.3 name=UNKNOWN   
status=0 port=1817 seq=0

vagrant@testvm2$ sudo conntrack -L -p udp --dport 9555
udp      17 22 src=10.8.4.2 dst=10.8.4.3 sport=1817 dport=9555 [UNREPLIED] src=172.17.0.2 dst=10.8.4.2 sport=9555 dport=1817 mark=0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

For completeness, I also used iptables -j TRACE to show that iptables doesn't re-evaluate NAT rules once a matching conntrack entry exists, even if new rules have been added in the meantime. As such, once the server container is started, the kernel ignores the NAT rules and sends the traffic to the userland proxy.
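
For reference, a minimal sketch of how such a trace can be set up, assuming legacy iptables with the TRACE target and the nf_log_ipv4 logger (with iptables-nft, xtables-monitor --trace is used instead of the kernel log):

    # log netfilter traversal for the published UDP port
    sudo modprobe nf_log_ipv4
    sudo sysctl -w net.netfilter.nf_log.2=nf_log_ipv4
    sudo iptables -t raw -A PREROUTING -p udp --dport 9555 -j TRACE
    # follow the trace; packets matching an existing conntrack entry never traverse the nat table
    sudo dmesg -w | grep 'TRACE:'
    # remove the trace rule afterwards
    sudo iptables -t raw -D PREROUTING -p udp --dport 9555 -j TRACE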

#43409 assumes there's a race condition between the userland proxy start-up and the iptables rules setup. In fact, this issue shows that the problem lies only in conntrack entries being created before the NAT rules are applied, which causes netfilter to ignore those rules entirely. Moreover, that fix is incomplete because it calls clearEndpointConnections, which flushes entries based on the endpoint IP address. Instead, we need to flush entries based on the destination port.
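
To illustrate the difference with the conntrack CLI (a sketch; 172.17.0.2 stands for the endpoint address, as in the output above):

    # an entry created before the container exists references only host addresses,
    # so flushing by endpoint IP (what clearEndpointConnections does) does not match it
    sudo conntrack -D --orig-dst 172.17.0.2
    # flushing by protocol and published destination port does match it
    sudo conntrack -D -p udp --orig-port-dst 9555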

Also note that, unlike UDP, TCP (and probably SCTP) is immune to this issue, since the handshake is aborted when nothing listens on the destination port, and thus no conntrack entry is created.

I'm going to submit a patch today 🙂

EDIT: This is a duplicate of #16720.

akerouanton added a commit to akerouanton/docker that referenced this issue Jan 4, 2023
Conntrack entries are created for UDP flows even if there's nowhere to
route these packets (ie. no listening socket and no NAT rules to
apply). Moreover, iptables NAT rules are evaluated by netfilter only
when creating a new conntrack entry.

When Docker adds NAT rules, netfilter will ignore them for any packet
matching a pre-existing conntrack entry. In such a case, when dockerd
runs with the userland proxy enabled, packets get routed to it and the
main symptom is a bad source IP address (as shown by moby#44688).

If the publishing container is run through Docker Swarm or in
"standalone" Docker but with no userland proxy, affected packets will
be dropped (eg. routed to nowhere).

As such, Docker needs to flush all conntrack entries for published UDP
ports to make sure NAT rules are correctly applied to all packets.

Fixes (at least) moby#44688, moby#8795, moby#16720, moby#7540.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
akerouanton added a commit to akerouanton/docker that referenced this issue Jan 5, 2023

- Fixes moby#44688
- Fixes moby#8795
- Fixes moby#16720
- Fixes moby#7540
- Fixes moby/libnetwork#2423
- and probably more.

As a precautionary measure, those conntrack entries are also flushed
when revoking external connectivity, to avoid those entries being reused
when a new sandbox is created (although the kernel should already
prevent such a case).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
@PlushBeaver
Author

Thank you, @akerouanton! I've tested e927539; the source address is correct.

corhere pushed a commit to corhere/moby that referenced this issue Jan 5, 2023
akerouanton added a commit to akerouanton/docker that referenced this issue Jan 5, 2023