Adds rules outside of DOCKER chain, unlike documented #44816

Open
matthijskooijman opened this issue Jan 13, 2023 · 0 comments
Labels
area/networking kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage


matthijskooijman commented Jan 13, 2023

Description

The documentation says:

All of Docker’s iptables rules are added to the DOCKER chain. Do not manipulate this chain manually. If you need to add rules which load before Docker’s rules, add them to the DOCKER-USER chain. These rules are applied before any rules Docker creates automatically.

This suggests to me that all rules that docker adds are added to this DOCKER chain (which is jumped to from the FORWARD chain) and the main chains would only be modified to jump to this DOCKER chain.

However, in practice, docker also adds rules directly to the FORWARD chain in the filter table and to the POSTROUTING chain in the nat table. This makes it a lot harder to apply iptables firewalling rules in addition to the rules created by docker: the docker-created rules are hard to identify, and reloading rules by flushing and re-adding them also flushes the docker rules.

In fact, looking at the current behavior, I really cannot see a reliable and clean way to use additional firewalling rules with docker, especially if POSTROUTING rules are needed.

If docker added all of its rules to its own chains (and only added simple and predictable rules to the main chains to jump to its own chains), other firewall software could be configured to not flush the docker chains and to preserve the (now easy to detect) docker rules in the main chains when reloading, or to duplicate these (now simple, predictable and static) jump-to-docker rules in its own config. The latter gives the third-party software even more control over ordering its rules relative to the docker-generated rules.
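As a minimal sketch of the "preserve the docker chains on reload" idea: the hypothetical helper below filters `iptables-save` output down to the chains whose names start with DOCKER, plus the table markers needed to feed the result back to `iptables-restore`. The function name and the assumption that every docker-owned chain carries a DOCKER prefix are mine; the prefix holds for the chains listed in this issue, but with the current behavior the rules docker puts directly in FORWARD and POSTROUTING are not captured, which is exactly the problem described here.

```shell
#!/bin/sh
# Sketch only: reads `iptables-save` output on stdin and keeps
#   - table headers ("*filter", "*nat", ...) and COMMIT markers,
#   - declarations of chains named DOCKER* (":DOCKER - [0:0]"),
#   - rules appended to chains named DOCKER* ("-A DOCKER-USER ...").
# Assumption: all docker-owned chains have names starting with "DOCKER".
docker_chains_only() {
    grep -E '^(\*|COMMIT$|:DOCKER|-A DOCKER)'
}

# Possible usage (requires root), saving docker's chains before a reload:
#   iptables-save | docker_chains_only > /tmp/docker-chains.rules
```

Rules such as `-A FORWARD -o docker0 -j DOCKER` start with `-A FORWARD`, so they are dropped by this filter; a firewall tool would have to recognise and re-create them some other way.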

Another advantage of adding all docker rules to its own chains is that it is potentially also easier for docker itself to reload its rules (since it can just flush its own chains; I suspect it currently has to remove its own specific rules from the main chains).

A second approach to interoperability would be to have third-party firewall software add all of its rules to the DOCKER-USER chain, so it can just flush and refill that chain without interfering with docker's rules. However, this chain is currently only created for the FORWARD chain, not for the other main chains. It would be good if it were created for all chains as well, which is requested in #40544. I believe both that issue and this one are worth solving, since both approaches have their own value.
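The "flush and refill DOCKER-USER" approach could look roughly like the sketch below, which builds an `iptables-restore` payload that touches only the DOCKER-USER chain. The helper name, the `eth0` interface and the port in the usage line are made up for illustration. It also relies on my understanding that `iptables-restore --noflush` leaves the rest of the table alone but still empties a user-defined chain that is declared in the payload; that is worth verifying on the target system before relying on it.

```shell
#!/bin/sh
# Sketch only: prints an iptables-restore payload that replaces the contents
# of the DOCKER-USER chain in the filter table.
# Each argument is one rule spec without the leading "-A DOCKER-USER".
docker_user_payload() {
    printf '%s\n' '*filter' ':DOCKER-USER - [0:0]'
    for rule in "$@"; do
        printf '%s\n' "-A DOCKER-USER $rule"
    done
    # Explicit final RETURN so unmatched packets continue in FORWARD.
    printf '%s\n' '-A DOCKER-USER -j RETURN' 'COMMIT'
}

# Possible usage (requires root; interface and port are hypothetical):
#   docker_user_payload '-i eth0 -p tcp --dport 8080 -j DROP' \
#       | iptables-restore --noflush
```

This only covers forwarded traffic, because DOCKER-USER is only reachable from FORWARD; there is no equivalent chain to refill for INPUT or POSTROUTING.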

History
I suspect (but have not looked in the git history) that the original design might have been to put all rules in a DOCKER chain and have a DOCKER-USER chain for user rules, but that later additions either were not aware of this policy when adding new rules to the FORWARD chain instead of the DOCKER chain, or needed a specific ordering of rules that was easier to get by adding rules to FORWARD directly than by creating a subchain inside the DOCKER chain. The same may have happened when rules were added to POSTROUTING. This is just a guess, though.

Implementation
To properly implement the behavior suggested by the documentation, a single DOCKER chain is probably not enough, since then (if all main chains jump to the single DOCKER chain) it is no longer possible to detect the original chain a packet came in through. Currently, in the nat table this already happens (both PREROUTING and OUTPUT jump to the same DOCKER chain), but that is probably intentional to allow sharing rules between both chains. However, this limits future expansion, since any rules added to the DOCKER chain are now necessarily shared between PREROUTING and OUTPUT.

Instead, it seems like docker should create multiple chains, e.g. DOCKER-FORWARD, DOCKER-INPUT, DOCKER-POSTROUTING, etc. and then insert a jump to each of these from the corresponding main chain. Inside these docker-specific chains, docker has all the freedom to add the rules it needs, including jumping to the same DOCKER chain to share rules between multiple chains (maybe that chain should be renamed, though) or using subchains for splitting static rules from dynamic rules to make it easier to manage rules on e.g. reload.

Similarly, the DOCKER-USER chain should be duplicated for each main chain, e.g. DOCKER-USER-INPUT, DOCKER-USER-FORWARD, etc. Or maybe it could be renamed to something more explicit like BEFORE-DOCKER-FORWARD to make the ordering more explicit (maybe also add AFTER-DOCKER-FORWARD for more control over ordering)?
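To make the proposed layout concrete, here is a rough sketch that prints (but does not run) the iptables commands for one table. The chain names DOCKER-<CHAIN> and DOCKER-USER-<CHAIN> follow the suggestion above and are not something docker creates today; the helper name is hypothetical as well.

```shell
#!/bin/sh
# Sketch only: prints the commands for the proposed per-chain layout.
#   $1       table name (e.g. filter, nat)
#   $2...    built-in chains of that table to hook into
# For each built-in chain, a user chain is jumped to first, then docker's own.
proposed_layout() {
    table=$1
    shift
    for chain in "$@"; do
        printf 'iptables -t %s -N DOCKER-USER-%s\n' "$table" "$chain"
        printf 'iptables -t %s -N DOCKER-%s\n' "$table" "$chain"
        printf 'iptables -t %s -A %s -j DOCKER-USER-%s\n' "$table" "$chain" "$chain"
        printf 'iptables -t %s -A %s -j DOCKER-%s\n' "$table" "$chain" "$chain"
    done
}

# Example: proposed_layout filter INPUT FORWARD
#          proposed_layout nat PREROUTING OUTPUT POSTROUTING
```

With a layout like this, the only rules in the main chains would be two static jumps per chain, which other firewall software could preserve or duplicate without knowing any interface names.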

Reproduce

Install docker, show iptables rules generated (including all chain headers so you can see in which chain each rule is):

sudo iptables -t nat -n -v -L | egrep -i '(docker|chain)'
sudo iptables -n -v -L | egrep -i '(docker|chain)'

This returns (I manually removed some libvirt chains that are not relevant here):

matthijs@dottie:~$ sudo iptables -n -v -L | egrep -i '(docker|chain)'
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
2729K 3947M DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
2729K 3947M DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
1691K 3890M ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
1038K   57M ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-dcf7d1b4a116  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-4ed7b2eb6256  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-07218c42302a  0.0.0.0/0            0.0.0.0/0           
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
Chain DOCKER (4 references)
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
1038K   57M DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER-ISOLATION-STAGE-2  all  --  br-dcf7d1b4a116 !br-dcf7d1b4a116  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER-ISOLATION-STAGE-2  all  --  br-4ed7b2eb6256 !br-4ed7b2eb6256  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER-ISOLATION-STAGE-2  all  --  br-07218c42302a !br-07218c42302a  0.0.0.0/0            0.0.0.0/0           
Chain DOCKER-ISOLATION-STAGE-2 (4 references)
    0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
Chain DOCKER-USER (1 reference)


matthijs@dottie:~$ sudo iptables -t nat -n -v -L | egrep -i '(docker|chain)'
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
  332  195K DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    0     0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
  765 47233 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0           
Chain DOCKER (2 references)
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0           

In particular, note these non-static (they depend on interface names) and non-trivial rules added to the FORWARD chain:

1691K 3890M ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
1038K   57M ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-dcf7d1b4a116  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-4ed7b2eb6256  0.0.0.0/0            0.0.0.0/0           
    0     0 DOCKER     all  --  *      br-07218c42302a  0.0.0.0/0            0.0.0.0/0 

And this rule added to POSTROUTING:

  765 47233 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0           

These are all with some docker containers running, but none of them exposing any ports on the host (so no DNAT rules are added to the DOCKER chain).
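The rules singled out above can also be found mechanically. The sketch below (helper name is hypothetical) reads `iptables -S` output and prints every rule appended to a built-in chain that is not a plain, unconditional jump into a DOCKER-prefixed chain. On a host with other firewall rules it will print those too, since nothing in the rules themselves marks them as docker-created.

```shell
#!/bin/sh
# Sketch only: reads `iptables -S` (or `iptables -t nat -S`) output on stdin.
# Prints rules in built-in chains, except unconditional jumps to DOCKER* chains.
rules_outside_docker_chains() {
    awk '
        $1 == "-A" && $2 ~ /^(INPUT|FORWARD|OUTPUT|PREROUTING|POSTROUTING)$/ {
            # "-A CHAIN -j TARGET" has exactly four fields: no match conditions.
            if (NF == 4 && $3 == "-j" && $4 ~ /^DOCKER/) next
            print
        }
    '
}

# Possible usage (requires root):
#   iptables -S | rules_outside_docker_chains
#   iptables -t nat -S | rules_outside_docker_chains
```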

Expected behavior

No response

docker version

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.17.3
 Git commit:        20.10.12-0ubuntu4
 Built:             Mon Mar  7 17:10:06 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.3
  Git commit:       20.10.12-0ubuntu4
  Built:            Mon Mar  7 15:57:50 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu3.1
  GitCommit:        
 runc:
  Version:          1.1.0-0ubuntu1.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 11
  Running: 2
  Paused: 0
  Stopped: 9
 Images: 65
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.0-25-generic
 Operating System: Ubuntu 22.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.42GiB
 Name: dottie
 ID: MVIO:X4HE:VVWK:D43M:SEEU:5E2Z:EFA6:PUPQ:LTQB:T6OS:XME3:HO4Q
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

Note that this is essentially the same issue as docker/cli#3698, which was posted to the wrong repo but never reposted after being closed there.

@matthijskooijman matthijskooijman added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels Jan 13, 2023