bridge: protect bridge subnet from direct external access in raw PREROUTING#52224
bridge: protect bridge subnet from direct external access in raw PREROUTING#52224agners wants to merge 6 commits intomoby:masterfrom
Conversation
This adds two patches with fixes/improvements for the Docker engine - `0001-daemon-respect-explicit-AppArmor-profile-on-privileg.patch`: Makes sure that AppArmor rules are always loaded, also on reboot. This is a long standing bug in Docker and affects Supervisor which is a privileged container with an AppArmor profile. Upstream PR: moby/moby#52215 - `0002-bridge-protect-bridge-subnet-from-direct-external-ac.patch`: Makes sure that the whole network (including gateway IP) of any Docker bridge network in NAT mode is firewalled from access from the outside. This essentially implements on Docker level what Supervisor applies on startup with home-assistant/supervisor#6650. Upstream PR: moby/moby#52224.
This adds two patches with fixes/improvements for the Docker engine - `0001-daemon-respect-explicit-AppArmor-profile-on-privileg.patch`: Makes sure that AppArmor rules are always loaded, also on reboot. This is a long standing bug in Docker and affects Supervisor which is a privileged container with an AppArmor profile. Upstream PR: moby/moby#52215 - `0002-bridge-protect-bridge-subnet-from-direct-external-ac.patch`: Makes sure that the whole network (including gateway IP) of any Docker bridge network in NAT mode is firewalled from access from the outside. This essentially implements on Docker level what Supervisor applies on startup with home-assistant/supervisor#6650. Upstream PR: moby/moby#52224.
robmry
left a comment
There was a problem hiding this comment.
Thank you @agners - this looks good, I've kicked off the tests.
We should have done it like this in the first place! I think it started off as a rule per exposed port and I expanded it to per-container, but should have taken it to per-subnet.
I guess the idea is to add protection for "other things" connected to the bridge, not just containers?
| } | ||
| func (ic iptablesCleaner) DelEndpoint(_ context.Context, _ firewaller.NetworkConfig, _, _ netip.Addr) { | ||
| // Direct access filtering is now done at the network (subnet) level, not | ||
| // per-endpoint. There are no per-endpoint raw PREROUTING rules to clean up. |
There was a problem hiding this comment.
Maybe we should keep cleanup for the old rules, so they're deleted on upgrade?
There was a problem hiding this comment.
I've put this back DelEndpoint, and renamed filterDirectAccess to deleteLegacyDirectAccess which deletes alwasys (no enable bool flag). Maybe we could remove the AllowDirectRouting or rawRulesDisabled return condition (delete always instead), in case the daemon config changed 🤔 . But then again, maybe something else creates this rules if those options are set, and its better to not touch them at all, as it used to be.
|
Looks like some golden files need to be re-generated; do you know if we have a make or You can run 'go test . -update' to automatically update testdata/TestNftabler/hairpin=true,internal=false,icc=true,masq=true,snat=true,gwm=routed,bindlh=true,wsl2mirrored=true__ip.golden to the new expected value.' |
Ah, yes - and no, I don't think we do. I just run "go test" manually. Looks like I forgot to update the text in these last time around too ... At the moment they still both talk about the per-port rules:
Those templates are used by a "go test -update" to generate docs with generated rules filled in... |
Yeah honestly that was what we expected, that the whole subnet would be internal only. The fact that the gateway IP is actually reachable from outside was a surprise to us, leading to a CVE/security issue on our end (GHSA-gh5m-4m97-c95h). We use the gateway to bind internal services to in Home Assistant Supervisor/Operating System environment. |
d930c60 to
12191f9
Compare
…OUTING
Add a per-network rule to the raw PREROUTING chain dropping packets
destined to any address in the bridge subnet from interfaces other than
the bridge itself, loopback, or configured trusted host interfaces.
This covers both containers on unpublished ports and the bridge's own
gateway address (the host-side bridge IP, e.g. 172.17.0.1) from external
hosts that have a direct route to the bridge subnet. A single
network-level subnet rule is simpler than per-container rules and ensures
the gateway address is covered by the same protection policy.
For iptables, the following rules are added per bridge (plus one ACCEPT
per trusted interface):
-t raw -A PREROUTING -d <subnet> -i lo -j ACCEPT
-t raw -A PREROUTING -d <subnet> ! -i <bridge> -j DROP
For nftables, a single compound rule is used:
<family> daddr <subnet> iifname != { <bridge>, lo, ... } drop
Loopback is always permitted: the gateway IP is a valid host address
reachable from local processes via lo. The rules are not added for
gateway modes "routed" or "nat-unprotected", nor when
--allow-direct-routing is set, consistent with the intent of those
options. DOCKER_INSECURE_NO_IPTABLES_RAW=1 disables the iptables rules.
Legacy per-container raw PREROUTING rules from older daemon versions are
cleaned up on upgrade via the existing firewall cleaner.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Stefan Agner <stefan@agner.ch>
12191f9 to
ad20817
Compare
Regenerate TestNftabler golden files to reflect the change from per-container to per-subnet direct access filtering in raw-PREROUTING. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Stefan Agner <stefan@agner.ch>
b27fcf7 to
efc3993
Compare
|
Thanks for updating! I gave CI a kick, but looks like the windows runners were flaky and need another kick (I'll do so when they finish).
Yes, we briefly discussed if we needed an advisory for this, but it's a bit of a grey area with things attached to the network outside of the daemon's knowledge. We'll probably check with our security team as well, and can still decide to do so; it might help raise awareness for other setups doing something similar. |
This adds two patches with fixes/improvements for the Docker engine - `0001-daemon-respect-explicit-AppArmor-profile-on-privileg.patch`: Makes sure that AppArmor rules are always loaded, also on reboot. This is a long standing bug in Docker and affects Supervisor which is a privileged container with an AppArmor profile. Upstream PR: moby/moby#52215 - `0002-bridge-protect-bridge-subnet-from-direct-external-ac.patch`: Makes sure that the whole network (including gateway IP) of any Docker bridge network in NAT mode is firewalled from access from the outside. This essentially implements on Docker level what Supervisor applies on startup with home-assistant/supervisor#6650. Upstream PR: moby/moby#52224.
Regenerate TestIptabler golden files to reflect the change from per-container to per-subnet direct access filtering in raw PREROUTING. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Stefan Agner <stefan@agner.ch>
a567700 to
134dc2d
Compare
The iptabler cleaner's DelNetwork must also remove setSubnetProtection rules when switching from iptables to nftables backend, otherwise TestFirewallBackendSwitch fails with leftover raw PREROUTING rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Stefan Agner <stefan@agner.ch>
371a8e5 to
469cd0f
Compare
This adds two patches with fixes/improvements for the Docker engine - `0001-daemon-respect-explicit-AppArmor-profile-on-privileg.patch`: Makes sure that AppArmor rules are always loaded, also on reboot. This is a long standing bug in Docker and affects Supervisor which is a privileged container with an AppArmor profile. Upstream PR: moby/moby#52215 - `0002-bridge-protect-bridge-subnet-from-direct-external-ac.patch`: Makes sure that the whole network (including gateway IP) of any Docker bridge network in NAT mode is firewalled from access from the outside. This essentially implements on Docker level what Supervisor applies on startup with home-assistant/supervisor#6650. Upstream PR: moby/moby#52224. (cherry picked from commit 50c1efd)
| if n.ipt.config.AllowDirectRouting || rawRulesDisabled(ctx) { | ||
| enable = false | ||
| return nil | ||
| } |
There was a problem hiding this comment.
I think this check / early return should be dropped, to unconditionally delete the rules.
(It's a very-corner case, but if "AllowDirectRouting" got enabled on upgrade we still need to delete the old rules.)
There was a problem hiding this comment.
Yeah I had this already in a previous iteration, but I thought maybe if something else created such rules it's better not to touch it (since we can't say for certainty its created by Docker). But either way works for me, I removed this guard to make Docker always cleanup the rules.
There was a problem hiding this comment.
since we can't say for certainty its created by Docker
Yes, fair point. We tend to remove rules we probably-added anyway - subnet addresses get recycled when networks are created/deleted and a stray rule from an old network could get very confusing.
This all gets much easier with iptables. Hopefully we'll be able to start switching over soon.
Thank you for the updates, it's all looking good. I've kicked off another test run.
There was a problem hiding this comment.
This all gets much easier with iptables. Hopefully we'll be able to start switching over soon.
You meant to say nftables?
|
|
||
| func (n *network) DelEndpoint(ctx context.Context, epIPv4, epIPv6 netip.Addr) error { | ||
| return n.modEndpoint(ctx, epIPv4, epIPv6, false) | ||
| func (n *network) AddEndpoint(_ context.Context, _, _ netip.Addr) error { |
There was a problem hiding this comment.
I think AddEndpoint needs to call deleteLegacyDirectAccess, to delete the old rules on upgrade if live-restore is enabled.
func (n *network) AddEndpoint(ctx context.Context, epIPv4, epIPv6 netip.Addr) error {
if n.ipt.config.IPv4 && epIPv4.IsValid() {
if err := n.deleteLegacyDirectAccess(ctx, iptables.IPv4, n.config.Config4, epIPv4); err != nil {
return err
}
}
if n.ipt.config.IPv6 && epIPv6.IsValid() {
if err := n.deleteLegacyDirectAccess(ctx, iptables.IPv6, n.config.Config6, epIPv6); err != nil {
return err
}
}
return nil
}
There was a problem hiding this comment.
Good point. Added this accordingly.
Drop the AllowDirectRouting/rawRulesDisabled early return in deleteLegacyDirectAccess so that legacy rules are unconditionally cleaned up even if the daemon config changed on upgrade. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Stefan Agner <stefan@agner.ch>
AddEndpoint must call deleteLegacyDirectAccess so that old per-container raw PREROUTING rules are removed when the daemon restarts with live-restore enabled. Without this, old rules survive the restart since containers are not stopped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Stefan Agner <stefan@agner.ch>
|
Hm, seems a couple of tests fail, and those are legitimate failures: I guess we'd need to treat the gateway address just like any other host address so that published ports are still accessible then 🤔 Full details: Details |
Actually, traffic addressed to other host addresses go through DNAT. So this is really a case I haven't considered: One bridge accessing published ports of containers in another bridge, through the gateway address. One way to solve this is to just consider the other bridge as trusted network through |
Oh, yes - that's a bit tricky. "Something-something trusted interfaces" was my first thought too, it's probably the right way. But the networks that need to be trusted will come and go as bridge networks are created and deleted With nftables we can have a set of interfaces to allow. With iptables, It'd be difficult to insert iptables rules before the DROP in each network's group of prerouting rules. We can't use @akerouanton, @corhere - any bright ideas? |
|
Do we need to keep feature parity between iptables and nftables, or could we make them diverge (use nftables for the improved rules)? Not sure if that complicates the core more to differentiate them, but mostly drawing the parallel with our snapshotter vs graph driver implementations, where not all features are available with graph drivers |
- What I did
Added a per-network rule to the
raw PREROUTINGchain that drops packets destined to any address in the bridge subnet from interfaces other than the bridge itself, loopback, or configured trusted host interfaces. This protects both containers on unpublished ports and the bridge's own gateway address (the host-side bridge IP, e.g.172.17.0.1on the defaultdocker0bridge) from external hosts that have a direct route to the bridge subnet.- How I did it
Replaced the previous per-container direct access filtering rules (
filterDirectAccessinendpoint.go) with a single subnet-level rule created at network setup time.For iptables, the following rules are added per bridge (plus one ACCEPT per trusted interface):
For nftables, a single compound rule:
Loopback is always permitted because the gateway IP is a valid host address reachable from local processes via
lo. The rules are not added for gateway modesroutedornat-unprotected, nor when--allow-direct-routingis set, consistent with the intent of those options.DOCKER_INSECURE_NO_IPTABLES_RAW=1disables the iptables rules.Related upstream work:
This PR further consolidates from per-container to per-subnet, and extends coverage to the bridge gateway address.
- How to verify it
docker network create --subnet 172.17.0.0/16 testnetiptables -t raw -L PREROUTING -n -v172.17.0.1) still works from the host.172.17.0.0/16, verify that direct connections to addresses in the subnet are dropped.docker network rm testnet.--gateway-mode=routedor--allow-direct-routing.- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)
Opoterser, CC BY-SA 3.0, via Wikimedia Commons