[feature request] nftables support #26824

Open
senden9 opened this issue Sep 22, 2016 · 34 comments

@senden9
commented Sep 22, 2016

Docker seems to be optimized for iptables at the moment. Are there any plans to support nftables in future versions of Docker?

My workaround at the moment is to deactivate the iptables integration via --iptables=false and then set the right rules for nftables by hand.

@thaJeztah

Member

commented Sep 22, 2016

I'm not aware of plans in this direction

ping @aboch is this planned? Worth doing?

@aboch

Contributor

commented Sep 22, 2016

I remember @mrjana had thought of using nftables last year. He knows more about the plan.
From what I read online, nftables made it into kernel 3.13. Given that Docker supports kernels as old as 3.10, it may not be possible to move to nftables yet.

@mrjana

Contributor

commented Sep 22, 2016

Yeah nftables are not in the kernel until 3.14 and we can't use it to generally replace iptables yet.

@Yamakaky

commented Dec 4, 2016

Maybe add it as an option? That way, those who have the latest kernel can use it. Currently I have to disable my nftables firewall to get the network working, it's fine on my machine but it's not an option on a server.
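The opt-in idea could look roughly like the Python sketch below. Python is used only for illustration; libnetwork itself is written in Go. It assumes a hypothetical `choose_firewall_backend` helper that picks nftables only when the user asks for it and the running kernel meets a minimum version. The threshold is a parameter because the comments above cite both 3.13 and 3.14.

```python
from __future__ import annotations

import platform
import re

# Hypothetical default: @aboch cites 3.13, @mrjana cites 3.14
DEFAULT_MIN_KERNEL = (3, 13)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Return (major, minor) from a string like '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_firewall_backend(release: str | None = None,
                            min_kernel: tuple[int, int] = DEFAULT_MIN_KERNEL,
                            prefer_nftables: bool = False) -> str:
    """Pick 'nftables' only if the user opted in and the kernel is new enough."""
    if not prefer_nftables:
        return "iptables"  # the default stays unchanged for everyone else
    if release is None:
        release = platform.release()  # running kernel, e.g. '4.19.0-5-amd64'
    if parse_kernel_release(release) >= min_kernel:
        return "nftables"
    return "iptables"
```

For example, `choose_firewall_backend("3.10.0-957.el7.x86_64", prefer_nftables=True)` falls back to `"iptables"`, so RHEL 7 hosts would keep today's behaviour even with the option set.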

@itagent

commented Apr 4, 2017

Any new ideas or progress? We are transitioning to nftables and would really appreciate it.

@ford-perfect

commented Apr 7, 2017

+1 for optional nftables support
Just some dates:
Linux LTS 3.10 has its projected EOL in October 2017.
Debian 7.0's kernel is not supported anyway, but Debian 8.0's kernel has nftables.
RHEL 7.3 EOL is not until 2024-06 and it runs 3.10, so there is a conflict here;
but the proposal is for optional nftables support in addition to the existing iptables support.

@gdahlm

commented Jun 1, 2017

I wanted to add that RHEL 7 does have nftables as a tech preview. It would greatly simplify IPv6 and allow simpler implementations of throttling and of very useful tools like connection tracking or load balancing.

@Gunni

commented Apr 16, 2018

I want to add that I've been using nftables on CentOS 7 for over a year now, I believe, on dozens of different servers, both with and without NAT, using IPv6 and more. I have had no issues other than understanding the parse errors when I mess up. I use Ansible to manage and generate the nftables rules file and to atomically reload the service to apply new rules, or do nothing if the file fails to parse.

And since nftables applies the entire ruleset in one atomic operation, there is no moment when the system is in a partially configured state.

In my opinion, I would NOT use nftables integration with Docker unless I could control which file Docker puts rules into and control the imports into my current ruleset myself. Docker should only issue reload commands to nftables (reload meaning nft -f, or systemctl, which does it correctly).

I currently manage docker nat rules using ansible/manually.
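Here is a sketch of the workflow described above. It assumes Docker would only ever write its own include file and leave loading to the admin's ruleset. The file layout, the `docker_nat` table name, and all function names are hypothetical. It runs `nft -c -f` (check without applying, if your nft build supports `-c`) before `nft -f`, which loads the whole file as one atomic transaction.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def render_docker_include(bridge_subnet: str, egress_if: str) -> str:
    """Hypothetical file Docker would own: a private table with only its NAT rule."""
    return (
        "table ip docker_nat {\n"
        "  chain postrouting {\n"
        "    type nat hook postrouting priority 100;\n"
        f'    ip saddr {bridge_subnet} oifname "{egress_if}" masquerade\n'
        "  }\n"
        "}\n"
    )


def render_main_ruleset(own_rules: str, docker_include: Path) -> str:
    """Admin-owned ruleset; the admin decides whether and where Docker's file is included."""
    return f'flush ruleset\n\n{own_rules.rstrip()}\n\ninclude "{docker_include}"\n'


def apply_ruleset(path: Path, nft: str = "nft") -> tuple[bool, str]:
    """Dry-run with `nft -c -f`; load with `nft -f` only if the whole file parses.

    `nft -f` applies the file as a single transaction, so a bad file leaves the
    running ruleset untouched rather than half-applied.
    """
    check = subprocess.run([nft, "-c", "-f", str(path)], capture_output=True, text=True)
    if check.returncode != 0:
        return False, check.stderr
    subprocess.run([nft, "-f", str(path)], check=True)
    return True, ""
```

Keeping Docker's rules in a separate table means a Docker restart only regenerates that file, while the admin keeps full control over when the combined ruleset is reloaded.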

@ojab

commented Jun 21, 2018

Meanwhile iptables is officially deprecated.

@cpuguy83

Contributor

commented Jun 21, 2018

I don't see the reason to bother with nftables when the whole community seems to be (rightly) pushing for bpf.

@ojab

commented Jun 21, 2018

nftables uses bpf internally. If you've implied bpfilter — it's not there yet.

@cpuguy83

Contributor

commented Jun 21, 2018

Sure, it uses bpf internally, but that isn't really any better than not using bpf; it's more about deduplication.
Even with bpf in the backend, nftables is still only "slightly better" than iptables.

For that matter, isn't iptables using nftables in the backend? (Don't quote me on that; I think I read it somewhere at some point but haven't looked into it.)

@tianon

Member

commented Aug 12, 2018

According to https://wiki.nftables.org/wiki-nftables/index.php/Moving_from_iptables_to_nftables (which I'd imagine is pretty authoritative), using nftables and iptables at the same time is highly discouraged:

Beware of using both the nft and the legacy tools at the same time. That means using both x_tables and nf_tables kernel subsystems at the same time, and could lead to unexpected results.
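To find out which of the two subsystems the `iptables` binary on a host actually drives, one could parse its `--version` output. iptables 1.8 and later append the backend in parentheses (as in the `iptables v1.8.2 (nf_tables)` line further down this thread), while older releases only have the legacy backend. Here is a Python sketch with hypothetical function names:

```python
import re
import subprocess


def iptables_backend(version_output: str) -> str:
    """Classify `iptables --version` output as 'nf_tables' or 'legacy'."""
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    # No suffix means iptables < 1.8, which only has the legacy x_tables backend
    return match.group(1) if match else "legacy"


def local_iptables_backend(binary: str = "iptables") -> str:
    """Run the given binary's --version and classify it (needs iptables installed)."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=True)
    return iptables_backend(result.stdout)
```

A daemon could, for instance, warn when this reports `legacy` while the host also has an nft ruleset loaded, which is exactly the mix the wiki warns about.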

I'd been playing with firewalld for building a router system and got tired of the way firewalld does things, so I was evaluating nftables, but the fact that I'd then have to disable Docker's iptables behavior and handle Docker's routing rules myself is a bit of a hurdle.

I've looked at doing eBPF, but it doesn't seem like there's nearly as many good examples (even nftables is a bit low on examples, but I've managed to find a few people doing things similar enough to what I need that I'm comfortable), so I don't really think it's totally fair to tell folks "we should just go straight to BPF instead" yet.

Just to include what I've found for reference, here's a couple folks who've worked on getting what Docker needs implemented in nftables:

I think docker network create's ability to create arbitrary bridges is going to further complicate this, but for my own use case I'll be able to dictate a fixed number of Docker networks, so that won't be a huge deal (just bringing it up in case folks in the future find this and need to implement something similar).

On implementation details, is the current iptables/firewalld code tightly coupled with the rest of the networking system, or is it already abstracted out reasonably enough that eBPF or nftables could theoretically be implemented as an optional backend? Is there perhaps a way we could make that code pluggable, or at least pluggability friendly? Even just having Docker write out to a file the set of things it would've asked iptables to do would be an improvement; isn't it mostly port openings and masquerade settings?

(Not trying to be a bother, just trying to add some additional information about why folks might care about this and brainstorm ideas for how it could maybe move forward without being too invasive. ❤️)

Edit (2018-08-13): #35777 is also relevant (even with --iptables=false, Docker still currently touches iptables to create DOCKER-USER).
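To make the pluggability question concrete, here is a hypothetical Python sketch of a backend interface. Its implementation only records requests and writes them out as an nft script. None of these classes exist in libnetwork. The rules follow the general shape of Docker's NAT rules (masquerade the bridge subnet unless the packet leaves via the bridge, DNAT for published ports) but are simplified: only traffic arriving from other hosts is DNATed, and no filter rules are generated.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PortMapping:
    """A published port: host_port on the host forwards to container_ip:container_port."""
    proto: str  # "tcp" or "udp"
    host_port: int
    container_ip: str
    container_port: int


class FirewallBackend(ABC):
    """Hypothetical interface, not libnetwork API."""

    @abstractmethod
    def add_masquerade(self, subnet: str, bridge: str) -> None: ...

    @abstractmethod
    def add_port_mapping(self, mapping: PortMapping) -> None: ...


@dataclass
class NftFileBackend(FirewallBackend):
    """Records requests and renders them as an nft script instead of calling iptables."""
    masquerades: list[tuple[str, str]] = field(default_factory=list)
    mappings: list[PortMapping] = field(default_factory=list)

    def add_masquerade(self, subnet: str, bridge: str) -> None:
        self.masquerades.append((subnet, bridge))

    def add_port_mapping(self, mapping: PortMapping) -> None:
        self.mappings.append(mapping)

    def render(self) -> str:
        lines = ["table ip docker {",
                 "  chain prerouting {",
                 "    type nat hook prerouting priority -100;"]
        for m in self.mappings:
            lines.append(f"    {m.proto} dport {m.host_port} dnat to {m.container_ip}:{m.container_port}")
        lines += ["  }",
                  "  chain postrouting {",
                  "    type nat hook postrouting priority 100;"]
        for subnet, bridge in self.masquerades:
            # Masquerade container traffic unless it stays on its own bridge
            lines.append(f'    ip saddr {subnet} oifname != "{bridge}" masquerade')
        lines += ["  }", "}"]
        return "\n".join(lines) + "\n"
```

The rendered text could then be written to a file and loaded by the admin, which matches the "just write out what you would have asked iptables to do" suggestion.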

@cpuguy83

Contributor

commented Aug 15, 2018

is the current iptables/firewalld code tightly coupled with the rest of the networking system

It is horribly coupled right now. It's basically all the original iptables code from years ago, moved out of docker/docker into docker/libnetwork and mostly untouched except to add more cruft to support custom chains (remember when Docker didn't use its own chain?) and firewalld, among other things.

@pentago

commented Jan 2, 2019

Hi all, any new progress on this?

nftables is becoming the default in Debian 10 (Buster), due in a couple of months, and I guess a wave of adoption in derivative distros such as Ubuntu will follow.

With upgrade season and the iptables deprecation closing in quickly, something will need to happen.

@cpuguy83

Contributor

commented Jan 2, 2019

@camAtGitHub

commented Jan 16, 2019

RHEL 8 (currently in beta) release notes: The nftables framework replaces iptables in the role of the default network packet filtering facility.
That means CentOS 8 will more than likely follow suit as well...

@mavenugo

Contributor

commented Jan 29, 2019

Thanks @camAtGitHub for the pointer.

The iptables, ip6tables, ebtables and arptables tools are replaced by nftables-based drop-in replacements with the same name. While external behavior is identical to their legacy counterparts, internally they use nftables with legacy netfilter kernel modules through a compatibility interface where required.

This makes it the correct transition path: no change is required for existing software that uses netfilter-based iptables.

@tianon

Member

commented Jan 29, 2019

@cpuguy83

Contributor

commented Jan 29, 2019

I'd go for the latter, but I've also had PRs open on libnetwork, untouched, for multiple years.

@elboulangero

commented Feb 28, 2019

A Debian user reported that, indeed, he can't use Docker on a machine where an nftables-based firewall is enabled: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921600

@georgmu

commented Mar 19, 2019

The issue with iptables-nft is that it behaves differently in the chain check:

$ iptables-legacy -t filter -n -L FOO-BAR-TEST
iptables: No chain/target/match by that name.
$ echo $?
1
$ iptables-nft -t filter -n -L FOO-BAR-TEST
# Warning: iptables-legacy tables present, use iptables-legacy to see them
$ echo $?
0

This check is used in vendor/github.com/docker/libnetwork/iptables/iptables.go.

Since iptables-nft does not return an error here, I get an error on the next rule, which tries to append a rule to the chain.
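Until the fixed iptables is available everywhere, a check that does not depend on the exit code of `iptables -L CHAIN` would sidestep this difference. One option, sketched in Python with hypothetical function names (this is not libnetwork's Go code), is to look for the chain declaration in `iptables-save -t <table>` output, where each chain appears as a line such as `:DOCKER - [0:0]`.

```python
import subprocess


def chain_exists(save_output: str, table: str, chain: str) -> bool:
    """True if `chain` is declared under `*<table>` in iptables-save output."""
    in_table = False
    for raw in save_output.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            in_table = line[1:] == table  # start of a table section
        elif line == "COMMIT":
            in_table = False  # end of the current table section
        elif in_table and line.startswith(":"):
            # e.g. ':DOCKER - [0:0]' or ':PREROUTING ACCEPT [0:0]'
            parts = line[1:].split()
            if parts and parts[0] == chain:
                return True
    return False


def chain_exists_on_host(table: str, chain: str, save_cmd: str = "iptables-save") -> bool:
    """Hypothetical helper: dump one table (requires root) and search it."""
    dump = subprocess.run([save_cmd, "-t", table], capture_output=True, text=True, check=True)
    return chain_exists(dump.stdout, table, chain)
```

Parsing the dump costs one process spawn per check, but it gives the same answer whichever iptables variant produced the output.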

@sbraz

commented Mar 19, 2019

@georgmu Maybe you should report that upstream, it doesn't look normal.

@georgmu

commented Mar 19, 2019

I just checked upstream. The issue was fixed in iptables 1.8.1.
https://git.netfilter.org/iptables/commit/?id=03572549df349455fcade80dfab0b28904975330

Fedora uses iptables 1.8.0, Debian 9 uses iptables 1.6.2...

@georgmu

commented Mar 19, 2019

For fedora, I just requested an update of iptables:
https://bugzilla.redhat.com/show_bug.cgi?id=1690448

@bluescreen303

commented Mar 22, 2019

@georgmu
Debian testing user here, with iptables 1.8.2.
I do get the correct error behaviour, but Docker still kills my firewall and forwarding config, so the issue doesn't appear to be (just) the one you point at.

I tracked down the exact version where this started happening:
5:18.09.0~3-0~debian-buster is fine
5:18.09.1~3-0~debian-buster and later are not
these are versions for https://download.docker.com/linux/debian buster stable

@darkbasic

commented Jun 6, 2019

RHEL 8 is out and CentOS 8 will follow soon, but there is still no Docker nftables support :(

@thaJeztah

Member

commented Jun 6, 2019

Docker will currently use the compatibility wrappers, so things should still work; is there a specific issue you're running into @darkbasic ?

(sure a rewrite would still be good to have, but likely requires a significant amount of work)

@darkbasic

commented Jun 6, 2019

I'm running RHEL 8 with Docker and if I run

firewall-cmd --add-service=http
docker run --rm --name=linuxconfig-test -p 80:80 httpd

then I can reach the webserver locally (nc 127.0.0.1 80), but I cannot reach it from another machine on the network (even though Docker is supposed to listen on all interfaces by default).

If I run nc -l -p 80 instead of Docker, I can access port 80 from every machine on the network.

Repeating the same exact procedure on CentOS 7 instead of RHEL 8 makes Docker work flawlessly, meaning that I can reach it from every machine in the network.

@thaJeztah

Member

commented Jun 7, 2019

which version of docker are you running?

@darkbasic

commented Jun 7, 2019

18.09.6, build 481bc77156 (3.18.09-2), from the CentOS 7 repos

@L30Bola

commented Jul 20, 2019

I've written rules to work with docker, while using the flag --iptables=false:

#!/usr/sbin/nft -f

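# 172.16.0.0/12 spans 172.16.0.0-172.31.255.255, which includes Docker's default bridge (172.17.0.0/16)
define docker_nat = 172.16.0.0/12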

flush ruleset

table inet filter {
  chain input {

    type filter hook input priority 0; policy accept;

    ct state {established, related} accept

    iifname lo accept

    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept

    tcp dport ssh accept

    ip saddr $docker_nat accept

    ct state invalid counter drop

    #log prefix "[nftables] Input Denied: " flags all counter drop
    #log prefix "[nftables] Input Accepted: " flags all counter accept
  }
  chain forward {

    type filter hook forward priority 0; policy drop;

    ct state {established, related} accept

    # oifname matches by name per packet; oif resolves eth0's index at load time
    # and makes the whole file fail to load if eth0 does not exist yet
    ip saddr $docker_nat oifname "eth0" accept

    #log prefix "[nftables] Forward Denied: " flags all counter drop
    #log prefix "[nftables] Forward Accepted: " flags all counter accept
  }
  chain output {
    type filter hook output priority 0;
  }

}

table ip nat {
  chain prerouting {
    type nat hook prerouting priority 0;
  }
  chain postrouting {
    type nat hook postrouting priority 0;

    ip saddr $docker_nat oifname "eth0" masquerade
  }
}

The input chain's accept policy lets Docker receive traffic without manually exposing each port. This is acceptable here because there is another firewall in front of this host, and it lets another machine behind that same firewall reach the services inside the containers.

@niconorsk

commented Jul 28, 2019

Regarding the compatibility wrappers just working for now: this is not actually true.
One of the bigger differences between nftables and iptables is that the basic tables don't exist by default. For the default bridge network this is fine, because all its commands run in the host namespace, and users can set up their base chains in nftables.

For user-defined bridge networks, however, Docker's behaviour changes and it starts running iptables commands within the context of the container's network namespace instead. The primary purpose of this is getting internal container DNS to work (which is a good feature that I would certainly like to keep).

The problem is that within the network namespace the nft rules are completely empty, so Docker tries to add rules to tables that don't exist yet. The fix is to manually create the required base tables and chains when the namespace is created. Here's a sample ruleset that does it:


# start with a clean slate
flush ruleset

table ip filter {
    chain INPUT {
            type filter hook input priority 0; policy accept;
    }

    chain FORWARD {
            type filter hook forward priority 0; policy accept;
    }

    chain OUTPUT {
            type filter hook output priority 0; policy accept;
    }
}

table ip6 filter {
    chain INPUT {
            type filter hook input priority 0; policy accept;
    }

    chain FORWARD {
            type filter hook forward priority 0; policy accept;
    }

    chain OUTPUT {
            type filter hook output priority 0; policy accept;
    }
}

table ip nat {
    chain PREROUTING {
            type nat hook prerouting priority 0; policy accept;
    }

    chain OUTPUT {
            type nat hook output priority 50; policy accept;
    }

    chain POSTROUTING {
            type nat hook postrouting priority 100; policy accept;
    }

}

table ip6 nat {
    chain PREROUTING {
            type nat hook prerouting priority 0; policy accept;
    }

    chain OUTPUT {
            type nat hook output priority 50; policy accept;
    }

    chain POSTROUTING {
            type nat hook postrouting priority 100; policy accept;
    }

}

Couple of more notes regarding this specific behaviour:

  • it cannot be turned off using --iptables=false
  • this is standalone Docker behaviour. I suspect, but haven't tested, that swarm mode has very similar problems
  • as far as I can tell, the compat layers are sufficient once the base tables are in place. The only exception is that Docker runs iptables -C (check for rule existence), which nft does not support
  • tested with nftables 0.9.0, iptables 1.61 and docker-ce 18.09.2
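Because the sample ruleset above repeats the same three chains for `ip` and `ip6`, it could be generated instead of written by hand. Below is a Python sketch (the function name is hypothetical) that reproduces the same tables, chains, and priorities. The result would still need to be loaded with `nft -f` inside each container's network namespace, for example via `nsenter --net=<namespace path>`; that step is not shown here.

```python
from __future__ import annotations

# Chains per table as (name, hook, priority), copied from the sample ruleset above
BASE_CHAINS = {
    "filter": [("INPUT", "input", 0), ("FORWARD", "forward", 0), ("OUTPUT", "output", 0)],
    "nat": [("PREROUTING", "prerouting", 0), ("OUTPUT", "output", 50),
            ("POSTROUTING", "postrouting", 100)],
}


def render_base_ruleset(families: tuple[str, ...] = ("ip", "ip6")) -> str:
    """Emit empty base tables and chains (policy accept) for each table and family."""
    out = ["# start with a clean slate", "flush ruleset", ""]
    for table, chains in BASE_CHAINS.items():
        for family in families:
            out.append(f"table {family} {table} {{")
            for name, hook, priority in chains:
                # The chain type equals the table name here: filter or nat
                out.append(f"    chain {name} {{")
                out.append(f"        type {table} hook {hook} priority {priority}; policy accept;")
                out.append("    }")
            out.append("}")
            out.append("")
    return "\n".join(out)
```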
@arkodg

Contributor

commented Jul 31, 2019

@niconorsk AFAIK everything is working as expected

Let's spawn a Debian Buster container:

$ docker run -it --privileged debian:buster
root@c483463e2b88:/# 

and install iptables, which installs iptables 1.8.2 with the nf_tables backend:

root@c483463e2b88:/# apt-get update -y
root@c483463e2b88:/# apt-get install iptables -y
root@c483463e2b88:/# iptables --version
iptables v1.8.2 (nf_tables)

Now let's run some nft or iptables-save commands; you'll see nothing :)

root@c483463e2b88:/# nft list ruleset
root@c483463e2b88:/# 
root@c483463e2b88:/# 
root@c483463e2b88:/# 
root@c483463e2b88:/# iptables-save
root@c483463e2b88:/# 
root@c483463e2b88:/# 
root@c483463e2b88:/# 

But once you run an iptables command, the base rules get installed:

root@c483463e2b88:/# iptables -nvL -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
root@c483463e2b88:/# 
root@c483463e2b88:/# 
root@c483463e2b88:/# nft list ruleset
table ip nat {
	chain PREROUTING {
		type nat hook prerouting priority -100; policy accept;
	}

	chain INPUT {
		type nat hook input priority 100; policy accept;
	}

	chain POSTROUTING {
		type nat hook postrouting priority 100; policy accept;
	}

	chain OUTPUT {
		type nat hook output priority -100; policy accept;
	}
}

root@c483463e2b88:/# iptables-save
# Generated by xtables-save v1.8.2 on Wed Jul 31 22:30:13 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Completed on Wed Jul 31 22:30:13 2019

So it looks like the base nft rules get created the first time any operation on an iptables table (nat, filter) is performed.
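To see which base chains exist in a namespace, and at which hook and priority, before and after the first iptables call, one could parse `nft list ruleset` output. This is a rough Python sketch with a hypothetical `base_chains` helper. It only understands the plain-text listing format shown above, and it keeps the priority as a string because newer nft versions may print symbolic priorities instead of numbers.

```python
from __future__ import annotations

import re

TABLE_RE = re.compile(r"^\s*table\s+(\S+)\s+(\S+)\s*\{")
CHAIN_RE = re.compile(r"^\s*chain\s+(\S+)\s*\{")
# Priority kept as text: a number such as -100, or possibly a name in newer nft
HOOK_RE = re.compile(r"type\s+\w+\s+hook\s+(\w+)\s+priority\s+([^;]+);")


def base_chains(ruleset: str) -> list[tuple[str, str, str, str, str]]:
    """Return (family, table, chain, hook, priority) for each base chain listed."""
    found = []
    family = table = chain = None
    for line in ruleset.splitlines():
        if m := TABLE_RE.match(line):
            family, table, chain = m.group(1), m.group(2), None
        elif m := CHAIN_RE.match(line):
            chain = m.group(1)
        elif chain is not None and (m := HOOK_RE.search(line)):
            # Only base chains carry a 'type ... hook ... priority' line
            found.append((family, table, chain, m.group(1), m.group(2).strip()))
    return found
```

Comparing the result with the priorities in the hand-written base ruleset from the previous comment (0 and 50 for nat prerouting/output) against the -100 that iptables-nft creates shows that the two approaches do not produce identical chains.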
