Examples for cgroup socket ingress/egress BPF filters with systemd
These are examples for my blogpost about custom BPF firewalls for systemd services that I implemented in this commit. You can find more explanations and examples on how to use it when reading the blog post.
Update: Port-based BPF firewall compiled with clang
The port-firewall folder contains a
small configurable packet filter that parses IP/IPv6 packets, ICMP, UDP ports,
and TCP ports.
The forward rule is a C expression passed as
to the compiler with
The expression can use the boolean variables
ipv6 denoting the packet type and the the integers
src_port for the UDP/TCP ports.
If the expression evaluates to 0 (false), the packet will be dropped.
Valid filters examples are
FILTER='icmp || (udp && dst_port == 53) || (tcp && dst_port == 80)' or
FILTER='!udp || dst_port == 53'.
The makefile requires to pass the filter to build the program:
make load the bytecode is loaded to
/sys/fs/bpf/port-firewall as pinned BPF program in the special BPF filesystem.
From there you can use it with the systemd options
IP(Ingress|Egress)FilterPath= or attach
it manually to a cgroup.
The folder also includes a
bpf-make.service systemd unit file to configure and load the firewall
and an example
my-filtered-ping.service file that uses the loaded firewall.
It includes an workaround you can use to not require systemd v243.
Instead of making a custom loader unit for every service you can also use the systemd unit template
bpf-firewall@.service by as done in the
This is also a unit template used through
systemctl start "service-with-filter@icmp || (udp && dst_port == 53).service"
but it can also be a regular service without the
@. The disadvantage is that you have to specify the filter more than
once it the file but the advantage is that it's you don't use templates unnecessary and don't have the filter as (ugly)
part of the service name.
Ideally I would store the filter in a BPF map so that it can be set after BPF program loading, allowing it to be
ExecStartPre line in the service file.
The next section shows how to load and use a simple dropping filter as template for your own filters if you don't want to use this one.
Simple dropping filter compiled with clang
In the bpf-program-only folder is a
minimal BPF cgroup filter dropping all packets.
You can build it with
make and then run
make load to
This will load it with
You can use this to specify an
for systemd services (>= 243).
Here an example with ping running in a temporary systemd system scope (or service)
with an ingress filter but no egress filter. You can also use user scopes without
sudo by passing
$ sudo systemd-run -p IPIngressFilterPath=/sys/fs/bpf/cgroup-sock-drop --scope ping 127.0.0.1 Running scope as unit: run-re62ba1c….scope PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. ^C # cancel since it will not get responses --- 127.0.0.1 ping statistics --- 8 packets transmitted, 0 received, 100% packet loss, time 186ms
You can also find the example service
my-ping.service configured with a filter
that is loaded from its dependency service
cgroup-sock-drop-filter.service (please note
LimitMEMLOCK=infinity entry in the unit that load the filter). When you want to use
it for your service to start at boot you need to add, e.g.,
After=network.target and an
[Install] section with
Until systemd 243 is released you can also try the interactive MTU filter program from the next section below which has an option to attach the filter to a cgroup.
Workaround when systemd 243 is not available
systemd-run to spawn a shell in a new cgroup either as system scope or user scope (a temporary service).
-S can be replaced with a concrete binary if you don't want to start a shell.
$ sudo systemd-run --scope -S $ # or: $ systemd-run --user --scope -S Running scope as unit: run-r63de6b74621b4ae3877d4fa86b54be75.scope
This will print out the unit name which is also the name of the cgroup. The full cgroup path for the system service shell is
/sys/fs/cgroup/unified/system.slice/NAME. For the user service shell the path is
/firstname.lastname@example.org/NAME depening on your UID not being
Then attach the BPF program to the cgroup:
$ sudo $(which bpftool) cgroup attach /email@example.com/run-rfaa93ac79de2482d8ef1870fd6b508cd.scope egress pinned /sys/fs/bpf/cgroup-sock-drop multi
You can either choose
egress to filter incoming or outgoing packets. You can load the same filter for both
egrees and you can load multiple different filters per
egress (which also true when used through the systemd v243 option above).
If you turn on
systemd-run you need to turn on
Delegate as well to allow multiple BPF programs.
Interactive MTU filter
In the standalone folder is an interactive program that loads a BPF filter and controls its behavior.
From the source code comment:
Loads a BPF cgroup ingress/egress filter bytecode that filters based on the packet size.
It loads the BPF filter to a given location in
Through the +/- keys the MTU can be changed interactively (changes values in the BPF map).
Optionally the initial MTU value can be specified on startup.
The program can also attach the BPF filter to a cgroup by specifying the cgroup by its path.
The BPF filter stays loaded when the program exits and has to be deleted manually.
It does not use a BPF compiler but uses hardcoded BPF assembly instructions to include the BPF code in the final program. Not very accessible for hacking but for me it was interesting to see how BPF instructions work and what needs to be done to comply with the verifier.
With systemd >= 243
In one terminal you can run the interactive filter:
$ sudo ./load_and_control_filter -m 100 -t ingress /sys/fs/bpf/ingressfilter cgroup dropped 0 packets, forwarded 0 packets, MTU is 100 bytes (Press +/- to change) … # keeps running
It loaded the BPF filter to
/sys/fs/bpf/ingressfilter which you can use for a
In another terminal you can, for example, again run ping as root in a temporary
systemd scope and specify our filter as
$ sudo systemd-run -p IPIngressFilterPath=/sys/fs/bpf/ingressfilter --scope ping 127.0.0.1 Running scope as unit: run-….scope PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.086 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.069 ms …
When you switch back to the first terminal and press
-, the new MTU is 50 bytes
and you can see the dropped packet count increase.
In the ping terminal you will see no new responses because they are all dropped.
Workaround when systemd 243 is not available
load_and_control_filter program can be told to attach the filter to a cgroup
of a systemd service.
Systemd uses a BPF filter for its IP accounting and firewalling based on IP addresses. If such a filter is present but no others, the flag to allow multiple BPF filters for a cgroup is missing. As workaround when, e.g., IP accounting is enabled, you can tell systemd that the cgroup management is done by externally. This means that systemd will use the flag to allow multiple BPF filters instead of loading the IP accounting BPF filter without this flag.
$ sudo systemd-run -p IPAccounting=yes -p Delegate=yes --scope ping 127.0.0.1 Running scope as unit: run-r9f31b3947f4c4a11a24babf5517fe025.scope PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.086 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.069 ms …
You can see the scope name in the first output line. This is also the last part of the cgroup path you have to use as argument in order to attach the filter to the cgroup.
$ sudo ./load_and_control_filter -m 100 -c /sys/fs/cgroup/unified/system.slice/run-r9f31b3947f4c4a11a24babf5517fe025.scope -t ingress /sys/fs/bpf/myfilter cgroup dropped 0 packets, forwarded 4 packets, MTU is 100 bytes (Press +/- to change) … # keeps running and increases the forward count
- to reduce the MTU and observe the packet drop count increasing while no ping responses can be seen.