narwhal is used in conjunction with the --net=none networking mode of Docker to establish a secure network configuration with full layer-2 isolation.
Docker currently offers four different networking modes: bridge, host, container and none.
Most users rely on the default bridge mode, as it is the only one providing Internet connectivity with network isolation out of the box. In this mode, Docker creates a separate network namespace for every
container as well as a connected pair of virtual Ethernet interfaces
with one end in the host’s network namespace and the other end in the
container’s namespace. This approach is fairly secure by itself as all
communication has to go through this virtual interface and no container
is capable of interfering with other containers’ or even the host’s
network stack.
Unfortunately, Docker connects these virtual interfaces to an Ethernet bridge docker0, which behaves (more or less) like a hardware Ethernet switch, allowing all containers to talk to each other directly via Ethernet (layer 2). Read this article to understand why this is a bad idea. It completely undermines the carefully crafted network isolation.
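To see why, consider that any container on the bridge can claim another container's (or the gateway's) IP address via forged ARP replies. A minimal sketch, assuming an attacker-controlled container with the dsniff package installed and Docker's default 172.17.0.0/16 bridge subnet (the victim and gateway addresses are hypothetical):
root@attacker:/# arpspoof -i eth0 -t 172.17.0.3 172.17.0.1
The victim 172.17.0.3 is now persuaded that the attacker's container is the gateway 172.17.0.1, so its outbound traffic passes through the attacker, ready to be sniffed or modified.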
narwhal, on the other hand, forgoes the bridge entirely and routes IPv4 and IPv6 (layer 3) packets between containers and the outside world, thus eliminating the problems that come with having all your containers in the same Ethernet segment.
tl;dr: Docker’s default networking mode is vulnerable to ARP and MAC spoofing attacks. A single container under control of an attacker is enough to compromise the whole network.
Usage: narwhal [OPTION]… [CONTAINER]
-4, --ipv4 IPV4 container IPv4 address
-6, --ipv6 IPV6 container IPv6 address
--forwarding enable packet forwarding
--host-ipv4 IPV4 host IPv4 address [169.254.0.1]
--host-ipv6 IPV6 host IPv6 address [fe80::1]
-i, --interface IFACE container interface name [eth0]
--host-interface IFACE host interface name [nw-CONTAINER]
--mtu SIZE maximum transmission unit
--temp-interface IFACE temporary container interface name [nwt-PID]
--temp-namespace NS temporary network namespace name [nwt-PID]
--paranoid create restrictive Ethernet filter rules
--trace trace actions
-h, --help display this help and exit
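For example, a hypothetical invocation that additionally lowers the MTU and enables the restrictive Ethernet filter rules:
root@host:/# narwhal --ipv4 128.66.23.42 --ipv6 2001:db8:cabb:a6e5::1 --mtu 1400 --paranoid bb9b0be2a4d3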
In this example, we shall create a container with --net=none. At first, it will only have a loopback interface:
root@host:/# docker run --rm -t -i --net=none ubuntu /bin/bash
root@bb9b0be2a4d3:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
Now that our container is running, we shall run narwhal to create a network configuration, using 128.66.23.42 and 2001:db8:cabb:a6e5::1 as addresses for our container.
root@host:/# narwhal --ipv4 128.66.23.42 --ipv6 2001:db8:cabb:a6e5::1 bb9b0be2a4d3
We should now be able to see the new network interface on the host:
root@host:/# ip address
…
74: nw-bb9b0be2a4d3: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 6a:55:5e:99:b8:1f brd ff:ff:ff:ff:ff:ff
inet 169.254.0.1 peer 128.66.23.42/32 scope link nw-bb9b0be2a4d3
valid_lft forever preferred_lft forever
inet6 fe80::6855:5eff:fe99:b81f/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::1 peer 2001:db8:cabb:a6e5::1/128 scope link
valid_lft forever preferred_lft forever
root@host:/# ip -4 route
default via 128.66.0.1 dev eth0
128.66.0.0/24 dev eth0 proto kernel scope link src 128.66.0.2
128.66.23.42 dev nw-bb9b0be2a4d3 proto kernel scope link src 169.254.0.1
root@host:/# ip -6 route
2001:db8:cabb:a6e5::1 dev nw-bb9b0be2a4d3 proto kernel metric 256
2001:db8:cabb:a6e5::1 dev nw-bb9b0be2a4d3 metric 1024
2001:db8:dead:beef::/64 dev eth0 proto kernel metric 256
fe80::1 dev nw-bb9b0be2a4d3 proto kernel metric 256
fe80::/64 dev eth0 proto kernel metric 256
fe80::/64 dev nw-bb9b0be2a4d3 proto kernel metric 256
default via 2001:db8:dead:beef::1 dev eth0 metric 1024
This is how it looks from within the container:
root@bb9b0be2a4d3:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
157: eth0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 4e:45:86:31:c4:15 brd ff:ff:ff:ff:ff:ff
inet 128.66.23.42 peer 169.254.0.1/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 2001:db8:cabb:a6e5::1 peer fe80::1/128 scope global
valid_lft forever preferred_lft forever
inet6 fe80::4c45:86ff:fe31:c415/64 scope link
valid_lft forever preferred_lft forever
root@bb9b0be2a4d3:/# ip -4 route
default via 169.254.0.1 dev eth0
169.254.0.1 dev eth0 proto kernel scope link src 128.66.23.42
root@bb9b0be2a4d3:/# ip -6 route
2001:db8:cabb:a6e5::1 dev eth0 proto kernel metric 256
fe80::1 dev eth0 proto kernel metric 256
fe80::1 dev eth0 metric 1024
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::1 dev eth0 metric 1024
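Even without packet forwarding, the point-to-point routes above already allow host and container to reach each other directly; a quick sanity check from inside the container (output omitted):
root@bb9b0be2a4d3:/# ping -c 1 169.254.0.1
root@bb9b0be2a4d3:/# ping6 -c 1 fe80::1%eth0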
To allow our container to communicate with the outside world, we have to enable packet forwarding:
root@host:/# sysctl -w net.ipv4.conf.all.forwarding=1
net.ipv4.conf.all.forwarding = 1
root@host:/# sysctl -w net.ipv6.conf.all.forwarding=1
net.ipv6.conf.all.forwarding = 1
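Note that these sysctl settings do not survive a reboot. On distributions that read /etc/sysctl.d, a sketch to persist them (the file name is arbitrary):
root@host:/# printf 'net.ipv4.conf.all.forwarding = 1\nnet.ipv6.conf.all.forwarding = 1\n' > /etc/sysctl.d/99-forwarding.conf
root@host:/# sysctl -p /etc/sysctl.d/99-forwarding.conf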
We should now be able to reach other Internet hosts:
root@bb9b0be2a4d3:/# ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=9.16 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 9.169/9.169/9.169/0.000 ms
root@bb9b0be2a4d3:/# ping6 -c 1 heise.de
PING heise.de(redirector.heise.de) 56 data bytes
64 bytes from redirector.heise.de: icmp_seq=1 ttl=55 time=5.47 ms
--- heise.de ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.472/5.472/5.472/0.000 ms
To undo the configuration, simply stop the container or remove the virtual Ethernet interface:
ip link del nw-$CONTAINERID
All associated routes will vanish with it.
By default, narwhal does not touch any firewall rules. These are your options:
- Assign public IPv4 or IPv6 addresses, enable IP forwarding and be happy. Create filter rules in the iptables FORWARD chain as you like.
- Assign private IPv4 or unique local IPv6 addresses and just use them to connect your containers with the host or with each other. Additionally, you may configure source and/or destination NAT to selectively allow communication with the outside world (see the sketch below).
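As an illustration of the second option, a rough sketch assuming the container was assigned the private address 10.66.23.42, its host-side interface is nw-bb9b0be2a4d3 and eth0 is the host's uplink (all names are examples, not narwhal defaults):
root@host:/# iptables -A FORWARD -i nw-bb9b0be2a4d3 -o eth0 -j ACCEPT
root@host:/# iptables -A FORWARD -i eth0 -o nw-bb9b0be2a4d3 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
root@host:/# iptables -t nat -A POSTROUTING -s 10.66.23.42 -o eth0 -j MASQUERADE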
If the container already has an eth0 device, narwhal does nothing harmful: it will find that the device exists and simply fail after rolling back already acquired resources.
Absolutely! Your other containers may use other networking modes. Containers you wish to configure with narwhal should use --net=none. Trying to apply narwhal to otherwise configured containers just fails if an eth0 device already exists, but it won't cause any harm.
When a container stops, its network namespace and the virtual Ethernet pair are destroyed automatically.
In --net=none mode your container will start up with only a loopback network interface. To avoid race conditions, network services inside your container should not be bound to a specific address but to :: or 0.0.0.0.
After the interface inside the container (usually eth0) has been created, your service will automatically receive packets at the addresses configured with narwhal.
Exceptions to this rule are services that should only be reachable from inside the container; these should be bound to localhost (::1 or 127.0.0.1).
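For instance, a throwaway HTTP listener bound to the IPv6 wildcard address (assuming python3 ≥ 3.8 is available inside the container; ss is part of iproute2):
root@bb9b0be2a4d3:/# python3 -m http.server 8080 --bind '::' &
root@bb9b0be2a4d3:/# ss -tln
Because it is bound to the wildcard, the listener accepts connections on all of the container's addresses, including those narwhal configures later.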
One might consider sticking with --net=bridge and filtering out unwanted frames on the docker0 bridge with ebtables, but with container virtualisation there is no reason to use a bridge in the first place.
Platform virtualisation (where a complete operating system is run on simulated hardware) usually relies on cramming all virtual machines into one Ethernet segment to allow easy network configuration via DHCP.
With container virtualisation like Docker, things are quite different. Network configuration is done directly from outside the container, eliminating the need for DHCP and, with it, the necessity for a unified Ethernet segment.
So why bridge everything together when we don’t need to?