rootless-containers/bypass4netns

bypass4netns: Accelerator for slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD (Kernel 5.9)

bypass4netns is as fast as --net=host and almost as secure as traditional slirp4netns.

The current version of bypass4netns must be used together with slirp4netns; future versions may work without it.

Benchmark

(Oct 16, 2020)

Workload: iperf3 -c HOST_IP from podman run

  • --net=host (insecure): 57.9 Gbps
  • bypass4netns: 56.5 Gbps
  • slirp4netns: 7.56 Gbps
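To put the three figures in relation to each other, the short sketch below computes the two ratios they imply: bypass4netns is roughly 7.5 times faster than slirp4netns and reaches about 97.6% of the --net=host throughput. The constant and function names are illustrative only; the numbers are copied from the list above.

```go
package main

import "fmt"

// Throughput in Gbps, copied from the benchmark list above.
const (
	hostNetGbps = 57.9
	bypassGbps  = 56.5
	slirpGbps   = 7.56
)

// ratio returns a/b; it only makes the two comparisons explicit.
func ratio(a, b float64) float64 { return a / b }

func main() {
	fmt.Printf("bypass4netns vs slirp4netns: %.2fx\n", ratio(bypassGbps, slirpGbps))
	fmt.Printf("bypass4netns vs --net=host:  %.1f%%\n", 100*ratio(bypassGbps, hostNetGbps))
}
```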

How it works

bypass4netns eliminates the overhead of slirp4netns by trapping socket syscalls and executing them in the host network namespace using SECCOMP_IOCTL_NOTIF_ADDFD.
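For readers unfamiliar with SECCOMP_IOCTL_NOTIF_ADDFD, the sketch below shows the request a seccomp supervisor fills in to install one of its own file descriptors into the trapped process. The struct layout and the SECCOMP_ADDFD_FLAG_SETFD flag come from the kernel's <linux/seccomp.h>; the Go names, the replaceFdRequest helper, and the example fd numbers are hypothetical, and this is not necessarily how bypass4netns itself is written. The ioctl is only described here, not issued.

```go
package main

import (
	"fmt"
	"unsafe"
)

// seccompNotifAddfd mirrors struct seccomp_notif_addfd from <linux/seccomp.h>.
type seccompNotifAddfd struct {
	ID         uint64 // ID of the seccomp notification being answered
	Flags      uint32 // e.g. addfdFlagSetfd
	SrcFd      uint32 // fd in the supervisor, e.g. a socket opened in the host netns
	NewFd      uint32 // fd number to install in the trapped process
	NewFdFlags uint32 // optional O_CLOEXEC
}

// addfdFlagSetfd is SECCOMP_ADDFD_FLAG_SETFD: install at exactly NewFd,
// replacing whatever the trapped process had open at that number.
const addfdFlagSetfd = 1 << 0

// iow reproduces the kernel's _IOW() macro for architectures that use the
// generic ioctl encoding (assumed here: x86-64 and arm64).
func iow(typ, nr, size uintptr) uintptr {
	const iocWrite = 1
	return iocWrite<<30 | size<<16 | typ<<8 | nr
}

// replaceFdRequest is a hypothetical helper: it asks the kernel to replace
// containerFd in the trapped process with the supervisor's hostSock.
func replaceFdRequest(notifID uint64, hostSock, containerFd uint32) seccompNotifAddfd {
	return seccompNotifAddfd{
		ID:    notifID,
		Flags: addfdFlagSetfd,
		SrcFd: hostSock,
		NewFd: containerFd,
	}
}

func main() {
	req := replaceFdRequest(1, 7, 3)
	// SECCOMP_IOCTL_NOTIF_ADDFD is defined as _IOW('!', 3, struct seccomp_notif_addfd).
	op := iow('!', 3, unsafe.Sizeof(req))
	fmt.Printf("ioctl request number: %#x\n", op)
	fmt.Printf("request: %+v\n", req)
}
```

After such a request succeeds, later syscalls on that fd number in the container operate on a socket that lives in the host network namespace, which is why the slirp4netns data path is no longer involved.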

See also the talks.

Requirements

  • kernel >= 5.9
  • runc >= 1.1 (crun is currently incompatible due to crun#1002)
  • libseccomp >= 2.5
  • Rootless Docker, Rootless Podman, or Rootless containerd/nerdctl

Build-time requirement:

  • golang >= 1.17

Compile

make
sudo make install

The following binaries will be installed into /usr/local/bin:

  • bypass4netns: the bypass4netns binary.
  • bypass4netnsd: an optional REST daemon for controlling bypass4netns processes from non-initial network namespaces. Used by nerdctl.

Usage

Hard way (docker|podman|nerdctl)

$ bypass4netns --ignore="127.0.0.0/8,10.0.0.0/8,auto" -p="8080:80"

--ignore=... is a list of the CIDRs that cannot be bypassed:

  • loopback CIDRs (127.0.0.0/8)
  • slirp4netns CIDR (10.0.0.0/8)
  • CNI CIDRs inside the slirp's network namespace (auto)

$ ./test/seccomp.json.sh >$HOME/seccomp.json
$ $DOCKER run -it --rm --security-opt seccomp=$HOME/seccomp.json --runtime=runc alpine

$DOCKER is either docker, podman, or nerdctl.

NOTE to Podman users: crun is currently incompatible due to crun#1002; using crun requires removing sendmsg from seccomp.json.
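To give an idea of what a seccomp.json for a notify-based supervisor looks like, the sketch below builds one and shows how dropping sendmsg changes it. The field names (defaultAction, listenerPath, syscalls, names, action) and the SCMP_ACT_NOTIFY action follow the OCI runtime spec. The syscall list and the socket path are assumptions made for illustration; the authoritative profile is whatever ./test/seccomp.json.sh prints.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// syscallRule and seccompProfile cover only the OCI seccomp fields used here.
type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

type seccompProfile struct {
	DefaultAction string        `json:"defaultAction"`
	ListenerPath  string        `json:"listenerPath"`
	Syscalls      []syscallRule `json:"syscalls"`
}

// buildProfile allows everything by default and forwards the named syscalls
// to the supervisor listening on listenerPath.
func buildProfile(listenerPath string, names []string) seccompProfile {
	return seccompProfile{
		DefaultAction: "SCMP_ACT_ALLOW",
		ListenerPath:  listenerPath,
		Syscalls:      []syscallRule{{Names: names, Action: "SCMP_ACT_NOTIFY"}},
	}
}

// without returns names with every occurrence of drop removed.
func without(names []string, drop string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		if n != drop {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Assumed syscall list and hypothetical socket path, for illustration only.
	trapped := []string{"bind", "connect", "sendmsg", "sendto"}
	p := buildProfile("/run/user/1000/bypass4netns.sock", without(trapped, "sendmsg"))
	b, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```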

Easy way (nerdctl)

bypass4netns is experimentally integrated into nerdctl (>= 0.17.0).

containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run -it --rm -p 8080:80 --label nerdctl/bypass4netns=true alpine

NOTE: --label nerdctl/bypass4netns=true will probably be replaced with --security-opt or something like --network-opt in a future version of nerdctl.

⚠️ Caveats ⚠️

Accesses to host abstract sockets and host loopback IPs (127.0.0.0/8) from containers are designed to be rejected.

However, it is probably possible to connect to host loopback IPs by exploiting a TOCTOU (time-of-check to time-of-use) race on struct sockaddr * pointers.
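The race can be pictured with a small, deterministic simulation. This is only a model of the general problem, not bypass4netns code: the memory type stands in for the container's address space, and its onRead hook plays the role of a second thread that rewrites the sockaddr right after the supervisor has inspected it. All names and addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// memory stands in for the container process's address space.
type memory struct {
	addr   string          // the IP stored in the sockaddr
	onRead func(m *memory) // runs once after the first read, simulating a racing thread
}

func (m *memory) read() string {
	v := m.addr
	if hook := m.onRead; hook != nil {
		m.onRead = nil
		hook(m)
	}
	return v
}

// allowed rejects loopback destinations.
func allowed(addr string) bool { return !net.ParseIP(addr).IsLoopback() }

// checkThenReread validates one read, but the value that is finally used
// comes from a second read of the same memory. This is the vulnerable shape.
func checkThenReread(m *memory) (string, bool) {
	if !allowed(m.read()) {
		return "", false
	}
	return m.read(), true
}

// copyThenCheck reads once and both validates and uses that private copy.
func copyThenCheck(m *memory) (string, bool) {
	a := m.read()
	if !allowed(a) {
		return "", false
	}
	return a, true
}

// racy returns memory whose address flips to loopback after the first read.
func racy() *memory {
	return &memory{addr: "192.0.2.10", onRead: func(m *memory) { m.addr = "127.0.0.1" }}
}

func main() {
	used, ok := checkThenReread(racy())
	fmt.Printf("check-then-reread: used=%s ok=%v\n", used, ok)
	used, ok = copyThenCheck(racy())
	fmt.Printf("copy-then-check:   used=%s ok=%v\n", used, ok)
}
```

In the first variant the check passes for 192.0.2.10 but the operation ends up using 127.0.0.1. Copying the address once only helps when the checker also performs the operation with its own copy; if the original syscall is allowed to continue, the kernel reads the pointer again.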

TODOs

  • Integration for Docker
  • Integration for Podman
  • Enable connecting to port-forwarded ports from other containers
    • Currently, other containers in the same network namespace cannot connect to port 80 of a container started with a publish option like -p 8080:80.
  • Handle protocol-specific publish options like -p 8080:80/udp.
    • Currently, bypass4netns ignores the protocol in the publish option.
  • Bind the port when bypass4netns starts with a publish option like -p 8080:80
    • Currently, bypass4netns binds the socket to port 8080 only when it handles bind(2) with target port 80.
    • bind(2) can fail if another process binds port 8080 before the container's process binds port 80.
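As a sketch of what protocol-aware handling of the publish option could look like, the code below parses specs such as 8080:80 and 8080:80/udp into a small struct. The portMapping type and the parsePublish function are hypothetical and do not reflect the current bypass4netns implementation; defaulting to tcp when no protocol is given is an assumption borrowed from the usual Docker-style -p convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portMapping is a hypothetical parsed form of HOST:CONTAINER[/PROTO].
type portMapping struct {
	HostPort      int
	ContainerPort int
	Proto         string // "tcp" or "udp"
}

// parsePort accepts a decimal port in the range 1-65535.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 || n > 65535 {
		return 0, fmt.Errorf("invalid port %q", s)
	}
	return n, nil
}

// parsePublish parses specs like "8080:80" or "8080:80/udp".
func parsePublish(spec string) (portMapping, error) {
	proto := "tcp" // assumed default when no protocol is given
	if i := strings.IndexByte(spec, '/'); i >= 0 {
		spec, proto = spec[:i], spec[i+1:]
	}
	if proto != "tcp" && proto != "udp" {
		return portMapping{}, fmt.Errorf("unsupported protocol %q", proto)
	}
	parts := strings.Split(spec, ":")
	if len(parts) != 2 {
		return portMapping{}, fmt.Errorf("expected HOST:CONTAINER, got %q", spec)
	}
	host, err := parsePort(parts[0])
	if err != nil {
		return portMapping{}, err
	}
	ctr, err := parsePort(parts[1])
	if err != nil {
		return portMapping{}, err
	}
	return portMapping{HostPort: host, ContainerPort: ctr, Proto: proto}, nil
}

func main() {
	for _, spec := range []string{"8080:80", "8080:80/udp", "8080:80/sctp"} {
		m, err := parsePublish(spec)
		fmt.Printf("%-14s -> %+v, err=%v\n", spec, m, err)
	}
}
```

With a mapping like this available at startup, the host port could in principle be reserved before the container calls bind(2), which is the change the last TODO item describes.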

Talks
