Binary execution across Linux mount-namespaces
fxe is a small, pure-Rust Linux program which demonstrates how to execute binaries across mount-namespaces.
This technique is suitable for several use cases, as it allows shipping minimal containers with specialized binaries and then running those binaries in namespaces where they are not available.
For example, a bare-minimal ContainerLinux OS can be augmented with a mount-foo container to mount foo volumes directly on the host.
This program is provided for illustrative purposes only; it is not supposed to be run as-is in production.
How this works
fexecve() performs the same task as execve(), with the difference that the file to be executed is specified via a file descriptor rather than via a pathname.
fxe uses this to get a handle to a binary available inside its container (i.e. mount-namespace), move to a different target mount-namespace, and execute the binary there.
This repository contains a demo program which runs modinfo crc16 using the modinfo binary shipped in its container.
However, the directory containing kernel modules is not available inside the container; instead, the process changes its mount-namespace to the target one (e.g. the host) and runs the modinfo binary there.
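The sequence described above (open the binary, switch mount-namespace, execute through the file descriptor) can be sketched in a few lines of Rust. This is a minimal illustration, not the code of fxe itself: it declares setns() and fexecve() directly from the C library instead of going through the nix crate, and the names exec_across, to_cstrings and null_terminated are hypothetical helpers introduced here. It assumes Rust 1.82 or later (for `unsafe extern`), a single-threaded process, and sufficient privileges to call setns().

```rust
use std::convert::Infallible;
use std::env;
use std::ffi::{CString, OsString};
use std::fs::File;
use std::io;
use std::os::raw::{c_char, c_int};
use std::os::unix::ffi::OsStringExt;
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::ptr;

// Declared by hand to keep the sketch dependency-free; both symbols come from libc.
unsafe extern "C" {
    fn setns(fd: c_int, nstype: c_int) -> c_int;
    fn fexecve(fd: c_int, argv: *const *const c_char, envp: *const *const c_char) -> c_int;
}

// Value of CLONE_NEWNS from <sched.h>: restricts setns() to mount namespaces.
const CLONE_NEWNS: c_int = 0x0002_0000;

/// Hypothetical helper: converts OS strings into C strings (fails on interior NUL bytes).
fn to_cstrings<I: IntoIterator<Item = OsString>>(items: I) -> io::Result<Vec<CString>> {
    items
        .into_iter()
        .map(|s| CString::new(s.into_vec()).map_err(io::Error::from))
        .collect()
}

/// Hypothetical helper: builds the NULL-terminated pointer array expected by exec-style calls.
fn null_terminated(strings: &[CString]) -> Vec<*const c_char> {
    strings
        .iter()
        .map(|s| s.as_ptr())
        .chain(Some(ptr::null()))
        .collect()
}

/// Opens `binary` in the current mount-namespace, joins `target_ns`, then executes
/// the already-open binary. Returns only if one of the steps fails.
fn exec_across(binary: &Path, target_ns: &Path, args: Vec<OsString>) -> io::Result<Infallible> {
    // Both handles are taken before switching, while the paths are still visible.
    // File::open sets O_CLOEXEC, which is fine for ELF binaries but not for scripts.
    let bin = File::open(binary)?;
    let ns = File::open(target_ns)?;

    let argv = to_cstrings(args)?;
    let envp = to_cstrings(env::vars_os().map(|(key, value)| {
        let mut entry = key;
        entry.push("=");
        entry.push(value);
        entry
    }))?;
    let argv_ptrs = null_terminated(&argv);
    let envp_ptrs = null_terminated(&envp);

    // Step 1: move this process into the target mount-namespace.
    if unsafe { setns(ns.as_raw_fd(), CLONE_NEWNS) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Step 2: execute the binary through its descriptor; on success this never returns.
    unsafe { fexecve(bin.as_raw_fd(), argv_ptrs.as_ptr(), envp_ptrs.as_ptr()) };
    Err(io::Error::last_os_error())
}

fn main() {
    let mut args: Vec<OsString> = env::args_os().skip(1).collect();
    if args.len() < 2 {
        eprintln!("usage: <binary> <target-mount-ns> [args...]");
        return;
    }
    // What remains after removing the namespace path is argv for the new program.
    let target_ns = args.remove(1);
    let binary = args[0].clone();
    let err = exec_across(Path::new(&binary), Path::new(&target_ns), args).unwrap_err();
    eprintln!("exec across mount-namespaces failed: {}", err);
}
```

The important design point is ordering: both files are opened before setns(), because the binary's path is not expected to resolve any more once the process has moved to the target mount-namespace.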
A pre-built binary is available as a Docker image at quay.io/lucab/fxe.
To try it, simply do a make run:

    $ make run
    docker run --privileged --pid=host quay.io/lucab/fxe:latest /fxe /proc/1/ns/mnt
    filename:       /lib/modules/4.11.0-1-amd64/kernel/lib/crc16.ko
    description:    CRC16 calculations
    license:        GPL
    depends:
    intree:         Y
    vermagic:       4.11.0-1-amd64 SMP mod_unload modversions
This will use /proc/1/ns/mnt as the host mount-namespace target. Other targets can be used, as long as they are bind-mounted inside the container.
The --privileged flag is a shortcut to add the capabilities required by setns(2) (CAP_SYS_CHROOT, plus CAP_SYS_ADMIN when joining a mount-namespace) and to prevent the default seccomp filter from blocking the call. Both can be allowed with finer-grained settings (this is left as an exercise).
The --pid=host flag is required for proper fexecve() execution. It can be changed to any arbitrary target; here it is set to host only for demonstration purposes.
Due to how setns(2) and fexecve(3) are implemented on Linux, there are some conditions imposed on the running environment:
- setns: the target mount-namespace must be available as a file descriptor
- setns: to be allowed to change mount-namespace, the process must be single-threaded
- fexecve: /proc must be available
- fexecve: source and target processes must be running in the same PID-namespace
- fexecve: resources needed by scripts and dynamically linked binaries (e.g. interpreters and shared libraries) must be available in the target
See notes in both manpages for further details and explanations.
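Some of the conditions above can be checked before attempting the switch. The following is a rough sketch under stated assumptions, not part of fxe: the function names (parse_thread_count, thread_count, preflight) are hypothetical, and it only covers the checks that can be done by inspecting /proc and the target path. It does not verify the PID-namespace condition or the availability of dynamic resources in the target.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Extracts the `Threads:` field from the content of a /proc/<pid>/status file.
fn parse_thread_count(status: &str) -> Option<u32> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("Threads:"))
        .and_then(|value| value.trim().parse().ok())
}

/// Number of threads in the current process, as reported by /proc.
fn thread_count() -> io::Result<Option<u32>> {
    let status = fs::read_to_string("/proc/self/status")?;
    Ok(parse_thread_count(&status))
}

/// Returns a list of human-readable problems; an empty list means the
/// checked conditions hold (other conditions are not covered here).
fn preflight(target_ns: &Path) -> Vec<String> {
    let mut problems = Vec::new();

    // fexecve: /proc must be available.
    if !Path::new("/proc/self").exists() {
        problems.push("/proc does not appear to be mounted".to_string());
    }

    // setns: the target mount-namespace must be obtainable as a file descriptor.
    if let Err(err) = fs::File::open(target_ns) {
        problems.push(format!("cannot open {}: {}", target_ns.display(), err));
    }

    // setns: the process must be single-threaded.
    match thread_count() {
        Ok(Some(1)) => {}
        Ok(Some(n)) => problems.push(format!("process has {} threads, expected 1", n)),
        Ok(None) | Err(_) => problems.push("cannot determine the thread count".to_string()),
    }

    problems
}

fn main() {
    // Defaults to the host target used in the demo when no argument is given.
    let target = env::args().nth(1).unwrap_or_else(|| "/proc/1/ns/mnt".to_string());
    let problems = preflight(Path::new(&target));
    if problems.is_empty() {
        println!("preflight checks passed for {}", target);
    } else {
        for problem in &problems {
            eprintln!("preflight: {}", problem);
        }
    }
}
```

Returning the full list of problems, rather than stopping at the first one, makes it easier to see at a glance which container settings still need adjusting.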
The demo in this repository can be quickly built via make. It requires:
- a stable rustc/cargo toolchain for the x86_64-unknown-linux-musl target (available via rustup)
- docker run available to the current user
This currently depends on a pending PR to nix.