| 1 | +--- |
| 2 | +title: "Advanced instrumentation" |
| 3 | +description: "" |
| 4 | +date: 2025-05-25T01:49:15+00:00 |
| 5 | +weight: 900 |
| 6 | +toc: true |
| 7 | +--- |
| 8 | + |
| 9 | +GMT is designed to be a measurement tool with extremely low overhead.
| 10 | + |
| 11 | +This is achieved by leveraging classic *docker* containers with native runtimes like *runc* (optionally rootless, see below).
| 12 | + |
| 13 | +To benchmark more complex applications, however, it might be necessary to use alternative
| 14 | +runtimes to leverage functionalities like:
| 15 | + |
| 16 | +- Docker in Docker |
| 17 | +- Workloads that need systemd |
| 18 | +- K8s or K3s inside of GMT |
| 19 | +- Benchmarking alternative kernels and kernel modules with GMT |
| 20 | + |
| 21 | +In the following we show runtimes supported by GMT and their pros, cons and caveats. |
| 22 | + |
| 23 | +{{< callout context="caution" icon="outline/alert-triangle" >}} |
| 24 | +Using a different runtime for the docker orchestrator will almost always result in more overhead. This path should only be chosen if the workload cannot be run any other way, for example because the base system might get damaged, or because the measurement would be impossible or distorted without the isolation.
| 25 | +{{< /callout >}} |
| 26 | + |
| 27 | +## Kata Containers |
| 28 | + |
| 29 | +[GitHub Repo](https://github.com/kata-containers/kata-containers/) |
| 30 | + |
| 31 | +*Kata Containers* is a *containerd* compatible runtime that creates a *qemu VM* and launches a new *docker* container inside of it. |
| 32 | + |
| 33 | +### Pros |
| 34 | + |
| 35 | +- Highest degree of isolation |
| 36 | +- Enables *docker-in-docker* workloads |
| 37 | +- Enables *systemd* workloads |
| 38 | +- Enables loading alternative kernels and kernel modules
| 39 | + |
| 40 | +### Cons |
| 41 | + |
| 42 | +- Big overhead, although not as big as gVisor
| 43 | +- Requires nested virtualization |
| 44 | + |
| 45 | +### Caveats |
| 46 | + |
| 47 | +- GMT-orchestrated containers cannot be put on one network. This seems to be a bug.
| 48 | +- Unclear if GPU forwarding is supported |
| 49 | +- *systemd* workloads need a patched image. So far we have not managed to get them running
| 50 | +- *docker-in-docker* workloads should work, but so far we have not managed to get them running
| 51 | + |
| 52 | +### Activating |
| 53 | + |
| 54 | +Install *Kata Containers* and then just supply `--runtime io.containerd.kata.v2` as one of the `docker-run-args` in the *service* definition of your `usage_scenario.yml`.
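
A minimal sketch of what such a *service* definition could look like. The service name `my-service` and the image are hypothetical placeholders, and the list form of `docker-run-args` is an assumption; check the `usage_scenario.yml` reference for the exact shape:

```yaml
services:
  my-service:                # hypothetical service name
    image: alpine            # hypothetical image
    docker-run-args:         # assumed: list of extra arguments passed to `docker run`
      - --runtime io.containerd.kata.v2
```

Only the services that actually need the extra isolation have to carry the argument. Services without it should keep using the default runtime.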
| 55 | + |
| 56 | +## Sysbox |
| 57 | + |
| 58 | +[GitHub Repo](https://github.com/nestybox/sysbox) |
| 59 | + |
| 60 | +*sysbox* enables bare metal workloads and provides a bit more isolation than normal docker containers by providing stronger namespaces that are effectively rootless. |
| 61 | + |
| 62 | +### Pros |
| 63 | + |
| 64 | +- Slightly higher degree of isolation, although still very close to native *docker*
| 65 | +- Does not require nested virtualization |
| 66 | +- Enables *docker-in-docker* workloads |
| 67 | +- Enables *systemd* workloads |
| 68 | + |
| 69 | +### Cons |
| 70 | + |
| 71 | +- Biggest overhead of all runtimes |
| 72 | +- Unclear if GPU forwarding is supported |
| 73 | +- Cannot load other kernels or kernel modules |
| 74 | + |
| 75 | +### Caveats |
| 76 | + |
| 77 | +- Networking for *docker-in-docker* workloads seems to fail when containers are put on a custom network. Network connections on the normal docker containers seem to work though. Surprisingly, direct IP connections work, but DNS resolution fails for the *docker-in-docker* workloads
| 78 | + - This can be mitigated at the moment by putting the containers on the *default bridge* network. This should have no further security implications |
| 79 | + - Alternatively one can also set a proxy for the docker container and forward the *HTTP_PROXY* variables to all applications that are started in the *docker-in-docker* containers. |
| 80 | + - As another alternative, all docker containers created inside the container can be started with `--network=host` and will then retain connectivity.
| 81 | + - Why the mitigations work is not exactly clear, but it might be related to this: https://github.com/nestybox/sysbox/issues/456. It seems clear, however, that it is a routing issue from the inner container to the internet. It is surprising that both changing how the interface for the outer container is created and skipping creation of the inner network adapter with `--network=host` can fix it.
| 82 | + |
| 83 | +### Activating |
| 84 | + |
| 85 | +Install *sysbox* and then just supply `--runtime sysbox-runc` as one of the `docker-run-args` in the *service* definition of your `usage_scenario.yml`.
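
As a rough illustration, a *service* definition using *sysbox* might look like the following. The service name and image are hypothetical, and the list form of `docker-run-args` is an assumption rather than a confirmed schema:

```yaml
services:
  dind-service:              # hypothetical service name
    image: docker:dind       # hypothetical image for a docker-in-docker workload
    docker-run-args:         # assumed: list of extra arguments passed to `docker run`
      - --runtime sysbox-runc
```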
| 86 | + |
| 87 | +## gVisor |
| 88 | + |
| 89 | +*gVisor* emulates the whole kernel in user space, thus protecting the host kernel.
| 90 | + |
| 91 | +### Pros |
| 92 | + |
| 93 | +- High degree of isolation. Probably on par with |
| 94 | +- Does not require nested virtualization |
| 95 | + |
| 96 | +### Cons |
| 97 | + |
| 98 | +- Biggest overhead of all runtimes |
| 99 | +- Unclear if GPU forwarding is supported |
| 100 | + |
| 101 | +### Caveats |
| 102 | + |
| 103 | +- Unclear if *systemd* workloads work |
| 104 | +- Unclear if GPU forwarding is supported |
| 105 | +- Unclear if *docker-in-docker* workloads work |
| 106 | +- Unclear if it can load other kernels or kernel modules |
| 107 | + |
| 108 | +**Note**: Currently in alpha and not officially supported. Ping us if you want to help develop this feature into a stable version :)
| 109 | + |
| 110 | +## Firecracker |
| 111 | + |
| 112 | +[GitHub Repo](https://github.com/firecracker-microvm/firecracker-containerd) |
| 113 | + |
| 114 | +*Firecracker* launches a micro-VM that can also be made *containerd* compatible through a shim. |
| 115 | + |
| 116 | +### Caveats |
| 117 | + |
| 118 | +- Unclear if *systemd* workloads work |
| 119 | +- Unclear if *docker-in-docker* workloads work |
| 120 | +- Unclear if it can load other kernels or kernel modules |
| 121 | + |
| 122 | +**Note**: Currently in alpha and not officially supported. Ping us if you want to help develop this feature into a stable version :)
| 123 | + |
| 124 | +## Docker Rootless |
| 125 | + |
| 126 | +*Docker Rootless* is the default runtime configuration officially endorsed by GMT. It uses the *runc* runtime that ships with *docker*.
| 127 | + |
| 128 | +Making containers rootless comes with some trade-offs: |
| 129 | + |
| 130 | +### Pros |
| 131 | + |
| 132 | +- Higher security. If a container is escaped, no true root on the host is possible
| 133 | +- No *bridges* or *nftables* rules are created that might pollute host networking rules
| 134 | + |
| 135 | +### Cons |
| 136 | + |
| 137 | +- Docker networking is done completely in user space via *slirp4netns* and is thus very inefficient
| 138 | +- *slirp4netns* is another tool whose configuration must be learned in order to create custom networking rules for docker containers
| 139 | + |
| 140 | +## More runtimes? |
| 141 | + |
| 142 | +Technically more runtimes can be supported as long as they are *containerd* compatible. |
| 143 | + |
| 144 | +This requirement comes from the fact that many native *docker* functionalities are used inside of GMT:
| 145 | + |
| 146 | +- `docker exec` |
| 147 | +- `docker logs` |
| 148 | +- `docker network` |
| 149 | +- `docker run` |
| 150 | +- `docker images` |
| 151 | +- etc. |
| 152 | + |
| 153 | +## Security |
| 154 | + |
| 155 | +When setting your system up with alternative runtimes that need a *docker* root daemon you might want to lock out the default runtimes that ship with *docker*. Typically these are: |
| 156 | + |
| 157 | +- *runc* |
| 158 | +- *io.containerd.runc.v2* |
| 159 | + |
| 160 | +But double-check with `docker info`.
| 161 | + |
| 162 | +### Disable runc
| 163 | + |
| 164 | +The easiest way to disable `runc` is to introduce an *AppArmor* or *SELinux* rule. Since GMT favors *Ubuntu/Debian* here is an example for *AppArmor*. |
| 165 | + |
| 166 | +First check where your `runc` binary is with `$ realpath $(which runc)`. The typical location is `/usr/local/bin/runc`. |
| 167 | + |
| 168 | +Then create a file at `/etc/apparmor.d/runc`: |
| 169 | + |
| 170 | +```AppArmor |
| 171 | +# Block execution of runc |
| 172 | +profile runc-deny /usr/local/bin/runc { |
| 173 | + /usr/local/bin/runc ix, |
| 174 | + deny /** mrwklx, |
| 175 | +} |
| 176 | +``` |
| 177 | + |
| 178 | +Test with `runc` or `docker run --rm -it --runtime runc ubuntu bash`. It should fail. |
| 179 | +Also `docker run --rm -it --runtime io.containerd.runc.v2 ubuntu bash` should fail. |