feat: Support running createContainer hooks in CDI spec
Description
------------
This commit adds the ability for gVisor to run createContainer hooks in
the CDI spec. This is needed to support NVIDIA's k8s-device-plugin
running in `DEVICE_LIST_STRATEGY=cdi-cri`. In this mode, the plugin
creates a CDI spec file at `/var/run/cdi/[...].json` that contains
instructions on how to mount GPU devices, which client libraries to
bind-mount into the container, and which `nvidia-ctk` hooks need to be
run.
While the device cdev and client library injection mechanism already
worked with gVisor, the createContainer hooks that created the library
symlinks (e.g. `/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1`)
and updated the ldconfig cache (`nvidia-ctk hook update-ldcache`) were
missing. This meant that processes inside the container could not
resolve the client libraries and thus did not know how to communicate
with the `/dev/nvidiactl` and `/dev/nvidia${n}` cdevs. The CDI spec file
contains the instructions on how to do this, so now gVisor follows it.
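As a rough illustration of what "following the CDI spec" involves, the sketch below parses a CDI spec and collects its createContainer hooks. The struct types, the `createContainerHooks` helper, and the sample JSON (including the `/usr/bin/nvidia-ctk` path and the placeholder `createRuntime` entry) are assumptions made for this example, not the code or the spec file from this commit. The field names follow the CDI `containerEdits.hooks` layout. For brevity it gathers hooks from every device, whereas a real runtime would only apply edits for the devices actually requested.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hook mirrors one entry of containerEdits.hooks in a CDI spec.
type Hook struct {
	HookName string   `json:"hookName"`
	Path     string   `json:"path"`
	Args     []string `json:"args,omitempty"`
}

// ContainerEdits holds only the part of the CDI edits this sketch needs.
type ContainerEdits struct {
	Hooks []Hook `json:"hooks,omitempty"`
}

// Device is a single named device with its own edits.
type Device struct {
	Name           string         `json:"name"`
	ContainerEdits ContainerEdits `json:"containerEdits"`
}

// Spec is a trimmed-down view of a CDI spec file.
type Spec struct {
	Kind           string         `json:"kind"`
	Devices        []Device       `json:"devices"`
	ContainerEdits ContainerEdits `json:"containerEdits"`
}

// createContainerHooks (hypothetical helper) returns the spec-level hooks
// followed by the device-level hooks whose hookName is "createContainer".
func createContainerHooks(data []byte) ([]Hook, error) {
	var spec Spec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, fmt.Errorf("parsing CDI spec: %w", err)
	}
	var hooks []Hook
	collect := func(edits ContainerEdits) {
		for _, h := range edits.Hooks {
			if h.HookName == "createContainer" {
				hooks = append(hooks, h)
			}
		}
	}
	collect(spec.ContainerEdits)
	for _, d := range spec.Devices {
		collect(d.ContainerEdits)
	}
	return hooks, nil
}

// sampleSpec is illustrative only; paths and values are assumptions.
const sampleSpec = `{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {"name": "0", "containerEdits": {}}
  ],
  "containerEdits": {
    "hooks": [
      {
        "hookName": "createContainer",
        "path": "/usr/bin/nvidia-ctk",
        "args": ["nvidia-ctk", "hook", "update-ldcache"]
      },
      {
        "hookName": "createRuntime",
        "path": "/usr/bin/true",
        "args": ["true"]
      }
    ]
  }
}`

func main() {
	hooks, err := createContainerHooks([]byte(sampleSpec))
	if err != nil {
		panic(err)
	}
	for _, h := range hooks {
		fmt.Println(h.Path, h.Args)
	}
}
```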
gVisor previously solved this problem by running the `nvidia-container-cli
configure` command, which does largely the same things that the CDI spec
file instructs us to do. However, it is a legacy path and does not use
CDI at all.
How it Works
------------
In gofer_mount.go, the code now explicitly distinguishes the
containerRootFs (usually under /var/lib/.../root) from the goferRootFs
(/proc/fs). The issue with nvidia-ctk hooks was that they would
pivot_root(2) into the containerRootFs while gVisor operated under the
goferRootFs. This meant that nvidia-ctk did not see any CDI devices in
the containerRootFs, because gVisor had set them up under the
goferRootFs.
This commit changes gVisor so that all device mounting and setup is done
under the containerRootFs. We then bind-mount the containerRootFs into
the goferRootFs after running the createContainer hooks. The gofer
pivot_roots into the goferRootFs as before.
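The ordering described above can be sketched as follows. This is a minimal illustration, not the code in gofer_mount.go: the `rootfsOps` interface, `prepareGoferRoot`, and the `recorder` type are hypothetical names, and the recorder only logs steps instead of issuing real mount(2) or pivot_root(2) calls. The `/var/lib/example/root` path is a placeholder.

```go
package main

import "fmt"

// rootfsOps is a hypothetical abstraction over the privileged operations
// involved, so the ordering can be shown without real mounts.
type rootfsOps interface {
	SetupDevices(root string) error
	RunCreateContainerHooks(root string) error
	BindMount(src, dst string) error
	PivotRoot(newRoot string) error
}

// prepareGoferRoot applies the steps in the order the commit describes:
// set everything up under containerRootFs, run the hooks there, bind-mount
// the result into goferRootFs, then pivot_root into goferRootFs.
func prepareGoferRoot(ops rootfsOps, containerRootFs, goferRootFs string) error {
	if err := ops.SetupDevices(containerRootFs); err != nil {
		return fmt.Errorf("setting up devices: %w", err)
	}
	if err := ops.RunCreateContainerHooks(containerRootFs); err != nil {
		return fmt.Errorf("running createContainer hooks: %w", err)
	}
	if err := ops.BindMount(containerRootFs, goferRootFs); err != nil {
		return fmt.Errorf("bind-mounting rootfs: %w", err)
	}
	if err := ops.PivotRoot(goferRootFs); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	return nil
}

// recorder implements rootfsOps by recording each step it is asked to do.
type recorder struct{ steps []string }

func (r *recorder) SetupDevices(root string) error {
	r.steps = append(r.steps, "setup "+root)
	return nil
}

func (r *recorder) RunCreateContainerHooks(root string) error {
	r.steps = append(r.steps, "hooks "+root)
	return nil
}

func (r *recorder) BindMount(src, dst string) error {
	r.steps = append(r.steps, "bind "+src+" -> "+dst)
	return nil
}

func (r *recorder) PivotRoot(newRoot string) error {
	r.steps = append(r.steps, "pivot_root "+newRoot)
	return nil
}

func main() {
	r := &recorder{}
	if err := prepareGoferRoot(r, "/var/lib/example/root", "/proc/fs"); err != nil {
		panic(err)
	}
	for _, s := range r.steps {
		fmt.Println(s)
	}
}
```

Because the hooks run before the bind mount, they observe the same tree that later becomes visible under the goferRootFs.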
Note that createContainer hooks are only run if the underlying rootfs is
writable. There are many scenarios, such as when using EROFS, where
createContainer hooks can't be executed. Supporting those cases is left
for future work.
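One way such a gate could look is sketched below. How gVisor actually decides that the rootfs is writable is not stated here, so the probe (creating and removing a temporary file) and the names `rootfsWritable` and `shouldRunCreateContainerHooks` are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"os"
)

// rootfsWritable is a hypothetical probe: it reports whether a file can be
// created directly under root. The probe file is removed again on success.
func rootfsWritable(root string) bool {
	if root == "" {
		// os.CreateTemp would fall back to the default temp dir; reject instead.
		return false
	}
	f, err := os.CreateTemp(root, ".writable-probe-*")
	if err != nil {
		return false
	}
	name := f.Name()
	f.Close()
	os.Remove(name)
	return true
}

// shouldRunCreateContainerHooks (hypothetical) skips the hooks when there are
// none to run or when the rootfs cannot be written to (e.g. a read-only image).
func shouldRunCreateContainerHooks(root string, numHooks int) bool {
	return numHooks > 0 && rootfsWritable(root)
}

func main() {
	dir := os.TempDir()
	fmt.Printf("run hooks under %s: %v\n", dir, shouldRunCreateContainerHooks(dir, 1))
}
```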
Signed-off-by: LandonTClipp <lclipp@coreweave.com>