feat: Support running createContainer hooks in CDI spec#13162
copybara-service[bot] wants to merge 1 commit into
Description
------------
This commit adds the ability for gVisor to run the createContainer hooks in a CDI spec. This is needed to support NVIDIA's k8s-device-plugin running with `DEVICE_LIST_STRATEGY=cdi-cri`. In this mode, the plugin creates a CDI spec file at `/var/run/cdi/[...].json` that contains instructions on how to mount GPU devices, which client libraries to bind-mount into the container, and which `nvidia-ctk` hooks need to be run.
While the character device (cdev) and client library injection mechanisms already worked with gVisor, the createContainer hooks that created the library symlinks (e.g. `/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1`) and updated the ldconfig cache (`nvidia-ctk hook update-ldcache`) were missing. This meant that processes inside the container could not resolve the client libraries and thus did not know how to communicate with the `/dev/nvidiactl` and `/dev/nvidia${n}` cdevs. The CDI spec file contains the instructions for these steps, so gVisor now follows it.
gVisor previously solved this problem with the `nvidia-container-cli configure` command. That did largely the same things the CDI spec file instructs us to do, but it is a legacy path that does not use CDI at all.
How it Works
------------
In gofer_mount.go, the code now explicitly distinguishes between the
containerRootFs (usually under /var/lib/.../root) and the goferRootFs
(/proc/fs). The issue with nvidia-ctk hooks is that they would
pivot_root(2) into the containerRootFs while gVisor operated under the
goferRootFs, so nvidia-ctk did not see any CDI devices mounted into the
containerRootFs.
This commit changes gVisor such that all device and library setup is done
under the containerRootFs. We then bind-mount containerRootFs into
goferRootFs after running the createContainer hooks. The gofer
pivot_root(2)s into the goferRootFs as before.
Note that createContainer hooks are only run if the underlying rootfs is
writable. There are many scenarios, such as when using EROFS, where
createContainer hooks can't be executed. Solving that is left for future
work.
Result
-------
I ran this on an H200 system and confirmed both nvidia-smi:
```
root@debug-pod:/# nvidia-smi
Tue Apr 28 19:34:25 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H200 Off | N/A Off | 0 |
| N/A 27C P0 76W / 700W | 0MiB / 143771MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
root@debug-pod:/#
```
And CUDA vectoradd:
```
lclipp@CW-HP216DG9DT-L gvisor % k logs cuda-vectoradd-kata-gvisor
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
both work.
This supersedes #13024 because this method does not touch the legacy hook-based NVIDIA_DEVICES method. This PR makes gVisor fully compatible with the NVIDIA k8s-device-plugin when using it in cdi-cri mode.
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13034 from LandonTClipp:k8s-device-plugin-support ae18a84
PiperOrigin-RevId: 914666567