feat: Support running createContainer hooks in CDI spec #13162

Open
copybara-service[bot] wants to merge 1 commit into master from test/cl914666567

Conversation

@copybara-service copybara-service Bot commented May 13, 2026

feat: Support running createContainer hooks in CDI spec

Description

This commit adds the ability for gVisor to run the createContainer hooks defined in a CDI spec. This is needed to support NVIDIA's k8s-device-plugin running with `DEVICE_LIST_STRATEGY=cdi-cri`. In this mode, the plugin creates a CDI spec file at `/var/run/cdi/[...].json` that describes how to mount GPU devices, which client libraries to bind-mount into the container, and which `nvidia-ctk` hooks need to be run.

While injecting the device cdevs and client libraries already worked with gVisor, the createContainer hooks that create the library symlinks (e.g. `/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1`) and update the ldconfig cache (`nvidia-ctk hook update-ldcache`) were missing. This meant that processes inside the container could not resolve the client libraries and thus did not know how to communicate with the `/dev/nvidiactl` and `/dev/nvidia${n}` cdevs. The CDI spec file contains the instructions for all of this, so gVisor now follows it.
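
For illustration, here is a heavily trimmed sketch of what such a CDI spec can look like. The device names, library paths, driver version, and hook arguments are made up for this example rather than copied from a real k8s-device-plugin-generated file:

```
{
  "cdiVersion": "0.6.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "0",
      "containerEdits": {
        "deviceNodes": [{ "path": "/dev/nvidia0" }]
      }
    }
  ],
  "containerEdits": {
    "deviceNodes": [{ "path": "/dev/nvidiactl" }],
    "mounts": [
      {
        "hostPath": "/usr/lib/x86_64-linux-gnu/libcuda.so.580.126.20",
        "containerPath": "/usr/lib/x86_64-linux-gnu/libcuda.so.580.126.20",
        "options": ["ro", "nosuid", "nodev", "bind"]
      }
    ],
    "hooks": [
      {
        "hookName": "createContainer",
        "path": "/usr/bin/nvidia-ctk",
        "args": ["nvidia-ctk", "hook", "create-symlinks",
                 "--link", "libcuda.so.1::/usr/lib/x86_64-linux-gnu/libcuda.so"]
      },
      {
        "hookName": "createContainer",
        "path": "/usr/bin/nvidia-ctk",
        "args": ["nvidia-ctk", "hook", "update-ldcache",
                 "--folder", "/usr/lib/x86_64-linux-gnu"]
      }
    ]
  }
}
```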

gVisor previously solved this problem with the `nvidia-container-cli configure` command. That largely did the same things the CDI spec file instructs us to do, but it is a legacy path and does not use CDI at all.

How it Works

In `gofer_mount.go`, the code now explicitly distinguishes between the `containerRootFs` (usually under `/var/lib/.../root`) and the `goferRootFs` (`/proc/fs`). The issue with the nvidia-ctk hooks is that they pivot_root(2) into the `containerRootFs` while gVisor operated under the `goferRootFs`, so nvidia-ctk did not see any CDI devices mounted into the `containerRootFs`.

This commit changes gVisor so that all device and library setup is done under the `containerRootFs`. We then bind-mount the `containerRootFs` into the `goferRootFs` after running the createContainer hooks, and the gofer pivot_roots into the `goferRootFs` as before.
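
A minimal sketch of that ordering, assuming hypothetical names (`setupRootfs`, the `hook` struct) and illustrative paths; this is not the actual `gofer_mount.go` code:

```
// Sketch only: hypothetical helpers and paths, not the gVisor implementation.
package main

import (
	"fmt"
	"log"
	"os/exec"

	"golang.org/x/sys/unix"
)

// hook mirrors the fields of a CDI createContainer hook that matter here.
type hook struct {
	Path string   // e.g. /usr/bin/nvidia-ctk
	Args []string // argv, including argv[0]
}

// setupRootfs prepares the container root first, then exposes it to the gofer.
func setupRootfs(containerRootfs, goferRootfs string, hooks []hook) error {
	// 1. CDI device nodes and client-library bind mounts are created under
	//    containerRootfs (elided here).

	// 2. Run the createContainer hooks. They resolve paths inside the
	//    container rootfs, so the edits from step 1 must already be visible
	//    there.
	for _, h := range hooks {
		cmd := exec.Command(h.Path)
		cmd.Args = h.Args // CDI/OCI hook args carry argv[0] explicitly.
		if out, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("hook %s: %v: %s", h.Path, err, out)
		}
	}

	// 3. Only after the hooks have run, bind-mount the prepared container
	//    root over the gofer root, which the gofer then pivot_roots into.
	if err := unix.Mount(containerRootfs, goferRootfs, "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
		return fmt.Errorf("bind mount %s -> %s: %w", containerRootfs, goferRootfs, err)
	}
	return nil
}

func main() {
	err := setupRootfs(
		"/var/lib/containerd/.../rootfs", // illustrative containerRootFs
		"/proc/fs",                       // goferRootFs
		[]hook{{
			Path: "/usr/bin/nvidia-ctk",
			Args: []string{"nvidia-ctk", "hook", "update-ldcache"},
		}},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```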

Note that createContainer hooks are only run if the underlying rootfs is writable. There are many scenarios, such as when using EROFS, where createContainer hooks can't be executed; solving that is left for another day.
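
One possible way to make that writability decision, again only as a hedged sketch (gVisor may decide this differently, e.g. from mount options):

```
// Sketch: report whether a rootfs is mounted read-only (as with an EROFS
// image); createContainer hooks would be skipped in that case.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func rootfsWritable(path string) (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return false, err
	}
	return st.Flags&unix.ST_RDONLY == 0, nil
}

func main() {
	ok, err := rootfsWritable("/var/lib/containerd/.../rootfs") // illustrative path
	fmt.Println("rootfs writable:", ok, err)
}
```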

Result

I ran this on an H200 system and confirmed both nvidia-smi:

```
root@debug-pod:/# nvidia-smi
Tue Apr 28 19:34:25 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20             Driver Version: 580.126.20     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    Off |   N/A              Off |                    0 |
| N/A   27C    P0             76W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@debug-pod:/#
```

And CUDA vectoradd:

```
lclipp@CW-HP216DG9DT-L gvisor % k logs cuda-vectoradd-kata-gvisor
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

both work.

This supersedes #13024 because this approach does not touch the legacy hook-based `NVIDIA_DEVICES` method. This PR makes gVisor fully compatible with the NVIDIA k8s-device-plugin when using it in `cdi-cri` mode.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#13034 from LandonTClipp:k8s-device-plugin-support ae18a84

@copybara-service copybara-service Bot added the exported (Issue was exported automatically) label on May 13, 2026

google-cla Bot commented May 13, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@copybara-service copybara-service Bot force-pushed the test/cl914666567 branch 5 times, most recently from 3407471 to 59765f4 on May 13, 2026 20:08