runsc: skip no-op CDI disable-device-node-modification hook under nvproxy #13308
Closed
a7i wants to merge 1 commit into
…roxy
NVIDIA's k8s-device-plugin and `nvidia-ctk cdi generate` emit a CDI
createContainer hook `nvidia-ctk hook disable-device-node-modification`,
which bind-mounts a modified /proc/driver/nvidia/params (ModifyDeviceFiles=0)
over the container's procfs so in-container libnvidia-ml does not try to
recreate /dev/nvidiaN device files.
Under nvproxy this hook is a no-op: gVisor already exposes
/proc/driver/nvidia/params inside the sandbox with ModifyDeviceFiles forced
to 0 (see pkg/sentry/devices/nvproxy/nvproxy.go's procDriverNvidiaParams
setup, consistent with libnvidia-container's src/nvc_mount.c:mount_procfs()),
and nvproxy mediates all device access regardless.
In addition, executing the hook is currently fatal during sandbox setup
because the hook opens <containerRootFs>/proc/driver/nvidia/params, but
procfs is not mounted into containerRootFs at hook time (the sentry serves
/proc itself later), so the open(2) fails with ENOENT and aborts container
creation:
    hooks.go:63] Executing hook nvidia-ctk hook disable-device-node-modification
    util.go:107] FATAL ERROR: error executing CreateContainer hooks
    stderr: failed to mount modified params file: open o_path procfd:
    open /run/containerd/.../rootfs/proc/driver/nvidia/params:
    no such file or directory
Filter the hook out before executing CreateContainer hooks when nvproxy is
enabled. This is functionally equivalent to making the hook succeed and is
simpler than google#13284 (no host-filesystem bind-mount into containerRootFs);
the approach was suggested by @ayushr2 on google#13284.
Fixes google#13283.
Fixes #13283. Alternative approach to #13284, per @ayushr2's suggestion on that PR.
What this does
NVIDIA's `k8s-device-plugin` and `nvidia-ctk cdi generate` emit a CDI `createContainer` hook, `nvidia-ctk hook disable-device-node-modification`, that bind-mounts a modified `/proc/driver/nvidia/params` (with `ModifyDeviceFiles=0`) over the container's procfs, so in-container `libnvidia-ml` does not try to recreate `/dev/nvidiaN` device files.
Under `nvproxy` this hook is a no-op for two reasons:
- `nvproxy` mediates all NVIDIA device access and the sentry owns `/dev`, so in-container `libnvidia-ml` cannot create new device files regardless of what `/proc/driver/nvidia/params` says.
- gVisor already exposes `/proc/driver/nvidia/params` inside the sandbox with `ModifyDeviceFiles` forced to 0; see `pkg/sentry/devices/nvproxy/nvproxy.go`, consistent with `libnvidia-container`'s `src/nvc_mount.c:mount_procfs()`.
In addition, executing the hook is currently fatal during sandbox setup: the hook opens `<containerRootFs>/proc/driver/nvidia/params`, but procfs is not mounted into containerRootFs at hook time (the sentry serves `/proc` itself later), so the `open(2)` fails with `ENOENT` and aborts container creation with the `FATAL ERROR` quoted in the commit message above.
This change filters that hook out of `spec.Hooks.CreateContainer` before invoking `specutils.ExecuteHooks` when `nvproxy` is enabled. The filter targets the specific argv shape (`nvidia-ctk hook disable-device-node-modification`) and is a no-op for any other hook.
Sequence
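The order of operations described above (filter first, then execute whatever hooks remain) might look roughly like the following sketch. It is not the PR's code: the `Hook` struct is a local stand-in for the OCI runtime-spec hook type, the function names are hypothetical, matching on `Args` rather than `Path` is an assumption, and the `nvproxyEnabled` flag stands in for `specutils.NVProxyEnabled(spec, conf)`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Hook is a local stand-in for the OCI runtime-spec hook type, reduced to
// the two fields this sketch needs.
type Hook struct {
	Path string
	Args []string
}

// isDisableDeviceNodeModificationHook reports whether h has the argv shape
// `nvidia-ctk hook disable-device-node-modification`. Matching on Args is an
// assumption of this sketch; the PR's actual predicate may differ.
func isDisableDeviceNodeModificationHook(h Hook) bool {
	if len(h.Args) < 3 {
		return false
	}
	return filepath.Base(h.Args[0]) == "nvidia-ctk" &&
		h.Args[1] == "hook" &&
		h.Args[2] == "disable-device-node-modification"
}

// filterNVProxyNoOpHooks returns the hooks that should still run under
// nvproxy, preserving their original order.
func filterNVProxyNoOpHooks(hooks []Hook) []Hook {
	var kept []Hook
	for _, h := range hooks {
		if isDisableDeviceNodeModificationHook(h) {
			continue // no-op under nvproxy, and fatal if executed
		}
		kept = append(kept, h)
	}
	return kept
}

func main() {
	hooks := []Hook{
		{Path: "/usr/bin/nvidia-ctk", Args: []string{"nvidia-ctk", "hook", "create-symlinks"}},
		{Path: "/usr/bin/nvidia-ctk", Args: []string{"nvidia-ctk", "hook", "disable-device-node-modification"}},
		{Path: "/usr/bin/nvidia-ctk", Args: []string{"nvidia-ctk", "hook", "update-ldcache"}},
	}

	// Stand-in for specutils.NVProxyEnabled(spec, conf).
	nvproxyEnabled := true
	if nvproxyEnabled {
		hooks = filterNVProxyNoOpHooks(hooks)
	}

	// In runsc the remaining hooks would be handed to the hook executor;
	// here they are only printed.
	for _, h := range hooks {
		fmt.Println(h.Args[2])
	}
}
```

Because the filter returns a new slice and never touches non-matching entries, hooks from other vendors pass through unchanged and in order.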
Why this instead of #13284
@ayushr2 pointed out on #13284 that the hook's only semantic effect is something `nvproxy` already does. Skipping is therefore functionally equivalent to making the hook succeed (as #13284 does by bind-mounting host `/proc/driver/nvidia` into containerRootFs), and is simpler:
- #13284 bind-mounts host `/proc/driver/nvidia` into the per-container rootfs (the sentry then shadows `/proc` again, so the hook's write is overwritten anyway), which requires `SetupNvidiaProcDriver` + `os.MkdirAll` + bind mount + EROFS handling.
- This PR only filters the hook list before `ExecuteHooks`; it needs no `MkdirAll` and no bind mount.
Tests
- `TestIsNvidiaDisableDeviceNodeModificationHook` covers the predicate (matches the specific `nvidia-ctk hook disable-device-node-modification` argv shape; rejects other subcommands, other binaries, short argv, and `nvidia-ctk`-prefix-but-not-suffix paths).
- `TestFilterNVProxyNoOpHooks` covers the filter behavior (nil/empty input, only-no-op input, no no-op present, no-op in various positions, interleaved with non-NVIDIA hooks).
- The filter lives in `SetupRootFS`'s existing `if rootfsConf.ShouldUseLisafs()` block, where it already calls `specutils.ExecuteHooks`; it is applied right before that call, gated on `specutils.NVProxyEnabled(spec, conf)`. The other GPU integration tests exercise the same path and continue to apply.
Verified end-to-end
Tested on a Tesla T4 host (Ubuntu 24.04, kernel 6.17, containerd 2.2.2, kubelet 1.35, `k8s-device-plugin` v0.17.0 in `DEVICE_LIST_STRATEGY=cdi-cri`, `runsc release-20260520.0` with this change applied as a one-off patch to a sandbox node). Before this change, the gofer log shows the `FATAL ERROR` above. With this change applied, the gofer logs three successful hook executions (`create-symlinks`, `enable-cuda-compat`, `update-ldcache`) and skips `disable-device-node-modification`, the sandbox starts, and a PyTorch CUDA pod reports `cuda available: True` end to end.
Risks
- Host exposure: #13284 bind-mounts host `/proc/driver/nvidia` into containerRootFs; this PR adds no new mounts.
- Scope: the filter is gated on `specutils.NVProxyEnabled(spec, conf)`.
- Matching: the filter targets only the specific `nvidia-ctk hook disable-device-node-modification` argv shape.
- Behavior change: under `nvproxy`, the hook is now silently skipped rather than aborting container creation, so the sandbox starts. This is functionally equivalent to the hook succeeding, because the sentry overrides `/proc/driver/nvidia/params` anyway.
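To illustrate how narrow the match is meant to be, here is a small table-driven sketch of the kinds of argv the predicate should accept and reject. The predicate `isNoOpHookArgv` and the case list are hypothetical; in particular, treating an absolute `/usr/bin/nvidia-ctk` as a match and `/usr/bin/nvidia-ctk-extra` as a non-match is one reading of the "prefix-but-not-suffix" case, not something the PR text confirms.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isNoOpHookArgv is a hypothetical argv-only predicate: the binary's base
// name must be exactly "nvidia-ctk", followed by the two subcommand words.
func isNoOpHookArgv(argv []string) bool {
	return len(argv) >= 3 &&
		filepath.Base(argv[0]) == "nvidia-ctk" &&
		argv[1] == "hook" &&
		argv[2] == "disable-device-node-modification"
}

type predicateCase struct {
	name string
	argv []string
	want bool
}

// cases lists illustrative inputs; none are taken from the PR's test file.
func cases() []predicateCase {
	const sub = "disable-device-node-modification"
	return []predicateCase{
		{"exact argv", []string{"nvidia-ctk", "hook", sub}, true},
		{"absolute path", []string{"/usr/bin/nvidia-ctk", "hook", sub}, true},
		{"other subcommand", []string{"nvidia-ctk", "hook", "update-ldcache"}, false},
		{"other binary", []string{"some-tool", "hook", sub}, false},
		{"short argv", []string{"nvidia-ctk", "hook"}, false},
		{"prefix only", []string{"/usr/bin/nvidia-ctk-extra", "hook", sub}, false},
	}
}

// failures returns the names of cases where the predicate disagrees with want.
func failures() []string {
	var failed []string
	for _, c := range cases() {
		if isNoOpHookArgv(c.argv) != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	if f := failures(); len(f) > 0 {
		fmt.Println("FAIL:", f)
		return
	}
	fmt.Println("ok")
}
```

Comparing the base name for equality, rather than checking a prefix or suffix, is what keeps similarly named binaries from being filtered by accident.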