Summary
When running codex exec --sandbox workspace-write (or read-only) on Linux, any subprocess that attempts to use a CUDA GPU fails with RuntimeError: Found no NVIDIA driver on your system. The bubblewrap-based sandbox does not bind-mount /dev/nvidia* or /dev/nvidiactl into the sandbox namespace, so PyTorch / TensorFlow / any GPU-using tool sees no driver.
Reproduction
# Confirm GPU works outside the sandbox
$ python3 -c "import torch; print(torch.cuda.is_available())"
True
# Same command via Codex with workspace-write
$ codex exec --sandbox workspace-write \
"python3 -c 'import torch; print(torch.cuda.is_available())'"
# Output:
RuntimeError: Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
The full traceback originates from torch.cuda._lazy_init calling torch._C._cuda_init, which fails because /dev/nvidiactl is not accessible in the sandbox namespace.
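To confirm the diagnosis without involving PyTorch, one option is to list which device nodes are visible, then run the same script outside the sandbox and again through codex exec. This is a minimal sketch: the helper name `visible_device_nodes` and its parameters are hypothetical and not part of Codex or PyTorch. On WSL2 in particular, GPU access may go through a different node such as /dev/dxg, so the pattern is left as a parameter.

```python
import glob
import os


def visible_device_nodes(pattern="nvidia*", dev_dir="/dev"):
    """Return the sorted paths under dev_dir whose names match pattern.

    Hypothetical diagnostic helper; it only checks that the paths exist,
    not that the current process is allowed to open them.
    """
    return sorted(glob.glob(os.path.join(dev_dir, pattern)))


if __name__ == "__main__":
    nodes = visible_device_nodes()
    if nodes:
        print("visible NVIDIA device nodes:")
        for node in nodes:
            print("  " + node)
    else:
        print("no /dev/nvidia* nodes visible in this namespace")
```

If the list is non-empty outside the sandbox and empty inside it, the missing bind-mounts are the likely cause.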
Environment
- codex-cli 0.125.0-alpha.3+looper.7dc18d9c4
- WSL2 (6.6.87.2-microsoft-standard-WSL2) on Windows host
- NVIDIA RTX 3090, driver 580.x, CUDA 12.4
- bubblewrap available on PATH
Why this matters
Any agent task that needs to invoke CUDA workloads as a subprocess (training, inference, profiling, ML benchmarks) is blocked unless the user falls back to --sandbox danger-full-access, which removes ALL sandbox protections (network egress, filesystem isolation). This is a poor security trade-off for an otherwise-narrow requirement: the task only needs /dev/nvidia* access, not unrestricted filesystem write or network.
A concrete example: I'm building an autonomous research loop where a Codex agent designs surgery on a transformer model, runs an evaluation harness (which loads the model on GPU), and records results. The sandbox's network and filesystem isolation are exactly what I want — except the harness can't run because the sandbox blocks /dev/nvidiactl.
Suggested fix(es)
In ascending order of effort:
1. Auto-detect GPU and bind-mount when present. When Codex detects that /dev/nvidia* exists on the host, add --dev-bind /dev/nvidia0 /dev/nvidia0 (and likewise for the other nodes) to the bubblewrap invocation by default in non-read-only modes. Low risk: these device files are present only on systems with NVIDIA drivers, and exposing them doesn't grant network or filesystem-write access.
2. Add a gpu_passthrough config flag. Extend SandboxPolicy::WorkspaceWrite with a gpu_passthrough: bool field. When true, bind-mount /dev/nvidia* (including /dev/nvidiactl). Document the security trade-off: a malicious agent could run arbitrary GPU compute, consuming power and stressing the hardware, but cannot exfiltrate via network or write outside cwd.
3. Extend writable_roots to support device files. Allow paths like /dev/nvidia0 in writable_roots, and have Codex translate them to bubblewrap --dev-bind rather than --bind. This is opt-in by construction.
(1) is friendliest to the common case of "user has GPU, wants Codex to use it"; (2) requires explicit consent; (3) is the most generic.
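The auto-detect step in option (1) could look roughly like the following. This is an illustrative Python sketch only: Codex is written in Rust, the helper name `gpu_dev_bind_args` is hypothetical, and the real change would live wherever the bwrap argument list is built. The only bubblewrap feature it relies on is the existing `--dev-bind SRC DEST` option.

```python
import glob
import os


def gpu_dev_bind_args(dev_dir="/dev"):
    """Build bwrap --dev-bind arguments for each NVIDIA node under dev_dir.

    Hypothetical helper: nothing with this name exists in Codex today.
    Returns an empty list when no matching nodes are present, so hosts
    without NVIDIA drivers get an unchanged bwrap command line.
    """
    args = []
    for node in sorted(glob.glob(os.path.join(dev_dir, "nvidia*"))):
        # --dev-bind SRC DEST bind-mounts SRC at DEST and allows device access.
        args += ["--dev-bind", node, node]
    return args


if __name__ == "__main__":
    print(" ".join(gpu_dev_bind_args()) or "(no NVIDIA device nodes found)")
```

Option (2) would gate the same call behind the gpu_passthrough field instead of running it unconditionally.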
Workaround in the meantime
--sandbox danger-full-access works but defeats the purpose. I'm using it for now and documenting the limitation in my project's RFC as a blocking prerequisite for moving to production.
Related code paths
~/git/codex/codex-rs/sandboxing/src/bwrap.rs — bwrap invocation
~/git/codex/codex-rs/protocol/src/protocol.rs — SandboxPolicy::WorkspaceWrite definition (writable_roots, network_access, exclude_tmpdir_env_var, exclude_slash_tmp — no GPU/dev option today)
Happy to draft a PR if (1) is the preferred direction.