docker run & docker-compose run fail to share nvidia-gpu capabilities #47424
Labels
kind/bug
status/0-triage
Description
I've been trying to share an NVIDIA GPU (for CUDA/compute) with a Docker container, as described in:
I'm using Ubuntu 22.04 and have installed a recent Docker release along with nvidia-docker2 and nvidia-container-toolkit:
Reproduce
docker run --rm -ti --gpus all --entrypoint nvidia-smi nvidia/cuda:12.3.1-runtime-ubuntu22.04
Failed to initialize NVML: Unknown Error
Inside the container, ls -lah /dev/nvidia* does list the NVIDIA device nodes.
Running the same container via docker-compose.yml results in the same problem:
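For comparison, a minimal compose file using the GPU reservation syntax from the Docker Compose documentation, written to mirror the docker run command above, would look roughly like the following. This is a sketch, not the exact file used in this report; the service name gpu-test is made up for illustration.

```yaml
services:
  gpu-test:                      # hypothetical service name
    image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
    entrypoint: nvidia-smi       # same override as --entrypoint nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all         # equivalent of --gpus all
              capabilities: [gpu]
```

Here count: all plays the role of --gpus all, and entrypoint overrides the image's entrypoint the same way the --entrypoint flag does.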
Expected behavior
NVIDIA GPU/compute sharing should work as documented in the Docker docs.
docker version
Client: Docker Engine - Community
 Version: 25.0.2
 API version: 1.44
 Go version: go1.21.6
 Git commit: 29cf629
 Built: Thu Feb 1 00:23:03 2024
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 25.0.2
  API version: 1.44 (minimum version 1.24)
  Go version: go1.21.6
  Git commit: fce6e0c
  Built: Thu Feb 1 00:23:03 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.28
  GitCommit: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version: 1.1.12
  GitCommit: v1.1.12-0-g51d5e94
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
docker info
Additional Info
nvidia-container-cli --load-kmods info
[sudo] password for gabriel:
NVRM version: 535.146.02
CUDA version: 12.2
Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce RTX 4060 Laptop GPU
Brand: GeForce
GPU UUID: GPU-46eb0b05-a309-1169-2f8d-e076379b85a3
Bus Location: 00000000:01:00.0
Architecture: 8.9