Docker Desktop currently supports GPU compute on WSL.
I am trying to use GPU compute with Podman, a drop-in Docker replacement.
The first issue I had was nvidia-smi not finding the Nvidia drivers at all; this is now solved.
The second issue, detailed below, is that although nvidia-smi works and detects the GPU, other applications cannot see it.
Issue 1: nvidia-smi unable to locate libnvidia-ml.so: SOLVED
Process:
podman run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'export PATH=$PATH:/usr/lib/wsl/lib ; /usr/lib/wsl/lib/nvidia-smi'
nvidia-smi executes, but it looks for libnvidia-ml.so, which isn't present; it doesn't seem to be auto-mounted into WSL like the other Nvidia driver libraries.
nvidia-smi complains:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.
The Microsoft virtual GPU device does appear to be passed through successfully to the Podman container with --gpus all set; compare:
lspci | grep 3D
on a plain WSL instance against the same command run inside a Podman container.
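A rough sketch of how that comparison can be run (the pciutils install step inside the container is my assumption, since the CUDA base image does not ship lspci):

# On the plain WSL instance:
lspci | grep 3D

# Inside a Podman container (pciutils installed on the fly; assumption, not from the original report):
podman run --rm --gpus all nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 \
    bash -c 'apt-get update -qq && apt-get install -y -qq pciutils && lspci | grep 3D'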
The blocker was nvidia-smi looking for libnvidia-ml.so, even though it doesn't seem to need it to run in plain WSL. Comparing the straces of nvidia-smi inside Podman, inside Docker, and in plain WSL shows some interesting patterns.
In both Podman and Docker, nvidia-smi searches a number of locations for libpthread.so.0.
But then in Docker, at strace log line 115, nvidia-smi jumps straight to /lib/x86_64-linux-gnu/libnvidia-ml.so.1, whereas in Podman it keeps hunting for libpthread.so.0.
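For reference, a minimal sketch of how such straces can be captured and compared (the exact flags and output filename are my choice, not necessarily what was used for the attached logs):

# Trace nvidia-smi, following child processes, once per environment:
strace -f -o nvidia-smi.strace.log /usr/lib/wsl/lib/nvidia-smi

# Then compare how the dynamic loader hunts for the libraries of interest:
grep -E 'libpthread\.so\.0|libnvidia-ml\.so' nvidia-smi.strace.log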
Installing the non-WSL libnvidia-compute-* or nvidia-cuda-dev packages from the Ubuntu archives did not work; nvidia-smi complains "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running", as those are not the WSL-aware NVIDIA drivers.
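For context, that failed attempt looked roughly like this inside the container (package names per the above; the specific driver version number is my assumption):

# Pulls the regular (non-WSL) userspace driver libraries, which cannot talk to
# the WSL kernel driver, hence the error quoted above.
apt-get update && apt-get install -y nvidia-cuda-dev libnvidia-compute-470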
root@d28dc30f8410:/# find . -name "libnvidia-ml.so"
./usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
find: './proc/tty/driver': Permission denied
root@d28dc30f8410:/# echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/
root@d28dc30f8410:/# ./usr/lib/wsl/lib/nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Mounting /lib/x86_64-linux-gnu/ from the host into the Podman container causes issues with dynamic libraries, as is to be expected.
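That attempt was along these lines (invocation reconstructed as an illustration, not copied from the report):

# Overlaying the host's glibc and friends on top of the container's own copies
# is what breaks dynamic linking inside the container.
podman run --gpus all \
    --mount type=bind,source=/lib/x86_64-linux-gnu,target=/lib/x86_64-linux-gnu \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 /usr/lib/wsl/lib/nvidia-smi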
Installing the Ubuntu WSL-aware drivers directly from Nvidia did not seem to make a difference either.
I took and compared strace logs:
nvidia-smi_strace_outside_container.log
nvidia-smi_strace_inside_docker_container.log
nvidia-smi_strace_inside_podman_container.txt
nvidia-smi_strace_inside_podman_container_with_ubuntuwslnvidiadrivers.log
Issue 1: Fixed by symlinking /usr/lib/wsl/lib/* into /usr/lib/x86_64-linux-gnu
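A minimal sketch of that fix as run inside the container (the ldconfig refresh is my addition; the link source and target directories are as stated above):

# Make the WSL-provided driver libraries (libnvidia-ml.so.1, libcuda.so.1, ...)
# visible in the standard library directory that nvidia-smi searches.
ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/
ldconfig                        # refresh the dynamic linker cache (may not be strictly required)
/usr/lib/wsl/lib/nvidia-smi     # should now find libnvidia-ml.so.1 and report the GPU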
Issue 2: nvidia-smi can see GPU, but GPU-aware tools cannot see GPU
nvidia-smi can see my GPU on Podman:
Same as Docker:
But benchmarking fails on Podman:
It fails on Docker too, but only because it can't open the display; it does find the GPU:
I can reproduce this with the Determined AI agent, which does not pick up the GPU on Podman, but does on Docker.
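A simple way to reproduce the gap between nvidia-smi and CUDA-aware applications (the PyTorch image and check are my stand-in for the Determined agent, not what was used in the report):

# Prints whether the CUDA runtime can see the GPU; per the report this kind of
# check fails on Podman but succeeds on Docker.
podman run --rm --gpus all \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    docker.io/pytorch/pytorch:latest \
    python -c 'import torch; print("CUDA available:", torch.cuda.is_available())'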