
GPU compute and Podman #8666

Closed

sirredbeard opened this issue Jul 30, 2022 · 2 comments

sirredbeard (Contributor) commented Jul 30, 2022

Docker Desktop currently supports GPU compute on WSL.

I am trying to use GPU compute with Podman, a drop-in Docker replacement.

The first issue I had was that nvidia-smi could not even find the NVIDIA drivers; this is now solved.

The second issue, detailed below, is that despite nvidia-smi working and detecting the GPU, other applications cannot see it.

Issue 1: nvidia-smi unable to locate libnvidia-ml.so: SOLVED

Process:

podman run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'export PATH=$PATH:/usr/lib/wsl/lib ; /usr/lib/wsl/lib/nvidia-smi'

nvidia-smi executes, but it looks for libnvidia-ml.so, which isn't present; it doesn't seem to be auto-mounted the way the other NVIDIA driver libraries are in WSL.

nvidia-smi complains:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.
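Worth noting while debugging this: PATH only affects where the shell finds executables, while the dynamic loader resolves libnvidia-ml.so through ld.so.cache and LD_LIBRARY_PATH. A quick check that the library is actually present under the bind mount (a sketch, using the same mount as the command above):

# Inside the container: list the NVIDIA libraries that WSL exposes under the bind mount
ls -l /usr/lib/wsl/lib/ | grep -iE 'libnvidia-ml|libcuda'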

The Microsoft virtual GPU device does appear to be successfully passed through to the Podman container with --gpus all set; compare:

lspci | grep 3D

on a plain WSL instance with the following inside a Podman container:

$ podman run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'apt update ; apt install pciutils -y ; lspci | grep 3D'

The blocker was nvidia-smi looking for libnvidia-ml.so, even though it doesn't seem to need it to run in plain WSL. Comparing straces of nvidia-smi inside Podman, inside Docker, and in plain WSL, some interesting patterns appear.

nvidia-smi in both Podman and Docker seems to search around for libpthread.so.0 in different places.

But then in Docker, at strace log line 115, nvidia-smi jumps straight to /lib/x86_64-linux-gnu/libnvidia-ml.so.1, whereas in Podman it keeps hunting for libpthread.so.0.
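For anyone wanting to reproduce the comparison, traces along these lines should surface the same pattern (a sketch; inside a container, strace may additionally need --cap-add=SYS_PTRACE depending on the seccomp/capability defaults):

# Log only file-open attempts so the library search path is easy to follow;
# -f follows any child processes nvidia-smi may spawn
strace -f -e trace=openat /usr/lib/wsl/lib/nvidia-smi 2>&1 | grep -E 'libnvidia-ml|libpthread'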

Installing the non-WSL libnvidia-compute-* or nvidia-cuda-dev packages from the Ubuntu archives did not work; nvidia-smi complains: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running", as those are not the WSL-aware NVIDIA drivers.

Installing the Ubuntu WSL-aware drivers directly from NVIDIA did not seem to make a difference either; the only libnvidia-ml.so in the container is the CUDA toolkit's link-time stub, which is not a functional driver library:

root@d28dc30f8410:/# find . -name "libnvidia-ml.so"
./usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
find: './proc/tty/driver': Permission denied
root@d28dc30f8410:/# echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/
root@d28dc30f8410:/# ./usr/lib/wsl/lib/nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Mounting /lib/x86_64-linux-gnu/ from the host into the Podman container causes issues with dynamic libraries, as one would expect.

I took and compared the following strace logs:
nvidia-smi_strace_outside_container.log
nvidia-smi_strace_inside_docker_container.log
nvidia-smi_strace_inside_podman_container.txt
nvidia-smi_strace_inside_podman_container_with_ubuntuwslnvidiadrivers.log

Issue 1: Fixed by symlinking /usr/lib/wsl/lib/* into /usr/lib/x86_64-linux-gnu
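An alternative to the per-run symlinks, assuming the same /usr/lib/wsl bind mount, is to point the dynamic loader at that directory directly. I haven't tested this exhaustively, but something like the following should let nvidia-smi resolve libnvidia-ml.so.1 without touching /usr/lib/x86_64-linux-gnu:

# Point the loader at the mounted WSL driver libraries instead of creating symlinks
podman run -t --device=/dev/dxg --gpus all \
  --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
  -e LD_LIBRARY_PATH=/usr/lib/wsl/lib \
  nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 /usr/lib/wsl/lib/nvidia-smi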

Issue 2: nvidia-smi can see GPU, but GPU-aware tools cannot see GPU

nvidia-smi can see my GPU on Podman:

podman run -t --device=/dev/dxg --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'


Same as Docker:

docker run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'export PATH=$PATH:/usr/lib/wsl/lib ; /usr/lib/wsl/lib/nvidia-smi'


But benchmarking fails on Podman:

podman run -t --device=/dev/dxg --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/k8s/cuda-sample:nbody bash -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nbody -gpu -benchmark'
Error: only 0 Devices available, 1 requested.  Exiting.

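To narrow down whether this is a missing device node or a missing library inside the Podman container, a quick diagnostic sketch (not a fix):

# Confirm the paravirtual GPU device made it into the container and check which
# CUDA/NVML libraries the dynamic loader can currently resolve
podman run -t --device=/dev/dxg --gpus all \
  --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
  nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c \
  'ls -l /dev/dxg; ldconfig -p | grep -E "libcuda|libdxcore|libnvidia-ml"'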

It fails on Docker too, but only because it can't open a display; it does find the GPU:


> 1 Devices used for simulation

I can reproduce this with the Determined AI agent, which does not pick up the GPU on Podman but does on Docker.

@glaudiston commented

This is still an issue for me using WSL on Windows 11 and Podman 3.4.4. Why was this closed? How did you get it working?

@sirredbeard (Contributor, Author) commented

@glaudiston The issue is that Podman does not support the NVIDIA Container Runtime by default, though it is possible to add it.
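For later readers: with the NVIDIA Container Toolkit installed inside the WSL distro, one way to add that support to Podman is via CDI (a sketch, assuming nvidia-container-toolkit 1.12+ and a Podman version with CDI support; older toolkit versions instead shipped an OCI hook that Podman picks up from /usr/share/containers/oci/hooks.d):

# Generate a CDI spec describing the GPU (run once in the WSL distro, as root)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Then request the CDI device instead of using --gpus
podman run --rm --device nvidia.com/gpu=all \
  nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark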
