
GPU compute and Podman #8666

Closed

sirredbeard opened this issue Jul 30, 2022 · 2 comments

sirredbeard (Contributor) commented Jul 30, 2022

Docker Desktop currently supports GPU compute on WSL.

I am trying to use GPU compute with Podman, a drop-in Docker replacement.

The first issue I had was that nvidia-smi could not even find the NVIDIA drivers; this is now solved.

The second issue, detailed below, is that despite nvidia-smi working and detecting the GPU, other applications cannot see it.

Issue 1: nvidia-smi unable to locate libnvidia-ml.so: SOLVED

Process:

podman run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'export PATH=$PATH:/usr/lib/wsl/lib ; /usr/lib/wsl/lib/nvidia-smi'

nvidia-smi executes, but it looks for libnvidia-ml.so, which isn't present; it doesn't seem to be auto-mounted the way the other NVIDIA driver libraries are in WSL.

nvidia-smi complains:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.
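Worth noting while debugging this: PATH only affects where the shell finds executables, while the dynamic loader resolves libnvidia-ml.so through ld.so.cache and LD_LIBRARY_PATH. A quick check that the library is actually present under the bind mount (a sketch, using the same mount as the command above):

# Inside the container: list the NVIDIA libraries that WSL exposes under the bind mount
ls -l /usr/lib/wsl/lib/ | grep -iE 'libnvidia-ml|libcuda'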

The Microsoft virtual GPU device does appear to be successfully passed through to the Podman container with --gpus all set; compare:

lspci | grep 3D

on a plain WSL instance with the following inside a Podman container:

$ podman run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'apt update ; apt install pciutils -y ; lspci | grep 3D'

The blocker was nvidia-smi looking for libnvidia-ml.so, even though it doesn't seem to need it to run in plain WSL. Comparing straces of nvidia-smi inside Podman, inside Docker, and in plain WSL, some interesting patterns appear.

nvidia-smi in both Podman and Docker seems to search around for libpthread.so.0 in different places.

But then in Docker, at strace log line 115, nvidia-smi jumps straight to /lib/x86_64-linux-gnu/libnvidia-ml.so.1, whereas in Podman it keeps hunting for libpthread.so.0.
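For anyone wanting to reproduce the comparison, traces along these lines should surface the same pattern (a sketch; inside a container, strace may additionally need --cap-add=SYS_PTRACE depending on the seccomp/capability defaults):

# Log only file-open attempts so the library search path is easy to follow;
# -f follows any child processes nvidia-smi may spawn
strace -f -e trace=openat /usr/lib/wsl/lib/nvidia-smi 2>&1 | grep -E 'libnvidia-ml|libpthread'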

Installing the non-WSL libnvidia-compute-* or nvidia-cuda-dev packages from the Ubuntu archives did not work; nvidia-smi complains: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running", as those are not the WSL-aware NVIDIA drivers.

Installing the Ubuntu WSL-aware drivers directly from NVIDIA did not seem to make a difference either; the only libnvidia-ml.so in the container is the CUDA toolkit's link-time stub, which is not a functional driver library:

root@d28dc30f8410:/# find . -name "libnvidia-ml.so"
./usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
find: './proc/tty/driver': Permission denied
root@d28dc30f8410:/# echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/
root@d28dc30f8410:/# ./usr/lib/wsl/lib/nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Mounting /lib/x86_64-linux-gnu/ from the host into the Podman container causes issues with dynamic libraries, as one would expect.

I took and compared the following strace logs:
nvidia-smi_strace_outside_container.log
nvidia-smi_strace_inside_docker_container.log
nvidia-smi_strace_inside_podman_container.txt
nvidia-smi_strace_inside_podman_container_with_ubuntuwslnvidiadrivers.log

Issue 1: Fixed by symlinking /usr/lib/wsl/lib/* into /usr/lib/x86_64-linux-gnu
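An alternative to the per-run symlinks, assuming the same /usr/lib/wsl bind mount, is to point the dynamic loader at that directory directly. I haven't tested this exhaustively, but something like the following should let nvidia-smi resolve libnvidia-ml.so.1 without touching /usr/lib/x86_64-linux-gnu:

# Point the loader at the mounted WSL driver libraries instead of creating symlinks
podman run -t --device=/dev/dxg --gpus all \
  --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
  -e LD_LIBRARY_PATH=/usr/lib/wsl/lib \
  nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 /usr/lib/wsl/lib/nvidia-smi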

Issue 2: nvidia-smi can see GPU, but GPU-aware tools cannot see GPU

nvidia-smi can see my GPU on Podman:

podman run -t --device=/dev/dxg --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'


Same as Docker:

docker run --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c 'export PATH=$PATH:/usr/lib/wsl/lib ; /usr/lib/wsl/lib/nvidia-smi'


But benchmarking fails on Podman:

podman run -t --device=/dev/dxg --gpus all --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl nvcr.io/nvidia/k8s/cuda-sample:nbody bash -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nbody -gpu -benchmark'
Error: only 0 Devices available, 1 requested.  Exiting.

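To narrow down whether this is a missing device node or a missing library inside the Podman container, a quick diagnostic sketch (not a fix):

# Confirm the paravirtual GPU device made it into the container and check which
# CUDA/NVML libraries the dynamic loader can currently resolve
podman run -t --device=/dev/dxg --gpus all \
  --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
  nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash -c \
  'ls -l /dev/dxg; ldconfig -p | grep -E "libcuda|libdxcore|libnvidia-ml"'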

It fails on Docker too, but only because it can't open a display; it does find the GPU:


> 1 Devices used for simulation

I can reproduce this with the Determined AI agent, which does not pick up the GPU on Podman but does on Docker.

@glaudiston commented

This is still an issue for me using WSL on Windows 11 and Podman 3.4.4. Why was this closed? How did you get it working?

@sirredbeard (Contributor, Author) commented

@glaudiston The issue is that Podman does not support the NVIDIA Container Runtime by default, though it is possible to add it.
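For later readers: with the NVIDIA Container Toolkit installed inside the WSL distro, one way to add that support to Podman is via CDI (a sketch, assuming nvidia-container-toolkit 1.12+ and a Podman version with CDI support; older toolkit versions instead shipped an OCI hook that Podman picks up from /usr/share/containers/oci/hooks.d):

# Generate a CDI spec describing the GPU (run once in the WSL distro, as root)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Then request the CDI device instead of using --gpus
podman run --rm --device nvidia.com/gpu=all \
  nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark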
