
Yet another “Driver Not Loaded / can’t communicate with the NVIDIA driver” error on Windows 21376co_release.210503-1432 #6925

Closed

Marietto2008 opened this issue May 9, 2021 · 7 comments

@Marietto2008

Windows Build Number

21376co_release.210503-1432

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.10.16.3-microsoft-standard-WSL2

Distro Version

ubuntu 20.04

Other Software

Docker version 20.10.6, build 370c289 (installed with sudo apt-get install nvidia-docker2)

Repro Steps

These are the commands that I issued (taken from https://dilililabs.com/zh/blog/2021/01/26/deploying-docker-with-gpu-support-on-windows-subsystem-for-linux/):

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub

sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

sudo apt-get update

sudo apt-get install cuda-toolkit-11-0
curl https://get.docker.com | sh
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

sudo apt-get update

sudo apt-get install nvidia-docker2 cuda-toolkit-11-0 cuda-drivers

sudo service docker start
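
For reference, a quick sanity check along these lines can confirm that Docker picked up the NVIDIA runtime and that libnvidia-container can reach the driver (these commands are not part of the original report):

docker info | grep -i runtimes        # "nvidia" should appear among the listed runtimes
nvidia-container-cli info             # queries the driver through libnvidia-container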

Expected Behavior

I expect nvidia-smi and the GPU-enabled container to be able to communicate with the NVIDIA driver.

Actual Behavior

docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

Unable to find image 'nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04' locally
11.0-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
171857c49d0f: Pull complete
419640447d26: Pull complete
61e52f862619: Pull complete
2a93278deddf: Pull complete
c9f080049843: Pull complete
8189556b2329: Pull complete
c306a0c97a55: Pull complete
4a9478bd0b24: Pull complete
19a76c31766d: Pull complete
Digest: sha256:11777cee30f0bbd7cb4a3da562fdd0926adb2af02069dad7cf2e339ec1dad036
Status: Downloaded newer image for nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

In addition:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Failed to properly shut down NVML: Driver Not Loaded
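
On WSL2 the NVIDIA driver stays on the Windows side and is surfaced into the distro through a paravirtualization device plus a set of mounted libraries, so a quick way to see whether the guest can reach the driver at all is to check for those pieces (the paths below are the usual WSL2 locations, assumed here rather than taken from this machine):

ls -l /dev/dxg                 # GPU paravirtualization device exposed by WSL2
ls /usr/lib/wsl/lib/           # libcuda.so, libnvidia-*.so and nvidia-smi mounted from the Windows driver
/usr/lib/wsl/lib/nvidia-smi    # run the WSL-provided binary directly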

Diagnostic Logs

No response

Marietto2008 changed the title from "Yet another “Driver Not Loaded / can’t communicate with the NVIDIA driver” error while trying to deploy a docker container with GPU support on WSL2" to "Yet another “Driver Not Loaded / can’t communicate with the NVIDIA driver” error on Windows 21376co_release.210503-1432 / Cuda 11 / Nvidia driver 470.14 on the host" on May 9, 2021
Marietto2008 changed the title to "Yet another “Driver Not Loaded / can’t communicate with the NVIDIA driver” error on Windows 21376co_release.210503-1432" on May 9, 2021
benhillis added the GPU label on May 10, 2021
@tianguangye

Hello, I encountered the same problem with the same software versions when I started setting up WSL2 GPU support two days ago on a newly activated Windows 10 notebook (WIP Build 21376.co_release.210503-1432).

As the discussion in issue #6773 explained, preview build 21359 fixes this. I am wondering whether there is a build in between that is reliable enough to roll back to so we can use the GPU under WSL2?

@Marietto2008 (Author)

Which fixes are you using?

@Marietto2008 (Author)

The workaround is here:

NVIDIA/nvidia-docker#1496 (comment)

@tianguangye

Well, I tried the fixes, but the same problem remains as you mentioned above, both when invoking the nvidia-smi command and when running the docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 container.

@tianguangye

Hello,

I was able to launch a Jupyter notebook (tensorflow/tensorflow:latest-gpu-py3-jupyter) under WSL2 Ubuntu 18.04 and train a classifier with GPU support.
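
For anyone reproducing this, the notebook was presumably started with something along the lines below; the exact command isn't shown here, so the port mapping is an assumption:

docker run --rm --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter    # Jupyter listens on 8888 by default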

Great thanks for your help!

It seems, however, that the nvidia-smi error still exists. Looking forward to a future update of the NVIDIA driver!

Guangye


This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
