WSL2 & CUDA does not work [v20226] #6014

Closed
noofaq opened this issue Oct 1, 2020 · 176 comments

Comments

@noofaq

noofaq commented Oct 1, 2020

Environment

Windows build number: 10.0.20226.0
Your Distribution version: 18.04 / 20.04
Whether the issue is on WSL 2 and/or WSL 1: Linux version 4.19.128-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Tue Jun 23 12:58:10 UTC 2020

Steps to reproduce

Exactly followed instructions available at https://docs.nvidia.com/cuda/wsl-user-guide/index.html
Tested on a previously working Ubuntu WSL image (IIRC the GPU last worked on 20206, then the whole of WSL2 stopped working)
Tested also on newly created Ubuntu 18.04 and Ubuntu 20.04 images.

I have tested CUDA compatible NVIDIA drivers 455.41 & 460.20. I have tried removing all drivers etc.
I have also tested using CUDA 10.2 & CUDA 11.0.

It was tested on two separate machines (one Intel + GTX1060, other Ryzen + RTX 2080Ti)

Issue tested directly in the OS and also in Docker containers inside it.

Example (directly in Ubuntu):

piotr@DESKTOP-FS6J3NT:/usr/local/cuda/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Turing" with compute capability 7.5

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
CUDA error at BlackScholes.cu:116 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_CallResult, OPT_SZ)"

Example in container:

piotr@DESKTOP-FS6J3NT:/mnt/c/Users/pppnn$ docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter python
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-10-01 14:18:07.538627: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-10-01 14:18:07.624188: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-10-01 14:18:32.359457: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-01 14:18:32.398949: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3200035000 Hz
2020-10-01 14:18:32.402692: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d06b70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-01 14:18:32.402748: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-01 14:18:32.409370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-01 14:18:32.877228: W tensorflow/compiler/xla/service/platform_util.cc:276] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
2020-10-01 14:18:32.877370: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2020-10-01 14:18:32.879904: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:32.880192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:18:32.880277: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:18:32.880340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:18:32.959947: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:18:32.973554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:18:33.111736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:18:33.127902: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:18:33.128018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:18:33.128535: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:33.129170: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:33.129403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-10-01 14:18:33.131671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/test_util.py", line 1513, in is_gpu_available
    for local_device in device_lib.list_local_devices():
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
>>>
>>>
>>>
>>>
>>> tf.config.list_physical_devices('GPU')
2020-10-01 14:18:55.610151: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.610510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:18:55.610579: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:18:55.610623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:18:55.610676: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:18:55.610719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:18:55.610762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:18:55.610805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:18:55.610846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:18:55.611251: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.611765: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.611999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>
>>>
>>>
>>> tf.test.gpu_device_name()
2020-10-01 14:20:08.762060: W tensorflow/compiler/xla/service/platform_util.cc:276] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
2020-10-01 14:20:08.762222: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2020-10-01 14:20:08.762863: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.763201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:20:08.763263: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:20:08.763316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:20:08.763358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:20:08.763379: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:20:08.763428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:20:08.763480: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:20:08.763533: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:20:08.763898: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.764536: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.764810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/test_util.py", line 112, in gpu_device_name
    for x in device_lib.list_local_devices():
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
>>>
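
The logs above are long, and most lines are informational. As a reading aid, here is a minimal sketch that keeps only the warning, error and fatal entries and drops repeats. The helper name `important_lines` is hypothetical, and treating the NUMA message as noise is an assumption (in the log above, `tf.config.list_physical_devices('GPU')` still succeeded while printing it).

```python
import re

# Matches TensorFlow C++ log lines such as:
# "2020-10-01 14:18:32.877228: W tensorflow/.../platform_util.cc:276] message"
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: "
    r"(?P<level>[IWEF]) (?P<source>\S+)\] (?P<message>.*)$"
)

# Assumption: this message is noise on WSL2 kernels, not the root cause.
NOISE = ("could not open file to read NUMA node",)


def important_lines(log_text):
    """Return unique (level, message) pairs for W/E/F entries, minus known noise."""
    seen = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") == "I":
            continue  # not a TensorFlow log line, or purely informational
        message = match.group("message")
        if any(noise in message for noise in NOISE):
            continue
        entry = (match.group("level"), message)
        if entry not in seen:  # keep first occurrence, preserve order
            seen.append(entry)
    return seen
```

Run against the session above, this should leave the `cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN` warning as the main signal; Python tracebacks are not log lines and are skipped.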

Expected behavior

CUDA working inside WSL2

Actual behavior

All tests that use CUDA inside WSL Ubuntu fail with various CUDA errors, mostly reporting that no CUDA devices are available.

@benhillis benhillis added the GPU label Oct 1, 2020
@kunc

kunc commented Oct 1, 2020

I am having the same issue. Everything was working flawlessly this morning, but then I updated from 20221.1000 to 20226.1000 and it no longer works (I tried reinstalling the NVIDIA drivers, etc.); the error says that all CUDA devices are busy or unavailable.

Edit:
After going back to version 20221, everything works again, which confirms that the new version caused the problem.

@benhillis
Member

Can you share the contents of c:\Windows\System32\lxss\lib?

@dfreelan

dfreelan commented Oct 1, 2020

Having same issue. Here's my C:\WINDOWS\System32\lxss\lib.

09/17/2020 01:24 PM 124,664 libcuda.so
09/17/2020 01:24 PM 124,664 libcuda.so.1
09/17/2020 01:24 PM 124,664 libcuda.so.1.1
09/17/2020 01:24 PM 40,980,456 libnvwgf2umx.so

@CarbonPool

Oh, too bad, I also encountered this problem. I was so happy when WSL worked again in build 20226, but CUDA doesn't work. I was left out in the cold. I tried the following solutions, but none of them worked for me.

  1. Reinstall the graphics card driver 460.20.

  2. Recompile the CUDA-dependent libraries.

  3. Uninstall WSL2 and the kernel package, then reinstall.

@benhillis
Member

Interesting, you seem to be missing the libdxcore libraries.

@dfreelan

dfreelan commented Oct 1, 2020

I reverted my windows back to the previous version, then reinstalled the 20226 build, and now it looks like this:

09/17/2020 01:24 PM 124,664 libcuda.so
09/17/2020 01:24 PM 124,664 libcuda.so.1
09/17/2020 01:24 PM 124,664 libcuda.so.1.1
09/26/2020 03:32 PM 832,936 libd3d12.so
09/26/2020 03:32 PM 5,115,392 libd3d12core.so
09/26/2020 03:32 PM 25,074,040 libdirectml.so
09/26/2020 03:32 PM 878,768 libdxcore.so
09/17/2020 01:24 PM 40,980,456 libnvwgf2umx.so
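
The difference between the two listings (four files versus eight) can be checked with a short script. This is only a sketch: the expected file names are copied from the complete listing above, while the helper name `missing_libs` is hypothetical and the default directory `/usr/lib/wsl/lib` is an assumption about where WSL2 exposes these libraries inside the distribution. From inside WSL, the Windows-side folder can be passed instead as `/mnt/c/Windows/System32/lxss/lib`.

```python
import os

# File names taken from the complete eight-file listing in this thread.
EXPECTED_LIBS = (
    "libcuda.so",
    "libcuda.so.1",
    "libcuda.so.1.1",
    "libd3d12.so",
    "libd3d12core.so",
    "libdirectml.so",
    "libdxcore.so",
    "libnvwgf2umx.so",
)


def missing_libs(lib_dir="/usr/lib/wsl/lib"):
    """Return the expected library names that are absent from lib_dir, sorted."""
    if not os.path.isdir(lib_dir):
        # A missing directory means every expected library is missing.
        return sorted(EXPECTED_LIBS)
    present = set(os.listdir(lib_dir))
    return sorted(name for name in EXPECTED_LIBS if name not in present)


if __name__ == "__main__":
    absent = missing_libs()
    print("missing:", ", ".join(absent) if absent else "none")
```

Note that several reports in this thread had all eight files and still hit the error, so an empty result rules out only this particular cause.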

@adamfarquhar

adamfarquhar commented Oct 1, 2020

I am having the same problem: Windows 10 build 20226 and Nvidia driver 460.20. It is great to see that it is not just my install. I hope that this can be fixed soon.

And now I can also confirm that it will work if you roll back to the previous build 20221. You can download the (old) iso file from Microsoft and re-install without losing any data.

@jin8495

jin8495 commented Oct 1, 2020

Same problem here, Nvidia driver 460.20 and build 20226.

@CarbonPool

Can you share the contents of c:\Windows\System32\lxss\lib?

lib_list

@geneing

geneing commented Oct 2, 2020

I have the same problem Nvidia driver 460.15, build 20226. It worked with the previous insider build.

@noofaq
Author

noofaq commented Oct 2, 2020

Can you share the contents of c:\Windows\System32\lxss\lib?

image

I also looked into the previous Windows version's folder:
image

@mitch-at-orika

Same problem with Nvidia driver 460.20 and build 20226. The contents of my lxss\lib are:
image

@aticie

aticie commented Oct 2, 2020

I have the same problem in 20226. My build also contains same 8 files in lxss\lib. But I get cudaErrorDevicesUnavailable.

Is there a way to roll back 20221? Using "Go back to previous version of Windows 10" sends me to 19041.508.

@kunc

kunc commented Oct 2, 2020

I have the same problem in 20226. My build also contains same 8 files in lxss\lib. But I get cudaErrorDevicesUnavailable.

Is there a way to roll back 20221? Using "Go back to previous version of Windows 10" sends me to 19041.508.

It worked for me. Are you sure you went to 20226 directly from 20221? I think it might store only the last version as a backup; the option was no longer available for me once I had reset from 20226 to 20221.

@adamfarquhar

I have the same problem in 20226. My build also contains same 8 files in lxss\lib. But I get cudaErrorDevicesUnavailable.

Is there a way to roll back 20221? Using "Go back to previous version of Windows 10" sends me to 19041.508.

Yes, you can install 20221 from https://www.microsoft.com/en-us/software-download/windowsinsiderpreviewadvanced

@kivancguckiran

kivancguckiran commented Oct 2, 2020

It seems that it is not possible to downgrade Windows without losing apps and files, which is not an option for me under these circumstances. Does anyone know another solution for this? Or do we wait for Microsoft to fix the problem?

I too have version 20226.

@PRIMA-LAB-IPU

Same here.
`$ nvidia-smi.exe
Fri Oct 2 23:54:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.15 Driver Version: 460.15 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 207... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 45C P5 12W / N/A | 176MiB / 8192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1752 C+G Insufficient Permissions N/A |
| 0 N/A N/A 2424 C+G ...b3d8bbwe\WinStore.App.exe N/A |
| 0 N/A N/A 3500 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 5536 C+G ...m Files\VcXsrv\vcxsrv.exe N/A |
| 0 N/A N/A 8288 C+G ...batNotificationClient.exe N/A |
| 0 N/A N/A 10104 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 11152 C+G ...qxf38zg5c\Skype\Skype.exe N/A |
| 0 N/A N/A 11512 C+G ...artMenuExperienceHost.exe N/A |
| 0 N/A N/A 11548 C+G ...ekyb3d8bbwe\YourPhone.exe N/A |
| 0 N/A N/A 11832 C+G ...3m\Quick Eye\QuickEye.exe N/A |
| 0 N/A N/A 11996 C+G ...8wekyb3d8bbwe\Cortana.exe N/A |
| 0 N/A N/A 12856 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 13608 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 14484 C+G ...re1.8.0_261\bin\javaw.exe N/A |
| 0 N/A N/A 15152 C+G ...qxf38zg5c\Skype\Skype.exe N/A |
| 0 N/A N/A 15620 C+G ...he8kybcnzg4\app\Slack.exe N/A |
| 0 N/A N/A 16728 C+G ...ropbox\Client\Dropbox.exe N/A |
| 0 N/A N/A 18824 C+G Insufficient Permissions N/A |
| 0 N/A N/A 19316 C+G ...arp.BrowserSubprocess.exe N/A |
| 0 N/A N/A 22372 C+G ...obeNotificationClient.exe N/A |
+-----------------------------------------------------------------------------+`

@ChengyuSheu

Thanks, @adamfarquhar. Rolling back to version 20201 resolved this issue. Even though some settings were removed, the files stayed.

@lminer

lminer commented Oct 3, 2020

Same problem.

  • WSL2 Ubuntu 20.04
  • driver version 460.20
  • Razer blade advanced 4K 2019
  • RTX 2080 Max Q
  • Windows insider 20226

Rollback to previous version fixes it. For people who want to do it without reinstalling, go to Recovery > restore previous version of windows

@aisensiy

aisensiy commented Oct 4, 2020

I had the error "remote procedure call failed" in the previous version, and I have this issue after upgrading. So... when I recover, does it mean I will get the "remote procedure call failed" error back 😿

@sirisian

sirisian commented Oct 4, 2020

@kivancguckiran I just joined the insider build so I'm in the same boat. It would probably take like 4 hours, but you could probably revert windows to the previous version (non-insider) maybe then go specifically to 20221. I'm not going to try it and just wait though.

@strarsis

strarsis commented Oct 4, 2020

+1, same issue here.

The kernel, driver and other versions are above the required minimum, so CUDA in WSL 2 should work.
However, when running the NVIDIA samples built with make, they always fail to run:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
cat /usr/local/cuda/version.txt
CUDA Version 11.0.228

@bbongcol

bbongcol commented Oct 5, 2020

I have the same problem in 20226.

  • WSL2 Ubuntu 18.04
  • Kernel Version 4.19.128
  • driver version 460.20
  • RTX 2060
  • Windows insider 20226
    https://aka.ms/AA9utty

Cuda device query is ok.

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2060"
CUDA Driver Version / Runtime Version 11.2 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 6144 MBytes (6442450944 bytes)
(30) Multiprocessors, ( 64) CUDA Cores/MP: 1920 CUDA Cores
GPU Max Clock rate: 1200 MHz (1.20 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

But the CUDA utility does not work.

[./BlackScholes] - Starting...
GPU Device 0: "GeForce RTX 2060" with compute capability 7.5

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
CUDA error at BlackScholes.cu:116 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_CallResult, OPT_SZ)"

Below is strace log.
BlackScholes_cuda_error_log.zip
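
Two CUDA runtime error codes recur in the sample output in this thread: 35 and 46. A small sketch for pulling the code and name out of a `CUDA error at ...` line is given below. The function name `parse_cuda_error` is hypothetical, and the table covers only these two codes, with descriptions taken from the messages quoted in the thread.

```python
import re

# Only the two codes that appear in this thread; descriptions are the
# messages printed alongside them in the logs above.
KNOWN_ERRORS = {
    35: "CUDA driver version is insufficient for CUDA runtime version",
    46: "all CUDA-capable devices are busy or unavailable",
}

# Matches the "code=46(cudaErrorDevicesUnavailable)" fragment of a sample error line.
ERROR_PATTERN = re.compile(r"code=(?P<code>\d+)\((?P<name>\w+)\)")


def parse_cuda_error(line):
    """Return (code, name, description) from a CUDA sample error line, or None."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    description = KNOWN_ERRORS.get(code, "not covered by this sketch")
    return code, match.group("name"), description
```

In the reports so far, `deviceQuery` can pass while the first `cudaMalloc` fails with code 46, so enumerating the device and being able to use it appear to be separate steps.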

@liamhan0905

liamhan0905 commented Oct 5, 2020

I also have tensorflow-gpu on WSL2. But I'm getting the error message as shown below.

RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable

Following this link resolved the issue for me! It seems like my issue was also the Windows10 Insider Previews... smh. Simply following "Roll Back Soon After Enabling Insider Previews" section solved it for me (current version: 10.0.20221 Build 20221) and now I can train my model again using tensorflow-gpu. Thank you everyone for the help!

@onomatopellan

  • Windows 10 Version 2004 (Build 19041.546)

@strarsis In your case you need to use a Windows Insider build from the Dev Channel (build >=20150). CUDA in WSL2 won't work in build 19041.

@strarsis

strarsis commented Oct 5, 2020

@onomatopellan: How long do I have to wait to get this support in stable Windows 10?

@onomatopellan

onomatopellan commented Oct 5, 2020

@strarsis This is expected for 21H1 aka April 2021.

@strarsis

strarsis commented Oct 5, 2020

@onomatopellan: To use this now, do I have to register for Windows Insider and download the ISO, or can I use the Windows updater?
Are there any downsides to using a Windows Insider version, like performance or stability?

@Meeka33

Meeka33 commented Oct 5, 2020

This stopped working for me as well. winver 2004 20226 with CUDA. It previously was working until yesterday on previous builds. When will this be fixed? Too many recurring bugs, ready to dump windows

@tadam98

tadam98 commented Dec 10, 2020

Updated to Insider version 20270.1 still with 465.12 and all is working well as before.

@tadam98

tadam98 commented Dec 10, 2020

@ArieTwigt Not sure how it worked for you as you did not include "experimental" in the .list as in the instructions.

curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

@tadam98

tadam98 commented Dec 15, 2020

Updated to Insider version 20279.1 still with 465.12 and all is working well as before.

@alchemistake

Has anyone tried this on a mainline Windows build? Other programs on my system do not work on the Insider build.

@serg06

serg06 commented Jan 8, 2021

Has anyone tried this on a mainline Windows build? Other programs on my system do not work on the Insider build.

I tried it on Nov 24 and there was some issue I couldn't fix, I think it couldn't detect my GPU.

@Cuberick-Orion

Thank you all for reporting the status on different OS versions. I was wondering if anyone is currently using or has tried version 21286?

I am about to try using CUDA on WSL2 (hence the need to switch to the Dev Channel), but it turns out that ISO files are only released for selected versions. I would prefer to stay on a stable one for the moment.

Appreciate any response :)

@onomatopellan

onomatopellan commented Feb 24, 2021

@Cuberick-Orion You can follow the CUDA known issues to see if there is a known problem. The most problematic build was 20226; any build after that should not be a problem.

@zzjin

zzjin commented Mar 5, 2021

After Windows Insider updated to 21327.1000, CUDA is broken again. The previous Insider version worked well.

Error message with BlackScholes: CUDA error at ../../common/inc/helper_cuda.h:779 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"

After reinstalling the WSL-ready CUDA driver (Custom > Perform a clean install, then restart the PC), everything works again.

@shkarupa-alex

shkarupa-alex commented Mar 6, 2021

After Windows Insider updated to 21327.1000, CUDA is broken again. The previous Insider version worked well.

Error message with BlackScholes: CUDA error at ../../common/inc/helper_cuda.h:779 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"

+1

@jenatali
Member

jenatali commented Mar 6, 2021

After Windows Insider updated to 21327.1000, CUDA is broken again. The previous Insider version worked well.

This is called out in the flight notes: https://blogs.windows.com/windows-insider/2021/03/03/announcing-windows-10-insider-preview-build-21327/

Windows Subsystem for Linux (WSL) users who upgrade to this build will be unable to use the GPU Compute feature. We’re working on a fix for this. Users who do a clean install will not be affected.
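
The build numbers reported so far can be collected into a small lookup. The sketch below is not an official compatibility table: the minimum build (20150) and the two problem builds (20226, and 21327 when reached by upgrade) are taken from comments in this thread, and the function names are hypothetical.

```python
# From this thread: CUDA in WSL2 needs a Dev Channel build >= 20150.
MIN_WSL_CUDA_BUILD = 20150

# Builds reported as broken in this thread, with the reported circumstances.
KNOWN_BAD_BUILDS = {
    20226: "CUDA fails with cudaErrorDevicesUnavailable; rolling back to 20221 helped",
    21327: "GPU compute broken after upgrading; clean installs reported unaffected",
}


def build_number(version):
    """Extract the build from '10.0.20226.0', '20226.1000' or '20226'."""
    parts = version.strip().split(".")
    # Full versions look like major.minor.build[.revision]; short ones start with the build.
    return int(parts[2]) if len(parts) >= 3 else int(parts[0])


def check_build(version):
    """Return (status, note) for a Windows version string."""
    build = build_number(version)
    if build < MIN_WSL_CUDA_BUILD:
        return "unsupported", "build is older than %d" % MIN_WSL_CUDA_BUILD
    if build in KNOWN_BAD_BUILDS:
        return "known-bad", KNOWN_BAD_BUILDS[build]
    return "no-known-issue", "no problem reported in this thread"


if __name__ == "__main__":
    for version in ("10.0.20226.0", "19041.546", "20221.1000"):
        print(version, check_build(version))
```

A "no-known-issue" result only means nobody in this thread reported that build as broken; driver and toolkit versions still matter.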

@ahmadelsallab

The following worked for me:

  • Re-install the 465.42 WSL driver (https://developer.nvidia.com/cuda/wsl)
  • Disable and enable the GPU driver from device manager.
    This is a known issue by NVIDIA, as described in their documentation: https://docs.nvidia.com/cuda/wsl-user-guide/index.html
    "Note: NVIDIA is aware of a specific installation issue reported on mobile platforms with the WIP driver 465.12 posted on 11/16/2020. A known workaround will be to disable and reenable the GPU adapter from device manager at system start. We are working on a fix for this issue and will have an updated driver soon. As an alternative, users may opt to roll back to an earlier driver from device manager driver updates."

@EtienneT

CUDA works again in 21332.1000.

@PetrarcaBruto

PetrarcaBruto commented Mar 15, 2021

Error on WSL2

Environment: Windows insider program 21332.1000,  NVIDIA Driver 461.72 on Windows (for GeForce GTX 1660 Ti) , WSL2  Ubuntu 20-04,
Error: CUDA error at ../../common/inc/helper_cuda.h:779 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"

After installing the NVIDIA driver and getting it working on Windows, I followed the instructions on the NVIDIA site https://docs.nvidia.com/cuda/wsl-user-guide/index.html (adapted for Ubuntu 20.04 because the example uses 18.04).

Steps on Ubuntu-20.04:

apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get install -y cuda-toolkit-11-2

After compiling and executing the ./BlackScholes example I get (again) the same error described on this thread.
Although there is a comment earlier that it worked again for user EtienneT with the Windows insider release 21332.1000, it didn't work for me. It does the same as with the previous insider release 21327.1000.
Note that the Microsoft release notes for 21327 admit that GPU computing won't work (a regression bug), but in the release notes for 21332 MS says it is fixed.
I get the same error on both Windows insider releases.

Situation on Windows 10

However, the GPU works on Windows using PyTorch.

Pytorch Windows, Installed with:
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

Code: 
import torch
print(torch.cuda.get_device_name(torch.cuda.current_device()))
It prints: GeForce GTX 1660 Ti with Max-Q Design

But tensorflow 2.4.1 on Windows installed with:
pip install tensorflow
It recognizes the GPU but fails downstream with a known problem with Tensorflow for which I cannot find updated info.
Code:
import tensorflow as tf

if tf.test.gpu_device_name() != '/device:GPU:0':
  print('WARNING: GPU device not found.')
else:
  print('SUCCESS: Found GPU: {}'.format(tf.test.gpu_device_name()))
  physical_devices = tf.config.list_physical_devices('GPU')
  tf.config.experimental.set_memory_growth(physical_devices[0], True)  # this should solve the cuda solver problem

It prints 'SUCCESS: Found GPU:' etc, but later in the process it has a problem in one of the cuda libraries. Here is the output of the run:

2021-03-11 21:06:25.144177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4744 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti with Max-Q Design, pci bus id: 0000:02:00.0, compute capability: 7.5)
2021-03-11 21:06:26.697412: F tensorflow/core/util/cuda_solvers.cc:115] Check failed: cusolverDnCreate(&cusolver_dn_handle) == CUSOLVER_STATUS_SUCCESS Failed to create cuSolverDN instance

Note that for TensorFlow I had to download the CUDA Toolkit 11.2 and the associated cuDNN libraries.

Any help will be appreciated.
Petrarca Bruto

@kerim371

kerim371 commented Apr 30, 2021

What can I do if the Nvidia driver manager can't find an appropriate driver for me?
Samsung RC720 notebook
OS build: 21370.1
Basic GPU is Intel, additional Nvidia GeForce GT 520M
According to Nvidia docs my GeForce GT 520M supports CUDA
image

@onomatopellan

@kerim371 I'm afraid it won't work.

Note: CUDA on WSL 2 is enabled on GPUs starting with the Kepler architecture

https://docs.nvidia.com/cuda/wsl-user-guide/index.html

@tommyip

tommyip commented Jun 11, 2021

I have an Nvidia GeForce GT 750M, which is based on the Kepler architecture, and Windows Dev build 21390, but the installer still gives the same error as @kerim371 above.

@onomatopellan

@tommyip Mobile Kepler lost driver support long ago.

@Josepaezra

@kerim371, @tommyip I was having the same issue, which I resolved by installing the drivers from the WSL Ubuntu prompt, following the commands shown at https://developer.nvidia.com/cuda-downloads, under Linux > x86_64 > WSL-Ubuntu > 2.0 > deb (network).

@AllardJM

I followed the instructions here: https://radiant-brushlands-42789.herokuapp.com/medium.com/swlh/how-to-install-the-nvidia-cuda-toolkit-11-in-wsl2-88292cf4ab77
and then, when that didn't work, what @Josepaezra added, and TF still does not recognize the GPU.

@onomatopellan

onomatopellan commented Jun 18, 2021

@AllardJM That guide is incomplete. First of all you need to run a Windows build from the Dev channel and install a Windows Nvidia driver with WSL2 CUDA support. After that you will see a device /dev/dxg inside WSL2. That's the GPU.
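
A quick way to check for that device from Python is sketched below. It only tests that /dev/dxg exists and is a character device; it says nothing about whether the CUDA libraries on top of it work. The function name wsl_gpu_device_present is made up for this example.

```python
import os
import stat


def wsl_gpu_device_present(path="/dev/dxg"):
    """Return True if `path` exists and is a character device."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it.
        return False
    return stat.S_ISCHR(mode)


print("GPU device visible:", wsl_gpu_device_present())
```

If this prints False inside WSL2, the Windows side (Insider build plus WSL-capable driver) is the first thing to look at.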

@AllardJM

AllardJM commented Jun 18, 2021 via email

@onomatopellan

@AllardJM Which version of TensorFlow did you install? If you see the /dev/dxg device then you only need a GPU version of TensorFlow.

@AllardJM

@onomatopellan I have an (empty) file dxg under the dev folder....

The TF version is 2.5.0 from pip install tensorflow

@onomatopellan

onomatopellan commented Jun 19, 2021

@AllardJM That means the GPU is already available inside WSL2. Launch python3 interpreter and run:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

The GPU should be recognized, although it may still be unusable if specific CUDA libraries are missing.

This is why I always prefer running TensorFlow in Docker since it has all the libraries needed in the container.
docker run -it --rm --runtime=nvidia tensorflow/tensorflow:latest-gpu python
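
If you stay outside Docker, a rough way to see which CUDA libraries are missing is to try loading them with ctypes. The library names below are an assumption based on what a TensorFlow 2.5 / CUDA 11 setup usually looks for; adjust them to your versions. find_missing_libs is a hypothetical helper, not a TensorFlow API.

```python
import ctypes

# Assumed sonames for a TensorFlow 2.5 / CUDA 11 install; edit to match yours.
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcufft.so.10",
    "libcusolver.so.11",
    "libcudnn.so.8",
]


def find_missing_libs(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


print("Missing CUDA libraries:", find_missing_libs(CUDA_LIBS))
```

An empty list means the loader found every listed library; anything else points at a package to install or a path to add to LD_LIBRARY_PATH.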

@thusinh1969

Windows insider updated to 20262.1
With the update, the regular nvidia driver was re-installed. Not good for wsl2.
Also, nvidia issued a new driver that you should install in windows:
https://developer.nvidia.com/cuda/wsl
The new driver is 465.12
Download and install it.
You must reboot after the install, or wsl2 will not see the gpu.
Start wsl2

$ python
> import tensorflow as tf
> tf.test.is_gpu_available()
2020-11-19 17:15:13.395676: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
True

Then do the above docker procedures to check that docker also works.
In my case, it works fine.

After doing this, it still didn't work for me.
I was able to get it to work by following it up with this:

  • Installed everything else that Windows Update wanted to install and rebooted

  • Reinstalled the 465.12 driver

    • On the first screen, I selected "Drivers + GeForce Experience"
    • On the second screen, I selected Custom, then pressed Next and checked "Perform a clean install"
  • Uninstalled all my existing Ubuntu installations

  • Restarted Windows

  • Installed Ubuntu from Microsoft store (the one called "Ubuntu" with no version) (it installed Ubuntu 20.04)

  • Restarted Windows

  • Followed these instructions to add the sources to my sources lists, but I modified the URLs to be 2004 instead of 1804:

    • sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    • sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
    • sudo apt-get update
  • Continued to follow the instructions by installing Nvidia Toolkit 11.1 (not 11.0)

    • sudo apt-get install -y cuda-toolkit-11-1
  • Followed these instructions to test the installation

    • wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

    • bash Miniconda3-latest-Linux-x86_64.sh

      • When it asks whether to init conda, hit yes
    • restart terminal

    • conda config --set auto_activate_base false

    • restart terminal

    • conda create --name directml python=3.6

    • conda activate directml

    • pip install tensorflow-directml

  • Then in Python

    • import tensorflow.compat.v1 as tf
    • tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
    • print(tf.add([1.0, 2.0], [3.0, 4.0]))

And there, I finally saw the name of my GPU pop up!
After that I went back and tried the ./BlackScholes test:

  • cd /usr/local/cuda/samples/4_Finance/BlackScholes
  • sudo make
  • ./BlackScholes

And it also worked!
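
The 1804-to-2004 substitution in the repository URLs above can be derived from the distribution's version instead of edited by hand. The sketch below assumes the URL pattern shown in these steps holds for other Ubuntu releases, which is not verified here; cuda_repo_url and read_version_id are hypothetical helper names.

```python
def cuda_repo_url(version_id, arch="x86_64"):
    """Build the CUDA apt repository URL for an Ubuntu release such as '20.04'."""
    tag = "ubuntu" + version_id.replace(".", "")
    return "http://developer.download.nvidia.com/compute/cuda/repos/{}/{}".format(tag, arch)


def read_version_id(os_release_text):
    """Extract VERSION_ID from the contents of /etc/os-release, or None."""
    for line in os_release_text.splitlines():
        if line.startswith("VERSION_ID="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


# Sample text standing in for open("/etc/os-release").read()
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\n'
version = read_version_id(sample)
print("deb {} /".format(cuda_repo_url(version)))
```

The printed line is what goes into /etc/apt/sources.list.d/cuda.list in the steps above.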

Thanks for posting your steps. I also run Ubuntu in WSL2 and use the Nvidia docker containers as explained in https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-drivers. I also faced the same problem (nvcr.io/nvidia/tensorflow not working in WSL 2 after the update of the Windows Insider version).

From your steps, I only had to re-install the CUDA driver ( https://developer.nvidia.com/cuda/wsl ). In my case I just had to run the installer like I normally did. I didn't even have to restart WSL.

I ran the benchmark container to check if everything works:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

The benchmark passed and my nvcr.io/nvidia/tensorflow container works again.

Not working with Ubuntu 18.04 (changed params accordingly).
Steve

@PetrarcaBruto

Thank you Anh,
Your information looks very useful. Unfortunately I won't be able to try it out. My problem has got worse with the GPU, even on Windows, let alone WSL2. I will describe it here in case it is useful to someone else, or at least to vent some of my frustration with MS:
I have been in the Windows Insider program waiting for a solution to the problem described here (no GPU on WSL2). It turns out that the Windows 11 Insider update that came out about 8 weeks ago caused my NVIDIA GPU to stop being recognized on Windows 11.
I documented the problem in the MS Feedback Hub and since then I updated the problem description with the following:
"IMPORTANT UPDATE:
Trying to solve the problem reported by Device Manager with the driver installation (as per the original problem report), this time trying to address the "Currently this hardware is not connected to the computer (Code 45) this Device is not connected" message, I did the following:
1- After selecting properties, I clicked on "install driver" to which I got the reply that the driver was already installed.
2- Then I chose to uninstall the driver (to avoid the message in 1 above) and re-install it. I didn't get any response to this click, but the GPU disappeared from the device list. Now the device is not shown at all, not even when clicking "View/Show Hidden Devices".

It seems the problem got worse. If I try to install a driver from the NVIDIA site, the first thing the installer does is check device compatibility, and it aborts saying it didn't detect a compatible device.

Please help. I have had intermittent problems with GPU connection, even in Windows 10."

As you can see, I am very disappointed with the whole NVIDIA GPU experience on Windows. I am still waiting for an update. This issue, and related ones, have more than 4K upvotes in the Insider problem list.

Petrarca

@strarsis

strarsis commented Dec 1, 2021

With Windows 10 (non-insider) 2021H2 November update, CUDA now works in WSL 2.

@PetrarcaBruto

@strarsis You are right. Thanks for the notification.
