Got the error "out of memory" when invoking CUDA in WSL2. #8447

before31 · 2022-05-26T03:02:22Z

Version

Microsoft Windows [Version 10.0.19044.1706]

WSL Version

WSL 2

Kernel Version

5.10.102.1

Distro Version

Ubuntu 20.04

Other Software

NVIDIA driver (on Windows), version 512.77
CUDA (installed in WSL2; only the CUDA toolkit, without reinstalling the NVIDIA driver inside WSL2), version 10.2 or 11.6. I tried both and got the same error.

Repro Steps

Set up Windows 10 21H2.
Enable WSL2.
Install the latest NVIDIA driver, version 512.77 (this machine has 4 x 1080 Ti GPUs).
Install the CUDA toolkit in WSL2, following the instructions here.
In WSL2 (Ubuntu 20.04), run a Java application that invokes CUDA through JNA. In fact, any CUDA app gets the same error; this application is only a sample that simplifies reproduction. It looks like this:

public static void main(String[] args) {
    // Make JNA log how it locates and loads the native library.
    System.setProperty("jna.debug_load", "true");
    // cudaGetDeviceCount writes the device count into this one-element array.
    int[] deviceCount = new int[1];
    int result = WxCudaLibrary.INSTANCE.cudaGetDeviceCount(deviceCount);
    if (result == 0) { // 0 is cudaSuccess
        System.out.println("gpu check success:" + deviceCount[0]);
    } else {
        String msg = WxCudaLibrary.INSTANCE.cudaGetErrorString(result);
        System.out.println("gpu check failed:" + result + ",msg:" + msg);
    }
}

This prints the error: gpu check failed:2,msg:out of memory
The same application runs fine on Windows (with only the library name changed).
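
For a cross-check that does not depend on Java or JNA, the same two CUDA runtime calls can be made from Python with ctypes. This is a minimal sketch, not the reporter's code: the library name libcudart.so is an assumption (some installations expose only a versioned file name), and the helper name cuda_device_count is made up for illustration.

```python
import ctypes


def cuda_device_count(library_name="libcudart.so"):
    """Call cudaGetDeviceCount through ctypes.

    Returns (error_code, device_count, message), or None when the CUDA
    runtime library cannot be loaded at all.
    """
    try:
        cudart = ctypes.CDLL(library_name)  # assumed name, adjust as needed
    except OSError:
        return None

    # cudaError_t cudaGetDeviceCount(int *count)
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    # const char *cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    code = cudart.cudaGetDeviceCount(ctypes.byref(count))
    message = cudart.cudaGetErrorString(code).decode()
    return code, count.value, message


if __name__ == "__main__":
    outcome = cuda_device_count()
    if outcome is None:
        print("CUDA runtime library not found")
    else:
        code, count, message = outcome
        print(f"code={code} devices={count} msg={message}")
```

If the problem sits in the driver or WSL layer rather than in JNA, this probe should fail in the same way as the Java sample.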

Expected Behavior

CUDA can be invoked normally in WSL2.

Actual Behavior

Any CUDA app gets the same error: out of memory.
In WSL2, nvidia-smi reports:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
| 23%   33C    P8    10W / 250W |    541MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
| 23%   29C    P8    11W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:82:00.0 Off |                  N/A |
| 23%   28C    P8    10W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:83:00.0 Off |                  N/A |
| 23%   27C    P8    10W / 250W |     11MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Diagnostic Logs

No response

before31 · 2022-05-31T05:33:27Z

Is anyone here?

OneBlue · 2022-05-31T21:32:50Z

/logs

ghost · 2022-05-31T21:32:52Z

Hello! Could you please provide more logs to help us better diagnose your issue?

To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed, please upload the output files to this GitHub issue.

Click here for more info on logging

Thank you!

before31 · 2022-06-01T02:11:17Z

Here are the logs. Thanks.

WslLogs-2022-06-01_10-08-04.zip

before31 · 2022-06-02T04:12:36Z

In addition, my WDDM version is 1.3. I don't know whether that has any impact. I expected the WDDM version to change to 3.x after installing the latest NVIDIA driver.

before31 · 2022-06-06T01:50:07Z

Is there anything new?

OneBlue · 2022-06-07T21:18:12Z

Thanks for the logs @before31. Once you get this out of memory error, can you also share the output of dmesg inside WSL?

before31 · 2022-06-08T05:45:59Z

I ran my CUDA app and got the out of memory error.
I then executed dmesg > dmesg.log inside WSL. Here is the output:
dmesg.log

Thanks. @OneBlue

before31 · 2022-06-27T09:42:03Z

Any updates?

xiaojinyu-hhu · 2022-08-09T05:59:27Z

How do you fix this problem?

before31 · 2022-08-09T06:02:08Z

How do you fix this problem?

The problem remains unresolved.

Abdullah-Aldosari · 2022-08-11T05:52:35Z

What is the PC/workstation that you are using?

xiaojinyu-hhu · 2022-08-11T06:08:23Z

CPU
Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
Supermicro X12DPG-OA6
DirectX 12
No TPM 2.0 (so this PC cannot be upgraded to Windows 11)
System
Windows 10 22H2 19045.1865

The hardware may not support it.

Abdullah-Aldosari · 2022-08-11T06:38:55Z

I am using a Dell Precision tower with Windows 11 and four Quadro RTX 4000 GPUs, and I am running into the same issue.

CanisLupus518 · 2022-08-31T21:42:41Z

I am experiencing the same issue with Windows 11, i7-12700KF, RTX 3070 Ti. Graphs of used memory only reach about half of the available GPU RAM before this error appears:
RuntimeError: CUDA error: out of memory

thoj · 2022-09-02T13:33:47Z

On my system this has something to do with pin_memory in PyTorch. Once I set pin_memory=False, I can use all the memory on the GPU. Unfortunately, this also reduces performance quite a bit.

Probably something to do with this:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations-for-linux-cuda-apps

Hope a fix is possible.
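
Until that happens, one way to apply the workaround only where it is needed is to switch pin_memory off when the process runs under WSL and leave it on elsewhere. The sketch below is an illustration under stated assumptions: it guesses at WSL by looking for "microsoft" in the kernel release string (WSL 2 releases typically look like 5.10.102.1-microsoft-standard-WSL2, but this heuristic is not an official API), and both helper names are hypothetical.

```python
import platform


def is_wsl_release(release):
    """Heuristic: WSL kernels normally carry 'microsoft' in the release string."""
    return "microsoft" in release.lower()


def choose_pin_memory(release=None):
    """Return False under WSL (the workaround from this thread), True elsewhere."""
    if release is None:
        release = platform.uname().release
    return not is_wsl_release(release)


if __name__ == "__main__":
    pin = choose_pin_memory()
    print(f"kernel release: {platform.uname().release}")
    print(f"pin_memory={pin}")
    # With PyTorch installed, the flag would be passed through like this:
    #   loader = torch.utils.data.DataLoader(dataset, batch_size=32, pin_memory=pin)
```

As reported above, disabling pinned memory costs throughput, so this trades speed for being able to run at all.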

maloletnik · 2022-09-19T08:18:50Z

Having the same issue with Docker under WSL.
Setting pin_memory=False also makes the CUDA memory error go away, but training is much slower.

QCHighing · 2022-10-02T03:55:17Z

Having the same issue with Docker under WSL.
I need to set pin_memory=False for the PyTorch DataLoader.

ltorres6 · 2022-10-06T04:51:26Z

I have the same pinned memory issue with Ubuntu 18.04 and CuPy. It seems that I can only allocate a small amount of pinned memory (much smaller than RAM and VRAM). ulimit reports 64 kilobytes as my maximum locked memory, but I can't seem to change this setting to test whether it is the cause. Does anyone have a workaround? Not being able to pin memory is a huge performance hit.
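
The locked-memory limit mentioned here can also be inspected from Python with the standard resource module, which reports the same limit that ulimit -l shows (in bytes rather than KiB). A minimal sketch follows; the helper names are illustrative, and whether this limit is what actually restricts CUDA pinned allocations under WSL 2 is not established anywhere in this thread.

```python
import resource


def format_limit(value):
    """Render an rlimit value in KiB, the unit that `ulimit -l` uses."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"soft limit: {format_limit(soft)}")
    print(f"hard limit: {format_limit(hard)}")
```

An unprivileged process can raise its soft limit only as far as the hard limit, which may be why attempts to change the setting from the shell have no effect.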

satoshi-ikehata · 2022-10-13T09:38:57Z

Same problem here. I got an "out of gpu memory" error when using pin_memory=True in PyTorch on WSL2 (Windows 11).
Setting pin_memory=False resolved the error but sacrificed the performance gain.

al1enjesus · 2022-12-17T18:32:36Z

Hi all!
I had the same problem: OOM, although there should be enough video memory.
Using the script from here (with cpu replaced by cuda:0), you can see that WSL won't allow the available video memory to be allocated.
huggingface/diffusers#807 (comment)

It took me three weekends to solve. I tried various fixes: building CUDA from scratch, different Ubuntu versions, different Windows versions.

In the end I switched from Windows 10 to Windows 11 Pro
11.0.22621 Build 22621
But even on Windows 11 with WSL installed and the Ubuntu 22.04 distribution, the bug was still there.
What helped was installing exactly Ubuntu 20.04 on Windows 11. The driver is installed on Windows; on Ubuntu I use Miniconda, without any manual CUDA setup.
Ubuntu from here: https://www.microsoft.com/store/productId/9MTTCL66CPXJ.
Be sure to update WSL and Ubuntu, and reboot the entire system at the end.

Yeah, it's not normal to have to do rain dances to get things working under WSL, so it needs a fix. But if you need it urgently, this is what worked for me.

MrWong99 · 2024-05-30T22:06:15Z

Got the same issue today with https://github.com/ggerganov/whisper.cpp

WSL 2
Nvidia RTX 4090
Arch Linux
cuda 12.5.0-1

LordMilutin · 2024-06-28T14:30:22Z

I have the same issue. Is there no fix for this?

ghost added the needs-author-feedback label May 31, 2022

ghost removed the needs-author-feedback label Jun 1, 2022

OneBlue added the needs-author-feedback label Jun 7, 2022

ghost removed the needs-author-feedback label Jun 8, 2022

before31 mentioned this issue Aug 9, 2022

invoking cuda in container, got the error: out of memory. docker/for-win#12733

Closed

Thomas-MMJ mentioned this issue Oct 12, 2022

Dreambooth doesn't train on 8GB huggingface/diffusers#807

Closed

strint mentioned this issue Feb 10, 2023

test_pipelines_oneflow_graph_load out of host memory error in WSL siliconflow/onediff#95

Closed

ekiwi111 mentioned this issue Feb 23, 2023

RuntimeError: CUDA error: out of memory | WSL2 | RTX 3090 | OPT-6.7B FMInference/FlexGen#47

Closed

Priestru mentioned this issue May 2, 2023

WSL: CUDA error 2 at ggml-cuda.cu:359: out of memory (Fix found) ggerganov/llama.cpp#1230

Closed

coder543 mentioned this issue Aug 31, 2023

(WSL2) RuntimeError: CUDA failed with error out of memory SYSTRAN/faster-whisper#442

Open

sunmy2019 mentioned this issue Oct 28, 2023

Training fails under WSL2 Ubuntu, but works fine under Windows. RVC-Project/Retrieval-based-Voice-Conversion-WebUI#1482

Closed

microsoft-github-policy-service bot closed this as completed Feb 1, 2024
