got the error: out of memory ,when invoke cuda in wsl2. #8447

Closed
1 of 2 tasks
before31 opened this issue May 26, 2022 · 23 comments

Comments

@before31

before31 commented May 26, 2022

Version

Microsoft Windows [Version 10.0.19044.1706]

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.10.102.1

Distro Version

Ubuntu 20.04

Other Software

NVIDIA driver (on Windows), version: 512.77
CUDA (installed in WSL2; only the CUDA toolkit, without reinstalling the NVIDIA driver inside WSL2), version: 10.2 and 11.6. I tried both and got the same error.

Repro Steps

  1. Set up Windows 10 21H2.
  2. Enable WSL2.
  3. Install the latest NVIDIA driver, version 512.77 (this machine has 4 × 1080 Ti GPUs).
  4. Install the CUDA toolkit in WSL2, following the instructions here.
  5. In WSL2 (Ubuntu 20.04), run a Java application that invokes CUDA through JNA. In fact, any CUDA app gets the same error; this is only a sample app to simplify reproduction. It looks like this:

public static void main(String[] args) {
    // Log how JNA locates and loads the native library.
    System.setProperty("jna.debug_load", "true");
    int[] deviceCount = new int[1];
    // WxCudaLibrary is the reporter's JNA binding for the CUDA runtime.
    int result = WxCudaLibrary.INSTANCE.cudaGetDeviceCount(deviceCount);
    if (result == 0) {
        System.out.println("gpu check success:" + deviceCount[0]);
    } else {
        String msg = WxCudaLibrary.INSTANCE.cudaGetErrorString(result);
        System.out.println("gpu check failed:" + result + ",msg:" + msg);
    }
}

  6. The app prints: gpu check failed:2,msg:out of memory
    The same application runs fine on Windows (with the library name changed).
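
To check whether the failure is specific to the JNA binding, the same runtime call can be made from Python with ctypes. This is a minimal sketch, not the reporter's code. The library name libcudart.so is an assumption (a toolkit install may only provide a versioned file, in which case pass that name instead), and the helper name cuda_device_count is made up for illustration. Error code 2 in the report corresponds to cudaErrorMemoryAllocation in the CUDA runtime API.

```python
import ctypes


def cuda_device_count(library="libcudart.so"):
    """Call cudaGetDeviceCount through ctypes.

    Returns (error_code, message, device_count), or None when the CUDA
    runtime library cannot be loaded. The default library name is an
    assumption; adjust it to match the installed toolkit.
    """
    try:
        cudart = ctypes.CDLL(library)
    except OSError:
        return None

    # Declare the C signatures so ctypes converts arguments correctly.
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    code = cudart.cudaGetDeviceCount(ctypes.byref(count))
    message = cudart.cudaGetErrorString(code).decode()
    return code, message, count.value


if __name__ == "__main__":
    result = cuda_device_count()
    if result is None:
        print("CUDA runtime library not found")
    elif result[0] == 0:
        print("gpu check success:", result[2])
    else:
        print(f"gpu check failed:{result[0]},msg:{result[1]}")
```

If this prints the same code 2 inside WSL2, the problem sits below the Java layer, which matches the report that any CUDA app fails the same way.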

Expected Behavior

I can invoke cuda in wsl2 normally.

Actual Behavior

  • Every CUDA app gets the same error: out of memory.
  • In WSL2, nvidia-smi prints:
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.68.02    Driver Version: 512.77       CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
    | 23%   33C    P8    10W / 250W |    541MiB / 11264MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
    | 23%   29C    P8    11W / 250W |      0MiB / 11264MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  NVIDIA GeForce ...  On   | 00000000:82:00.0 Off |                  N/A |
    | 23%   28C    P8    10W / 250W |      0MiB / 11264MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  NVIDIA GeForce ...  On   | 00000000:83:00.0 Off |                  N/A |
    | 23%   27C    P8    10W / 250W |     11MiB / 11264MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
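
The table above shows nearly all memory free on every GPU, which is what makes the out of memory error surprising. As a rough sketch, the same numbers can be read programmatically through nvidia-smi's CSV query mode. The helper names parse_gpu_memory and query_gpu_memory are made up for illustration, and the parser assumes every field is numeric.

```python
import subprocess

# nvidia-smi's query mode prints one "index, used, total" row per GPU (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn 'index, used, total' rows into dicts that include free MiB."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Assumes numeric fields; a non-numeric value raises ValueError.
        index, used, total = (int(field) for field in line.split(","))
        rows.append(
            {"gpu": index, "used_mib": used, "total_mib": total, "free_mib": total - used}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi and parse its output (requires nvidia-smi on PATH)."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(output)


if __name__ == "__main__":
    # Values copied from the table in this report.
    sample = "0, 541, 11264\n1, 0, 11264\n2, 0, 11264\n3, 11, 11264\n"
    for row in parse_gpu_memory(sample):
        print(row)
```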

Diagnostic Logs

No response

@before31
Author

Is anyone here?

@OneBlue
Collaborator

OneBlue commented May 31, 2022

/logs

@ghost

ghost commented May 31, 2022

Hello! Could you please provide more logs to help us better diagnose your issue?

To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed, please upload the output files to this GitHub issue.

Click here for more info on logging

Thank you!

@before31
Author

before31 commented Jun 1, 2022

Here are the logs. Thanks.

WslLogs-2022-06-01_10-08-04.zip

@ghost ghost removed the needs-author-feedback label Jun 1, 2022
@before31
Author

before31 commented Jun 2, 2022

In addition, my WDDM version is 1.3, and I don't know whether that has any impact. I expected the WDDM version to change to 3.x after installing the latest NVIDIA driver.

image

@before31
Author

before31 commented Jun 6, 2022

Is there anything new?

@OneBlue
Collaborator

OneBlue commented Jun 7, 2022

Thanks for the logs @before31. Once you get this out of memory error, can you also share the output of dmesg inside WSL?

@before31
Author

before31 commented Jun 8, 2022

  1. I ran my CUDA app and got the out of memory error.
  2. I executed dmesg >dmesg.log inside WSL. Here is the output:
    dmesg.log

Thanks. @OneBlue

@ghost ghost removed the needs-author-feedback label Jun 8, 2022
@before31
Author

Any updates?

@xiaojinyu-hhu

How do you fix this problem?

@before31
Author

before31 commented Aug 9, 2022

How do you fix this problem?

The problem remains unresolved.

@Abdullah-Aldosari

What is the PC/workstation that you are using?

@xiaojinyu-hhu

CPU
Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
Supermicro X12DPG-OA6
DirectX 12
No TPM 2.0 (so this PC cannot be updated to Windows 11)
System
Windows 10 22H2 19045.1865

The hardware may not support it.

@Abdullah-Aldosari

I am using a Dell Precision tower with Windows 11 and four Quadro RTX 4000 GPUs, and I am running into the same issue.

@CanisLupus518

I am experiencing the same issue with Windows 11, an i7-12700KF, and an RTX 3070 Ti. The used-memory graphs reach only about half of the available GPU RAM before this error appears:
RuntimeError: CUDA error: out of memory

@thoj

thoj commented Sep 2, 2022

On my system this has something to do with pin_memory in PyTorch. Once I set pin_memory=False, I can use all the memory on the GPU. Unfortunately, this also reduces performance quite a bit.

Probably something to do with this:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations-for-linux-cuda-apps

Hope a fix is possible.
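
One way to apply this workaround only where it is needed is to turn pinning off when the process runs under WSL and leave it on elsewhere. The sketch below is an illustration under assumptions, not a fix: is_wsl and dataloader_kwargs are made-up helper names, and detection relies on the WSL kernel identifying itself with "microsoft" in /proc/version (for example 5.10.102.1-microsoft-standard-WSL2).

```python
from pathlib import Path


def is_wsl(version_text=None):
    """Return True when the kernel version string looks like a WSL kernel.

    version_text can be passed in for testing; by default /proc/version
    is read, and a missing file (non-Linux host) counts as "not WSL".
    """
    if version_text is None:
        try:
            version_text = Path("/proc/version").read_text()
        except OSError:
            return False
    return "microsoft" in version_text.lower()


def dataloader_kwargs(batch_size, num_workers=0, version_text=None):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Pinned memory is disabled under WSL, trading transfer speed for
    avoiding the out of memory error described in this thread.
    """
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": not is_wsl(version_text),
    }


if __name__ == "__main__":
    print(dataloader_kwargs(32))
```

With PyTorch installed, the result would be passed along as DataLoader(dataset, **dataloader_kwargs(32)). The performance cost reported above still applies whenever pinning is off.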

@maloletnik

Having the same issue with Docker under WSL.
Setting pin_memory=False also makes the CUDA memory error go away, but the training process is much slower.

@QCHighing

Having the same issue with Docker under WSL.
I need to set pin_memory=False for the PyTorch DataLoader.

@ltorres6

ltorres6 commented Oct 6, 2022

I have the same pinned memory issue with Ubuntu 18.04 and CuPy. It seems that I can pin only a small amount of memory (much smaller than RAM and VRAM). ulimit reports 64 kilobytes as my maximum locked memory, but I can't seem to change this setting to test whether it is the cause. Does anyone else have a workaround? Not being able to pin memory makes for a huge performance hit.
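
The locked-memory limit mentioned here can be inspected, and its soft value raised up to the hard limit, from inside the Python process with the standard resource module. This is only a sketch for running that experiment: the helper names are made up, and nothing in this thread establishes that RLIMIT_MEMLOCK is what limits CUDA pinned memory under WSL2.

```python
import resource


def format_limit(value):
    """Render an rlimit value (bytes) the way `ulimit -l` thinks of it (KiB)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def raise_soft_memlock():
    """Raise the soft limit to the hard limit for this process and its children.

    Going beyond the hard limit needs elevated privileges and is not attempted.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    resource.setrlimit(resource.RLIMIT_MEMLOCK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft:", format_limit(soft), "hard:", format_limit(hard))
```

Calling raise_soft_memlock() before importing the GPU library would show whether a higher soft limit changes how much memory can be pinned.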

@satoshi-ikehata

Same problem here. I got an "out of gpu memory" error when using pin_memory=True in PyTorch on WSL2 (Windows 11).
Setting pin_memory=False resolved the error but sacrificed the performance gain.

@al1enjesus

Hi all!
I had the same problem: OOM, although there should be enough video memory.
Using the script from here and replacing cpu with cuda:0, you can see that WSL won't allow the available video memory to be allocated.
huggingface/diffusers#807 (comment)
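
The linked script is not reproduced in this thread. As a stand-in, here is a hypothetical probe that binary-searches for the largest single allocation that succeeds. The function name largest_allocation and its parameters are made up for illustration, and the allocator is passed in so the search itself does not depend on any GPU library. It assumes that if a given size fails, every larger size fails too.

```python
def largest_allocation(try_alloc, upper_bytes, step=1 << 20):
    """Binary-search the largest size (in bytes) that try_alloc accepts.

    try_alloc(n) must return an object on success and raise MemoryError
    or RuntimeError on failure. The result is accurate to within `step`
    bytes and never exceeds upper_bytes.
    """
    low, high = 0, upper_bytes  # low is treated as succeeding, high as the ceiling
    while high - low > step:
        middle = (low + high) // 2
        try:
            block = try_alloc(middle)
        except (MemoryError, RuntimeError):
            high = middle  # too large: search the lower half
        else:
            del block  # release before trying a larger size
            low = middle
    return low


if __name__ == "__main__":
    # Stand-in allocator that refuses anything above 300 MiB.
    limit = 300 << 20

    def fake_alloc(size):
        if size > limit:
            raise MemoryError("simulated out of memory")
        return object()

    print(largest_allocation(fake_alloc, upper_bytes=1 << 30) >> 20, "MiB")
```

With PyTorch installed, the allocator could be lambda n: torch.empty(n, dtype=torch.uint8, device="cuda:0"), with upper_bytes set to the card's total memory. A result far below the free memory nvidia-smi reports would be consistent with the behavior described above.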

It took me three weekends to solve it. I tried various fixes: building CUDA from scratch, different Ubuntu versions, different Windows versions.

In the end I switched from Windows 10 to Windows 11 Pro
11.0.22621 Build 22621
But even on Windows 11 with WSL installed, the bug was still there with the Ubuntu 22.04 distribution.
Installing exactly Ubuntu 20.04 on Windows 11 helped. The driver is installed in Windows; on Ubuntu I use miniconda, without any manual CUDA setup.
Ubuntu from here: https://www.microsoft.com/store/productId/9MTTCL66CPXJ.
Be sure to update WSL and Ubuntu, and reboot the entire system at the end.

Yeah, it's not normal to have to do rain dances to get things working under WSL, so it needs a fix. But if you need it urgently, this is what worked for me.

@MrWong99

Got the same issue today with https://github.com/ggerganov/whisper.cpp

WSL 2
Nvidia RTX 4090
Arch Linux
cuda 12.5.0-1

@LordMilutin

I have the same issue. Is there no fix for this?
