SPINEPS: RuntimeError: CUDA error: out of memory #24
The first thing I would suggest is to run `nvidia-smi`:

```
PS C:\Users\Joshua> nvidia-smi
Tue Apr 23 13:12:41 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.01                 Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1050      WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8              N/A / ERR! |      0MiB /  4096MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

I would be curious to see what the output of this is immediately prior to the spineps call. Would it be possible to add this to the script? At the very least, this will hopefully help us isolate the problem, since we should be able to tell whether it is due to SPINEPS' memory requirements, or due to other processes using the GPU memory. You may also want to set some additional memory-related options in your environment.
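To make that check easy to script, here is a minimal sketch of querying free GPU memory right before the spineps call. The helper names are my own, and it assumes `nvidia-smi` is on the PATH (the `--query-gpu=memory.free --format=csv,noheader` options are standard `nvidia-smi` flags):

```python
import subprocess

def parse_free_mib(csv_line: str) -> int:
    """Parse a line like '3900 MiB', as produced by
    nvidia-smi --query-gpu=memory.free --format=csv,noheader."""
    return int(csv_line.strip().split()[0])

def gpu_free_mib(gpu_index: int = 0) -> int:
    """Return the free memory (in MiB) of one GPU, queried via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader"],
        text=True,
    )
    return parse_free_mib(out.splitlines()[gpu_index])

# Usage (requires the NVIDIA driver / nvidia-smi on PATH), e.g. right
# before the spineps call in a processing script:
#   print(f"Free GPU memory before spineps: {gpu_free_mib()} MiB")
```

Logging this number immediately before the call should show whether other processes are already eating into the 4 GiB reported above.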
Thanks @joshuacwnewton; this is the information about the GPU, and the log file of the test subject where you can see the output before calling SPINEPS. I'm trying again to install the SCT on Compute Canada; I'll let you know how it works.
@NathanMolinier, can SPINEPS run on CPU only?
I don't think it is implemented in the command line yet, but I talked to Hendrik about it, and he mentioned that the code checks whether a GPU is available before running, so it should not be "too difficult" to change. I will ask if it's possible to allow the code to run on CPU only.
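In the meantime, a common workaround for tools that auto-detect a GPU (assuming SPINEPS relies on a check like `torch.cuda.is_available()`, as described above) is to hide the CUDA devices from the process via the `CUDA_VISIBLE_DEVICES` environment variable. A sketch, with a hypothetical command line:

```python
import os
import subprocess

def cpu_only_env(base_env):
    """Return a copy of the environment with every CUDA device hidden,
    so a GPU-availability check in the child process reports no GPU."""
    env = dict(base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env

# Hypothetical invocation -- substitute the real spineps command line:
#   subprocess.run(["spineps", "sample", "-i", "input.nii.gz"],
#                  env=cpu_only_env(os.environ))
```

Whether SPINEPS then actually runs on CPU depends on how it selects its device internally, so treat this as an experiment rather than a supported mode.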
Ah!! That at least partially explains things. You have quite a few memory-intensive processes running on your GPU. Normally, I would expect these processes to run on your CPU's integrated GPU instead, leaving the deep learning tasks solely to your GTX 750 GPU. I am wondering -- are you SSH'ing into a remote machine and forwarding the display, by chance? (This seems to be a common theme when people talk about Xorg taking up GPU resources in discussions online.) Either way, you may want to look into help posts such as: https://askubuntu.com/questions/1279809/prevent-usr-lib-xorg-xorg-from-using-gpu-memory-in-ubuntu-20-04-server. The end goal is probably something like what is shown in this answer: https://askubuntu.com/a/1313440. Given that your GPU is relatively old (the GTX 750 is from 2014) and has somewhat limited memory (2GB), you will want to take steps to conserve as much memory as you can, so that as much as possible is left when running inference on the GPU.
Yes, I'm running the analysis on the institute's server. I wasn't running any other process while running the script... Not sure I can spare GPU memory...
Won't fix (see #26)
Following our discussion in issue #21:
After setting the TMPDIR shell variable to the directory where the data is stored, the "batch_processing.sh" script finally starts running and creates a tmp folder. Then, it exits with this error:
After exporting these variables:

```
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export PYTORCH_NO_CUDA_MEMORY_CACHING=1
```

I got a new error. I then also exported this variable:

```
export CUDA_LAUNCH_BLOCKING=1
```

and ended up with yet another error. Any insights? Thanks for your help!
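For reference, these three variables can also be set from Python, as long as that happens before any CUDA work starts (they are read when the CUDA runtime and the PyTorch allocator initialize). A sketch:

```python
import os

# Allocator/debugging options discussed above; must be set before the
# first CUDA call of the process.
DEBUG_CUDA_ENV = {
    # Let the caching allocator grow segments instead of reserving
    # fixed-size blocks, which can reduce fragmentation.
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
    # Disable the caching allocator entirely (slower, but out-of-memory
    # reports then reflect actual usage).
    "PYTORCH_NO_CUDA_MEMORY_CACHING": "1",
    # Make kernel launches synchronous, so the traceback points at the
    # operation that actually failed.
    "CUDA_LAUNCH_BLOCKING": "1",
}
os.environ.update(DEBUG_CUDA_ENV)
```

Note that `CUDA_LAUNCH_BLOCKING=1` does not fix anything by itself; it only makes the reported error location accurate, which is why the error message changes rather than disappears.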