
RuntimeError: CUDA out of memory. #19

Closed
grenaud opened this issue Sep 30, 2020 · 21 comments

@grenaud

grenaud commented Sep 30, 2020

I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 5.80 GiB total capacity; 4.14 GiB already allocated; 154.56 MiB free; 4.24 GiB reserved in total by PyTorch)

Is there a way to allocate more memory? I do not understand why 4.14 GiB is already allocated.

@zhangmozhe
Contributor

Hi, there are four stages in our processing pipeline. At which stage do you run into this issue?

@grenaud
Author

grenaud commented Oct 2, 2020

first stage

@eltechno

eltechno commented Oct 2, 2020

@grenaud try NVtop, just to get a full view of what is using your GPU and how much memory each process is consuming.
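You can also ask PyTorch itself what it is holding, from inside the process. A generic snippet using standard torch.cuda APIs (nothing project-specific); "allocated" is memory held by live tensors, while "reserved" additionally counts the caching allocator's pool, which is the 4.24 GiB the error message reports:

import torch

print(torch.cuda.memory_allocated(0) / 2**20, "MiB held by live tensors")
print(torch.cuda.memory_reserved(0) / 2**20, "MiB reserved by the caching allocator")
print(torch.cuda.memory_summary(0))  # detailed per-pool breakdown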

@grenaud
Author

grenaud commented Oct 2, 2020

Amazing tool, thank you! It seems the program tries to allocate all of my 5.8 GiB of GPU memory; my OS and other applications use very little of it. But thank you for the suggestion!

@mebelz

mebelz commented Oct 2, 2020

I just ran into the same issue; for me it was:
RuntimeError: CUDA out of memory. Tried to allocate 734.00 MiB (GPU 0; 10.74 GiB total capacity; 7.82 GiB already allocated; 195.75 MiB free; 9.00 GiB reserved in total by PyTorch)

I was able to fix it with the following steps:

  1. In run.py I changed test_mode to Scale / Crop to confirm that this actually fixes the issue -> the input picture was too large.
  2. I rewrote data_transforms in test.py to scale not to a 256 px max dimension but to a total area of 1.3 Mpx (which seems to be the maximum my card can handle).
  3. The for-loop at the end of test.py seems to leak GPU memory (the 1st iteration worked while the 2nd, 3rd, ... didn't). I extracted the loop body into a new function to let Python garbage-collect the temporary variables (see the sketch below).

I would still love to be able to process full-resolution pictures if anyone has a solution.
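Roughly what I mean for steps 2 and 3, as a minimal sketch; scale_to_area, process_one and the 1.3 Mpx constant are my own illustrative names and numbers, not the repository's actual code:

import math

import torch
import torchvision.transforms as transforms
from PIL import Image

MAX_PIXELS = 1.3e6  # total-area budget instead of a 256 px max dimension

def scale_to_area(img, max_pixels=MAX_PIXELS):
    """Downscale so that width * height <= max_pixels, keeping the aspect ratio."""
    w, h = img.size
    if w * h <= max_pixels:
        return img
    factor = math.sqrt(max_pixels / (w * h))
    return img.resize((int(w * factor), int(h * factor)), Image.LANCZOS)

def process_one(model, img_path, device="cuda"):
    """The old loop body, extracted so GPU references die with the call frame."""
    img = scale_to_area(Image.open(img_path).convert("RGB"))
    x = transforms.ToTensor()(img).unsqueeze(0).to(device)
    with torch.no_grad():     # inference only: keep no autograd graph
        out = model(x)
    result = out.cpu()        # move the result off the GPU right away
    del x, out                # drop the locals' GPU references
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
    return result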

@manikantanr

Facing the same issue.

Machine

  • Windows 10 (10.0.19041 Build 19041)
  • Intel i7 8700K, 16 GB RAM
  • Nvidia GTX 1060 6 GB (Nvidia CUDA 10.1.120 driver)

Command

python run.py --input_folder "C:\Users\meman\Desktop\input"  --output_folder "C:\Users\meman\Desktop\output" --GPU 0 --with_scratch

Error

Running Stage 1: Overall restoration                                                                                                                                                                                   
initializing the dataloader                                                                                                                                                                                            
model weights loaded                                                                                                                                                                                                   
directory of testing image: C:\Users\meman\Desktop\input                                                                                                                                                               
processing 1.jpg                                                                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                                     
  File "detection.py", line 151, in <module>                                                                                                                                                                           
    main(config)                                                                                                                                                                                                       
  File "detection.py", line 123, in main                                                                                                                                                                               
    P = torch.sigmoid(model(scratch_image))                                                                                                                                                                            
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl                                                                            
    result = self.forward(*input, **kwargs)                                                                                                                                                                            
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\Global\detection_models\networks.py", line 115, in forward                                                                                             
    x = self.down_sample[i](x)                                                                                                                                                                                         
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl                                                                            
    result = self.forward(*input, **kwargs)                                                                                                                                                                            
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\modules\container.py", line 117, in forward                                                                            
    input = module(input)                                                                                                                                                                                              
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl                                                                            
    result = self.forward(*input, **kwargs)                                                                                                                                                                            
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\modules\padding.py", line 170, in forward                                                                              
    return F.pad(input, self.padding, 'reflect')                                                                                                                                                                       
  File "C:\Users\meman\Desktop\Bringing-Old-Photos-Back-to-Life\venv\lib\site-packages\torch\nn\functional.py", line 3569, in _pad                                                                                     
    return torch._C._nn.reflection_pad2d(input, pad)                                                                                                                                                                   
RuntimeError: CUDA out of memory. Tried to allocate 2.35 GiB (GPU 0; 6.00 GiB total capacity; 2.56 GiB already allocated; 2.12 GiB free; 2.57 GiB reserved in total by PyTorch)                                        
You are using NL + Res                                                                                                                                                                                                 
Finish Stage 1 ...                                                                                                                                                                                                     

@absorbguo

Same error here:
GPU: Titan X with 12 GB of CUDA memory, still not working
Input image size: ~1500 × ~2000 pixels
It collapses at stage 1.
Is it possible to modify some parameters to make it work?

@akenateb

akenateb commented Nov 4, 2020

I get the same error as @grenaud above:
RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 5.80 GiB total capacity; 4.14 GiB already allocated; 154.56 MiB free; 4.24 GiB reserved in total by PyTorch)

And on my setup:
GPU: 1070 Ti
CUDA out of memory. Tried to allocate 570.00 MiB (GPU 0; 14.73 GiB total capacity; 12.22 GiB already allocated; 439.88 MiB free; 13.42 GiB reserved in total by PyTorch)

@akenateb

A simple workaround is to reduce your image size to below 640x480. Moreover, if you use --with_scratch, GPU memory usage increases dramatically. But it is a pity to have to downscale your photo; with DeOldify this never happens.

@nikdanilov

You can also use a GPU with ~14 GB of memory in Google Colab:
https://colab.research.google.com/drive/1NEm6AsybIiC5TwTU_4DqDkQO0nFRB-uA?usp=sharing#scrollTo=KvqDOPXnLmkl

@nikdanilov

nikdanilov commented Nov 22, 2020

Same error here:
GPU: Titan X with 12 GB of CUDA memory, still not working
Input image size: ~1500 × ~2000 pixels
It collapses at stage 1.
Is it possible to modify some parameters to make it work?

The collapse happens at the step where the image is fed into the UNet model:
https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life/blob/master/Global/detection.py#L127

I didn't manage to process big images, but it does work in Colab with images around ~700 px.
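If that line is not already under torch.no_grad(), wrapping it and catching the OOM per image helps; a sketch using the names from the traceback above (model, scratch_image):

import torch

try:
    with torch.no_grad():  # inference only: don't keep activations for backprop
        P = torch.sigmoid(model(scratch_image))
except RuntimeError as e:
    if "out of memory" not in str(e):
        raise
    torch.cuda.empty_cache()  # release the allocator's cached blocks
    print("skipping image: too large for this GPU")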

@kuznetcoff777

kuznetcoff777 commented Feb 14, 2021

You can also use a GPU with ~14 GB of memory in Google Colab:
https://colab.research.google.com/drive/1NEm6AsybIiC5TwTU_4DqDkQO0nFRB-uA?usp=sharing#scrollTo=KvqDOPXnLmkl

I attempted it with a paid subscription and a Tesla V100:

Sun Feb 14 10:37:09 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    22W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Your runtime has 27.4 gigabytes of available RAM
You are using a high-RAM runtime!

And still the same issue (even without the --with_scratch option):

Running Stage 1: Overall restoration
Now you are processing 1.jpeg
Skip 1.jpeg due to an error:
CUDA out of memory. Tried to allocate 486.00 MiB (GPU 0; 15.78 GiB total capacity; 13.34 GiB already allocated; 308.75 MiB free; 14.28 GiB reserved in total by PyTorch)
Now you are processing 10.jpeg
Skip 10.jpeg due to an error:
CUDA out of memory. Tried to allocate 1.07 GiB (GPU 0; 15.78 GiB total capacity; 12.76 GiB already allocated; 610.75 MiB free; 13.99 GiB reserved in total by PyTorch)
Now you are processing 12.1.jpeg
Now you are processing 12.2.jpeg
Skip 12.2.jpeg due to an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 14.50 GiB already allocated; 16.75 MiB free; 14.57 GiB reserved in total by PyTorch)
Now you are processing 12.3.jpeg
Skip 12.3.jpeg due to an error:
CUDA out of memory. Tried to allocate 262.00 MiB (GPU 0; 15.78 GiB total capacity; 14.04 GiB already allocated; 206.75 MiB free; 14.38 GiB reserved in total by PyTorch)
Now you are processing 12.4.jpeg
Skip 12.4.jpeg due to an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 14.47 GiB already allocated; 20.75 MiB free; 14.56 GiB reserved in total by PyTorch)
Now you are processing 12.5.jpeg
Skip 12.5.jpeg due to an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 14.51 GiB already allocated; 4.75 MiB free; 14.58 GiB reserved in total by PyTorch)
Now you are processing 13.1.jpeg
Skip 13.1.jpeg due to an error:
CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 15.78 GiB total capacity; 14.30 GiB already allocated; 66.75 MiB free; 14.52 GiB reserved in total by PyTorch)
Now you are processing 13.2.jpeg
Skip 13.2.jpeg due to an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 14.49 GiB already allocated; 12.75 MiB free; 14.57 GiB reserved in total by PyTorch)
Now you are processing 14.jpeg
Skip 14.jpeg due to an error:
CUDA out of memory. Tried to allocate 1.07 GiB (GPU 0; 15.78 GiB total capacity; 12.57 GiB already allocated; 372.75 MiB free; 14.22 GiB reserved in total by PyTorch)
Now you are processing 15.1.jpg
Skip 15.1.jpg due to an error:
CUDA out of memory. Tried to allocate 174.00 MiB (GPU 0; 15.78 GiB total capacity; 14.22 GiB already allocated; 36.75 MiB free; 14.55 GiB reserved in total by PyTorch)
Now you are processing 15.10.jpg
Skip 15.10.jpg due to an error:
CUDA out of memory. Tried to allocate 276.00 MiB (GPU 0; 15.78 GiB total capacity; 14.36 GiB already allocated; 90.75 MiB free; 14.49 GiB reserved in total by PyTorch)
Now you are processing 15.11.jpg
Now you are processing 15.12.jpg
Now you are processing 15.13.jpg
Now you are processing 15.2.jpg
Now you are processing 15.3.jpg
Now you are processing 15.4.jpg
Now you are processing 15.5.jpg
Now you are processing 15.6.jpg
Now you are processing 15.7.jpg
Now you are processing 15.8.jpg
Now you are processing 15.9.jpg
Now you are processing 15.jpeg
Skip 15.jpeg due to an error:
CUDA out of memory. Tried to allocate 8.53 GiB (GPU 0; 15.78 GiB total capacity; 4.90 GiB already allocated; 6.73 GiB free; 7.86 GiB reserved in total by PyTorch)
Now you are processing 2.jpeg
Skip 2.jpeg due to an error:
CUDA out of memory. Tried to allocate 8.53 GiB (GPU 0; 15.78 GiB total capacity; 5.30 GiB already allocated; 6.73 GiB free; 7.86 GiB reserved in total by PyTorch)
Now you are processing 3.jpeg
Skip 3.jpeg due to an error:
CUDA out of memory. Tried to allocate 8.53 GiB (GPU 0; 15.78 GiB total capacity; 5.70 GiB already allocated; 6.73 GiB free; 7.86 GiB reserved in total by PyTorch)
Now you are processing 4.1.jpeg
Skip 4.1.jpeg due to an error:
CUDA out of memory. Tried to allocate 262.00 MiB (GPU 0; 15.78 GiB total capacity; 13.78 GiB already allocated; 82.75 MiB free; 14.50 GiB reserved in total by PyTorch)
Now you are processing 4.2.jpeg
Now you are processing 4.3.jpeg
Skip 4.3.jpeg due to an error:
CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.78 GiB total capacity; 14.24 GiB already allocated; 110.75 MiB free; 14.47 GiB reserved in total by PyTorch)
Now you are processing 4.4.jpeg
Now you are processing 4.5.jpeg
Now you are processing 5.1.jpeg
Skip 5.1.jpeg due to an error:
CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.78 GiB total capacity; 14.05 GiB already allocated; 106.75 MiB free; 14.48 GiB reserved in total by PyTorch)
Now you are processing 5.2.jpeg
Skip 5.2.jpeg due to an error:
CUDA out of memory. Tried to allocate 164.00 MiB (GPU 0; 15.78 GiB total capacity; 13.95 GiB already allocated; 98.75 MiB free; 14.49 GiB reserved in total by PyTorch)
Now you are processing 5.3.jpeg
Skip 5.3.jpeg due to an error:
CUDA out of memory. Tried to allocate 190.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 100.75 MiB free; 14.48 GiB reserved in total by PyTorch)
Now you are processing 5.4.jpeg
Skip 5.4.jpeg due to an error:
CUDA out of memory. Tried to allocate 166.00 MiB (GPU 0; 15.78 GiB total capacity; 14.00 GiB already allocated; 88.75 MiB free; 14.50 GiB reserved in total by PyTorch)
Now you are processing 5.5.jpeg
Skip 5.5.jpeg due to an error:
CUDA out of memory. Tried to allocate 174.00 MiB (GPU 0; 15.78 GiB total capacity; 14.07 GiB already allocated; 36.75 MiB free; 14.55 GiB reserved in total by PyTorch)
Now you are processing 5.6.jpeg
Now you are processing 6.1.jpeg
Skip 6.1.jpeg due to an error:
CUDA out of memory. Tried to allocate 342.00 MiB (GPU 0; 15.78 GiB total capacity; 14.21 GiB already allocated; 212.75 MiB free; 14.38 GiB reserved in total by PyTorch)
Now you are processing 6.2.jpeg
Skip 6.2.jpeg due to an error:
CUDA out of memory. Tried to allocate 252.00 MiB (GPU 0; 15.78 GiB total capacity; 12.96 GiB already allocated; 238.75 MiB free; 14.35 GiB reserved in total by PyTorch)
Now you are processing 6.3.jpeg
Now you are processing 6.4.jpeg
Now you are processing 6.5.jpeg
Now you are processing 6.6.jpeg
Now you are processing 6.7.jpeg
Now you are processing 8.1.jpeg
Skip 8.1.jpeg due to an error:
CUDA out of memory. Tried to allocate 194.00 MiB (GPU 0; 15.78 GiB total capacity; 13.78 GiB already allocated; 146.75 MiB free; 14.44 GiB reserved in total by PyTorch)
Now you are processing 8.2.jpeg
Skip 8.2.jpeg due to an error:
CUDA out of memory. Tried to allocate 274.00 MiB (GPU 0; 15.78 GiB total capacity; 13.63 GiB already allocated; 220.75 MiB free; 14.37 GiB reserved in total by PyTorch)
Now you are processing 8.3.jpeg
Skip 8.3.jpeg due to an error:
CUDA out of memory. Tried to allocate 270.00 MiB (GPU 0; 15.78 GiB total capacity; 13.59 GiB already allocated; 234.75 MiB free; 14.35 GiB reserved in total by PyTorch)
Finish Stage 1 ...

How can this be solved? I don't want to downscale the images. It works with small images, but that is not right; we fix one thing while making another worse.

@Disainer

Hello.
I tried processing different images at different capacities, going up to 32 GB of GPU memory, but I still get an error.
Image: 800×1200

16 GB
CUDA out of memory.
Tried to allocate 13.41 GiB (GPU 0;
14.76 GiB total capacity;
1.32 GiB already allocated;
7.68 GiB free;
6.13 GiB reserved in total by PyTorch)

32 GB
CUDA out of memory.
Tried to allocate 13.41 GiB (GPU 0;
31.75 GiB total capacity;
14.73 GiB already allocated;
10.98 GiB free;
19.54 GiB reserved in total by PyTorch)

But in both cases we see "Tried to allocate 13.41 GiB", regardless of the card's capacity.
Do you have any ideas on this, or how to fix it?
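(For scale: each allocation request is sized by the input resolution and the network's layer widths, not by the card's capacity. One float32 activation of shape (n, c, h, w) occupies n*c*h*w*4 bytes; a back-of-envelope with hypothetical layer sizes:)

def feature_map_bytes(n, c, h, w, bytes_per_elem=4):
    """Memory for one float32 activation tensor of shape (n, c, h, w)."""
    return n * c * h * w * bytes_per_elem

# Hypothetical intermediate layer: 64 channels at twice the 800x1200
# input resolution (padding/upsampling inside the network).
print(feature_map_bytes(1, 64, 1600, 2400) / 2**30, "GiB")  # ~0.92 GiB

A handful of such tensors held at once, plus intermediate buffers, reaches the same 13.41 GiB request on any card.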

@ghosal-sg

I also got the same error while executing Stage 1 with an input image of 2414 × 3632 pixels:

Running Stage 1: Overall restoration
initializing the dataloader
model weights loaded
directory of testing image: /content/photo_restoration/test_images/upload
processing 2.jpg
You are using NL + Res
Now you are processing 2.png
Skip 2.png due to an error:
CUDA out of memory. Tried to allocate 1120.48 GiB (GPU 0; 14.76 GiB total capacity; 10.40 GiB already allocated; 1.23 GiB free; 12.61 GiB reserved in total by PyTorch)

However, it worked when I reduced the longer side of the image to 640 pixels while maintaining the aspect ratio. I really appreciate the authors' work; this is an amazing tool for hobbyist photographers and ML enthusiasts alike.
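The pre-shrink is a one-liner with PIL; a sketch (filename taken from the log above, quality value arbitrary):

from PIL import Image

img = Image.open("2.jpg")
img.thumbnail((640, 640), Image.LANCZOS)  # caps the longer side at 640 px, keeps aspect ratio
img.save("2_small.jpg", quality=95)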

@SLiang-liao

Same error on an RTX 3070!
python3 run.py --input_folder=input --output_folder=output --GPU=0 --with_scratch
Running Stage 1: Overall restoration
initializing the dataloader
model weights loaded
directory of testing image: /home/slleo_ubuntu/funy/Bringing-Old-Photos-Back-to-Life-master/input
processing 3.jpeg
You are using NL + Res
Now you are processing 3..png
Skip 3..png due to an error:
CUDA out of memory. Tried to allocate 3.44 GiB (GPU 0; 7.79 GiB total capacity; 4.21 GiB already allocated; 292.19 MiB free; 5.84 GiB reserved in total by PyTorch)
Finish Stage 1 ...

@jDavidnet

@grenaud try NVtop, just to get a full view of what is using your GPU and how much memory each process is consuming.

NVtop doesn't seem to work on Windows or in Ubuntu under WSL. I'll try again later when I boot into Ubuntu.

@jDavidnet

I'm new to all of this, but I'm wondering whether some of the tensors could be moved to CPU/system memory while other tensors are processed through the inference (forward) pass of the network.
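Something like the following is what I have in mind; net1 and net2 are stand-ins for the pipeline's submodules, not names from the repo:

import torch

def staged_forward(net1, net2, x):
    """Keep only one submodule resident on the GPU at a time."""
    net1.cuda()
    with torch.no_grad():
        h = net1(x.cuda())
    net1.cpu()                # evict the first submodule's weights to system RAM
    torch.cuda.empty_cache()

    net2.cuda()
    with torch.no_grad():
        y = net2(h)           # h stays on the GPU between stages
    net2.cpu()
    torch.cuda.empty_cache()
    return y.cpu()

It trades speed for memory, and judging by the sizes in the errors above the big allocations look like activations rather than weights, so this alone might not be enough.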

To get debugging working in PyCharm, I've had to rewrite some of the code to replace call with Popen (I'm not sure I'm doing it right yet); see the sketch below.
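Roughly like this (a generic sketch; I'm not claiming it matches the repo's exact call sites):

import subprocess
import sys

# subprocess.call() just blocks; Popen returns a handle so you can
# stream the child's output and keep the debugger attached.
proc = subprocess.Popen(
    [sys.executable, "detection.py", "--GPU", "0"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
for line in proc.stdout:
    print(line, end="")
proc.wait()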

I've also tried to get all of this working in a virtual environment (venv), as per the TensorFlow best practices, but there are several issues with getting each of the subprocesses to execute in the same virtual environment. I think I finally have it working, but debugging it might prove impossible on Windows 10; I may have to recreate my development environment in Ubuntu 20.04 LTS.

My Machine

  • CPU: AMD Ryzen Threadripper 2950x
  • MB: ASRock Taichi x399
  • RAM: 64 GB (4 × 16 GB DDR4-3200 C16)
  • GPU(s): 2× NVIDIA GeForce RTX 2080 Ti 11 GB
  • SSD(s): 2× Samsung 980 Pro 2 TB, 1× Samsung 970 EVO 2 TB

I wanted to get this working on my system and was willing to upgrade one of my GPUs if that would get past the memory issues, but it looks like it might not help.

Has anyone else dug into the code? A few of us in this ML Discord group have been trying to hack on it, with minimal luck: https://discord.gg/2Qq259hE

@Nick-Harvey

I'm running this on an Ubuntu 20.04 system with a Titan V and a 3070 (on top of Lambda Stack) and run into the same issue described above. This seems to be one of the biggest things holding this project back, and I wish I could help more. I'm getting ready to explore deeper to see if I can find some low-hanging fruit to reduce the amount of memory consumed at runtime.

@jDavidnet that Discord link is no longer active. Would you be so kind as to send a new one? I too am interested in trying to resolve this and am more than happy to help where I can :)

@jDavidnet

jDavidnet commented Jun 23, 2021 via email

@Nick-Harvey

Nick-Harvey commented Jun 24, 2021

🥳 Congrats on your newborn! Definitely take some time and enjoy it with your family. And thanks for the Discord link.

In the meantime, I tested @mebelz's item #1 (changing test_mode to Scale in run.py), and all the photos that had been throwing memory errors processed instantly. I'm going to try to figure out how to implement steps 2 and 3 (although I'm still scratching my head over how they went about it). While I'm at it, I'll also check whether anything mentioned in pytorch/pytorch@47aa253 helps with Full mode. I'll report back with my results.

@zhangmozhe
Contributor

zhangmozhe commented Jul 12, 2021

Hello, we have just redesigned the network to support high-resolution images. You are welcome to give it a try. You can run the code with the following arguments:

python run.py --input_folder [test_image_folder_path] \
              --output_folder [output_path] \
              --GPU 0 \
              --with_scratch \
              --HR
