CUDA out of memory #180

Closed
karimfayed opened this issue Mar 31, 2021 · 10 comments

@karimfayed

I ran the fine-tuning scripts in a virtual environment and it worked. Later on, I created a new virtual environment, and when I run the model again the following error keeps popping up:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.28 GiB already allocated; 4.55 MiB free; 1.28 GiB reserved in total by PyTorch)

Note: batch size is 1
The fine-tuning script: https://gist.github.com/jiahao87/50cec29725824da7ff6dd9314b53c4b3

@JingqingZ
Collaborator

JingqingZ commented Mar 31, 2021

I didn't see any issue with your code after a quick scan. Some suggestions for your consideration:

  1. A GPU with 16 GB of memory or more is recommended.
  2. Reduce the max input length or max target length to reduce memory cost.
  3. Model parallel: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html (a minimal sketch follows this list)
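
For illustration, a minimal sketch of splitting a model across two GPUs in the spirit of the tutorial linked above; the class name and layer sizes are made up for the example and are not taken from the fine-tuning script:

```python
import torch
import torch.nn as nn

# Toy two-GPU split following the PyTorch model-parallel tutorial pattern.
# Assumes at least two CUDA devices are visible; sizes are illustrative only.
class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0 ...
        self.part1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:0")
        # ... and the second half on GPU 1, so neither card holds the full model.
        self.part2 = nn.Linear(1024, 1024).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Activations are copied across devices between the two halves.
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 1024))   # output tensor ends up on cuda:1
out.sum().backward()                # gradients flow back across both GPUs
```

Each half of the network then only has to fit on one card, at the cost of copying activations between devices in the forward and backward passes.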

@karrtikiyer

Hi @JingqingZ: if we have the configuration below, basically 8 GPUs with about 12 GB each, would it work, or would we still need to implement model parallel? Or does each GPU need at least 16 GB?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:17.0 Off |                    0 |
| N/A   36C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:00:18.0 Off |                    0 |
| N/A   31C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:00:19.0 Off |                    0 |
| N/A   39C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:00:1A.0 Off |                    0 |
| N/A   33C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   39C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   34C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   43C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

@JingqingZ
Collaborator

The K80 can struggle with this, in my experience.

@karimfayed
Author

  2. Reduce the max input length or max target length to reduce memory cost.

How can I reduce the max input length or max target length? By max input length, do you mean the number of articles in the training dataset?

@karrtikiyer

@JingqingZ: instead of the K80, is there a hardware configuration you would recommend that works well for pegasus-large fine-tuning?

@JingqingZ
Collaborator

  2. Reduce the max input length or max target length to reduce memory cost.

How can I reduce the max input length or max target length? By max input length, do you mean the number of articles in the training dataset?

You may truncate the input text (and target text) to a shorter length, for example 256 tokens for the input text instead of 512 or 1024.
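
As a minimal sketch of that truncation with the Hugging Face tokenizer used by the fine-tuning gist; the 256/64 token limits and the google/pegasus-large checkpoint are assumptions for illustration, not values from the gist:

```python
from transformers import PegasusTokenizer

# Illustrative checkpoint; swap in whatever model the fine-tuning script loads.
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")

articles = ["a long source article ..."]   # placeholder training texts
summaries = ["its reference summary ..."]  # placeholder target texts

# Truncate the source articles to 256 tokens instead of 512/1024.
inputs = tokenizer(articles, truncation=True, max_length=256,
                   padding=True, return_tensors="pt")

# Target summaries are usually short, so an even smaller limit is fine.
targets = tokenizer(summaries, truncation=True, max_length=64,
                    padding=True, return_tensors="pt")

print(inputs["input_ids"].shape, targets["input_ids"].shape)
```

Shorter sequences shrink both the activation memory and the attention cost, which is usually the quickest way to get past an out-of-memory error on a small GPU.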

@JingqingZ
Collaborator

@JingqingZ: instead of the K80, is there a hardware configuration you would recommend that works well for pegasus-large fine-tuning?

A V100 with 16 GB (or 32 GB) works fine for me. Or you may try a TPU v2 or v3.

@karimfayed
Author

  2. Reduce the max input length or max target length to reduce memory cost.

How can I reduce the max input length or max target length? By max input length, do you mean the number of articles in the training dataset?

You may truncate the input text (and target text) to a shorter length, for example 256 tokens for the input text instead of 512 or 1024.

I was going to do that, but I thought I would first try the code on Colab, and it worked great the first time; it only stopped after 1000 epochs because I ran out of disk space. When I tried it again on another day with the same dataset, this error popped up. Is it because the GPUs provided by Colab vary from time to time, or is there something else I'm not seeing?
[Screenshot of the error attached: WhatsApp Image 2021-04-05 at 2 26 49 AM]

@JingqingZ
Collaborator

I am sorry this particular issue is out of the scope of my knowledge.

@karimfayed
Author

I am sorry this particular issue is out of the scope of my knowledge.

Thank you for your previous tip about using 16 GB of memory.
I subscribed to Colab Pro and it worked fine, as it provided enough disk space and a 16 GB GPU.
It turns out free Colab allocates resources according to previous usage and other variables, which makes the process hard to repeat, since you will rarely get the same resources twice in a row.
