
GPU memory #13

Closed

SherlockHua1995 opened this issue Apr 4, 2021 · 6 comments

@SherlockHua1995

Hello, thanks for your code.
How much GPU memory is needed to train SETR?
I have 2 P40 GPUs, but I can't start training because of OOM.
Looking forward to your reply.

@lzrobots
Contributor

lzrobots commented Apr 6, 2021

Hi, 2 * P40 is probably not enough for semantic segmentation; for example, 2 * P40 can't even run DeepLabV3+ or DANet with a ResNet-101 backbone.

The following are the minimum resources to run SETR (bs=8) on Cityscapes; you can see they are on par with most existing segmentation models:

SETR-Naive-DeiT, 8 * 11.5G
SETR-PUP-DeiT, 8 * 12.8G
SETR-MLA-DeiT, 8 * 12.1G
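
For reference, the bs=8 setting corresponds to samples_per_gpu=1 on 8 GPUs in the mmsegmentation-style configs this repo uses. Below is a minimal sketch of the relevant config fields; the field names follow standard mmsegmentation conventions and the values are illustrative, not copied verbatim from the released configs.

```python
# Sketch of the data settings in an mmsegmentation-style config
# (e.g. configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py).
# Effective batch size = samples_per_gpu * number of GPUs,
# so samples_per_gpu=1 on 8 GPUs gives the bs=8 setting listed above.
data = dict(
    samples_per_gpu=1,  # images per GPU; this drives per-GPU memory use
    workers_per_gpu=2,  # dataloader workers per GPU (CPU-side; illustrative value)
)
```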

@qianmingduowan

I tried to train SETR-Naive-DeiT and SETR-MLA-DeiT on 4 * TITAN RTX 24G GPUs. I set samples_per_gpu=1 in config/SETR/*.py, so my batch size is 4, but I could not start training because of OOM. You said SETR-Naive-DeiT needs 8 * 11.5G, SETR-PUP-DeiT 8 * 12.8G, and SETR-MLA-DeiT 8 * 12.1G, but this is different from my experimental result. What can I do to reduce the memory used?

@lzrobots
Contributor

lzrobots commented Apr 6, 2021

We don't have such a problem on our side.

@lisenbuaa

```
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 6; 23.70 GiB total capacity; 21.91 GiB already allocated; 36.81 MiB free; 22.28 GiB reserved in total by PyTorch)
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/functional.py", line 1605, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 23.70 GiB total capacity; 21.91 GiB already allocated; 36.81 MiB free; 22.28 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/ls/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=7', 'configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
```

When I tried to train your model "SETR_PUP" with "./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8", I got the above error even though my machine has 8 * 3090 GPUs with 24G each. Can you help me solve it? Thank you.
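
(For anyone hitting the same OOM: the allocator figures in that message can be checked directly with standard torch.cuda calls. The snippet below is a generic sketch and is not part of the SETR codebase or the maintainers' advice.)

```python
import torch

# Print allocator statistics for every visible GPU. The "allocated" and
# "reserved" numbers correspond to the figures reported in the OOM message
# ("21.91 GiB already allocated; 22.28 GiB reserved in total by PyTorch").
for i in range(torch.cuda.device_count()):
    total = torch.cuda.get_device_properties(i).total_memory
    allocated = torch.cuda.memory_allocated(i)
    reserved = torch.cuda.memory_reserved(i)
    print(f"GPU {i}: {allocated / 2**30:.2f} GiB allocated, "
          f"{reserved / 2**30:.2f} GiB reserved, "
          f"{total / 2**30:.2f} GiB total")
```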

@lzrobots
Contributor

@lisenbuaa

Try the following three variants with DeiT:

SETR-Naive-DeiT, 8 * 11.5G
SETR-PUP-DeiT, 8 * 12.8G
SETR-MLA-DeiT, 8 * 12.1G

@vijaysamula

I have a similar issue when training on my own dataset: it is always CUDA out of memory. I am using 6 GPUs with 12GB each (4 GTX 1080 Ti and 2 RTX 2080 Ti). Is there any way to train without getting this error?
