
GPU memory #13

Closed

SherlockHua1995 opened this issue Apr 4, 2021 · 6 comments

@SherlockHua1995

Hello, thanks for your code.
How much GPU memory is needed to train SETR?
I have 2 P40 GPUs, but I can't start training because of OOM.
Looking forward to your reply.

@lzrobots
Contributor

lzrobots commented Apr 6, 2021

Hi, 2 * P40 is probably not enough for semantic segmentation; for example, 2 * P40 can't even run DeepLabV3+ or DANet with a ResNet-101 backbone.

The following are the minimum resources to run SETR (bs=8) on Cityscapes; you can see they are on par with most existing segmentation models:

SETR-Naive-DeiT, 8 * 11.5G
SETR-PUP-DeiT, 8 * 12.8G
SETR-MLA-DeiT, 8 * 12.1G
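
For reference, the bs=8 setting corresponds to samples_per_gpu=1 on 8 GPUs in the mmsegmentation-style configs this repo uses. Below is a minimal sketch of the relevant config fields; the field names follow standard mmsegmentation conventions and the values are illustrative, not copied verbatim from the released configs.

```python
# Sketch of the data settings in an mmsegmentation-style config
# (e.g. configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py).
# Effective batch size = samples_per_gpu * number of GPUs,
# so samples_per_gpu=1 on 8 GPUs gives the bs=8 setting listed above.
data = dict(
    samples_per_gpu=1,  # images per GPU; this drives per-GPU memory use
    workers_per_gpu=2,  # dataloader workers per GPU (CPU-side; illustrative value)
)
```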

@qianmingduowan

I tried to train SETR-Naive-DeiT and SETR-MLA-DeiT on 4 * TITAN RTX 24G GPUs. I set samples_per_gpu=1 in config/SETR/*.py, so my batch size is 4, but I could not start training because of OOM. You said SETR-Naive-DeiT needs 8 * 11.5G, SETR-PUP-DeiT 8 * 12.8G, and SETR-MLA-DeiT 8 * 12.1G, but this is different from my experimental result. What can I do to reduce the memory used?

@lzrobots
Contributor

lzrobots commented Apr 6, 2021

We don't have such a problem on our side.

@lisenbuaa

```
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 6; 23.70 GiB total capacity; 21.91 GiB already allocated; 36.81 MiB free; 22.28 GiB reserved in total by PyTorch)
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/functional.py", line 1605, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 23.70 GiB total capacity; 21.91 GiB already allocated; 36.81 MiB free; 22.28 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ls/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/ls/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=7', 'configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
```

When I tried to train your model "SETR_PUP" with "./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8", I got the above error even though my machine has 8 * 3090 GPUs with 24G each. Can you help me solve it? Thank you.
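
(For anyone hitting the same OOM: the allocator figures in that message can be checked directly with standard torch.cuda calls. The snippet below is a generic sketch and is not part of the SETR codebase or the maintainers' advice.)

```python
import torch

# Print allocator statistics for every visible GPU. The "allocated" and
# "reserved" numbers correspond to the figures reported in the OOM message
# ("21.91 GiB already allocated; 22.28 GiB reserved in total by PyTorch").
for i in range(torch.cuda.device_count()):
    total = torch.cuda.get_device_properties(i).total_memory
    allocated = torch.cuda.memory_allocated(i)
    reserved = torch.cuda.memory_reserved(i)
    print(f"GPU {i}: {allocated / 2**30:.2f} GiB allocated, "
          f"{reserved / 2**30:.2f} GiB reserved, "
          f"{total / 2**30:.2f} GiB total")
```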

@lzrobots
Contributor

@lisenbuaa

Try the following three variants with DeiT:

SETR-Naive-DeiT, 8 * 11.5G
SETR-PUP-DeiT, 8 * 12.8G
SETR-MLA-DeiT, 8 * 12.1G

@vijaysamula

I have a similar issue when training on my own dataset: it is always CUDA out of memory. I am using 6 GPUs with 12GB each (4 GTX 1080 Ti and 2 RTX 2080 Ti). Is there any way to train without getting this error?
