RuntimeError: cuda runtime error (2) : out of memory at... #15

Open
itsss opened this issue May 6, 2018 · 9 comments

Comments


itsss commented May 6, 2018

Hi NVlabs,
I trained on the edges2handbags dataset as described on GitHub. After 1,000,000 training iterations, when I executed the training command again, the following error was displayed:

```
root@bc5b90e77883:~/model# python train.py --config configs/edges2handbags_folder.yaml
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 5, in <module>
    from utils import get_all_data_loaders, prepare_sub_folder,
  File "/root/model/utils.py", line 5, in <module>
    from torch.utils.serialization import load_lua
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/__init__.py", line 2, in <module>
    from .read_lua_file import load_lua, T7Reader
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 184, in <module>
    register_torch_class('Storage', make_storage_reader)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 181, in register_torch_class
    reader_registry[cls_name] = reader_factory(cls_name)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 160, in make_storage_reader
    element_size = python_class().element_size()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 492, in _lazy_new
    _lazy_init()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu:25
```

I changed the max iteration from 1,000,000 to 100,000 / 10,000 / 1,000, but the error is the same.
Does anyone know how to solve this problem? Please answer my question. Thank you!

@wzbc-wuchanghao

Hi
You can change the display_size in the config file to 4 or 2, and try again.
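For reference, the relevant lines would look something like this (a sketch only; `display_size` is the key named above, `batch_size` is an assumed neighboring key, and the values are illustrative):

```yaml
# configs/edges2handbags_folder.yaml (fragment; values illustrative)
display_size: 2    # number of images assembled for display snapshots
batch_size: 1      # lowering this also reduces peak GPU memory
```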


itsss commented May 8, 2018

@WuChanghao233 It still fails after I change display_size in the config file to 2:

```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 5, in <module>
    from utils import get_all_data_loaders, prepare_sub_folder,
  File "/root/model/utils.py", line 5, in <module>
    from torch.utils.serialization import load_lua
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/__init__.py", line 2, in <module>
    from .read_lua_file import load_lua, T7Reader
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 184, in <module>
    register_torch_class('Storage', make_storage_reader)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 181, in register_torch_class
    reader_registry[cls_name] = reader_factory(cls_name)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 160, in make_storage_reader
    element_size = python_class().element_size()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 492, in _lazy_new
    _lazy_init()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu:25
```

@Cuiyirui

The same problem occurs for me. My GPU has 11 GB of memory, but it still can't train.

@MichinariNukazawa

The same problem occurs for me. My GPU is a GTX 1050 Ti (4 GB of memory).
I tried setting display_size in the config file to 4 and 2, but that did not solve it.

@visonpon

I have also encountered this problem. It seems the write_image part of the script causes this bug, since everything is OK when I comment it out. But I don't know how to fix it when I still want to see the results during training. @mingyuliutw


visonpon commented May 31, 2018

Solved it by wrapping the image-writing code in `with torch.no_grad():`.
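For anyone else hitting this, the idea is that `torch.no_grad()` stops autograd from recording a graph for the display forward passes, so their activations are freed immediately instead of accumulating on the GPU. A minimal sketch (the `sample` function is a hypothetical stand-in for the repo's image-writing step, not its actual code):

```python
import torch

def sample(model, x):
    # Stand-in for the display/image-writing forward pass.
    # Inside torch.no_grad(), no autograd graph is built, so the
    # intermediate activations do not stay resident in GPU memory.
    with torch.no_grad():
        return model(x)

model = torch.nn.Linear(8, 8)
x = torch.randn(4, 8)
out = sample(model, x)
print(out.requires_grad)  # False: no graph was recorded
```

The training step itself must stay outside the `no_grad` block, or no gradients will flow.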


Cuky88 commented Jul 3, 2018

@visonpon I couldn't fix this. Could you please provide details? I have the same problem and I'm new to torch.

UPDATE: OK, I got it. Just read the error output from PyTorch; it tells you what to do.
Thanks.

@niehen6174

> solved it by adding with torch_no_grad:

Can you tell me where to add it?


Tahlor commented Jan 17, 2019

Adjusting the .yaml to shrink/simplify the network works. I imagine the only other options are to optimize their code for memory usage or get a better GPU.
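For example (key names assumed from typical MUNIT-style configs, so check your own .yaml for the exact spelling; all values illustrative), these settings tend to dominate memory usage:

```yaml
# Illustrative fragment; key names and values depend on your config.
batch_size: 1            # fewer images per training step
new_size: 128            # load images at a smaller resolution
crop_image_height: 128   # train on smaller crops
crop_image_width: 128
gen:
  dim: 32                # fewer base channels in the generator
```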
