RuntimeError: cuda runtime error (2) : out of memory at... #15

Open
itsss opened this issue May 6, 2018 · 9 comments

Comments


itsss commented May 6, 2018

Hi NVlabs,
I trained on the edges2handbags dataset as described on GitHub. After 1,000,000 training iterations, when I executed the training command again, the following error was displayed:

```
root@bc5b90e77883:~/model# python train.py --config configs/edges2handbags_folder.yaml
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 5, in <module>
    from utils import get_all_data_loaders, prepare_sub_folder,
  File "/root/model/utils.py", line 5, in <module>
    from torch.utils.serialization import load_lua
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/__init__.py", line 2, in <module>
    from .read_lua_file import load_lua, T7Reader
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 184, in <module>
    register_torch_class('Storage', make_storage_reader)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 181, in register_torch_class
    reader_registry[cls_name] = reader_factory(cls_name)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 160, in make_storage_reader
    element_size = python_class().element_size()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 492, in _lazy_new
    _lazy_init()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu:25
```

I changed the max iteration from 1,000,000 to 100,000 / 10,000 / 1,000, but the error is the same.
Does anyone know how to solve this problem? Please answer my question. Thank you!

@wzbc-wuchanghao

Hi
You can change the display_size in the config file to 4 or 2, and try again.
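For reference, the relevant lines would look something like this (a sketch only; `display_size` is the key named above, `batch_size` is an assumed neighboring key, and the values are illustrative):

```yaml
# configs/edges2handbags_folder.yaml (fragment; values illustrative)
display_size: 2    # number of images assembled for display snapshots
batch_size: 1      # lowering this also reduces peak GPU memory
```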


itsss commented May 8, 2018

@WuChanghao233 It still fails after I change display_size in the config file to 2:

```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 5, in <module>
    from utils import get_all_data_loaders, prepare_sub_folder,
  File "/root/model/utils.py", line 5, in <module>
    from torch.utils.serialization import load_lua
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/__init__.py", line 2, in <module>
    from .read_lua_file import load_lua, T7Reader
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 184, in <module>
    register_torch_class('Storage', make_storage_reader)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 181, in register_torch_class
    reader_registry[cls_name] = reader_factory(cls_name)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/serialization/read_lua_file.py", line 160, in make_storage_reader
    element_size = python_class().element_size()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 492, in _lazy_new
    _lazy_init()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524580938250/work/aten/src/THC/THCTensorRandom.cu:25
```

@Cuiyirui

The same problem occurs for me. My GPU has 11 GB of memory, but it still can't train.

@MichinariNukazawa

The same problem occurs for me. My GPU is a GTX 1050 Ti (4 GB of memory).
I tried setting display_size in the config file to 4 and 2, but that did not solve it.

@visonpon

I have also encountered this problem. It seems the write_image part of the script causes this bug, since everything is OK when I comment it out. But I don't know how to fix it when I still want to see the results during training. @mingyuliutw


visonpon commented May 31, 2018

Solved it by wrapping the image-writing code in `with torch.no_grad():`.
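For anyone else hitting this, the idea is that `torch.no_grad()` stops autograd from recording a graph for the display forward passes, so their activations are freed immediately instead of accumulating on the GPU. A minimal sketch (the `sample` function is a hypothetical stand-in for the repo's image-writing step, not its actual code):

```python
import torch

def sample(model, x):
    # Stand-in for the display/image-writing forward pass.
    # Inside torch.no_grad(), no autograd graph is built, so the
    # intermediate activations do not stay resident in GPU memory.
    with torch.no_grad():
        return model(x)

model = torch.nn.Linear(8, 8)
x = torch.randn(4, 8)
out = sample(model, x)
print(out.requires_grad)  # False: no graph was recorded
```

The training step itself must stay outside the `no_grad` block, or no gradients will flow.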


Cuky88 commented Jul 3, 2018

@visonpon I couldn't fix this. Could you please provide details? I have the same problem and I'm new to torch.

UPDATE: OK, I got it. Just read the error output from PyTorch; it tells you what to do.
Thanks.

@niehen6174

> solved it by adding with torch_no_grad:

Can you tell me where to add it?


Tahlor commented Jan 17, 2019

Adjusting the .yaml to shrink/simplify the network works. I imagine the only other options are to optimize their code for memory usage or get a better GPU.
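For example (key names assumed from typical MUNIT-style configs, so check your own .yaml for the exact spelling; all values illustrative), these settings tend to dominate memory usage:

```yaml
# Illustrative fragment; key names and values depend on your config.
batch_size: 1            # fewer images per training step
new_size: 128            # load images at a smaller resolution
crop_image_height: 128   # train on smaller crops
crop_image_width: 128
gen:
  dim: 32                # fewer base channels in the generator
```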
