Out of memory error when training best model on imagenet #14

Closed
tremblerz opened this issue Jul 11, 2018 · 3 comments

@tremblerz

I am using a V100 GPU with 16 GB of memory. Here is the error log:

07/10 07:05:24 PM valid 000 2.609589e+00 47.656250 76.562500
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_imagenet.py", line 230, in <module>
    main() 
  File "train_imagenet.py", line 152, in main
    valid_acc_top1, valid_acc_top5, valid_obj = infer(valid_queue, model, criterion)
  File "train_imagenet.py", line 214, in infer
    logits, _ = model(input)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/model.py", line 207, in forward
    s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/model.py", line 51, in forward
    h1 = op1(h1)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/operations.py", line 66, in forward
    return self.op(x)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
@quark0
Owner

quark0 commented Jul 11, 2018

Hard to tell without more details, such as the PyTorch version. If you use PyTorch 0.4, be sure to wrap the validation script in torch.no_grad(); otherwise you will run out of memory. I would also try smaller batch sizes and check the memory consumption during both training and validation.
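Below is a minimal sketch of a validation loop wrapped in torch.no_grad(), assuming PyTorch 0.4 or later. Without it, every forward pass during validation records an autograd graph and keeps its intermediate activations alive, which is what can exhaust GPU memory. `TinyNet` is a hypothetical stand-in for the DARTS network (it only mimics the `(logits, logits_aux)` return seen in the traceback), and this `infer` is a simplified, hypothetical version that reports top-1 accuracy and mean loss rather than the repository's own function.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the DARTS model: like the model in the
    traceback, forward() returns a (logits, logits_aux) pair."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        logits = self.classifier(self.features(x).flatten(1))
        return logits, None  # this sketch has no auxiliary head


def infer(valid_queue, model, criterion, device):
    """Hypothetical validation loop that never builds an autograd graph."""
    model.eval()  # BatchNorm/Dropout switch to inference behavior
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # no graph is recorded, so activations are freed early
        for inputs, target in valid_queue:
            inputs, target = inputs.to(device), target.to(device)
            logits, _ = model(inputs)
            loss = criterion(logits, target)
            n = target.size(0)
            total_loss += loss.item() * n  # .item() returns a plain Python float
            correct += (logits.argmax(dim=1) == target).sum().item()
            seen += n
    return 100.0 * correct / seen, total_loss / seen


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TinyNet().to(device)
    # Synthetic batches standing in for valid_queue; lowering the batch size
    # is the other knob suggested above.
    valid_queue = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))
                   for _ in range(3)]
    top1, obj = infer(valid_queue, model, nn.CrossEntropyLoss(), device)
    print("top1 %.2f  loss %.4f" % (top1, obj))
    if device.type == "cuda":
        # Peak memory allocated by tensors so far, to compare runs.
        print("peak MB:", torch.cuda.max_memory_allocated() / 2**20)
```

Calling model.eval() and entering torch.no_grad() address different things: the first changes layer behavior, the second stops graph construction, which is the part that matters for memory here.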

@tremblerz
Author

Thank you, that solves the issue.

@dragen1860

Thanks.
Here is a version of DARTS that supports PyTorch 1.0:
https://github.com/dragen1860/DARTS-PyTorch
