RuntimeError: CUDA out of memory occurred while running demo.py #3

Closed
swoook opened this issue Nov 25, 2020 · 7 comments
Labels
question Further information is requested

Comments

swoook commented Nov 25, 2020

Issue description

  • demo.py fails to run with the error below
RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 45.94 MiB free; 9.91 GiB reserved in total by PyTorch)

Code example

  • Command to reproduce the bug:
python demo.py --trained_model /swook/model/dsfd/WIDERFace_DSFD_RES152.pth --widerface_root /swook/dataset/wider-face/WIDER_val --save_folder ./save --visual_threshold 0.1 --cuda CUDA
  • Error messages:
RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 45.94 MiB free; 9.91 GiB reserved in total by PyTorch)
  • Whole stack traces:
Traceback (most recent call last):
  File "demo.py", line 222, in <module>
    test_oneimage()
  File "demo.py", line 201, in test_oneimage
    det_b = infer(net , img , transform , thresh , cuda , bt)
  File "demo.py", line 72, in infer
    y = net(x)      # forward pass
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/swook/repos/tencent/dsfd/face_ssd.py", line 240, in forward
    conv5_3_x = self.layer3(conv4_3_x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/resnet.py", line 109, in forward
    out = self.bn3(out)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 79, in forward
    exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1670, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 45.94 MiB free; 9.91 GiB reserved in total by PyTorch)
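The numbers in the message explain the failure: the allocator needed 62.00 MiB but only 45.94 MiB was free, even though PyTorch had already reserved 9.91 GiB of the 10.76 GiB card. The helper below is a hypothetical sketch (not a PyTorch API) that pulls these figures out of the message text. It assumes the exact PyTorch 1.4 wording shown above; other versions phrase the message differently.

```python
import re

# Multipliers that convert each unit PyTorch prints into MiB.
UNIT_TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Assumes the PyTorch 1.4 wording of the OOM message.
OOM_PATTERN = re.compile(
    r"Tried to allocate ([\d.]+) ([KMG]iB).*?"
    r"([\d.]+) ([KMG]iB) total capacity; "
    r"([\d.]+) ([KMG]iB) already allocated; "
    r"([\d.]+) ([KMG]iB) free; "
    r"([\d.]+) ([KMG]iB) reserved"
)
FIELDS = ("requested", "total", "allocated", "free", "reserved")


def parse_oom(message):
    """Return the sizes reported in a CUDA OOM message, converted to MiB."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA out-of-memory message")
    groups = match.groups()
    # Groups come in (number, unit) pairs, one pair per field.
    return {
        field: float(groups[2 * i]) * UNIT_TO_MIB[groups[2 * i + 1]]
        for i, field in enumerate(FIELDS)
    }


msg = ("RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB "
       "(GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; "
       "45.94 MiB free; 9.91 GiB reserved in total by PyTorch)")
info = parse_oom(msg)
print(info["requested"] > info["free"])  # True: 62.00 MiB requested, 45.94 MiB free
```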

System Info

  • PyTorch or Caffe2: PyTorch
  • How you installed PyTorch (conda, pip, source): docker (nvcr.io/nvidia/pytorch)
  • Build command you used (if compiling from source): None
  • OS: Ubuntu 16.04 LTS
  • PyTorch version: 1.4.0
  • Python version: 3.6
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: 2080 Ti
  • GCC version (if compiling from source): None
  • CMake version: None
  • Versions of any other relevant libraries: None

swoook commented Nov 25, 2020

  • Refer to the issues below
  1. RuntimeError: CUDA out of memory. · Issue #60 · Tencent/FaceDetection-DSFD (github.com)
  2. RuntimeError: CUDA out of memory · Issue #44 · Tencent/FaceDetection-DSFD (github.com)
  • The required PyTorch version is 0.3.1
  • However, ours is 1.4.0
  • Tencent/FaceDetection-DSFD uses some methods that are deprecated in newer PyTorch versions
  • I'm trying to replace them with their current equivalents
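The thread does not name the deprecated calls. One well-known example, assumed here and not confirmed for demo.py, is the 0.3.x idiom `Variable(x, volatile=True)`. Since PyTorch 0.4, `volatile` has no effect, so an inference forward pass records autograd history and keeps intermediate activations alive, which raises peak GPU memory. The current replacement is the `torch.no_grad()` context manager. The small `nn.Sequential` model and the `infer` function below are hypothetical stand-ins for the DSFD network and demo.py's `infer`, and this is not necessarily the fix from Tencent/FaceDetection-DSFD #6.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the detector; the real DSFD network is far larger.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())


def infer(net, img):
    """Run one forward pass without recording autograd history.

    `img` is assumed to be a float tensor of shape (C, H, W).
    """
    device = next(net.parameters()).device
    net.eval()  # use BatchNorm running statistics instead of batch statistics
    x = img.unsqueeze(0).to(device)  # add a batch dimension
    # PyTorch 0.3.x: x = Variable(x, volatile=True)  (a no-op since 0.4)
    with torch.no_grad():  # do not keep intermediate activations for backward
        y = net(x)
    return y


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    net.to(device)
    out = infer(net, torch.rand(3, 64, 64))
    print(out.shape, out.requires_grad)  # torch.Size([1, 8, 64, 64]) False
```

Because the output has `requires_grad=False`, no graph references the intermediate tensors, so the caching allocator can reuse their memory during the rest of the forward pass.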

swoook commented Nov 25, 2020

The suggestions I referred to are also too old for the latest PyTorch version.
Using a Docker image with torch==0.3.1 would be much easier.

swoook commented Nov 25, 2020

  • According to NVIDIA, the NVIDIA PyTorch container for torch==0.3.1 is nvcr.io/nvidia/pytorch:18.04-py3 [here]
  • However, it actually contains torch==0.4.0a0
  • nvcr.io/nvidia/pytorch:18.03-py3 also contains torch==0.4.0a0
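Since the image tag turned out to be misleading, it helps to check the installed version inside the container before running demo.py. The following is a minimal pure-Python sketch with hypothetical helper names; inside the container you would pass `torch.__version__` as `installed`. It compares only the numeric release part, so a pre-release such as 0.4.0a0 counts as 0.4.0.

```python
import re


def release_tuple(version):
    """Numeric release part of a version string: "0.4.0a0" -> (0, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def matches_release(installed, required):
    """True if both versions share the same numeric release (suffixes ignored)."""
    return release_tuple(installed) == release_tuple(required)


for found in ("0.4.0a0", "1.4.0", "0.3.1"):
    print(found, matches_release(found, "0.3.1"))
# 0.4.0a0 False
# 1.4.0 False
# 0.3.1 True
```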

swoook commented Nov 25, 2020

It seems I have to build the torch==0.3.1 environment myself.

swoook commented Nov 25, 2020

The suggestions in Tencent/FaceDetection-DSFD issues #60 and #44 identify the cause correctly.
However, the solutions they propose don't solve this problem.

There is another solution in issue #6 of Tencent/FaceDetection-DSFD.

swoook commented Nov 25, 2020

I confirmed that the solution from #6 solves the problem.

TODO

  1. Provide the environment for this repo
  2. Support the latest PyTorch version

swoook closed this as completed Nov 25, 2020
swoook commented Nov 25, 2020

  • Refer to this commit for more details

swoook added the bug and question labels and removed the bug label Nov 25, 2020