out of memory #79

Open
manyuyuya opened this issue Aug 7, 2018 · 2 comments

Comments

@manyuyuya

Hello! When I run train.py, I get an out-of-memory error after a few epochs. It happens even if I increase the number of GPUs, and I have seen other people hit the same problem, too. I don't know the reason; could you offer some help? Thank you very much!
The relevant output is below:

step 120, image: 005365.jpg, loss: 6.3531, fps: 3.71 (0.27s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(14/285)
rpn_cls: 0.6417, rpn_box: 0.0229, rcnn_cls: 1.9303, rcnn_box: 0.1354
step 130, image: 009091.jpg, loss: 4.8151, fps: 3.78 (0.26s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(22/277)
rpn_cls: 0.6486, rpn_box: 0.2012, rcnn_cls: 1.7988, rcnn_box: 0.1184
step 140, image: 008690.jpg, loss: 4.9961, fps: 3.55 (0.28s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(30/269)
rpn_cls: 0.6114, rpn_box: 0.0690, rcnn_cls: 1.4801, rcnn_box: 0.1088
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train.py", line 138, in
loss.backward()
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/init.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
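For context: a common cause of an OOM that shows up only after many steps in PyTorch 0.4.x code is accumulating the loss tensor itself for logging, which keeps every iteration's autograd graph alive on the GPU. This may or may not be the leak in this repo's train.py; below is a minimal self-contained sketch of the pattern and the fix (all names are illustrative, not from this repo):

import torch
import torch.nn as nn

model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(1000):
    x = torch.randn(32, 10, device="cuda")
    y = torch.randn(32, 1, device="cuda")

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # BAD:  running_loss += loss   # retains each step's graph -> memory grows until OOM
    # GOOD: take the plain Python float so the graph can be freed
    running_loss += loss.item()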

@jinsnowy

Try PyTorch 0.3.1 with cudatoolkit 8.0.
I also hit the same error on version 0.4.1 (maybe a GPU memory leak in the code), so I downgraded PyTorch.
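For anyone trying the downgrade: assuming a conda environment, the old builds could be pinned from the pytorch channel roughly like this (package names are from the CUDA 8 era and availability varies by platform):

conda install pytorch=0.3.1 cuda80 -c pytorch

Note that code written against the 0.4 API (e.g. loss.item() or the device= keyword) may need small changes to run on 0.3.1, where loss.data[0] was the usual idiom.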

@machanic

machanic commented Oct 5, 2018

I think the memory leak is due to the RoI pooling layer: when I copied the RoI pooling code into another project of mine, it also leaked GPU memory.
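If the custom RoI pooling CUDA extension really is the source of the leak, one way to test that hypothesis is to swap in the reference implementation from torchvision (torchvision.ops.roi_pool, available in torchvision 0.3+). This is a suggested workaround, not what this repo ships:

import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)  # (N, C, H, W) feature map
# Each RoI is (batch_index, x1, y1, x2, y2) in feature-map coordinates.
rois = torch.tensor([[0.0, 10.0, 10.0, 40.0, 40.0],
                     [0.0,  0.0,  0.0, 25.0, 25.0]])
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])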
