Memory not being deallocated in backward() #18643
Comments
That definitely explains the bulk of the memory usage; however, it doesn't explain the increase in memory usage. This is more at the root of the issue, and I may have chosen a bad title. If you look at the peak usage, it is higher by about 40MB in the second pass. In the model I was training when I discovered this it was more exaggerated, being almost 1GB higher. I've checked and double-checked that no tensors are accidentally staying referenced and escaping garbage collection. In fact, I tried the same pattern as the reproduction gists, where I don't return anything at all and use del on all tensors. It still runs about 1GB higher from the second iteration onward for that architecture. Nothing is being stored internally within the model's child modules. I should also note that no momentum was being used in any of the models I've tested.
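For reference, this is roughly the pattern I mean; the model and sizes here are placeholders, not my actual architecture:

```python
import gc
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # plain SGD, no momentum

def train_step():
    # Return nothing and delete every intermediate tensor, so no Python
    # reference can keep the autograd graph alive between iterations.
    x = torch.randn(64, 1024)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    del x, loss
    gc.collect()

for i in range(3):
    train_step()  # memory should be flat from iteration 2 onward, but isn't
```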
cc @malvika2147
@mdlockyer Can you please add the gists again? The links are currently broken.
@malvika2147 They should be back up. Sorry about that. I finally got around to changing my old username, which broke all those links.
I encounter the same problem: memory is about 8GB higher when executing the second loss.backward(). I do not know why.
@KaiQiao1992 It may be worth noting in this discussion that if you are using adaptive optimizers like Adam, there are a lot of buffers being created under the hood. They are very memory hungry. Adam creates two buffers that are each the same size as the weights being optimized (so in memory, it's roughly model size x3).
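To see those buffers concretely, Adam's per-parameter state can be inspected directly. A small sketch (layer size is arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(1000, 1000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

model(torch.randn(8, 1000)).sum().backward()
opt.step()  # the first step lazily allocates exp_avg and exp_avg_sq

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
state_bytes = sum(
    t.numel() * t.element_size()
    for s in opt.state.values()
    for t in s.values()
    if torch.is_tensor(t)
)
print(param_bytes, state_bytes)  # state is roughly 2x the parameter memory
```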
@KaiQiao1992 8GB sounds steep for optimizer buffers, though. That is a significant amount. Hopefully you're able to figure it out. This may help: it's a memory profiler for PyTorch. I haven't tested it out, but it could be of use.
@mdlockyer I did indeed use the Adam optimizer. Strangely, after rebooting the machine, I do not encounter the "out of memory" error again, even though I'm using the same Adam. Because my fc layer has size 800000x1000, the memory consumption is large.
@KaiQiao1992 that's huge!! 800M parameters! My biggest model was 45M and I thought that was gigantic. Glad you're not getting the OOM errors now, though.
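For a rough sanity check (assuming float32 weights), the numbers do line up with an 8GB jump:

```python
params = 800_000 * 1_000      # ~8e8 weights in that fc layer alone
weight_gb = params * 4 / 1e9  # float32: ~3.2 GB for the weights
adam_gb = 2 * weight_gb       # exp_avg + exp_avg_sq: ~6.4 GB of optimizer state
print(weight_gb, adam_gb)
```

Add gradients (another ~3.2 GB) and transient allocations, and a swing of that size from the optimizer state alone is plausible.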
When I add https://gist.github.com/prasunanand/0926fe1ea453a785c967d2c444a22402
If that's true, swapping
@prasunanand that's interesting. I'll test your gist on my end when I get a chance. @ezyang In my reproduction, I have a call to
I had the same issue on Python 3.6.9 + torch 1.3.0, but it works fine on Python 3.7.5 + torch 1.3.0.
I've been unable to reproduce the sharp memory spikes from the issue, only a very slight rise in memory usage. The profile looks roughly the same for all of the Python and PyTorch versions I tried. @TaehwanKwon would you mind posting the memory profile that you see when running the CPU script on Python 3.6? Also, are you using macOS like @mdlockyer?
Given that this reproduces inconsistently and can be fixed by upgrading (either PyTorch or Python), I'm downgrading the priority of this issue. If someone can come up with a clear configuration on the newest Python/PyTorch that exactly causes the problem, please let us know.
🐛 Bug
I've recently discovered an issue with memory not being freed after the first iteration of training. It's not a leak, as memory usage stays consistent after the second pass through the loop. It appears on both CPU and GPU; however, it is much more significant when running on CPU.
The issue seems to come from either backward() or optimizer.step(), as removing those calls yields stable memory usage.
I ran into this while attempting to train a rather large model that uses pretty much all of my available GPU memory. It will complete the first iteration successfully, then OOM during the second.
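To make the pattern concrete before the gists, here is a stripped-down sketch of the kind of loop that shows the behavior (layer sizes and the synthetic data are placeholders, not the exact gist code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for i in range(3):
    x = torch.randn(128, 4096)
    loss = model(x).pow(2).mean()
    loss.backward()       # removing this call...
    optimizer.step()      # ...and this one keeps memory flat across iterations
    optimizer.zero_grad()
```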
To Reproduce
Steps to reproduce the behavior:
I have compiled a minimal CPU and GPU gist that should reproduce this issue:
CPU
GPU
The CPU gist uses the memory-profiler package, so that will need to be installed with pip.
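For anyone who hasn't used it, the pattern is just decorating the function you want profiled and running the script; the function name and body here are placeholders:

```python
from memory_profiler import profile  # pip install memory-profiler

@profile  # prints a line-by-line memory report each time the function runs
def train_iteration():
    buf = [0.0] * 1_000_000  # stand-in for the real loop body sketched above
    del buf

train_iteration()
```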
Expected behavior
Memory usage should be roughly the same in the first pass through the training loop and in all subsequent passes.
Environment
PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: None
OS: Mac OSX 10.13.6
GCC version: Could not collect
CMake version: version 3.9.4
Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.0.1.post2
[pip3] torchvision==0.2.2.post3
[conda] torch 1.0.1.post2
[conda] torchsummary 1.5.1
[conda] torchvision 0.2.1
Additional context
I ran some profiles on the CPU memory usage that highlight the issue:
With backward pass and update:
Iteration 1
Iteration 2
Without backward pass and update:
Iteration 1
Iteration 2
cc @ezyang @gchanan @zou3519 @ssnl @albanD @gqchen