PyTorch 0.3 compatibility #28
Conversation
@gpleiss Thanks for the hard work. I have a few questions, though. About line 116 in densenet_efficient.py: based on my understanding, the … Besides, I also tested the previous multi-GPU version, and it was weird that …
@wandering007 I put the conv outside because that made everything much more memory efficient. When the conv is inside the rest of the bottleneck function, everything uses a lot more memory. But you're right: the gradient does seem to be incorrect. In practice it's not far off, since what's written in the memory buffer at any given time is often similar to what the actual conv input should be. To fix this properly, we'll probably need to do something fancy with PyTorch hooks. I'll give this a shot, and hold off on merging for now.
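To make the failure mode concrete, here is a minimal, hypothetical sketch (simplified, not this repo's actual code) of how sharing one buffer corrupts the input a conv saved for its backward pass:

```python
import torch

# Every layer writes its concatenated input into the same flat storage
# to save memory (hypothetical simplification of the shared-buffer idea).
shared_buffer = torch.zeros(64)

def concat_into_buffer(tensors):
    """Copy tensors into the shared buffer and return a view over it."""
    n = sum(t.numel() for t in tensors)
    out, offset = shared_buffer[:n], 0
    for t in tensors:
        out[offset:offset + t.numel()].copy_(t.reshape(-1))
        offset += t.numel()
    return out

a, b, c = torch.randn(8), torch.randn(8), torch.randn(8)
saved = concat_into_buffer([a, b])  # what layer 1's conv "saves" for backward
_ = concat_into_buffer([b, c])      # a later layer reuses (overwrites) the buffer

# `saved` is only a view of the shared storage, so it no longer holds [a, b];
# a gradient computed from it in layer 1's backward pass uses stale data.
print(torch.equal(saved[:8], a))    # False: the storage was overwritten
```

This matches the observation above: what ends up in the buffer is usually close to, but not exactly, the true conv input.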
@gpleiss I've posted a question on the PyTorch forum. Based on the answers, I think what you've implemented is workable.
@wandering007 I just pushed a fix, which should correctly compute the gradient. This "works", in the sense that the network still learns something. However, it does not have the same capacity as a normal DenseNet, since the network effectively has only one batch norm layer. Consequently, it is not as accurate as the non-efficient network.

TL;DR: the fixes I just pushed correctly re-populate the shared storage before the convolution backward pass. It now seems to get the same accuracy as the non-efficient network. I'm going to run one more test today, and push tomorrow.
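Conceptually the fix works like checkpointing: keep references to the original input tensors, and copy them back into the shared storage during backward, before the conv gradients are computed from it. Below is a rough, hypothetical sketch of that idea in a single autograd Function; the names and details are mine, not the PR's, and it assumes all inputs share batch and spatial sizes:

```python
import torch
import torch.nn.functional as F

class SharedConcatConv(torch.autograd.Function):
    """Hypothetical sketch: one Function owns both the shared-storage
    concat and the 1x1 conv, so its backward can re-populate the storage
    *before* the conv gradients are computed from it."""

    @staticmethod
    def forward(ctx, storage, weight, *inputs):
        ctx.save_for_backward(weight, *inputs)
        ctx.storage = storage
        # Build the conv input inside the shared storage (which later
        # layers will reuse and overwrite).
        n, _, h, w = inputs[0].shape
        channels = sum(i.shape[1] for i in inputs)
        conv_input = storage[:n * channels * h * w].view(n, channels, h, w)
        torch.cat(inputs, dim=1, out=conv_input)
        ctx.input_shape = conv_input.shape
        return F.conv2d(conv_input, weight)

    @staticmethod
    def backward(ctx, grad_output):
        weight, *inputs = ctx.saved_tensors
        shape = ctx.input_shape
        with torch.no_grad():
            # Re-populate the shared storage: later layers overwrote it,
            # but the conv gradients below need the true conv input.
            conv_input = ctx.storage[:shape.numel()].view(shape)
            torch.cat(inputs, dim=1, out=conv_input)
        grad_weight = torch.nn.grad.conv2d_weight(conv_input, weight.shape, grad_output)
        grad_input = torch.nn.grad.conv2d_input(shape, weight, grad_output)
        # Split the input gradient back out per original input tensor.
        grads = grad_input.split([i.shape[1] for i in inputs], dim=1)
        return (None, grad_weight, *grads)

# Usage sketch: a flat storage big enough for the largest concatenated input.
storage = torch.zeros(512)
x1 = torch.randn(1, 2, 4, 4, requires_grad=True)
x2 = torch.randn(1, 3, 4, 4, requires_grad=True)
weight = torch.randn(6, 5, 1, 1, requires_grad=True)
SharedConcatConv.apply(storage, weight, x1, x2).sum().backward()
```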
@gpleiss That is a nice workaround!
One question about the code, at line 389:
Will backward() run the backward pass through the whole rest of the model, since the inputs are non-leaf Variables? Maybe use the torch.autograd.grad function instead? The above comments are not tested, just based on my experience.
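For what it's worth, here is a generic illustration of the difference (plain, current-style PyTorch, not this repo's code; in 0.3 these tensors would be wrapped in Variable). backward() propagates through the whole graph down to the leaves and accumulates into their .grad fields, while torch.autograd.grad can target a non-leaf and just returns that gradient:

```python
import torch

x = torch.randn(3, requires_grad=True)
h = x * 2          # non-leaf intermediate
out = h.sum()

# backward() traverses the whole graph back to the leaves and
# accumulates gradients into their .grad fields.
out.backward(retain_graph=True)
print(x.grad)      # tensor([2., 2., 2.])

# torch.autograd.grad computes gradients only for the tensors you ask
# for (here the non-leaf h) and returns them directly, without touching
# .grad or continuing the backward pass beyond h.
(grad_h,) = torch.autograd.grad(out, h)
print(grad_h)      # tensor([1., 1., 1.])
```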
The efficient DenseNet matches the error of the normal DenseNet, so I'm merging.
The efficient model now works on PyTorch 0.3.

Some other changes:
- Renamed the `cifar` option to `small_inputs` (more generic).

I will merge it in tomorrow after I confirm that the demo gets the same error on CIFAR-10.