PyTorch 0.3 compatibility #28

Merged (10 commits) on Mar 13, 2018

Conversation

gpleiss (Owner) commented Mar 5, 2018

The efficient model now works on PyTorch 0.3.

Some other changes:

  • Multi-GPU support is now baked directly into DenseNetEfficient, so I removed the multi-GPU-specific model.
  • Renamed the cifar option to small_inputs (more generic).

I will merge it tomorrow, after I confirm that the demo gets the same error on CIFAR-10.

gpleiss (Owner, Author) commented Mar 5, 2018

This PR fixes #11, #24, and #26.

wandering007 (Contributor) commented Mar 6, 2018

@gpleiss Thanks for the hard work. I have a few questions, though. Line 116 in densenet_efficient.py:

relu_output = fn(self.norm_weight, self.norm_bias, *inputs)
conv_output = F.conv2d(relu_output, self.conv_weight, bias=None, stride=1,
                       padding=0, dilation=1, groups=1)

Based on my understanding, the F.conv2d backward pass needs relu_output to compute the gradient of conv_weight. Since relu_output lives in the shared memory buffer, its data could be corrupted by the time the backward pass runs. Am I wrong?
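
To make the concern concrete, here is a minimal standalone sketch (PyTorch 0.3 idiom, not the repo's code) of how overwriting the storage behind the tensor that was fed to F.conv2d corrupts the weight gradient:

import torch
import torch.nn.functional as F
from torch.autograd import Variable

x = Variable(torch.randn(1, 3, 8, 8))
w = Variable(torch.randn(4, 3, 1, 1), requires_grad=True)

buf = x.clone()              # stand-in for the shared memory buffer
out = F.conv2d(buf, w)       # autograd keeps a reference to buf for backward
buf.data.zero_()             # simulate a later layer reusing the storage
out.sum().backward()
print(w.grad.abs().sum())    # ~0: the gradient was computed from the
                             # overwritten buffer, not the real conv input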

Besides, I also tested the previous multi-GPU version; it was weird that _efficient_conv2d (aka cudnn_conv) passed its test independently, but the gradients came out wrong when the module was used in the whole model. Is that the reason you put the conv ops outside of the function?

gpleiss (Owner, Author) commented Mar 6, 2018

@wandering007 I put the conv outside because that made everything much more memory efficient. When the conv is inside the rest of the bottleneck function, everything uses a lot more memory.

But you're right, it does seem that the gradient is incorrect. In practice, it's not that incorrect, since what's written in the memory buffer at any given time is often similar to what the actual conv input should be.

To fix this, we'll probably need to do something fancy with PyTorch hooks. I'll give this a shot, and hold off on merging for now.

wandering007 (Contributor) commented

@gpleiss I've posted a question on the PyTorch forum. Based on the answers, I think what you've implemented is workable.

gpleiss (Owner, Author) commented Mar 12, 2018

@wandering007 I just pushed a fix, which should correctly compute the gradient.
What I had before sort of worked, for a very subtle reason. The gradients for the convolution were incorrect, but not too incorrect! The input variable to the convolution held a batch-normed version of the features, but the wrong batch-normed version (i.e. the batch norm output of the very final layer). As a result, the network effectively performs gradient descent with respect to only a single set of batch norm parameters, rather than layer-specific batch norm parameters.

This "works" - in the sense that the network still learns something. However, it does not have the same capacity as a normal DenseNet, since the network effectively has one batch norm layer. And consequently, the network is not as accurate as the non-efficient network.

TL;DR: The fixes I just pushed correctly re-populate the shared storage before the convolution backward pass. The efficient model now seems to get the same accuracy as the non-efficient network. I'm going to run one more test today, and push tomorrow.
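
For anyone following along, the pattern is roughly the following (a stripped-down sketch in PyTorch 0.3 idiom, with made-up names; this is not the actual densenet_efficient.py code):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

x1, x2 = torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8)
w1 = Variable(torch.randn(4, 3, 1, 1), requires_grad=True)
w2 = Variable(torch.randn(4, 3, 1, 1), requires_grad=True)

# one chunk of storage shared by every layer's ReLU output
shared_storage = torch.FloatStorage(x1.numel())

def relu_into_shared(x):
    # wrap the shared storage in a fresh Variable and write ReLU(x) into it
    out = Variable(torch.FloatTensor(shared_storage).view_as(x))
    out.data.copy_(x).clamp_(min=0)
    return out

# forward: layer 2 overwrites layer 1's conv input in the shared storage
relu1 = relu_into_shared(x1)
out1 = F.conv2d(relu1, w1)
relu2 = relu_into_shared(x2)
out2 = F.conv2d(relu2, w2)

# backward: re-populate the shared storage with the right contents before
# each convolution's backward pass
out2.backward(torch.ones(out2.size()))   # storage still holds layer 2's input
relu1.data.copy_(x1).clamp_(min=0)       # restore layer 1's input
out1.backward(torch.ones(out1.size()))

# sanity check against an ordinary, non-shared computation
w1_ref = Variable(w1.data.clone(), requires_grad=True)
F.conv2d(Variable(x1.clamp(min=0)), w1_ref).backward(torch.ones(out1.size()))
print((w1.grad.data - w1_ref.grad.data).abs().max())   # ~0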

wandering007 (Contributor) commented Mar 13, 2018

@gpleiss That is a nice workaround!
I think it can be further improved in two ways (#29):

  1. For the forward pass, since the autograd graph does not need to be built, the Variables created in the forward function can be volatile, i.e. in purely inference mode (see the sketch after this list).
  2. Restoring running_mean and running_var can be done more simply.
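
For suggestion 1, this is roughly what I mean (a tiny sketch in PyTorch 0.3 idiom, illustrative names only):

import torch
from torch.autograd import Variable

# volatile=True puts the computation in pure inference mode: no autograd
# graph is built for anything derived from this Variable
bn_input = Variable(torch.randn(1, 3, 8, 8), volatile=True)
bn_output = bn_input.clamp(min=0)   # stand-in for the shared BN/ReLU forward
# bn_output.volatile is True, so no intermediate buffers are kept for backward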

One question about the code, at line 389:

 self.bn_output_var.backward(gradient=relu_grad_input)

Will this backward call run the rest of the model's backward pass, since the inputs are non-leaf Variables? Maybe use the torch.autograd.grad function instead?
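
For example (a minimal sketch with placeholder variables, not the repo's code), torch.autograd.grad confines the backward computation to the inputs you ask for, without accumulating .grad anywhere else in the model:

import torch
from torch.autograd import Variable, grad

x = Variable(torch.randn(4, 8), requires_grad=True)
w = Variable(torch.randn(8, 8), requires_grad=True)

bn_output = x.mm(w)                            # stand-in for self.bn_output_var
relu_grad_input = Variable(torch.randn(4, 8))  # stand-in for the incoming gradient

# gradient w.r.t. w only; no other .grad attribute in the graph is touched
w_grad, = grad(bn_output, (w,), grad_outputs=relu_grad_input)
print(w_grad.size())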

The suggestions above are not tested, just based on my experience.

gpleiss (Owner, Author) commented Mar 13, 2018

The efficient densenet matches the error of the normal densenet, so I'm merging.
