
Change ResNet AvgPool2d to AdaptiveAvgPool2d (global average pooling) to make the input size changeable #155

Closed · wants to merge 1 commit

Conversation

@XavierLinNow

This PR changes ResNet's final pooling layer (before the classifier) from the fixed AvgPool2d(7) to the adaptive AdaptiveAvgPool2d(1).
With AvgPool2d(7), the input size is forced to be 224x224. For other sizes the output is silently incomplete, or the forward pass raises: RuntimeError: size mismatch, m1: [1 x 2048], m2: [512 x 1000] at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1229
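
To see why (a sketch, not code from the PR): with the fixed 7x7 kernel the flattened feature count depends on the input resolution, while adaptive pooling always produces a 1x1 map.

import torch
from torch import nn

feat224 = torch.randn(1, 2048, 7, 7)    # ResNet-50 features for a 224x224 input
feat448 = torch.randn(1, 2048, 14, 14)  # ...and for a 448x448 input

fixed, adaptive = nn.AvgPool2d(7), nn.AdaptiveAvgPool2d(1)
print(fixed(feat224).shape)     # torch.Size([1, 2048, 1, 1]) -> 2048 features, matches fc
print(fixed(feat448).shape)     # torch.Size([1, 2048, 2, 2]) -> 8192 features, size mismatch in fc
print(adaptive(feat448).shape)  # torch.Size([1, 2048, 1, 1]) -> always 2048 features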

@fmassa
Member

fmassa commented Apr 21, 2017

I'm unsure about merging this in.
This has already been asked before (#140), so it's a recurring question, but is this the expected behavior for torchvision models? If yes, then we should modify all the models, not only the resnets.
On the other hand, it's usually very easy for the end user to adapt the network to handle varying sizes. Something like

resnet = torchvision.models.resnet18()
resnet.avgpool = nn.AdaptiveAvgPool2d(1)

would be enough, and explicit to the user.
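
For illustration, a self-contained version of that surgery (a sketch; shapes assume resnet18):

import torch
import torchvision
from torch import nn

resnet = torchvision.models.resnet18()
resnet.avgpool = nn.AdaptiveAvgPool2d(1)
resnet.eval()

# the unchanged fc layer now works for a range of input sizes
for size in (224, 288, 320):
    print(size, resnet(torch.randn(1, 3, size, size)).shape)  # always torch.Size([1, 1000])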

At the same time, it hides a potential problem: the pre-trained models might not work well when fed images that are too big, as they were trained to identify patterns at a specific resolution.

@Maratyszcza
Contributor

I'm for using adaptive pooling in all models. If you are willing to accept this change, I will create a PR for all models.

@fmassa
Member

fmassa commented Apr 30, 2017

Well, I suppose the most common use case is just feeding images for classification without any changes to the architecture. Other scenarios (like segmentation) will require some modification to the structure of the network anyway.
Let's add adaptive pooling to the models then. @soumith @colesbury are you ok with that?

@soumith
Member

soumith commented Apr 30, 2017

there are two common use cases:

  • fully dense outputs
  • adaptively pooled outputs

fully dense is the case where we give bigger inputs and get outputs bigger than 1000x1x1.

adaptive is when we give bigger inputs and always get a fixed-size feature vector out (e.g. 4096 features) to be fed into the fc layers.

they are both common, and they pull in conflicting directions.

we should plan for both via constructor arguments
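
A hypothetical sketch of what such constructor arguments could look like (the pooling flag and make_classifier_head helper are illustrative only, not torchvision API; assumes a PyTorch recent enough to have nn.Flatten):

from torch import nn

def make_classifier_head(pooling, in_channels=512, num_classes=1000):
    # hypothetical helper covering the two use cases above
    if pooling == "adaptive":
        # fixed-size feature vector for the fc head, whatever the input resolution
        return nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_classes),
        )
    if pooling == "dense":
        # fully dense: a 1x1 conv keeps the spatial grid, so bigger inputs give bigger outputs
        return nn.Conv2d(in_channels, num_classes, kernel_size=1)
    raise ValueError("unknown pooling mode: %r" % pooling)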

@alykhantejani
Contributor

As there have been a few issues and PRs around this topic (#166, #184, #155, #190), we should probably make a decision on what to do.

my 2¢ on these two cases:

Adaptive Pooling
Users might want to use adaptive pooling when passing images of a different size to the pretrained classifier. However, if we enable this by default, the models will accept variable image sizes without the user being explicit, and can degrade silently (as the model was trained with a different image size). For this reason, I think it should be explicit in the user's code that they want this behavior, either via a constructor arg or by doing some model surgery.

I think the model surgery is quite straightforward and can be well documented in a tutorial/example. IMO the cost of cluttering the constructor and the docs to support this case is not worth it, as it reads just as clearly (if not more so) when the user is explicit:

resnet = torchvision.models.resnet18()
resnet.avgpool = nn.AdaptiveAvgPool2d(1)

The one problem with this is that in some models the final pooling is done via the functional interface (e.g. DenseNet), but this is easily remedied by making the feature pooling a member module on the model (which is backwards compatible, since pooling layers carry no parameters). A sketch follows:
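
import torch
from torch import nn

class Head(nn.Module):
    # illustrative classifier head (names are not DenseNet's actual code):
    # pooling is exposed as a member so users can swap it, and existing
    # checkpoints still load because the pooling layer holds no parameters
    def __init__(self, num_features=1024, num_classes=1000):
        super(Head, self).__init__()
        self.avgpool = nn.AvgPool2d(7)
        self.classifier = nn.Linear(num_features, num_classes)

    def forward(self, features):
        out = self.avgpool(features)
        return self.classifier(out.view(out.size(0), -1))

head = Head()
head.avgpool = nn.AdaptiveAvgPool2d(1)  # the same surgery now works for functional-style models too
print(head(torch.randn(2, 1024, 10, 10)).shape)  # torch.Size([2, 1000])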

Fully Convolutional Networks
In this case the network surgery required is a little more complex, but again it is something we can clearly document in an example for reference (a sketch follows below). Additionally, most people who want an FCN will be modifying the network structure for their new task and will likely have to do some surgery anyway.
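
For reference, one common form of that surgery (an illustrative sketch, not an official recipe): fold the pretrained fc weights into a 1x1 convolution so the classifier runs densely over larger inputs.

import torch
import torchvision
from torch import nn

resnet = torchvision.models.resnet18(pretrained=True)
body = nn.Sequential(*list(resnet.children())[:-2])  # keep everything up to layer4, drop avgpool and fc
head = nn.Conv2d(512, 1000, kernel_size=1)
head.weight.data.copy_(resnet.fc.weight.data.view(1000, 512, 1, 1))
head.bias.data.copy_(resnet.fc.bias.data)
fcn = nn.Sequential(body, head).eval()

with torch.no_grad():
    scores = fcn(torch.randn(1, 3, 448, 448))
print(scores.shape)  # torch.Size([1, 1000, 14, 14]) -- a dense grid of class scores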

What are your thoughts @fmassa @soumith @colesbury?

@vfdev-5
Collaborator

vfdev-5 commented Oct 22, 2017

For the DenseNet model, it would be good to at least state explicitly that the expected input size is 224x224; otherwise F.avg_pool2d(x, kernel_size=7) silently uses only the first 7x7 block of the feature map.
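
A quick sketch of that silent truncation (the 8x8 feature map stands in for a slightly-larger-than-224 input):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1024, 8, 8)
y = F.avg_pool2d(x, kernel_size=7)  # stride defaults to kernel_size
print(y.shape)  # torch.Size([1, 1024, 1, 1]) -- computed from the top-left 7x7 window only;
                # the eighth row and column are silently ignored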

@duygusar

duygusar commented Jun 21, 2018

I have a similar problem, but the model I use is a custom RecurrentAttention one, and it has no "AdaptivePooling" attribute --> https://github.com/kevinzakka/recurrent-visual-attention/blob/master/model.py. I modified the nn parts that are imported from the torch models (in modules.py and model.py), but it didn't work.

@mamunir

mamunir commented Jul 31, 2018

I have added adaptive average pooling but the error remains the same. Please help?
RuntimeError: size mismatch, m1: [512 x 1], m2: [512 x 2] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

Detailed Error

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.del of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fcf3cf65990>> ignored


RuntimeError Traceback (most recent call last)
in ()
1 model_conv = train_model(model_conv, criterion, optimizer_ft, exp_lr_scheduler,
----> 2 num_epochs=25)

in train_model(model, criterion, optimizer, scheduler, num_epochs)
34 # print (inputs.shape)
35 # print (model)
---> 36 outputs = model(inputs)
37 _, preds = torch.max(outputs, 1)
38 # print (outputs)

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in call(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)

in forward(self, x)
62 # x = F.relu(F.max_pool2d(self.conv1(x), 2))
63 # x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
---> 64 x = self.model_f(x)
65 # x = x.view(-1, 14336)
66 # x = F.relu(self.fc1(x))

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in call(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.pyc in forward(self, input)
89 def forward(self, input):
90 for module in self._modules.values():
---> 91 input = module(input)
92 return input
93

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in call(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.pyc in forward(self, input)
53
54 def forward(self, input):
---> 55 return F.linear(input, self.weight, self.bias)
56
57 def extra_repr(self):

/usr/local/lib/python2.7/dist-packages/torch/nn/functional.pyc in linear(input, weight, bias)
992 return torch.addmm(bias, input, weight.t())
993
--> 994 output = input.matmul(weight.t())
995 if bias is not None:
996 output += bias

RuntimeError: size mismatch, m1: [512 x 1], m2: [512 x 2] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

@fmassa
Member

fmassa commented Aug 2, 2018

@mamunir the error you are facing happens because you set the wrong output size for the AdaptiveAvgPool layer. For resnets, it should be 1x1.
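
For illustration, a minimal sketch of the fix (the 2-class head is an assumption read off the error message, which shows an fc weight of shape 512x2):

import torch
import torchvision
from torch import nn

model = torchvision.models.resnet18()
model.fc = nn.Linear(512, 2)             # 2-class head, matching m2: [512 x 2]
model.avgpool = nn.AdaptiveAvgPool2d(1)  # 1x1 output, so flattening yields exactly 512 features
print(model(torch.randn(1, 3, 300, 300)).shape)  # torch.Size([1, 2])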

@mamunir

mamunir commented Aug 3, 2018

@fmassa thanks for the reply. I had wired the input incorrectly, so it didn't flow properly through the network. It's fine now :)

@fmassa
Member

fmassa commented Nov 6, 2018

This has been implemented in #643

Thanks for the PR!

fmassa closed this Nov 6, 2018