
return identity when blocks less than 1 in resnet #3262

Open
ZhiyuanChen wants to merge 3 commits into pytorch:main from ZhiyuanChen:patch-1

Conversation

@ZhiyuanChen

Some modifications replace certain stages in ResNet. This allows such stages to be initialised to `nn.Identity()`.
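For readers without the diff open, here is a minimal sketch of the idea (the actual patch may differ; the signature below is taken from torchvision's `ResNet._make_layer`):

# Sketch only -- not the exact diff of this PR.
# Signature follows torchvision.models.resnet.ResNet._make_layer.
def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
    if blocks < 1:
        # A stage requested with no blocks becomes a pass-through.
        return nn.Identity()
    ...  # existing block-construction logic unchanged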
Contributor

@datumbox datumbox left a comment


@ZhiyuanChen Thanks for the PR. One of the tests fails due to typing. I left a suggestion in the comments, please have a look. Also, could you please provide a bit more info on how this change will enable your use-cases? This will help us assess if the proposed change is necessary and whether there is a better recommended way to do it.

@fmassa: As far as I can see this is a BC change, so it should be OK. Any thoughts on whether it can have side-effects on any models that use it as backbone?

@ZhiyuanChen
Author

ZhiyuanChen commented Jan 18, 2021

> @ZhiyuanChen Thanks for the PR. One of the tests fails due to typing. I left a suggestion in the comments, please have a look.

Thank you, I have noticed that, but I'm not quite sure whether I should change the return value to `nn.Sequential` or `nn.Identity`, or whether I should wrap it with an `nn.Sequential`.
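For example, keeping the annotated return type while still producing a pass-through could look like this (a sketch of one option, not necessarily what should be pushed):

# One option: keep the `-> nn.Sequential` annotation and wrap the pass-through.
def _make_layer(self, block, planes, blocks, stride=1, dilate=False) -> nn.Sequential:
    if blocks < 1:
        return nn.Sequential(nn.Identity())
    ...  # rest unchanged
# The alternative would be to widen the annotation (e.g. to nn.Module) and return nn.Identity() directly.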

> Also, could you please provide a bit more info on how this change will enable your use-cases? This will help us assess if the proposed change is necessary and whether there is a better recommended way to do it.

Visual Transformer swaps the blocks in the last stage of ResNet with VT modules. Vision Transformer's hybrid variant removes stage 4 entirely and extends stage 3 to keep the total at 50 layers (for resnet50). As transformer architectures are attracting increasing attention, I believe there will be more hybrid architectures which modify ResNets like this.
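As an illustration (the block counts below are my own guess, not taken from either paper), such a hybrid backbone could then be requested directly through the `layers` argument:

# Hypothetical usage of the proposed behaviour: drop stage 4 and extend stage 3.
from torchvision.models.resnet import ResNet, Bottleneck

hybrid_backbone = ResNet(Bottleneck, [3, 4, 9, 0])  # layer4 becomes a pass-through
# (the avgpool/fc head would still need separate handling)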

ZhiyuanChen and others added 2 commits January 18, 2021 22:14
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
@fmassa
Member

fmassa commented Jan 21, 2021

Hi,

Thanks for the PR!

I think the solution implemented here does not completely address the use-case presented.

The model still contains an avgpool and an fc layer after the residual blocks, so even with this patch you'll still have to make other modifications to the model to make it compatible with ViT.

Additionally, the same functionality implemented in this PR can be easily obtained with the following snippet:

import torchvision
from torch import nn

model = torchvision.models.resnet50()
model.layer3 = nn.Identity()
model.layer4 = nn.Identity()

which I believe could live in user code.

One of the reasons why I would prefer not to natively support this patch is that the meaning of `layers` becomes blurry, and we can't infer the downsampling only by the level (as the user could do [0, 2, 2, 2], which is equivalent to [2, 2, 2, 0] but might not work well with dilation, etc.).

For those reasons, I think I would prefer not to move forward with this PR, but let us know if you have other thoughts.

@ZhiyuanChen
Author

ZhiyuanChen commented Feb 4, 2021

> Hi,
>
> Thanks for the PR!
>
> I think the solution implemented here does not completely address the use-case presented.
>
> The model still contains an avgpool and an fc layer after the residual blocks, so even with this patch you'll still have to make other modifications to the model to make it compatible with ViT.
>
> Additionally, the same functionality implemented in this PR can be easily obtained with the following snippet:
>
> model = torchvision.models.resnet50()
> model.layer3 = nn.Identity()
> model.layer4 = nn.Identity()
>
> which I believe could live in user code.
>
> One of the reasons why I would prefer not to natively support this patch is that the meaning of `layers` becomes blurry, and we can't infer the downsampling only by the level (as the user could do [0, 2, 2, 2], which is equivalent to [2, 2, 2, 0] but might not work well with dilation, etc.).
>
> For those reasons, I think I would prefer not to move forward with this PR, but let us know if you have other thoughts.

This will not work if we are using the attributes of this object. For example, if I define a network like this:

class MyNet(ResNet):

    def __init__(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        layers: List[int],
        num_classes: int = 1000,
        zero_init_residual: bool = False,
        groups: int = 1,
        width_per_group: int = 64,
        replace_stride_with_dilation: Optional[List[bool]] = None,
        norm: Optional[Callable[..., nn.Module]] = nn.BatchNorm2d,
    ) -> None:
        super().__init__(block, layers, num_classes, zero_init_residual,
                         groups, width_per_group, replace_stride_with_dilation, norm)
        # Replace the stock layer4 with a custom stage; xxx stands in for a custom block type.
        self.layer4 = self._make_xxx_layer(512, layers[3])

    def _make_xxx_layer(self, planes: int, blocks: int) -> nn.Sequential:
        return nn.Sequential(*(xxx(self.inplanes, planes) for _ in range(blocks)))

It will not work, because by the time `_make_xxx_layer` runs, `self.inplanes` is already 2048 when it should have been 1024: `ResNet.__init__` has already built the stock `layer4` (updating `inplanes`) before `MyNet` replaces it.
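For reference, the `inplanes` bookkeeping (as I read the upstream `_make_layer`) works out as follows:

# _make_layer sets self.inplanes = planes * block.expansion, and Bottleneck.expansion == 4:
#   after layer1: 64  * 4 = 256
#   after layer2: 128 * 4 = 512
#   after layer3: 256 * 4 = 1024
#   after layer4: 512 * 4 = 2048   <- the value _make_xxx_layer sees above
# With the proposed `blocks < 1` early return, layer4 is never built and inplanes stays at 1024.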

I do believe programmers are mature enough to know what they are doing; if they specify that a stage has 0 blocks, it should have 0 blocks.
