Missing non-linearity between ResNet layers? #21

Closed
alechoag opened this issue Oct 2, 2020 · 3 comments

alechoag commented Oct 2, 2020

I'm sorry if I'm just being silly, but shouldn't there be a ReLU (or other non-linearity) between ResNet layers (after the add)? For example, the mapping net architecture looks like:
...
(10): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(11): ReLU(inplace=True)
(12): ResnetBlock(
  (conv_block): Sequential(
    (0): ReflectionPad2d((1, 1, 1, 1))
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
    (2): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (3): ReLU(inplace=True)
    (4): ReflectionPad2d((1, 1, 1, 1))
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
    (6): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  )
)
(13): ResnetBlock(
  (conv_block): Sequential(
    (0): ReflectionPad2d((1, 1, 1, 1))
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
...
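
In other words, my reading of the printout is that the block's forward pass looks roughly like the sketch below (my reconstruction, not necessarily the repo's exact code), where the result of the addition is returned directly, with no ReLU afterwards:

import torch.nn as nn

class ResnetBlockSketch(nn.Module):
    # Mirrors the printed structure: pad, conv, norm, ReLU, pad, conv, norm.
    def __init__(self, dim=512):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        # No non-linearity is applied after this skip addition.
        return x + self.conv_block(x)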

zhangmozhe (Contributor) commented:

In fact, there are several variants of the residual block. As discussed in the ResNet ECCV paper (https://arxiv.org/pdf/1603.05027.pdf), placing the activation after the addition (the original design) impedes information propagation and causes performance degradation. You can find a detailed comparison of the different variants in the paper.
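
For comparison, the original (post-activation) residual unit applies the ReLU after the skip addition, roughly like the following sketch (illustrative only, using batch norm as in the original ResNet paper; the names here are not from our code):

import torch.nn as nn

class OriginalResidualUnit(nn.Module):
    # Original post-activation design: the ReLU sits after the skip addition,
    # so the shortcut path is no longer a pure identity mapping.
    def __init__(self, dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Post-addition activation: the step the ECCV paper identifies as
        # impeding identity information flow through the shortcut.
        return self.relu(x + self.body(x))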


alechoag commented Oct 2, 2020

Thanks for the reply (and the excellent paper and code!). One follow-on question: if the goal is to use a full pre-activation configuration (to preserve the skip connection), shouldn't the norm and activation appear before the convolution in the ResNet?

zhangmozhe (Contributor) commented:

Since there are several variants, our implementation does not strictly follow the pre-activation manner, but it still removes the activation after the addition operator. This implementation style is also widely adopted in research work.
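
For reference, a full pre-activation residual unit (the ordering recommended in the ECCV paper) would place the norm and activation before each convolution, roughly like this sketch (illustrative, not the code in this repo):

import torch.nn as nn

class PreActResidualUnit(nn.Module):
    # Full pre-activation: norm -> ReLU -> conv, twice. Nothing is applied after
    # the addition, so the skip connection stays a clean identity mapping.
    def __init__(self, dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)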

alechoag closed this as completed on Oct 3, 2020