Missing non-linearity between ResNet layers? #21

Closed
alechoag opened this issue Oct 2, 2020 · 3 comments

alechoag commented Oct 2, 2020

I'm sorry if I'm just being silly, but shouldn't there be a ReLU (or other non-linearity) between ResNet layers (after the add)? For example, the mapping net architecture looks like:
...
(10): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(11): ReLU(inplace=True)
(12): ResnetBlock(
  (conv_block): Sequential(
    (0): ReflectionPad2d((1, 1, 1, 1))
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
    (2): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (3): ReLU(inplace=True)
    (4): ReflectionPad2d((1, 1, 1, 1))
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
    (6): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  )
)
(13): ResnetBlock(
  (conv_block): Sequential(
    (0): ReflectionPad2d((1, 1, 1, 1))
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
...
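
In other words, my reading of the printout is that the block's forward pass looks roughly like the sketch below (my reconstruction, not necessarily the repo's exact code), where the result of the addition is returned directly, with no ReLU afterwards:

import torch.nn as nn

class ResnetBlockSketch(nn.Module):
    # Mirrors the printed structure: pad, conv, norm, ReLU, pad, conv, norm.
    def __init__(self, dim=512):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        # No non-linearity is applied after this skip addition.
        return x + self.conv_block(x)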

zhangmozhe (Contributor) commented:

In fact, there are several variants of the residual block. As discussed in the ResNet ECCV paper (https://arxiv.org/pdf/1603.05027.pdf), placing the activation after the addition (the original design) impedes information propagation and causes performance degradation. You can find a detailed comparison of the different variants in the paper.
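
For comparison, the original (post-activation) residual unit applies the ReLU after the skip addition, roughly like the following sketch (illustrative only, using batch norm as in the original ResNet paper; the names here are not from our code):

import torch.nn as nn

class OriginalResidualUnit(nn.Module):
    # Original post-activation design: the ReLU sits after the skip addition,
    # so the shortcut path is no longer a pure identity mapping.
    def __init__(self, dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(dim),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Post-addition activation: the step the ECCV paper identifies as
        # impeding identity information flow through the shortcut.
        return self.relu(x + self.body(x))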


alechoag commented Oct 2, 2020

Thanks for the reply (and the excellent paper and code!). One follow-on question: if the goal is to use a full pre-activation configuration (to preserve the skip connection), shouldn't the norm and activation appear before the convolution in the ResNet?

zhangmozhe (Contributor) commented:

Since there are several variants, our implementation does not strictly follow the pre-activation manner, but it still removes the activation after the addition operator. This implementation style is also widely adopted in research work.
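
For reference, a full pre-activation residual unit (the ordering recommended in the ECCV paper) would place the norm and activation before each convolution, roughly like this sketch (illustrative, not the code in this repo):

import torch.nn as nn

class PreActResidualUnit(nn.Module):
    # Full pre-activation: norm -> ReLU -> conv, twice. Nothing is applied after
    # the addition, so the skip connection stays a clean identity mapping.
    def __init__(self, dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)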

alechoag closed this as completed on Oct 3, 2020