
question about backbone truncation #1

Closed
Senwang98 opened this issue Nov 1, 2021 · 13 comments

@Senwang98

Hi, @prakharg24
I am confused about the backbone truncation in your paper. For the MobileNetV2 architecture (image of the block table attached): you say the last two blocks are not used. Can you tell me which two blocks?
Do you mean the last two bottlenecks? (I assume you don't mean the FC layer; nobody keeps the FC layer for a detection task.)
Looking forward to your reply!

@Senwang98
Author

So, does truncation here mean random weight init instead of ImageNet pre-trained weights?

@prakharg24
Owner

Hi @Senwang98

Yes, the truncation of the last two blocks does not include the fully connected layers. It refers to the two MBConv blocks (or 'bottlenecks', as you mentioned) with 1280 and 320 channels.

In the paper, we try to motivate backbone truncation in two steps. First, we show that the last CNN blocks have no transfer learning importance. In this step there is no truncation; we only change the weight initialization of those blocks from ImageNet to random (Figure 2 in our paper). Next, once we have shown that these weights have no transfer learning importance, we claim that removing them (i.e. truncating) is a better way to make the model lightweight than reducing the width (i.e. the scaling factor).

So, truncation does not mean random weight init; it means actually removing (truncating) the blocks. It's just that truncation was motivated by these other experiments, where randomly initializing the last layers instead of using their transfer learning weights helped improve performance.
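
To make the two steps concrete, here is a rough PyTorch sketch of my own (not the repository's code), assuming torchvision's mobilenet_v2, whose last two feature blocks are exactly the 320-channel inverted residual and the final 1x1 conv that expands to 1280 channels:

import torch.nn as nn
from torchvision.models import mobilenet_v2

# ImageNet pre-trained backbone (the exact weights argument depends on the torchvision version).
model = mobilenet_v2(pretrained=True)

# Step 1 (the Figure 2 experiment): keep the architecture, but re-initialize the
# last two blocks randomly instead of using their ImageNet weights.
def reset_params(module):
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

for block in list(model.features)[-2:]:
    block.apply(reset_params)

# Step 2 (backbone truncation): simply drop those two blocks, so the backbone
# now ends at 160 output channels (stride 32) instead of 1280. No FC layer is
# kept for detection anyway.
truncated = nn.Sequential(*list(model.features)[:-2])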

I hope that clarifies things.

@Senwang98
Author

Okay, thanks for your quick reply!
So, I just remove the last MBConv blocks (320 and 1280 channels)?
By the way, do manually designed classification networks generally have more FLOPs in the last few layers? I notice that the FLOPs of the last few layers of networks obtained by NAS are small.

@prakharg24
Owner

That was definitely a common behavior for a long time. Since ImageNet has 1000 classes, most pre-trained models increase the number of channels in the last layers to a similar order. Some recent models don't follow this to the letter, but the overall pattern of significantly increasing the channel count for the last 2-3 layers is still there. However, as I mentioned, the fact that these layers are very heavy is only one part of the issue; the other is that the last layers do not seem to contain any relevant transfer learning features.

By the way, when working on, say, object detection, using a NAS backbone with transfer learning might not be the optimal choice, since NAS architectures are usually not easy to generalize. There is also a lot of work on creating object-detection-specific backbones using NAS, which would be a better fit if someone wants to explore that direction.

@Senwang98
Author

@prakharg24
Thanks for your detailed explanation, I understand your paper better now, and I will use your idea to help improve lightweight detection.
Thanks again, good luck!

@Senwang98
Author

Senwang98 commented Nov 1, 2021

@prakharg24
This is my pytorch style of RFCR module:

from .conv import DepthwiseConvModule, ConvModule
import torch
import torch.nn as nn
import torch.nn.functional as F


class MobilenetSeparableConv2D(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MobilenetSeparableConv2D, self).__init__()
        self.depthwiseconv = DepthwiseConvModule(in_channels=in_channels,
                                                 out_channels=in_channels,
                                                 kernel_size=5,
                                                 stride=1,
                                                 padding=2,
                                                 dilation=1,
                                                 bias="auto",
                                                 norm_cfg=dict(type="BN"),
                                                 activation="ReLU6")
        self.conv = ConvModule(in_channels=in_channels,
                               out_channels=out_channels,
                               kernel_size=1,
                               stride=1,
                               padding=0,
                               dilation=1,
                               bias="auto",
                               norm_cfg=dict(type="BN"),
                               activation="ReLU6")

    def forward(self, x):
        x = self.depthwiseconv(x)
        x = self.conv(x)
        return x


class RFCR_module(nn.Module):
    def __init__(self, in_channel, mid_channel=48, out_channel=96):
        super(RFCR_module, self).__init__()
        self.scale = nn.ParameterList(nn.Parameter(torch.tensor(
            [1.]), requires_grad=True) for _ in range(len(in_channel)))
        self.pwconv = nn.ModuleList(nn.Conv2d(
            in_channel[i], mid_channel, kernel_size=1, stride=1, padding=0, bias=False) for i in range(len(in_channel)))
        self.MB_conv = MobilenetSeparableConv2D(mid_channel, out_channel)

    def forward(self, model_outputs):
        # Bring the three feature maps (strides 8 / 16 / 32) to the middle resolution.
        fuse_out = []
        # fuse_out.append(F.max_pool2d(F.max_pool2d(
        #     model_outputs[0], 1, stride=2), 1, stride=2))
        fuse_out.append(F.max_pool2d(model_outputs[0], 2))
        fuse_out.append(model_outputs[1])
        fuse_out.append(F.interpolate(
            model_outputs[2], scale_factor=2, mode="bilinear"))
        # Project every level to mid_channel with a pointwise conv.
        for i in range(len(fuse_out)):
            fuse_out[i] = self.pwconv[i](fuse_out[i])

        # Weighted sum of the aligned levels with learnable scalar weights.
        sum_feat = self.scale[0] * fuse_out[0] + self.scale[1] * fuse_out[1] \
            + self.scale[2] * fuse_out[2]  # + self.scale[3] * fuse_out[3]

        mb_feat = self.MB_conv(sum_feat)
        # Redistribute: resize the fused feature back to each level and concatenate.
        redist_feat = []
        redist_feat.append(torch.cat(
            [F.interpolate(mb_feat, scale_factor=2, mode="bilinear"), model_outputs[0]], dim=1))
        redist_feat.append(torch.cat([mb_feat, model_outputs[1]], dim=1))
        redist_feat.append(
            torch.cat([F.max_pool2d(mb_feat, 1, stride=2), model_outputs[2]], dim=1))
        return redist_feat

I ignore the feature map with 4x stride, so the model_outputs length is 3 instead.
I am not sure if I have reproduced your module correctly?
I added RFCR to the NanoDet repo to see if RFCR can improve detection performance!

Maybe the pointwise conv and depthwise conv are fast enough, but I found that the RFCR output adds 96 channels, which may increase GFLOPs dramatically. So, I set the pointwise conv output channels to 16 and the MBConv output channels to 32.
This way, NanoDet only increases from 0.30 GFLOPs to 0.32 GFLOPs!
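
As a sanity check, here is a hypothetical usage sketch of the RFCR_module above with these reduced channel settings. The backbone channel counts and spatial sizes are placeholders (not NanoDet's actual configuration), and it assumes the module definition above with NanoDet's DepthwiseConvModule and ConvModule importable:

import torch

# Placeholder feature maps for three backbone levels (strides 8 / 16 / 32).
feats = [
    torch.randn(1, 116, 40, 40),   # stride 8
    torch.randn(1, 232, 20, 20),   # stride 16
    torch.randn(1, 464, 10, 10),   # stride 32
]

rfcr = RFCR_module(in_channel=[116, 232, 464], mid_channel=16, out_channel=32)
for out in rfcr(feats):
    print(out.shape)  # each level comes back with its original channels + 32 fused channels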

@Senwang98 Senwang98 reopened this Nov 2, 2021
@prakharg24
Owner

Hi @Senwang98

Before anything else: while we are still working on adding more trained models to the repo and making the code easily adaptable, you can already find the model definition here.

As for your implementation, it looks correct to me as well. You should still cross-check with our TensorFlow implementation, as two heads are better than one :)

For the channel number conundrum, there are a few things I would like to point out. First, it's true that the RFCR module adds some computation; while it also improves accuracy, there is a trade-off overall. We were able to overcome this by combining RFCR with backbone truncation. Second, one of the things we emphasize in our work is that indirect metrics of comparison, like FLOPs or model size, are usually not the best measure of a model's execution requirements. So even though it might seem that RFCR significantly hurts the FLOP count, the network fragmentation introduced by RFCR is limited, and it can run well under proper parallelization. Finally, yes, the additional channels we used might not be the right choice for you; it depends on the backbone and the detection head being used. I would suggest, though, that even if you reduce the MBConv output channels to 32, you keep the pointwise conv output channels high rather than cutting them all the way down to 16, as that might hurt performance.
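
For instance, a hypothetical configuration along these lines (same placeholder backbone channels as in the earlier sketch) keeps the pointwise fusion width at the default 48 and only shrinks the MBConv output:

# Keep mid_channel (the pointwise-conv fusion width) at 48; reduce only the MBConv output.
rfcr = RFCR_module(in_channel=[116, 232, 464], mid_channel=48, out_channel=32)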

I hope this helps

@Senwang98
Author

@prakharg24
Ok, I got it! Thanks for your interesting work; I will modify the channel numbers to see the performance and inference speed.
Thanks again and I will close this issue.

@vaerdu

vaerdu commented Nov 23, 2021

(Quotes Senwang98's full comment above, "@prakharg24 This is my pytorch style of RFCR module: ...", including the RFCR module code and the GFLOPs note. In the quoted version, the first downsampling line still reads fuse_out.append(F.max_pool2d(model_outputs[0], 1, stride=2)).)

Hello, I have read the PyTorch code you wrote and would like to ask a few questions:
1. Does your code take the backbone's features at 3 scales as input?
2. For the max pooling used to downsample the feature map, F.max_pool2d(model_outputs[0], 1, stride=2), is the pooling kernel size really 1?
3. What is the groups parameter when you use the depthwise conv?
Looking forward to your reply.

@Senwang98
Author

@vaerdu
I tested it on the NanoDet detection framework, because the model is small and trains quickly.

  1. Yes, 3 scales.
  2. kernel size = 1
  3. The default NanoDet dwconv: https://github.com/RangiLyu/nanodet/blob/c931de553e0ded55e8811e51cf0b74ac3aa5e9de/nanodet/model/module/conv.py#L191

@vaerdu

vaerdu commented Nov 23, 2021

@vaerdu I tested it on the NanoDet detection framework, because the model is small and trains quickly.

  1. Yes, 3 scales.
  2. kernel size = 1
  3. The default NanoDet dwconv: https://github.com/RangiLyu/nanodet/blob/c931de553e0ded55e8811e51cf0b74ac3aa5e9de/nanodet/model/module/conv.py#L191

Sorry, I still need to ask you:
1. I don't quite understand the 1x1 pooling kernel; can it really downsample? Aren't pooling kernels usually of size 2, 5, or 7?
2. The RFCR module first applies a pointwise conv to the backbone's features at 3 scales. The backbone's 3 scales usually have channel counts like 128, 256, 512, or 1024, but after the pointwise conv they are all reduced to 48 channels. Won't this lose a lot of feature information?
Hope you can advise.

@Senwang98
Author

Senwang98 commented Nov 23, 2021

@vaerdu

  1. I probably wrote this incorrectly (my mistake); just use max_pool2d(feat, 2) directly.
  2. As I understand it, RFCR is mainly intended for small models, which generally don't reach 512 channels, and this parameter is configurable anyway.

@vaerdu

vaerdu commented Nov 24, 2021

@vaerdu

  1. I probably wrote this incorrectly (my mistake); just use max_pool2d(feat, 2) directly.
  2. As I understand it, RFCR is mainly intended for small models, which generally don't reach 512 channels, and this parameter is configurable anyway.

OK, understood.
self.scale = nn.ParameterList(nn.Parameter(torch.tensor(
    [1.]), requires_grad=True) for _ in range(len(in_channel)))
In the weighted sum, are the weight parameters obtained from the code above multiplied with the pointwise-conv feature maps and then summed?
