Replace FPN with ASFF #51

Open

hegc opened this issue Dec 25, 2019 · 14 comments

Comments

hegc commented Dec 25, 2019

When I replace the FPN with ASFF in RetinaFace, the model size doubles, but the result is inferior to the FPN baseline.

GOATmessi8 (Owner) commented:

Hi, our experiments on RetinaNet show consistent improvements with ASFF, and the results are also confirmed by another work, EfficientDet. I am not familiar with RetinaFace, but I suggest you look at BiFPN when re-implementing ASFF on top of an FPN.

hegc (Author) commented Dec 25, 2019

import torch
import torch.nn as nn
import torch.nn.functional as F

# conv_bn and conv_bn1X1 are the Conv2d + BatchNorm2d + LeakyReLU helpers,
# presumably from the Pytorch_Retinaface net.py; a sketch of them follows this block.

class ASFF_FPN(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super(ASFF_FPN, self).__init__()
        # RetinaFace convention: narrow outputs get a LeakyReLU slope of 0.1
        leaky = 0
        if out_channels <= 64:
            leaky = 0.1
        # 1x1 output convs that project each fused level to out_channels
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride=1, leaky=leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride=1, leaky=leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride=1, leaky=leaky)

        # Rescale modules: 1x1 convs change the channel count before nearest-neighbor
        # upsampling; strided 3x3 convs (plus a max-pool for the 4x jump from
        # level 1 to level 3) handle downsampling.
        self.compress_level_2to1 = conv_bn1X1(in_channels_list[1], in_channels_list[0], stride=1, leaky=0.1)
        self.compress_level_3to1 = conv_bn1X1(in_channels_list[2], in_channels_list[0], stride=1, leaky=0.1)

        self.stride_conv_level_1to2 = conv_bn(in_channels_list[0], in_channels_list[1], stride=2, leaky=0.1)
        self.compress_level_3to2 = conv_bn1X1(in_channels_list[2], in_channels_list[1], stride=1, leaky=0.1)

        self.max_pool_level_1to3 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.stride_conv_level_1to3 = conv_bn(in_channels_list[0], in_channels_list[2], stride=2, leaky=0.1)
        self.stride_conv_level_2to3 = conv_bn(in_channels_list[1], in_channels_list[2], stride=2, leaky=0.1)

        # Weight branches: each rescaled map is compressed to 8 channels; the
        # three 8-channel maps are concatenated (24 channels) and a 3x3 conv
        # predicts the 3 per-pixel fusion weights for that level.
        self.weight_level_1_1 = conv_bn1X1(in_channels_list[0], 8, stride=1, leaky=0.1)
        self.weight_level_1_2 = conv_bn1X1(in_channels_list[0], 8, stride=1, leaky=0.1)
        self.weight_level_1_3 = conv_bn1X1(in_channels_list[0], 8, stride=1, leaky=0.1)
        self.weight_level_1 = conv_bn(24, 3, stride=1, leaky=0.1)

        self.weight_level_2_1 = conv_bn1X1(in_channels_list[1], 8, stride=1, leaky=0.1)
        self.weight_level_2_2 = conv_bn1X1(in_channels_list[1], 8, stride=1, leaky=0.1)
        self.weight_level_2_3 = conv_bn1X1(in_channels_list[1], 8, stride=1, leaky=0.1)
        self.weight_level_2 = conv_bn(24, 3, stride=1, leaky=0.1)

        self.weight_level_3_1 = conv_bn1X1(in_channels_list[2], 8, stride=1, leaky=0.1)
        self.weight_level_3_2 = conv_bn1X1(in_channels_list[2], 8, stride=1, leaky=0.1)
        self.weight_level_3_3 = conv_bn1X1(in_channels_list[2], 8, stride=1, leaky=0.1)
        self.weight_level_3 = conv_bn(24, 3, stride=1, leaky=0.1)


    def forward(self, input):
        # input: (ordered) dict of the 3 backbone feature maps, finest first
        input = list(input.values())
        #---- level 1: fuse level 1 with upsampled levels 2 and 3 -------------------
        level_1 = input[0]
        level_2to1 = F.interpolate(self.compress_level_2to1(input[1]), [level_1.size(2), level_1.size(3)], mode="nearest")
        level_3to1 = F.interpolate(self.compress_level_3to1(input[2]), [level_1.size(2), level_1.size(3)], mode="nearest")

        weight_level_1_1 = self.weight_level_1_1(level_1)
        weight_level_1_2 = self.weight_level_1_2(level_2to1)
        weight_level_1_3 = self.weight_level_1_3(level_3to1)

        weight_level_1 = torch.cat((weight_level_1_1, weight_level_1_2, weight_level_1_3), 1)
        weight_level_1 = self.weight_level_1(weight_level_1)
        # softmax across the 3 weight channels: per-pixel fusion weights sum to 1
        weight_level_1 = F.softmax(weight_level_1, dim=1)
        fused_level_1 = level_1 * weight_level_1[:, 0:1, :, :] +\
                        level_2to1 * weight_level_1[:, 1:2, :, :] +\
                        level_3to1 * weight_level_1[:, 2:3, :, :]
        #---- level 2: fuse level 2 with downsampled level 1, upsampled level 3 -----
        level_2 = input[1]
        level_1to2 = self.stride_conv_level_1to2(input[0])
        level_3to2 = F.interpolate(self.compress_level_3to2(input[2]), [level_2.size(2), level_2.size(3)], mode="nearest")

        weight_level_2_1 = self.weight_level_2_1(level_1to2)
        weight_level_2_2 = self.weight_level_2_2(level_2)
        weight_level_2_3 = self.weight_level_2_3(level_3to2)

        weight_level_2 = torch.cat((weight_level_2_1, weight_level_2_2, weight_level_2_3), 1)
        weight_level_2 = self.weight_level_2(weight_level_2)
        weight_level_2 = F.softmax(weight_level_2, dim=1)
        fused_level_2 = level_1to2 * weight_level_2[:, 0:1, :, :] +\
                        level_2 * weight_level_2[:, 1:2, :, :] +\
                        level_3to2 * weight_level_2[:, 2:3, :, :]
        #---- level 3: fuse level 3 with downsampled levels 1 and 2 -----------------
        level_3 = input[2]
        level_1to3 = self.stride_conv_level_1to3(self.max_pool_level_1to3(input[0]))
        level_2to3 = self.stride_conv_level_2to3(input[1])

        weight_level_3_1 = self.weight_level_3_1(level_1to3)
        weight_level_3_2 = self.weight_level_3_2(level_2to3)
        weight_level_3_3 = self.weight_level_3_3(level_3)

        weight_level_3 = torch.cat((weight_level_3_1, weight_level_3_2, weight_level_3_3), 1)
        weight_level_3 = self.weight_level_3(weight_level_3)
        weight_level_3 = F.softmax(weight_level_3, dim=1)
        fused_level_3 = level_1to3 * weight_level_3[:, 0:1, :, :] +\
                        level_2to3 * weight_level_3[:, 1:2, :, :] +\
                        level_3 * weight_level_3[:, 2:3, :, :]
        #-----------------------------------------------------------------------------

        # project each fused map to the common output channel count
        output1 = self.output1(fused_level_1)
        output2 = self.output2(fused_level_2)
        output3 = self.output3(fused_level_3)

        out = [output1, output2, output3]
        return out
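
For reference, the module above assumes the conv_bn / conv_bn1X1 helpers from the RetinaFace codebase. A minimal sketch of them, assuming the usual Pytorch_Retinaface signatures (an approximation, not the exact upstream code):

import torch.nn as nn

def conv_bn(inp, oup, stride=1, leaky=0):
    # 3x3 conv -> BN -> LeakyReLU; stride=2 halves the resolution
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )

def conv_bn1X1(inp, oup, stride, leaky=0):
    # 1x1 conv -> BN -> LeakyReLU; changes the channel count only
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, stride, padding=0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )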

hegc (Author) commented Dec 25, 2019

This is my code; can you help me?

abhigoku10 commented:

@ruinmessi are ASFF and BiFPN the same?

djaym7 commented Jan 23, 2020

@abhigoku10

BiFPN uses 3 weights of shape (1,) and multiplies them with the 3 features, while ASFF uses 3 weights of shape (1, 1, n, n) and multiplies them with the 3 features, i.e.

with x0.shape = (1, 24, 52, 52), x1 = (1, 24, 26, 26), x2 = (1, 24, 13, 13),
after upsampling/downsampling x1 and x2 to the current level:

bifpn_feature_level0 = w0(1,) * x0 + w1(1,) * x1 + w2(1,) * x2

ASFF_level0 = w0(1,1,52,52) * x0 + w1(1,1,52,52) * x1 + w2(1,1,52,52) * x2

ASFF uses softmax, while BiFPN uses its own "fast normalized fusion", which is faster than softmax, for computing the weights from the features.
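
To make the contrast concrete, here is a minimal sketch of the two fusion rules for one level, assuming the three maps were already resized to a common shape and channel count (the class names are illustrative, not from either paper's code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFPNFuse(nn.Module):
    # BiFPN "fast normalized fusion": 3 learned scalars, ReLU'd then normalized
    def __init__(self, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))
        self.eps = eps

    def forward(self, x0, x1, x2):
        w = F.relu(self.w)
        w = w / (w.sum() + self.eps)
        return w[0] * x0 + w[1] * x1 + w[2] * x2

class ASFFFuse(nn.Module):
    # ASFF: a conv predicts 3 per-pixel weight maps from the features,
    # softmaxed over dim=1 (the real code first compresses each input
    # to 8 or 16 channels before this conv)
    def __init__(self, channels):
        super().__init__()
        self.weight_conv = nn.Conv2d(3 * channels, 3, kernel_size=1)

    def forward(self, x0, x1, x2):
        w = F.softmax(self.weight_conv(torch.cat((x0, x1, x2), dim=1)), dim=1)
        return x0 * w[:, 0:1] + x1 * w[:, 1:2] + x2 * w[:, 2:3]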

glenn-jocher commented Mar 9, 2020

@djaym7 thanks for the easy explanation with the dimensions. Your comment explained ASFF better for me than reading the entire paper.

Have you had success implementing ASFF and/or contrasting it with BiFPN? I'm trying to implement it in https://github.com/ultralytics/yolov3, where we start with a baseline yolov3-spp mAP of 42.1@0.5:0.95.

glenn-jocher commented:

@djaym7 wait, I think your explanation may be incorrect. The 52x52 grid shape is a function of the input image shape; this specific 52x52 grid only appears if the input image is 416x416, so you cannot have a fixed 1x1x52x52 weight parameter in the model.

I think perhaps ASFF instead has a weight of w0(1,24,1,1) in your example, or w0(1,255,1,1) in a default 80-class COCO-trained yolov3, vs w0(1,) for BiFPN. Is this correct?

djaym7 commented Mar 10, 2020

@glenn-jocher check this out. The weights are not stored parameters; they are computed from the features at runtime, so their shape follows the input:

shape of one weight slice: (batch, 1 of 3 channels, mat_size, mat_size)
shape of level 1: (batch, n_channels, mat_size, mat_size)

fused_level_1 = level_1 * weight_level_1[:, 0:1, :, :] +
                level_2to1 * weight_level_1[:, 1:2, :, :] +
                level_3to1 * weight_level_1[:, 2:3, :, :]

glenn-jocher commented:

@djaym7 yes, I think I understand now; you are correct in your original explanation. So do you create a new convolutional module to create these weights at each yolo layer during runtime, like this?
[Screenshot: Screen Shot 2020-03-09 at 3 04 23 PM]

djaym7 commented Mar 15, 2020

> @djaym7 yes, I think I understand now; you are correct in your original explanation. So do you create a new convolutional module to create these weights at each yolo layer during runtime, like this?
> [Screenshot: Screen Shot 2020-03-09 at 3 04 23 PM]

I don't know what you have plotted there without seeing the full network, but in YOLO you take the outputs of the 3 branches and do the following. For the 3 outputs of darknet-53, [(n, c1, x1, x1), (n, c2, x2, x2), (n, c3, x3, x3)]:

You pass them through an asff_builder(), and this function returns the same shapes (or different shapes with other channel counts if you want to change them, but for this example let them stay the same).

Now, to generate the fused output at any level L, you downscale and/or upscale the resolution of the other two levels to match level L. Then you concatenate these three feature maps, which are now of the same resolution, into (n, c1+c2+c3, x, x) and pass the result through one conv layer with 3 output channels and 'same' padding (use kernel_size // 2 as the padding value), followed by a softmax across those 3 channels. You now have the per-pixel fusion weights for level L. You then multiply this (n, 3, x, x) weight map slice-by-slice with the 2 rescaled feature maps and the current level's feature map, and add them (this requires the three maps to share a channel count; the spatial size x already matches); see the sketch below. This is your final output for level L. Repeat for the other two levels.

Note: after the upscaling/downscaling step, the author passes each map through one conv layer before its weights are computed.
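
Here is a minimal sketch of that recipe for one level, assuming the three maps have already been projected to a common channel count c (the real code uses 1x1 compress convs for that, as in hegc's module above); fuse_at_level and its arguments are illustrative names, not the author's API:

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_at_level(feats, level, weight_conv):
    # feats: 3 maps with a shared channel count c, finest resolution first,
    #        e.g. [(n,c,52,52), (n,c,26,26), (n,c,13,13)]
    # weight_conv: per-level conv mapping the (n,3c,x,x) concat to (n,3,x,x)
    target_hw = feats[level].shape[2:]
    resized = [F.interpolate(f, size=list(target_hw), mode="nearest") for f in feats]
    # softmax across the 3 channels gives per-pixel fusion weights summing to 1
    weights = F.softmax(weight_conv(torch.cat(resized, dim=1)), dim=1)
    return sum(weights[:, i:i + 1] * r for i, r in enumerate(resized))

c = 24
weight_conv0 = nn.Conv2d(3 * c, 3, kernel_size=3, padding=3 // 2)  # 'same' padding
feats = [torch.randn(1, c, 52, 52), torch.randn(1, c, 26, 26), torch.randn(1, c, 13, 13)]
fused0 = fuse_at_level(feats, 0, weight_conv0)  # shape (1, 24, 52, 52)

In practice you would keep one such weight_conv per level, since each level learns its own fusion.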

anhnktp commented Apr 25, 2020

@glenn-jocher Hi, could you tell me what tool you use to view a cfg file as a graph? Thanks.

glenn-jocher commented:

@anhnktp you can use Netron; it works really well with *.cfg files!

https://github.com/lutzroeder/netron
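
For example (a minimal sketch assuming the netron pip package; the file name is illustrative):

# pip install netron
import netron
netron.start('yolov3-spp.cfg')  # serves an interactive graph of the network in the browser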

WangTianYuan commented:

Hello, I replaced the FPN with ASFF in RetinaNet, but the performance doesn't improve. Is there something I've missed? Can someone help me out?

joe660 commented Dec 14, 2020

> @djaym7 thanks for the easy explanation with the dimensions. Your comment explained ASFF better for me than reading the entire paper.
>
> Have you had success implementing ASFF and/or contrasting it with BiFPN? I'm trying to implement it in https://github.com/ultralytics/yolov3, where we start with a baseline yolov3-spp mAP of 42.1@0.5:0.95.

Can we use ASFF in the PyTorch version of YOLO? Thank you.
