# Feature Pyramid Network (FPN) in PyTorch

This notebook is a tutorial on the [Feature Pyramid Network](https://arxiv.org/abs/1612.03144) (applied to ResNet) for [PyTorch Taipei](https://pytorchtaipei.github.io/).
* The code is modified from [kuangliu's repo](https://github.com/kuangliu/pytorch-fpn/blob/master/fpn.py).
* It covers only model construction in PyTorch (no training or testing in this notebook!).
* Other FPN repos: [yangxue0827(tf)](https://github.com/yangxue0827/FPN_Tensorflow), [unsky(caffe)](https://github.com/unsky/FPN), [xmyqsh](https://github.com/xmyqsh/FPN)

## Enabling PyTorch in Colab
More info on Colab: https://mattwang44.github.io/en/articles/colab/

In [0]:
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.3.0.post4-{platform}-linux_x86_64.whl torchvision

In [0]:
# torch packages
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

## Residual Blocks in ResNet

Ref:
1. Paper: [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf)
2. [Understand Deep Residual Networks](https://blog.waya.ai/deep-residual-learning-9610bb62c355)

![bottleneck](https://cdn-images-1.medium.com/max/1600/1*HYrB7apC0lbXDTzSDs2OFg.png)

In [0]:
# use the "bottleneck" residual block (right graph above)
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, channel, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channel, channel, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channel)
        self.conv2 = nn.Conv2d(channel, channel, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channel)
        self.conv3 = nn.Conv2d(channel, self.expansion*channel, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*channel)
        self.shortcut = nn.Sequential()
        
        # make sure the two tensors being added element-wise have the same shape
        if stride != 1 or in_channel != self.expansion*channel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channel, self.expansion*channel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*channel)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
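
The shortcut rule in `Bottleneck.__init__` can be checked by hand. The sketch below is plain Python (no PyTorch needed) and only mirrors the `if` condition above; `shortcut_kind` is a hypothetical helper introduced for illustration, not part of the model.

```python
# Plain-Python sketch of the shortcut rule used in Bottleneck.__init__.
# `shortcut_kind` is a hypothetical helper, not part of the notebook's model.
EXPANSION = 4  # same value as Bottleneck.expansion


def shortcut_kind(in_channel, channel, stride=1, expansion=EXPANSION):
    """Return 'projection' when the shortcut needs a 1x1 conv + BN,
    otherwise 'identity' (the empty nn.Sequential)."""
    out_channel = expansion * channel
    if stride != 1 or in_channel != out_channel:
        return "projection"
    return "identity"


# First block of layer1: 64 -> 256 channels, so the shapes differ.
print(shortcut_kind(64, 64, stride=1))    # projection
# Later blocks of layer1: 256 -> 256 channels at stride 1.
print(shortcut_kind(256, 64, stride=1))   # identity
# First block of layer2: stride 2 halves the spatial size.
print(shortcut_kind(256, 128, stride=2))  # projection
```

Only the first block of each stage changes the channel count or the stride, so only that block needs the projection; the remaining blocks add the input back unchanged.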

## ResNet-FPN model

Ref:
[Understanding Feature Pyramid Networks](https://medium.com/@jonathan_hui/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c) (This is a fabulous blog post!!)

![ResNet-FPN model](https://cdn-images-1.medium.com/max/1600/1*cHR4YRqdPBOx4IDqzU-GwQ.png =750x450)

In [0]:
class FPN_Res(nn.Module):
    def __init__(self, block, num_blocks):
        """
        block: the residual block class to use (here, Bottleneck).
        num_blocks: a list of ints; each is the number of bottleneck units in one "stage".
        """
        super(FPN_Res, self).__init__()
        self.in_channel = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)

        # Bottom-up layers
        self.layer1 = self._make_stage(block,  64, num_blocks[0], stride=1)  #C2
        self.layer2 = self._make_stage(block, 128, num_blocks[1], stride=2)  #C3
        self.layer3 = self._make_stage(block, 256, num_blocks[2], stride=2)  #C4
        self.layer4 = self._make_stage(block, 512, num_blocks[3], stride=2)  #C5

        # Top layer (M5, for reducing channels)
        self.toplayer = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0)  

        # Lateral layers
        self.latlayer1 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer2 = nn.Conv2d( 512, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer3 = nn.Conv2d( 256, 256, kernel_size=1, stride=1, padding=0)

        # Smooth layers (each produces a "head")
        self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.smooth2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.smooth3 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.smooth4 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        
    def _make_stage(self, block, channel, num_blocks, stride):
        # use the assigned stride for the first res-block, stride=1 for the rest
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:  # build the list of res-blocks
            layers.append(block(self.in_channel, channel, stride))
            self.in_channel = channel * block.expansion
        return nn.Sequential(*layers)  # wrap the list of res-blocks into one stage
      
    def _upsample_add(self, x, y):
        # Please see https://pytorch.org/docs/stable/nn.html#torch.nn.functional.upsample
        '''
        Upsample and add two feature maps.
        Args:
          x: (Variable) top feature map to be upsampled.
          y: (Variable) lateral feature map.
        Returns:
          (Variable) added feature map.
        Note: in PyTorch, when the input size is odd, the feature map upsampled
        with `F.upsample(..., scale_factor=2, mode='nearest')`
        may not match the lateral feature map size.
        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]
        So we choose bilinear upsampling to an explicit target size,
        which supports arbitrary output sizes.
        '''
        _,_,H,W = y.size()
        return F.upsample(x, size=(H,W), mode='bilinear') + y

    def forward(self, x):
        # Bottom-up
        C1 = F.relu(self.bn1(self.conv1(x)))
        C1 = F.max_pool2d(C1, kernel_size=3, stride=2, padding=1)
        C2 = self.layer1(C1)
        C3 = self.layer2(C2)
        C4 = self.layer3(C3)
        C5 = self.layer4(C4)
        # Top-down
        M5 = self.toplayer(C5)
        M4 = self._upsample_add(M5, self.latlayer1(C4))
        M3 = self._upsample_add(M4, self.latlayer2(C3))
        M2 = self._upsample_add(M3, self.latlayer3(C2))
        # Smooth
        P5 = self.smooth1(M5)
        P4 = self.smooth2(M4)
        P3 = self.smooth3(M3)
        P2 = self.smooth4(M2)
        return P2, P3, P4, P5
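
The size mismatch described in the `_upsample_add` docstring follows from the standard convolution output-size formula. The sketch below is plain Python and reproduces the 15 -> 8 -> 16 example; `conv_out` and `doubling_matches` are hypothetical helper names, and the kernel/stride/padding values are those of the stride-2 3x3 convolution in `Bottleneck`.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one dimension (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def doubling_matches(n):
    """True if downsampling by a stride-2 3x3 conv and then upsampling
    with scale_factor=2 gives back the original length n."""
    return 2 * conv_out(n, kernel=3, stride=2, padding=1) == n


lateral = 15                                            # lateral map size
top = conv_out(lateral, kernel=3, stride=2, padding=1)  # size one level up
print(top, 2 * top, lateral)   # 8 16 15
print(doubling_matches(15))    # False
print(doubling_matches(16))    # True
```

Even sizes survive the round trip, odd sizes come back one pixel too large, which is why `_upsample_add` reads the target size from `y` instead of using a fixed scale factor.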

ResNet variants with different numbers of layers:
![res](https://images2017.cnblogs.com/blog/606386/201710/606386-20171016223757443-785220142.png =720x300)

In [0]:
def FPN_Res18():
    return FPN_Res(Bottleneck, [2,2,2,2])  # uses Bottleneck instead of the basic res-block of the original ResNet-18
def FPN_Res50():
    return FPN_Res(Bottleneck, [3,4,6,3])
def FPN_Res101():
    return FPN_Res(Bottleneck, [3,4,23,3])
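
The block counts passed above are expanded by `_make_stage` into per-block channel and stride settings. As a rough sketch, the same bookkeeping can be traced in plain Python; `stage_plan` is a hypothetical helper, and the stage widths and strides are copied from `FPN_Res.__init__`.

```python
def stage_plan(num_blocks, expansion=4):
    """Mirror FPN_Res._make_stage: for each stage, list the
    (in_channels, out_channels, stride) of every block."""
    in_channel = 64  # channels after conv1, as in FPN_Res.__init__
    plan = []
    for channel, n, stride in zip((64, 128, 256, 512), num_blocks, (1, 2, 2, 2)):
        strides = [stride] + [1] * (n - 1)  # only the first block may downsample
        blocks = []
        for s in strides:
            blocks.append((in_channel, channel * expansion, s))
            in_channel = channel * expansion
        plan.append(blocks)
    return plan


plan = stage_plan([3, 4, 6, 3])              # block counts used by FPN_Res50
print([len(stage) for stage in plan])        # [3, 4, 6, 3]
print([stage[-1][1] for stage in plan])      # [256, 512, 1024, 2048]
print(plan[1][:2])                           # [(256, 512, 2), (512, 512, 1)]
```

The stage outputs (256, 512, 1024, 2048 channels for C2 to C5) do not depend on the block counts, which is why the same hard-coded lateral and top layers work for all three factory functions.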
  

## Check (print out the size of heads)

In [0]:
def test(net):
    #fms = net(Variable(torch.randn(1,3,600,900)))
    fms = net(Variable(torch.randn(1,3,64,64)))
    for fm in fms:
        print(fm.size())

In [7]:
test(FPN_Res18())

torch.Size([1, 256, 16, 16])
torch.Size([1, 256, 8, 8])
torch.Size([1, 256, 4, 4])
torch.Size([1, 256, 2, 2])


In [9]:
test(FPN_Res50())

torch.Size([1, 256, 16, 16])
torch.Size([1, 256, 8, 8])
torch.Size([1, 256, 4, 4])
torch.Size([1, 256, 2, 2])


In [8]:
test(FPN_Res101())

torch.Size([1, 256, 16, 16])
torch.Size([1, 256, 8, 8])
torch.Size([1, 256, 4, 4])
torch.Size([1, 256, 2, 2])
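
The printed sizes can be cross-checked without running the network. The sketch below is plain Python and applies the convolution output-size formula to the stem (7x7 conv, 3x3 max-pool) and to the four stages; `conv_out` and `pyramid_sizes` are hypothetical helpers, and the 600x900 input is the one from the commented-out line in `test`.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a conv or max-pool along one dimension (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def pyramid_sizes(height, width):
    """Spatial sizes (H, W) of P2, P3, P4, P5 for a given input size."""
    # Stem: 7x7 conv (stride 2, padding 3), then 3x3 max-pool (stride 2, padding 1).
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    sizes = []
    # Each stage changes the size only through the 3x3 conv of its first block.
    for stride in (1, 2, 2, 2):
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        sizes.append((h, w))
    return sizes


print(pyramid_sizes(64, 64))    # [(16, 16), (8, 8), (4, 4), (2, 2)]
print(pyramid_sizes(600, 900))  # [(150, 225), (75, 113), (38, 57), (19, 29)]
```

The 64x64 result agrees with the output above, and it is the same for all three models because the block counts do not affect the strides. For 600x900 the P4 height of 38 does not double to the P3 height of 75, which is the odd-size case that `_upsample_add` handles by upsampling to an explicit size.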
