# [Semantic Segmentation with Deep Learning](https://www.cc.gatech.edu/~hays/compvision/proj6/)

For this project we are going to focus on semantic segmentation for 11 semantic categories with a state-of-the-art approach: deep learning.

Basic learning objectives of this project:

1. Understand the ResNet architecture.
2. Understand the concepts behind data augmentation and learning rate schedules for semantic segmentation.
3. Understand the role of dilated convolution and context in increasing the receptive field of a network.
4. Experiment with different aspects of the training process and observe the performance.

The starter code is mostly initialized to 'placeholder' values so that it does not crash when run unmodified and you can get a preview of how results are presented.

Your trained model should be able to produce an output like the one shown on the right below:

CamVid Image | Model Prediction
:-: | :--:
<img src="https://user-images.githubusercontent.com/16724970/114431741-d6b7dd00-9b8d-11eb-8822-e7fa7e915e37.jpg" width="300"> | <img src="https://user-images.githubusercontent.com/16724970/114431739-d61f4680-9b8d-11eb-9266-e56aeb08476f.jpg" width="300">


## PSPNet and ResNet-50

We'll be implementing PSPNet for this project, which uses a ResNet-50 backbone. ResNet-50 is 50 layers deep, which is significantly deeper than your SimpleNet from Project 5. We give you the implementation in `proj6_code/resnet.py`.

ResNet-50 is composed of 4 different sections (each called a "layer"), named `layer1`, `layer2`, `layer3`, and `layer4`. Each layer is a stack of repeated blocks, and each such block is a `Bottleneck`. Specifically, `layer1` has 3 Bottlenecks, `layer2` has 4 Bottlenecks, `layer3` has 6 Bottlenecks, and `layer4` has 3 Bottlenecks. In all, ResNet-50 has 16 Bottlenecks, which account for 48 of the conv layers (three per Bottleneck).
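The block and conv counts above can be checked with a few lines of arithmetic. This is only a sketch: the dictionary and constant names are made up for illustration, and the count deliberately ignores any 1x1 convs on the `downsample` shortcut.

```python
# Number of Bottleneck blocks in each ResNet-50 section, as listed above.
blocks_per_layer = {"layer1": 3, "layer2": 4, "layer3": 6, "layer4": 3}

# Each Bottleneck has three convs on its main branch:
# conv1 (1x1), conv2 (3x3), and conv3 (1x1).
CONVS_PER_BOTTLENECK = 3

num_bottlenecks = sum(blocks_per_layer.values())
num_bottleneck_convs = num_bottlenecks * CONVS_PER_BOTTLENECK

print(num_bottlenecks)       # 16
print(num_bottleneck_convs)  # 48
```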

### Visualizing a ResNet Bottleneck Module

The `Bottleneck` has a residual connection, from which ResNet gets its name:

<img width="300" src="https://user-images.githubusercontent.com/16724970/114430171-2ac1c200-9b8c-11eb-8341-fc943ff0945f.png">

See Figure 5 of the [ResNet paper](https://arxiv.org/pdf/1512.03385.pdf).

### Implementing a Bottleneck

The Bottleneck is implemented exactly as the figure above shows: 1x1 Conv -> BN -> ReLU -> 3x3 Conv -> BN -> ReLU -> 1x1 Conv -> BN -> Add Back Input (optionally downsampled) -> ReLU. The channel dimension of the feature map is expanded by 4x (`expansion = 4`), as we can see from the input and output channel arguments of `conv3`. Notice also that the stride is set in the `conv2` module, which will be very important later.

```python
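import torch.nn as nn
from torch import Tensor


class Bottleneck(nn.Module):
    expansion = 4  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv: map the input to `planes` channels
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # 3x3 conv: the only conv in the block that applies the stride
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # 1x1 conv: expand to planes * expansion channels
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        # Optional module applied to the identity branch so shapes match at the add
        self.downsample = downsample
        self.stride = stride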
```

The `forward` method of the `Bottleneck` shows the residual connection. Notice that when we add back the input (the identity branch), we may need to downsample it so that the shapes match during the add operation (if the main branch downsampled the input):
```python
    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
```
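The snippets above do not show how the `downsample` module is built. A minimal sketch, assuming it is a 1x1 conv followed by batch norm (the usual ResNet projection shortcut; the provided `resnet.py` may differ), shows why it is needed even when the stride is 1: the main branch outputs `planes * expansion` channels, so the identity must be projected to the same shape before the add. The tensor sizes below are made up for illustration.

```python
import torch
from torch import nn

# Hypothetical sizes: 64 channels in, planes * expansion = 64 * 4 = 256 channels out.
inplanes, planes, expansion, stride = 64, 64, 4, 1

# Assumed form of `downsample`: a 1x1 projection on the shortcut, so the identity
# matches the main branch in channel count and (if stride > 1) spatial size.
downsample = nn.Sequential(
    nn.Conv2d(inplanes, planes * expansion, kernel_size=1, stride=stride, bias=False),
    nn.BatchNorm2d(planes * expansion),
)

x = torch.randn(2, inplanes, 56, 56)
# Random stand-in for the output of bn3 on the main branch.
main_branch_out = torch.randn(2, planes * expansion, 56, 56)

identity = downsample(x)
out = torch.relu(main_branch_out + identity)

print(identity.shape)  # torch.Size([2, 256, 56, 56])
print(out.shape)       # torch.Size([2, 256, 56, 56])
```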

## Visualizing the Architecture
Plotting the whole network architecture would require a massive figure, but we can show how data flows through just one Bottleneck, starting with 64 channels, and ending up with 256 output channels:
<p float="left">
  <img src="https://user-images.githubusercontent.com/16724970/114427960-9eae9b00-9b89-11eb-9a3b-96817f205f32.png" width="400" />
</p>


## Part 1: Pyramid Pooling Module
In Part 1, you will implement the Pyramid Pooling Module (PPM). After feeding an image through the ResNet backbone and obtaining a feature map, PSPNet aggregates context over different portions of the image with the PPM.

The PPM splits the $H \times W$ feature map into $K \times K$ grids. Here, 1x1, 2x2, 3x3, and 6x6 grids are formed, and features are average-pooled within each grid cell. Afterwards, the pooled 1x1, 2x2, 3x3, and 6x6 maps are upsampled back to the original $H \times W$ feature map resolution, and are stacked together along the channel dimension. These grids are visualized below (center):

<img src="https://user-images.githubusercontent.com/16724970/114436422-4b414a80-9b93-11eb-8f02-8e7506b5f9a1.jpg" width="900">

Implement this in `proj6_code/part1_ppm.py`.
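The pool, upsample, and stack steps described above can be sketched with functional PyTorch calls. This is an illustration under assumptions, not the expected solution: the function name `pyramid_pool_sketch` is hypothetical, the sketch has no learned layers, and it assumes the input feature map is concatenated alongside the pooled ones, which may not match what the unit tests expect.

```python
import torch
import torch.nn.functional as F


def pyramid_pool_sketch(feats: torch.Tensor, bins=(1, 2, 3, 6)) -> torch.Tensor:
    """Pool `feats` into KxK grids, upsample each back to HxW, and concatenate."""
    h, w = feats.shape[2:]
    outputs = [feats]  # assumption: keep the original feature map in the stack
    for k in bins:
        # Average-pool within each cell of a k x k grid: (N, C, k, k).
        pooled = F.adaptive_avg_pool2d(feats, output_size=k)
        # Bring the coarse grid back to the original H x W resolution.
        upsampled = F.interpolate(pooled, size=(h, w), mode="bilinear", align_corners=True)
        outputs.append(upsampled)
    # Stack everything along the channel dimension.
    return torch.cat(outputs, dim=1)


feats = torch.randn(1, 8, 12, 12)
out = pyramid_pool_sketch(feats)
print(out.shape)  # torch.Size([1, 40, 12, 12])
```

With 8 input channels and four grid sizes, the output has 8 * (1 + 4) = 40 channels. The 1x1 branch is simply the global average of each channel, broadcast over the whole map.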

```python
%load_ext autoreload
%autoreload 2

from proj6_unit_tests.test_part1_ppm import test_PPM_6x6, test_PPM_fullres
from proj6_code.utils import verify

print("test_PPM_6x6(): ", verify(test_PPM_6x6))
print("test_PPM_fullres(): ", verify(test_PPM_fullres))
```

## Part 2: Dataset and Dataloader
Next, in `proj6_code/part2_dataset.py` you will implement the `make_dataset()` function to create a list of paths to (image, ground truth) pairs. You will also implement the `__getitem__()` function that will load an RGB image and grayscale label map, and then apply a transform to them.
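As a rough illustration of building such a list of path pairs, the sketch below assumes a split file with one `image_path label_path` pair per line, relative to a data root. The file format, the function name `pair_paths_sketch`, and the example paths are all hypothetical; check the starter code for the actual signature and list format.

```python
import os
import tempfile
from typing import List, Tuple


def pair_paths_sketch(data_root: str, list_fpath: str) -> List[Tuple[str, str]]:
    """Read 'image_path label_path' lines and return (image, label) pairs under data_root."""
    pairs = []
    with open(list_fpath) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split()
            if len(parts) != 2:
                raise ValueError(f"Expected 2 paths per line, got: {line!r}")
            image_rel, label_rel = parts
            pairs.append((os.path.join(data_root, image_rel), os.path.join(data_root, label_rel)))
    return pairs


# Tiny demo with a made-up split file.
with tempfile.TemporaryDirectory() as tmp:
    list_fpath = os.path.join(tmp, "train.txt")
    with open(list_fpath, "w") as f:
        f.write("images/0001.png labels/0001.png\nimages/0002.png labels/0002.png\n")
    pairs = pair_paths_sketch("/data/camvid", list_fpath)

print(len(pairs))  # 2
print(pairs[0])
```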

```python
from proj6_unit_tests.test_part2_dataset import test_SemData_len, test_getitem_no_data_aug, test_make_dataset

print("test_SemData_len(): ", verify(test_SemData_len))
print("test_getitem_no_data_aug(): ", verify(test_getitem_no_data_aug))
print("test_make_dataset(): ", verify(test_make_dataset))
```

## Part 3: Online Data Preprocessing and Data Augmentation
Data preprocessing and augmentation are very important to good performance, and we'll implement them in `proj6_code/part3_training_utils.py`. We'll feed square image crops to the network, but we must be careful to crop the same portion of the RGB image and the ground truth semantic label map. Implement `get_train_transform(args)` and `get_val_transform(args)`, and check against the unit tests below:
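The key point about cropping is that the random offset is sampled once and then applied to both arrays. The dependency-free sketch below illustrates this with nested lists and a single channel for brevity; the function name `joint_random_crop` is hypothetical, and the project's actual transforms operate on real images and label maps rather than lists.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # a single-channel H x W array, for brevity


def joint_random_crop(image: Grid, label: Grid, crop_size: int, rng: random.Random) -> Tuple[Grid, Grid]:
    """Crop the same randomly placed square window from `image` and `label`."""
    h, w = len(image), len(image[0])
    if (len(label), len(label[0])) != (h, w):
        raise ValueError("image and label must have the same height and width")
    if crop_size > min(h, w):
        raise ValueError("crop_size must fit inside the image")

    # Sample the offset ONCE, then reuse it for both arrays.
    top = rng.randint(0, h - crop_size)
    left = rng.randint(0, w - crop_size)

    def crop(grid: Grid) -> Grid:
        return [row[left:left + crop_size] for row in grid[top:top + crop_size]]

    return crop(image), crop(label)


# Demo: pixel (r, c) stores r * 10 + c in both arrays, so aligned crops must be identical.
image = [[r * 10 + c for c in range(6)] for r in range(5)]
label = [row[:] for row in image]
image_crop, label_crop = joint_random_crop(image, label, crop_size=3, rng=random.Random(0))
print(image_crop == label_crop)  # True
```

Sampling separate offsets for the image and the label would silently pair each pixel with the wrong class, which is why the offset is drawn before either array is sliced.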


```python
from proj6_unit_tests.test_part3_training_utils import test_get_train_transform, test_get_val_transform

print("test_get_train_transform(): ", verify(test_get_train_transform))
print("test_get_val_transform(): ", verify(test_get_val_transform))
```

## Part 4: A Simple Segmentation Baseline
We'll start with a very simple baseline -- a pretrained ResNet-50, without the final average-pool and fc layers, and with a single 1x1 conv as the final classifier, converting the (2048,7,7) feature map to scores over 11 classes, an (11,7,7) tensor. Note that our output is just 7x7, which is very low resolution. Implement upsampling to the original height and width, and compute the loss and predicted class per pixel in `proj6_code/part4_segmentation_net.py`.

If the "SimpleSegmentationNet" architecture is specified in the experiment arguments (`args`), return this model in `get_model_and_optimizer()` in `proj6_code/part3_training_utils.py`.
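The upsampling, loss, and per-pixel prediction steps from this part can be sketched on random tensors. This is a sketch under assumptions rather than the expected implementation: bilinear interpolation and plain cross-entropy are choices made here for illustration, and the batch size and tensor names are made up.

```python
import torch
import torch.nn.functional as F

num_classes, H, W = 11, 224, 224

# Random stand-ins for the 1x1 classifier output and the ground-truth label map.
logits_lowres = torch.randn(2, num_classes, 7, 7)
target = torch.randint(0, num_classes, (2, H, W))

# Upsample the class scores (not the argmax) back to the input resolution.
logits = F.interpolate(logits_lowres, size=(H, W), mode="bilinear", align_corners=True)

# Cross-entropy accepts (N, C, H, W) scores with (N, H, W) integer targets
# and averages over all pixels by default.
loss = F.cross_entropy(logits, target)

# Predicted class per pixel: the index of the highest score along the class dimension.
pred = logits.argmax(dim=1)

print(logits.shape)  # torch.Size([2, 11, 224, 224])
print(pred.shape)    # torch.Size([2, 224, 224])
```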

```python
from proj6_unit_tests.test_part4_segmentation_net import (
    test_check_output_shapes,
    test_check_output_shapes_testtime,
    test_get_model_and_optimizer_simplearch
)

print("test_check_output_shapes(): ", verify(test_check_output_shapes))
print("test_check_output_shapes_testtime(): ", verify(test_check_output_shapes_testtime))
print("test_get_model_and_optimizer_simplearch(): ", verify(test_get_model_and_optimizer_simplearch))
```

## Part 5: Net Surgery for Increased Output Resolution and Receptive Field
The basic ResNet-50 has two major problems:
1. It does not have a large enough receptive field.
2. If run fully-convolutionally, it produces a low-resolution output (just $7 \times 7$)!

To fix the first problem, we will need to replace some of its convolutional layers with dilated convolutions. To fix the second problem, we'll reduce the stride of the strided convolutions in the last two sections from 2 to 1, so that we don't downsample as much. Instead of going down to 7x7, we'll stop at 28x28 for a 224x224 input, or 26x26 for the 201x201 inputs we use in this project. In other words, the downsampling rate will go from 1/32 to just 1/8.
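The 7x7, 28x28, and 26x26 figures follow from the standard convolution output-size formula. The sketch below assumes a stem made of a 7x7 conv (stride 2, padding 3) followed by a 3x3 max-pool (stride 2, padding 1); the stem in the provided `resnet.py` may differ. It tracks only the strided layer in each section, since the other convs preserve the spatial size. Both function names are hypothetical.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output length of a conv or pool along one spatial dimension (floor rounding)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def backbone_out_size(size: int, dilated: bool) -> int:
    # Assumed stem: 7x7 conv (stride 2, pad 3), then 3x3 max-pool (stride 2, pad 1).
    size = conv_out_size(size, kernel=7, stride=2, padding=3)
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    # layer1 keeps the resolution; layer2 halves it with a stride-2 3x3 conv.
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    # layer3 and layer4: stride 2 originally, or stride 1 with dilation 2 and 4 after surgery.
    for dilation in (2, 4):
        if dilated:
            size = conv_out_size(size, kernel=3, stride=1, padding=dilation, dilation=dilation)
        else:
            size = conv_out_size(size, kernel=3, stride=2, padding=1)
    return size


for side in (224, 201):
    print(side, backbone_out_size(side, dilated=False), backbone_out_size(side, dilated=True))
# 224 7 28
# 201 7 26
```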

These animations depict how the dilated convolution (i.e. with dilation > 1) operation compares to convolution with no dilation (i.e. with dilation=1).

Conv w/ Stride=1, Dilation=1 | Conv w/ Stride=2, Dilation=1 | Conv w/ Stride=1, Dilation=2
:-: | :-: | :-:
<img src="https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/no_padding_no_strides.gif" width="300" align="center"> | <img src="https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/no_padding_strides.gif" width="300" align="center"> | <img src="https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/dilation.gif" width="300" align="center"> 


In `layer3`, in every `Bottleneck`, we will replace the 3x3 `conv2`, which had stride=2, dilation=1, and padding=1, with a new conv layer that instead has stride=1, dilation=2, and padding=2. In the `downsample` block, we'll also need to hardcode the stride to 1 instead of 2.

In `layer4`, for every `Bottleneck`, we will make the same changes, except we'll change the dilation to 4 and padding to 4.

Make these edits in `proj6_code/part5_pspnet.py`.
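The effect of these changes on a single conv can be sketched in isolation. The example below edits the `stride`, `dilation`, and `padding` attributes of existing `nn.Conv2d` modules in place, which keeps their weights; this is one possible approach and an assumption here, not necessarily what the starter code expects. The two small convs are stand-ins with made-up channel counts, not the real `layer3` modules.

```python
import torch
from torch import nn

# Stand-ins for one Bottleneck's 3x3 `conv2` and the conv inside its `downsample` branch.
conv2 = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1, bias=False)
shortcut_conv = nn.Conv2d(8, 32, kernel_size=1, stride=2, bias=False)

x = torch.randn(1, 8, 26, 26)
print(conv2(x).shape)  # torch.Size([1, 8, 13, 13])

# Surgery: keep the existing weights, but change how the kernel is applied.
conv2.stride, conv2.dilation, conv2.padding = (1, 1), (2, 2), (2, 2)
shortcut_conv.stride = (1, 1)

print(conv2(x).shape)          # torch.Size([1, 8, 26, 26])
print(shortcut_conv(x).shape)  # torch.Size([1, 32, 26, 26])
```

Setting the padding equal to the dilation is what keeps the spatial size unchanged for a 3x3 kernel at stride 1, since the dilated kernel spans 2 * dilation + 1 pixels.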

```python
from proj6_unit_tests.test_part5_pspnet import (
    test_pspnet_output_shapes,
    test_check_output_shapes_testtime_pspnet,
    test_get_model_and_optimizer_pspnet
)

print("test_pspnet_output_shapes():", verify(test_pspnet_output_shapes))
print("test_check_output_shapes_testtime_pspnet(): ", verify(test_check_output_shapes_testtime_pspnet))
print("test_get_model_and_optimizer_pspnet(): ", verify(test_get_model_and_optimizer_pspnet))
```