# Model Selection

Among the pretrained image-segmentation models available from PyTorch Hub, the reported performance stats are:

| Model       | Mean IoU (%) | Global Pixelwise Accuracy (%) |
| ----------- | ------------ | ----------------------------- |
| DeepLabv3   | 67.4         | 92.4                          |
| FCN         | 63.7         | 91.9                          |

Hence, as a rough approximation, DeepLabv3 is the better-suited model for this application, since it leads on both metrics.<br />
Both models are available pretrained with a ResNet-101 backbone.
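Both metrics in the table can be derived from a per-pixel confusion matrix. Global pixelwise accuracy is the fraction of correctly labelled pixels, and mean IoU averages, over classes, TP / (TP + FP + FN). The following is a minimal NumPy sketch, assuming integer label maps; the function names are illustrative and not taken from any library.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    idx = target.astype(np.int64) * num_classes + pred.astype(np.int64)
    counts = np.bincount(idx.ravel(), minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(pred, target, num_classes):
    """Return (global pixelwise accuracy, mean IoU) for two label maps."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm)
    global_acc = tp.sum() / cm.sum()
    # union = TP + FP + FN = (column sum) + (row sum) - TP
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.full(num_classes, np.nan)
    np.divide(tp, union, out=iou, where=union > 0)  # classes absent from both stay NaN
    return global_acc, np.nanmean(iou)

# 1 = portrait, 0 = background
target = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 1], [0, 0, 1, 0]])
acc, miou = segmentation_metrics(pred, target, num_classes=2)
print(acc, miou)  # 0.75 0.6
```

Here 6 of 8 pixels are correct, and each class has 3 correct pixels out of a union of 5, giving an IoU of 0.6 per class.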

In [1]:
```python
import torch
from torch import nn
import numpy as np
from PIL import Image
import torchvision
from torch.backends import cudnn
```

In [2]:
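```python
# CUDA for PyTorch: use the first GPU if available, otherwise fall back to the CPU
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")
cudnn.benchmark = True  # let cuDNN auto-tune convolution algorithms for fixed input sizes
print(torch.cuda.get_device_name(device) if use_cuda else "CPU")
```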

```
GeForce GTX 1650
```


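Next, instantiate the model with a custom number of output classes: in this case a single output channel, since each image has only one mask (the portrait).<br />
`aux_loss=True` adds an auxiliary classifier head (`aux_classifier`) on an intermediate backbone layer. Its prediction is returned under the `'aux'` key alongside the main `'out'` prediction, and can be used for an additional loss term against the ground-truth mask during training.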
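With `aux_loss=True`, the forward pass returns a dict holding `'out'` and `'aux'` logits, both upsampled to the input resolution. Below is a sketch of a combined training loss for single-channel masks. It assumes binary cross-entropy on logits and an auxiliary weight of 0.5 (the weight used in torchvision's segmentation reference training script; treat it as a tunable assumption). The helper `segmentation_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(outputs, target, aux_weight=0.5):
    """BCE-with-logits on the main head, plus a down-weighted auxiliary term if present."""
    loss = F.binary_cross_entropy_with_logits(outputs["out"], target)
    if "aux" in outputs:
        loss = loss + aux_weight * F.binary_cross_entropy_with_logits(outputs["aux"], target)
    return loss

# Stand-in for model output on a batch of 2 images of size 8x8 (1 output channel).
outputs = {"out": torch.zeros(2, 1, 8, 8), "aux": torch.zeros(2, 1, 8, 8)}
target = torch.randint(0, 2, (2, 1, 8, 8)).float()  # binary portrait masks
loss = segmentation_loss(outputs, target)
print(loss.item())  # 1.5 * ln 2, about 1.0397, since every logit is 0
```

At inference time only `outputs["out"]` is needed, so the auxiliary head adds cost only during training.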

In [3]:
```python
model1 = torchvision.models.segmentation.deeplabv3_resnet101(num_classes=1, aux_loss=True)
```
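Because the model predicts a single channel, its `'out'` tensor holds one logit per pixel for "portrait". The sketch below turns that tensor into a binary mask. It assumes a 0.5 probability threshold, which is equivalent to logit > 0, and the helper name `logits_to_mask` is hypothetical.

```python
import torch

def logits_to_mask(logits, threshold=0.5):
    """Map [N, 1, H, W] logits to an [N, H, W] boolean mask via sigmoid + threshold."""
    probs = torch.sigmoid(logits)
    return (probs > threshold).squeeze(1)

logits = torch.tensor([[[[-2.0, 0.5], [3.0, -0.1]]]])  # shape [1, 1, 2, 2]
mask = logits_to_mask(logits)
# expected values: [[[False, True], [True, False]]]
```

With the real model, this would typically run as `model1.eval()` followed by `logits_to_mask(model1(batch)["out"])` inside a `torch.no_grad()` block, where `batch` is a normalized image tensor.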

### Model Architecture

In [4]:
```python
model1
```

```
DeepLabV3(
  (backbone): IntermediateLayerGetter(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Se
```

### Pretrained Weights
Download the pretrained DeepLabv3 (ResNet-101) weights from torchvision and keep their state dict.

In [5]:
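```python
# Full pretrained model (21 output classes), used only as a source of weights.
# Note: torchvision >= 0.13 deprecates `pretrained=True` in favour of
# `weights=torchvision.models.segmentation.DeepLabV3_ResNet101_Weights.DEFAULT`.
model_dict = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).state_dict()
```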

Check the data type of the pretrained weights:

In [6]:
```python
model_dict['aux_classifier.4.weight'].dtype
```

```
torch.float32
```
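Checking a single key only shows one tensor's dtype. The sketch below tallies dtypes across a whole state dict. It assumes a small stand-in model (a hypothetical `nn.Sequential` of one convolution and one BatchNorm layer) rather than the full DeepLabv3. Weights, biases and running statistics are `float32`, while BatchNorm's `num_batches_tracked` counter is an integer buffer.

```python
from collections import Counter

import torch
from torch import nn

def dtype_summary(state_dict):
    """Count how many state-dict entries have each dtype."""
    return Counter(tensor.dtype for tensor in state_dict.values())

# Hypothetical stand-in for a segmentation model with conv + BatchNorm layers.
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))
summary = dtype_summary(toy.state_dict())
print(summary)
```

Calling `dtype_summary(model_dict)` in the same way would show whether any non-`float32` entries exist beyond the BatchNorm counters.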

The pretrained state dict cannot be loaded into `model1` as-is. The final 1x1 convolution of each head outputs 21 channels in the pretrained model (the 20 Pascal VOC categories plus background), but only 1 channel in the custom instance. Hence, the weights and biases of the final conv layers, `classifier.4` and `aux_classifier.4`, need to be replaced with tensors of the new shape.
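Another option is to copy over only the pretrained entries whose shapes match the target model, keeping the target's own freshly initialized values for the rest. This avoids writing the replacement tensors by hand. The helper `merge_matching` below is hypothetical. The toy `nn.Sequential` heads stand in for DeepLabv3 heads with 21 and 1 output channels.

```python
import torch
from torch import nn

def merge_matching(pretrained, target):
    """Return target's state dict, overwritten by pretrained entries of matching shape."""
    merged = dict(target)
    skipped = []
    for key, tensor in pretrained.items():
        if key in merged and merged[key].shape == tensor.shape:
            merged[key] = tensor
        else:
            skipped.append(key)
    return merged, skipped

# Toy heads: shared 3x3 conv, then a 1x1 conv with 21 vs 1 output channels.
pretrained_head = nn.Sequential(nn.Conv2d(16, 8, 3), nn.Conv2d(8, 21, 1))
custom_head = nn.Sequential(nn.Conv2d(16, 8, 3), nn.Conv2d(8, 1, 1))

merged, skipped = merge_matching(pretrained_head.state_dict(), custom_head.state_dict())
custom_head.load_state_dict(merged)  # strict loading succeeds: keys and shapes all match
print(skipped)  # ['1.weight', '1.bias']
```

In this notebook, `merge_matching(model_dict, model1.state_dict())` would keep `model1`'s freshly constructed `classifier.4` and `aux_classifier.4` parameters instead of `torch.rand` values.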

In [7]:
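```python
# Replace the 21-channel auxiliary head with 1-channel tensors (values drawn uniformly from [0, 1));
# load_state_dict later copies these values into model1's parameters.
model_dict['aux_classifier.4.weight'] = torch.rand([1, 256, 1, 1], dtype=torch.float32, device=device, requires_grad=True)
model_dict['aux_classifier.4.bias'] = torch.rand([1], dtype=torch.float32, device=device, requires_grad=True)
```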

In [8]:
```python
# Same replacement for the main DeepLabv3 head's final 1x1 conv.
model_dict['classifier.4.weight'] = torch.rand([1, 256, 1, 1], dtype=torch.float32, device=device, requires_grad=True)
model_dict['classifier.4.bias'] = torch.rand([1], dtype=torch.float32, device=device, requires_grad=True)
```

In [9]:
```python
model1.load_state_dict(model_dict)
```

```
<All keys matched successfully>
```

### Finalized Model

In [10]:
```python
model1
```

```
DeepLabV3(
  (backbone): IntermediateLayerGetter(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Se
```

The code in this notebook is also implemented in `src/training/models.py`.