In [3]:
import torch
import torchvision

In [14]:
%load_ext autoreload
%autoreload 2


## Loading Pretrained YOLOv5
- Using a YOLOv5 model from [PyTorch Hub](https://pytorch.org/hub/ultralytics_yolov5/) pretrained on COCO, we will perform transfer learning on our own dataset. The reasons for choosing this model are detailed later.

### Finetuning
- The [Ultralytics documentation](https://docs.ultralytics.com/yolov5/tutorials/transfer_learning_with_frozen_layers/#freeze-backbone) shows that the network's backbone spans layers `model.0` to `model.9`.
- In practice, we rarely have as much compute and data as world-class research institutes, so it is generally a good idea to start from one of their pretrained models and finetune it on our own dataset.
    - As a general rule, we freeze all the backbone layers and train only the head. The early (backbone) layers of a CNN typically respond to low-level details such as lines, edges, and blobs of color, which are broadly similar across computer vision tasks. We can reuse this learned knowledge and train only the later (head) layers, which capture the higher-level details specific to our task.
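
The freezing rule above can be sketched as a small helper. This is a minimal illustration, not the notebook's actual training code: `backbone_fragments` and `freeze_backbone` are hypothetical names introduced here, and the sketch assumes the model exposes `named_parameters()` (as any `torch.nn.Module` does) and that backbone parameter names contain `model.<index>.` for indices 0 to 9, as the Ultralytics documentation describes.

```python
def backbone_fragments(num_backbone_layers=10):
    """Name fragments for layers model.0 ... model.<n-1>.

    The trailing dot keeps "model.1." from also matching "model.10.".
    """
    return [f"model.{i}." for i in range(num_backbone_layers)]


def freeze_backbone(model, num_backbone_layers=10):
    """Freeze backbone parameters, leave the rest trainable.

    Assumes `model.named_parameters()` yields (name, parameter) pairs and
    that each parameter has a writable `requires_grad` attribute.
    Returns the names of the parameters that were frozen.
    """
    fragments = backbone_fragments(num_backbone_layers)
    frozen = []
    for name, param in model.named_parameters():
        is_backbone = any(fragment in name for fragment in fragments)
        # Backbone parameters stop receiving gradients; head parameters keep training.
        param.requires_grad = not is_backbone
        if is_backbone:
            frozen.append(name)
    return frozen
```

A helper like this would typically be called once before building the optimizer, with only the parameters that still have `requires_grad=True` passed to it, so the frozen weights stay out of the update step.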

In [48]:
import math

In [49]:
# Load RetinaNet (ResNet-50 + FPN backbone) with pretrained weights.
retina = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
num_classes = 7

# Replace the classification layer so that it predicts `num_classes` classes.
# in_channels is read from the existing cls_logits layer, which avoids relying
# on the internal layout of `conv` (it may differ between torchvision versions).
in_features = retina.head.classification_head.cls_logits.in_channels
num_anchors = retina.head.classification_head.num_anchors
retina.head.classification_head.num_classes = num_classes

cls_logits = torch.nn.Conv2d(in_features, num_anchors * num_classes, kernel_size=3, stride=1, padding=1)
torch.nn.init.normal_(cls_logits.weight, std=0.01)  # as per PyTorch code
torch.nn.init.constant_(cls_logits.bias, -math.log((1 - 0.01) / 0.01))  # as per PyTorch code
# Assign the new classification layer to the model.
retina.head.classification_head.cls_logits = cls_logits
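
The bias constant used above is easier to read as a prior probability: a bias of `-log((1 - p) / p)` is the value whose sigmoid equals `p`, so with `p = 0.01` and near-zero weights the new layer starts out predicting roughly 1% confidence for every class. The sketch below checks this identity in plain Python. `prior_bias` and `sigmoid` are helper names introduced here for illustration, and the anchor count of 9 is an assumption about the default RetinaNet configuration rather than a value read from the model.

```python
import math


def prior_bias(p):
    """Return the bias b for which sigmoid(b) == p."""
    return -math.log((1 - p) / p)


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


bias = prior_bias(0.01)
initial_confidence = sigmoid(bias)  # recovers the prior probability p

# Output channels of the new classification layer: one logit per anchor per class.
num_anchors = 9   # assumed default; the notebook reads this from the model
num_classes = 7   # value used in the notebook
out_channels = num_anchors * num_classes

print(f"bias={bias:.4f}, initial confidence={initial_confidence:.4f}, out_channels={out_channels}")
```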

In [50]:
retina

RetinaNet(
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256, eps=0.0)
