# Fine Tuning a PyTorch Model on Our Fruits Dataset
This notebook demonstrates how to use PyTorch to fine-tune a pretrained object detection model on our fruits dataset.

We will start by loading a pretrained Faster-RCNN model from TorchVision and modify it so that it can detect and classify the various fruits in our dataset.

Afterwards, we will write the training, evaluation, and testing loops for our model.
Our custom Dataset class will be used in combination with a DataLoader to batch our data for training.
We will set up training so that, if it is interrupted, it can be resumed from where it stopped.
We will also see how to save the "best" model found during training.
Once training is complete, we will use our test dataset to evaluate the precision, recall, and accuracy of our model.

Finally, we will discuss next steps, including making our model work with the PyTorch Lightning framework.

## Load the Faster-RCNN Pre-Trained Model
We'll load our model from TorchVision. First, we specify a cache directory that tells PyTorch where to download the model weights.

In [1]:
# Import the required libraries
import os
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import torch

In [2]:
# Specify a cache_dir we can use to tell PyTorch where to place the downloaded models
cache_dir = './data/models/'
# Set the "TORCH_HOME" environment variable. Torch will download to this directory
os.environ['TORCH_HOME'] = cache_dir

In [3]:
# Get the most up-to-date weights for our FasterRCNN pretrained model
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
# Load our pretrained model using torchvision
model = fasterrcnn_resnet50_fpn_v2(weights=weights)

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_v2_coco-dd69338a.pth" to ./data/models/hub\checkpoints\fasterrcnn_resnet50_fpn_v2_coco-dd69338a.pth


  0%|          | 0.00/167M [00:00<?, ?B/s]

Once we run the above cell, we see that the model weights are downloaded into our specified `cache_dir`. Each subsequent time we load the model, the weights will be read from this cache, so there is no need to download them again.

Now let's take a peek at the part of our pretrained model that we'll be working with:

In [4]:
model.roi_heads

RoIHeads(
  (box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
  (box_head): FastRCNNConvFCHead(
    (0): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (2): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (3): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), p

Without going into detail on the full Faster-RCNN model, we will inspect only the part we are interested in: the region of interest heads (`roi_heads`), which produce the model's final predictions. Consult [the Faster-RCNN paper](https://arxiv.org/abs/1506.01497) for the inner workings of the model.

In the full output, we see that our model's region of interest (RoI) heads contain 3 components: `box_roi_pool`, `box_head`, and `box_predictor`.

The first is a region of interest (RoI) pooling layer (`box_roi_pool`). For each region proposal generated by the region proposal network (RPN), this layer extracts a fixed-size (7x7) feature map from the backbone's feature maps, so that proposals of any size can be passed to the layers that follow.

The second is a box head of type FastRCNNConvFCHead, a built-in TorchVision type. This head consists of several Conv2dNormActivation blocks, each composed of a 2D convolution layer, a batch norm layer, and a rectified linear unit (ReLU) layer. The final layers of the `box_head` "flatten" the output (collapse it into a single vector per proposal) and pass it through a Linear layer followed by a ReLU. The Linear layer here is simply a fully connected layer.

The third, our `box_predictor` head, is a TorchVision FastRCNNPredictor made up of a `cls_score` layer and a `bbox_pred` layer. The classification score (`cls_score`) layer is responsible for scoring each class for a proposed box, and the bounding box prediction (`bbox_pred`) layer is responsible for refining the box coordinates. Both are Linear layers.

If we look closely at the final Linear layer of the `box_head`, we see that it generates 1024 output features, and both Linear layers in the `box_predictor` head accept 1024 input features. The output dimensions of the `box_predictor` head for the classification scores and bounding box predictions are 91 and 364 respectively. The 91 comes from the COCO dataset that our TorchVision FasterRCNN model was trained on: COCO category IDs run from 1 to 90 (only 80 of them are actual object categories; the rest are unused), plus 1 for the background class, remember? Each class gets its own set of 4 bounding box coordinates, hence the 91 x 4 = 364 outputs of the bounding box prediction layer.

## Modifying Our Model To Work With Fruits Data
Output sizes of 91 and 364 are all well and good for COCO, but they do not match our fruits data. Since we are only detecting fruits, we really don't care about the COCO classes.

Thus, we need to modify our model to account for fruits and only fruits data. In particular, we'll be adding our own FastRCNNPredictor as our `box_predictor` head and setting it up to account for our 3 different types of fruit.

In [5]:
# We have to account for the background class, hence 4
num_classes = 4
# Grab the input features from the classification score layer
input_features = model.roi_heads.box_predictor.cls_score.in_features
# Create a new FastRCNNPredictor box predictor to work with our classes
model.roi_heads.box_predictor = FastRCNNPredictor(input_features, num_classes)

In [6]:
model.roi_heads

RoIHeads(
  (box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
  (box_head): FastRCNNConvFCHead(
    (0): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (2): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (3): Conv2dNormActivation(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), p

When we inspect our updated model, we see that we did not change the pooling layer or the box head at all. That's because we want to reuse the pretrained weights for those tasks. However, we did add our new box predictor head, which takes in the 1024 input features and outputs only 4 values for the `cls_score` layer and 16 values for the `bbox_pred` layer. Our `cls_score` layer has 4 outputs because every class in our dataset (3 fruits plus the background) is assigned a score, and `bbox_pred` has 16 outputs because each class gets its own 4 box coordinates. The class with the highest score for the predicted box will be chosen.
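
To make the "highest score wins" idea concrete, here is a minimal sketch of turning 4 raw class scores into a predicted label. The scores and the `class_names` list are made-up placeholders, not values from our dataset. TorchVision's actual post-processing is more involved (it also filters by score threshold and applies non-maximum suppression), so treat this as an illustration only.

```python
import torch

# Hypothetical label names; index 0 is reserved for the background class
class_names = ["background", "fruit_a", "fruit_b", "fruit_c"]

# Made-up raw scores (logits) for one proposed box, one value per class
logits = torch.tensor([0.2, 2.5, 0.1, -1.0])

# Softmax converts the raw scores into probabilities that sum to 1
probs = torch.softmax(logits, dim=0)

# Pick the class with the highest probability
best_index = int(torch.argmax(probs))
predicted = class_names[best_index]
print(predicted, float(probs[best_index]))
```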

### Setting Up Training
The first thing we need to do is inspect the list of model parameters. These are the weights that we'll be training during our fine-tuning step.

We will set up an optimizer and learning rate scheduler for updating our weights based on the losses computed.

### Inspecting Model Parameters

In [7]:
params = [p for p in model.parameters()]
len(params)

209

In [8]:
params_update = [p for p in model.parameters() if p.requires_grad]
len(params_update)

176

Above we can see that our model has 209 parameter tensors in total (each tensor holds the weights or biases of one layer). Of these, only 176 have `requires_grad` set, meaning they will receive gradient updates during training; the rest are frozen. In our situation, we want to update all of the trainable parameters, since we are changing the output of the model from the COCO classes to our fruit classes. In some instances, you will want to keep the pretrained weights of the entire original network and train only the layers you added. For these cases you can freeze all of the original parameters and set only the added layers to require gradient updates.
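
The freezing approach described above can be sketched as follows. To keep the example small, it uses a toy `nn.Sequential` as a stand-in for a pretrained network (it is not the Faster-RCNN model), and all names are ours. It also shows the difference between counting parameter tensors and counting individual weights.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained network (NOT the Faster-RCNN model above)
toy_model = nn.Sequential(
    nn.Linear(8, 16),   # pretend this is the pretrained feature extractor
    nn.ReLU(),
    nn.Linear(16, 91),  # pretend this is the original output layer
)

# Freeze every existing parameter so it receives no gradient updates
for p in toy_model.parameters():
    p.requires_grad = False

# Replace the output layer; newly created layers require gradients by default
toy_model[2] = nn.Linear(16, 4)

trainable = [p for p in toy_model.parameters() if p.requires_grad]
n_tensors = len(list(toy_model.parameters()))            # parameter tensors
n_trainable_tensors = len(trainable)                     # tensors to be trained
n_trainable_weights = sum(p.numel() for p in trainable)  # individual values
print(n_tensors, n_trainable_tensors, n_trainable_weights)
```

Only the `trainable` list would then be handed to the optimizer.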

In [9]:
params = [p for p in model.parameters() if p.requires_grad]

In [10]:
# Define our AdamW optimizer with a higher learning rate to start
lr = 1e-3
optimizer = torch.optim.AdamW(params, lr=lr)

In [11]:
# Define our learning rate scheduler that will decrease the lr as training runs
step_size = 3  # Number of epochs to run before decaying the learning rate
gamma = 0.1   # Multiplicative decay factor: each decay multiplies the learning rate by 0.1
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma)
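
To see what this schedule does, the sketch below replays the same settings (`lr=1e-3`, `step_size=3`, `gamma=0.1`) on a single dummy parameter and records the learning rate at the start of each epoch. The dummy parameter and the `demo_` names are ours and exist only so the optimizer has something to hold.

```python
import torch

# Dummy parameter so we can build an optimizer without a real model
dummy_param = torch.nn.Parameter(torch.zeros(1))
demo_optimizer = torch.optim.AdamW([dummy_param], lr=1e-3)
demo_scheduler = torch.optim.lr_scheduler.StepLR(demo_optimizer, step_size=3, gamma=0.1)

lrs = []
for epoch in range(7):
    # Record the learning rate used during this epoch
    lrs.append(demo_optimizer.param_groups[0]["lr"])
    demo_optimizer.step()   # stands in for a full epoch of training steps
    demo_scheduler.step()   # called once per epoch

print(lrs)
```

The learning rate stays at its initial value for 3 epochs and is then multiplied by 0.1 every 3 epochs.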

Now that we have our parameters, optimizer, and learning rate scheduler set up, we are almost ready to write our training loop. Before we do that, let's run one sample through our model and see what the outputs are. We will use our custom FruitsDataset class for this.

In [12]:
from fruits_dataset import FruitsDataset, fruits_collate_fn
from torch.utils.data import DataLoader

In [13]:
# Let's create a dataset and dataloader from our training directory images/annotations
training_img_dir = './data/fruits/train/'
training_annotations_file = './data/fruits/train/annotations.csv'

In [14]:
# Create our dataset and dataloader
train_dataset = FruitsDataset(training_annotations_file, training_img_dir)
train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, collate_fn=fruits_collate_fn)

For now, we only want to grab one image/target pair from our dataset to run through our model. Let's do that now:

In [15]:
image, target = next(iter(train_dataloader))

In [20]:
model(image, target)

{'loss_classifier': tensor(1.6941, grad_fn=<NllLossBackward0>),
 'loss_box_reg': tensor(0.5060, grad_fn=<DivBackward0>),
 'loss_objectness': tensor(0.0098, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>),
 'loss_rpn_box_reg': tensor(0.0054, grad_fn=<DivBackward0>)}

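Above we see that, in training mode, our model returns a dictionary of four losses rather than predictions: a classification loss and a box regression loss for the RoI heads, and an objectness loss and a box regression loss for the RPN. The classification loss is a cross-entropy loss (it shows up as NllLossBackward0 because cross-entropy is computed as a negative log likelihood), which is a standard loss function when training neural nets. What we want to do is combine these losses and use them with our optimizer and learning rate scheduler to update the weights of our network so that it learns to detect each of our fruits.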
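
A common way to use this output (and the approach taken in the TorchVision reference code) is to add the individual losses into a single scalar before backpropagating. The sketch below uses the values from the output above, copied into plain tensors, so it has no gradient history and only illustrates the arithmetic.

```python
import torch

# Loss values copied from the output above (plain tensors, no grad_fn)
loss_dict = {
    "loss_classifier": torch.tensor(1.6941),
    "loss_box_reg": torch.tensor(0.5060),
    "loss_objectness": torch.tensor(0.0098),
    "loss_rpn_box_reg": torch.tensor(0.0054),
}

# Add the four losses into one scalar that would be passed to .backward()
total_loss = sum(loss for loss in loss_dict.values())
print(float(total_loss))
```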

To do this, we need to define a training loop that runs the entire dataset through the model. One full pass of the training set through the model is referred to as one epoch. Because we are working with a large amount of data, it is usually impossible to fit it all in memory and update the weights from a single pass. Instead, we batch our data into smaller pieces using the DataLoader and update the weights after each batch. A single batch run through the model, followed by a weight update, is referred to as one training step.
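
The relationship between epochs, batches, and training steps can be sketched with a toy dataset. The 10 samples, batch size of 4, and 3 epochs below are arbitrary illustrative numbers, not our fruits data.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of 10 samples, batched 4 at a time
toy_dataset = TensorDataset(torch.arange(10).float())
toy_loader = DataLoader(toy_dataset, batch_size=4)

# One epoch = one pass over every batch; one step = one batch
steps_per_epoch = len(toy_loader)  # equals ceil(10 / 4)
batch_sizes = [len(batch[0]) for batch in toy_loader]

num_epochs = 3
total_steps = num_epochs * steps_per_epoch
print(steps_per_epoch, batch_sizes, total_steps)
```

Note that the last batch of an epoch can be smaller than the others when the dataset size is not a multiple of the batch size.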

If we have access to a GPU, training will be much faster. We'll begin by checking whether a GPU is available and, if so, use it for training.

In [19]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device

device(type='cuda')

The following code is adapted from the TorchVision detection reference code available [here](https://github.com/pytorch/vision/blob/e59cf64bb6eb4c3a50e0a76d8019fa4c4d5f2a15/references/detection/engine.py).

To avoid cloning the entire repo, we copy only the pieces we need.

Let's define a function that will create a model for us and then proceed with defining our training and evaluation loops. Our model definition function will mimic what we did in the cells above.

In [None]:
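def train_one_epoch(_model, _optimizer, _data_loader, _device, _epoch):
    # Put the model in training mode so that it returns losses instead of predictions
    _model.train()
    _lr_scheduler = None
    if _epoch == 0:
        # On the first epoch, warm up the learning rate linearly to stabilize early training
        warmup_factor = 1.0 / 1000
        warmup_iters = min(1000, len(_data_loader) - 1)
        _lr_scheduler = torch.optim.lr_scheduler.LinearLR(
            _optimizer, start_factor=warmup_factor, total_iters=warmup_iters)

    for images, targets in _data_loader:
        # Move the batch to the training device
        images = [image.to(_device) for image in images]
        targets = [{k: v.to(_device) if isinstance(v, torch.Tensor) else v for k, v in t.items()} for t in targets]
        # The model returns a dict of losses; add them into a single scalar
        loss_dict = _model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        # Backpropagate and update the weights (one training step)
        _optimizer.zero_grad()
        losses.backward()
        _optimizer.step()
        # The warmup scheduler is stepped once per batch (first epoch only)
        if _lr_scheduler is not None:
            _lr_scheduler.step()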


After we've completed a training epoch, we'll likely want to evaluate our model's performance. We will write an evaluation loop to accomplish this. Our evaluation set is a small set of samples held out from the training data so that we can check that the model is actually generalizing and not just overfitting to the training data.