# Transfer Learning for Object Detection

This notebook is adapted from a [PyTorch tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) and demonstrates transfer learning with [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) for an object detection task. It takes object detection models from [torchvision](https://pytorch.org/vision/stable/index.html) that were pretrained on [COCO](https://cocodataset.org/) and fine-tunes them on the [PennFudan dataset](https://www.cis.upenn.edu/~jshi/ped_html/), which consists of 170 images with 345 labeled pedestrians.

The notebook performs the following steps:
1. [Import dependencies and setup parameters](#1.-Import-dependencies-and-setup-parameters)
2. [Prepare the dataset](#2.-Prepare-the-dataset)
3. [Predict using the original model](#3.-Predict-using-the-original-model)
4. [Transfer learning](#4.-Transfer-Learning)
5. [Visualize the model output](#5.-Visualize-the-model-output)
6. [Export the saved model](#6.-Export-the-saved-model)

## 1. Import dependencies and setup parameters

In [None]:
import os
from collections import Counter
import numpy as np
import torch
import torchvision
import intel_extension_for_pytorch as ipex
from PIL import Image
from pydoc import locate
import warnings

import torchvision.models.detection as detection
from torchvision.utils import make_grid, draw_bounding_boxes
from torchvision.transforms.functional import convert_image_dtype
import torchvision.transforms.functional as F
import matplotlib.pyplot as plt

from model_utils import torchvision_model_map, get_retrainable_model
from dataset_utils import PennFudanDataset, COCO_LABELS

warnings.filterwarnings("ignore")

print('Supported models:')
print('\n'.join(torchvision_model_map.keys()))

In [None]:
# Specify a model from the list above
model_name = "fasterrcnn_resnet50_fpn"

# Specify the location for the dataset to be downloaded
dataset_directory = os.environ.get("DATASET_DIR",
                                   os.path.join(os.path.expanduser("~"), "dataset"))

# Specify a directory for output
output_directory = os.environ.get("OUTPUT_DIR",
                                  os.path.join(os.path.expanduser("~"), "output"))

# Batch size
batch_size = 2

In [None]:
if model_name not in torchvision_model_map.keys():
    raise ValueError("The specified model_name ({}) is invalid. Please select from: {}".
                     format(model_name, torchvision_model_map.keys()))
    
# Get the info for the specified model from the map
model_map_values = torchvision_model_map[model_name]
predictor_handle = torchvision_model_map[model_name]["predictor_model"]
print("Pretrained Object Detection Model:", model_name)
print("Bounding Box Predictor/Classifier:", predictor_handle)

In [None]:
# Get reference scripts from the torchvision repo that are not in the package
if not os.path.exists("vision"):
    !git clone --depth 1 --branch v0.11.3 https://github.com/pytorch/vision.git

import sys
sys.path.append("vision/references/detection")

import utils
import transforms as T

# Define transform function for image inputs
def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
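
The reference `transforms` module is used here because a detection transform has to update the targets as well as the image: when an image is flipped horizontally, its bounding boxes must be mirrored too. A pure-Python sketch of that box arithmetic, assuming boxes in `[x_min, y_min, x_max, y_max]` format (the `hflip_box` helper is hypothetical and not part of the reference scripts):

```python
def hflip_box(box, image_width):
    """Mirror an [x_min, y_min, x_max, y_max] box across the vertical axis."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa
    return [image_width - x_max, y_min, image_width - x_min, y_max]

print(hflip_box([10, 20, 110, 220], image_width=400))
# [290, 20, 390, 220]
```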

## 2. Prepare the dataset

Download and extract the [PennFudan dataset](https://www.cis.upenn.edu/~jshi/ped_html/). If the dataset is not found in the dataset directory it is downloaded. Subsequent runs will reuse the already downloaded dataset.

In [None]:
num_classes = 2
dataset_path = os.path.join(dataset_directory, "PennFudanPed")
if not os.path.exists(dataset_path):
    !wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
    !unzip PennFudanPed.zip -d $dataset_directory
    !rm PennFudanPed.zip

For data performance tuning, see the PyTorch [DataLoader](https://pytorch.org/docs/stable/data.html#multi-process-data-loading) documentation. The optimal `num_workers` setting depends on the hardware and the batch size; 2, 4, or 8 workers is usually a reasonable starting point.
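
One possible way to choose a starting value is to cap the worker count by the number of CPU cores on the machine. The `suggest_num_workers` helper below is a hypothetical sketch, not part of this notebook or of PyTorch, and the result should still be tuned by measuring data loading throughput:

```python
import os

def suggest_num_workers(max_workers=8):
    """Return a starting num_workers value capped by the available CPU cores."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(max_workers, cpu_count))

num_workers = suggest_num_workers()
print("Suggested num_workers:", num_workers)
```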

In [None]:
dataset = PennFudanDataset(dataset_path, get_transform(train=True))
dataset_test = PennFudanDataset(dataset_path, get_transform(train=False))
data_loader = torch.utils.data.DataLoader(dataset, 
                                          batch_size=batch_size, 
                                          shuffle=True, 
                                          num_workers=4, 
                                          collate_fn=utils.collate_fn)
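
Detection images differ in size and each has a different number of boxes, so a batch cannot be stacked into a single tensor. The `collate_fn` from the torchvision reference scripts instead regroups a list of `(image, target)` pairs into a tuple of images and a tuple of targets. A minimal pure-Python sketch of that regrouping, with placeholder strings and dicts standing in for the real tensors:

```python
def collate_fn(batch):
    # Transpose a list of (image, target) pairs into (images, targets)
    return tuple(zip(*batch))

# Placeholder samples: strings stand in for image tensors of different sizes
example_batch = [
    ("image_a", {"boxes": [[10, 20, 110, 220]], "labels": [1]}),
    ("image_b", {"boxes": [[5, 5, 50, 90], [60, 10, 120, 200]], "labels": [1, 1]}),
]

example_images, example_targets = collate_fn(example_batch)
print(example_images)        # ('image_a', 'image_b')
print(len(example_targets))  # 2
```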

## 3. Predict using the original model

Use the COCO-pretrained model to run predictions on a single batch from the new dataset and view the results.

In [None]:
# Load the detection model pre-trained on COCO
pretrained_model_class = locate('torchvision.models.detection.{}'.format(model_name))
predictor_class = locate('torchvision.models.detection.{}'.format(predictor_handle))
model = pretrained_model_class(pretrained=True)

# Get a batch of data
images, targets = next(iter(data_loader))
images = list(images)

# Run inference without tracking gradients
model.eval()
with torch.no_grad():
    predictions = model(images)

In [None]:
# Visualization functions
plt.rcParams["savefig.bbox"] = 'tight'

def show_image(img, objects_detected):
    if not isinstance(img, list):
        img = [img]
    fig, axs = plt.subplots(ncols=len(img), squeeze=False)
    for i, im in enumerate(img):
        im = im.detach()
        im = F.to_pil_image(im)
        axs[0, i].imshow(np.asarray(im))
        axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
        plt.title(objects_detected)
        
def show_image_results(images, predictions, label_map, score_threshold=0.8):
    for i in range(len(images)):
        if 'scores' in predictions[i]:
            indices_over_threshold = predictions[i]['scores'] > score_threshold
        else:
            # If there are no scores, show them all
            indices_over_threshold = [k for k in range(len(predictions[i]['labels']))]
        result = draw_bounding_boxes(convert_image_dtype(images[i], dtype=torch.uint8), 
                                     predictions[i]['boxes'][indices_over_threshold], 
                                     width=5)
        c = Counter(predictions[i]['labels'][indices_over_threshold].tolist())                         
        d = ["{}: {}".format(label_map[a], c[a]) for a in c.keys()]
        show_image(result, '\n'.join(d))
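
The thresholding and counting logic in `show_image_results` can be checked on its own. The sketch below uses plain Python lists with made-up scores and a hypothetical two-entry label map in place of the model's output tensors and `COCO_LABELS`; `summarize_detections` is an illustrative helper, not part of the notebook:

```python
from collections import Counter

def summarize_detections(labels, scores, label_map, score_threshold=0.8):
    """Count detections per class, keeping only scores above the threshold."""
    kept = [label for label, score in zip(labels, scores) if score > score_threshold]
    counts = Counter(kept)
    return ["{}: {}".format(label_map[label], n) for label, n in counts.items()]

# Made-up values for illustration only
example_label_map = {1: "person", 2: "bicycle"}
example_labels = [1, 1, 2, 1]
example_scores = [0.99, 0.85, 0.95, 0.40]

print(summarize_detections(example_labels, example_scores, example_label_map))
# ['person: 2', 'bicycle: 1']
```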

In [None]:
show_image_results(images, predictions, COCO_LABELS)

## 4. Transfer Learning

Replace the pretrained head of the network with a new layer based on the number of classes in our dataset. Train and evaluate the model using the new dataset for the specified number of epochs.

In [None]:
# Number of training epochs
training_epochs = 1

In [None]:
# train_one_epoch and evaluate come from the torchvision reference scripts cloned above
from engine import train_one_epoch, evaluate

def main(num_classes, dataset, dataset_test):
    # Train on the CPU
    device = torch.device('cpu')
    
    # Split the dataset into train and test subsets
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # Define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # Get the model using helper function
    model = get_retrainable_model(model_name, num_classes, 
                              pretrained_model_class, 
                              predictor_class)

    # Move model to the right device
    model.to(device)

    # Construct optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # Construct learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)
    # Apply the IPEX optimize function
    model, optimizer = ipex.optimize(model, optimizer=optimizer)
    for epoch in range(training_epochs):
        # Train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # Update the learning rate
        lr_scheduler.step()
        # Evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    return model
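
The train/test split in `main` relies on both `Subset` objects sharing one permutation of indices, so the last 50 shuffled images are held out and never used for training. A pure-Python sketch of that index arithmetic, using `random.sample` in place of `torch.randperm` and the 170-image dataset size given in the introduction:

```python
import random

num_images = 170   # size of the PennFudan dataset
num_test = 50      # images held out for evaluation

# Shuffle all indices once, then slice the same list for both subsets
shuffled = random.sample(range(num_images), num_images)
train_indices = shuffled[:-num_test]
test_indices = shuffled[-num_test:]

print(len(train_indices), len(test_indices))        # 120 50
print(set(train_indices).isdisjoint(test_indices))  # True
```

Two dataset objects are needed because the training subset applies the random flip while the test subset does not.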

In [None]:
# Train the model
model = main(num_classes, dataset, dataset_test)
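
`StepLR` multiplies the learning rate by `gamma` once every `step_size` epochs, so with the settings in `main` and `training_epochs = 1` the rate never decays. The sketch below computes the same schedule in closed form rather than calling PyTorch; `step_lr` is an illustrative helper that shows when the decay would take effect if more epochs were used:

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate used during a given epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in range(7):
    print(epoch, "{:.0e}".format(step_lr(0.005, epoch)))
# Epochs 0-2 use 5e-03, epochs 3-5 use 5e-04, epoch 6 uses 5e-05
```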

## 5. Visualize the model output

After the training completes, test the model's predictions on the original batch.

In [None]:
# Show object detections from fine-tuned model
model.eval()
with torch.no_grad():
    predictions = model(images)

In [None]:
# Show the predicted results for the fine-tuned model
show_image_results(images, predictions, COCO_LABELS)

## 6. Export the saved model

In [None]:
os.makedirs(output_directory, exist_ok=True)
file_path = os.path.join(output_directory, "object_detection.pt")
torch.save(model.state_dict(), file_path)
print("Saved to {}".format(file_path))
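
Because only the `state_dict` is saved, reloading requires rebuilding the same architecture first and then calling `load_state_dict`. The round trip below uses a small stand-in `torch.nn.Linear` module and a temporary file purely for illustration. For the fine-tuned detector, the model would instead be recreated (for example with `get_retrainable_model`) before loading `object_detection.pt`.

```python
import os
import tempfile
import torch

# Stand-in for the fine-tuned model (illustration only)
trained = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.pt")
    torch.save(trained.state_dict(), example_path)

    # Rebuild the same architecture, then load the saved weights into it
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(example_path, map_location="cpu"))
    restored.eval()

print(torch.equal(trained.weight, restored.weight))  # True
```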

## Dataset citation
```
@InProceedings{10.1007/978-3-540-76386-4_17,
    author="Wang, Liming
    and Shi, Jianbo
    and Song, Gang
    and Shen, I-fan",
    editor="Yagi, Yasushi
    and Kang, Sing Bing
    and Kweon, In So
    and Zha, Hongbin",
    title="Object Detection Combining Recognition and Segmentation",
    booktitle="Computer Vision -- ACCV 2007",
    year="2007",
    publisher="Springer Berlin Heidelberg",
    address="Berlin, Heidelberg",
    pages="189--199",
    abstract="We develop an object detection method combining top-down recognition with bottom-up image segmentation. There are two main steps in this method: a hypothesis generation step and a verification step. In the top-down hypothesis generation step, we design an improved Shape Context feature, which is more robust to object deformation and background clutter. The improved Shape Context is used to generate a set of hypotheses of object locations and figure-ground masks, which have high recall and low precision rate. In the verification step, we first compute a set of feasible segmentations that are consistent with top-down object hypotheses, then we propose a False Positive Pruning(FPP) procedure to prune out false positives. We exploit the fact that false positive regions typically do not align with any feasible image segmentation. Experiments show that this simple framework is capable of achieving both high recall and high precision with only a few positive training examples and that this method can be generalized to many object classes.",
    isbn="978-3-540-76386-4"
}
```