**DEEP LEARNING IN FIVE DAYS**

*Patrick Donnelly*

Day one: Defining a use case for deep learning

Chapter two: Other computer vision applications and object detection

So far we've looked at image classification, which accounts for maybe 90% of benchmark tests for deep learning (I'm exaggerating) and maybe 0.9% of practical applications. Even within computer vision, it's hard to do interesting things with a classification network alone.

To stand on the shoulders of the internet, let's start with this nice blog post that enumerates common applications of DL for CV: https://machinelearningmastery.com/applications-of-deep-learning-for-computer-vision/

Brownlee (the author) identifies nine applications, plus "other problems" in DL for CV:

1. Image classification (we've already done this)
2. Image classification with localization
3. Object detection
4. Object segmentation
5. Image style transfer
6. Image colorization
7. Image reconstruction
8. Image super-resolution
9. Image synthesis

We can also extend these problems to video. Think of a video either as a sequence of image frames, or as an image with an additional (time) dimension.
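
To make the two views of video concrete, here is a minimal sketch using NumPy arrays. The clip size (8 frames of 64 x 64 RGB) and the variable names are placeholders, not anything from a real dataset.

```python
import numpy as np

# Placeholder clip: 8 frames, 3 color channels, 64 x 64 pixels
num_frames, channels, height, width = 8, 3, 64, 64
video = np.zeros((num_frames, channels, height, width), dtype=np.float32)

# View 1: a sequence of independent images, one per frame.
# Any single-image model can be applied to each frame in turn.
frames = [video[t] for t in range(num_frames)]

# View 2: time as an additional dimension of one sample.
# Moving the frame axis behind the channel axis gives (C, T, H, W).
clip = video.transpose(1, 0, 2, 3)

print(len(frames), frames[0].shape)
print(clip.shape)
```

The first view lets us reuse image models unchanged; the second keeps the frames together so a model can look across time.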

We'll group *image classification with localization* and *object detection* together. The former is just object detection restricted to a single object per image. Most classification applications involve some form of detection first, since we often have to extract an object of interest from a photo (or video) before classifying it.
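
One way to picture the relationship is a rough sketch in plain Python: a detector returns a variable-length list of labeled boxes, and classification with localization keeps exactly one of them. The `Detection` class, the `localize` helper, and the sample values are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    label: str    # predicted class
    score: float  # confidence in [0, 1]
    box: Box      # where the object is


def localize(detections: List[Detection]) -> Detection:
    """Classification with localization: keep the single most confident object."""
    return max(detections, key=lambda d: d.score)


# Hypothetical detector output for one photo: any number of objects
detections = [
    Detection("dog", 0.92, (40.0, 60.0, 210.0, 300.0)),
    Detection("ball", 0.71, (220.0, 250.0, 260.0, 290.0)),
]

best = localize(detections)
print(best.label, best.box)
```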

We'll spend the rest of this chapter on object detection using the Faster R-CNN network (https://arxiv.org/abs/1506.01497). We'll explain what exactly this network does later, but let's dive into the code first!

PyTorch has a package called `torchvision` that implements Faster R-CNN and other neural networks for computer vision. Let's import it: 

In [1]:
import torchvision

The `faster_rcnn.py` source lives at https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py. Let's take care of our imports. Because we're executing the code directly here rather than from inside the package, we'll rewrite its relative imports as absolute ones.

In [2]:
from collections import OrderedDict

import torch
from torch import nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.models.detection as detection

from torchvision.ops import misc as misc_nn_ops
from torchvision.ops import MultiScaleRoIAlign

# torch.hub exposes the same helper; recent torchvision releases no longer ship torchvision.models.utils
from torch.hub import load_state_dict_from_url

from torchvision.models.detection.generalized_rcnn import GeneralizedRCNN
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead, RegionProposalNetwork
from torchvision.models.detection.roi_heads import RoIHeads
from torchvision.models.detection.transform import GeneralizedRCNNTransform
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone