# Bird Object Detection

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input and returns a bounding box around each object of interest it finds, along with the type of object the box encapsulates. To create such a solution, we need to acquire and process a training dataset, then create and set up a training job so that the algorithm can learn from the dataset. Finally, we can host the trained model on an endpoint, to which we can supply images.

This notebook is an end-to-end example showing how the Amazon SageMaker Object Detection algorithm can be used with a publicly available dataset of bird images. We demonstrate how to train and host an object detection model based on the Caltech Birds (CUB 200 2011) dataset. Amazon SageMaker’s object detection algorithm uses the Single Shot MultiBox Detector (SSD) algorithm, and this notebook uses a ResNet base network with that algorithm.

## Data Preparation

Before preparing the data, there are some initial steps required for setup.

### Initial SageMaker Setup

In [None]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
# this will create a 'default' sagemaker bucket if it doesn't exist (sagemaker-region-accountid)
bucket = sagemaker_session.default_bucket()
print(bucket)
prefix = "DEMO-ObjectDetection-birds"

# Get the ARN of the IAM role used by this Studio instance to pass to training jobs and other Amazon SageMaker tasks.
role = get_execution_role()
print(role)

### Importing additional packages

This notebook requires two additional Python packages:

* OpenCV is required for gathering image sizes and flipping images horizontally.
* The MXNet runtime is required for using the im2rec tool.

You can also use the MXNet kernel provided by SageMaker Studio, which includes these packages and libraries by default. If you are using another kernel, you can try running:

```python
import sys

!{sys.executable} -m pip install opencv-python
!{sys.executable} -m pip install mxnet
```

In [None]:
# Verify the required packages are importable
import cv2
import mxnet

### Downloading and Unpacking the Dataset

In [None]:
%%time

import os
import urllib.request

def download(url):
    """Download url into the current directory unless the file already exists."""
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)


# download('http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz')
# Caltech's download has been unavailable (at least temporarily) since August 2020,
# so we use a copy made available by fast.ai.
download("https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz")

In [None]:
%%time

# Unpack and then remove the downloaded compressed tar file
!gunzip -c ./CUB_200_2011.tgz | tar xopf -
!rm CUB_200_2011.tgz

The parameters in the next cell define the names and locations of the dataset's metadata files. A description of the different files can be found in ./CUB_200_2011/README.

In [None]:
BASE_DIR = "CUB_200_2011/"
IMAGES_DIR = BASE_DIR + "images/"

CLASSES_FILE = BASE_DIR + "classes.txt"
BBOX_FILE = BASE_DIR + "bounding_boxes.txt"
IMAGE_FILE = BASE_DIR + "images.txt"
LABEL_FILE = BASE_DIR + "image_class_labels.txt"

TRAIN_LST_FILE = "birds_ssd_train.lst"
VAL_LST_FILE = "birds_ssd_val.lst"
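To browse the available species, the class list can be loaded into a dictionary. The sketch below is a minimal illustration, not part of the notebook's `utils` module: `parse_classes` is a hypothetical helper, and it assumes each line of classes.txt has the form `<class_id> <class_name>`, as described in the dataset README. The sample lines are illustrative stand-ins for the real file.

```python
def parse_classes(lines):
    """Map each 1-based class id to its class name.

    Assumes every non-empty line looks like '<class_id> <class_name>'.
    """
    classes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_id, name = line.split(maxsplit=1)
        classes[int(class_id)] = name
    return classes


# Illustrative sample lines; in the notebook you could pass open(CLASSES_FILE) instead.
sample_lines = [
    "1 001.Black_footed_Albatross\n",
    "10 010.Red_winged_Blackbird\n",
]
classes = parse_classes(sample_lines)
print(classes[10])
```

Passing an open file handle works because the helper only iterates over lines, for example `with open(CLASSES_FILE) as f: classes = parse_classes(f)`.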

### Visualizing the Dataset

In [None]:
import utils

utils.show_species(IMAGES_DIR, "010.Red_winged_Blackbird")

### Preparing the RecordIO files

In [None]:
%%time

import utils

SIZE_COLS = ["idx", "width", "height"]
SIZE_FILE = BASE_DIR + "sizes.txt"

utils.gen_image_size_file(IMAGES_DIR, IMAGE_FILE, SIZE_COLS, SIZE_FILE)

RecordIO files can be created using the im2rec tool (images to RecordIO), which takes as input a pair of list files, one for training images and the other for validation images. Each list file has one row for each image. For object detection, each row must contain bounding box data and a class label.

For the Caltech birds dataset, we need to convert absolute bounding box dimensions to relative dimensions based on image size. We also need to adjust class IDs to be zero-based (instead of 1 to 200, they need to be 0 to 199). This dataset comes with recommended train/test split information (the “is_training_image” flag), but in this notebook we will create a random train/test split with a specific train/test ratio.
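The three adjustments described above can be sketched in plain Python. This is an illustration only and is not the code in the notebook's `utils` module: the function names are hypothetical, and the sketch assumes each bounding box is stored as an absolute `(x, y, width, height)` in pixels, which is how the dataset README describes bounding_boxes.txt.

```python
import random


def to_relative_box(x, y, box_w, box_h, img_w, img_h):
    """Convert an absolute (x, y, width, height) box into relative corners.

    Returns (xmin, ymin, xmax, ymax), each a fraction of the image size.
    """
    xmin = x / img_w
    ymin = y / img_h
    xmax = (x + box_w) / img_w
    ymax = (y + box_h) / img_h
    return xmin, ymin, xmax, ymax


def to_zero_based(class_id):
    """Shift a class id from the 1..200 range to the 0..199 range."""
    return class_id - 1


def random_split(items, train_ratio, seed=0):
    """Shuffle items reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# A 200x100 box at (50, 100) inside a 500x400 image.
box = to_relative_box(50, 100, 200, 100, img_w=500, img_h=400)
print(box)

# Split ten hypothetical image ids with an 80/20 ratio.
train_ids, val_ids = random_split(range(1, 11), train_ratio=0.8)
print(len(train_ids), len(val_ids))
```

Using a seeded `random.Random` instance keeps the split reproducible between runs without touching the global random state.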

**Generating LST files**

In [None]:
%%time

import utils

# To speed up training and experimenting, you can use a small handful of species.
# To see the full list of the classes available, look at the content of CLASSES_FILE.
CLASSES = [17, 36, 47, 68, 73]

TRAIN_LST_FILE = "birds_ssd_sample_train.lst"
VAL_LST_FILE = "birds_ssd_sample_val.lst"

TRAIN_RATIO = 0.8

IM2REC_SSD_COLS = [
    "header_cols",
    "label_width",
    "zero_based_id",
    "xmin",
    "ymin",
    "xmax",
    "ymax",
    "image_file_name",
]

train_df, val_df = utils.gen_list_files(
    SIZE_FILE, BBOX_FILE, IMAGE_FILE, LABEL_FILE,
    CLASSES,
    IM2REC_SSD_COLS,
    TRAIN_RATIO,
    TRAIN_LST_FILE, VAL_LST_FILE,
)

Let's take a look at a few records from the training list file to understand better what is being fed to the RecordIO files.

The first column is the image number or index. The second column is the header width: the value 2 indicates that the header is made up of 2 columns (column 2 and column 3). The third column specifies the label width of a single object. In our case, the value 5 indicates that each object has 5 numbers to describe its label information: the class index and the 4 bounding box coordinates. If there are multiple objects within one image, all the label information should be listed on one line. Our dataset contains only one bounding box per image.

The fourth column is the class label. This identifies the bird species using a zero-based class ID. Columns 5 through 8 represent the bounding box for where the bird is found in this image.

The classes should be labeled with successive numbers and start with 0. The bounding box coordinates are ratios of its top-left (xmin, ymin) and bottom-right (xmax, ymax) corner indices to the overall image size. Note that the top-left corner of the entire image is the origin (0, 0). The last column specifies the relative path of the image file within the images directory.
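The column layout described above can be made concrete with a small parser. This is a sketch under stated assumptions rather than code from the notebook or from MXNet: `parse_lst_line` is a hypothetical helper, the columns are assumed to be tab-separated, each object is assumed to carry exactly 5 label values, and the sample row (including its image path) is made up for illustration.

```python
def parse_lst_line(line):
    """Split one tab-separated detection LST row into its parts."""
    fields = line.rstrip("\n").split("\t")
    index = int(fields[0])
    header_width = int(float(fields[1]))  # counts columns 2 and 3
    label_width = int(float(fields[2]))  # numbers per object (5 here)
    image_path = fields[-1]

    # Label values start right after the header and stop before the path.
    values = [float(v) for v in fields[1 + header_width:-1]]
    objects = []
    for start in range(0, len(values), label_width):
        # Assumes label_width == 5: class id plus four box coordinates.
        class_id, xmin, ymin, xmax, ymax = values[start:start + label_width]
        objects.append({"class_id": int(class_id), "box": (xmin, ymin, xmax, ymax)})
    return {"index": index, "objects": objects, "image_path": image_path}


# Made-up row: index, header width, label width, class, box corners, path.
sample_row = "42\t2\t5\t3\t0.1\t0.2\t0.6\t0.9\tsome_species/some_image.jpg\n"
record = parse_lst_line(sample_row)
print(record["objects"][0]["class_id"], record["objects"][0]["box"])
```

Because the loop steps through the label values in groups of `label_width`, the same sketch would also handle a row that lists several objects on one line.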

In [None]:
!tail -3 $TRAIN_LST_FILE

**Creating RecordIO .rec files**

First, download the im2rec.py tool, which preprocesses images and packs them together into a RecordIO file.

In [None]:
import urllib.request

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/im2rec.py",
    "im2rec.py")

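Now we create the RecordIO files, resizing the shorter edge of each image to 256 pixels. The tool processes every list file whose name starts with the birds_ssd_sample prefix, so both the training and validation files are produced.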

In [None]:
import sys

RESIZE_SIZE = 256

!python im2rec.py --resize $RESIZE_SIZE --pack-label birds_ssd_sample $IMAGES_DIR

Let's have a look at our packed images.

In [None]:
import mxnet as mx
import matplotlib.pyplot as plt
import numpy as np

data_iter = mx.io.ImageRecordIter(
    path_imgrec="./birds_ssd_sample_val.rec",
    data_shape=(3, 500, 500),  # output data shape as (channels, height, width)
    batch_size=4,  # number of samples per batch
    label_width=7,  # 2 header values + 5 label values (class index and 4 box coordinates)
    # resize=256,  # resize the shorter edge to 256 before cropping
    # ... you can add more augmentation options as defined in ImageRecordIter.
)
data_iter.reset()
batch = data_iter.next()
data = batch.data[0]
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(data[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
plt.show()