## Image Classification with Object Detection

---


In this Colab, we use TensorFlow's Object Detection API together with transfer learning to load, customize and retrain the [RetinaNet](https://arxiv.org/abs/1708.02002) model to classify and track rubber duckies with just 5 training examples. We will download the model from the TensorFlow Model Garden, restore its checkpointed weights, and fine-tune it with our own custom training loop. Let's get started!

![ducky](images/ducky.png)

## Setup

We will begin by cloning the TensorFlow Model Garden and installing the TensorFlow 2 [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).

In [None]:
!rm -rf ./models/

# clone the tensorflow model repository
!git clone --depth 1 https://github.com/tensorflow/models/

In [None]:
%%bash
# install the TensorFlow Object Detection API (the %%bash magic must be the first line of the cell)
sudo apt install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

**In Google Colab, you will need to restart the runtime so that the newly installed packages are picked up. Select Runtime --> Restart runtime in the toolbar above. Do not continue to the next section without restarting; otherwise some of the imports will fail.**

## Imports

Now we will import the necessary modules to perform the task.


In [None]:
import os
import random
import imageio
import matplotlib
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

%matplotlib inline

from tqdm.notebook import tqdm
from matplotlib.image import imread
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import colab_utils
from object_detection.builders import model_builder

## Data Preprocessing

Next we will read the 5 rubber ducky training images and store them as NumPy arrays to pass into our model for training later. Additionally, we will create the bounding boxes.

In [None]:
image_dir = "/content/models/research/object_detection/test_images/ducky/train"
train_images = []
for i in range(1, 6):
  image_path = os.path.join(image_dir, 'robertducky' + str(i) + '.jpg')
  train_images.append(imread(image_path))

We also need to create the appropriate bounding boxes for our examples, which you can do by running the cell below. Draw each box as tightly as possible while still containing the entire rubber duck; otherwise the model might pick up on unwanted background features. Don't proceed without bounding all 5 images, and only click 'submit' when the 'All images completed!' message appears.

In [None]:
bounding_boxes = []
colab_utils.annotate(train_images, box_storage_pointer=bounding_boxes)
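
To make the stored box format concrete, here is a minimal sketch. It assumes each entry of `bounding_boxes` is an array of shape `(N, 4)` holding normalized `[ymin, xmin, ymax, xmax]` values in `[0, 1]`, which is what the later visualization cell relies on through `use_normalized_coordinates=True`. The helper `pixel_box_to_normalized` and the example numbers are hypothetical and not part of the Object Detection API.

```python
import numpy as np

def pixel_box_to_normalized(ymin, xmin, ymax, xmax, image_height, image_width):
    """Convert one pixel-space box to a normalized [ymin, xmin, ymax, xmax] row."""
    box = np.array([[ymin / image_height, xmin / image_width,
                     ymax / image_height, xmax / image_width]], dtype=np.float32)
    # Clip so that a box drawn slightly outside the image stays within [0, 1].
    return np.clip(box, 0.0, 1.0)

# Hypothetical example: a duck spanning rows 120-360 and columns 160-480
# of a 480x640 (height x width) image.
example_box = pixel_box_to_normalized(120, 160, 360, 480,
                                      image_height=480, image_width=640)
print(example_box.shape)
```

Normalized coordinates keep the boxes valid after the model resizes the images during preprocessing.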

Here we define the category index dictionary which will be used by succeeding functions to match the class_id to the name of the object.

In [None]:
%%writefile models/research/object_detection/data/ducky_label_map.pbtxt

item {
  name: "robertducky"
  id: 1
  display_name: "rubber_ducky"
}

In [None]:
PATH_TO_LABELS = "./models/research/object_detection/data/ducky_label_map.pbtxt"
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
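
As a rough picture of the result, the sketch below shows the structure the category index is assumed to have for this one-class label map: a dictionary keyed by class id, with the display name used as the category name because `use_display_name=True`. It is an illustration written by hand, not output captured from the API, and `class_name` is a hypothetical helper.

```python
# Assumed structure for the one-class label map: class id -> {'id', 'name'}.
example_category_index = {1: {'id': 1, 'name': 'rubber_ducky'}}

def class_name(category_index, class_id):
    """Look up a readable name for a class id, falling back to 'N/A'."""
    return category_index.get(class_id, {}).get('name', 'N/A')

print(class_name(example_category_index, 1))
```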

We convert our NumPy data into well-structured tensors for input. The class labels also need to be one-hot encoded, as shown below.

In [None]:
train_image_tensors = []
bounding_box_tensors = []
class_label_tensors = []

for (train_image, bounding_box) in zip(train_images, bounding_boxes):
  train_image_tensors.append(tf.expand_dims(tf.convert_to_tensor(train_image, dtype='float32'), axis=0))
  bounding_box_tensors.append(tf.convert_to_tensor(bounding_box, dtype='float32'))
  zero_indexed_class_labels = tf.convert_to_tensor(np.ones(shape=[bounding_box.shape[0]], dtype=np.int32) - 1)
  class_label_tensors.append(tf.one_hot(zero_indexed_class_labels, 1))
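
The label handling in the cell above can be sketched with NumPy alone. Label map ids start at 1, while one-hot indices start at 0, so the ids are shifted down by one before encoding. The two-duck image below is a hypothetical example, and `label_id_offset` is a name introduced here for illustration.

```python
import numpy as np

num_classes = 1       # only the rubber ducky class
label_id_offset = 1   # label map ids start at 1, one-hot indices start at 0

# Hypothetical image containing two annotated duckies, both with label id 1.
label_ids = np.array([1, 1], dtype=np.int32)
zero_indexed = label_ids - label_id_offset

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[zero_indexed]
print(one_hot.shape)
```

With a single class every row is simply `[1.0]`, but the same indexing carries over unchanged to more classes.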

## Visualize the images with bounding boxes

By running the following code cell, you should see 5 images with the bounding boxes you drew earlier. If not, please return to the annotation step and re-do the drawing process.

In [None]:
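# Copy each array: the visualization utility draws on the image in place,
# and a plain list copy would still share the underlying arrays.
train_images_with_boxes = [image.copy() for image in train_images]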

plt.figure(figsize=(30, 15))
for i in range(5):
  plt.subplot(2, 3, i+1)
  viz_utils.visualize_boxes_and_labels_on_image_array(
      train_images_with_boxes[i],
      bounding_boxes[i],
      np.ones(shape=[bounding_boxes[i].shape[0]], dtype='int32'),    # class id 1 for every box
      np.ones(shape=[bounding_boxes[i].shape[0]], dtype='float32'),  # ground truth, so score 1.0
      category_index,
      use_normalized_coordinates=True,
      min_score_thresh=0.8)

  plt.imshow(train_images_with_boxes[i])

plt.show()

## Retrieve the model checkpoints and restore weights

Now we will download RetinaNet and move it into the object detection directory.

We will build the model and restore everything except the classification layer at the head, which will therefore be randomly initialized by default. We will fine-tune later on so that the model can adapt to the specific task we need it for. For reference, we are using the ResNet50 V1, 640x640 checkpoint.

In [None]:
!wget "http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz"
!tar -xf "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz"
!mv "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint" "models/research/object_detection/test_data/"

In [None]:
pipeline_config = 'models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
checkpoint_path = 'models/research/object_detection/test_data/checkpoint/ckpt-0'

config = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = config['model']
model_config.ssd.freeze_batchnorm = True
model_config.ssd.num_classes = 1

resnet_model = model_builder.build(model_config=model_config, is_training=True)

temp_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=resnet_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=resnet_model._box_predictor._box_prediction_head
)

temp_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=resnet_model._feature_extractor,
    _box_predictor=temp_box_predictor
)

checkpoint_path = "/content/models/research/object_detection/test_data/checkpoint/ckpt-0"
checkpoint = tf.compat.v2.train.Checkpoint(model=temp_model)
checkpoint.restore(checkpoint_path).expect_partial()

The short cell below passes a dummy image through our instantiated model so that its trainable variables are created.

In [None]:
image, shape = resnet_model.preprocess(tf.zeros([1, 640, 640, 3]))
predictions = resnet_model.predict(image, shape)
tmp_detections = resnet_model.postprocess(predictions, shape)

## Fine tuning and custom training loop

To take full advantage of transfer learning and pre-trained weights, we will only train the parameters of the prediction layers at the top.

In [None]:
fine_tune_vars = []
prefixes_to_tune = ['WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
                    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead']

for var in resnet_model.trainable_variables:
  if (any([var.name.find(prefix) != -1 for prefix in prefixes_to_tune])):
    fine_tune_vars.append(var)
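
The selection logic above is plain substring matching on variable names. The sketch below reproduces it on a handful of made-up names so the effect is visible without building the model; the three entries in `variable_names` are hypothetical and only illustrate the pattern.

```python
prefixes_to_tune = [
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead',
]

# Hypothetical variable names, for illustration only.
variable_names = [
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/kernel:0',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/bias:0',
    'ResNet50V1_FPN/bottom_up_block5/kernel:0',
]

# Keep a name if any of the prefixes occurs somewhere inside it.
selected = [name for name in variable_names
            if any(prefix in name for prefix in prefixes_to_tune)]
print(len(selected))
```

Only the two head variables survive the filter, so the backbone weights stay at their restored values during training.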

The following function defines a single training step: we provide the model with our ground truth values, preprocess the images, make a prediction and calculate the resulting loss. Note that the total loss is the sum of the classification loss and the object localization loss. We then use TensorFlow's GradientTape for automatic differentiation and update the fine-tuned variables at every step with the SGD optimizer.

In [None]:
@tf.function
def model_train_loop(images, bounding_boxes, class_labels, model, optimizer, fine_tune_vars):
  model.provide_groundtruth(
      groundtruth_boxes_list=bounding_boxes,
      groundtruth_classes_list=class_labels
  )

  # Every image is resized to 640x640 by the model's preprocessing step.
  shapes = tf.constant(len(images) * [[640, 640, 3]], dtype=tf.int32)

  with tf.GradientTape() as tape:
    # Preprocess each image and stack the results into one batch tensor.
    preprocess_img_list = [model.preprocess(image)[0] for image in images]
    preprocess_img_tensor = tf.concat(preprocess_img_list, axis=0)

    predicts = model.predict(preprocess_img_tensor, shapes)
    loss = model.loss(predicts, shapes)
    total_loss = loss['Loss/localization_loss'] + loss['Loss/classification_loss']

  # Compute and apply the gradients outside the tape context so they are not recorded.
  gradients = tape.gradient(total_loss, fine_tune_vars)
  optimizer.apply_gradients(zip(gradients, fine_tune_vars))

  return total_loss
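
The update that the SGD optimizer applies at each step can be written out for a single scalar weight. The sketch below assumes the standard (non-Nesterov) momentum rule, `velocity = momentum * velocity - learning_rate * gradient` followed by `w = w + velocity`, and uses a hypothetical quadratic loss `(w - 3)^2` in place of the detection loss. The learning rate and momentum match the values passed to `tf.keras.optimizers.SGD` in the training cell; `sgd_momentum_step` is a name introduced here for illustration.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.008, momentum=0.9):
    """Apply one SGD-with-momentum update to a scalar weight."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def quadratic_grad(w, target=3.0):
    """Gradient of the stand-in loss (w - target)^2."""
    return 2.0 * (w - target)

w, velocity = 1.0, 0.0
for _ in range(2):
    w, velocity = sgd_momentum_step(w, velocity, quadratic_grad(w))
print(w, velocity)
```

Because the velocity accumulates past gradients, the second step is larger than the first even though the gradient has shrunk slightly.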

## Begin training

We will now run the training step defined above in graph mode for 100 batches, with 4 examples in each batch.

In [None]:
num_batch = 100
batch_size = 4

optimizer = tf.keras.optimizers.SGD(learning_rate = 0.008, momentum = 0.9)

for i in tqdm(range(num_batch)):
  values = [0, 1, 2, 3, 4]
  random.shuffle(values)
  keys = values[0:batch_size]

  images = [train_image_tensors[j] for j in keys]
  labels = [class_label_tensors[k] for k in keys]
  boxes = [bounding_box_tensors[l] for l in keys]

  loss = model_train_loop(images, boxes, labels, resnet_model, optimizer, fine_tune_vars)

  if i % 10 == 0:
    print('batch ' + str(i) + ' of ' + str(num_batch)
          + ', loss=' + str(loss.numpy()), flush=True)
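
The shuffle-then-slice pattern in the loop above draws 4 distinct indices out of the 5 training examples, so each batch leaves exactly one image out. As a minimal sketch, `random.sample` expresses the same draw in one call; the seeded `random.Random(0)` is used here only to make the illustration repeatable and is not part of the original loop.

```python
import random

num_examples = 5
batch_size = 4

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Draw batch_size distinct indices without replacement.
batch_indices = rng.sample(range(num_examples), batch_size)
left_out = set(range(num_examples)) - set(batch_indices)
print(batch_indices, left_out)
```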

## Run inference with the trained model

Now we can load our test data from the test_images directory, which contains around 50 unseen images of rubber duckies for the model to predict on. After that, we run a prediction loop and save the annotated images in the ./results directory for further inspection.

In [None]:
!rm -rf ./results/

test_image_dir = "/content/models/research/object_detection/test_images/ducky/test/"
test_images = []
for i in range(1, 50):
  image_path = os.path.join(test_image_dir, 'out' + str(i) + '.jpg')
  test_images.append(imread(image_path))

In [None]:
results_dir = "./results"
os.makedirs(results_dir)

for i in range(len(test_images)):
  test_image_tensor = tf.expand_dims(tf.convert_to_tensor(test_images[i], dtype='float32'), axis=0)
  preprocessed_image, shape = resnet_model.preprocess(test_image_tensor)

  predictions = resnet_model.predict(preprocessed_image, shape)
  final = resnet_model.postprocess(predictions, shape)
  viz_utils.visualize_boxes_and_labels_on_image_array(
      test_images[i],
      final['detection_boxes'][0].numpy(),
      final['detection_classes'][0].numpy().astype('int32') + 1,
      final['detection_scores'][0].numpy(),
      category_index,
      use_normalized_coordinates=True,
      min_score_thresh=0.8)

  plt.imsave(os.path.join(results_dir, "frame_" + str(i) + ".jpg"), test_images[i])
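
The `min_score_thresh=0.8` argument controls which detections are drawn. The sketch below shows the equivalent filtering step on hypothetical detections, assuming that a detection is kept when its score is strictly greater than the threshold; `filter_detections` is a helper introduced here and not part of the Object Detection API.

```python
import numpy as np

def filter_detections(boxes, classes, scores, min_score_thresh=0.8):
    """Keep only the detections whose score exceeds the threshold."""
    keep = scores > min_score_thresh
    return boxes[keep], classes[keep], scores[keep]

# Hypothetical detections for one image (normalized [ymin, xmin, ymax, xmax]).
boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                  [0.2, 0.2, 0.6, 0.6],
                  [0.0, 0.0, 1.0, 1.0]], dtype=np.float32)
classes = np.array([1, 1, 1], dtype=np.int32)
scores = np.array([0.95, 0.40, 0.81], dtype=np.float32)

kept_boxes, kept_classes, kept_scores = filter_detections(boxes, classes, scores)
print(len(kept_scores))
```

Lowering the threshold shows more candidate boxes at the cost of more false positives, which can be useful when inspecting a model trained on so few examples.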


## View final result

Finally, we can take the images with detections from the previous step and join them together into a GIF, so all results can be viewed in one animation.

In [None]:
images = []
filenames = ["frame_" + str(i) + ".jpg" for i in range(len(test_images))]
gif_path = os.path.join(results_dir, "ducky.gif")

for filename in filenames:
  images.append(imageio.imread(os.path.join(results_dir, filename)))

imageio.mimsave(gif_path, images, fps=5)

Open the 'Files' pane on the left and double-click the 'results' folder, where you will find 'ducky.gif'. Clicking the file opens a preview on the right. Note that this may take a minute or two to load, but trust me, it will be worthwhile to see the final result!