<a href="https://colab.research.google.com/github/elephantscale/E2E-Object-Detection-in-TFLite/blob/master/colab_training/Tutorial_Object_Detection_with_TFLite_tf2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Object Detection with TFLite

## Introduction

Imagine that you have a niece or a nephew and you want to give them a present.
When you were growing up, your aunt gave you a "Find the Duck" book. You had lots of fun
finding the duck on every page of this board book. Today, you want to make this book
into a computer game. For that, you need to be able to teach the computer how to find
the duck. This is what this tutorial will teach you.

<img src="find-the-duck-cartoon.png"  alt="Find the duckie" width="512" height="512">

## Our plan

The task that you are about to undertake is called "object detection." The good news is that
Google's TensorFlow library already does most of the groundwork for object detection.
Furthermore, the TensorFlow Lite part of the library will help you put your application on a
phone or another device. The end result will look like the screenshot below:
out of a known set of objects, you will detect which ones are present
in the picture and where they are located.

<img src="object-detection-cartoon.png"  alt="Find the fruit" width="512" height="512">

We will do it in four steps. First, you will prepare the data: images of the objects that you want to identify.
Next, you will convert the data to the TFRecord format that the Object Detection API expects.
Then, you will train the model with this data. Finally, you will export the model
to TFLite, preparing it to be used in your phone app. In the next tutorial,
we will teach you how to use the resulting TFLite model in your phone app. So, let us start.

## Data collection

We have taken a dataset with pictures of fruit. Each data sample has an image
of a fruit and an XML file with the fruit coordinates.
This dataset came from Kaggle [here](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection), with a Creative Commons license,
so for ease of use we have placed a copy next to this tutorial.

In a Jupyter notebook, you can run a shell command by starting the line with an exclamation mark.
Let us download this dataset first.

In [None]:
!wget -nc https://github.com/elephantscale/E2E-Object-Detection-in-TFLite/raw/master/data/Fruit_Images_for_Object_Detection.zip

In the same way, let us unzip the dataset.

In [None]:
!unzip -qqn Fruit_Images_for_Object_Detection.zip

## Generate intermediate files

To be able to generate TFRecords from our fruits dataset, we first generate a `.csv` file that contains the following fields -
- filename
- width
- height
- class
- xmin
- ymin
- xmax
- ymax

Now, we need to convert the XML descriptions to CSV. In your case,
you may have to adjust this code, depending on the format of the data
in your real-life dataset.
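
The conversion cell that follows reads each field by its position inside the XML element (for example `member[4][0]` for `xmin`), which breaks as soon as a labeling tool writes the tags in a different order. As one possible adjustment, the sketch below looks fields up by tag name instead. It assumes Pascal VOC-style tags (`size/width`, `object/name`, `object/bndbox/xmin`, and so on); the helper name `parse_voc_annotation` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation in Pascal VOC style (not taken from the dataset).
SAMPLE_XML = """
<annotation>
  <filename>apple_1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>apple</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) row per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return rows


rows = parse_voc_annotation(SAMPLE_XML)
print(rows)
```

The returned tuples use the same column order as the CSV described above, so they could be passed to `pd.DataFrame` in the same way.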

In [None]:
# Convert XML to CSV

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def call_xml_to_csv():
    train = "/content/train_zip/train"
    test = "/content/test_zip/test"
    for directory in [train, test]:
        xml_df = xml_to_csv(directory)
        xml_df.to_csv('{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')


call_xml_to_csv()

In [None]:
!head -5 /content/train_zip/train_labels.csv

In [None]:
!head -5 /content/test_zip/test_labels.csv

Now that we have `.csv` files we can do some basic exploratory data analysis (EDA) to better understand the dataset.

## Basic EDA

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import cv2
import os

In [None]:
train_df = pd.read_csv("/content/train_zip/train_labels.csv")
test_df = pd.read_csv("/content/test_zip/test_labels.csv")

In [None]:
train_df.head()

In [None]:
test_df.head()

In [None]:
train_df["class"].value_counts()

In [None]:
test_df["class"].value_counts()

In [None]:
def show_images(df, is_train=True):
    if is_train:
        root = "/content/train_zip/train"
    else:
        root = "/content/test_zip/test"
    plt.figure(figsize=(15,15))
    for i in range(10):
        # Draw a single random row index as a scalar rather than a 1-element array
        n = np.random.choice(df.shape[0])
        plt.subplot(5,5,i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(True)
        image = plt.imread(os.path.join(root, df["filename"][int(n)]))
        plt.imshow(image)
        label = df["class"][int(n)]
        plt.xlabel(label)
    plt.show()

In [None]:
show_images(train_df)

In [None]:
show_images(test_df, is_train=False)

In [None]:
def verify_annotations(df, is_train=True):
    if is_train:
        root = "/content/train_zip/train"
    else:
        root = "/content/test_zip/test"
    
    plt.figure(figsize=(12,12))
    for i in range(3):
        # Draw a single random row index as a scalar rather than a 1-element array
        n = np.random.choice(df.shape[0])
        plt.subplot(1,3,i+1)
        plt.xticks([])
        plt.yticks([])
        
        image = plt.imread(os.path.join(root, df["filename"][int(n)]))
        xmin, ymin = int(df["xmin"][int(n)]), int(df["ymin"][int(n)])
        xmax, ymax = int(df["xmax"][int(n)]), int(df["ymax"][int(n)])
        cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (255,0,0), 3)
        plt.imshow(image)
    
    plt.show()

In [None]:
verify_annotations(train_df, is_train=True)

In [None]:
verify_annotations(test_df, is_train=False)

As we can see, the dataset has annotation issues, so a model trained on it may yield unexpected results.
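
A programmatic check can complement the visual inspection. The sketch below flags rows whose box is empty, falls outside the image, or belongs to an image recorded with no size. It uses plain dictionaries with the same column names as the CSV files; the function name `find_bad_boxes` and the sample rows are hypothetical.

```python
def find_bad_boxes(rows):
    """Return (row_index, reason) pairs for boxes that look wrong."""
    problems = []
    for i, r in enumerate(rows):
        if r["width"] <= 0 or r["height"] <= 0:
            problems.append((i, "image has no size"))
        elif r["xmin"] >= r["xmax"] or r["ymin"] >= r["ymax"]:
            problems.append((i, "empty box"))
        elif (r["xmin"] < 0 or r["ymin"] < 0
              or r["xmax"] > r["width"] or r["ymax"] > r["height"]):
            problems.append((i, "box outside image"))
    return problems


# Made-up rows with the same columns as the label CSV files.
sample_rows = [
    {"filename": "a.jpg", "width": 100, "height": 80,
     "xmin": 10, "ymin": 5, "xmax": 60, "ymax": 70},
    {"filename": "b.jpg", "width": 0, "height": 0,
     "xmin": 10, "ymin": 5, "xmax": 60, "ymax": 70},
    {"filename": "c.jpg", "width": 100, "height": 80,
     "xmin": 40, "ymin": 5, "xmax": 140, "ymax": 70},
]
problems = find_bad_boxes(sample_rows)
print(problems)
```

In the notebook, the rows could come from `train_df.to_dict("records")`.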

## Generate TFRecords and `.pbtxt`

The TFRecord format is a simple format for storing a sequence of binary records.
The format is explained [here](https://www.tensorflow.org/tutorials/load_data/tfrecord)
but for our purposes it is enough that the code below creates these records for us.
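
Conversion scripts of this kind typically group the CSV rows by image and store each box with its coordinates divided by the image width and height, because the Object Detection API expects normalized box coordinates in its TFRecords. The sketch below shows only that grouping and normalization step in plain Python. It does not write a TFRecord, and the function name `group_and_normalize` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_and_normalize(rows):
    """Map each filename to a list of (class, xmin, ymin, xmax, ymax) in [0, 1]."""
    per_image = defaultdict(list)
    for r in rows:
        per_image[r["filename"]].append((
            r["class"],
            r["xmin"] / r["width"],
            r["ymin"] / r["height"],
            r["xmax"] / r["width"],
            r["ymax"] / r["height"],
        ))
    return dict(per_image)


# Made-up rows: two boxes on one image, one box on another.
sample_rows = [
    {"filename": "mixed_1.jpg", "width": 200, "height": 100, "class": "apple",
     "xmin": 50, "ymin": 25, "xmax": 100, "ymax": 50},
    {"filename": "mixed_1.jpg", "width": 200, "height": 100, "class": "banana",
     "xmin": 0, "ymin": 0, "xmax": 200, "ymax": 100},
    {"filename": "orange_1.jpg", "width": 400, "height": 400, "class": "orange",
     "xmin": 100, "ymin": 200, "xmax": 300, "ymax": 400},
]
grouped = group_and_normalize(sample_rows)
for filename, boxes in grouped.items():
    print(filename, boxes)
```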

The utility scripts that I used in the following cells were adapted from [this repository](https://github.com/anirbankonar123/CorrosionDetector). 


In [None]:
%tensorflow_version 2.x
import tensorflow as tf 
print(tf.__version__)

!git clone https://github.com/tensorflow/models.git

%cd models/research
!pip install --upgrade pip
# Compile protos.
!protoc object_detection/protos/*.proto --python_out=.
# Install TensorFlow Object Detection API.
!cp object_detection/packages/tf1/setup.py .
!python -m pip install --use-feature=2020-resolver .

In [None]:
#!wget https://raw.githubusercontent.com/elephantscale/E2E-Object-Detection-in-TFLite/master/colab_training/generate_tfrecord.py
!wget https://raw.githubusercontent.com/elephantscale/E2E-Object-Detection-in-TFLite/tim/colab_training/generate_tfrecord.py
!wget https://raw.githubusercontent.com/elephantscale/E2E-Object-Detection-in-TFLite/tim/colab_training/generate_tfrecord_tf2.py

In [None]:
!python generate_tfrecord_tf2.py \
    --csv_input=/content/train_zip/train_labels.csv \
    --output_path=/content/train_zip/train.record

Before running the cell below, please edit the `path` variable in the `main()` function of `generate_tfrecord.py`. `generate_tfrecord.py` should be located here - `/content/models/research`.

In [None]:
!python generate_tfrecord_tf2.py \
    --csv_input=/content/test_zip/test_labels.csv \
    --output_path=/content/test_zip/test.record

In [None]:
!pwd
!ls -lh /content/test_zip/*.record
!ls -lh /content/train_zip/*.record

Be sure to store these `.record` files somewhere safe. Next, we need to generate a `.pbtxt` file that defines a mapping between our classes and integers. In the `generate_tfrecord.py` script, we used the following mapping -

```python
def class_text_to_int(row_label):
    if row_label == 'orange':
        return 1
    elif row_label == 'banana':
        return 2
    elif row_label == 'apple':
        return 3
    else:
        return None
```
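
Because the same mapping has to appear both in the conversion script and in the label map, one option is to derive both from a single dictionary so that they cannot drift apart. A minimal sketch under that assumption; the names `LABELS` and `label_map_text` are hypothetical and are not part of `generate_tfrecord.py`:

```python
LABELS = {"orange": 1, "banana": 2, "apple": 3}


def class_text_to_int(row_label):
    """Return the integer id for a class name, or None if it is unknown."""
    return LABELS.get(row_label)


def label_map_text(labels):
    """Render the mapping as label map text, one item per class, ordered by id."""
    items = []
    for name, idx in sorted(labels.items(), key=lambda kv: kv[1]):
        items.append("item {\n\tid: %d\n\tname: '%s'\n}\n" % (idx, name))
    return "".join(items)


print(class_text_to_int("banana"))
print(label_map_text(LABELS))
```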

In [None]:
label_encodings = {
    "orange": 1,
    "banana": 2,
    "apple": 3
}

# Write one `item { ... }` entry per class; the context manager closes the file
with open("/content/label_map.pbtxt", "w") as f:
    for (k, v) in label_encodings.items():
        item = ("item {\n"
                "\tid: " + str(v) + "\n"
                "\tname: '" + k + "'\n"
                "}\n")
        f.write(item)

!cat /content/label_map.pbtxt

Be sure to save this file as well. Next we will proceed toward training a custom detection model with what we have so far. Follow the steps in [this notebook](https://colab.research.google.com/github/sayakpaul/E2E-Object-Detection-in-TFLite/blob/master/colab_training/Training_MobileDet_Custom_Dataset.ipynb).

In this notebook we will be fine-tuning a **MobileDet** model on the [**fruits dataset**](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection). The original model checkpoints were generated in TensorFlow 1, so we need to stick to a TF 1 runtime. The purpose is to demonstrate the workflow, not to achieve state-of-the-art results, so expect modest performance from the short training schedule. Toward the very end, we will also see how to optimize our fine-tuned model using TensorFlow Lite APIs and run inference with it. This part will be executed on a TF 2 runtime.

As a prerequisite, you should be familiar with the contents of [this notebook](https://colab.research.google.com/github/sayakpaul/E2E-Object-Detection-in-TFLite/blob/master/colab_training/Fruits_Detection_Data_Prep.ipynb). It deals with the dataset construction part.

# Fetch pre-trained MobileDet model checkpoints and configuration

MobileDet comes in different variants (refer [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md)). We will be using the `ssdlite_mobiledet_cpu` variant. 

TFOD API operates with configuration files to train and evaluate models (the TF 2 release supports eager model execution too). For the purpose of this notebook, I created a configuration file following instructions from [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md). Note that I purposefully kept the `num_steps` argument to 2000. Here's a *non-exhaustive* list of the arguments I changed - 

-  `batch_size: 32`
- `label_map_path` and `input_path` inside `train_input_reader` and `tf_record_input_reader` respectively. The `num_examples` argument inside `eval_config` is set to 117.
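
The pipeline configuration is a plain text file, so small edits like these can also be scripted. The sketch below substitutes values with regular expressions on a made-up config fragment. It is a text-level shortcut under that assumption, not the Object Detection API's own configuration tooling, and the helper name `set_config_value` is hypothetical.

```python
import re

# Made-up fragment that only mimics the layout of a pipeline config.
CONFIG_TEXT = """train_config {
  batch_size: 64
  num_steps: 50000
}
eval_config {
  num_examples: 8000
}
"""


def set_config_value(text, key, value):
    """Replace the number after every `key:` in text and return the new text."""
    pattern = re.compile(r"(\b%s:\s*)\d+" % re.escape(key))
    new_text, count = pattern.subn(r"\g<1>%s" % value, text)
    if count == 0:
        raise KeyError("no '%s' field found" % key)
    return new_text


updated = CONFIG_TEXT
for key, value in [("batch_size", 32), ("num_steps", 2000), ("num_examples", 117)]:
    updated = set_config_value(updated, key, value)
print(updated)
```

Raising an error when a key is missing makes a mistyped field name fail loudly instead of leaving the config silently unchanged.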

### Download model checkpoint and config

In [None]:
%cd /content/models/research
!wget -q http://download.tensorflow.org/models/object_detection/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19.tar.gz
!wget -q http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
!wget -q https://raw.githubusercontent.com/elephantscale/E2E-Object-Detection-in-TFLite/tim/colab_training/ssdlite_mobiledet_cpu_320x320_fruits_sync_4x4.config
!wget -q https://raw.githubusercontent.com/elephantscale/E2E-Object-Detection-in-TFLite/tim/colab_training/ssdlite_mobiledet_cpu_320x320_fruits_sync_4x4_tf2.config

### Untar and verify the file structure of the model checkpoints

In [None]:
%cd /content/models/research
!tar -xvf ssdlite_mobiledet_cpu_320x320_coco_2020_05_19.tar.gz
!tar -xvf ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
!cp /content/label_map.pbtxt /content/models/research
!cp /content/test_zip/test.record /content/models/research
!cp /content/train_zip/train.record /content/models/research

# Model training

### Start training

**Note**: This script interleaves both training and evaluation. Before starting the training verify the paths carefully.
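
One way to verify the paths is to check that every file the training run needs exists before launching it. A minimal sketch, assuming the label map and `.record` files were copied to `/content/models/research` as in the cells above; the helper name `missing_paths` is hypothetical.

```python
import os


def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Paths used in this tutorial; adjust them if your layout differs.
required = [
    "/content/models/research/label_map.pbtxt",
    "/content/models/research/train.record",
    "/content/models/research/test.record",
]
missing = missing_paths(required)
print("Missing:", missing if missing else "nothing")
```

The paths written inside the pipeline config file can be added to the same list.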



In [None]:

import os
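# Append to PYTHONPATH without failing if the variable is not set yet
os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + ":/content/models"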

import sys
sys.path.append("/content/models")


In [None]:
#PIPELINE_CONFIG_PATH="/content/models/research/ssdlite_mobiledet_cpu_320x320_fruits_sync_4x4.config"
PIPELINE_CONFIG_PATH="/content/models/research/ssdlite_mobiledet_cpu_320x320_fruits_sync_4x4_tf2.config"

#MODEL_DIR="/content/models/research/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19"
MODEL_DIR="/content/models/research/ssd_mobilenet_v2_320x320_coco17_tpu-8"


!python object_detection/model_main.py \
    --pipeline_config_path={PIPELINE_CONFIG_PATH} \
    --model_dir={MODEL_DIR} \
    --alsologtostderr

The above code block takes approximately **30 minutes** to run (although it depends on the GPU you get if you are running on Colab). If you increase the number of steps, it will take longer. After the training completed, I got the following output -

```
I0915 04:48:33.129830 139851326252928 estimator.py:371] Loss for final step: 1.0553685.
```

# Export TFLite compatible graph

To export the fine-tuned checkpoints to a TFLite model we first need to export a model graph that is compatible with TFLite. More instructions about this are available [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md). First, we need to determine which checkpoint to use to export the graph. Let's first take a look at our `MODEL_DIR` to get an idea.

In [None]:
!ls -lh $MODEL_DIR

The checkpoint files with the prefix `model.ckpt-2000` are the ones we would be going with. 
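
If you train for a different number of steps, the step number in the prefix changes accordingly. The sketch below picks the prefix with the largest step from a directory listing. It assumes file names of the form `model.ckpt-N.*`; the helper name and the sample listing are hypothetical.

```python
import re


def latest_checkpoint_prefix(filenames):
    """Return the `model.ckpt-N` prefix with the largest step N, or None."""
    steps = set()
    for name in filenames:
        match = re.match(r"model\.ckpt-(\d+)\.", name)
        if match:
            steps.add(int(match.group(1)))
    return "model.ckpt-%d" % max(steps) if steps else None


# Hypothetical listing; in the notebook you could pass os.listdir(MODEL_DIR).
listing = [
    "checkpoint",
    "model.ckpt-0.index",
    "model.ckpt-1000.index",
    "model.ckpt-2000.data-00000-of-00001",
    "model.ckpt-2000.index",
    "model.ckpt-2000.meta",
]
print(latest_checkpoint_prefix(listing))
```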

In [None]:
#Export TFLite compatible graph
#Always verify the paths before running this command. 

!python object_detection/export_tflite_ssd_graph.py \
    --pipeline_config_path=$PIPELINE_CONFIG_PATH \
    --trained_checkpoint_prefix=$MODEL_DIR/model.ckpt-2000 \
    --output_directory=$MODEL_DIR \
    --add_postprocessing_op=true

### Verify the TFLite compatible graph size

It should have the `.pb` extension. Be sure to note down the path printed by the code block below.

In [None]:
!ls -lh $MODEL_DIR/*.pb

Now that we have the graph, we can convert it to TensorFlow Lite. Let's shift the runtime to TF 2. To do so, simply restart the Colab runtime.

# Optionally see the model losses in TensorBoard (within Colab Notebook)

**Note**: If you trained for only 2000 steps, you are likely to see poor numbers in TensorBoard. But as I mentioned, training a SoTA model is not the purpose of this notebook.

In [None]:

%tensorflow_version 2.x
%load_ext tensorboard
%tensorboard --logdir /content/models/research/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19

# Export to TFLite

In [None]:
#Imports
import tensorflow as tf
print(tf.__version__)

import os

### Quantize and serialize

For the purpose of this notebook, we will only be using [dynamic-range quantization](https://www.tensorflow.org/lite/performance/post_training_quant). But you can follow [this notebook](https://colab.research.google.com/github/sayakpaul/Adventures-in-TensorFlow-Lite/blob/master/MobileDet_Conversion_TFLite.ipynb) if you are interested in trying out the other options, like integer quantization and `float16` quantization.
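
To give a feel for what quantization does to the weights, the sketch below maps a few float values to 8-bit integers with a single scale factor and then back again. It is a simplified symmetric scheme written for illustration, not TensorFlow Lite's actual implementation, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.01, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print([round(r, 3) for r in restored])
```

Each restored value differs from the original by at most half a scale step, which is the price paid for storing one byte per weight instead of four.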

As the .pb file we generated in the earlier step is a frozen graph, we need to use `tf.compat.v1.lite.TFLiteConverter.from_frozen_graph` to convert it to TFLite.

The MobileDet checkpoints we used accept 320x320 images, hence the `input_shapes` argument is specified that way. I specified the other arguments following instructions from [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).

In [None]:
model_to_be_quantized = "/content/models/research/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/tflite_graph.pb"
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file=model_to_be_quantized, 
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 320, 320, 3]}
)
converter.allow_custom_ops = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

tflite_filename = "fruits_detector" + "_dr" + ".tflite"
# Write the serialized model; the context manager closes the file
with open(tflite_filename, 'wb') as f:
    f.write(tflite_model)
print(f"TFLite model generated: {tflite_filename}")
!ls -lh $tflite_filename

# Run inference

In [None]:
#Imports
import matplotlib
import matplotlib.pyplot as plt

import cv2
import re
import time
import numpy as np

from PIL import Image

### TFLite Interpreter and detection utils 

Sourced from [here](https://github.com/tensorflow/examples/blob/master/lite/examples/object_detection/raspberry_pi/detect_picamera.py).


In [None]:

def set_input_tensor(interpreter, image):
  """Sets the input tensor."""
  tensor_index = interpreter.get_input_details()[0]['index']
  input_tensor = interpreter.tensor(tensor_index)()[0]
  input_tensor[:, :] = image


def get_output_tensor(interpreter, index):
  """Returns the output tensor at the given index."""
  output_details = interpreter.get_output_details()[index]
  tensor = np.squeeze(interpreter.get_tensor(output_details['index']))
  return tensor


def detect_objects(interpreter, image, threshold):
  """Returns a list of detection results, each a dictionary of object info."""
  set_input_tensor(interpreter, image)
  interpreter.invoke()

  # Get all output details
  boxes = get_output_tensor(interpreter, 0)
  classes = get_output_tensor(interpreter, 1)
  scores = get_output_tensor(interpreter, 2)
  count = int(get_output_tensor(interpreter, 3))

  results = []
  for i in range(count):
    if scores[i] >= threshold:
      result = {
          'bounding_box': boxes[i],
          'class_id': classes[i],
          'score': scores[i]
      }
      results.append(result)
  return results
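
Each `bounding_box` returned by `detect_objects` holds normalized `[ymin, xmin, ymax, xmax]` values, so drawing it requires scaling by the image size. A minimal sketch of that conversion, with clamping added as an extra safeguard; the helper name `to_pixel_box` and the sample values are hypothetical.

```python
def to_pixel_box(box, image_height, image_width):
    """Convert normalized [ymin, xmin, ymax, xmax] to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = box

    def scale(value, size):
        # Clamp to the image so slightly out-of-range predictions stay drawable.
        return max(0, min(size, int(value * size)))

    return (scale(xmin, image_width), scale(ymin, image_height),
            scale(xmax, image_width), scale(ymax, image_height))


pixel_box = to_pixel_box([0.25, 0.5, 0.75, 1.2], image_height=400, image_width=600)
print(pixel_box)
```

The returned order matches the two corner points that `cv2.rectangle` takes.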

In [None]:
# Supply a path to download a relevant image
IMAGE_PATH = "https://i.ibb.co/2tsXmCV/image.png" 

!wget -q -O image.png $IMAGE_PATH
Image.open('image.png')

In [None]:
# Load the TFLite model
interpreter = tf.lite.Interpreter(model_path="/content/fruits_detector_dr.tflite")
interpreter.allocate_tensors()
_, HEIGHT, WIDTH, _ = interpreter.get_input_details()[0]['shape']
print(f"Height and width accepted by the model: {HEIGHT, WIDTH}")

In [None]:
# Image preprocessing utils
def preprocess_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.io.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    original_image = img
    resized_img = tf.image.resize(img, (HEIGHT, WIDTH))
    resized_img = resized_img[tf.newaxis, :]
    return resized_img, original_image

In [None]:
# Define the label dictionary and color map
LABEL_DICT = {
    "orange": 1,
    "banana": 2,
    "apple": 3
}

REVERSE_LABEL_DICT = {
    1 : "orange",
    2 : "banana",
    3 : "apple"
}

COLORS = np.random.randint(0, 255, size=(len(LABEL_DICT), 3), 
                            dtype="uint8")

In [None]:
# Inference utils
def display_results(image_path, threshold=0.3):
    # Load the input image and preprocess it
    preprocessed_image, original_image = preprocess_image(image_path)

    # =============Perform inference=====================
    start_time = time.monotonic()
    results = detect_objects(interpreter, preprocessed_image, threshold=threshold)
    print(f"Elapsed time: {(time.monotonic() - start_time)*1000} milliseconds")

    # =============Display the results====================
    original_numpy = original_image.numpy()
    for obj in results:
        # Convert the bounding box figures from relative coordinates
        # to absolute coordinates based on the original resolution
        ymin, xmin, ymax, xmax = obj['bounding_box']
        xmin = int(xmin * original_numpy.shape[1])
        xmax = int(xmax * original_numpy.shape[1])
        ymin = int(ymin * original_numpy.shape[0])
        ymax = int(ymax * original_numpy.shape[0])

        # Grab the class index for the current iteration
        idx = int(obj['class_id'])
        # Skip the background
        if idx >= len(LABEL_DICT):
            continue

        # Draw the bounding box and label on the image
        color = [int(c) for c in COLORS[idx]]
        cv2.rectangle(original_numpy, (xmin, ymin), (xmax, ymax), 
                    color, 2)
        y = ymin - 15 if ymin - 15 > 15 else ymin + 15
        label = "{}: {:.2f}%".format(REVERSE_LABEL_DICT[idx],
            obj['score'] * 100)
        cv2.putText(original_numpy, label, (xmin, y),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Return the final image, converted from [0, 1] floats to 8-bit integers
    original_int = (original_numpy * 255).astype(np.uint8)
    return original_int

In [None]:
# Run inference and measure the inference time
resultant_image = display_results("/content/image.png", threshold=0.3)
Image.fromarray(resultant_image)

**Note** that you might see some unexpected results because the annotations in the training dataset are faulty in places, which can hurt model training considerably.
