<a href="https://colab.research.google.com/github/rahiakela/tensorflow-computer-vision-cookbook/blob/main/09-localizing-elements-in-images-with-object-detection/03_detecting_objects_with_tensorflow_object_detection_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Detecting objects with TensorFlow's Object Detection API

It's no secret that modern object detectors rank among the most complex and challenging architectures to implement correctly. However, that doesn't mean we can't take advantage of the most recent advancements in this domain to train object detectors on our own datasets. How? Enter TensorFlow's Object Detection API!

In this recipe, we'll install this API, prepare a custom dataset for training, tweak a couple of configuration files, and use the resulting model to localize objects on test images. This recipe is a bit different from the ones you've worked on so far, because we'll be switching back and forth between Python and the command line.

## Setup

Let's begin with the most important dependency: the TensorFlow Object Detection API. First, clone the repository that contains it.

In [11]:
!git clone --depth 1 https://github.com/tensorflow/models.git

Cloning into 'models'...
remote: Enumerating objects: 2802, done.
remote: Counting objects: 100% (2802/2802), done.
remote: Compressing objects: 100% (2333/2333), done.
remote: Total 2802 (delta 717), reused 1296 (delta 433), pack-reused 0
Receiving objects: 100% (2802/2802), 32.79 MiB | 29.40 MiB/s, done.
Resolving deltas: 100% (717/717), done.


Next, install the TensorFlow Object Detection API.

In [None]:
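%%shell

# Each "!" command runs in its own subshell, so "!cd" does not persist.
# Running everything in a single shell cell keeps the working directory.
sudo apt install -y protobuf-compiler
cd models/research
# Compile the protobuf definitions, then install the package.
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install -q .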

Alternatively, we can install the `tensorflow-object-detection-api` package with pip, as suggested in [this Stack Overflow thread](https://stackoverflow.com/questions/50113683/modulenotfounderror-no-module-named-object-detection).

In [None]:
!pip install tensorflow-object-detection-api
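Whichever route you take, a quick check that the package is importable can save a confusing `ModuleNotFoundError` later. This is a minimal sketch, assuming the install exposes a top-level module named `object_detection` (the name imported later in this recipe); `is_installed` is a hypothetical helper, not part of the API.

```python
import importlib.util


def is_installed(module_name):
  """Return True if `module_name` can be found on the import path."""
  return importlib.util.find_spec(module_name) is not None


# `object_detection` is the top-level module this recipe imports later.
print("object_detection importable:", is_installed("object_detection"))
```

If this prints `False`, rerun the installation cell before moving on.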

Now, let's download the Fruit Images for Object Detection dataset from [Kaggle](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection#).

In [None]:
from google.colab import files
files.upload() # upload kaggle.json file

In [4]:
%%shell

mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle/
ls ~/.kaggle
chmod 600 /root/.kaggle/kaggle.json

# download the fruit images dataset from Kaggle
kaggle datasets download -d mbkinaci/fruit-images-for-object-detection
unzip -qq fruit-images-for-object-detection.zip
rm -rf fruit-images-for-object-detection.zip

kaggle.json
Downloading fruit-images-for-object-detection.zip to /content
 32% 9.00M/28.4M [00:00<00:00, 37.3MB/s]
100% 28.4M/28.4M [00:00<00:00, 81.4MB/s]




In [5]:
import glob
import io
import os
from collections import namedtuple
from xml.etree import ElementTree as tree

import pandas as pd
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util

## Define some helper functions

We'll work with two files in this recipe: 

- the first one is used to prepare the data 
- the second one is used to make inferences with the object detector

Let's define the encode_class() function, which maps the text labels to their integer counterparts.

In [2]:
def encode_class(row_label):
  class_mapping = {"apple": 1, "orange": 2, "banana": 3}
  return class_mapping.get(row_label, None)
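To see how the mapping behaves, here is a small self-contained check. It repeats the `encode_class()` definition so that it runs on its own; the `"kiwi"` label is a made-up example of a class that is not in the mapping.

```python
def encode_class(row_label):
  class_mapping = {"apple": 1, "orange": 2, "banana": 3}
  return class_mapping.get(row_label, None)


labels = ["apple", "banana", "kiwi"]
encoded = [encode_class(label) for label in labels]
print(encoded)  # [1, 3, None]
```

Note that an unknown label yields `None` rather than raising an error, so a misspelled class name in the annotations would only surface later, when the value is written as an integer feature.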

Let's define a function to split a dataframe of labels into groups.

In [6]:
def split(df, group):
  data = namedtuple("data", ["filename", "object"])
  groups = df.groupby(group)

  # A GroupBy object exposes its key-to-rows mapping as `.groups` (plural)
  return [data(filename, groups.get_group(x)) for filename, x in zip(groups.groups.keys(), groups.groups)]
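The idea behind `split()` is to collect all the bounding boxes that belong to the same image. The following sketch shows the same grouping on a toy DataFrame with made-up values. It iterates over the `GroupBy` object directly, which is an alternative way of obtaining one sub-DataFrame per filename.

```python
from collections import namedtuple

import pandas as pd

# Toy annotations: two boxes in a.jpg, one in b.jpg (made-up values).
df = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "b.jpg"],
    "class": ["apple", "orange", "banana"],
    "xmin": [10, 50, 5],
})

Data = namedtuple("Data", ["filename", "object"])
# Iterating a GroupBy yields (key, sub-DataFrame) pairs, one per image.
grouped = [Data(name, rows) for name, rows in df.groupby("filename")]

for g in grouped:
  print(g.filename, len(g.object))
```

Each element pairs a filename with the rows describing the objects in that image, which is the shape `create_tf_example()` expects.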

The TensorFlow Object Detection API works with a data structure known as
`tf.train.Example`. 

The next function takes the path to an image and its label
(which is the set of bounding boxes and the ground-truth classes of all objects
contained in it) and creates the corresponding `tf.train.Example`.

In [7]:
def create_tf_example(group, path):
  # First, load the image and its properties
  groups_path = os.path.join(path, f"{group.filename}")
  with tf.gfile.GFile(groups_path, "rb") as f:
    encoded_jpg = f.read()

  image = Image.open(io.BytesIO(encoded_jpg))
  width, height = image.size

  filename = group.filename.encode("utf8")
  image_format = b"jpg"

  # Now, store the dimensions of the bounding boxes, along with the classes of each object contained in the image
  xmins = []
  xmaxs = []
  ymins = []
  ymaxs = []
  classes_text = []
  classes = []

  for index, row in group.object.iterrows():
    xmins.append(row['xmin'] / width)
    xmaxs.append(row['xmax'] / width)
    ymins.append(row['ymin'] / height)
    ymaxs.append(row['ymax'] / height)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(encode_class(row['class']))

  # Create a tf.train.Features object that will contain relevant information about the image and its objects
  features = tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(filename),
      'image/source_id': dataset_util.bytes_feature(filename),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature(image_format),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes)
  })

  # Wrap the features in a tf.train.Example, which is what gets serialized
  return tf.train.Example(features=features)
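The loop in `create_tf_example()` divides pixel coordinates by the image size, so every box is expressed relative to the image and lies in the [0, 1] range. Here is a minimal sketch of that step in isolation, using a hypothetical `normalize_box` helper and made-up numbers:

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
  """Scale pixel coordinates to the [0, 1] range relative to the image size."""
  return (xmin / width, ymin / height, xmax / width, ymax / height)


# A 200x100 box starting at (50, 100) in a 400x200 image.
box = normalize_box(50, 100, 250, 200, width=400, height=200)
print(box)  # (0.125, 0.5, 0.625, 1.0)
```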

Now, define a function that reads the Extensible Markup Language (XML) annotation files
in a folder, each with information about the bounding boxes in an image, and collects them
into a DataFrame that can be saved in Comma-Separated Values (CSV) format.

In [8]:
def bboxes_to_csv(path):
  xml_list = []

  bboxes_pattern = os.path.sep.join([path, '*.xml'])
  for xml_file in glob.glob(bboxes_pattern):
    t = tree.parse(xml_file)
    root = t.getroot()

    for member in root.findall('object'):
        value = (root.find('filename').text,
                  int(root.find('size')[0].text),
                  int(root.find('size')[1].text),
                  member[0].text,
                  int(member[4][0].text),
                  int(member[4][1].text),
                  int(member[4][2].text),
                  int(member[4][3].text))
        xml_list.append(value)

  column_names = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
  df = pd.DataFrame(xml_list, columns=column_names)
  
  return df
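`bboxes_to_csv()` reads fields by position (for instance, `member[4][0]`), which assumes each `<object>` element lists the class name first and the bounding box fifth. As a sketch of a less order-dependent alternative, the snippet below looks fields up by tag name. The XML string is a made-up example in the Pascal VOC style, and `parse_annotation` is a hypothetical helper; the dataset's real files may contain additional elements.

```python
from xml.etree import ElementTree as tree

SAMPLE_XML = """
<annotation>
  <filename>apple_1.jpg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>apple</name>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>250</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def parse_annotation(xml_text):
  """Return one row per <object>, reading fields by tag name."""
  root = tree.fromstring(xml_text)
  filename = root.findtext("filename")
  width = int(root.findtext("size/width"))
  height = int(root.findtext("size/height"))

  rows = []
  for member in root.findall("object"):
    box = member.find("bndbox")
    rows.append((filename, width, height, member.findtext("name"),
                 int(box.findtext("xmin")), int(box.findtext("ymin")),
                 int(box.findtext("xmax")), int(box.findtext("ymax"))))
  return rows


rows = parse_annotation(SAMPLE_XML)
print(rows)
```

The tuples follow the same column order as the DataFrame built above: filename, width, height, class, xmin, ymin, xmax, ymax.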

Now, iterate over the test and train subsets of the dataset, converting the labels from XML to CSV.

In [10]:
# TFRecordWriter does not create missing directories, so make sure the output folder exists
os.makedirs("resources", exist_ok=True)

for subset in ["test", "train"]:
  # Assumes the archive was unzipped into the working directory as <subset>_zip/<subset>
  folder = os.path.join(f"{subset}_zip", subset)
  labels_path = f"{subset}_labels.csv"

  bboxes_df = bboxes_to_csv(folder)
  bboxes_df.to_csv(labels_path, index=None)

  # Then, use the same labels to produce the tf.train.Examples corresponding to the current subset of data being processed:
  writer = tf.python_io.TFRecordWriter(f"resources/{subset}.record")
  examples = pd.read_csv(labels_path)
  grouped = split(examples, "filename")

  # The images live in the same folder as their XML annotations
  for group in grouped:
    tf_example = create_tf_example(group, folder)
    writer.write(tf_example.SerializeToString())
  writer.close()