# Object Detection

Object detection is a computer vision task that identifies objects within an image or video and determines where they are. In particular, it can be used to track the locations of objects in a scene by drawing a bounding box around each object of interest. In this tutorial, we will be using the [LISA Traffic Sign Dataset](https://cvrr.ucsd.edu/lisa-traffic-signs-dataset), which is a set of videos and annotated frames containing US traffic signs. Here, the data is already split into frames, and each frame is annotated with the bounding boxes of the traffic signs it contains. Our goal is to train an object detection model that can identify the different traffic signs in the data.

The KerasCV library contains various modular building blocks, including:

* Layers
* Metrics
* Losses
* Data augmentation

You can use these modular building blocks to build your own object detection model.

In [1]:
rm -rf ./LISA/ # Remove the directory to save some space

In [2]:
rm -rf ./LISA_processed/ # Clear the workspace to run through the tutorial

## Set up

Begin by installing and importing some necessary libraries, including: [KerasCV](https://keras.io/keras_cv/) for using bounding boxes and state-of-the-art computer vision models, [remotezip](https://github.com/gtsystem/python-remotezip) to inspect the contents of a ZIP file, and [tqdm](https://github.com/tqdm/tqdm) to use a progress bar. We will also install TensorFlow 2.11.0, which is compatible with KerasCV.

In [3]:
!pip install keras-cv remotezip tqdm datasets luketils --upgrade --quiet
!pip install -U "tensorflow==2.11.0" --quiet


In [4]:
import os
import PIL
import tqdm
import random
import shutil
import pathlib
import numpy as np
import pandas as pd
import remotezip as rz

import tensorflow as tf
import tensorflow_datasets as tfds
import keras
import keras_cv
import luketils

import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter

You do not have Waymo Open Dataset installed, so KerasCV Waymo metrics are not available.


## Prepare the dataset

In [5]:
URL = 'https://storage.googleapis.com/download.tensorflow.org/data/lisa.zip'

In [6]:
def list_files_from_zip_url(zip_url):
  """ List the files in the remote ZIP archive at the given URL.

    Args:
      zip_url: A URL of the ZIP file to inspect.

    Returns:
      List of the file names contained in the archive.
  """
  files = []
  # Avoid naming the handle `zip`, which would shadow the built-in function
  with rz.RemoteZip(zip_url) as archive:
    for zip_info in archive.infolist():
      files.append(zip_info.filename)
  return files

List the files that have box annotations. We will use the directories listed here to do our image downloads later on.

In [7]:
annot_files = list_files_from_zip_url(URL)
annot_files = [f for f in annot_files if f.endswith('.csv') and 'BOX' in f]
annot_files

['Annotations/Annotations/daySequence1/frameAnnotationsBOX.csv',
 'Annotations/Annotations/daySequence2/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip1/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip10/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip11/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip12/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip13/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip2/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip3/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip4/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip5/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip6/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip7/frameAnnotationsBOX.csv',
 'Annotations/Annotations/dayTrain/dayClip8/frameAnnotationsBOX.csv',
 'Annotations/Annotations/

In [8]:
img_files = list_files_from_zip_url(URL)
img_files = [f for f in img_files if f.endswith('.jpg')]
img_files[:10]

['daySequence1/daySequence1/frames/daySequence1--00000.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00001.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00002.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00003.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00004.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00005.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00006.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00007.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00008.jpg',
 'daySequence1/daySequence1/frames/daySequence1--00009.jpg']
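
The listing-and-filtering pattern used in the two cells above can be tried offline. The following is a minimal sketch that uses the standard-library `zipfile` module on an in-memory archive instead of `remotezip`; the helper name `list_matching_files` and the member names inside the archive are hypothetical and only serve the illustration.

```python
import io
import zipfile


def list_matching_files(zip_file, suffix, must_contain=""):
    """Return archive member names that end with `suffix` and contain `must_contain`."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.filename.endswith(suffix) and must_contain in info.filename
        ]


# Build a tiny in-memory archive with hypothetical member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Annotations/clipA/frameAnnotationsBOX.csv", "a;b\n")
    archive.writestr("Annotations/clipA/frameAnnotationsOTHER.csv", "a;b\n")
    archive.writestr("clipA/frames/clipA--00000.jpg", b"")

# Same filters as the tutorial: box annotation CSV files, then JPEG frames.
annotation_files = list_matching_files(buffer, ".csv", must_contain="BOX")
image_files = list_matching_files(buffer, ".jpg")
print(annotation_files)
print(image_files)
```

Only the file names are read from the archive's index here, which is why listing is cheap compared to extracting the files themselves.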

There are a lot of images to download, so let's take a subset of them.

In [9]:
len(img_files)

44075

In [10]:
dir_names = [] # Get the names of the different directories (avoid shadowing the built-in `dir`)
for i in img_files:
  if i.split('/')[1] not in dir_names:
    dir_names.append(i.split('/')[1])

img_files_subset = []
count = {k: 0 for k in dir_names} # Number of images kept so far per directory

for filename in img_files:
  for key in count:
    if key in filename and count[key] < 100:
      img_files_subset.append(filename)
      count[key] += 1

len(img_files_subset)

800
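
The per-directory cap used above can also be written with a `Counter`. This is a minimal sketch under stated assumptions: the helper name `take_per_directory` and the example paths are hypothetical, and the grouping key is compared exactly (the second path component) rather than by the substring test used in the cell above.

```python
from collections import Counter


def take_per_directory(paths, limit):
    """Keep at most `limit` paths for each sequence directory."""
    counts = Counter()
    subset = []
    for path in paths:
        directory = path.split("/")[1]  # second path component, as in the tutorial
        if counts[directory] < limit:
            subset.append(path)
            counts[directory] += 1
    return subset


# Hypothetical file listing: five frames in one sequence, two in another.
paths = [f"seqA/seqA/frames/seqA--{i:05d}.jpg" for i in range(5)]
paths += [f"seqB/seqB/frames/seqB--{i:05d}.jpg" for i in range(2)]

subset = take_per_directory(paths, limit=3)
print(len(subset))
```

With a limit of 3, the first sequence contributes three frames and the second contributes both of its frames.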

### Download the annotations

In [11]:
def download_annotations(zip_url, annot_files, download_dir):
  # Extract only the requested annotation files from the remote archive
  with rz.RemoteZip(zip_url) as archive:
    for fn in tqdm.tqdm(annot_files):
      archive.extract(fn, str(download_dir))

In [12]:
download_dir = pathlib.Path('./LISA/')
download_annotations(URL, annot_files, download_dir)

100%|██████████| 24/24 [00:01<00:00, 14.29it/s]


In [13]:
new_annot_path = './LISA_processed/annotations/'
os.makedirs(new_annot_path, exist_ok=True)
for path in download_dir.rglob("*.csv"):
  # Append the parent directory name so files from different clips do not collide
  dir_name = path.parent.name
  new_file_name = new_annot_path + path.name.replace('.csv', '_' + dir_name + '.csv')
  shutil.move(str(path), new_file_name)

In [14]:
rm -rf ./LISA # Remove the directory to save some space

### Download the images

In [15]:
def download_images(zip_url, img_files, download_dir):
  # Extract only the selected image files from the remote archive
  with rz.RemoteZip(zip_url) as archive:
    for fn in tqdm.tqdm(img_files):
      archive.extract(fn, str(download_dir))

In [16]:
download_images(URL, img_files_subset, download_dir)

100%|██████████| 800/800 [00:46<00:00, 17.09it/s]


In [17]:
new_img_path = './LISA_processed/images/'
os.makedirs(new_img_path, exist_ok=True)
for path in download_dir.rglob("*.jpg"):
  new_file_name = new_img_path + path.name
  shutil.move(str(path), new_file_name)

In [18]:
rm -rf ./LISA/ # Remove the directory to save some space

### What does the annotation file look like?

Based on our current data preparation, let's take a look at what one of the annotation files looks like. The first column of the annotation `.csv` file is the name of the image file, and the second column is the annotation tag (the class label). Following this information, we have the (x, y) coordinates of the upper left corner and the (x, y) coordinates of the lower right corner of the bounding box.
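
To make the column layout concrete, here is a minimal sketch that parses a two-row excerpt with the standard-library `csv` module and derives each box's width and height from its corners. The inline string is a shortened, hypothetical stand-in for a real annotation file (the extra `Origin ...` columns are left out), and the variable names are illustrative only.

```python
import csv
import io

# Shortened excerpt in the semicolon-delimited layout described above.
raw = (
    "Filename;Annotation tag;Upper left corner X;Upper left corner Y;"
    "Lower right corner X;Lower right corner Y\n"
    "dayTraining/dayClip1--00000.jpg;go;698;333;710;358\n"
    "dayTraining/dayClip1--00000.jpg;go;846;391;858;411\n"
)

boxes = []
for row in csv.DictReader(io.StringIO(raw), delimiter=";"):
    x_min = int(row["Upper left corner X"])
    y_min = int(row["Upper left corner Y"])
    x_max = int(row["Lower right corner X"])
    y_max = int(row["Lower right corner Y"])
    boxes.append({
        "file": row["Filename"],
        "tag": row["Annotation tag"],
        "width": x_max - x_min,    # horizontal extent in pixels
        "height": y_max - y_min,   # vertical extent in pixels
    })

for box in boxes:
    print(box)
```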



In [19]:
df = pd.read_csv(new_annot_path + 'frameAnnotationsBOX_dayClip1.csv',
                 delimiter=';').drop(['Origin file', 'Origin frame number',
                                      'Origin track', 'Origin track frame number'], axis=1)
df

| | Filename | Annotation tag | Upper left corner X | Upper left corner Y | Lower right corner X | Lower right corner Y |
|---|---|---|---|---|---|---|
| 0 | dayTraining/dayClip1--00000.jpg | go | 698 | 333 | 710 | 358 |
| 1 | dayTraining/dayClip1--00000.jpg | go | 846 | 391 | 858 | 411 |
| 2 | dayTraining/dayClip1--00001.jpg | go | 698 | 337 | 710 | 357 |
| 3 | dayTraining/dayClip1--00001.jpg | go | 847 | 390 | 859 | 410 |
| 4 | dayTraining/dayClip1--00002.jpg | go | 698 | 331 | 710 | 356 |
| ... | ... | ... | ... | ... | ... | ... |
| 7974 | dayTraining/dayClip1--02160.jpg | go | 660 | 322 | 672 | 342 |
| 7975 | dayTraining/dayClip1--02160.jpg | go | 777 | 351 | 792 | 376 |
| 7976 | dayTraining/dayClip1--02160.jpg | go | 838 | 174 | 868 | 234 |
| 7977 | dayTraining/dayClip1--02160.jpg | go | 987 | 213 | 1014 | 263 |


### Organize the dataset

Go through each CSV file and replace the full file path in the `Filename` column with just the name of the image. This makes it easier to match the image name in the annotation file to the actual image later on.

In [20]:
annot_path = pathlib.Path(new_annot_path)

for annot_file in tqdm.tqdm(annot_path.glob("*.csv")):
  # Open the .csv file into a Pandas dataframe
  df = pd.read_csv(str(annot_file), delimiter=';').drop(['Origin file',
                                                         'Origin frame number',
                                                         'Origin track',
                                                         'Origin track frame number'],
                                                        axis=1)
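  # Keep only the image name, dropping the directory part of each path
  # (one vectorized pass instead of a full-column replace per row)
  df['Filename'] = df['Filename'].str.split('/').str[-1]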

  # Overwrite the dataframe as the new .csv file - can save space by dropping
  # the additional columns as well
  df.to_csv(str(annot_file), index=False)

24it [01:10,  2.94s/it]


Now the image names in the annotations match the file names in the image folder. Note that some of the image names appear more than once. This is because a single image can have multiple bounding boxes, and each bounding box is assigned a single class.

In [21]:
df = pd.read_csv(new_annot_path + 'frameAnnotationsBOX_dayClip1.csv', delimiter=',')
df

| | Filename | Annotation tag | Upper left corner X | Upper left corner Y | Lower right corner X | Lower right corner Y |
|---|---|---|---|---|---|---|
| 0 | dayClip1--00000.jpg | go | 698 | 333 | 710 | 358 |
| 1 | dayClip1--00000.jpg | go | 846 | 391 | 858 | 411 |
| 2 | dayClip1--00001.jpg | go | 698 | 337 | 710 | 357 |
| 3 | dayClip1--00001.jpg | go | 847 | 390 | 859 | 410 |
| 4 | dayClip1--00002.jpg | go | 698 | 331 | 710 | 356 |
| ... | ... | ... | ... | ... | ... | ... |
| 7974 | dayClip1--02160.jpg | go | 660 | 322 | 672 | 342 |
| 7975 | dayClip1--02160.jpg | go | 777 | 351 | 792 | 376 |
| 7976 | dayClip1--02160.jpg | go | 838 | 174 | 868 | 234 |
| 7977 | dayClip1--02160.jpg | go | 987 | 213 | 1014 | 263 |


## Create the data input pipeline

### Writing a custom TensorFlow Dataset

Datasets are distributed in all kinds of formats and in all kinds of places, and they're not always stored in a format that's ready to feed into a machine learning pipeline. Creating our own custom class to build the dataset allows us to handle complex data types, such as nested structures like the target bounding boxes and labels.

All datasets are implemented as subclasses of `tfds.core.DatasetBuilder`, which takes care of most boilerplate. We can use it to support small and medium datasets that can be generated on a single machine, such as the one in this tutorial. Note that it can also be used for larger datasets; more information on this can be found in the [Writing custom datasets](https://www.tensorflow.org/datasets/add_dataset#write_your_dataset) tutorial.


We use `tfds.core.GeneratorBasedBuilder` as the base class, since our examples are produced by a Python generator. It expects subclasses to override `_split_generators` to return a dictionary that maps each split name to a generator of examples.

Let's consolidate our data even further. Instead of having the same image listed multiple times, let's gather all the bounding boxes and classes such that the image entry only appears once.
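
The consolidation step can be sketched in plain Python before looking at the builder. This is a minimal illustration, assuming the annotation rows are available as tuples; the `rows` list reuses a few values from the table above, and the `grouped` structure is hypothetical rather than the exact dictionary the builder constructs.

```python
from collections import defaultdict

# (filename, tag, x_min, y_min, x_max, y_max), one tuple per bounding box
rows = [
    ("dayClip1--00000.jpg", "go", 698, 333, 710, 358),
    ("dayClip1--00000.jpg", "go", 846, 391, 858, 411),
    ("dayClip1--00001.jpg", "go", 698, 337, 710, 357),
]

# One entry per image, holding every box and class that belongs to it.
grouped = defaultdict(lambda: {"boxes": [], "classes": []})
for filename, tag, x_min, y_min, x_max, y_max in rows:
    grouped[filename]["boxes"].append((x_min, y_min, x_max, y_max))
    grouped[filename]["classes"].append(tag)

for filename, entry in grouped.items():
    print(filename, len(entry["boxes"]), entry["classes"])
```

Each image name now appears once as a key, and its boxes and classes stay aligned by position in the two lists.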

In [22]:
%%writefile LISABuilder.py
import PIL.Image  # `import PIL` alone does not load the Image submodule
import random
import pathlib
import numpy as np
import pandas as pd

import tensorflow as tf
import tensorflow_datasets as tfds

class LISABuilder(tfds.core.GeneratorBasedBuilder):
  """
    Dataset builder for LISA Traffic dataset.
  """
  VERSION = tfds.core.Version('1.0.0')
  RELEASE_NOTES = {
      '1.0.0': 'Initial release.'
  }

  def _info(self) -> tfds.core.DatasetInfo:
    """
      Dataset metadata:
      Homepage: https://cvrr.ucsd.edu/lisa-traffic-signs-dataset
      Citation: Andreas Møgelmose, Mohan M. Trivedi, and Thomas B. Moeslund,
        “Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey,”
        IEEE Transactions on Intelligent Transportation Systems, 2012.
    """
    return self.dataset_info_from_configs(
        features=tfds.features.FeaturesDict({
            'image': tfds.features.Image(shape=(960, 1280, 3)),
            'bounding boxes': tfds.features.Sequence({
                'boxes': tfds.features.BBoxFeature(doc="[yxyx] with y / height, x / width"),
                'classes': tfds.features.ClassLabel(num_classes=7)
            })
        }),
        disable_shuffling=False
    )

  def train_test_split(self, img_labels_dict, training_percent = 0.80):
    # Shuffle the dictionary by the keys
    keys = list(img_labels_dict.keys())
    random.shuffle(keys)

    nkeys = int(len(keys) * training_percent)
    training_keys = keys[:nkeys]
    test_keys = keys[nkeys:]

    training_dict = {k: img_labels_dict[k] for k in training_keys}
    test_dict = {k: img_labels_dict[k] for k in test_keys}

    return training_dict, test_dict

  def format_bounding_boxes(self, upper_left_x, upper_left_y,
                            lower_right_x, lower_right_y,
                            height, width):
    """
      Package the coordinates into a tf.Tensor.
    """
    # Return ymin, xmin, ymax, xmax - yxyx format divided by height and width
    ymin = upper_left_y / height
    xmin = upper_left_x / width
    ymax = lower_right_y / height
    xmax = lower_right_x / width
    return np.asarray([ymin, xmin, ymax, xmax])

  def _split_generators(self, dl_manager: tfds.download.DownloadManager):
    annot_path = pathlib.Path('./LISA_processed/annotations/')
    img_path = pathlib.Path('./LISA_processed/images/')
    classes = ['stop', 'go', 'warning', 'warningLeft', 'stopLeft', 'goForward', 'goLeft']
    classes = sorted(classes)
    class_mapping = dict((name, idx) for idx, name in enumerate(classes))

    data_dict = dict() # Let each image have their entries for the bbox coordinates and classes
    for img_file in img_path.glob("*.jpg"):
      for annot_file in annot_path.glob("*.csv"):
        # Turn the .csv file into a dataframe for easy reading
        df = pd.read_csv(str(annot_file), delimiter=',')
        if any(img_file.name == name for name in df['Filename'].unique()):
          # Get indices of matched up file names
          idx = df.index[df['Filename'] == img_file.name].tolist()
          # Use `tags` for the per-image labels so the `classes` list defined above is not shadowed
          upper_left_x, upper_left_y, lower_right_x, lower_right_y, tags = [], [], [], [], []
          for i in idx:
            # Get the bounding box information: upper_left_x, upper_left_y, lower_right_x, lower_right_y
            upper_left_x.append(df.loc[i, 'Upper left corner X']) # xmin
            upper_left_y.append(df.loc[i, 'Upper left corner Y']) # ymin
            lower_right_x.append(df.loc[i, 'Lower right corner X']) # xmax
            lower_right_y.append(df.loc[i, 'Lower right corner Y']) # ymax
            # Get the class information
            tags.append(df.loc[i, 'Annotation tag'])
          # Ensure all new lists are the same length - sanity check
          assert len(upper_left_x) == len(upper_left_y) == len(lower_right_x) == len(lower_right_y) == len(tags)
          data_dict[img_file.name] = {'xmin': upper_left_x,
                                      'ymin': upper_left_y,
                                      'xmax': lower_right_x,
                                      'ymax': lower_right_y,
                                      'classes': tags}

    training_dict, test_dict = self.train_test_split(data_dict)

    return {
        'train': self._generate_examples(training_dict, class_mapping),
        'test': self._generate_examples(test_dict, class_mapping)
    }

  def _generate_examples(self, data_dict, class_mapping):
    image_id = 0
    for key in data_dict:
      image = PIL.Image.open('./LISA_processed/images/' + key)
      image_tensor = np.asarray(image)
      image_id += 1

      height, width, _ = image_tensor.shape

      bbox_coords, cls = [], []
      # Recall the value of this dictionary is also a dictionary
      for i in range(len(data_dict[key]['xmin'])): # All lists are same length, does not matter which one you choose
        # Get each individual coordinate and corresponding class
        xmin = data_dict[key]['xmin'][i]
        ymin = data_dict[key]['ymin'][i]
        xmax = data_dict[key]['xmax'][i]
        ymax = data_dict[key]['ymax'][i]
        # Format the bounding box coordinates
        bbox_coords.append(self.format_bounding_boxes(xmin, ymin,
                                                      xmax, ymax,
                                                      height, width))
        cls.append(class_mapping[data_dict[key]['classes'][i]])

      example = {'image': image_tensor,
                 'bounding boxes': {
                     'boxes': bbox_coords,
                     'classes': cls
                 }
                }

      yield image_id, example

Writing LISABuilder.py
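
The `train_test_split` method in the builder shuffles the image names without a fixed seed, so the split can differ from run to run. The following is a minimal sketch of a reproducible variant; the function name `split_keys`, the `seed` parameter, and the example file names are assumptions for illustration and are not part of the builder above.

```python
import random


def split_keys(keys, training_fraction=0.8, seed=0):
    """Shuffle keys with a fixed seed and cut them into train and test lists."""
    rng = random.Random(seed)   # local generator, leaves the global state untouched
    shuffled = sorted(keys)     # sort first so the input order does not matter
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image names standing in for the dictionary keys.
keys = [f"frame--{i:05d}.jpg" for i in range(10)]
train_keys, test_keys = split_keys(keys)
print(len(train_keys), len(test_keys))
```

Calling `split_keys` twice with the same seed returns the same partition, which makes evaluation numbers comparable across runs.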


In [23]:
import importlib
import LISABuilder
importlib.reload(LISABuilder)

<module 'LISABuilder' from '/content/LISABuilder.py'>

In [24]:
ds = tfds.load('LISABuilder')

Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/lisa_builder/1.0.0...


Generating splits...:   0%|          | 0/2 [00:00<?, ? splits/s]



Generating train examples...: 0 examples [00:00, ? examples/s]

Shuffling /root/tensorflow_datasets/lisa_builder/1.0.0.incompleteZZ6OGA/lisa_builder-train.tfrecord*...:   0%|…

Generating test examples...: 0 examples [00:00, ? examples/s]

Shuffling /root/tensorflow_datasets/lisa_builder/1.0.0.incompleteZZ6OGA/lisa_builder-test.tfrecord*...:   0%| …

Dataset lisa_builder downloaded and prepared to /root/tensorflow_datasets/lisa_builder/1.0.0. Subsequent calls will reuse this data.


In [25]:
train_ds = ds['train']
train_ds.element_spec

{'bounding boxes': {'boxes': TensorSpec(shape=(None, 4), dtype=tf.float32, name=None),
  'classes': TensorSpec(shape=(None,), dtype=tf.int64, name=None)},
 'image': TensorSpec(shape=(960, 1280, 3), dtype=tf.uint8, name=None)}

In [26]:
test_ds = ds['test']
test_ds.element_spec

{'bounding boxes': {'boxes': TensorSpec(shape=(None, 4), dtype=tf.float32, name=None),
  'classes': TensorSpec(shape=(None,), dtype=tf.int64, name=None)},
 'image': TensorSpec(shape=(960, 1280, 3), dtype=tf.uint8, name=None)}

In [27]:
sample = next(iter(train_ds))
print(f"Image shape: {sample['image'].shape}")
print(f"Bounding boxes: {sample['bounding boxes']['boxes']}")
print(f"Classes: {sample['bounding boxes']['classes']}")

Image shape: (960, 1280, 3)
Bounding boxes: [[0.35416666 0.3875     0.37291667 0.396875  ]
 [0.36458334 0.6148437  0.38333333 0.62421876]
 [0.43229166 0.4609375  0.45520833 0.4703125 ]]
Classes: [3 3 4]
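
The integer classes printed above come from the sorted class list defined in `_split_generators`. As a small sketch, the mapping can be rebuilt and inverted to turn class indices back into names; `id_to_name` and `decoded` are hypothetical helper names.

```python
# Same class names and sorting as in the builder's `_split_generators`.
classes = sorted(['stop', 'go', 'warning', 'warningLeft', 'stopLeft', 'goForward', 'goLeft'])
class_mapping = {name: idx for idx, name in enumerate(classes)}

# Invert the mapping to decode indices back into readable labels.
id_to_name = {idx: name for name, idx in class_mapping.items()}

decoded = [id_to_name[i] for i in [3, 3, 4]]
print(class_mapping)
print(decoded)  # ['stop', 'stop', 'stopLeft']
```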


In [28]:
def unpackage_tfds_inputs(inputs):
  image = inputs['image']

  boxes = keras_cv.bounding_box.convert_format(
      inputs['bounding boxes']['boxes'],
      images=image,
      source='rel_yxyx', # Each value is relative to the height or width of the original image
      target='xywh'
  )

  bounding_boxes = {
      "classes": tf.cast(inputs['bounding boxes']['classes'], dtype=tf.float32),
      "boxes": tf.cast(boxes, dtype=tf.float32)
  }

  return {"images": tf.cast(image, tf.float32),
          "bounding_boxes": bounding_boxes}

In [29]:
unpacked_train_ds = train_ds.map(unpackage_tfds_inputs, num_parallel_calls=tf.data.AUTOTUNE)
unpacked_test_ds = test_ds.map(unpackage_tfds_inputs, num_parallel_calls=tf.data.AUTOTUNE)

In [30]:
sample = next(iter(unpacked_train_ds))
print(f"Image shape: {sample['images'].shape}")
print(f"Bounding boxes: {sample['bounding_boxes']['boxes']}")
print(f"Classes: {sample['bounding_boxes']['classes']}")

Image shape: (960, 1280, 3)
Bounding boxes: [[496. 340.  12.  18.]
 [787. 350.  12.  18.]
 [590. 415.  12.  22.]]
Classes: [3. 3. 4.]


### Ragged tensors and bounding box format

In the LISA annotations, the upper left corner corresponds to (`x_min`, `y_min`) and the lower right corner to (`x_max`, `y_max`), so each bounding box is encoded with four pixel values: [`x_min`, `y_min`, `x_max`, `y_max`]. KerasCV has a predefined specification for bounding boxes. To comply with this, you should package your bounding boxes into a dictionary matching the format below:

```
bounding_boxes = {
    # num_boxes may be a Ragged dimension
    'boxes': Tensor(shape=[batch, num_boxes, 4]),
    'classes': Tensor(shape=[batch, num_boxes])
}
```
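
To see how the box values relate across formats, the conversion from relative `yxyx` to pixel `xywh` can be worked out by hand. This is a plain-Python sketch of the arithmetic, not the KerasCV implementation; the function name `rel_yxyx_to_xywh` is hypothetical, and the input is the first box from the sample printed earlier.

```python
def rel_yxyx_to_xywh(box, image_height, image_width):
    """Convert [y_min, x_min, y_max, x_max] in [0, 1] to pixel [x, y, width, height]."""
    y_min, x_min, y_max, x_max = box
    x = x_min * image_width
    y = y_min * image_height
    width = (x_max - x_min) * image_width
    height = (y_max - y_min) * image_height
    return [x, y, width, height]


# First box of the sample shown earlier, for a 960 x 1280 image.
box = [0.35416666, 0.3875, 0.37291667, 0.396875]
converted = rel_yxyx_to_xywh(box, image_height=960, image_width=1280)
print([round(value) for value in converted])  # [496, 340, 12, 18]
```

The rounded result matches the first box printed after `unpackage_tfds_inputs` was applied.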

Next, let's batch the data. In KerasCV object detection tasks, it is recommended that users use ragged batches. This is because images in a dataset may be of different sizes and because the number of bounding boxes can differ from image to image.

Moreover, we specifically want to use a `RaggedTensor` to create this `bounding_boxes` dictionary. The easiest way to construct a ragged dataset in a [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) pipeline is to use [`tf.data.experimental.dense_to_ragged_batch`](https://www.tensorflow.org/api_docs/python/tf/data/experimental/dense_to_ragged_batch?version=nightly).

In [31]:
batch_size = 4
batched_train_ds = unpacked_train_ds.apply(tf.data.experimental.dense_to_ragged_batch(batch_size))
batched_test_ds = unpacked_test_ds.apply(tf.data.experimental.dense_to_ragged_batch(batch_size))

In [32]:
sample = next(iter(batched_train_ds))
print(f"Image shape: {sample['images'].shape}")
print(f"Bounding boxes: {sample['bounding_boxes']['boxes']}")
print(f"Classes: {sample['bounding_boxes']['classes']}")

Image shape: (4, 960, 1280, 3)
Bounding boxes: <tf.RaggedTensor [[[496.0, 340.0, 12.0, 18.0],
  [787.0, 350.0, 12.0, 18.0],
  [590.0, 415.0, 12.0, 22.0]], [[667.0, 383.0, 9.0, 15.0],
                                [520.0, 386.0, 12.0, 21.0],
                                [724.0, 383.0, 13.0, 32.0]],
 [[740.0, 441.0, 15.0, 22.0]], [[98.0, 260.0, 24.0, 31.0],
                                [666.0, 306.0, 12.0, 22.0],
                                [292.0, 413.0, 18.0, 27.0]]]>
Classes: <tf.RaggedTensor [[3.0, 3.0, 4.0], [0.0, 3.0, 3.0], [3.0], [3.0, 3.0, 4.0]]>


In [33]:
print(f"Bounding boxes shape: {sample['bounding_boxes']['boxes'].shape}")
print(f"Classes shape: {sample['bounding_boxes']['classes'].shape}")

Bounding boxes shape: (4, None, 4)
Classes shape: (4, None)


## Data augmentation

Data augmentation is an optional step in your pipeline that you might include for several reasons, such as enriching your dataset with more varied examples or resizing images to a common shape. In either case, image augmentation techniques must be aware of the underlying bounding boxes and must update them accordingly.
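
As a simple illustration of why boxes must follow the image, consider a uniform resize. The sketch below assumes the image is scaled by a single factor with no cropping or offset, in which case every `xywh` value scales by the same factor; it does not reproduce what the KerasCV layers do internally, and `scale_boxes_xywh` is a hypothetical helper.

```python
def scale_boxes_xywh(boxes, scale):
    """Scale pixel [x, y, width, height] boxes by a uniform factor."""
    return [[value * scale for value in box] for box in boxes]


# Two boxes from the unpacked sample shown earlier.
boxes = [[496.0, 340.0, 12.0, 18.0], [787.0, 350.0, 12.0, 18.0]]

# Halving the image size halves every coordinate and every extent.
scaled = scale_boxes_xywh(boxes, 0.5)
print(scaled)  # [[248.0, 170.0, 6.0, 9.0], [393.5, 175.0, 6.0, 9.0]]
```

If the boxes were left unchanged after such a resize, they would point at the wrong region of the smaller image.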

[`JitteredResize`](https://keras.io/api/keras_cv/layers/augmentation/jittered_resize/) takes a three-step approach to size-distortion-based image augmentation. This technique is specifically tuned for object detection pipelines. The layer takes images and bounding boxes as input, both of which may be ragged, and outputs a dense image tensor that is ready to feed to a model for training. As such, this layer will commonly be the final step in an augmentation pipeline.

In [34]:
augment = keras_cv.layers.Augmenter(
    layers=[
        keras_cv.layers.JitteredResize(
            target_size=(640, 640),
            scale_factor=(0.75, 1.3),
            bounding_box_format="xywh"
        ),
        keras_cv.layers.Resizing(
          640, 640, bounding_box_format="xywh", pad_to_aspect_ratio=True
        ),

    ]
)

In [35]:
augmented_train_ds = batched_train_ds.map(lambda x: augment(x, training=True), num_parallel_calls=tf.data.AUTOTUNE)
augmented_test_ds = batched_test_ds.map(lambda x: augment(x, training=False), num_parallel_calls=tf.data.AUTOTUNE)



In [36]:
sample = next(iter(augmented_train_ds))
print(f"Image shape: {sample['images'].shape}")
print(f"Bounding boxes: {sample['bounding_boxes']['boxes']}")
print(f"Classes: {sample['bounding_boxes']['classes']}")

Image shape: (4, 640, 640, 3)
Bounding boxes: <tf.RaggedTensor [[[338.4625, 206.08331, 9.253113, 13.875],
  [562.85077, 213.79166, 9.253113, 13.875],
  [410.9453, 263.8958, 9.253113, 16.958344]],
 [[209.06482, 103.667725, 10.188293, 16.984375],
  [42.65625, 107.064575, 13.584351, 23.778137],
  [273.59058, 103.667725, 14.716431, 36.233307]],
 [[523.6406, 243.17499, 12.621094, 18.516663]], []]>
Classes: <tf.RaggedTensor [[3.0, 3.0, 4.0], [0.0, 3.0, 3.0], [3.0], []]>


In [37]:
def dict_to_tuple(inputs):
  return inputs['images'], keras_cv.bounding_box.to_dense(inputs['bounding_boxes'],
                                                          max_boxes=5,
                                                          default_value=0)

In [38]:
input_train_ds = augmented_train_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)
input_test_ds = augmented_test_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)

model_train_ds = input_train_ds.prefetch(tf.data.AUTOTUNE)
model_test_ds = input_test_ds.prefetch(tf.data.AUTOTUNE)

In [39]:
sample = next(iter(model_train_ds))
print(f"Image shape: {sample[0].shape}")
print(f"Bounding boxes: {sample[1]['boxes']}")
print(f"Classes: {sample[1]['classes']}")

Image shape: (4, 640, 640, 3)
Bounding boxes: [[[419.03754  292.54166   15.121826  22.6875  ]
  [537.4922   387.07288   15.121887  27.729187]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]]

 [[171.2602   197.7573     7.58667   12.640625]
  [ 47.34375  200.28543   10.115631  17.696869]
  [219.30939  197.7573    10.958618  26.966675]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]]

 [[  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]]

 [[191.60468  193.30626    9.740662  17.852066]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.         0.      ]
  [  0.         0.         0.       

In [40]:
print(f"Bounding boxes shape: {sample[1]['boxes'].shape}")
print(f"Classes shape: {sample[1]['classes'].shape}")

Bounding boxes shape: (4, 5, 4)
Classes shape: (4, 5)
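
The shapes above show that every image now carries exactly five box slots. The following plain-Python sketch mimics that padding step for a single image; `pad_boxes` is a hypothetical helper, not the KerasCV function. The class sentinel `-1.0` is an assumption of this sketch, picked so that padded slots cannot be mistaken for class index 0, whereas the cell above pads both boxes and classes with 0.

```python
def pad_boxes(boxes, classes, max_boxes, box_pad=0.0, class_pad=-1.0):
    """Truncate or pad one image's boxes and classes to exactly `max_boxes` entries."""
    boxes = list(boxes[:max_boxes])
    classes = list(classes[:max_boxes])
    missing = max_boxes - len(boxes)
    padded_boxes = boxes + [[box_pad] * 4 for _ in range(missing)]
    padded_classes = classes + [class_pad] * missing
    return padded_boxes, padded_classes


# One image with a single box, and one image with no boxes at all.
one_boxes, one_classes = pad_boxes([[740.0, 441.0, 15.0, 22.0]], [3.0], max_boxes=5)
empty_boxes, empty_classes = pad_boxes([], [], max_boxes=5)

print(one_classes)
print(len(empty_boxes), empty_classes)
```

After padding, a batch of images can be stacked into dense arrays of shape `(batch, max_boxes, 4)` and `(batch, max_boxes)`.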


## Model creation and training

In [41]:
tf.keras.backend.clear_session()

In [42]:
retina_net = keras_cv.models.RetinaNet(
    classes=7, # Number of classes to be used in box classification
    bounding_box_format="xywh",
    # KerasCV offers a set of pre-configured backbones
    backbone=keras_cv.models.ResNet50(
        include_top=False,
        weights="imagenet",
        include_rescaling=False
    ).as_backbone(),
)

# For faster convergence, freeze the feature extraction filters by setting:
retina_net.backbone.trainable = False

Downloading data from https://storage.googleapis.com/keras-cv/models/resnet50/imagenet/classification-v0-notop.h5
