<a href="https://colab.research.google.com/github/ricky-kiva/dl-deep-tf-cv-advanced/blob/main/2_l1_simple_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Simple Object Detection in TensorFlow**

Import modules

In [21]:
import tempfile
import tensorflow as tf

Pick a model from TensorFlow Hub (a repository of trained ML models ready for reuse). Two detectors are listed for comparison; uncomment the one to use.

In [2]:
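# Faster R-CNN with an Inception-ResNet-V2 backbone (higher accuracy, slower)
module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"

# SSD with a MobileNetV2 backbone (smaller & faster, lower accuracy)
#module_handle = "https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1"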

Load the model

In [5]:
import tensorflow_hub as hub

model = hub.load(module_handle)

Check the *model signatures*. Informally, a signature is a task the model can perform; more precisely, it is a named function with a defined collection of input and output tensors.

**Note:**
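- Some models, like `MobileNetV2`, have more than one signature (e.g. `classification` & `feature_vector`)
- The outputs of the `default` signature of `InceptionResNetV2`:
  - `detection_class_entities`: class entities, shape (None, 1)
  - `detection_boxes`: detection boxes, shape (None, 4)
  - `detection_scores`: detection scores, shape (None, 1)
  - `detection_class_labels`: class labels, shape (None, 1)
  - `detection_class_names`: class names, shape (None, 1)
    - **Note:** `None` in a `shape` means the size of that dimension is not fixed and can vary. It often represents the batch size; in these outputs it is the number of detections (the run below returns 100).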
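A `None` dimension can be checked directly with `tf.TensorSpec.is_compatible_with`. The following is a minimal sketch, assuming TensorFlow 2.x. The spec is built by hand to mirror the `(None, 4)` shape of `detection_boxes`, and the tensors are dummy zeros, not real detections.

```python
import tensorflow as tf

# Hand-built spec mirroring the (None, 4) `detection_boxes` output shape
boxes_spec = tf.TensorSpec(shape=(None, 4), dtype=tf.float32)

# The first dimension is unconstrained, so any number of boxes matches...
print(boxes_spec.is_compatible_with(tf.zeros((100, 4))))  # True
print(boxes_spec.is_compatible_with(tf.zeros((3, 4))))    # True
# ...but the second dimension is fixed at 4 coordinates per box
print(boxes_spec.is_compatible_with(tf.zeros((3, 5))))    # False
```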

In [15]:
model.signatures.keys()

KeysView(_SignatureMap({'default': <ConcreteFunction () -> Dict[['detection_class_entities', TensorSpec(shape=(None, 1), dtype=tf.string, name=None)], ['detection_boxes', TensorSpec(shape=(None, 4), dtype=tf.float32, name=None)], ['detection_scores', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)], ['detection_class_labels', TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)], ['detection_class_names', TensorSpec(shape=(None, 1), dtype=tf.string, name=None)]] at 0x7828FC2524A0>}))
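To see what a signature is without downloading the detector, here is a minimal sketch that exports a toy `tf.Module` as a SavedModel with two signatures, then inspects them the same way as above. The module (`ToyModel`) and its `box_areas` and `box_count` functions are hypothetical. The sketch assumes TensorFlow 2.x and uses `tf.saved_model.load` rather than `hub.load`.

```python
import tempfile
import tensorflow as tf

class ToyModel(tf.Module):
  """Hypothetical toy model (not the Hub detector) with two exportable functions."""

  @tf.function(input_signature=[tf.TensorSpec(shape=(None, 4), dtype=tf.float32)])
  def box_areas(self, boxes):
    # boxes are [ymin, xmin, ymax, xmax]; one area per box, so the output has shape (None,)
    heights = boxes[:, 2] - boxes[:, 0]
    widths = boxes[:, 3] - boxes[:, 1]
    return {"areas": heights * widths}

  @tf.function(input_signature=[tf.TensorSpec(shape=(None, 4), dtype=tf.float32)])
  def box_count(self, boxes):
    return {"count": tf.shape(boxes)[0]}

toy = ToyModel()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(toy, export_dir,
                    signatures={"default": toy.box_areas, "count": toy.box_count})

loaded = tf.saved_model.load(export_dir)
print(sorted(loaded.signatures.keys()))  # ['count', 'default']

# Signature functions are called with keyword arguments named after their inputs
# and return a dict of named output tensors
out = loaded.signatures["default"](boxes=tf.constant([[0.0, 0.0, 0.5, 0.5],
                                                      [0.1, 0.2, 0.3, 0.6]]))
print(out["areas"].numpy())
```

Each dictionary key passed to `signatures` becomes a signature name. Each key of the returned dict becomes a named output, just like `detection_boxes` or `detection_scores` in the real detector.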

Pick the model signature

In [20]:
detector = model.signatures['default']

Check the model input for this signature (`default`)

**Note:** It accepts a batch of exactly one color image (3 channels) of arbitrary height and width, as a `float32` tensor of shape `(1, height, width, 3)`
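The following is a minimal sketch of preparing an input that satisfies this signature, assuming TensorFlow 2.x. `input_spec` is built by hand to mirror the printed `detector.inputs`, and the 2x2 image is made-up data. The sketch shows that a raw `uint8` image is rejected, and that a batch of two images is rejected because the batch dimension is fixed at 1.

```python
import tensorflow as tf

# Hand-built spec mirroring `detector.inputs`: one image, any height/width, 3 channels
input_spec = tf.TensorSpec(shape=(1, None, None, 3), dtype=tf.float32)

# Hypothetical 2x2 uint8 RGB image (values 0..255)
img_uint8 = tf.constant([[[0, 128, 255], [10, 20, 30]],
                         [[255, 255, 255], [0, 0, 0]]], dtype=tf.uint8)

# uint8 -> float32 rescales to [0, 1] (divides by 255), e.g. 128 -> ~0.502
img_float = tf.image.convert_image_dtype(img_uint8, tf.float32)
batched = img_float[tf.newaxis, ...]  # (2, 2, 3) -> (1, 2, 2, 3)

print(input_spec.is_compatible_with(img_uint8[tf.newaxis, ...]))            # False: wrong dtype
print(input_spec.is_compatible_with(batched))                               # True
print(input_spec.is_compatible_with(tf.concat([batched, batched], axis=0)))  # False: batch of 2
```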

In [19]:
detector.inputs

[<tf.Tensor 'hub_input/image_tensor:0' shape=(1, None, None, 3) dtype=float32>]

Function: download & resize image

In [23]:
import os
from io import BytesIO
from urllib.request import urlopen # standard library; replaces six.moves.urllib.request
from PIL import Image
from PIL import ImageOps

def download_and_resize_image(url, new_width=256, new_height=256):
  fd, filename = tempfile.mkstemp(suffix='.jpg') # make a temp. file with '.jpg' suffix
  os.close(fd) # close the low-level handle; PIL reopens the file by name when saving

  with urlopen(url) as response: # open the given URL
    img_data = response.read() # read the image bytes fetched from the URL
  img_data = BytesIO(img_data) # wrap the bytes in an in-memory buffer

  pil_img = Image.open(img_data) # open the image using PIL

  # resize the image to exactly (new_width, new_height), center-cropping IF the aspect ratio differs
  # `LANCZOS`: high-quality resampling filter that gives smoother results WHEN downscaling
  # (`Image.ANTIALIAS` was an alias for it and was removed in Pillow 10)
  pil_img = ImageOps.fit(pil_img, (new_width, new_height), Image.LANCZOS)
  pil_img_rgb = pil_img.convert('RGB')  # convert to RGB colorspace

  pil_img_rgb.save(filename, format='JPEG', quality=90) # save the image to the temporary file

  print(f"Image saved to {filename}")

  return filename
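`ImageOps.fit` center-crops to the target aspect ratio before resizing, whereas a plain `resize` stretches the whole image. The following is a minimal offline sketch using a hypothetical synthetic 400x200 image with a green stripe along its left edge. The 1:1 crop discards the stripe, while the plain resize keeps it, squashed. It uses `Image.LANCZOS`, the filter that `Image.ANTIALIAS` aliased.

```python
from PIL import Image, ImageOps

# Hypothetical synthetic 400x200 (2:1) white image with a 40-px green stripe on the left
img = Image.new("RGB", (400, 200), (255, 255, 255))
img.paste(Image.new("RGB", (40, 200), (0, 255, 0)), (0, 0))

# fit: center-crop to the 1:1 target ratio (x = 100..300), then resize -> stripe is cropped away
fitted = ImageOps.fit(img, (100, 100), Image.LANCZOS)
# resize: scale the whole image to 100x100, distorting it -> stripe survives, squashed to ~10 px
stretched = img.resize((100, 100), Image.LANCZOS)

print(fitted.size, stretched.size)  # (100, 100) (100, 100)
print(fitted.getpixel((0, 50)))     # left edge after fit: (close to) white
print(stretched.getpixel((3, 50)))  # left edge after resize: (close to) pure green
```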

Download image for detection

In [26]:
img_url = 'https://upload.wikimedia.org/wikipedia/commons/f/fb/20130807_dublin014.JPG'

dl_img_path = download_and_resize_image(img_url, 2872, 2592)

Image saved to /tmp/tmpjfhkzb5g.jpg


Function: load an image before passing it to the model

In [33]:
def load_img(path):
  img = tf.io.read_file(path) # read the file
  img = tf.image.decode_jpeg(img, channels=3) # decode the JPEG bytes into a uint8 tensor of shape (height, width, 3)

  return img
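The following sketch shows what `load_img` returns without needing a downloaded photo, assuming TensorFlow 2.x. It writes a hypothetical synthetic 32x48 grey image to a temporary JPEG, then applies the same two steps: read the raw bytes, then decode them.

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical synthetic 32x48 grey image, written to a temporary JPEG file
pixels = tf.fill([32, 48, 3], tf.constant(200, dtype=tf.uint8))
fd, path = tempfile.mkstemp(suffix=".jpg")
os.close(fd)  # only the path is needed; TensorFlow writes the file itself
tf.io.write_file(path, tf.io.encode_jpeg(pixels))

# Same two steps as `load_img`: read raw bytes, then decode them into a tensor
img = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
print(img.shape, img.dtype)  # shape is (height, width, channels); dtype is uint8
os.remove(path)
```

The decoded tensor has no batch dimension and holds `uint8` values in [0, 255]. This is why `run_detector` still has to convert the dtype and add a leading axis before calling the detector.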

Function: run inference on a local image file using the object detection model

In [34]:
def run_detector(detector, path):
  img = load_img(path) # load image tensor from file path

  # convert the uint8 image tensor to float32; for integer inputs this also rescales
  # the pixel range [0, 255] to [0, 1], e.g. 128 -> 128/255 = ~0.502
  # `[tf.newaxis, ...]`: adds a new axis (dimension) at the front, i.e. a batch of 1
  converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]

  # run inference using the model
  result = detector(converted_img)

  # save results in dictionary
  result = {key:value.numpy() for key, value in result.items()}

  # counts every returned detection, including low-confidence ones
  print(f"Found {len(result['detection_scores'])} objects.")

  print(result['detection_class_entities'])
  print(result['detection_scores'])
  print(result['detection_boxes'])
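`run_detector` prints every returned detection, including low-confidence ones. A common next step is to keep only the detections above a score threshold and to convert the boxes to pixel coordinates. The following is a minimal numpy sketch with several assumptions:

- Boxes are normalized `[ymin, xmin, ymax, xmax]` values in [0, 1].
- `filter_detections` is a hypothetical helper.
- The arrays are made up; they are not output from the run below.
- `min_score=0.3` is an arbitrary threshold.

The 2592x2872 size matches the height and width passed to `download_and_resize_image` earlier.

```python
import numpy as np

def filter_detections(entities, scores, boxes, img_height, img_width, min_score=0.3):
  """Hypothetical helper: keep detections with score >= min_score, boxes in pixels.

  Assumes `boxes` holds normalized [ymin, xmin, ymax, xmax] values in [0, 1].
  """
  keep = scores >= min_score
  scale = np.array([img_height, img_width, img_height, img_width], dtype=np.float32)
  pixel_boxes = (boxes[keep] * scale).round().astype(int)  # normalized -> pixel coordinates
  names = [e.decode("utf-8") for e in entities[keep]]      # bytes -> str
  return names, scores[keep], pixel_boxes

# Made-up arrays shaped like the `result` dict entries (not real model output)
entities = np.array([b"Person", b"Bicycle", b"Window"])
scores = np.array([0.92, 0.55, 0.08], dtype=np.float32)
boxes = np.array([[0.10, 0.20, 0.90, 0.40],
                  [0.50, 0.30, 0.95, 0.70],
                  [0.05, 0.05, 0.15, 0.10]], dtype=np.float32)

names, kept_scores, pixel_boxes = filter_detections(entities, scores, boxes, 2592, 2872)
print(names)  # ['Person', 'Bicycle']
print(pixel_boxes)
```

To apply this to real output, `run_detector` would have to return its `result` dict instead of only printing it.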

Run inference using the model

In [35]:
run_detector(detector, dl_img_path)

Found 100 objects.
[b'Person' b'Person' b'Building' b'Person' b'Person' b'Building'
 b'Bicycle' b'Building' b'Wheel' b'Person' b'Person' b'Building' b'Person'
 b'Wheel' b'Building' b'Window' b'Bicycle wheel' b'Bicycle wheel'
 b'Person' b'Building' b'Land vehicle' b'Building' b'Window' b'Window'
 b'Land vehicle' b'Window' b'Person' b'Person' b'Person' b'Van' b'Man'
 b'Building' b'Man' b'Bicycle' b'Bus' b'Window' b'Clothing' b'Window'
 b'Person' b'Clothing' b'Person' b'Man' b'House' b'Bicycle' b'Person'
 b'Land vehicle' b'Person' b'Person' b'Clothing' b'Man' b'Land vehicle'
 b'Person' b'Person' b'Person' b'Land vehicle' b'Building' b'Person'
 b'Window' b'Clothing' b'Person' b'Person' b'House' b'Window' b'Person'
 b'Car' b'Wheel' b'Window' b'Person' b'Window' b'Window' b'Window'
 b'House' b'Land vehicle' b'Person' b'Clothing' b'Man' b'Woman' b'Vehicle'
 b'Window' b'Clothing' b'House' b'Person' b'Tire' b'Car' b'Furniture'
 b'Woman' b'Vehicle' b'Bicycle' b'Window' b'Clothing' b'Footwear'
 b