# Object Detection
Object detection is a computer vision task that aims to *identify and locate objects* in an image or video, annotating each detected object with a *label* naming the class the neural network assigned to it.

![picture](https://drive.google.com/uc?id=1562QV_NqoQOPDmhSYzDnUwlnqHiU7lTy)

With both identification and localization available, object detection can be used not only to count the objects, animals, or people appearing in a scene but also to determine their precise locations and track them over time.
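
Counting, for instance, reduces to tallying the predicted labels whose confidence clears a threshold. The following is a minimal sketch, assuming the detector returns parallel sequences of labels and scores; the helper `count_objects`, the 0.5 threshold, and the sample values are hypothetical and not part of the exercise.

```python
from collections import Counter


def count_objects(labels, scores, min_score=0.5):
    """Tally the labels whose score is at least min_score."""
    return Counter(label for label, score in zip(labels, scores)
                   if score >= min_score)


# Hypothetical detections, for illustration only
labels = ["Chair", "Chair", "Table", "Tree", "Chair"]
scores = [0.92, 0.81, 0.64, 0.30, 0.12]

counts = count_objects(labels, scores)
print(counts)
```

Lowering `min_score` trades precision for recall: more objects are counted, but more of them may be spurious.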

In this exercise we will use a *pretrained* neural network based on the Inception ResNet V2 (Version 2) architecture proposed by Google, which is built from stacked convolutional and pooling layers. As its handle indicates, the TF-Hub module loaded later uses this architecture inside a Faster R-CNN detector. A detailed scheme of the architecture follows.

![image](https://drive.google.com/uc?id=1gSxcfYxBbVLHcl4MhjiZqb78woS6SnLD)

A pretrained neural network, as the name suggests, has already been trained and is therefore ready to be applied to new inputs; applying it this way is called *inference*.
Before using Inception ResNet V2, though, we need to import some libraries and write a few supporting functions. Let's start with the imports.

In [1]:
# Libraries needed to run the inference on the TF-Hub module
import tensorflow as tf
import tensorflow_hub as hub

# Libraries needed to download the images
import matplotlib.pyplot as plt
import tempfile
from urllib.request import urlopen  # Python 3 standard library; six is not needed
from io import BytesIO

# Libraries needed to draw onto the image
import numpy as np
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps

import time # Used to measure the inference time

Now we need to write some functions to handle the images we will feed to the network and to draw the bounding boxes over them.
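
The drawing helper in the next cell receives each box as normalized `(ymin, xmin, ymax, xmax)` coordinates in the range [0, 1] and scales them by the image size to obtain pixel positions. A small sketch of that conversion follows; the helper name `to_pixel_box` and the sample box are hypothetical, chosen only to illustrate the arithmetic.

```python
def to_pixel_box(box, im_width, im_height):
    """Convert a normalized (ymin, xmin, ymax, xmax) box
    into pixel coordinates (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    return (xmin * im_width, ymin * im_height,
            xmax * im_width, ymax * im_height)


# Hypothetical box covering the right half of the middle band of a 1280x856 image
pixel_box = to_pixel_box((0.25, 0.5, 0.75, 1.0), im_width=1280, im_height=856)
print(pixel_box)
```

Keeping boxes normalized means the same detections can be drawn on the image at any resolution.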

In [2]:
# Plots the image
def display_image(image):
  fig = plt.figure(figsize=(20, 15))
  plt.grid(False)
  plt.imshow(image)

# Downloads and resizes the image
def download_and_resize_image(url, new_width=256, new_height=256, display=False):

  # Downloading the image
  _, filename = tempfile.mkstemp(suffix=".jpg")
  response = urlopen(url)
  image_data = response.read()
  image_data = BytesIO(image_data)
  pil_image = Image.open(image_data)
  # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
  pil_image = ImageOps.fit(pil_image, (new_width, new_height), Image.LANCZOS)
  pil_image_rgb = pil_image.convert("RGB")
  pil_image_rgb.save(filename, format="JPEG", quality=90)
  #print("Image downloaded to %s." % filename)

  # Optionally plot the resized image
  if display:
    display_image(pil_image)
  return filename

# Draws a bounding box over an image
def draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, color, font,
                               thickness=4, display_str_list=()):
  
  
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  
  (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                ymin * im_height, ymax * im_height)
  
  draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
             (left, top)],
            width=thickness, fill=color)

  # If the total height of the display strings added to the top of the bounding
  # box exceeds the top of the image, stack the strings below the bounding box
  # instead of above.
  # font.getsize was removed in Pillow 10; getbbox returns (left, top, right, bottom)
  display_str_heights = [font.getbbox(ds)[3] for ds in display_str_list]

  # Each display_str has a top and bottom margin of 0.05x.
  total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)

  if top > total_display_str_height:
    text_bottom = top
  else:
    text_bottom = top + total_display_str_height

  # Reverse list and print from bottom to top.
  for display_str in display_str_list[::-1]:

    bbox = font.getbbox(display_str)
    text_width, text_height = bbox[2], bbox[3]
    
    margin = np.ceil(0.05 * text_height)
    
    draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                    (left + text_width, text_bottom)],
                   fill=color)
    
    draw.text((left + margin, text_bottom - text_height - margin),
              display_str, fill="white", font=font)
    
    # Move up by the full height of the label rectangle (text plus both margins)
    text_bottom -= text_height + 2 * margin

# Overlays labeled boxes on an image with formatted scores and label names
def draw_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):

  colors = list(ImageColor.colormap.values())

  try:
    font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/arial.ttf",
                              25)
  except IOError:
    #print("Font not found, using default font.")
    font = ImageFont.load_default()

  for i in range(min(boxes.shape[0], max_boxes)):
    if scores[i] >= min_score:
      
      ymin, xmin, ymax, xmax = tuple(boxes[i])
      
      display_str = "{}: {}%".format(class_names[i].decode("ascii"),
                                     int(100 * scores[i]))
      
      color = colors[hash(class_names[i]) % len(colors)]
      
      image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
      
      draw_bounding_box_on_image(image_pil,
                                 ymin, xmin, ymax, xmax, color,
                                 font,
                                 display_str_list=[display_str])
      
      np.copyto(image, np.array(image_pil))
  
  return image
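
The selection rule inside `draw_boxes` can be isolated from the drawing: only the first `max_boxes` detections are considered, and among those only the ones scoring at least `min_score` are drawn. Here is a minimal sketch of that rule; the helper `select_boxes` and the sample scores are hypothetical.

```python
def select_boxes(scores, max_boxes=10, min_score=0.1):
    """Return the indices that draw_boxes would draw: among the first
    max_boxes detections, those whose score is at least min_score."""
    limit = min(len(scores), max_boxes)
    return [i for i in range(limit) if scores[i] >= min_score]


# Hypothetical scores, for illustration only
sample_scores = [0.9, 0.6, 0.3, 0.05]

print(select_boxes(sample_scores))               # limited by min_score
print(select_boxes(sample_scores, max_boxes=2))  # limited by max_boxes
```

Note that `max_boxes` caps the detections that are examined, not the number drawn, so a low-scoring entry among the first `max_boxes` is skipped rather than replaced by a later one.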

Now let's download an image to give to the network. For this exercise we'll use the "Naxos Taverna" image as a sample to see how the object detection model performs.

In [3]:
image_url = "https://upload.wikimedia.org/wikipedia/commons/6/60/Naxos_Taverna.jpg"
downloaded_image_path = download_and_resize_image(image_url, 1280, 856, False)

We can now load the pretrained module from [TF-Hub](https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1); we will use Inception ResNet V2 with its default settings.

In [5]:
module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"
detector = hub.load(module_handle).signatures['default']

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Let's now write the function to run the pretrained network on our sample image.

In [9]:
# Loads the image for the network to use
def load_img(path):
  img = tf.io.read_file(path)
  img = tf.image.decode_jpeg(img, channels=3)
  return img

# Running the detector
def run_detector(detector, path):
  img = load_img(path)

  # Scale uint8 pixels to float32 in [0, 1] and add a leading batch dimension
  converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
  start_time = time.time()
  result = detector(converted_img)
  end_time = time.time()

  result = {key:value.numpy() for key,value in result.items()}

  print("Found %d objects." % len(result["detection_scores"]))
  print(result)
  print("Inference time: ", end_time-start_time)

  image_with_boxes = draw_boxes(
      img.numpy(), result["detection_boxes"],
      result["detection_class_entities"], result["detection_scores"])

  #display_image(image_with_boxes)

run_detector(detector, downloaded_image_path)

Found 100 objects.
{'detection_class_entities': array([b'Chair', b'Umbrella', b'Kitchen & dining room table', b'Table',
       b'Chair', b'Chair', b'Chair', b'Chair', b'Chair', b'Table',
       b'Chair', b'Tree', b'Tree', b'Chair', b'Tree', b'Chair', b'Tree',
       b'Tree', b'Tree', b'Tree', b'Tree', b'Tree', b'Chair', b'Chair',
       b'Flower', b'Tree', b'Tree', b'Flower', b'Umbrella', b'Tree',
       b'Tree', b'Tree', b'Tree', b'Porch', b'Table', b'Tree', b'Tree',
       b'Tree', b'Table', b'Tree', b'Tree', b'Flower', b'Table', b'Tree',
       b'Tree', b'Tree', b'Tree', b'Tree', b'Tree', b'Chair', b'Tree',
       b'Flower', b'Table', b'Tree', b'Tree', b'Chair', b'Tree', b'Chair',
       b'Tree', b'Tree', b'Tree', b'Flower', b'Chair', b'Tree', b'Table',
       b'Tree', b'Tree', b'Tree', b'Tree', b'Tree', b'Tree', b'Chair',
       b'Flower', b'Chair', b'Table', b'Flower', b'Flower', b'Table',
       b'Tree', b'Table', b'Tree', b'Chair', b'Tree', b'Flower', b'Tree',
       b'Chair', b
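
The preprocessing step in `run_detector` does two things before the image reaches the model: `tf.image.convert_image_dtype` rescales the `uint8` pixels to `float32` values in [0, 1], and `tf.newaxis` adds a batch dimension. The sketch below reproduces the same two steps with plain NumPy so the effect can be inspected without loading TensorFlow; it is an analogue for illustration, not the code the exercise runs, and the tiny image is hypothetical.

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 pixel values
img = np.array([[[0, 128, 255], [255, 255, 255]],
                [[0, 0, 0], [64, 64, 64]]], dtype=np.uint8)

scaled = img.astype(np.float32) / 255.0  # uint8 [0, 255] -> float32 [0, 1]
batched = scaled[np.newaxis, ...]        # shape (H, W, 3) -> (1, H, W, 3)

print(batched.shape, batched.dtype)
print(batched.min(), batched.max())
```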

Let's now use the pretrained model to run inference on three other images.

In [7]:
image_urls = {"Coleoptera": "https://upload.wikimedia.org/wikipedia/commons/1/1b/The_Coleoptera_of_the_British_islands_%28Plate_125%29_%288592917784%29.jpg",
              "Campus": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg/1024px-Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg",
              "Birds": "https://upload.wikimedia.org/wikipedia/commons/0/09/The_smaller_British_birds_%288053836633%29.jpg"}


def detect_img(image_url):
  start_time = time.time()
  # Download the image and resize it to 640x480 before feeding it to the model
  image_path = download_and_resize_image(image_url, 640, 480, display=False)
  # The module is reloaded on every call, so the total time below includes loading
  detector = hub.load(module_handle).signatures['default']
  run_detector(detector, image_path)  # run the pretrained model
  end_time = time.time()
  # Total elapsed time: download, module loading, and inference
  print("detect_img total time:", end_time-start_time)

# Applying detect_img on the 3 new images
for image in image_urls:
    print(image)
    detect_img(image_urls[image])
    print()

Coleoptera
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.9778602  0.9626732  0.94962853 0.94943225 0.94020146 0.8503012
 0.8173051  0.57327724 0.45883006 0.350584   0.17450964 0.09852738
 0.09182353 0.07497085 0.04629605 0.04598429 0.04143474 0.03997439
 0.03943128 0.03942169 0.03021665 0.02932613 0.02918878 0.02790124
 0.02694958 0.02456733 0.02300434 0.02189109 0.01929941 0.01920215
 0.01838593 0.0180631  0.01669725 0.01660074 0.0164266  0.01589503
 0.0150904  0.01479315 0.01426297 0.01311702 0.01241734 0.01211264
 0.01042742 0.01031207 0.01011779 0.00950792 0.00940581 0.00924837
 0.00874841 0.00818658 0.00766253 0.00733435 0.0073221  0.00723879
 0.00710741 0.00708327 0.00654369 0.00645531 0.00606113 0.00528344
 0.00512487 0.00502358 0.00472783 0.00472186 0.00454735 0.00447636
 0.00402213 0.00397097 0.00393712 0.00356663 0.00334545 0.00329525
 0.00312949 0.00292099 0.00276557 0.00274509 0.00271724 0.00264512
 0.0024809  0.0024756  0.00241141 0.00237939 0.00232675 0.00225014
 0.0022367  0.00214546 0.00213519 0.00213335

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.9438628  0.8966501  0.81166774 0.774244   0.73853165 0.27011248
 0.24054703 0.17063358 0.17014578 0.12994286 0.07622842 0.07121973
 0.07017564 0.06897268 0.05690185 0.04029793 0.04001265 0.03844966
 0.03647082 0.03269174 0.02800994 0.02719952 0.02541589 0.02319722
 0.02232638 0.02206668 0.02131863 0.02130859 0.02122093 0.02108924
 0.01785898 0.01752537 0.01751808 0.01701841 0.01380865 0.01368599
 0.01203154 0.01146142 0.01145623 0.01117016 0.01079447 0.01072222
 0.01045169 0.01023653 0.00959844 0.00958574 0.0095366  0.00917782
 0.0091518  0.00913873 0.00879276 0.00869646 0.00863881 0.00855919
 0.00845344 0.00845183 0.00820703 0.00805747 0.00768932 0.00747154
 0.0074619  0.00728051 0.00680106 0.00674349 0.00673483 0.00657167
 0.00645279 0.00641499 0.00632667 0.00628989 0.00610058 0.00607276
 0.00590488 0.00535663 0.00524032 0.004859   0.00481867 0.004723
 0.00470682 0.00459157 0.00452992 0.00452304 0.00449044 0.00437413
 0.00418169 0.00417229 0.0040741  0.00391664 

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.9785452  0.8780808  0.8361889  0.7508799  0.57874113 0.4969744
 0.16719551 0.16628632 0.1475123  0.13507499 0.13296469 0.11149114
 0.09334902 0.08116634 0.07981076 0.0734387  0.0727512  0.06550292
 0.06246734 0.04950894 0.04537462 0.03836504 0.0358756  0.03349325
 0.03225903 0.02772932 0.02564834 0.02446795 0.02171508 0.02169244
 0.02025024 0.01972702 0.01649113 0.01350184 0.01324136 0.01307821
 0.01302984 0.01294602 0.01261301 0.01256995 0.01206391 0.01167373
 0.01106824 0.00954537 0.00919462 0.00900873 0.00886381 0.00884432
 0.00842483 0.00752402 0.00719706 0.00671171 0.00624359 0.0061905
 0.00613227 0.00604937 0.00604339 0.005962   0.00570473 0.00567942
 0.00565076 0.00544587 0.0054434  0.00530543 0.00529949 0.0048144
 0.00464277 0.00438933 0.00436841 0.00426381 0.00415862 0.00410738
 0.003955   0.00394875 0.00390861 0.00357755 0.00356154 0.00340583
 0.00337822 0.00323494 0.00318031 0.00313879 0.00294527 0.00289319
 0.00285873 0.00285349 0.00277905 0.00277189 0
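
Because `detect_img` calls `hub.load` on every invocation, the module is loaded again for each image, which is why the loader's log line reappears in the output above. One possible way to avoid the repeated loading is to cache the loaded detector per handle. The sketch below illustrates the idea with `functools.lru_cache`; `fake_load` is a hypothetical stand-in for an expensive loader, and `get_detector` is an assumed helper name, so this is not the exercise's own code.

```python
from functools import lru_cache

load_calls = []  # records every real load, for demonstration


def fake_load(handle):
    """Hypothetical stand-in for an expensive loader such as hub.load."""
    load_calls.append(handle)
    return lambda image: "detections from " + handle


@lru_cache(maxsize=None)
def get_detector(handle):
    """Load the detector for a handle once and reuse it afterwards."""
    return fake_load(handle)


for _ in range(3):
    cached_detector = get_detector("model-a")

print(len(load_calls))
```

With a cache like this, timing `detect_img` would measure download and inference only after the first call, so the first and later measurements should be reported separately.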

Let's now repeat these experiments with *MobileNet V2*, another convolutional neural network-based architecture, used here for object detection (as its handle indicates, the TF-Hub module uses it inside an SSD detector). You can find an in-depth description of MobileNet V2 [here](https://machinethink.net/blog/mobilenet-v2/); the pretrained model can be downloaded [from this link](https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1).

In [8]:
# Loading the pretrained MobileNet V2 module and running it on the "Naxos Taverna" image
module_handle = "https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1"
detector = hub.load(module_handle).signatures['default'] 

run_detector(detector, downloaded_image_path)

# detect_img, defined in the previous cell, reads the global module_handle at call
# time, so it now loads the MobileNet V2 module; no redefinition is needed.


# Applying detect_img on the 3 new images
for image in image_urls:
    print(image)
    detect_img(image_urls[image]) 

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.3765772  0.3473224  0.32187063 0.2844692  0.252298   0.21863902
 0.21550015 0.21180788 0.20861351 0.20820129 0.20579556 0.1970036
 0.16537362 0.16522202 0.16399896 0.16175792 0.15904829 0.15305698
 0.15085104 0.14960766 0.14908716 0.14816138 0.14713651 0.14691901
 0.14652473 0.14401034 0.14371088 0.14278436 0.14215821 0.14187151
 0.14135218 0.14126116 0.1401403  0.14006689 0.13956839 0.1385769
 0.13807973 0.1365954  0.13603148 0.13237697 0.13161516 0.13148761
 0.13111785 0.13078484 0.13029683 0.13016969 0.12834069 0.12724724
 0.12670505 0.12630624 0.12574825 0.12539646 0.12488246 0.12438697
 0.12377411 0.12259111 0.12231299 0.1210233  0.11768624 0.1176098
 0.11675525 0.11671489 0.11646685 0.11625904 0.11608419 0.11562768
 0.11539444 0.11485761 0.11403474 0.11400717 0.11392587 0.11355361
 0.11322773 0.11233023 0.11186761 0.11146423 0.11138517 0.11125621
 0.11125278 0.11104095 0.11051321 0.10970974 0.10951686 0.10945681
 0.10942262 0.10937944 0.10925493 0.1088095  0

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.903319   0.81101316 0.8081393  0.72810626 0.7162817  0.6965114
 0.6740742  0.39919484 0.3959492  0.3457873  0.3313371  0.29408523
 0.25905573 0.25874218 0.24704593 0.24266851 0.16455656 0.14731571
 0.13561413 0.12256306 0.1073859  0.10717526 0.10425571 0.10405046
 0.10098594 0.10031408 0.09838665 0.09403858 0.09398621 0.08897996
 0.08896789 0.08833358 0.08786052 0.08754873 0.08682534 0.08631313
 0.08615479 0.08604211 0.08387071 0.08344099 0.08254766 0.08232611
 0.08231997 0.08112463 0.08071682 0.08002344 0.07910737 0.07738921
 0.07716912 0.07709894 0.07690558 0.07685614 0.07680035 0.0756444
 0.07527956 0.07525915 0.07510081 0.07503638 0.07426032 0.07382214
 0.07351065 0.07335415 0.07315612 0.07276997 0.07173935 0.07164529
 0.07137984 0.07118767 0.07067242 0.07062802 0.07026353 0.06986114
 0.06972089 0.0689595  0.06884432 0.0685418  0.06838804 0.06828025
 0.06782037 0.06771821 0.06762806 0.06746873 0.06712642 0.06673852
 0.06658757 0.06640419 0.06593388 0.06573033 

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.3088279  0.20544696 0.19057247 0.13318908 0.13017592 0.11788777
 0.1114637  0.09950134 0.09708747 0.08934605 0.08581877 0.08409065
 0.0837476  0.08128947 0.08090448 0.0796771  0.07665145 0.07351854
 0.06787941 0.06763098 0.06711239 0.06648007 0.06524613 0.06407431
 0.06344062 0.06337002 0.06195149 0.06105059 0.0608533  0.06070372
 0.0601666  0.05848241 0.05768564 0.05741993 0.0573009  0.05684248
 0.05608249 0.05582294 0.05571845 0.05448851 0.05446702 0.05443513
 0.05435085 0.05417299 0.05415317 0.05264318 0.05168974 0.05060163
 0.05004039 0.04965428 0.04951808 0.04944339 0.04944021 0.04930982
 0.04898804 0.0489732  0.04827729 0.0479933  0.04759657 0.04752758
 0.04606029 0.04595795 0.04588827 0.04564151 0.04560599 0.04542089
 0.04541913 0.04503351 0.04479921 0.04464462 0.04455972 0.04444045
 0.04443133 0.04429397 0.04353622 0.04304466 0.04298383 0.04291523
 0.0416919  0.0416204  0.04161537 0.04138911 0.04138735 0.04138336
 0.04104346 0.04101482 0.04093653 0.0408852

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Found 100 objects.
[0.38031673 0.23105821 0.22166541 0.20905161 0.1764996  0.17124161
 0.14889076 0.14184913 0.1369287  0.1288428  0.12685049 0.1250625
 0.11813104 0.11217046 0.10798398 0.10727975 0.10532096 0.10321879
 0.10308805 0.10076255 0.09951413 0.09887889 0.09655422 0.09627086
 0.0962474  0.0959844  0.09588021 0.09553578 0.09172225 0.09142306
 0.0907211  0.09047785 0.08967301 0.0894784  0.08920169 0.08914313
 0.08908731 0.08624831 0.08569485 0.08528277 0.08520728 0.085105
 0.08295664 0.08096156 0.08068433 0.08062741 0.08033827 0.08031407
 0.07976875 0.07971889 0.07947978 0.07942268 0.07891569 0.07854143
 0.0781672  0.07811797 0.07781231 0.07778296 0.07778034 0.07765561
 0.07704484 0.07676065 0.07637134 0.07579958 0.07535803 0.07531258
 0.075234   0.07503754 0.07499233 0.07496595 0.07457471 0.07425612
 0.07412714 0.07387203 0.07381183 0.07378682 0.07369411 0.07359469
 0.07347983 0.0734565  0.07316697 0.07313499 0.07303286 0.07301268
 0.07283393 0.07281467 0.07226551 0.07199922 0
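
One simple way to compare the two models is to count how many detections each reports above a confidence threshold for the same image. The sketch below does this for the first ten scores printed for the Coleoptera image. It assumes that the outputs appear in the order of `image_urls` and that the first MobileNet V2 array belongs to the "Naxos Taverna" image, so the second one is Coleoptera; the 0.5 threshold and the helper `count_confident` are likewise assumptions made for illustration.

```python
import numpy as np

# First ten scores printed above for the Coleoptera image (assumed ordering)
inception_scores = np.array([0.9778602, 0.9626732, 0.94962853, 0.94943225,
                             0.94020146, 0.8503012, 0.8173051, 0.57327724,
                             0.45883006, 0.350584])
mobilenet_scores = np.array([0.903319, 0.81101316, 0.8081393, 0.72810626,
                             0.7162817, 0.6965114, 0.6740742, 0.39919484,
                             0.3959492, 0.3457873])


def count_confident(scores, threshold=0.5):
    """Count the scores that are at least the threshold."""
    return int(np.sum(scores >= threshold))


print("Inception ResNet V2:", count_confident(inception_scores))
print("MobileNet V2:", count_confident(mobilenet_scores))
```

Scores alone do not say whether the detections are correct, so a comparison like this is only a first look and should be complemented by inspecting the drawn boxes and the measured inference times.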