#### Copyright 2019 Google LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using Pre-Trained Models Project

In this project we will import a pre-trained model that recognizes objects and use it to identify objects in a video. We'll draw a box around each identified object in every frame and then reassemble the frames into a new video.

## Overview

### Learning Objectives

* Use OpenCV to process images and video.
* Use a pre-trained model to identify and label objects in each frame of a video.
* Make judgements about classification quality and when to apply predicted labels.

### Prerequisites

* Classification
* Saving and Loading Models
* OpenCV
* Video Processing

# Exercises

## Exercise 1: Coding

For this workshop you will process a video frame-by-frame, identify objects in each frame, and draw a bounding box and label around each object.
 
Use the [SSD MobileNet V1 Coco](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) 'ssd_mobilenet_v1_coco' model. The video that you'll be processing can be [found on Pixabay](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/). The 640x360 version of the video is smallest and easiest to handle, though any should work since you must scale down the images for processing.
 
**Graded** demonstrations of competency:
1. Obtain the pre-trained model from the [TensorFlow Zoo](https://github.com/tensorflow/models).
1. Load the pre-trained model into a TensorFlow object.
1. Obtain a video file from Pixabay to use for classification.
1. Process the video frame-by-frame creating a modified output video.
1. Apply a classification model to an image.
1. Draw a bounding box around classified objects in each image.
 
The flow of the program is roughly:
 
* Read in a video file (use the one in this colab if you want)
* Load the TensorFlow model
* Loop over each frame of the video
  * Scale the frame down to a size the model expects
  * Feed the frame to the model
  * Loop over detections made by the model
    * If the detection score is above some threshold, draw a bounding box onto the frame and put a label in or near the box
  * Write the frame back to a new video
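
The threshold step in the detection loop can be sketched without any TensorFlow or OpenCV calls. This is a minimal illustration, not part of the assignment: `keep_detections` and its parameter names are hypothetical, and it assumes the model's outputs for one frame have already been converted to plain Python sequences.

```python
def keep_detections(classes, scores, boxes, threshold=0.5):
    """Return (class_id, score, box) tuples whose score exceeds threshold.

    Hypothetical helper: classes, scores, and boxes are assumed to be
    parallel per-frame sequences with one entry per detection.
    """
    kept = []
    for class_id, score, box in zip(classes, scores, boxes):
        if score > threshold:
            kept.append((int(class_id), float(score), tuple(box)))
    return kept


# Made-up values for a single frame, for illustration only.
frame_classes = [3.0, 8.0, 1.0]
frame_scores = [0.91, 0.42, 0.77]
frame_boxes = [
    (0.10, 0.20, 0.50, 0.60),
    (0.30, 0.30, 0.40, 0.45),
    (0.05, 0.70, 0.25, 0.80),
]

for class_id, score, box in keep_detections(
        frame_classes, frame_scores, frame_boxes, threshold=0.7):
    print(class_id, score, box)
```

Raising the threshold trades missed objects for fewer spurious boxes, so it is worth trying a few values on a single frame before processing the whole video.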
 
Some tips:
 
* Processing an entire video is slow, so consider truncating the video or skipping over frames during development. Skipping frames will make the video choppy, but you'll be able to see a wider variety of images than you would in a truncated video with all of the original frames in the clip.
* The model expects a 300x300 image. You'll likely have to scale your frames to fit the model. When you get a bounding box, that box is relative to the scaled image, so you'll need to scale it back out to the original image size.
* Don't start by trying to process the video. Instead, capture one frame and work with it until you are happy with your object detection, bounding boxes, and labels. Once those work, move on to video handling.
* The [Coco labels file](https://github.com/nightrome/cocostuff/blob/master/labels.txt) can be used to identify classified objects.
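
The tips on skipping frames and scaling bounding boxes can be sketched in plain Python. Both helper names are hypothetical. The box helper assumes the model reports each box as normalized `(ymin, xmin, ymax, xmax)` fractions of the image, which is how the student solution below indexes `detection_boxes`; if your boxes are in pixels of the scaled image instead, divide by the scaled size first.

```python
def sampled_frame_indices(total_frames, step=1, limit=None):
    """Indices of every `step`-th frame, optionally capped at `limit` frames."""
    indices = list(range(0, total_frames, step))
    return indices if limit is None else indices[:limit]


def scale_box(box, frame_width, frame_height):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel corners.

    Returns (left, top, right, bottom) in the original frame's pixels.
    """
    ymin, xmin, ymax, xmax = box
    left = int(xmin * frame_width)
    right = int(xmax * frame_width)
    top = int(ymin * frame_height)
    bottom = int(ymax * frame_height)
    return left, top, right, bottom


print(sampled_frame_indices(100, step=30))          # [0, 30, 60, 90]
print(scale_box((0.25, 0.5, 0.75, 1.0), 640, 360))  # (320, 90, 640, 270)
```

Because the box is expressed as fractions of the image, the same helper works for any output resolution; only `frame_width` and `frame_height` change.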
 

### Student Solution

In [None]:
import urllib.request
import os
import tarfile
import shutil
import tensorflow as tf

base_url = 'http://download.tensorflow.org/models/object_detection/'
file_name = 'ssd_mobilenet_v1_coco_2018_01_28.tar.gz'

url = base_url + file_name
urllib.request.urlretrieve(url, file_name)
os.listdir()

dir_name = file_name[0:-len('.tar.gz')]
if os.path.exists(dir_name):
  shutil.rmtree(dir_name) 

tarfile.open(file_name, 'r:gz').extractall('./')
os.listdir(dir_name)

frozen_graph = os.path.join(dir_name, 'frozen_inference_graph.pb')

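# tf.gfile.FastGFile and tf.GraphDef are TensorFlow 1.x names; the tf.io.gfile
# and tf.compat.v1 equivalents also work under TensorFlow 2.
with tf.io.gfile.GFile(frozen_graph, 'rb') as f:
  graph_def = tf.compat.v1.GraphDef()
  graph_def.ParseFromString(f.read())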


In [None]:
outputs = (
    'num_detections',
    'detection_classes',
    'detection_scores',
    'detection_boxes',
)

In [None]:
import cv2 as cv
import matplotlib.pyplot as plt

input_video = cv.VideoCapture('cars-sampled.mp4')

height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))
width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))
fps = input_video.get(cv.CAP_PROP_FPS)
totalframe =  input_video.get(cv.CAP_PROP_FRAME_COUNT)



In [None]:
fourcc = cv.VideoWriter_fourcc(*'mp4v')
totalframe =  input_video.get(cv.CAP_PROP_FRAME_COUNT)

input_frames = []

for i in range(int(totalframe)):
  # read() advances through the video sequentially, so no explicit seek is needed.
  ret, frame = input_video.read()
  if not ret:
    raise Exception(f"Problem reading frame {i} from video")
  input_frames.append(frame)

input_video.release()

In [None]:
# Only cars
CAR_CLASS_ID = 3        # COCO class id for "car"
SCORE_THRESHOLD = 0.7   # minimum detection score to draw a box

# tf.compat.v1 keeps this TF1-style graph code usable under TensorFlow 2 as well.
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
  tf.import_graph_def(graph_def, name='')

  # OpenCV decodes frames as BGR, while the model expects RGB input.
  rgb_frames = [cv.cvtColor(f, cv.COLOR_BGR2RGB) for f in input_frames]
  num, classes, scores, boxes = sess.run(
        [sess.graph.get_tensor_by_name(f'{op}:0') for op in outputs],
        feed_dict={ 'image_tensor:0': rgb_frames }
  )

fourcc = cv.VideoWriter_fourcc(*'mp4v')
output_video = cv.VideoWriter('cars-sampled-detected-select.mp4', fourcc, fps, (width, height))
for i, frame in enumerate(input_frames):
  # Draw on a copy so the original frames stay clean for later cells.
  current_frame = frame.copy()
  frame_h, frame_w = current_frame.shape[:2]
  for j in range(int(num[i])):
    if int(classes[i][j]) == CAR_CLASS_ID and scores[i][j] > SCORE_THRESHOLD:
      # Boxes are normalized [ymin, xmin, ymax, xmax]; scale them to pixels.
      ymin, xmin, ymax, xmax = boxes[i][j]
      left, right = int(frame_w * xmin), int(frame_w * xmax)
      top, bottom = int(frame_h * ymin), int(frame_h * ymax)

      # OpenCV colors are (blue, green, red), so this box is green.
      cv.rectangle(current_frame, (left, top), (right, bottom), (0, 255, 0), thickness=8)
      cv.putText(current_frame, 'Car', (left, top), cv.FONT_HERSHEY_DUPLEX, 1, (0, 0, 0), 2)
  # Write each frame exactly once, after all of its boxes are drawn.
  output_video.write(current_frame)

output_video.release()

In [None]:
label_dict = {
0: "unlabeled",
1: "person",
2: "bicycle",
3: "car",
4: "motorcycle",
5: "airplane",
6: "bus",
7: "train",
8: "truck",
9: "boat",
10: "traffic light",
11: "fire hydrant",
12: "street sign",
13: "stop sign",
14: "parking meter",
15: "bench",
16: "bird",
17: "cat",
18: "dog",
19: "horse",
20: "sheep",
21: "cow",
22: "elephant",
23: "bear",
24: "zebra",
25: "giraffe",
26: "hat",
27: "backpack",
28: "umbrella",
29: "shoe",
30: "eye glasses",
31: "handbag",
32: "tie",
33: "suitcase",
34: "frisbee",
35: "skis",
36: "snowboard",
37: "sports ball",
38: "kite",
39: "baseball bat",
40: "baseball glove",
41: "skateboard",
42: "surfboard",
43: "tennis racket",
44: "bottle",
45: "plate",
46: "wine glass",
47: "cup",
48: "fork",
49: "knife",
50: "spoon",
51: "bowl",
52: "banana",
53: "apple",
54: "sandwich",
55: "orange",
56: "broccoli",
57: "carrot",
58: "hot dog",
59: "pizza",
60: "donut",
61: "cake",
62: "chair",
63: "couch",
64: "potted plant",
65: "bed",
66: "mirror",
67: "dining table",
68: "window",
69: "desk",
70: "toilet",
71: "door",
72: "tv",
73: "laptop",
74: "mouse",
75: "remote",
76: "keyboard",
77: "cell phone",
78: "microwave",
79: "oven",
80: "toaster",
81: "sink",
82: "refrigerator",
83: "blender",
84: "book",
85: "clock",
86: "vase",
87: "scissors",
88: "teddy bear",
89: "hair drier",
90: "toothbrush",
99: "other",
}
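
Rather than typing the table by hand, the same kind of dictionary could be built from the labels file mentioned in the tips. This is only a sketch: it assumes each line of the file has the form `id: name` (for example `0: unlabeled`), `parse_labels` is a hypothetical helper, and the sample lines are inlined here rather than downloaded.

```python
def parse_labels(lines):
    """Build {class_id: name} from lines shaped like '3: car'."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_id, _, name = line.partition(":")
        labels[int(class_id)] = name.strip()
    return labels


sample_lines = ["0: unlabeled", "1: person", "2: bicycle", "3: car", ""]
labels = parse_labels(sample_lines)
print(labels[3])                # car
print(labels.get(99, "other"))  # other
```

Looking labels up with `.get(class_id, "other")` returns a string even for an unknown id, which matters because `cv.putText` needs text rather than a number.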

In [None]:
# All labels
CAR_CLASS_ID = 3        # COCO class id for "car"
SCORE_THRESHOLD = 0.7   # minimum detection score to draw a box

# tf.compat.v1 keeps this TF1-style graph code usable under TensorFlow 2 as well.
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
  tf.import_graph_def(graph_def, name='')

  # OpenCV decodes frames as BGR, while the model expects RGB input.
  rgb_frames = [cv.cvtColor(f, cv.COLOR_BGR2RGB) for f in input_frames]
  num, classes, scores, boxes = sess.run(
        [sess.graph.get_tensor_by_name(f'{op}:0') for op in outputs],
        feed_dict={ 'image_tensor:0': rgb_frames }
  )

fourcc = cv.VideoWriter_fourcc(*'mp4v')
output_video = cv.VideoWriter('cars-sampled-detected-all.mp4', fourcc, fps, (width, height))
for i, frame in enumerate(input_frames):
  # Draw on a copy so boxes from other cells do not accumulate on the frame.
  current_frame = frame.copy()
  frame_h, frame_w = current_frame.shape[:2]
  for j in range(int(num[i])):
    # Apply the same threshold to every class, not only to cars.
    if scores[i][j] <= SCORE_THRESHOLD:
      continue

    # Look the label up by class id (not by the detection index j).
    class_id = int(classes[i][j])
    is_car = class_id == CAR_CLASS_ID
    label = 'Car' if is_car else label_dict.get(class_id, 'other')

    # Boxes are normalized [ymin, xmin, ymax, xmax]; scale them to pixels.
    ymin, xmin, ymax, xmax = boxes[i][j]
    left, right = int(frame_w * xmin), int(frame_w * xmax)
    top, bottom = int(frame_h * ymin), int(frame_h * ymax)

    # OpenCV colors are (blue, green, red): cars in green, other classes in blue.
    color = (0, 255, 0) if is_car else (255, 0, 0)
    cv.rectangle(current_frame, (left, top), (right, bottom), color, thickness=8)
    cv.putText(current_frame, label, (left, top), cv.FONT_HERSHEY_DUPLEX, 1, (0, 0, 0), 2)
  # Write each frame exactly once, after all of its boxes are drawn.
  output_video.write(current_frame)

output_video.release()

## Exercise 2: Ethical Implications

Even the most basic of models have the potential to affect segments of the population in different ways. It is important to consider how your model might positively and negatively affect different types of users.

In this section of the project you will reflect on the positive and negative implications of your model.

Frame the context of your model's creation using this narrative:

  > The city of Seattle is attempting to reduce traffic congestion in their downtown area. As part of this project, they plan on allowing each local driver one free downtown trip per week. After that, the driver will have to pay a $50 toll for each extra day per week driven. As an early proof-of-concept for this project, your team is tasked with using machine learning to correctly identify automobiles on the road. The next phase of the project will involve detecting license plate numbers and then cross-referencing that data with RFID chips that should be mounted in all local drivers' cars.
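
To make the tolling rule concrete: one downtown day per week is free, and each extra day costs $50. The sketch below is illustrative only. It assumes that trips are grouped by ISO calendar week and that several trips on the same day count as one day; the narrative specifies neither, and `weekly_tolls` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

TOLL_PER_EXTRA_DAY = 50  # dollars, from the narrative
FREE_DAYS_PER_WEEK = 1   # one free downtown day per week


def weekly_tolls(trip_dates):
    """Map (iso_year, iso_week) -> toll owed for one driver's trip dates."""
    days_by_week = defaultdict(set)
    for d in trip_dates:
        iso_year, iso_week, _ = d.isocalendar()
        days_by_week[(iso_year, iso_week)].add(d)  # a set ignores repeat trips on one day
    return {
        week: TOLL_PER_EXTRA_DAY * max(0, len(days) - FREE_DAYS_PER_WEEK)
        for week, days in days_by_week.items()
    }


trips = [
    date(2019, 6, 3), date(2019, 6, 3),  # two trips on the same Monday
    date(2019, 6, 5),
    date(2019, 6, 7),
    date(2019, 6, 10),                   # the following week
]
print(weekly_tolls(trips))  # {(2019, 23): 100, (2019, 24): 0}
```

Every missed or duplicated detection changes `len(days)` for some driver, so detection errors translate directly into dollars charged or not charged.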


### Student Solution

**Positive Impact**

Your model is trying to solve a problem. Think about who will benefit from that problem being solved and write a brief narrative about how the model will help.

---

Our model can help reduce traffic. It can help detect which drivers made more than one downtown trip in a week, so that those drivers are charged the required fee.

**Negative Impact**

Models don't often have universal benefit. Think about who might be negatively impacted by the predictions your model is making. This person or persons might not be directly using the model, but instead might be impacted indirectly.

---

This model could have a negative impact on the public in general. As much as it helps to reduce traffic, people who want to visit downtown more than once a week will have to pay.
Although we use ML to detect objects, the results will not always be accurate. Some objects go undetected, which means the traffic results will be affected.


**Bias**

Models can be biased for many reasons. The bias can come from the data used to build the model (e.g. sampling, data collection methods, available sources) and from the interpretation of the predictions generated by the model.

Think of at least two ways that bias might have been introduced to your model and explain both below.

---

As mentioned above, even if we use machine learning to detect objects, not all objects are detected. This introduces bias because some cars might not be detected, which means some people could make more than one trip downtown without paying any fees.

Another source of bias is that some cars are being labelled as motorcycles.

**Changing the Dataset to Mitigate Bias**

Biased datasets are one of the primary ways in which bias is introduced to a machine learning model. Look back at the input data that you fed to your model. Think about how you might change something about the data to reduce bias in your model.

What change or changes could you make to make your dataset less biased? Consider the data that you have, how and where that data was collected, and what other sources of data might be used to reduce bias.

Write a summary of changes that could be made to your input data.

---

Retrain the model. Make sure the detections are 90% accurate.
Provide multiple images of different vehicles in the dataset.

**Changing the Model to Mitigate Bias**

Is there any way to reduce bias by changing the model itself? This could include modifying algorithmic choices, tweaking hyperparameters, etc.

Write a brief summary of changes that you could make to help reduce bias in your model.

---

As mentioned above, retrain the model with multiple images of cars in different colors, types, and angles.

**Mitigating Bias Downstream**

Models make predictions. Downstream processes make decisions. What processes and/or rules should be in place for people and systems interpreting and acting on the results of your model to reduce the bias? Describe these below.

---

When a car is detected visiting downtown more than once, the system should compare the license plate numbers to the former detection of the same car and make sure the plate numbers are the same.