<a href="https://colab.research.google.com/github/google/applied-machine-learning-intensive/blob/master/content/04_classification/08_video_processing_project/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2020 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video Classification with Pre-Trained Models Project

In this project we will import a pre-trained model that recognizes objects and use it to identify those objects in a video. We'll draw a box around each identified object in every frame, and then we'll reassemble the frames into a new video that shows the boxes.

# Exercises

## Exercise 1: Coding

You will process a video frame by frame, identify objects in each frame, and draw a bounding box with a label around each car in the video.
 
Use the [SSD MobileNet V1 Coco](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) (*ssd_mobilenet_v1_coco*) model. The video you'll process can be found [on Pixabay](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/). The 640x360 version of the video is smallest and easiest to handle, though any size should work since you must scale down the images for processing.
 
Your program should:
 
* Read in a video file (use the one in this colab if you want)
* Load the TensorFlow model linked above
* Loop over each frame of the video
* Scale the frame down to a size the model expects
* Feed the frame to the model
* Loop over detections made by the model
* If the detection score is above some threshold, draw a bounding box onto the frame and put a label in or near the box
* Write the frame back to a new video
 
Some tips:
 
* Processing an entire video is slow, so consider truncating the video or skipping over frames during development. Skipping frames makes the output choppy, but it lets you see a wider variety of images than a truncated clip that keeps every original frame.
* The model expects a 300x300 image. You'll likely have to scale your frames to fit the model. When you get a bounding box, that box is relative to the scaled image. You'll need to scale the bounding box out to the original image size.
* Don't start by trying to process the video. Instead, capture one frame and work with it until you are happy with your object detection, bounding boxes, and labels. Once you get those done, use the same logic on the other frames of the video.
* The [Coco labels file](https://github.com/nightrome/cocostuff/blob/master/labels.txt) can be used to identify classified objects.
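
The bounding-box tip can be illustrated with a small helper. This is a minimal sketch, assuming the model reports each box as normalized `[top, left, bottom, right]` values between 0 and 1, which is how the answer key in this notebook treats them. The function name `scale_box` is hypothetical and not part of any library.

```python
def scale_box(box, frame_width, frame_height):
    """Convert a normalized [top, left, bottom, right] box to pixel corners.

    Returns (left, top, right, bottom) as integers, clamped to the frame.
    """
    top, left, bottom, right = box

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper] so drawing never leaves the frame.
        return max(0, min(int(value), upper))

    return (
        clamp(left * frame_width, frame_width - 1),
        clamp(top * frame_height, frame_height - 1),
        clamp(right * frame_width, frame_width - 1),
        clamp(bottom * frame_height, frame_height - 1),
    )


# A box covering the central half of a 640x360 frame.
print(scale_box([0.25, 0.25, 0.75, 0.75], 640, 360))
```

Because the coordinates are fractions of the image, the same box can be mapped onto the original frame regardless of the size the frame was scaled to for the model.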
 

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

First download the smallest version of [this video](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/), and then upload it to the lab with the name `cars.mp4`.

Now we can download our pre-trained model and extract it.

In [0]:
import os
import shutil
import tarfile
import urllib.request

base_url = 'http://download.tensorflow.org/models/object_detection/'
file_name = 'ssd_mobilenet_v1_coco_2018_01_28.tar.gz'

url = base_url + file_name

urllib.request.urlretrieve(url, file_name)

dir_name = file_name[0:-len('.tar.gz')]

if os.path.exists(dir_name):
  shutil.rmtree(dir_name) 

# Use a context manager so the archive is closed after extraction
with tarfile.open(file_name, 'r:gz') as tar:
  tar.extractall('./')

os.listdir(dir_name)

Next we will download a labels file, so we can more easily label our images.

In [0]:
base_url = 'https://raw.githubusercontent.com/nightrome/cocostuff/master/'
file_name = 'labels.txt'

url = base_url + file_name

urllib.request.urlretrieve(url, file_name)

with open(file_name, 'r') as f:
    labels = f.readlines()

# Remove the indexes
labels = list(map(lambda x: x[x.index(' '):], labels))

# Strip leading and trailing whitespace
labels = list(map(lambda x: x.strip(), labels))

print(len(labels))
print(labels[50])
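
To see what the parsing above does, here is a minimal sketch that applies the same idea to a few in-memory lines. It assumes each line has the form `<id>: <name>`; the sample lines and the function name `parse_labels` are illustrative only and are not copied from the labels file.

```python
def parse_labels(lines):
    """Map integer ids to label names from lines shaped like '3: car'."""
    labels_by_id = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        # Split on the first colon only, so names containing colons survive.
        raw_id, name = line.split(':', 1)
        labels_by_id[int(raw_id)] = name.strip()
    return labels_by_id


sample_lines = ['0: unlabeled\n', '1: person\n', '3: car\n']
labels_by_id = parse_labels(sample_lines)
print(labels_by_id.get(3, 'unknown'))
print(labels_by_id.get(99, 'unknown'))
```

Keying by the id written in the file, rather than by list position, is one way to avoid an `IndexError` if the model ever reports a class id that the file does not contain.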

Next we can load the graph and wrap it.

In [0]:
import tensorflow as tf

frozen_graph = os.path.join(dir_name, 'frozen_inference_graph.pb')

with tf.io.gfile.GFile(frozen_graph, "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    loaded = graph_def.ParseFromString(f.read())

outputs = (
    'num_detections:0',
    'detection_classes:0',
    'detection_scores:0',
    'detection_boxes:0',
)

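def wrap_graph(graph_def, inputs, outputs):
    """Wrap a frozen TF1 graph as a callable pruned to the given tensors."""
    # Import the graph definition inside a wrapped function
    wrapped = tf.compat.v1.wrap_function(
        lambda: tf.compat.v1.import_graph_def(graph_def, name=""), [])

    # Keep only the part of the graph between the named inputs and outputs
    return wrapped.prune(
        tf.nest.map_structure(wrapped.graph.as_graph_element, inputs),
        tf.nest.map_structure(wrapped.graph.as_graph_element, outputs))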
    
model = wrap_graph(graph_def=graph_def,
                   inputs=["image_tensor:0"],
                   outputs=outputs)

Next we can write a function that adds bounding boxes and labels to the detected objects in a frame.

In [0]:
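import cv2 as cv
import matplotlib.pyplot as plt

def analyzeDetections(image, detections, show=True):
  """
  Draw a labeled bounding box on the image for each confident detection,
  optionally display the result, and return the annotated image.
  """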
  # num_detections:0
  detection_count = int(detections[0][0])
  print("Found", detection_count, "objects")
  
  # Get original image size
  height, width, _ = image.shape
  
  for i in range(detection_count):
    # detection_scores:0
    confidence_score = detections[2][0][i]
    
    if (confidence_score < .8):
      continue
    
    # detection_boxes:0
    box = detections[3][0][i]
    box_top = box[0]
    box_left = box[1]
    box_bottom = box[2]
    box_right = box[3]
    
    scaled_left, scaled_right = int(box_left * width), int(box_right * width)
    scaled_top, scaled_bottom = int(box_top * height), int(box_bottom * height)

    # detection_classes:0
    label_id = int(detections[1][0][i])
    label_name = labels[label_id]
    
    print("Found label {} (id {}) with a confidence of {}. Bounding box [top: {}, left: {}, bottom: {}, right: {}]".format(
        label_name, label_id, confidence_score, box_top, box_left, box_bottom, box_right))
    
    # Display the bounding box on the image
    cv.rectangle(image, (scaled_left, scaled_top), \
                 (scaled_right, scaled_bottom), (255, 0, 0), thickness=2)
    
    # Add text with black stroke
    cv.putText(image, "{}".format(label_name), \
               (scaled_left + 50, scaled_top + 50), cv.FONT_HERSHEY_SIMPLEX, \
               1.0, [0, 0, 0], 6)
    cv.putText(image, "{}".format(label_name), \
               (scaled_left + 50, scaled_top + 50), cv.FONT_HERSHEY_SIMPLEX, \
               1.0, [255, 255, 255], 2)
    
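  if (show):
    # OpenCV images are BGR; convert to RGB so matplotlib shows true colors
    plt.imshow(cv.cvtColor(image, cv.COLOR_BGR2RGB))
    plt.show()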
    
  return image
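
The thresholding logic in this function can be tried without a model. The sketch below uses nested lists that mimic the order of the `outputs` tuple (count, classes, scores, boxes), each with a leading batch dimension of 1. The values are made up for illustration, and `filter_detections` is a hypothetical helper, not part of TensorFlow.

```python
def filter_detections(detections, threshold=0.8):
    """Return (class_id, score, box) tuples whose score meets the threshold."""
    count = int(detections[0][0])
    classes, scores, boxes = detections[1][0], detections[2][0], detections[3][0]

    kept = []
    for i in range(count):
        if scores[i] < threshold:
            continue  # Too uncertain to draw.
        kept.append((int(classes[i]), float(scores[i]), list(boxes[i])))
    return kept


# Fabricated values, shaped like the model output with a batch size of 1.
fake_detections = (
    [2.0],                                            # num_detections
    [[3.0, 1.0]],                                     # detection_classes
    [[0.95, 0.40]],                                   # detection_scores
    [[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]],   # detection_boxes
)

for class_id, score, box in filter_detections(fake_detections):
    print(class_id, score, box)
```

Separating the filtering from the drawing makes it easier to experiment with different thresholds on a single frame before processing the whole video.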

And finally, we detect objects in each frame and, frame by frame, write a new video with labeled output.

In [0]:
import cv2 as cv

cap = cv.VideoCapture('cars.mp4')

height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
fps = cap.get(cv.CAP_PROP_FPS)
total_frames = int(cap.get(cv.CAP_PROP_FRAME_COUNT))

# Lowercase 'mp4v' is the tag the MP4 container expects
fourcc = cv.VideoWriter_fourcc(*'mp4v')
output = cv.VideoWriter('labeled.mp4', fourcc, fps, (width, height))

for i in range(0, total_frames):
  if i % 100 == 0:
    print(i, total_frames)
  
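  # cap.read() advances one frame per call, so no explicit seek is needed
  ret, frame = cap.read()

  if not ret:
    raise Exception("Problem reading frame {} from video".format(i))

  # OpenCV decodes frames as BGR, while the model is assumed to expect RGB
  rgb_frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
  tensor = tf.convert_to_tensor([rgb_frame], dtype=tf.uint8)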

  detections = model(tensor)
  output.write(analyzeDetections(frame, detections, show=False))
  
cap.release()
output.release()
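
During development, the loop above can be limited to a subset of frames, as suggested in the tips. A minimal sketch, using a hypothetical helper `frames_to_process` that is not part of OpenCV:

```python
def frames_to_process(total_frames, stride=1, limit=None):
    """List the frame indices to analyze.

    stride: keep every stride-th frame (1 keeps all frames).
    limit: optional cap on how many frames are returned.
    """
    if stride < 1:
        raise ValueError('stride must be at least 1')
    indices = list(range(0, total_frames, stride))
    return indices if limit is None else indices[:limit]


# Every 10th frame of a 95-frame clip, capped at 5 frames.
print(frames_to_process(95, stride=10, limit=5))
```

With this approach, every frame would still be read with `cap.read()`, but only frames whose index is in the list would be passed to the model and written out. If frames are skipped when writing, dividing the output `fps` by the same stride should keep the playback speed roughly unchanged.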

---

## Exercise 2: Ethical Implications

Even the most basic models have the potential to affect segments of the population in different ways. It is important to consider how your model might positively and negatively affect different types of users.

In this section of the project, you will reflect on the positive and negative implications of your model. Frame the context of your model creation using this narrative:

> The city of Seattle is attempting to reduce traffic congestion in its downtown area. As part of this project, they plan to allow each local driver one free trip to downtown Seattle per week. After that, the driver will have to pay a $50 toll for each extra day per week driven. As an early proof of concept for this project, your team is tasked with using machine learning to correctly identify automobiles on the road. The next phase of the project will involve detecting license plate numbers and then cross-referencing that data with RFID chips that should be mounted in all local drivers' cars.

### **Student Solution**

**Positive Impact**

Your model is trying to solve a problem. Think about who will benefit from that problem being solved and write a brief narrative about how the model will help.

> *Hypothetical entities will benefit because...*


**Negative Impact**

Models rarely benefit everyone equally. Think about who might be negatively impacted by the predictions your model is making. These people might not use the model directly, but they could still be affected indirectly.

> *Hypothetical entity will be negatively impacted because...*

**Bias**

Models can be biased for many reasons. The bias can come from the data used to build the model (e.g., sampling, data collection methods, available sources) and/or from the interpretation of the predictions generated by the model.

Think of at least two ways bias might have been introduced to your model and explain both below.

> *One source of bias in the model could be...*

> *Another source of bias in the model could be...*

**Changing the Dataset to Mitigate Bias**

Having bias in your dataset is one of the primary ways in which bias is introduced to a machine learning model. Look back at the input data you fed to your model. Think about how you might change something about the data to reduce bias in your model.

What change or changes could you make to reduce the bias in your dataset? Consider the data you have, how and where it was collected, and what other sources of data might be used to reduce bias.

Write a summary of changes that could be made to your input data.

> *Since the data has potential bias A we can adjust...*

**Changing the Model to Mitigate Bias**

Is there any way to reduce bias by changing the model itself? This could include modifying algorithmic choices, tweaking hyperparameters, etc.

Write a brief summary of changes you could make to help reduce bias in your model.

> *Since the model has potential bias A, we can adjust...*

**Mitigating Bias Downstream**

Models make predictions. Downstream processes make decisions. What processes and/or rules should be in place for people and systems interpreting and acting on the results of your model to reduce bias? Describe these rules and/or processes below.

> *Since the predictions have potential bias A, we can adjust...*

---

### Answer Key

*There are many acceptable solutions to this section. We are looking for critical thinking. An example solution follows.*

**Positive Impact**

Your model is trying to solve a problem. Think about who will benefit from that problem being solved, and write a brief narrative about how the model will help.

> *There are many potential beneficiaries. The city government might collect more revenue. Bikers might have less car traffic to deal with. Road workers might have fewer repairs to do.*

**Negative Impact**

Models rarely benefit everyone equally. Think about who might be negatively impacted by the predictions your model is making. These people might not use the model directly, but they could still be affected indirectly.

> *People who have to commute by car will be negatively impacted, since they will have to pay the toll on most of the days they drive. Paid parking lots might also lose business if fewer people drive downtown.*

**Bias**

Models can be biased for many reasons. The bias can come from the data used to build the model (e.g., sampling, data collection methods, available sources) and from the interpretation of the predictions generated by the model.

Think of at least two ways bias might have been introduced to your model, and explain both below.

> *The dataset, Common Objects in Context (COCO), is curated by humans, and its images are submitted by humans. Their biases and data collection limitations can bleed into the dataset. For instance, if most images of vehicles are taken by people living in cities, then there might be less data about trucks or other types of vehicles.*

> *The choice of "common objects" is itself a source of bias. Who chose these objects? Are they globally common?*

**Changing the Dataset to Mitigate Bias**

Having bias in your dataset is one of the primary ways in which bias is introduced to a machine learning model. Look back at the input data that you fed to your model. Think about how you might change something about the data to reduce bias in your model.

What change or changes could you make to reduce the bias in your dataset? Consider the data that you have, how and where it was collected, and what other sources of data might be used to reduce bias.

Write a summary of changes that could be made to your input data.

> *Adding more data, and more diverse data, is one change that could be made. The more data we add and curate from different sources, the more balanced the model is likely to become.*

**Changing the Model to Mitigate Bias**

Is there any way to reduce bias by changing the model itself? This could include modifying algorithmic choices, tweaking hyperparameters, etc.

Write a brief summary of changes you could make to help reduce bias in your model.

> *The most obvious change is to pay closer attention to the confidence scores. Some classifications are probably being made with low confidence and should be skipped.*

**Mitigating Bias Downstream**

Models make predictions. Downstream processes make decisions. What processes and/or rules should be in place for people and systems interpreting and acting on the results of your model to reduce bias? Describe these below.

> *This was already alluded to in the lab, but one way to minimize bias is to cross-check the model's output. The model identifies vehicles, and those detections are then cross-checked against license plate and RFID data.*

---