#### Copyright 2020 Google LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Group Members 
* **Patricia**
* **Jalen**
* **Javi**



# Video Classification with Pre-Trained Models Project

In this project we will import a pre-existing model that recognizes objects and use the model to identify those objects in a video. We'll edit the video to draw boxes around the identified object, and then we'll reassemble the video so the boxes are shown around objects in the video.

# Exercises

## Exercise 1: Coding

You will process a video frame by frame, identify objects in each frame, and draw a bounding box with a label around each car in the video.
 
Use the [SSD MobileNet V1 Coco](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md) (*ssd_mobilenet_v1_coco*) model. The video you'll process can be found [on Pixabay](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/). The 640x360 version of the video is smallest and easiest to handle, though any size should work since you must scale down the images for processing.
 
Your program should:
 
* Read in a video file (use the one in this Colab if you want)
* Load the TensorFlow model linked above
* Loop over each frame of the video
* Scale the frame down to a size the model expects
* Feed the frame to the model
* Loop over detections made by the model
* If the detection score is above some threshold, draw a bounding box onto the frame and put a label in or near the box
* Write the frame back to a new video
 
Some tips:
 
* Processing an entire video is slow, so consider truncating the video or skipping over frames during development. Skipping frames will make the video choppy, but you'll see a wider variety of images than you would in a truncated clip that keeps every original frame.
* The model expects a 300x300 image. You'll likely have to scale your frames to fit the model. When you get a bounding box, that box is relative to the scaled image. You'll need to scale the bounding box out to the original image size.
* Don't start by trying to process the video. Instead, capture one frame and work with it until you are happy with your object detection, bounding boxes, and labels. Once you get those done, use the same logic on the other frames of the video.
* The [Coco labels file](https://github.com/nightrome/cocostuff/blob/master/labels.txt) can be used to identify classified objects.
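
The box-scaling tip above can be sketched as follows. This is a minimal sketch, assuming the model reports each box as normalized `(top, left, bottom, right)` fractions between 0 and 1; `scale_box` is a hypothetical helper name, not part of OpenCV or TensorFlow.

```python
def scale_box(box, frame_width, frame_height):
  """Convert a normalized (top, left, bottom, right) box to pixel coordinates.

  Returns (left, top, right, bottom) in pixels of the original frame.
  """
  def clamp(value):
    # Keep slightly out-of-range fractions inside the frame.
    return min(max(value, 0.0), 1.0)

  top, left, bottom, right = box
  return (int(clamp(left) * frame_width),
          int(clamp(top) * frame_height),
          int(clamp(right) * frame_width),
          int(clamp(bottom) * frame_height))

# A box covering the right half of the middle band of a 640x360 frame.
print(scale_box((0.25, 0.5, 0.75, 1.0), 640, 360))  # (320, 90, 640, 270)
```

Because the fractions are relative to the image, the same box can be drawn on the full-size frame even if the model saw a scaled-down copy.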
 

### **Student Solution**

#### Reading the Video

OpenCV is an open source library for performing computer vision tasks. One of these tasks is reading and writing video frames. To read the `cars.mp4` video file, we use the [VideoCapture](https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html#videocapture) class.

In [None]:
import cv2 as cv

cars_video = cv.VideoCapture('cars.mp4')

Once you have created a `VideoCapture` object, you can obtain information about the video that you are processing.

In [None]:
height = int(cars_video.get(cv.CAP_PROP_FRAME_HEIGHT))
width = int(cars_video.get(cv.CAP_PROP_FRAME_WIDTH))
fps = cars_video.get(cv.CAP_PROP_FPS)
total_frames = int(cars_video.get(cv.CAP_PROP_FRAME_COUNT))

print(f'height: {height}')
print(f'width: {width}')
print(f'frames per second: {fps}')
print(f'total frames: {total_frames}')
print(f'video length (seconds): {total_frames / fps}')
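
These properties are also useful for planning how many frames to process when skipping frames during development. The sketch below is illustrative only: `sampled_frames` and `stride` are hypothetical names, and the numbers in the usage line are made up rather than read from `cars.mp4`. It also computes a reduced frame rate, on the assumption that you want the shortened output to play for about as long as the original.

```python
def sampled_frames(total_frames, fps, stride):
  """Return the frame indices kept when reading every `stride`-th frame,
  plus a frame rate that keeps the output near the original duration."""
  if stride < 1:
    raise ValueError('stride must be at least 1')
  indices = list(range(0, total_frames, stride))
  output_fps = fps / stride
  return indices, output_fps

# Made-up example values: 100 frames at 25 frames per second.
indices, output_fps = sampled_frames(total_frames=100, fps=25.0, stride=8)
print(len(indices), indices[:3], output_fps)
```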

When you are done processing a video file, it is a good idea to release the `VideoCapture` to free the resources it holds.

In [None]:
cars_video.release()
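
One way to make sure a capture is always released, even if an error occurs partway through processing, is a small context manager. This is a sketch, not an OpenCV feature: `releasing` is a hypothetical helper, and `FakeCapture` is a stand-in for `cv.VideoCapture` so the example runs without a video file.

```python
from contextlib import contextmanager

@contextmanager
def releasing(resource):
  """Yield `resource`, then call its release() method no matter what."""
  try:
    yield resource
  finally:
    resource.release()

class FakeCapture:
  """Stand-in exposing the same release() method as cv.VideoCapture."""
  def __init__(self):
    self.released = False

  def release(self):
    self.released = True

capture = FakeCapture()
with releasing(capture) as video:
  pass  # read frames from `video` here
print(capture.released)
```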

#### Loading the Model

##### Download labels

In [None]:
import urllib.request
import os

base_url = 'https://raw.githubusercontent.com/nightrome/cocostuff/master/'
file_name = 'labels.txt'

url = base_url + file_name

urllib.request.urlretrieve(url, file_name)

os.listdir()

###### Label dictionary

In [None]:
labels = {}

# Each line of the file looks like '<id>: <name>'
with open('labels.txt') as label_file:
  for line in label_file:
    line = line.strip()
    if not line:
      continue
    key, value = line.split(': ', 1)
    labels[int(key)] = value

print(labels)

One of the most common places to find pre-trained models is the [official list](https://github.com/tensorflow/models) of models curated by TensorFlow developers. This is often referred to as the *TensorFlow Model Garden*.
 
In this course we will be utilizing models stored in the TensorFlow detection model zoo. The zoo has models built with [TensorFlow 1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md) and [TensorFlow 2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). We'll be focusing on the [Common Objects in Context (COCO)](http://cocodataset.org/) dataset. This dataset contains over 270,000 labeled images in 91 categories.

In order to use a pre-trained model, we first need to obtain the model file. For this Colab we'll visit the [TensorFlow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md) and download the `ssd_mobilenet_v1_coco` model.

The direct link to the model changes as the model is updated, so you'll need to browse the links in the zoo and find the model. Once you click the download link, you'll have a file on your system named similarly to (but not necessarily exactly the same as):

 > `ssd_mobilenet_v1_coco_2018_01_28.tar.gz`

This is a compressed version of the model file. It is a gzipped (`.gz`) tape archive (`.tar`) file. If you want to explore the file on your local system, you might need to install a program such as [7-zip](https://www.7-zip.org/). On Mac and Linux systems you should be able to right-click on the file and extract the contents without any extra software. If you are comfortable with the command line, you can use the following command to extract the file contents.

  > `tar -xzvf ssd_mobilenet_v1_coco_2018_01_28.tar.gz`

And finally, if you just want to directly load the file to this Colab, update the file name in the code snippet below and run the code.

Also notice that the documentation for the model we are downloading says the model was built using TensorFlow version 1.

##### Download model file

In [None]:
import urllib.request
import os

base_url = 'http://download.tensorflow.org/models/object_detection/'
file_name = 'ssd_mobilenet_v1_coco_2018_01_28.tar.gz'

url = base_url + file_name

urllib.request.urlretrieve(url, file_name)

os.listdir()

##### Extract the Model Data (Unzip file)

In order to load the model, it must be extracted from the compressed archive file (also called a "tarball" in this case). We will use Python's `tarfile` module to extract the contents of the file. The contents of the file will be saved in a directory named after the file. For example, the contents of `ssd_mobilenet_v1_coco_2018_01_28.tar.gz` will be saved in the `ssd_mobilenet_v1_coco_2018_01_28` directory.

In [None]:
import tarfile
import shutil

dir_name = file_name[0:-len('.tar.gz')]

if os.path.exists(dir_name):
  shutil.rmtree(dir_name)

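# Open the archive in a `with` block so the file is closed after extraction
with tarfile.open(file_name, 'r:gz') as archive:
  archive.extractall('./')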

os.listdir(dir_name)
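
The naming convention described above can be checked on a small scale. The sketch below builds a tiny throwaway archive in a temporary directory and extracts it; the archive name `demo_model.tar.gz` and its placeholder contents are made up for illustration and have nothing to do with the real model file.

```python
import os
import tarfile
import tempfile

def archive_dir_name(file_name, suffix='.tar.gz'):
  """Strip the archive suffix to get the directory the contents unpack into."""
  if not file_name.endswith(suffix):
    raise ValueError(f'{file_name} does not end with {suffix}')
  return file_name[:-len(suffix)]

with tempfile.TemporaryDirectory() as workdir:
  demo_dir = archive_dir_name('demo_model.tar.gz')

  # Build a small archive whose top-level directory matches its file name.
  source_dir = os.path.join(workdir, 'source', demo_dir)
  os.makedirs(source_dir)
  with open(os.path.join(source_dir, 'frozen_inference_graph.pb'), 'wb') as f:
    f.write(b'placeholder')
  archive_path = os.path.join(workdir, 'demo_model.tar.gz')
  with tarfile.open(archive_path, 'w:gz') as archive:
    archive.add(source_dir, arcname=demo_dir)

  # Extract it, closing the archive afterwards.
  with tarfile.open(archive_path, 'r:gz') as archive:
    archive.extractall(workdir)
  extracted = os.listdir(os.path.join(workdir, demo_dir))
  print(extracted)
```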

#### Loading the Frozen Graph

There are some interesting files contained in the archive, including checkpoints that can be used for resuming model training from a specific point. We care mostly about the `frozen_inference_graph.pb` file; this file contains a trained TensorFlow graph we can use for classification.

We can load the frozen graph by opening the file with TensorFlow's `GFile` class, then calling `GraphDef.ParseFromString` to load the graph into memory.

##### Import TensorFlow

In [None]:
%tensorflow_version 2.x

import tensorflow as tf
tf.__version__

In [None]:
import tensorflow as tf

frozen_graph = os.path.join(dir_name, 'frozen_inference_graph.pb')

# Read the serialized graph and parse it into a GraphDef
with tf.io.gfile.GFile(frozen_graph, "rb") as f:
  graph_def = tf.compat.v1.GraphDef()
  graph_def.ParseFromString(f.read())

Remember that TensorFlow allows you to request the execution of any node in the graph. We want all of the *detection* outputs that we discovered in the graph. These were:

  * num_detections
  * detection_scores
  * detection_boxes
  * detection_classes

We will build a list of *outputs* that we want TensorFlow to generate.

##### Run the Graph

Now that we have a graph loaded, we need to test it out. Let's download an [image of a car](https://pixabay.com/illustrations/car-sports-car-racing-car-speed-49278/) and upload that image to Colab. Rename the file `car.jpg` or change the name of the `image_filename` variable below to match the name of the file you uploaded.

In [None]:
outputs = (
  'num_detections:0',
  'detection_classes:0',
  'detection_scores:0',
  'detection_boxes:0',
)
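
The student solution later indexes the model's results by position, which relies on the results coming back in the same order as the requested outputs. A small sketch of pairing names with results is shown below, so that later code does not have to remember that index 2 means scores and index 3 means boxes. `name_outputs` is a hypothetical helper, and the nested lists are made-up stand-ins for the tensors a real model call would return.

```python
def name_outputs(output_names, results):
  """Pair each requested output name (without the ':0' suffix) with its result."""
  if len(output_names) != len(results):
    raise ValueError('expected one result per requested output')
  return {name.split(':')[0]: result
          for name, result in zip(output_names, results)}

demo_outputs = ('num_detections:0', 'detection_classes:0',
                'detection_scores:0', 'detection_boxes:0')
# Made-up values shaped like a batch of one image with one detection.
demo_results = ([1.0], [[3.0]], [[0.9]], [[[0.1, 0.2, 0.5, 0.6]]])

named = name_outputs(demo_outputs, demo_results)
print(named['detection_scores'][0][0])  # 0.9
```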

We can now execute the graph requesting our outputs and providing inputs.

In order to do this, we must first wrap the graph. This is necessary due to compatibility issues between TensorFlow version 1 and 2.

In [None]:
def wrap_graph(graph_def, inputs, outputs, print_graph=False):
  wrapped = tf.compat.v1.wrap_function(
    lambda: tf.compat.v1.import_graph_def(graph_def, name=""), [])

  return wrapped.prune(
    tf.nest.map_structure(wrapped.graph.as_graph_element, inputs),
    tf.nest.map_structure(wrapped.graph.as_graph_element, outputs))
    
model = wrap_graph(graph_def=graph_def,
                   inputs=["image_tensor:0"],
                   outputs=outputs)

And then to make predictions, we convert our image into a tensor and pass it to the model.

In [None]:
# tensor = tf.convert_to_tensor(input_frame, dtype=tf.uint8)

# detections = model(tensor)

##### Main Code - Output Video and Detections

Loop over the detections of the model.

###### How to access an element of a tensor

In [None]:
# detections[0][0] # number of detections

In [None]:
# import matplotlib.pyplot as plt

# cars_video = cv.VideoCapture('cars.mp4')
# cars_video.set(cv.CAP_PROP_POS_FRAMES, 850)
# ret, frame = cars_video.read()
# if not ret:
#   raise Exception(f'Problem reading frame from video')

# cars_video.release()

# frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
# plt.imshow(frame)
# plt.show()

In [None]:
# prepare output video
import os
import matplotlib.pyplot as plt

input_video = cv.VideoCapture('cars.mp4')

height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))
width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))
fps = input_video.get(cv.CAP_PROP_FPS)

fourcc = cv.VideoWriter_fourcc(*'mp4v')
output_video = cv.VideoWriter('cars-project4.mp4', fourcc, fps, (width, height))

In [None]:
# Process every 8th frame of the video
frame_stride = 8

for frame_index in range(0, total_frames, frame_stride):
  input_video.set(cv.CAP_PROP_POS_FRAMES, frame_index)
  ret, frame = input_video.read() # gets frame `frame_index` from the video
  if not ret:
    raise Exception(f"Problem reading frame {frame_index} from video")

  # OpenCV reads frames as BGR, but the model was trained on RGB images.
  # Wrapping the frame in a list makes it a batch of one image.
  input_frame = [cv.cvtColor(frame, cv.COLOR_BGR2RGB)]

  # make detections
  tensor = tf.convert_to_tensor(input_frame, dtype=tf.uint8)
  detections = model(tensor)

  num_detections = int(detections[0][0]) # num_detections:0
  for d in range(num_detections):
    confidence_score = detections[2][0][d] # detection_scores:0

    # only draw boundaries and labels for scores over 50%
    if confidence_score > 0.5:
      print(confidence_score)

      # draw boxes; coordinates are fractions of the frame size
      box = detections[3][0][d] # detection_boxes:0
      box_top = box[0]
      box_left = box[1]
      box_bottom = box[2]
      box_right = box[3]

      left = int(box_left * width)
      right = int(box_right * width)
      top = int(box_top * height)
      bottom = int(box_bottom * height)

      # black text and boundaries
      r = 0
      g = 0
      b = 0

      cv.rectangle(frame, (left, top), (right, bottom), (r, g, b), thickness=2)

      # add text
      label_id = int(detections[1][0][d]) # detection_classes:0
      label_name = labels[label_id]

      scale = 1.0
      thickness = 2

      # center_h = int((box_left + box_right) / 2)
      # center_v = int((box_bottom + box_top) / 2)

      cv.putText(frame, label_name, (right, bottom), cv.FONT_HERSHEY_SIMPLEX, scale,
                 (r, g, b), thickness)

  # show the annotated frame once, converted to RGB for matplotlib
  plt.imshow(cv.cvtColor(frame, cv.COLOR_BGR2RGB))
  plt.show()

  output_video.write(frame) # write the frame to the output video

input_video.release()
output_video.release()

os.listdir('./')

We inspected our model and found that it accepts a list of variable-sized images and that it returns:
- the number of matches
- the class
- the confidence
- the bounding boxes for each object found in an image
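
Those four return values are enough to pick out the confident detections. The sketch below uses plain Python lists with made-up numbers in place of real model output; `confident_detections`, `demo_labels`, and the label ids are illustrative assumptions, not values taken from the model or the labels file.

```python
def confident_detections(count, classes, scores, boxes, label_names,
                         threshold=0.5):
  """Return (label, score, box) for each detection scoring above `threshold`."""
  kept = []
  for index in range(int(count)):
    if scores[index] > threshold:
      # Fall back to 'unknown' if the class id is missing from the labels.
      label = label_names.get(int(classes[index]), 'unknown')
      kept.append((label, scores[index], boxes[index]))
  return kept

demo_labels = {3: 'car', 8: 'truck'}
kept = confident_detections(
    count=3.0,
    classes=[3.0, 8.0, 3.0],
    scores=[0.92, 0.41, 0.67],
    boxes=[(0.1, 0.1, 0.4, 0.4), (0.2, 0.5, 0.6, 0.9), (0.5, 0.2, 0.8, 0.6)],
    label_names=demo_labels)
print([label for label, _, _ in kept])  # ['car', 'car']
```

Raising `threshold` trades missed detections for fewer false positives, which is the adjustment discussed in the bias section of Exercise 2.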

---

## Exercise 2: Ethical Implications

Even the most basic models have the potential to affect segments of the population in different ways. It is important to consider how your model might positively and negatively affect different types of users.

In this section of the project, you will reflect on the positive and negative implications of your model. Frame the context of your model creation using this narrative:

> The city of Seattle is attempting to reduce traffic congestion in its downtown area. As part of this project, they plan to allow each local driver one free trip to downtown Seattle per week. After that, the driver will have to pay a $50 toll for each extra day per week driven. As an early proof of concept for this project, your team is tasked with using machine learning to correctly identify automobiles on the road. The next phase of the project will involve detecting license plate numbers and then cross-referencing that data with RFID chips that should be mounted in all local drivers' cars.

### **Student Solution**

**Positive Impact**

Your model is trying to solve a problem. Think about who will benefit from that problem being solved and write a brief narrative about how the model will help.

> *Residents of Seattle will benefit from reduced traffic congestion and less smog, since the $50 toll should put fewer drivers on the road. This will hopefully incentivize carpooling or other forms of travel to downtown Seattle. The city of Seattle will benefit most, through the income from these tolls.*


**Negative Impact**

Models rarely benefit everyone equally. Think about who might be negatively impacted by the predictions your model is making. This person(s) might not be directly using the model, but they might be impacted indirectly.

> *Regular commuters who have to travel to downtown Seattle will be negatively affected by the $50 toll. This will most likely hurt lower-income people the most, since they won't be able to pay the toll repeatedly.*

**Bias**

Models can be biased for many reasons. The bias can come from the data used to build the model (e.g., sampling, data collection methods, available sources) and/or from the interpretation of the predictions generated by the model.

Think of at least two ways bias might have been introduced to your model and explain both below.

> *One source of bias in the model could be sample bias. Our sample is only a video of cars, so what about the other types of vehicles on the road? Our model is expected to be considerably less accurate for motorcycles, SUVs, trucks, and 18-wheelers. Could our model also detect an emergency vehicle and waive its toll?*
>
> *A second source of bias could come from interpreting the model's predictions. The intention is to reduce traffic congestion in downtown Seattle, but drivers who have used their free trip may reroute to avoid the $50 toll, creating more congestion in other areas.*


**Changing the Dataset to Mitigate Bias**

Having bias in your dataset is one of the primary ways in which bias is introduced to a machine learning model. Look back at the input data you fed to your model. Think about how you might change something about the data to reduce bias in your model.

What change or changes could you make to reduce the bias in your dataset? Consider the data you have, how and where it was collected, and what other sources of data might be used to reduce bias.

Write a summary of changes that could be made to your input data.

> *Since the data has potential sample bias, we can adjust our input data. We could build a model using a more diverse set of vehicles, possibly detecting each vehicle's type and placing it in a category so that different tolls can be charged based on weight or vehicle emissions.*

**Changing the Model to Mitigate Bias**

Is there any way to reduce bias by changing the model itself? This could include modifying algorithmic choices, tweaking hyperparameters, etc.

Write a brief summary of changes you could make to help reduce bias in your model.

> *Since the model has potential bias, we can adjust the confidence threshold, increasing it from 50% to perhaps 65%.*

**Mitigating Bias Downstream**

Models make predictions. Downstream processes make decisions. What processes and/or rules should be in place for people and systems interpreting and acting on the results of your model to reduce bias? Describe these rules and/or processes below.

> *Since the predictions have potential bias toward reading the license plates of cars, rules should be in place to make sure that no major laws or policies are based on these results alone, because they are not representative of all the vehicles traveling on these roads. Taking action based on the results of the model could be detrimental to the communities involved.*

---