<a href="https://colab.research.google.com/github/ian-mcnair/ForageSnap/blob/master/iNaturalist_Pre_trained_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using Pre-Trained Models Project

In this project we will import a pre-existing model that recognizes objects and use the model to identify those objects in a video. We'll edit the video to draw boxes around the identified object and then reassemble the video so that the boxes are shown around objects in the video.

## Overview

### Learning Objectives

* Use OpenCV to process images and video.
* Use a pre-trained model to identify and label objects in each frame of a video.
* Make judgements about classification quality and when to apply predicted labels.

### Prerequisites

* Classification
* Saving and Loading Models
* OpenCV
* Video Processing

### Estimated Duration

330 minutes (285 minutes working time, 45 minutes for presentations)

### Deliverables

1. A copy of this Colab notebook containing your code and responses to the ethical considerations below. The code should produce a functional labeled video.
1. A group presentation. After everyone is done, we will ask each group to stand in front of the class and give a brief presentation about what they have done in this lab. The presentation can be a code walkthrough, a group discussion, a slide show, or any other means that conveys what you did over the course of the day and what you learned. If you do create any artifacts for your presentation, please share them in the class folder.

### Grading Criteria

This project is graded in separate sections that each contribute a percentage of the total score:

1. Building and Using a Model (80%)
1. Ethical Implications (10%)
1. Project Presentation (10%)

#### Building and Using a Model

There are six demonstrations of competency listed in the problem statement below. Each competency is graded on a scale of 0 to 3 points, for a total of 18 available points. The following rubric will be used:

| Points | Description |
|--------|-------------|
| 0      | No attempt at the competency |
| 1      | Attempted competency, but in an incorrect manner |
| 2      | Attempted competency correctly, but sub-optimally |
| 3      | Successful demonstration of competency |


#### Ethical Implications

There are six questions in the **Ethical Implications** section. Each question is worth 2 points. The rubric for calculating those points is:

| Points | Description |
|--------|-------------|
| 0      | No attempt at question or answer was off-topic or didn't make sense |
| 1      | Question was answered, but answer missed important considerations  |
| 2      | Answer adequately considered ethical implications |

#### Project Presentation

The project presentation will be graded on participation. All members of a team should actively participate.

## Team

Please enter your team members' names in the placeholders in this text area:

*   *Team Member Placeholder*
*   *Team Member Placeholder*
*   *Team Member Placeholder*



# Exercises

## Exercise 1: Coding

For this workshop you will process a video frame-by-frame, identify objects in each frame, and draw a bounding box and label around each object.
 
Use the [SSD MobileNet V1 Coco](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) 'ssd_mobilenet_v1_coco' model. The video that you'll be processing can be [found on Pixabay](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/). The 640x360 version of the video is smallest and easiest to handle, though any should work since you must scale down the images for processing.
 
**Graded** demonstrations of competency:
1. Obtain the pre-trained model from the [TensorFlow Zoo](https://github.com/tensorflow/models).
1. Load the pre-trained model into a TensorFlow object.
1. Obtain a video file from Pixabay to use for classification.
1. Process the video frame-by-frame creating a modified output video.
1. Apply a classification model to an image.
1. Draw a bounding box around classified objects in each image.
 
The flow of the program is roughly:
 
* Read in a video file (use the one in this Colab if you want)
* Load the TensorFlow model
* Loop over each frame of the video
  * Scale the frame down to a size the model expects
  * Feed the frame to the model
  * Loop over detections made by the model
    * If the detection score is above some threshold, draw a bounding box onto the frame and put a label in or near the box
  * Write the frame back to a new video
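
The thresholding step in the flow above can be sketched in plain Python. This is a minimal sketch, assuming the model output has already been unpacked into parallel lists of scores, class ids, and boxes. The function name `keep_confident` and the threshold values are illustrative choices, not part of the assignment.

```python
def keep_confident(scores, class_ids, boxes, threshold=0.5):
    """Return (score, class_id, box) triples whose score reaches the threshold."""
    kept = []
    for score, class_id, box in zip(scores, class_ids, boxes):
        if score >= threshold:
            kept.append((score, class_id, box))
    return kept


# Hypothetical detections for one frame; boxes are [ymin, xmin, ymax, xmax]
scores = [0.91, 0.42, 0.77]
class_ids = [3, 1, 8]
boxes = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.3, 0.3], [0.4, 0.5, 0.9, 0.95]]

confident = keep_confident(scores, class_ids, boxes, threshold=0.6)
print(len(confident))
```

A higher threshold yields fewer but more reliable boxes; the right value depends on the video and is worth tuning on a single frame first.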
 
Some tips:
 
* Processing an entire video is slow, so consider truncating the video or skipping frames during development. Skipping frames will make the video choppy, but you'll see a wider variety of images than you would in a truncated clip that keeps every original frame.
* The model expects a 300x300 image, so you'll likely have to scale your frames to fit the model. When you get a bounding box, that box is relative to the scaled image. You'll need to scale the bounding box out to the original image size.
* Don't start by trying to process the video. Instead, capture one frame and work with it until you are happy with your object detection, bounding boxes, and labels. Once those are done, move on to video handling.
* The [Coco labels file](https://github.com/nightrome/cocostuff/blob/master/labels.txt) can be used to identify classified objects.
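
A label file can be turned into a lookup table with a few lines of Python. This is a minimal sketch, assuming each line of the file has the form `id: name`; check the file you download before relying on that layout. The function name `parse_labels` and the sample text are made up for illustration.

```python
def parse_labels(text):
    """Map integer class ids to names from lines shaped like '1: person'."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ':' not in line:
            continue  # skip blank or malformed lines
        class_id, name = line.split(':', 1)
        labels[int(class_id)] = name.strip()
    return labels


# Hypothetical excerpt in the assumed 'id: name' layout
sample = "0: unlabeled\n1: person\n2: bicycle\n3: car\n"
labels = parse_labels(sample)
print(labels.get(3, 'unknown'))
```

Using `labels.get(class_id, 'unknown')` keeps the drawing loop from failing when the model reports an id that is missing from the file.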
 

# Student Solution

## Import Statements 

In [0]:
# DataFrame for the label lookup table
import pandas as pd

# Video Editing
import cv2 as cv
import matplotlib.pyplot as plt

# Webpage Interaction & Saving
import urllib.request
import os

# Unzipping and Saving Files
import tarfile
import shutil

# Creating the tf Model
import tensorflow as tf

# Creating a Loading Bar
from tqdm import tqdm_notebook as tqdm

## Get Detection Model Uploaded
### Accessing the File via Online Request

In [0]:
base_url = 'http://download.tensorflow.org/models/object_detection/'
file_name = 'faster_rcnn_resnet101_fgvc_2018_07_19.tar.gz'

url = base_url + file_name

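# Download the archive into the working directory
urllib.request.urlretrieve(url, file_name)

# The archive is expected to unpack into a directory named after the file
dir_name = file_name[:-len('.tar.gz')]

# Remove any copy left over from a previous run before extracting
if os.path.exists(dir_name):
  shutil.rmtree(dir_name)

# The context manager closes the archive once extraction is done
with tarfile.open(file_name, 'r:gz') as archive:
  archive.extractall('./')

os.listdir(dir_name)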

This cell downloads the model archive from the TensorFlow download server, removes any previously extracted copy, unpacks the archive, and lists the extracted files.
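
The remove-then-extract pattern can be tried without the large download. The sketch below builds a tiny stand-in archive in a temporary directory and unpacks it. The helper names `archive_dir_name` and `fresh_extract` and the `toy_model` archive are hypothetical, and the sketch assumes the archive unpacks into a directory named after the file.

```python
import os
import shutil
import tarfile
import tempfile


def archive_dir_name(file_name, suffix='.tar.gz'):
    """Directory name the archive is expected to unpack into."""
    if not file_name.endswith(suffix):
        raise ValueError(f'{file_name!r} does not end with {suffix!r}')
    return file_name[:-len(suffix)]


def fresh_extract(archive_path, dest_dir):
    """Remove a stale extraction, then unpack the archive into dest_dir."""
    name = archive_dir_name(os.path.basename(archive_path))
    target = os.path.join(dest_dir, name)
    if os.path.exists(target):
        shutil.rmtree(target)
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(dest_dir)
    return target


# Build a tiny local archive that stands in for the real model download
work = tempfile.mkdtemp()
src = os.path.join(work, 'src', 'toy_model')
os.makedirs(src)
with open(os.path.join(src, 'frozen_inference_graph.pb'), 'wb') as f:
    f.write(b'placeholder')
archive_path = os.path.join(work, 'toy_model.tar.gz')
with tarfile.open(archive_path, 'w:gz') as archive:
    archive.add(src, arcname='toy_model')

out_dir = os.path.join(work, 'out')
os.makedirs(out_dir)
extracted = fresh_extract(archive_path, out_dir)
print(os.listdir(extracted))
```

Deleting the old directory first means a rerun never mixes files from two different extractions.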

### Downloading and Storing Model

In [0]:
frozen_graph = os.path.join(dir_name, 'frozen_inference_graph.pb')

with tf.gfile.GFile(frozen_graph,'rb') as f:
  graph_def = tf.GraphDef()
  graph_def.ParseFromString(f.read())

### Using the model for a single image

In [0]:
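image = cv.imread('/content/poison.jpg')
# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
plt.imshow(cv.cvtColor(image, cv.COLOR_BGR2RGB))
plt.show()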

In [0]:
image.shape

In [0]:
outputs = (
    'num_detections',
    'detection_classes',
    'detection_scores',
    'detection_boxes' 
)

In [0]:
input_frame = [image]
with tf.Session() as sess:
  sess.graph.as_default()
  tf.import_graph_def(graph_def, name = '')

  detections = sess.run(
      [sess.graph.get_tensor_by_name(f'{op}:0') for op in outputs],
      feed_dict={ 'image_tensor:0': input_frame }
  )

In [0]:
detections

In [0]:
# image.shape is (height, width, channels)
height, width, _ = image.shape
num_detections, classes, scores, boxes = detections
for j in range(int(num_detections[0])):

  # Boxes are normalized [ymin, xmin, ymax, xmax]; scale them to pixels
  ymin, xmin, ymax, xmax = boxes[0][j]
  left = int(width * xmin)
  top = int(height * ymin)
  right = int(width * xmax)
  bottom = int(height * ymax)

  # Create Rectangle
  cv.rectangle(image, (left, top),
                      (right, bottom),
                      (255, 0, 255), thickness=2)

# Show the annotated image once, converted from BGR to RGB for matplotlib
plt.imshow(cv.cvtColor(image, cv.COLOR_BGR2RGB))
plt.show()

The cells above run the loaded model on a single image and draw a rectangle around each detection.
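
The pixel coordinates in the drawing loop above come from multiplying normalized box values by the image size. A minimal sketch of that arithmetic follows, assuming boxes arrive as `[ymin, xmin, ymax, xmax]` with values between 0 and 1; the helper name `box_to_pixels` is illustrative.

```python
def box_to_pixels(box, height, width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel corners."""
    ymin, xmin, ymax, xmax = box
    left = int(xmin * width)
    top = int(ymin * height)
    right = int(xmax * width)
    bottom = int(ymax * height)
    return left, top, right, bottom


# A box covering the right half and the middle band of a 640x360 frame
print(box_to_pixels([0.25, 0.5, 0.75, 1.0], height=360, width=640))
# → (320, 90, 640, 270)
```

Because the values are fractions of the image, the same box can be drawn on the original frame by passing the original height and width, regardless of the size the model saw.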

### Preprocessing Images

In [0]:
#use Keras Preprocessing

## Processing the Video

This section reads the original video and its attributes, then creates `output_video`, which collects the frames after boxes and labels have been drawn on them.
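
A minimal sketch of that setup is shown here, assuming OpenCV can open the input file and that the `mp4v` codec is available on your system. The helper names `sampled_frame_indices` and `open_video_pair` are hypothetical, and the step of 75 mirrors the loop in the next cell.

```python
def sampled_frame_indices(total_frames, step=75):
    """Frame numbers visited when keeping every `step`-th frame."""
    return list(range(0, total_frames, step))


def open_video_pair(input_path, output_path):
    """Open the source video and a writer with the same frame size and rate."""
    # Imported here so the sampling helper above works without OpenCV installed
    import cv2 as cv

    input_video = cv.VideoCapture(input_path)
    if not input_video.isOpened():
        raise IOError(f'Could not open {input_path}')

    total_frames = int(input_video.get(cv.CAP_PROP_FRAME_COUNT))
    fps = input_video.get(cv.CAP_PROP_FPS)
    width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))
    height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))

    # 'mp4v' is one commonly available codec; another may suit your system
    fourcc = cv.VideoWriter_fourcc(*'mp4v')
    output_video = cv.VideoWriter(output_path, fourcc, fps, (width, height))
    return input_video, output_video, total_frames


# A 300-frame clip sampled every 75 frames visits four frames
print(sampled_frame_indices(300))
```

The writer must be created with the same frame size as the frames passed to `write`, which is why the width and height are read from the input video.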

In [0]:
# Outputs needed for detection
outputs = (
    'num_detections',
    'detection_classes',
    'detection_scores',
    'detection_boxes' 
)

# Import the graph and open the session once. Rebuilding both for every
# frame is slow and keeps adding duplicate nodes to the default graph.
graph = tf.Graph()
with graph.as_default():
  tf.import_graph_def(graph_def, name='')
sess = tf.Session(graph=graph)
output_tensors = [graph.get_tensor_by_name(f'{op}:0') for op in outputs]

# TQDM is used to create a status bar for those long iterations
for i in tqdm(range(0, total_frames, 75)):

  # Accessing Video and grabbing every 75th frame
  input_video.set(cv.CAP_PROP_POS_FRAMES, i)
  ret, frame = input_video.read()
  if not ret:
    raise Exception(f"Problem reading frame {i} from video")

  # Frame is fed to the model and outputs are saved in the detections list
  detections = sess.run(output_tensors,
                        feed_dict={'image_tensor:0': [frame]})

  # Drawing Box and text labels
  # frame.shape is (height, width, channels)
  height, width, _ = frame.shape
  for j in range(int(detections[0][0])):

    # Bounding box corners; boxes are normalized [ymin, xmin, ymax, xmax]
    left = int(width * detections[3][0][j][1])
    top = int(height * detections[3][0][j][0])
    right = int(width * detections[3][0][j][3])
    bottom = int(height * detections[3][0][j][2])

    # Text coordinate point
    w_center = int(abs(left - right)/2 - 25) + left
    h_center = int(top - 15)
    ind = int(detections[1][0][j])
    text = labels.iloc[ind, 0]

    # Create Rectangle
    cv.rectangle(frame, (left, top),
                        (right, bottom),
                        (255, 0, 255), thickness=2)
    # Draws Text
    cv.putText(frame, text,
               (w_center, h_center), cv.FONT_HERSHEY_COMPLEX,
               1.0, [255, 0, 255], 2)

  # Saves frame to output video
  output_video.write(frame)

#   Used to check on video, makes it much longer to process
#   Use only when needed
#   plt.imshow(frame)
#   plt.show()

# Closes the session and releases the video files
sess.close()
input_video.release()
output_video.release()

In [0]:
cars = cv.VideoCapture('cars-sampled.mp4')

The above two code blocks do quite a bit. The general gist is:
- Grab single frame
- Run it through model
- Use model outputs to draw boxes and text on frame
- Save frame to output video
- Repeat