<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Overview" data-toc-modified-id="Overview-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Overview</a></span><ul class="toc-item"><li><span><a href="#Problem-statement" data-toc-modified-id="Problem-statement-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Problem statement</a></span></li><li><span><a href="#Motivation-and-Applications" data-toc-modified-id="Motivation-and-Applications-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Motivation and Applications</a></span></li><li><span><a href="#Why-Transfer-Learning" data-toc-modified-id="Why-Transfer-Learning-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Why Transfer Learning</a></span></li></ul></li><li><span><a href="#Background" data-toc-modified-id="Background-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Background</a></span><ul class="toc-item"><li><span><a href="#What-is-a-Neural-Network" data-toc-modified-id="What-is-a-Neural-Network-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>What is a Neural Network</a></span></li><li><span><a href="#What-is-Transfer-Learning" data-toc-modified-id="What-is-Transfer-Learning-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>What is Transfer Learning</a></span></li></ul></li><li><span><a href="#Modeling" data-toc-modified-id="Modeling-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Modeling</a></span><ul class="toc-item"><li><span><a href="#Installing-TensorFlow-and-Object-Detection-API" data-toc-modified-id="Installing-TensorFlow-and-Object-Detection-API-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Installing TensorFlow and Object Detection API</a></span></li><li><span><a href="#Preparing-the-Data" data-toc-modified-id="Preparing-the-Data-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Preparing the Data</a></span><ul class="toc-item"><li><span><a href="#Extract-Images-from-Video" data-toc-modified-id="Extract-Images-from-Video-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Extract Images from Video</a></span></li><li><span><a href="#Create-CSV-files-for-training/testing" data-toc-modified-id="Create-CSV-files-for-training/testing-3.2.2"><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Create CSV files for training/testing</a></span></li><li><span><a href="#Convert-CSV-to-TFR-Format" data-toc-modified-id="Convert-CSV-to-TFR-Format-3.2.3"><span class="toc-item-num">3.2.3&nbsp;&nbsp;</span>Convert CSV to TFR Format</a></span></li></ul></li><li><span><a href="#Training-the-Model" data-toc-modified-id="Training-the-Model-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Training the Model</a></span><ul class="toc-item"><li><span><a href="#Preparing-the-environment" data-toc-modified-id="Preparing-the-environment-3.3.1"><span class="toc-item-num">3.3.1&nbsp;&nbsp;</span>Preparing the environment</a></span></li></ul></li></ul></li><li><span><a href="#part-2" data-toc-modified-id="part-2-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>part 2</a></span></li></ul></div>

# Overview

## Problem statement

<p>My goal in this series is to deploy a neural network capable of identifying and localizing pedestrians in an image (the combination of image classification and localization is called object detection). In the first part of the series I will do this by downloading a pretrained model and using transfer learning to fine tune it for my problem. In the second part, I will try to create and deploy a model from scratch.</p>

## Motivation and Applications

<p>My main motivation for this project was simply to gain an understanding of deep learning, a field I knew nothing about prior to this project. Furthermore, object detection problems tend to involve much deeper networks than object classification, and the concepts involved here are highly transferable across other deep learning domains. </p>
<p>Of course, object detection has some pretty cool applications in its own right. For example, security companies build on top of object detection models to do things like gait analysis and tracking people across multiple cameras. Self-driving cars need to be able to perform object detection to avoid hitting pedestrians and other cars. And one could imagine lots of fun home applications for object detection.</p>

## Why Transfer Learning

Practically speaking, you will almost always use transfer learning when dealing with neural network. The reason for this is twofold:
1. Using a pretrained model drastically cuts down the amount of resources needed to fine tune a network. By using a pretrained model you will require fewer training samples and less computing time for a comparable level of accuracy.
2. It is unlikely that you will be creating a deep learning model that is completely new. Most new models are really variations of existing models, so it makes sense to take advantage of existing work
Consequently, transfer learning is one of the most important skills you can have with respect to deep learning

# Background

## What is a Neural Network

Intuitively speaking, we can think of transfer learning in the following way: When a deep learning model is trained for an application it develops insights. For example, a deep learning network trained for image recognition may learn how to recognize different shapes and textures and use those attributes as features. So if we have are performing a similar task on new data we want to keep those ins

## What is Transfer Learning

Transfer learning is essentially repurposing an an existing deep learning model for a new but similar task. 

To understand when and why transfer learning is effective, remember that the layers in a trained model represent features that the model has learned to extract. Furthermore, the features extracted tend to become more particular to the problem as we go into deeper layers. For example, an animal classifier may have a layer that extracts basic shapes from the image, another layer that recognizes textures, and so on. As you can see from the example, the features a model learns to extract are often useful for similar tasks. So if we wanted to create a face detector, we could start from randomized layers and see which features emerged from our model. Or we could take advantage of the fact that the features extracted by our first model are useful for other image classification tasks, and use that as a starting point rather than reinvent the wheel.


We can break down
transfer learning into the following steps:
1. download a model that has already been trained on a dataset
2. make whatever adjustments you want to test on the model (e.g. adjust the learning rate)
3. strip last layers of the existing model and replace them with randomized layers
4. fine-tune layers on new data

A slightly more technical way of looking at transfer learning is from an optimization perspective. Remember, a neural network is composed of layers of weights that transform the data in each layer. And when we train a model, we are moving those weights to a more optimal value through gradient descent. When we use a new model, those weights' values are completely random to start. However, if we assume similar input and output, then the weights of our pretrained model are likely closer to their optimum values than completely random weights. So our pretrained model requires less adjustment (training steps) to be optimized.


# Modeling

## Installing TensorFlow and Object Detection API

## Preparing the Data

In [None]:
#Load Libraries

In [1]:
import cv2
import pandas as pd
import numpy as np
import os
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

<p>I currently have 3 videos of pedestrians using crosswalks in different situations as well as CSV files for each video which give the bounding box information for the pedestrians in the video in YOLO format (x, y, height, width).</p>
<p>My goal in this section is to break down each video into its component frames in jpg format. I then need to create a data frame which contains the following columns: file path, width, height, class, xmin, xmax, ymin, ymax</p>
<p>It is also worth noting that all of my videos have the same format, so I don't need to do any image resizing. However, if I wanted to generalize a pipeline, I could perform a combination zero-padding and scaling to make sure all images were the same size.</p>

### Extract Images from Video

In [3]:
#get list of video files
dataPath = 'data/'
dataFiles = os.listdir(dataPath)
videoFiles = [dataPath+file for file in dataFiles if file.endswith('.avi')]

In [None]:
#define function to turn video into images
def frame_capture(path): 
    
    cap = cv2.VideoCapture(path) 
    currentFrame = 0
    directory = path.strip('.avi')
    try:
        if not os.path.exists(directory):
            os.makedirs(directory)
    except OSError:
        print ('Error: Creating directory of data')

    
    #while(True):
    while (currentFrame < 1300):
        # Capture frame-by-frame
        ret, frame = cap.read()

        # Saves image of the current frame in jpg file
        name = directory + '/frame' + str(currentFrame) + '.jpg'
        cv2.imwrite(name, frame)

        # To stop duplicate images
        currentFrame += 1

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
  

In [None]:
#turn each video file into a directory of image files
for file in videoFiles:
    frame_capture(file)

### Create CSV files for training/testing

In [4]:
boundingBoxFiles = ['data/night.csv', 'data/fourway.csv', 'data/crosswalk.csv']    

In [5]:
#Reformat CSV data into required format
pedestrian_labels = pd.DataFrame()
for file in boundingBoxFiles:
    name = file.replace('/','.').split('.')[1]
    df = pd.read_csv(file)
    new_df = pd.DataFrame()
    #create columns for new dataframe
    new_df['filename'] = df.index.astype(str)
    new_df['filename'] = 'data/'+ name+ '/frame'+ new_df['filename']+ '.jpg'
    new_df['width'] = df.w
    new_df['height'] = df.h
    new_df['class'] = 'pedestrian'
    new_df['xmin'] = df.x
    new_df['ymin'] = df.y-df.h
    new_df['xmax'] = df.x + df.w
    new_df['ymax'] = df.y
    #store to central data frame
    pedestrian_labels = pedestrian_labels.append(new_df)

In [11]:
#split data into training and testing set, then save file
train_labels = pedestrian_labels.sample(frac=0.8,random_state=17)
test_labels = pedestrian_labels.drop(train_labels.index)
pedestrian_labels.to_csv('data/pedestrian_labels.csv', index=False)
train_labels.to_csv('data/train_labels.csv', index=False)
test_labels.to_csv('data/test_labels.csv', index=False)

In [10]:
test_labels.head()

Unnamed: 0,filename,width,height,class,xmin,ymin,xmax,ymax
118,data/night/frame118.jpg,156,312,pedestrian,1631,161,1787,473
400,data/night/frame400.jpg,172,345,pedestrian,1199,139,1371,484
439,data/night/frame439.jpg,145,291,pedestrian,1040,179,1185,470
452,data/night/frame452.jpg,143,286,pedestrian,966,183,1109,469
455,data/night/frame455.jpg,141,283,pedestrian,946,183,1087,466


### Convert CSV to TFR Format

In order to convert my files to TensorFlowRecords format, I borrowed this <a href="https://github.com/datitran/raccoon_dataset/blob/master/generate_tfrecord.py">script</a> which comes from the guide I used to learn the object detection API, see this <a href=https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9>link</a> for reference. I simply had to change the file paths to correspond to where I was storing my data, and change "racoon" to "pedestrian."

## Training the Model

### Preparing the environment

<p> If you have not installed the object detection API already, you can follow this <a href="https://medium.com/@viviennediegoencarnacion/how-to-setup-tensorflow-object-detection-on-mac-a0b72fbf470a">guide</a>. Once you have it installed, you can choose a pre-trained model from <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">here</a>. I chose this <a href="http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz">model</a> and <a href=https://github.com/tensorflow/models/blob/a4944a57ad2811e1f6a7a87589a9fc8a776e8d3c/object_detection/samples/configs/ssd_mobilenet_v1_pets.config>checkpoint</a> to get started.</p>
<p>Once I had everything downloaded, I created a folder named "training", and created a file named "label_map.pbtxt." I then placed moved the config file from the tar folder containing my pre-trained model into "training" adjusted the number of classes to 1, reduced the batch size to accommodate for my machine's memory, and changed the filepaths for the testing data, training data, and the label map. Finally, I made a copy "train.py" from "TensorFlow/models/research/object_detection/legacy/train.py" inside my working directory</p>


In [12]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
tf.enable_eager_execution()

import numpy as np
import IPython.display as display

In [16]:
filenames = ['data/train.record']
raw_image_dataset = tf.data.TFRecordDataset(filenames)

In [17]:
# Create a dictionary describing the features.  
image_feature_description = {
    'height': tf.FixedLenFeature([], tf.int64),
    'width': tf.FixedLenFeature([], tf.int64),
    'depth': tf.FixedLenFeature([], tf.int64),
    'label': tf.FixedLenFeature([], tf.int64),
    'image_raw': tf.FixedLenFeature([], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.Example proto using the dictionary above.
  return tf.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
parsed_image_dataset

<MapDataset shapes: {depth: (), height: (), image_raw: (), label: (), width: ()}, types: {depth: tf.int64, height: tf.int64, image_raw: tf.string, label: tf.int64, width: tf.int64}>

In [18]:
for image_features in parsed_image_dataset.take(1):
  image_raw = image_features['image_raw'].numpy()
  display.display(display.Image(data=image_raw))

InvalidArgumentError: Feature: depth (data type: int64) is required but could not be found.
	 [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_INT64], dense_keys=["depth", "height", "image_raw", "label", "width"], dense_shapes=[[], [], [], [], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1, ParseSingleExample/Const_2, ParseSingleExample/Const_3, ParseSingleExample/Const_4)]] [Op:IteratorGetNextSync]

In [None]:
import matplotlib
matplotlib.use('TkAgg')
%matplotlib inline

# part 2