<a href="https://colab.research.google.com/github/kylehounslow/gdg_workshop/blob/master/notebooks/keras_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object Detection using "Black-Box" YOLOv3 trained on COCO  
In this notebook we will leverage a pretrained object detection network [YOLOv3](https://pjreddie.com/darknet/yolo/).  
In the [gdg_workshop repository](https://github.com/kylehounslow/gdg_workshop) I've written a few Python modules that take care of the model setup so we can make predictions right away.  
___
Topics covered in this notebook:  
* Importing code from GitHub into Colab  
* Downloading imagery and video into Colab  
* Object detection with YOLOv3  

## Make sure our runtime has a GPU attached using the `nvidia-smi` shell command

In [0]:
!nvidia-smi
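
The same check can also be done from Python, which is handy if a later cell should fail early when no GPU is attached. The following is a minimal sketch, not part of the workshop code: the helper name `gpu_visible` is hypothetical, and it only assumes that `nvidia-smi` is on the `PATH` when a GPU runtime is selected.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is installed and reports success."""
    # Without the binary there is no NVIDIA driver available to this runtime.
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` lists the attached GPUs and exits non-zero on failure.
    result = subprocess.run(
        ["nvidia-smi", "-L"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False` in Colab, switch the runtime type to GPU and rerun the notebook.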

### Start by cloning the `gdg_workshop` repository to our Colab instance and installing the required libraries

In [0]:
!rm -rf gdg_workshop && git clone --quiet https://github.com/kylehounslow/gdg_workshop.git
!pip install --quiet -r gdg_workshop/requirements_colab.txt
!pip --quiet install youtube-dl
!apt-get -qq install ffmpeg

In [0]:
# Each `!` command runs in its own subshell, so there is no need to `cd ..` afterwards
!cd gdg_workshop/ && git pull --quiet origin master

### Import the YOLOV3 module and some other dependencies

In [0]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import youtube_dl
from google.colab.files import download
from gdg_workshop.models.keras_yolov3 import YOLOV3
from gdg_workshop import util
plt.style.use('default')
%matplotlib inline
%load_ext autoreload
%autoreload 2

### Instantiate the detector  
The `__init__` method of `YOLOV3` takes care of downloading the pretrained model weights and setting up the variables needed for inference.  
See the code [here](https://github.com/kylehounslow/gdg_workshop/blob/master/models/keras_yolov3/src/yolo.py)
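
To make the "download the weights once, then reuse them" idea concrete, here is a minimal sketch of that caching pattern. It is an illustration only and not the code used inside `YOLOV3`: the helper name `ensure_file` and the example URL and path in the comment are hypothetical.

```python
import os
import urllib.request


def ensure_file(url, path):
    """Download `url` to `path` unless `path` already exists; return `path`."""
    if not os.path.exists(path):
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file at the final location.
        tmp_path = path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, path)
    return path


# Hypothetical usage (URL and filename are placeholders, not the real weights):
# weights_path = ensure_file("https://example.com/model_weights.h5", "model_weights.h5")
```

With a pattern like this, the first call pays the download cost and every later call returns immediately.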

In [0]:
detector = YOLOV3()

## Visualize the YOLOV3 Model Architecture
**Spoiler alert:** It is insane

In [0]:
import numpy as np
import tensorflow as tf
from IPython.display import HTML, display
# NOTE: tf.GraphDef and tf.get_default_graph used below are TensorFlow 1.x APIs
def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = tf.compat.as_bytes("<stripped %d bytes>"%size)
    return strip_def
  
def rename_nodes(graph_def, rename_func):
    res_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = res_def.node.add() 
        n.MergeFrom(n0)
        n.name = rename_func(n.name)
        for i, s in enumerate(n.input):
            n.input[i] = rename_func(s) if s[0]!='^' else '^'+rename_func(s[1:])
    return res_def
  
def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))
  
    iframe = """
        <iframe seamless style="width:800px;height:620px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))
show_graph(tf.get_default_graph())

## Download an image from a URL and run detection
The following cell defines a form widget for the image URL, downloads the image, and runs inference on it.

In [0]:
#@title ## Detect objects from image url
#@markdown ### Enter an image URL:
url = "https://i.ytimg.com/vi/gcI1BP1SlCk/maxresdefault.jpg" #@param {type:"string"}
img = util.download_image(url)
from PIL import Image
detections = detector.detect(image=img)
img_draw = detector.draw_detections(img, detections)
plt.figure(figsize=(18,12))
plt.imshow(img_draw)

### Define a function to upload an image and run detection 

In [0]:
from PIL import Image
from IPython.display import display
from google.colab import files
def upload_detect_show(detector):
    """Upload one or more images, run detection and display each result."""
    uploaded_files = files.upload()
    for image_filename in uploaded_files.keys():
        image = Image.open(image_filename)
        detections = detector.detect(image=image)
        img_draw = detector.draw_detections(image, detections)
        # Display inside the loop so every uploaded image is shown, not just the first
        display(Image.fromarray(img_draw))
upload_detect_show(detector=detector)

## Images are a bit boring... Let's try on video!

### Download video from a YouTube URL

In [0]:
#@title ## Detect objects from video url
#@markdown ### Enter a video URL:
video_url = "https://www.youtube.com/watch?v=brYP3XVf_sM" #@param {type:"string"}
video_filename = 'video.mp4'
!rm -f $video_filename
ydl_opts = {
    'outtmpl' : video_filename,
    'nooverwrites': False}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
  ydl.download([video_url])

### Use OpenCV (ffmpeg backend) to open the video, loop through the frames and perform detection  
To produce the output video, we save each frame to disk as we perform detection.  
Once all frames are processed, we compile them into a video using `frames_to_video()` (found in [util](https://github.com/kylehounslow/gdg_workshop/blob/master/util.py))
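
One detail to watch when compiling frames named `frame0.jpg`, `frame1.jpg`, ... is ordering: a plain alphabetical sort puts `frame10.jpg` before `frame2.jpg`. I have not checked how `frames_to_video()` orders its input, so the following is only a sketch of numeric ordering with hypothetical helper names (`frame_index`, `sort_frames`), not a description of the util module.

```python
import re


def frame_index(filename):
    """Extract the integer frame number from a name like 'frame12.jpg'."""
    match = re.search(r"(\d+)", filename)
    if match is None:
        raise ValueError("no frame number in {!r}".format(filename))
    return int(match.group(1))


def sort_frames(filenames):
    """Return the filenames ordered by frame number rather than alphabetically."""
    return sorted(filenames, key=frame_index)


names = ["frame10.jpg", "frame2.jpg", "frame0.jpg", "frame1.jpg"]
print(sorted(names))       # alphabetical: frame10.jpg lands before frame2.jpg
print(sort_frames(names))  # numeric: frame0, frame1, frame2, frame10
```

An alternative is to zero-pad the frame number when writing the files (for example `'frame{:06d}.jpg'`), which makes alphabetical and numeric order agree.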

In [0]:
import os
from IPython.display import clear_output
from datetime import datetime
from tqdm import tqdm
video_basename, video_ext = os.path.splitext(video_filename)
output_frames_dir = os.path.basename(video_basename) + '_frames/'
! rm -rf $output_frames_dir && mkdir $output_frames_dir
vc = cv2.VideoCapture()
vc.open(video_filename)
frame_count = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
video_fps = int(vc.get(cv2.CAP_PROP_FPS))
MAX_FRAMES = 1200  # limit our frames if video is quite long
for curr_frame in tqdm(range(min(frame_count, MAX_FRAMES))):
  ok, img = vc.read()
  if not ok:  # stop early if a frame cannot be decoded
    break
  # OpenCV reads frames as BGR; the detector is given an RGB copy
  detections = detector.detect(image=cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
  # Draw on the original BGR frame so cv2.imwrite saves the correct colors
  img_draw = detector.draw_detections(img, detections)
  img_filename = os.path.join(output_frames_dir, 'frame{}.jpg'.format(curr_frame))
  cv2.imwrite(img_filename, img_draw)
vc.release()  # always release the capture, not only when the frame limit is hit

## Compile the frames into a video and download it
TODO: support mp4 video output (only mkv is supported at the moment)

In [0]:
output_video_filename = './output.mkv'
!rm -f $output_video_filename
util.frames_to_video(input_dir=output_frames_dir,
                    output_file=output_video_filename, # NOTE: only mkv output supported at the moment. TODO: mp4
                    fps=video_fps)
download(output_video_filename)