# **Object Detection and Segmentation on Videos**


## **Note:** Keras and Scipy may need to be downgraded for some of these code blocks to work. 

**Keras:** https://github.com/matterport/Mask_RCNN/issues/694

**Scipy:** https://stackoverflow.com/questions/56204985/how-to-fix-scipy-misc-has-no-attribute-imresize/56205147

The videos were uploaded to Google Drive, and the Drive was then mounted in Colab to access them.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Navigating to the directory where we have the videos stored.

In [1]:
cd /content/gdrive/My Drive/Chopper_Videos

/content/gdrive/My Drive/Chopper_Videos


In [2]:
#Running this code to confirm TensorFlow can see the GPU

import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [0]:
#Checking the GPU Model being used

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 14133087383910997459, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 6138563292070905991
 physical_device_desc: "device: XLA_CPU device", name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 3424987718634152524
 physical_device_desc: "device: XLA_GPU device", name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 15956161332
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 730707863069908659
 physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"]

# **Mask R-CNN Demo**

The demo is based on the matterport Mask R-CNN GitHub repository, an implementation of Mask R-CNN in Keras and TensorFlow. For each detected object, it generates not only a bounding box but also a segmentation mask over the object's area.
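Each call to `model.detect()` returns one result dictionary per input image, and the `display_instances()` helper later in this notebook reads four keys from it. The sketch below builds a small synthetic dictionary with the shapes we assume from matterport's code (`rois` as `[N, 4]` boxes in `(y1, x1, y2, x2)` order, `masks` as `[H, W, N]`, `class_ids` and `scores` as `[N]`) and walks it the same way. `summarize` and the toy values are hypothetical, not output from the model.

```python
import numpy as np

# Synthetic stand-in for one entry of the list returned by model.detect().
# Shapes assumed from matterport's Mask R-CNN code; values are made up.
H, W = 4, 6
r = {
    "rois": np.array([[0, 0, 2, 3], [1, 2, 4, 6]]),  # [N, (y1, x1, y2, x2)]
    "masks": np.zeros((H, W, 2), dtype=bool),        # [H, W, N]
    "class_ids": np.array([1, 3]),                   # [N], index into class_names
    "scores": np.array([0.98, 0.81]),                # [N], confidence
}
r["masks"][0:2, 0:3, 0] = True  # instance 0 covers a 2x3 patch
r["masks"][1:4, 2:6, 1] = True  # instance 1 covers a 3x4 patch

class_names = ["BG", "person", "bicycle", "car"]  # shortened list for the demo


def summarize(r, class_names):
    """Return (label, score, box, mask_area) for each detected instance."""
    rows = []
    for i in range(r["rois"].shape[0]):
        y1, x1, y2, x2 = (int(v) for v in r["rois"][i])
        label = class_names[r["class_ids"][i]]
        area = int(r["masks"][:, :, i].sum())  # number of mask pixels
        rows.append((label, float(r["scores"][i]), (y1, x1, y2, x2), area))
    return rows


for row in summarize(r, class_names):
    print(row)
```

Instance `i` is always the `i`-th row of `rois`, `class_ids` and `scores` and the `i`-th slice along the last axis of `masks`, which is why `display_instances()` can index all four with the same `i`.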

# Install Dependencies and Run the Demo

Mask R-CNN has some dependencies to install before we can run the demo. Colab lets you install Python packages through pip and general Linux packages and libraries through apt-get.

In case you didn't know, your Google Colab instance runs on an Ubuntu virtual machine, so you can run almost any command you would normally use on a Linux machine.

Mask R-CNN depends on pycocotools, which we install with the following cells.

# Install pycocotools

In [3]:
!pip install Cython



In [4]:
!ls

coco	      IMG_0101.MOV  IMG_0110.MOV  IMG_0139.MOV
IMG_0100.MOV  IMG_0109.MOV  IMG_0114.MOV  Mask_RCNN


In [5]:
!git clone https://github.com/waleedka/coco

fatal: destination path 'coco' already exists and is not an empty directory.


In [6]:
!pip install -U setuptools
!pip install -U wheel
!make install -C coco/PythonAPI

Requirement already up-to-date: setuptools in /usr/local/lib/python3.6/dist-packages (46.0.0)
Requirement already up-to-date: wheel in /usr/local/lib/python3.6/dist-packages (0.34.2)
make: Entering directory '/content/gdrive/My Drive/Chopper_Videos/coco/PythonAPI'
# install pycocotools to the Python site-packages
python setup.py build_ext install
running build_ext
building 'pycocotools._mask' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pycocotools
creating build/common
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/numpy/core/include -I../common -I/usr/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -Wno-cpp -Wno-unused-function -std=c99
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Wer

# Git Clone the code


In [7]:
!git clone https://github.com/matterport/Mask_RCNN

fatal: destination path 'Mask_RCNN' already exists and is not an empty directory.


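The pycocotools cells above clone the coco repository from GitHub, install the build dependencies (Cython, setuptools, wheel), and finally build and install the COCO API library.

All of this happens on the cloud virtual machine and is quite fast.

With the Mask_RCNN repo cloned, we can now cd into its directory, check out the commit used in this walkthrough, and download the pretrained COCO weights.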

# cd to the code directory and optionally download the weights file

In [8]:
import os
os.chdir('./Mask_RCNN')
!git checkout 555126ee899a144ceff09e90b5b2cf46c321200c
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

HEAD is now at 555126e Balloon Color Splash sample.
--2020-03-17 13:34:26--  https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
Resolving github.com (github.com)... 140.82.118.4
Connecting to github.com (github.com)|140.82.118.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/107595270/872d3234-d21f-11e7-9a51-7b4bc8075835?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200317%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200317T133426Z&X-Amz-Expires=300&X-Amz-Signature=20877f4aa3fa42548c12f17e007e5b1f3ce717bb05d79327f62f66053d541eef&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmask_rcnn_coco.h5&response-content-type=application%2Foctet-stream [following]
--2020-03-17 13:34:26--  https://github-production-release-asset-2e65be.s3.amazonaws.com/107595270/872d3234-d21f-11e7-9a51-7b4bc8075835?X-A

In [9]:
!ls

assets		       LICENSE		    README.md
coco.py		       mask_rcnn_coco.h5    samples
config.py	       mask_rcnn_coco.h5.1  shapes.py
demo.ipynb	       mask_rcnn_coco.h5.2  train_shapes.ipynb
images		       mask_rcnn_coco.h5.3  utils.py
inspect_data.ipynb     model.py		    videos
inspect_model.ipynb    parallel_model.py    visualize.py
inspect_weights.ipynb  __pycache__
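The listing shows `mask_rcnn_coco.h5.1` to `mask_rcnn_coco.h5.3`: by default wget does not overwrite an existing file, so each re-run of the download cell saved another full copy under a numbered name. Passing `-nc` (`--no-clobber`) to wget skips the download when the file already exists. The same idea in Python, as a sketch: `download_if_missing` is a hypothetical helper, and the URL is the one used in the wget cell.

```python
import os
import urllib.request

# URL taken from the wget cell above.
WEIGHTS_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"


def download_if_missing(path, url=WEIGHTS_URL):
    """Download url to path only if path does not exist yet.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    # Download under a temporary name first, so an interrupted transfer does
    # not leave a truncated file that looks complete on the next run.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return True
```

In the notebook this would be called as `download_if_missing('mask_rcnn_coco.h5')` from inside the `Mask_RCNN` directory.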


In [10]:
!ls ./videos

IMG_0100.MOV  save


# Processing Videos

Processing a video file takes three steps:

1. Split the video into image frames.

2. Run Mask R-CNN on the frames.

3. Turn the processed frames into an output video.

Processing the whole video one frame at a time would take a long time, so instead we use the GPU to process several frames in parallel, as one batch.
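Grouping frames into fixed-size batches can be sketched independently of OpenCV and the model. This sketch assumes frames arrive as any Python iterable; `batched_frames` and `pad_batch` are hypothetical helpers, not part of the Mask R-CNN repo. Because matterport's `model.detect()` asserts that it receives exactly `BATCH_SIZE` images, a short final batch has to be padded and the extra results discarded.

```python
from itertools import islice


def batched_frames(frames, batch_size):
    """Yield lists of up to batch_size items; the last batch may be shorter."""
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def pad_batch(batch, batch_size):
    """Pad a short batch by repeating its last item.

    Returns the padded batch and the number of real items, so the caller can
    keep only the first n_real results.
    """
    n_real = len(batch)
    return batch + [batch[-1]] * (batch_size - n_real), n_real
```

For example, `list(batched_frames(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`, and `pad_batch([6], 3)` gives `([6, 6, 6], 1)`.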

The Mask R-CNN pipeline is computationally intensive and uses a lot of GPU memory. On the Colab GPU assigned to this run (a Tesla P100 with 16 GB, according to the device list above), we found that 3 images at a time can be processed safely; larger batches may crash the notebook in the middle of processing the video.

So in the code below, we set batch_size to 3 and use the cv2 library to stage 3 frames at a time before passing them to the model.

In [12]:
import cv2
import numpy as np


def random_colors(N):
    np.random.seed(1)
    colors = [tuple(255 * np.random.rand(3)) for _ in range(N)]
    return colors


def apply_mask(image, mask, color, alpha=0.5):
    """apply mask to image"""
    for n, c in enumerate(color):
        image[:, :, n] = np.where(
            mask == 1,
            image[:, :, n] * (1 - alpha) + alpha * c,
            image[:, :, n]
        )
    return image


def display_instances(image, boxes, masks, ids, names, scores):
    """
        take the image and results and apply the mask, box, and Label
    """
    n_instances = boxes.shape[0]
    colors = random_colors(n_instances)

    if not n_instances:
        print('NO INSTANCES TO DISPLAY')
    else:
        assert boxes.shape[0] == masks.shape[-1] == ids.shape[0]

    for i, color in enumerate(colors):
        if not np.any(boxes[i]):
            continue

        y1, x1, y2, x2 = boxes[i]
        label = names[ids[i]]
        score = scores[i] if scores is not None else None
        caption = '{} {:.2f}'.format(label, score) if score else label
        mask = masks[:, :, i]

        image = apply_mask(image, mask, color)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        image = cv2.putText(
            image, caption, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 0.7, color, 2
        )

    return image


if __name__ == '__main__':
    """
        test everything
    """
    import os
    import sys
    import coco
    import utils
    import model as modellib
    
    # A batch of 3 frames fit in memory on the Colab GPU used for this run (Tesla P100, 16 GB).
    batch_size = 3

    ROOT_DIR = os.getcwd()
    MODEL_DIR = os.path.join(ROOT_DIR, "logs")
    VIDEO_DIR = os.path.join(ROOT_DIR, "videos")
    VIDEO_SAVE_DIR = os.path.join(VIDEO_DIR, "save")
    COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
    if not os.path.exists(COCO_MODEL_PATH):
        utils.download_trained_weights(COCO_MODEL_PATH)

    class InferenceConfig(coco.CocoConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = batch_size

    config = InferenceConfig()
    config.display()

    model = modellib.MaskRCNN(
        mode="inference", model_dir=MODEL_DIR, config=config
    )
    model.load_weights(COCO_MODEL_PATH, by_name=True)
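    # COCO class names in the order the pretrained weights expect. Four labels
    # were renamed for this project: 'chopper' (COCO 'boat'), 'House' ('umbrella'),
    # 'Helipad Number' ('frisbee') and 'Helipad' ('surfboard'). Only the displayed
    # caption changes; the model itself still detects the original COCO classes.
    class_names = [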
        'BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
        'bus', 'train', 'truck', 'chopper', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
        'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
        'zebra', 'giraffe', 'backpack', 'House', 'handbag', 'tie',
        'suitcase', 'Helipad Number', 'skis', 'snowboard', 'sports ball',
        'kite', 'baseball bat', 'baseball glove', 'skateboard',
        'Helipad', 'tennis racket', 'bottle', 'wine glass', 'cup',
        'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
        'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
        'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
        'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
        'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
        'teddy bear', 'hair drier', 'toothbrush'
    ]

    capture = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'IMG_0100.MOV'))
    try:
        if not os.path.exists(VIDEO_SAVE_DIR):
            os.makedirs(VIDEO_SAVE_DIR)
    except OSError:
        print ('Error: Creating directory of data')
    frames = []
    frame_count = 0
    # These two lines only apply to a live 1080p camera capture, so they stay commented out when reading a video file.
    #capture.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
    #capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)

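    while True:
        ret, frame = capture.read()
        if ret:
            # Save each frame of the video to a list
            frame_count += 1
            frames.append(frame)
            print('frame_count :{0}'.format(frame_count))

        # Run detection when a batch is full, or flush the final partial batch
        # once the video ends (otherwise up to batch_size - 1 trailing frames
        # would never be processed).
        if len(frames) == batch_size or (not ret and frames):
            n_real = len(frames)
            # model.detect() expects exactly batch_size images, so pad a short
            # final batch with copies of its last frame.
            batch = frames + [frames[-1]] * (batch_size - n_real)
            results = model.detect(batch, verbose=0)
            print('Predicted')
            # zip() stops after the n_real real frames, dropping padded results
            for i, (frame, r) in enumerate(zip(frames, results)):
                frame = display_instances(
                    frame, r['rois'], r['masks'], r['class_ids'], class_names, r['scores']
                )
                name = '{0}.jpg'.format(frame_count - n_real + i)
                name = os.path.join(VIDEO_SAVE_DIR, name)
                cv2.imwrite(name, frame)
                print('writing to file:{0}'.format(name))
            # Clear the frames array to start the next batch
            frames = []

        # Bail out when the video file ends
        if not ret:
            break

    capture.release()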


Configurations:
BACKBONE                       resnet101
BACKBONE_SHAPES                [[256 256]
 [128 128]
 [ 64  64]
 [ 32  32]
 [ 16  16]]
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     3
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
GPU_COUNT                      1
IMAGES_PER_GPU                 3
IMAGE_MAX_DIM                  1024
IMAGE_MIN_DIM                  800
IMAGE_PADDING                  True
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENC

After running this code, we should now have all processed image files in one folder ./videos/save.

In [13]:
!ls ./videos/save

0.jpg	 168.jpg  235.jpg  302.jpg  370.jpg  438.jpg  505.jpg  573.jpg	640.jpg
100.jpg  169.jpg  236.jpg  303.jpg  371.jpg  439.jpg  506.jpg  574.jpg	641.jpg
101.jpg  16.jpg   237.jpg  304.jpg  372.jpg  43.jpg   507.jpg  575.jpg	642.jpg
102.jpg  170.jpg  238.jpg  305.jpg  373.jpg  440.jpg  508.jpg  576.jpg	643.jpg
103.jpg  171.jpg  239.jpg  306.jpg  374.jpg  441.jpg  509.jpg  577.jpg	644.jpg
104.jpg  172.jpg  23.jpg   307.jpg  375.jpg  442.jpg  50.jpg   578.jpg	645.jpg
105.jpg  173.jpg  240.jpg  308.jpg  376.jpg  443.jpg  510.jpg  579.jpg	646.jpg
106.jpg  174.jpg  241.jpg  309.jpg  377.jpg  444.jpg  511.jpg  57.jpg	647.jpg
107.jpg  175.jpg  242.jpg  30.jpg   378.jpg  445.jpg  512.jpg  580.jpg	648.jpg
108.jpg  176.jpg  243.jpg  310.jpg  379.jpg  446.jpg  513.jpg  581.jpg	649.jpg
109.jpg  177.jpg  244.jpg  311.jpg  37.jpg   447.jpg  514.jpg  582.jpg	64.jpg
10.jpg	 178.jpg  245.jpg  312.jpg  380.jpg  448.jpg  515.jpg  583.jpg	650.jpg
110.jpg  179.jpg  246.jpg  313.jpg  381.jpg  449.jpg  516
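The `apply_mask` helper in the processing cell blends each masked pixel with the instance color, channel by channel: new = (1 - alpha) * pixel + alpha * color. The sketch below restates that loop and compares it with a hypothetical vectorized variant that blends all three channels of the masked pixels in one step. Both truncate back to uint8, so 100 blended with 51 at alpha = 0.5 gives 75, not 76.

```python
import numpy as np


def apply_mask_loop(image, mask, color, alpha=0.5):
    """Per-channel blend, restating the notebook's apply_mask."""
    for n, c in enumerate(color):
        image[:, :, n] = np.where(
            mask == 1,
            image[:, :, n] * (1 - alpha) + alpha * c,
            image[:, :, n],
        )
    return image


def apply_mask_vectorized(image, mask, color, alpha=0.5):
    """Hypothetical variant: blend all channels of the masked pixels at once."""
    out = image.astype(np.float64)  # work in float, like the loop's np.where
    m = mask.astype(bool)           # (H, W) boolean selector
    out[m] = out[m] * (1 - alpha) + alpha * np.asarray(color, dtype=np.float64)
    # Casting back truncates toward zero, matching assignment into a uint8 image.
    return out.astype(image.dtype)


image = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)  # only pixel (0, 0) is masked
color = (200.0, 0.0, 51.0)

a = apply_mask_loop(image.copy(), mask, color)
b = apply_mask_vectorized(image.copy(), mask, color)
print(a[0, 0].tolist(), a[0, 1].tolist())  # [150, 50, 75] [100, 100, 100]
```

The vectorized form avoids a Python loop over channels, although for three channels the difference in speed is small next to the cost of running the model.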

The next step is easy: we generate the new video from those images, using cv2's VideoWriter.

But there are two things we want to make sure of:

1. The frames are ordered the same way they were extracted from the original video (or in reverse, if we prefer to watch the video backward).

import glob
import os

# Get all image file paths to a list.
images = list(glob.iglob(os.path.join(VIDEO_SAVE_DIR, '*.jpg')))
# Sort the images numerically by frame index, so that 10.jpg comes after 9.jpg.
images = sorted(images, key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))

2. The frame rate matches the original video's. We can check a video's frame rate with the following code, or simply look at the file's properties.

video = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'trailer1.mp4'))

# Find OpenCV version
(major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.')

if int(major_ver) < 3:
    fps = video.get(cv2.cv.CV_CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.cv.CV_CAP_PROP_FPS): {0}".format(fps))
else:
    fps = video.get(cv2.CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.CAP_PROP_FPS) : {0}".format(fps))

video.release()
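For the first point, the sort key matters because file names compare as text: the `!ls ./videos/save` listing above puts `10.jpg` after `109.jpg`. A small sketch with hypothetical file names compares a plain sort with a numeric key (`frame_index` is a name introduced here); passing `reverse=True` to `sorted` gives the backward order.

```python
import os


def frame_index(path):
    """Numeric frame index from a file name such as 'save/123.jpg'."""
    return int(os.path.splitext(os.path.basename(path))[0])


names = ["save/10.jpg", "save/2.jpg", "save/1.jpg", "save/100.jpg"]

text_order = sorted(names)                      # compares characters
frame_order = sorted(names, key=frame_index)    # compares integers
backward = sorted(names, key=frame_index, reverse=True)

print(text_order)   # ['save/1.jpg', 'save/10.jpg', 'save/100.jpg', 'save/2.jpg']
print(frame_order)  # ['save/1.jpg', 'save/2.jpg', 'save/10.jpg', 'save/100.jpg']
```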

In [14]:
video = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'IMG_0100.MOV'))

# Find OpenCV version
(major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.')

if int(major_ver) < 3:
    fps = video.get(cv2.cv.CV_CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.cv.CV_CAP_PROP_FPS): {0}".format(fps))
else:
    fps = video.get(cv2.CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.CAP_PROP_FPS) : {0}".format(fps))

video.release()

Frames per second using video.get(cv2.CAP_PROP_FPS) : 29.973238180196255
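The measured rate is not exactly 30 fps. Playback length is the frame count divided by the frame rate, so writing the frames at a different rate than the source stretches or compresses the video. A quick check of the size of the effect, using the rate measured above and a hypothetical count of 900 frames:

```python
def duration_seconds(n_frames, fps):
    """Playback length in seconds of n_frames shown at fps frames per second."""
    return n_frames / fps


src_fps = 29.973238180196255  # measured for IMG_0100.MOV above
n_frames = 900                # hypothetical frame count, for illustration only

at_source_rate = duration_seconds(n_frames, src_fps)  # about 30.027 s
at_30 = duration_seconds(n_frames, 30)                # exactly 30.0 s
print(at_source_rate - at_30)  # about 0.027 s, i.e. roughly 0.09% shorter at 30 fps
```

For a short clip the difference is under one frame, but it grows linearly with the length of the video.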


Finally, here is the code to generate the video from the processed image frames.

In [15]:
def make_video(outvid, images=None, fps=30, size=None,
               is_color=True, format="FMP4"):
    """
    Create a video from a list of images.
 
    @param      outvid      output video
    @param      images      list of images to use in the video
    @param      fps         frame per second
    @param      size        size of each frame
    @param      is_color    color
    @param      format      see http://www.fourcc.org/codecs.php
    @return                 see http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html
 
    The function relies on http://opencv-python-tutroals.readthedocs.org/en/latest/.
    By default, the video will have the size of the first image.
    It will resize every image to this size before adding them to the video.
    """
    from cv2 import VideoWriter, VideoWriter_fourcc, imread, resize
    fourcc = VideoWriter_fourcc(*format)
    vid = None
    for image in images:
        if not os.path.exists(image):
            raise FileNotFoundError(image)
        img = imread(image)
        if vid is None:
            if size is None:
                size = img.shape[1], img.shape[0]
            vid = VideoWriter(outvid, fourcc, float(fps), size, is_color)
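        # Resize if either dimension differs: size is (width, height), img.shape is (height, width, channels)
        if size[0] != img.shape[1] or size[1] != img.shape[0]: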
            img = resize(img, size)
        vid.write(img)
    if vid is not None:  # vid stays None when images is empty
        vid.release()
    return vid

import glob
import os

# Directory of images to run detection on
ROOT_DIR = os.getcwd()
VIDEO_DIR = os.path.join(ROOT_DIR, "videos")
VIDEO_SAVE_DIR = os.path.join(VIDEO_DIR, "save")
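images = list(glob.iglob(os.path.join(VIDEO_SAVE_DIR, '*.jpg')))
# Sort the images by integer frame index (e.g. 10.jpg after 9.jpg)
images = sorted(images, key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))

outvid = os.path.join(VIDEO_DIR, "out.mp4")
# Match the frame rate measured from the source video in the previous cell
make_video(outvid, images, fps=fps)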

<VideoWriter 0x7efe2529da10>
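`make_video` compares the target `size` with each image's shape. OpenCV sizes are `(width, height)` while NumPy image shapes are `(height, width, channels)`, and OpenCV's VideoWriter may silently skip frames whose size differs from the one it was opened with, so the resize check should fire when either dimension differs. A sketch of that check as a standalone function (`needs_resize` is a hypothetical name), including a case where joining the comparisons with `and` would miss a mismatch:

```python
def needs_resize(size, shape):
    """True if an image of NumPy shape (h, w, ...) does not match OpenCV size (w, h)."""
    width, height = size
    return shape[1] != width or shape[0] != height


def needs_resize_and(size, shape):
    """Same comparison joined with `and`: only fires when BOTH dimensions differ."""
    width, height = size
    return shape[1] != width and shape[0] != height


size = (1920, 1080)           # OpenCV (width, height)
same = (1080, 1920, 3)        # NumPy (height, width, channels)
width_only = (1080, 1280, 3)  # same height, different width

print(needs_resize(size, same))            # False
print(needs_resize(size, width_only))      # True
print(needs_resize_and(size, width_only))  # False: the mismatch is missed
```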

In [16]:
!ls -alh ./videos/

total 120M
-rw------- 1 root root  42M Mar 17 13:27 IMG_0100.MOV
-rw------- 1 root root  78M Mar 17 14:06 out.mp4
drwx------ 2 root root 4.0K Mar 17 14:01 save


The processed video is now ready to be downloaded to our local machine.

# Download the output video to our local machine

In [17]:
from google.colab import files
files.download('videos/out.mp4')

----------------------------------------
Exception happened during processing of request from ('::ffff:127.0.0.1', 42556, 0, 0)
Traceback (most recent call last):
  File "/usr/lib/python3.6/socketserver.py", line 320, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.6/socketserver.py", line 351, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.6/socketserver.py", line 364, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.6/socketserver.py", line 724, in __init__
    self.handle()
  File "/usr/lib/python3.6/http/server.py", line 418, in handle
    self.handle_one_request()
  File "/usr/lib/python3.6/http/server.py", line 406, in handle_one_request
    method()
  File "/usr/lib/python3.6/http/server.py", line 639, in do_GET
    self.copyfile(f, self.wfile)
  File "/usr/lib/python3.6/http/server.py", line 800, in copyfile
    shutil.copyfil
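The download above raised an exception while serving the 78 MB video. Because the notebook's working directory is inside the mounted Drive folder (`/content/gdrive/My Drive/Chopper_Videos/Mask_RCNN`), `videos/out.mp4` is also written to Google Drive and can be fetched from there instead. A small sketch (`describe_output` is a hypothetical helper) that confirms where the file landed and how large it is:

```python
import os


def describe_output(path):
    """Return (absolute path, size in MB) for an existing file, else None."""
    if not os.path.isfile(path):
        return None
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return os.path.abspath(path), round(size_mb, 1)
```

In the notebook, `describe_output('videos/out.mp4')` should report a path under `/content/gdrive/My Drive/Chopper_Videos/Mask_RCNN/videos`.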

# **Summary**

In this post, we walked through how to run a Mask R-CNN model on Google Colab with GPU acceleration.

We learned how to perform object detection and segmentation on a video. The GPU on Colab made it possible to process multiple frames in parallel, which sped up the process.