# Goal

**The goal of this notebook is to explore a method for object detection, specifically person detection, using the MMDetection high-level API.**

## MMDetection

MMDetection is an open source object detection toolbox based on PyTorch.

### Major features

* **Modular Design**

The detection framework consists of separate components, so a customized object detection framework can easily be constructed by combining different modules.

* **Support of multiple frameworks out of the box**

The toolbox directly supports popular and contemporary detection frameworks such as Faster R-CNN, Mask R-CNN, and RetinaNet.

* **High efficiency**

All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including Detectron2, maskrcnn-benchmark and SimpleDet.

* **State of the art**

The toolbox stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018, and the team keeps pushing it forward.
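
To make the modular-design point more concrete, here is a toy sketch of config-driven construction. MMDetection configs describe each component as a dictionary with a `type` key; the `REGISTRY`, `register`, and `build` helpers and the `ToyBackbone` / `ToyDetector` classes below are hypothetical and only mimic that idea. They are not MMDetection's implementation.

```python
# Toy illustration of config-driven, modular construction.
# All names here (REGISTRY, register, build, ToyBackbone, ToyDetector)
# are hypothetical and are not part of MMDetection.
REGISTRY = {}


def register(cls):
    """Make a class buildable by its name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Recursively build an object from a dict that has a 'type' key."""
    if not (isinstance(cfg, dict) and 'type' in cfg):
        return cfg  # plain value, passed through unchanged
    kwargs = {k: build(v) for k, v in cfg.items() if k != 'type'}
    return REGISTRY[cfg['type']](**kwargs)


@register
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


@register
class ToyDetector:
    def __init__(self, backbone, num_classes):
        self.backbone = backbone
        self.num_classes = num_classes


# Swapping a component only requires editing the config, not the code.
cfg = dict(type='ToyDetector',
           backbone=dict(type='ToyBackbone', depth=50),
           num_classes=80)
detector = build(cfg)
print(type(detector.backbone).__name__, detector.backbone.depth)
```

The nested dictionary plays the role of a config file: replacing `ToyBackbone` with another registered class changes the detector without touching `ToyDetector`.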


### References:

* MMDetection tutorial https://github.com/open-mmlab/mmdetection/blob/master/demo/MMDet_Tutorial.ipynb
* NFL Helmet Assignment - Getting Started Guide https://www.kaggle.com/robikscube/nfl-helmet-assignment-getting-started-guide
* Convert MP4 to PNG/JPG and back https://www.kaggle.com/denispotapov/convert-mp4-to-png-jpg-and-back?scriptVersionId=72389425

In [None]:
# Install dependencies (use cu101 because Colab has CUDA 10.1)
!pip install -U torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# Install mmcv-full so that the CUDA operators are available
!pip install mmcv-full

# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection

!pip install -e .

# Reinstall Pillow 7.0.0 to avoid a bug in Colab
!pip install Pillow==7.0.0

In [None]:
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMDetection installation
import mmdet
print(mmdet.__version__)

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())
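
The install cell pins the PyTorch wheels by a CUDA tag (`cu101`), while the check above prints the CUDA version that mmcv was compiled with. A small helper can compare the two. This is only a sketch: `cuda_tag` and `torch_build_tag` are hypothetical helpers, the dotted format such as `10.1` is an assumption about what the check prints, and the strings used below are illustrative values rather than captured output.

```python
import re


def cuda_tag(cuda_version):
    """Convert a dotted CUDA version such as '10.1' to a wheel tag such as 'cu101'.

    Returns None when the string does not start with '<major>.<minor>'.
    """
    match = re.match(r'(\d+)\.(\d+)', cuda_version)
    if match is None:
        return None
    return f'cu{match.group(1)}{match.group(2)}'


def torch_build_tag(torch_version):
    """Return the local build tag of a torch version string, or None.

    '1.5.1+cu101' gives 'cu101'; a version without '+' has no tag.
    """
    _, sep, tag = torch_version.partition('+')
    return tag if sep else None


# Illustrative values; in the notebook they would come from
# torch.__version__ and get_compiling_cuda_version().
print(torch_build_tag('1.5.1+cu101') == cuda_tag('10.1'))
```

A mismatch between the two tags is a hint, not proof, that the compiled operators and the installed PyTorch build may not work together.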

In [None]:
!mkdir -p checkpoints
# Download the pretrained detector
!wget -c https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth \
      -O checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth
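
Because `wget -c` resumes partial downloads, it can be worth checking that the file is complete before loading it. The sketch below assumes the OpenMMLab naming convention in which the eight hex characters before `.pth` are the start of the file's SHA-256 digest; treat that as an assumption to verify for your checkpoint. `checkpoint_hash_matches` is a hypothetical helper, and the demo uses a small temporary file instead of the real checkpoint.

```python
import hashlib
import os
import re
import tempfile


def checkpoint_hash_matches(path):
    """Compare a file's SHA-256 digest with the hash suffix in its name.

    Returns True or False when the name ends in '-<8 hex chars>.pth',
    and None when the name carries no such suffix.
    """
    match = re.search(r'-([0-9a-f]{8})\.pth$', os.path.basename(path))
    if match is None:
        return None
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # Read in 1 MiB chunks so large checkpoints do not fill the RAM.
        for chunk in iter(lambda: fh.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


# Demo on a stand-in file named after its own digest.
with tempfile.TemporaryDirectory() as tmp_dir:
    payload = b'stand-in for checkpoint bytes'
    prefix = hashlib.sha256(payload).hexdigest()[:8]
    demo_path = os.path.join(tmp_dir, f'demo_model-{prefix}.pth')
    with open(demo_path, 'wb') as fh:
        fh.write(payload)
    demo_ok = checkpoint_hash_matches(demo_path)
    print(demo_ok)
```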

In [None]:
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

from IPython.display import Video, display
import os
import subprocess
from tqdm.notebook import tqdm
import numpy as np
import gc
import cv2
import shutil

# Choose to use a config and initialize the detector
config = 'configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco.py'
# Setup a checkpoint file to load
checkpoint = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
# initialize the detector
model = init_detector(config, checkpoint, device='cuda:0')

In [None]:
data_dir = '/kaggle/input/nfl-health-and-safety-helmet-assignment/'
example_video = os.path.join(data_dir, 'train', '57913_000218_Sideline.mp4')

frac = 0.65

display(Video(example_video, embed=True, height=int(720*frac), width=int(1280*frac)))

In [None]:

# Extract the video frames as images
img_ext = 'png'
image_name = '57913_000218_Sideline'
frame_dir = '/kaggle/tmp/mp4_img/'
os.makedirs(frame_dir, exist_ok=True)

cmd = f'ffmpeg -i "{example_video}" -qscale:v 2 "{os.path.join(frame_dir, image_name)}_%d.{img_ext}"'
print(cmd)
subprocess.call(cmd, shell=True)
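
As an alternative to building a shell string, the same ffmpeg call can be expressed as an argument list, which avoids quoting problems and lets `subprocess.run(..., check=True)` raise when ffmpeg fails. This is a sketch only: `build_extract_cmd` and `extract_frames` are hypothetical helpers, and the paths in the demo are placeholders.

```python
import os
import shlex
import subprocess


def build_extract_cmd(video_path, out_dir, stem, ext='png'):
    """Return the ffmpeg argument list that writes <stem>_<n>.<ext> frames."""
    pattern = os.path.join(out_dir, f'{stem}_%d.{ext}')
    return ['ffmpeg', '-i', video_path, '-qscale:v', '2', pattern]


def extract_frames(video_path, out_dir, stem, ext='png'):
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_extract_cmd(video_path, out_dir, stem, ext), check=True)


# Placeholder paths; nothing is executed here, the command is only printed.
cmd = build_extract_cmd('/kaggle/input/clip.mp4', '/kaggle/tmp/mp4_img', 'clip')
print(' '.join(shlex.quote(arg) for arg in cmd))
```

Calling `extract_frames(example_video, frame_dir, image_name)` would then replace the `subprocess.call(cmd, shell=True)` step.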

The model loaded above is trained to detect many kinds of objects; the next cell lists the classes it can detect. Since only people are of interest here, we will keep only the results for the first class (person).

In [None]:
model = init_detector(config, checkpoint, device='cuda:0')
print('Number of classes: {}'.format(len(model.CLASSES)))
print(model.CLASSES)

In [None]:
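def filter_results(result):
    """
    Keep only the person class (index 0) in the detection results.

    `result` is the (bbox_results, segm_results) tuple returned by
    inference_detector; each element holds one entry per class.
    """
    bbox_results, segm_results = result

    # Keep the person boxes; every other class gets an empty (0, 5) array
    bbox = [bbox_results[0]]
    for _ in range(len(bbox_results) - 1):
        bbox.append(np.empty((0, 5), dtype=np.float32))

    # Keep the person masks; every other class gets an empty list
    objects = [segm_results[0]]
    for _ in range(len(segm_results) - 1):
        objects.append([])

    return (bbox, objects)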

frame_bbox_dir = '/kaggle/tmp/mp4_img_bbox/'
os.makedirs(frame_bbox_dir, exist_ok=True)

for f in tqdm(os.listdir(frame_dir)):
    img = os.path.join(frame_dir, f)
    # The model is re-created and deleted for every frame to limit RAM usage
    model = init_detector(config, checkpoint, device='cuda:0')
    # Run inference on the frame
    result = inference_detector(model, img)
    # Keep only the person class
    result_filtered = filter_results(result)
    # Save the frame with the bounding boxes drawn on it
    model.show_result(img, result_filtered, out_file=os.path.join(frame_bbox_dir, f))
    del result, result_filtered, model
    gc.collect()
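
The filtering step can be generalized to any class index and combined with a score threshold. The sketch below runs on small synthetic data shaped like the result used above: a tuple of per-class box arrays with columns `x1, y1, x2, y2, score`, and per-class mask lists. `keep_class` and its `score_thr` parameter are hypothetical, and the string placeholders stand in for real masks.

```python
import numpy as np


def keep_class(result, class_idx=0, score_thr=0.0):
    """Keep one class, and only its detections with score >= score_thr."""
    bbox_result, segm_result = result
    # Boolean mask over the detections of the selected class
    keep = bbox_result[class_idx][:, 4] >= score_thr

    # Start from empty entries for every class, then fill the selected one
    bboxes = [np.empty((0, 5), dtype=np.float32) for _ in bbox_result]
    masks = [[] for _ in segm_result]
    bboxes[class_idx] = bbox_result[class_idx][keep]
    masks[class_idx] = [m for m, k in zip(segm_result[class_idx], keep) if k]
    return bboxes, masks


# Synthetic two-class result: two "person" boxes and one box of another class.
person = np.array([[0, 0, 10, 20, 0.9],
                   [5, 5, 15, 25, 0.2]], dtype=np.float32)
other = np.array([[1, 1, 2, 2, 0.8]], dtype=np.float32)
result = ([person, other], [['mask_a', 'mask_b'], ['mask_c']])

bboxes, masks = keep_class(result, class_idx=0, score_thr=0.5)
print(bboxes[0].shape, bboxes[1].shape, masks)
```

Filtering by score before drawing keeps low-confidence boxes out of the saved frames; the threshold of 0.5 is an arbitrary value for the demo.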

In [None]:
# make video from frames
video_name = '57913_000218_Sideline_players_fps60.mp4'
tmp_video_path = os.path.join('/kaggle/working/', f'tmp_{video_name}')
video_path = os.path.join('/kaggle/working/', video_name)

frame_rate = 60

images = os.listdir(frame_bbox_dir)
# Sort by the frame number at the end of the file name (e.g. "..._12.png")
images.sort(key=lambda x: int(os.path.splitext(x)[0].split('_')[-1]))

frame = cv2.imread(os.path.join(frame_bbox_dir, images[0]))
height, width = frame.shape[:2]

video = cv2.VideoWriter(tmp_video_path, cv2.VideoWriter_fourcc(*'mp4v'),
                        frame_rate, (width, height))

for f in images:
    img = cv2.imread(os.path.join(frame_bbox_dir, f))
    video.write(img)

video.release()

# Not all browsers support this codec, so we reload the file at tmp_video_path
# and re-encode it with ffmpeg using a more widely supported codec (H.264)

if os.path.exists(video_path):
    os.remove(video_path)
    
subprocess.run(["ffmpeg", "-i", tmp_video_path, "-crf", "18", "-preset", "veryfast",
                "-vcodec", "libx264", video_path])

os.remove(tmp_video_path)
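
The frames must be sorted by their numeric index before they are written to the video, because a plain string sort places `_10` before `_2`. The sketch below shows the difference on made-up file names; `frame_index` is a hypothetical helper that reads the trailing number regardless of the extension length.

```python
import os
import re


def frame_index(filename):
    """Return the integer at the end of the file stem, e.g. 'clip_12.png' -> 12."""
    stem, _ = os.path.splitext(filename)
    match = re.search(r'(\d+)$', stem)
    if match is None:
        raise ValueError(f'no trailing frame number in {filename!r}')
    return int(match.group(1))


names = ['clip_10.png', 'clip_2.png', 'clip_1.png']
print(sorted(names))                   # string order: 1, 10, 2
print(sorted(names, key=frame_index))  # numeric order: 1, 2, 10
```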

In [None]:
frac = 0.65
display(Video(video_path, embed=True, height=int(720*frac), width=int(1280*frac)))

In [None]:
# remove directories with frames (optional)

for path in [frame_dir, frame_bbox_dir]:
    try:
        shutil.rmtree(path)
    except OSError as e:
        print(f'Error: {e.filename} - {e.strerror}.')