<a href="https://colab.research.google.com/github/gabilodeau/INF6804/blob/master/3D_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

INF6804 Vision par ordinateur

Polytechnique Montréal

Author: Soufiane Lamghari




Description: This notebook runs a 3D CNN for action recognition in inference mode. Using a 3D ResNet backbone pre-trained on the 400 classes of the Kinetics dataset, we predict the actions performed in a few sample videos.
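To make the input shape concrete, here is a small sketch of how a single 3D convolution changes the temporal and spatial size of a clip. The clip size (16 frames of 112×112 pixels) matches the `sample_duration` and `sample_size` settings used in this notebook. The kernel, stride and padding values are illustrative assumptions, not values read from the pretrained network, and `conv3d_output_shape` is a hypothetical helper.

```python
def conv3d_output_shape(in_shape, kernel, stride, padding):
    """Output (T, H, W) of a 3D convolution: floor((n + 2p - k) / s) + 1 per axis."""
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(in_shape, kernel, stride, padding))

# One clip: 16 frames of 112x112 pixels (time, height, width)
clip_shape = (16, 112, 112)

# Assumed hyperparameters, chosen only for illustration
out_shape = conv3d_output_shape(clip_shape,
                                kernel=(7, 7, 7),
                                stride=(1, 2, 2),
                                padding=(3, 3, 3))
print(out_shape)  # (16, 56, 56)
```

Unlike a 2D convolution, the kernel also slides along the time axis, so the output keeps a temporal dimension that later layers can use.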

Import libraries

In [1]:
import json
import subprocess
import torch
import os
import argparse
from IPython.display import HTML
from base64 import b64encode

Clone the repo

In [2]:
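# Skip when already inside the repo, so re-running this cell does not nest a second clone
if os.path.basename(os.getcwd()) != 'video-classification-3d-cnn-pytorch':
  if not os.path.exists('video-classification-3d-cnn-pytorch'):
    !git clone -q https://github.com/kenshohara/video-classification-3d-cnn-pytorch.git
  %cd video-classification-3d-cnn-pytorch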

/content/video-classification-3d-cnn-pytorch


Get the videos (here from the GitHub repository of the INF6804 course) and save them in a destination folder.

In [3]:
source = 'https://raw.githubusercontent.com/gabilodeau/INF6804/master/videos'

examples = ['horse.mp4', 'golf1.mp4']
destination = 'videos'

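# Create the destination folder if it does not exist yet
os.makedirs(destination, exist_ok=True)

for example in examples:
  # Build the URL with '/' explicitly: os.path.join is meant for filesystem paths
  url = '{}/{}'.format(source, example)
  os.system('wget -nv {} -O {}'.format(url, os.path.join(destination, example)))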
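As an alternative to calling `wget`, the download can be sketched with the standard library only. This is a minimal sketch, not the notebook's method: `video_url` and `download_videos` are hypothetical helpers, and files that already exist are assumed to be complete and are skipped.

```python
import os
import urllib.request

def video_url(source, name):
    """Join a base URL and a file name with exactly one '/'."""
    return '{}/{}'.format(source.rstrip('/'), name)

def download_videos(source, names, destination):
    """Download each video into `destination` and return the local paths."""
    os.makedirs(destination, exist_ok=True)
    paths = []
    for name in names:
        path = os.path.join(destination, name)
        # Assumption: an existing file is a complete earlier download
        if not os.path.exists(path):
            urllib.request.urlretrieve(video_url(source, name), path)
        paths.append(path)
    return paths
```

With the variables defined above, it would be called as `download_videos(source, examples, destination)`.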

Download the pretrained model. Here we use a ResNet with 34 layers.

In [4]:
model_weights = 'resnet-34-kinetics.pth'
network='resnet'
network_depth=34

!gdown --id 1UkMe9zChUZhktin2MBYLrfQjx6V6UoSR # resnet-34-kinetics.pth

Downloading...
From: https://drive.google.com/uc?id=1UkMe9zChUZhktin2MBYLrfQjx6V6UoSR
To: /content/video-classification-3d-cnn-pytorch/resnet-34-kinetics.pth
508MB [00:02, 249MB/s]


Load the model and get the predictions. The predicted categories belong to the 400 classes of the Kinetics dataset. Each video is split into consecutive clips of 16 frames, which are the inputs to the 3D convolutions, and the model predicts the performed action for each clip. The results are saved in a JSON file that is used below to visualize the predictions.
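The splitting of a video into 16-frame clips can be sketched as follows. `clip_segments` is a hypothetical helper, and dropping a trailing partial clip is an assumption made for this sketch; the repository's own data loader may handle the last frames differently.

```python
def clip_segments(n_frames, clip_len=16):
    """Return 1-based inclusive [start, end] frame ranges of full, non-overlapping clips."""
    return [[start, start + clip_len - 1]
            for start in range(1, n_frames - clip_len + 2, clip_len)]

# A 50-frame video yields three full clips; the last two frames are dropped
print(clip_segments(50))  # [[1, 16], [17, 32], [33, 48]]
```

Each `[start, end]` pair corresponds to one prediction of the model.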




In [5]:
from model import generate_model
from mean import get_mean
from classify import classify_video

phase = 'score'
opt = argparse.Namespace()
opt.mean = get_mean()
opt.model_name=network
opt.model_depth=network_depth
opt.arch = '{}-{}'.format(opt.model_name, opt.model_depth)
opt.sample_size = 112
opt.sample_duration = 16
opt.n_classes = 400  
opt.model = model_weights
opt.video_root = destination
# default args
opt.mode = phase
opt.output = 'predictions.json'
opt.batch_size = 32
opt.n_threads = 4
opt.resnet_shortcut='A'
opt.wide_resnet_k=2
opt.resnext_cardinality=32
opt.no_cuda=False
opt.verbose=False

model = generate_model(opt)
print('loading model {}'.format(opt.model))
model_data = torch.load(opt.model)
assert opt.arch == model_data['arch']
model.load_state_dict(model_data['state_dict'])
model.eval()
if opt.verbose:
    print(model)

class_names = []
with open('class_names_list') as f:
    for row in f:
        # Strip only the line ending, so the last name survives even without a trailing newline
        class_names.append(row.rstrip('\n'))

ffmpeg_loglevel = 'quiet'
if opt.verbose:
    ffmpeg_loglevel = 'info'

if os.path.exists('tmp'):
    subprocess.call('rm -rf tmp', shell=True)

outputs = []
for input_file in examples:
    video_path = os.path.join(opt.video_root, input_file)
    if os.path.exists(video_path):
        print(video_path)
        subprocess.call('mkdir tmp', shell=True)
        subprocess.call('ffmpeg -i {} tmp/image_%05d.jpg'.format(video_path),
                        shell=True)

        result = classify_video('tmp', input_file, class_names, model, opt)
        for clip in result['clips']:
            print(clip['segment'], clip['label'])
        outputs.append(result)

        subprocess.call('rm -rf tmp', shell=True)
    else:
        print('{} does not exist'.format(input_file))

if os.path.exists('tmp'):
    subprocess.call('rm -rf tmp', shell=True)

with open(opt.output, 'w') as f:
    json.dump(outputs, f)


loading model resnet-34-kinetics.pth
videos/horse.mp4


[1, 16] riding mule
[17, 32] riding mountain bike
[33, 48] riding mountain bike
videos/golf1.mp4
[1, 16] golf driving
[17, 32] golf chipping
[33, 48] golf driving
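The model outputs one label per clip, so a video can receive several different labels. One simple way to get a single label per video is a majority vote over its clips. This is a sketch of that idea and not something the notebook or the repository does; `video_level_label` is a hypothetical helper, and the clip dictionaries only use the `segment` and `label` keys printed by the loop above.

```python
from collections import Counter

def video_level_label(clips):
    """Return the most frequent clip label (the first one seen wins a tie)."""
    counts = Counter(clip['label'] for clip in clips)
    return counts.most_common(1)[0][0]

# Clip predictions copied from the output above for horse.mp4
horse_clips = [
    {'segment': [1, 16], 'label': 'riding mule'},
    {'segment': [17, 32], 'label': 'riding mountain bike'},
    {'segment': [33, 48], 'label': 'riding mountain bike'},
]
print(video_level_label(horse_clips))  # riding mountain bike
```

A majority vote ignores how confident each clip prediction is; averaging per-class scores over the clips would be another option if the scores are available.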


This script visualizes the predicted actions by drawing the category label on each video. It takes its arguments in the following order: the results JSON file, the root directory of the source videos, the directory in which to save the labeled videos, the file listing the Kinetics class names, and the size of the temporal unit.
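Because the arguments are positional, it can help to name them before building the command. The sketch below only assembles the argument list in the order described above; `result_video_command` and its parameter names are hypothetical labels chosen here, not names used by the script.

```python
def result_video_command(results_json, video_root, output_dir,
                         class_names_file, temporal_unit):
    """Build the argument list for generate_result_video.py in the expected order."""
    return ['python3', 'generate_result_video.py',
            results_json, video_root, output_dir,
            class_names_file, str(temporal_unit)]

cmd = result_video_command('../predictions.json', '../videos',
                           '../videos_pred', '../class_names_list', 3)
print(' '.join(cmd))
```

The list could then be run with `subprocess.run(cmd, cwd='generate_result_video', check=True)`, which raises an error if the script fails instead of silencing it.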

In [6]:
! cd generate_result_video/ && python3 generate_result_video.py ../predictions.json ../videos ../videos_pred ../class_names_list 3 > /dev/null 2>&1

Let's see the labeled videos

In [7]:
for example in examples:
  # Read the labeled video and embed it in the page as a base64 data URL
  with open(os.path.join('videos_pred', example), 'rb') as f:
    mp4 = f.read()
  decoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
  display(HTML(f'<video width=400 controls><source src="{decoded_vid}" type="video/mp4"></video>'))

**References:**

 - https://github.com/kenshohara/video-classification-3d-cnn-pytorch

 - [Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? (3D-CNN) paper](https://arxiv.org/abs/1711.09577) 

 