# Transfer learning and action inference on input video segments
In this notebook, we will demonstrate activity detection on a video segment with machine learning. We will use the MXNet framework in script mode with the gluoncv toolkit.

1) We will fine-tune the pre-trained model with a custom dataset to learn the typical video patterns of its action classes. The UCF101 guide covers 101 classes; the run shown in this notebook uses two, fall and no_fall.

2) We will then deploy this model and host it on a SageMaker endpoint.

3) Finally, we will make an inference request for a test video.

Install and import the required gluoncv library 

In [1]:
!pip install gluoncv



In [1]:
import boto3, re, os
import numpy as np
import uuid

import mxnet as mx
from mxnet import gluon, nd, image
from mxnet.gluon.data.vision import transforms

from gluoncv.data.transforms import video
from gluoncv import utils
from gluoncv.model_zoo import get_model
from gluoncv.utils import export_block

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet



In [2]:
sagemaker_session = sagemaker.Session()
role = get_execution_role()

Check that the MXNet framework version is 1.6.0.

In [3]:
mx.__version__

'1.6.0'

## Data preparation

Load the UCF101 dataset as described in the GluonCV guide: https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html#sphx-glr-build-examples-datasets-ucf101-py. In the cell below, the UCF101 and HMDB51 commands are commented out, and the active command prepares the custom 'youtube' dataset instead.

Note: With the --tiny_dataset flag, only a tiny fraction of the UCF101 dataset is downloaded. You can modify the script flag below to download the entire dataset. The full dataset is 6.5 GB, so you will need to increase the default volume size attached to the notebook instance.

In [25]:
#%%capture
#!pip install rarfile --user
#!pip install Cython --user
#!pip install mmcv --user
#!pip install torch --user
#!python data-prep-code/ucf101.py --tiny_dataset
#!python data-prep-code/hmdb51.py
!python data-prep-code/youtube.py

Generating training files.
parse frames under folder datasets/youtube/rawframes
0 videos parsed
200 videos parsed
400 videos parsed
600 videos parsed
frame folder analysis done
1
[([('fall/357158', 0), ('fall/357172', 0), ('fall/357179', 0), ('fall/357206', 0), ('fall/357232', 0), ('fall/357239', 0), ('fall/357243', 0), ('fall/357245', 0), ('fall/357271', 0), ('fall/357273', 0), ('fall/357278', 0), ('fall/357280', 0), ('fall/357287', 0), ('fall/357288', 0), ('fall/357290', 0), ('fall/357291', 0), ('fall/357295', 0), ('fall/357302', 0), ('fall/357310', 0), ('fall/357985', 0), ('fall/358024', 0), ('fall/358480', 0), ('fall/358498', 0), ('fall/358790', 0), ('fall/358794', 0), ('fall/358807', 0), ('fall/358908', 0), ('fall/359081', 0), ('fall/359204', 0), ('fall/359253', 0), ('fall/359260', 0), ('fall/359263', 0), ('fall/359268', 0), ('fall/359327', 0), ('fall/359334', 0), ('fall/359401', 0), ('fall/359402', 0), ('fall/359412', 0), ('fall/359414', 0), ('fall/359416', 0), ('fall/359418', 0)

1) Raw frames have been extracted from each video into a folder of its own.

2) A settings file has been generated. There are three items in each line, separated by spaces. The first item is the path to your training videos, e.g., video_001. It should be a folder containing the frames of video_001.mp4. The second item is the number of frames in each video, e.g., 200. The third item is the label of the videos, e.g., 0.
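The settings-file layout described above can be read with a few lines of Python. This is a minimal sketch, not the parser GluonCV uses; the class name `VideoEntry`, the helper names, and the sample frame counts and labels are hypothetical.

```python
from typing import List, NamedTuple


class VideoEntry(NamedTuple):
    frame_dir: str   # folder holding the extracted frames of one video
    num_frames: int  # number of frames in that folder
    label: int       # integer class label


def parse_settings_line(line: str) -> VideoEntry:
    """Parse one 'path num_frames label' line of the settings file."""
    # Split from the right so a path that contains spaces still parses.
    frame_dir, num_frames, label = line.strip().rsplit(maxsplit=2)
    return VideoEntry(frame_dir, int(num_frames), int(label))


def parse_settings(text: str) -> List[VideoEntry]:
    """Parse a whole settings file, skipping blank lines."""
    return [parse_settings_line(l) for l in text.splitlines() if l.strip()]


# Hypothetical sample content: the frame counts and the label 1 are made up.
sample = "fall/357158 200 0\nno_fall/358844 150 1\n"
entries = parse_settings(sample)
print(entries[0])
```

Checking a generated settings file this way before uploading can catch malformed lines early, which is cheaper than discovering them inside a training job.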

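Upload the raw frames and the settings list to S3. This can take a while: the original estimate was up to 15 minutes, but the timestamps printed below show the raw-frame upload taking about 48 minutes in this run.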

In [30]:
import time
print(time.time())
sagemaker_session.upload_data(path='datasets/youtube/rawframes/', key_prefix='data/youtube/rawframes')
print(time.time())

1605866097.2462451
1605868971.7678823


In [31]:
import time
sagemaker_session.upload_data(path='datasets/youtube/testTrain_splits/', key_prefix='data/youtube/testTrain_splits')
print(time.time())

1605869451.1829128


In [32]:
bucket_name=sagemaker_session.default_bucket()
inputs = 's3://' + bucket_name + '/data/youtube'

output_path = 'i3d_transfer_learning_ps/output/'
code_location = 'i3d_transfer_learning_ps/code/'

## Transfer Learning 
Transfer learning focuses on storing knowledge gained while solving one task and applying it to a different but related task. 

I3D (Inflated 3D Networks) is a widely adopted 3D video classification network. It uses 3D convolution to learn spatiotemporal information directly from videos. I3D was proposed as an improvement over C3D (Convolutional 3D Networks): it inflates 2D models into 3D. This lets us reuse the architecture of 2D models (e.g., ResNet, Inception) and also bootstrap the weights from pretrained 2D models. As a result, training 3D networks for video classification becomes feasible and gives much better results.

In this example, we use an Inflated 3D (I3D) model with a ResNet50 backbone pre-trained on the Kinetics400 dataset.

Dataset size is a big factor in the performance of deep learning models. Kinetics400 has 306,245 short trimmed videos from 400 action categories. However, we often do not have that much labeled data in another domain. Training a deep learning model on small datasets may lead to severe overfitting.

Transfer learning is a technique that addresses this problem. The idea is simple: start training with a pre-trained model, instead of starting from scratch. For simple fine-tuning, just replace the last classification (dense) layer with one whose output size equals the number of classes in the new dataset. We can obtain good models on our own data without large annotated datasets and with less compute for training.

Review the following training script, which is the entry point script for the MXNet estimator. The script executes training with the following steps:

1) Data transformation 

The transformation function does three things: it center-crops the image to 224x224, transposes it to num_channels*num_frames*height*width, and normalizes it with the mean and standard deviation calculated across all ImageNet images.

2) Data loader

Use the general GluonCV dataset class VideoClsCustom to load the data, with num_frames = 32 as the clip length. For another dataset, just point root and setting at your data directory and your prepared text file.

3) Model training 

a) Load the pre-trained model.

b) Load input hyperparameters and number of action classes.

c) Re-define the output layer for the new task. In GluonCV, you can get your customized model with one line of code.

d) Define optimizer, loss and metric. Train the network for the new dataset.
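The data transformation in step 1 can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the GluonCV transform used by the training script: the function names are hypothetical, frames are assumed to be uint8 RGB arrays of at least 224x224, and the mean and standard deviation are the commonly used ImageNet channel statistics.

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop the central size x size window from an H x W x C frame."""
    h, w = frame.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return frame[top:top + size, left:left + size]


def transform_clip(frames: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn a T x H x W x C uint8 clip into a normalized C x T x H x W float clip."""
    cropped = np.stack([center_crop(f, size) for f in frames])  # T x size x size x C
    scaled = cropped.astype(np.float32) / 255.0                 # pixel values in [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD        # broadcasts over the channel axis
    return normalized.transpose(3, 0, 1, 2)                     # C x T x H x W


# Hypothetical clip: 32 random RGB frames of 256 x 340 pixels.
clip = np.random.randint(0, 256, size=(32, 256, 340, 3), dtype=np.uint8)
out = transform_clip(clip)
print(out.shape)  # (3, 32, 224, 224)
```

The channel-first, time-second layout is what a 3D convolution expects, which is why the transpose comes last.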

In [34]:
!cat transfer-learning-code-2classes/transfer_learning-ps.py

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import print_function

import argparse
import logging
import os
import numpy as np
import json
import time

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
from mxnet.gluon.data.vision import transforms

import gluoncv as gcv
from gluoncv.data.transforms import video
from gluoncv.data import VideoClsCustom
from gluoncv.model_zoo import get_model
from gluoncv.utils import makedirs, LRSequential, LRScheduler, split_and_load, TrainingHistory

logging.basicConfig(level=logging.DEBUG)

# ------------------------------------------------------------ #
# Training methods                                             #
# ------------------------------------------------------------ #


def train(args):
    # SageMaker passes num_cpus, num_gpus and other args we can use to tailor training to
   

Define the MXNet estimator to prepare for training. We use a P3 instance ('ml.p3.2xlarge') here to demonstrate GPU-based training. You can change the instance type based on your dataset size and expected training time.

Training time recorded for the current dataset with 'ml.p3.2xlarge' is approximately 5 minutes.

Instance types for SageMaker are available here https://aws.amazon.com/sagemaker/pricing/instance-types/

In [52]:
m = MXNet("transfer_learning-ps.py",
          source_dir="transfer-learning-code-2classes/",
          debugger_hook_config=False,
          role=role,
          output_path='s3://' + bucket_name + '/' + output_path,
          code_location='s3://' + bucket_name + '/' + code_location,
          train_instance_count=1,
          train_instance_type="ml.p3.2xlarge",
          framework_version="1.6.0",
          py_version="py3",
          hyperparameters={'batch-size': 8,
                           'epochs': 30,
                           'learning-rate': 0.001,
                           'wd': 0.0001,
                           'momentum': 0.9, 
                           'log-interval': 100})
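In script mode, SageMaker passes each entry of the hyperparameters dict to the entry point as a command-line argument, and the SM_HPS value in the training log lists the same keys. The training script is truncated above, so the following is only a sketch of how such a script might read them; the `--model-dir` and `--train` argument names are assumptions, while SM_MODEL_DIR and SM_CHANNEL_TRAINING are environment variables set by the SageMaker training toolkit.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters: names match the keys of the estimator's hyperparameters dict.
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=30)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--wd", type=float, default=0.0001)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--log-interval", type=int, default=100)
    # Locations provided by SageMaker through environment variables
    # (argument names here are hypothetical).
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    # Ignore any extra arguments the platform may add.
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args(["--batch-size", "8", "--epochs", "30", "--learning-rate", "0.001"])
print(args.batch_size, args.epochs, args.learning_rate)
```

Keeping defaults equal to the values used in the estimator makes the script runnable locally without any arguments, which helps when debugging outside SageMaker.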

Launch a training job 

In [53]:
JOB_NAME=str(uuid.uuid4())
print(JOB_NAME)
m.fit(inputs,job_name=JOB_NAME)

's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


eedfcc1f-b971-485c-bc84-ab5c482c2220
2020-11-20 12:12:09 Starting - Starting the training job...
2020-11-20 12:12:11 Starting - Launching requested ML instances......
2020-11-20 12:13:15 Starting - Preparing the instances for training......
2020-11-20 12:14:17 Downloading - Downloading input data....................................
2020-11-20 12:20:35 Training - Downloading the training image...
2020-11-20 12:20:55 Training - Training image download completed. Training in progress.
2020-11-20 12:20:56,339 sagemaker-training-toolkit INFO     Imported framework sagemaker_mxnet_container.training
2020-11-20 12:20:56,364 sagemaker_mxnet_container.training INFO     MXNet training environment: {'SM_HOSTS': '["algo-1"]', 'SM_NETWORK_INTERFACE_NAME': 'eth0', 'SM_HPS': '{"batch-size":8,"epochs":30,"learning-rate":0.001,"log-interval":100,"momentum":0.9,"wd":0.0001}', 'SM_USER_ENTRY_POINT': 'transfer_learning-ps.py', 'SM_FRAMEWORK_PARAMS': '{}', 'SM_RESOURCE_CONFIG': '{"current_host

### Model Inference

First, create an MXNet SageMaker Model that can be deployed to a SageMaker endpoint. By default, this uses the SageMaker MXNet Inference toolkit for serving MXNet models on Amazon SageMaker.

1) This uses the default framework image for the specified MXNet version.

2) Provide the S3 location of the SageMaker model data .tar.gz file.

3) Provide the path to the Python inference file that is executed as the entry point for model hosting.

4) Set the number of model server workers to 10 to process invocation requests in parallel.
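The S3 location in item 2 follows from the output path given to the estimator and the training job name. The helper below is a hypothetical sketch of that path convention as it is used in this notebook; the bucket name is a placeholder. After a successful fit, the estimator's model_data attribute should hold the same location.

```python
def model_artifact_uri(bucket: str, output_prefix: str, job_name: str) -> str:
    """Build the S3 URI under which a training job stores model.tar.gz."""
    prefix = output_prefix.strip("/")  # tolerate leading or trailing slashes
    return f"s3://{bucket}/{prefix}/{job_name}/output/model.tar.gz"


uri = model_artifact_uri(
    bucket="my-example-bucket",  # hypothetical bucket name
    output_prefix="i3d_transfer_learning_ps/output/",
    job_name="eedfcc1f-b971-485c-bc84-ab5c482c2220",
)
print(uri)
```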

In [54]:
import time
JOB_NAME='eedfcc1f-b971-485c-bc84-ab5c482c2220'
print(JOB_NAME)

eedfcc1f-b971-485c-bc84-ab5c482c2220


In [55]:
from sagemaker.mxnet.model import MXNetModel

sagemaker_model2 = MXNetModel(
    model_data='s3://' + bucket_name + '/' + output_path + JOB_NAME + '/output/model.tar.gz',
    source_dir='inference-code-2classes/',
    role=role,
    framework_version='1.6.0',
    py_version='py3',
    entry_point='2class_inference-ps.py',
    model_server_workers=10,
    name='sagemaker-activity-detection-model-2classPS-{0}'.format(str(int(time.time()))))
print(sagemaker_model2.name)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


sagemaker-activity-detection-model-2classPS-1605878581


In [56]:
!cat inference-code-2classes/2class_inference-ps.py

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from __future__ import absolute_import

import subprocess
import sys
import io
import os
import boto3
import time
import json
import uuid

import mxnet as mx
import numpy as np
from mxnet import gluon,nd
from sagemaker_inference import content_types, default_inference_handler, errors
from io import BytesIO
from datetime import datetime


import gluoncv
from gluoncv.data.transforms import video
from gluoncv.data import VideoClsCustom
from gluoncv.utils.filesystem import try_import_decord

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
# Classes of the custom two-class dataset
classes = ['fall', 'no_fall']
dict_classes = dict(zip(range(len(classes)), classes))
# ------------------------------------------------------------ #
# Hosting methods                                              #
# ------------------------------------------------------------ #

def model_fn(

### Model hosting 

Deploy the model on a single g4dn instance.

G4 is a good platform for low-cost ML inference on images. G4 instances are based on the NVIDIA T4 GPU (Turing architecture), which is purpose-built with ray-tracing (RT) cores and Tensor Cores. NVIDIA publishes inference benchmarks here:
https://developer.nvidia.com/deep-learning-performance-training-inference .
G4 instances offer throughput similar to P3 instances with higher energy efficiency, which makes them a good choice for inference tasks at low cost.

In [57]:
import logging
logging.getLogger().setLevel(logging.WARNING)
#Instance type used for deployment
MODEL_INSTANCE_TYPE = 'ml.g4dn.2xlarge'
#Number of instances used for deployment (could be increased based on the prediction requests)
INSTANCE_COUNT = 1
#Model endpoint name
ENDPOINT_NAME = '2class-endpoint-ps'
predictor = sagemaker_model2.deploy(initial_instance_count=INSTANCE_COUNT,instance_type=MODEL_INSTANCE_TYPE,endpoint_name=ENDPOINT_NAME)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


---------------!

In [58]:
sm_client = boto3.client('sagemaker')
sm_client.describe_endpoint(EndpointName=ENDPOINT_NAME)['EndpointArn']

'arn:aws:sagemaker:eu-west-1:646744545246:endpoint/2class-endpoint-ps'

### Video inference test

Test ML inference on videos from another free video data source (Pexels)

In [75]:
#video_file = '../videos/357280.mp4'
#video_file = '../videos/377090.mp4'
video_file = './datasets/youtube/videos/fall/377810.avi'
#video_file = './datasets/hmdb51/videos/fall_floor/RETURN_OF_THE_KING_fall_floor_f_nm_np1_fr_med_49.avi'

In [76]:
payload = sagemaker_session.upload_data(path=video_file, key_prefix='data/youtube')
S3_VIDEO_PATH = payload
#Dict data to be passed to the endpoint
data = {
    'S3_VIDEO_PATH': S3_VIDEO_PATH,
}

Invoke the endpoint and print the results returned by the API, with the following details:
1. S3 input path
2. Output class
3. Output probability
4. Time of detection

In [81]:
ENDPOINT_NAME='2class-endpoint-ps'
print(ENDPOINT_NAME)
import time
import json
sm_runtime2 = boto3.Session().client('sagemaker-runtime')
a_time = float(time.time())
response2 = sm_runtime2.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json',Accept='application/json',Body=json.dumps(data))
b_time = float(time.time())
#Get and print the response
response_body = json.loads(response2['Body'].read().decode('utf-8'))
#print('Inference on: ', data['S3_VIDEO_PATH'], '-', response_body['Predicted']['S'], "{:.2f}".format(float(response_body['Probability']['S'])*100), '% -', "{:.1f}".format(b_time - a_time), 'secs')
print(response_body)

2class-endpoint-ps
{'S3Path': {'S': 's3://sagemaker-eu-west-1-646744545246/data/youtube/377810.avi'}, 'Predicted': {'S': 'fall'}, 'Probability': {'S': '1.0000'}, 'DateCreatedUTC': {'S': '2020-12-01 13:50:32 '}}
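The response wraps every value in a {'S': ...} dictionary, with the probability encoded as a string. A small helper can flatten it into typed fields; this is a sketch based only on the response printed above, and the names `Prediction` and `parse_response` are hypothetical.

```python
from typing import Any, Dict, NamedTuple


class Prediction(NamedTuple):
    s3_path: str
    label: str
    probability: float
    created_utc: str


def parse_response(body: Dict[str, Dict[str, Any]]) -> Prediction:
    """Flatten the {'Key': {'S': value}} response into typed fields."""
    return Prediction(
        s3_path=body["S3Path"]["S"],
        label=body["Predicted"]["S"],
        probability=float(body["Probability"]["S"]),
        created_utc=body["DateCreatedUTC"]["S"].strip(),  # drop the trailing space
    )


# Response copied from the cell output above.
example_body = {
    'S3Path': {'S': 's3://sagemaker-eu-west-1-646744545246/data/youtube/377810.avi'},
    'Predicted': {'S': 'fall'},
    'Probability': {'S': '1.0000'},
    'DateCreatedUTC': {'S': '2020-12-01 13:50:32 '},
}
pred = parse_response(example_body)
print(f"{pred.label} ({pred.probability:.2%})")
```

Converting the probability to a float once, at the boundary, avoids repeating the float(...) call wherever the result is formatted or compared against a threshold.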


In [63]:
###Iterative:
!pip install awswrangler
import awswrangler as wr
video_files = wr.s3.list_objects('s3://prosegur-cv-poc/dataset/prosegur-avi/')



In [64]:
print(ENDPOINT_NAME)

for a in video_files:
    data = {'S3_VIDEO_PATH': a}
    a_time = float(time.time())
    response = sm_runtime2.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json',Accept='application/json',Body=json.dumps(data))
    b_time = float(time.time())
    #Get and print the response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    print('Inference on: ', data['S3_VIDEO_PATH'], '-', response_body['Predicted']['S'], "{:.2f}".format(float(response_body['Probability']['S'])*100), '% -', "{:.1f}".format(b_time - a_time), 'secs')


2class-endpoint-ps
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357158.avi - fall 99.93 % - 3.8 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357172.avi - fall 93.40 % - 3.9 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357179.avi - fall 99.31 % - 3.7 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357206.avi - fall 98.80 % - 4.6 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357232.avi - fall 79.47 % - 4.2 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357239.avi - fall 98.27 % - 0.7 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357243.avi - fall 91.55 % - 1.1 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357245.avi - fall 69.81 % - 1.0 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357271.avi - fall 99.71 % - 1.2 secs
Inference on:  s3://prosegur-cv-poc/dataset/prosegur-avi/357273.avi - fall 96.61 % - 0.7 secs
Inference on:  s3://prosegur-cv-poc/datas

Fall properly detected on 138/141 videos (97.87%)

In [65]:
###Negative cases:
video_files = wr.s3.list_objects('s3://prosegur-cv-poc/dataset/negative-avi/')

print(ENDPOINT_NAME)

for a in video_files:
    data = {'S3_VIDEO_PATH': a}
    a_time = float(time.time())
    response = sm_runtime2.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json',Accept='application/json',Body=json.dumps(data))
    b_time = float(time.time())
    #Get and print the response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    print('Inference on: ', data['S3_VIDEO_PATH'], '-', response_body['Predicted']['S'], "{:.2f}".format(float(response_body['Probability']['S'])*100), '% -', "{:.1f}".format(b_time - a_time), 'secs')


2class-endpoint-ps
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/358844.avi - no_fall 67.29 % - 0.6 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/360952.avi - fall 60.42 % - 0.4 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/360986.avi - fall 98.64 % - 0.4 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/362937.avi - fall 60.27 % - 0.4 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/366191.avi - fall 51.75 % - 0.4 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/367081.avi - no_fall 99.51 % - 0.6 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/373196.avi - no_fall 97.56 % - 0.2 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/374375.avi - fall 66.03 % - 0.5 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/374538.avi - no_fall 98.34 % - 0.3 secs
Inference on:  s3://prosegur-cv-poc/dataset/negative-avi/374543.avi - no_fall 98.49 % - 0.6 secs
Inference on:  s3://proseg

No-fall properly detected on 350/560 videos (62.50%)
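The two summaries can be combined into standard binary classification metrics. The sketch below treats "fall" as the positive class and assumes that every video under prosegur-avi is a true fall and every video under negative-avi is a true non-fall; the function name is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute common binary classification metrics from confusion counts."""
    total = tp + fn + tn + fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the two summaries above, with "fall" as the positive class.
metrics = binary_metrics(tp=138, fn=141 - 138, tn=350, fp=560 - 350)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, recall is high (about 98%) but precision is only about 40% and overall accuracy about 70%, because 210 of the 560 negative videos are classified as falls. The model as trained is biased toward predicting "fall".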

In [80]:
symbol = mx.sym.load('./model-symbol.json')
symbol

<Symbol dense0_fwd>