# Action recognition using TAO ActionRecognitionNet

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">


## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Use a trained 3D RGB model for action recognition on a subset of the [HMDB51](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) dataset.
* Evaluate the trained model.
* Run inference on the trained model.
* Export the trained model to a .etlt file for deployment to DeepStream.

## Table of Contents

This notebook shows an example use case of ActionRecognitionNet using the Train Adapt Optimize (TAO) Toolkit.

1. Set up env variables and map drives
2. Prepare dataset
3. Evaluate the trained model
4. Inference
5. Deploy



## 1. Set up env variables and map drives <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when loading them as pretrained models.




Please **FIX** the paths to the data, specs, results and project directories based on your workspace. The data, results and specs directories are inside the project directory.

In [24]:
%env HOST_DATA_DIR= /home/jupyter/imported_files/files/action_recognition_net/data
%env HOST_SPECS_DIR= /home/jupyter/imported_files/files/action_recognition_net/specs
%env HOST_RESULTS_DIR= /home/jupyter/imported_files/files/action_recognition_net/results
%env HOST_PROJECT_DIR= /home/jupyter/imported_files/files/action_recognition_net

# Set your encryption key, and use the same key for all commands
%env KEY = nvidia_tao

env: HOST_DATA_DIR=/home/jupyter/imported_files/files/action_recognition_net/data
env: HOST_SPECS_DIR=/home/jupyter/imported_files/files/action_recognition_net/specs
env: HOST_RESULTS_DIR=/home/jupyter/imported_files/files/action_recognition_net/results
env: HOST_PROJECT_DIR=/home/jupyter/imported_files/files/action_recognition_net
env: KEY=nvidia_tao


In [3]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR
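
Because later commands rely on these paths, it can help to confirm that the directories exist and sit inside the project directory, as described above. The following is a minimal sketch, not part of the TAO Toolkit; the helper name `check_layout` is hypothetical, and the check only reads the `HOST_*` variables set in the earlier cell.

```python
import os
from pathlib import Path


def check_layout(project_dir, sub_dirs):
    """Return the entries of sub_dirs that are missing or lie outside project_dir."""
    project = Path(project_dir).resolve()
    problems = []
    for sub in sub_dirs:
        path = Path(sub).resolve()
        inside = path == project or project in path.parents
        if not (path.is_dir() and inside):
            problems.append(str(sub))
    return problems


# Read the variables defined with %env earlier in the notebook.
project = os.environ.get("HOST_PROJECT_DIR", "")
subs = [os.environ.get(name, "")
        for name in ("HOST_DATA_DIR", "HOST_SPECS_DIR", "HOST_RESULTS_DIR")]

if project and all(subs):
    print("Problem directories:", check_layout(project, subs))
else:
    print("One or more HOST_* variables are not set.")
```

An empty list means every directory exists and is located under the project directory.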

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

We will be using the [HMDB51](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) dataset for the tutorial. First, download the HMDB51 dataset and unzip it (we use the `fall_floor` and `ride_bike` classes for this tutorial):

In [4]:
# download the dataset and unzip the files
!wget -P $HOST_DATA_DIR "https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/hmdb51_org.zip"
!mkdir -p $HOST_DATA_DIR/videos && unzip  $HOST_DATA_DIR/hmdb51_org.zip -d $HOST_DATA_DIR/videos
!mkdir -p $HOST_DATA_DIR/raw_data
!unzip $HOST_DATA_DIR/videos/hmdb51_org/fall_floor.zip -d $HOST_DATA_DIR/raw_data
!unzip $HOST_DATA_DIR/videos/hmdb51_org/ride_bike.zip -d $HOST_DATA_DIR/raw_data

--2023-01-09 22:07:25--  https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/hmdb51_org.zip
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/490854586/606f6396-a03c-4b1d-9749-3983bd0da295?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230109%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230109T220725Z&X-Amz-Expires=300&X-Amz-Signature=4ee174290c35b600441e8370ff0954abc4314944161dfe9cd38b64adb19514cf&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=490854586&response-content-disposition=attachment%3B%20filename%3Dhmdb51_org.zip&response-content-type=application%2Foctet-stream [following]
--2023-01-09 22:07:25--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/490854586/606f6396-a03c-4b1d-9749-3983bd0da295?X-A

Clone the dataset preprocessing scripts:

In [5]:
!if [ -d tao_toolkit_recipes ]; then rm -rf tao_toolkit_recipes; fi
!git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes

Cloning into 'tao_toolkit_recipes'...
remote: Enumerating objects: 161, done.
remote: Counting objects: 100% (161/161), done.
remote: Compressing objects: 100% (114/114), done.
remote: Total 161 (delta 56), reused 128 (delta 29), pack-reused 0
Receiving objects: 100% (161/161), 220.93 KiB | 3.25 MiB/s, done.
Resolving deltas: 100% (56/56), done.


Install the dependencies for the data generator:

In [6]:
!pip3 install xmltodict opencv-python

Defaulting to user installation because normal site-packages is not writeable
Collecting xmltodict
  Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Installing collected packages: xmltodict
Successfully installed xmltodict-0.13.0


Run the preprocessing script:

In [7]:
!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB_RGB.sh $HOST_DATA_DIR/raw_data $HOST_DATA_DIR/processed_data

/home/jupyter/imported_files/files/action_recognition_net/data/raw_data
/home/jupyter/imported_files/files/action_recognition_net/data/processed_data
Preprocess fall_floor
f cnt: 55.0
f cnt: 51.0
f cnt: 64.0
f cnt: 34.0
f cnt: 110.0
f cnt: 63.0
f cnt: 72.0
f cnt: 49.0
f cnt: 48.0
f cnt: 74.0
f cnt: 72.0
f cnt: 47.0
f cnt: 72.0
f cnt: 79.0
f cnt: 47.0
f cnt: 55.0
f cnt: 77.0
f cnt: 60.0
f cnt: 79.0
f cnt: 57.0
f cnt: 79.0
f cnt: 49.0
f cnt: 50.0
f cnt: 48.0
f cnt: 59.0
f cnt: 86.0
f cnt: 50.0
f cnt: 43.0
f cnt: 49.0
f cnt: 46.0
f cnt: 79.0
f cnt: 54.0
f cnt: 63.0
f cnt: 148.0
f cnt: 49.0
f cnt: 50.0
f cnt: 73.0
f cnt: 54.0
f cnt: 48.0
f cnt: 50.0
f cnt: 48.0
f cnt: 74.0
f cnt: 50.0
f cnt: 74.0
f cnt: 45.0
f cnt: 47.0
f cnt: 78.0
f cnt: 48.0
f cnt: 51.0
f cnt: 49.0
f cnt: 49.0
f cnt: 49.0
f cnt: 55.0
f cnt: 49.0
f cnt: 49.0
f cnt: 51.0
f cnt: 49.0
f cnt: 51.0
f cnt: 78.0
f cnt: 50.0
f cnt: 48.0
f cnt: 47.0
f cnt: 56.0
f cnt: 76.0
f cnt: 79.0
f cnt: 56.0
f cnt: 49.0
f cnt: 44.0
f cnt: 58.

We also provide scripts to preprocess an optical flow dataset. The following cells for processing the optical flow dataset are `Optional`.

`OPTIONAL:` Download the app based on NVOF SDK to generate optical flow. It is packaged with this notebook.

In [44]:
#!echo <passwd> | sudo -S apt install -y libfreeimage-dev

`OPTIONAL:` Run the preprocessing script for HMDB.

`IMPORTANT NOTE`: running `preprocess_HMDB.sh` to generate optical flow requires a GPU of the Turing or Ampere architecture, or newer.

In [None]:
#!cp ./AppOFCuda tao_toolkit_recipes/tao_action_recognition/data_generation/
#!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && bash ./preprocess_HMDB.sh $HOST_DATA_DIR/raw_data $HOST_DATA_DIR/processed_data

In [8]:
# download the split files and unzip them

!wget -P $HOST_DATA_DIR https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/test_train_splits.zip
!mkdir -p $HOST_DATA_DIR/splits && unzip  $HOST_DATA_DIR/test_train_splits.zip -d $HOST_DATA_DIR/splits

--2023-01-09 22:13:32--  https://github.com/shokoufeh-monjezi/TAOData/releases/download/v1.0/test_train_splits.zip
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/490854586/583fd76f-6b90-4a6b-b85b-282ff2c9e448?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230109%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230109T221332Z&X-Amz-Expires=300&X-Amz-Signature=e72823b9fdc7a2812bf7482355738fea4e77a7045431ba04a60e26d3ba2e3c24&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=490854586&response-content-disposition=attachment%3B%20filename%3Dtest_train_splits.zip&response-content-type=application%2Foctet-stream [following]
--2023-01-09 22:13:32--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/490854586/583fd76f-6b90-4a6b-b85b-28

In [9]:
# run split_dataset.py to generate the train/test split
!if [ -d $HOST_DATA_DIR/train ]; then rm -rf $HOST_DATA_DIR/train $HOST_DATA_DIR/test; fi
!cd tao_toolkit_recipes/tao_action_recognition/data_generation/ && python3 ./split_dataset.py $HOST_DATA_DIR/processed_data $HOST_DATA_DIR/splits/test_train_splits/testTrainMulti_7030_splits $HOST_DATA_DIR/train  $HOST_DATA_DIR/test

Split 1: 
 Train: 140
 Test: 60


In [10]:
# verify
!ls -l $HOST_DATA_DIR/train
!ls -l $HOST_DATA_DIR/train/ride_bike
!ls -l $HOST_DATA_DIR/test
!ls -l $HOST_DATA_DIR/test/ride_bike

total 16
drwxr-xr-x 72 jupyter jupyter  4096 Jan  9 22:13 fall_floor
drwxr-xr-x 72 jupyter jupyter 12288 Jan  9 22:13 ride_bike
total 280
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09 '#437_How_To_Ride_A_Bike_ride_bike_f_cm_np1_ba_med_0'
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09 '#437_How_To_Ride_A_Bike_ride_bike_f_cm_np1_ba_med_1'
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09 '#437_How_To_Ride_A_Bike_ride_bike_f_cm_np1_ba_med_3'
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09 '#437_How_To_Ride_A_Bike_ride_bike_f_cm_np1_fr_med_2'
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09  1989_Tour_de_France_Final_Time_Trial_ride_bike_f_cm_np1_ba_med_0
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09  1989_Tour_de_France_Final_Time_Trial_ride_bike_f_cm_np1_ba_med_1
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09  1989_Tour_de_France_Final_Time_Trial_ride_bike_f_cm_np1_ba_med_2
drwxr-xr-x 3 jupyter jupyter 4096 Jan  9 22:09  1989_Tour_de_France_Final_Time_Trial_ride_bike_f_cm_np1_fr_med_6
drwxr-
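
As a complement to the `ls` listing, the split can also be checked by counting clip folders per class. This is a minimal sketch under the assumption, consistent with the listing above, that each class folder contains one sub-folder per clip; `count_clips` is a hypothetical helper, not part of the TAO recipes.

```python
import os
from pathlib import Path


def count_clips(split_dir):
    """Map each class folder in split_dir to its number of clip sub-folders."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for p in class_dir.iterdir() if p.is_dir())
    return counts


# Report the counts for the train and test splits if they exist.
data_dir = os.environ.get("HOST_DATA_DIR")
if data_dir:
    for split in ("train", "test"):
        split_dir = Path(data_dir) / split
        if split_dir.is_dir():
            counts = count_clips(split_dir)
            print(split, counts, "total:", sum(counts.values()))
```

The totals should agree with the `Train` and `Test` numbers printed by `split_dataset.py`.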

## 3. Evaluate the trained model <a class="anchor" id="head-4"></a>

We pulled the trained TAO action recognition model `nvidia/tao/actionrecognitionnet:trainable_v1.0` from NGC, fine-tuned it in another notebook, and saved the resulting checkpoint as `rgb_only_model.tlt`. This notebook uses that checkpoint for evaluation, inference and deployment.
Two sampling strategies are available for evaluating the model on video clips:

* `center` mode: pick the middle frames of a sequence for inference. For example, if the model requires 32 frames as input and a video clip has 128 frames, the frames from index 48 to index 79 are used.
* `conv` mode: convolutionally sample 10 sequences from a single video and run inference on each. The final results are averaged.
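
The frame arithmetic behind the two modes can be sketched in plain Python. The `center` computation reproduces the 128-frame example above. The `conv` part is only an illustration: it assumes the 10 sequences start at evenly spaced positions and that per-class scores are averaged, which may differ from what TAO does internally. All function names here are hypothetical.

```python
def center_window(total_frames, seq_length):
    """Return the (first, last) frame indices of the centered window, inclusive."""
    if seq_length > total_frames:
        raise ValueError("clip is shorter than the required sequence length")
    first = (total_frames - seq_length) // 2
    return first, first + seq_length - 1


def evenly_spaced_starts(total_frames, seq_length, num_sequences=10):
    """Start indices of windows spread evenly over the clip (assumed scheme)."""
    last_start = total_frames - seq_length
    if last_start < 0:
        raise ValueError("clip is shorter than the required sequence length")
    if num_sequences == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_sequences - 1)) for i in range(num_sequences)]


def average_scores(per_sequence_scores):
    """Average per-class scores across all sampled sequences."""
    n = len(per_sequence_scores)
    return [sum(column) / n for column in zip(*per_sequence_scores)]


print(center_window(128, 32))  # (48, 79)
```

With these helpers, a `conv`-style prediction would run the model on each window returned by `evenly_spaced_starts` and pass the resulting score lists to `average_scores`.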

Evaluate the RGB model that was fine-tuned from the pretrained model (PTM):

In [22]:
!action_recognition evaluate \
                    -e $HOST_SPECS_DIR/evaluate_rgb.yaml \
                    -k $KEY \
                    model=$HOST_PROJECT_DIR/rgb_only_model.tlt  \
                    batch_size=1 \
                    test_dataset_dir=$HOST_DATA_DIR/test \
                    video_eval_mode=center

'evaluate_rgb.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
ResNet3d(
  (conv1): Conv3d(3, 64, kernel_size=(5, 7, 7), stride=(2, 2, 2), padding=(2, 3, 3), bias=False)
  (bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool3d(kernel_size=(1, 3, 3), stride=2, padding=(0, 1, 1), dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock3d(
      (conv1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (bn2): BatchNorm3d(64, eps=1

`Optional:` Evaluate the optical flow (OF) model

## 4. Inference <a class="anchor" id="head-5"></a>
In this section, we run the action recognition inference tool with the trained RGB model and print the results.

As with evaluation, inference supports two modes: `center` and `conv`. The final output shows the label of each input sequence in a video, in the form:
`[video_sample_path] [labels list for sequences in the video sample]`

In [23]:
!action_recognition inference \
                    -e $HOST_SPECS_DIR/infer_rgb.yaml \
                    -k $KEY \
                    model=$HOST_PROJECT_DIR/rgb_only_model.tlt \
                    inference_dataset_dir=$HOST_DATA_DIR/test/ride_bike \
                    video_inf_mode=center

    'infer_rgb.yaml' is validated against ConfigStore schema with the same name.
    This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
    See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
    
ResNet3d(
  (conv1): Conv3d(3, 64, kernel_size=(5, 7, 7), stride=(2, 2, 2), padding=(2, 3, 3), bias=False)
  (bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool3d(kernel_size=(1, 3, 3), stride=2, padding=(0, 1, 1), dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock3d(
      (conv1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (bn2): BatchNo
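
When a video yields several sequence labels, as can happen in `conv` mode, a single video-level label is sometimes wanted. One simple option, shown here as a sketch rather than as something the inference tool does itself, is a majority vote over the per-sequence labels. The helper `majority_label` and the sample label list are hypothetical.

```python
from collections import Counter


def majority_label(sequence_labels):
    """Return the most frequent label in sequence_labels."""
    if not sequence_labels:
        raise ValueError("no labels to vote on")
    return Counter(sequence_labels).most_common(1)[0][0]


# Illustrative labels for one video, not real tool output.
labels = ["ride_bike", "ride_bike", "fall_floor"]
print(majority_label(labels))  # ride_bike
```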

`Optional:` Run inference with the OF-only model

## 5. Deploy <a class="anchor" id="head-6"></a>

In [25]:
!mkdir -p $HOST_RESULTS_DIR/export

In [None]:
# Export the RGB model to an encrypted ONNX model (.etlt)
!action_recognition export \
                   -e $HOST_SPECS_DIR/export_rgb.yaml \
                   -k $KEY \
                   model=$HOST_PROJECT_DIR/rgb_only_model.tlt \
                   output_file=$HOST_RESULTS_DIR/export/rgb_resnet18_3.etlt

In [28]:
print('Exported model:')
print('------------')
!ls -lth $HOST_RESULTS_DIR/export

Exported model:
------------
total 127M
-rw-r--r-- 1 jupyter jupyter 127M Jan  9 23:08 rgb_resnet18_3.etlt
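
The same check can be done from Python, which is convenient if a later cell should fail early when the export is missing. This is a minimal sketch; `list_exports` is a hypothetical helper and the `.etlt` suffix is taken from the export command above.

```python
import os
from pathlib import Path


def list_exports(export_dir, suffix=".etlt"):
    """Return (file name, size in MiB) for each file with the given suffix."""
    results = []
    for path in sorted(Path(export_dir).glob(f"*{suffix}")):
        if path.is_file():
            results.append((path.name, path.stat().st_size / 2**20))
    return results


# Look in the export directory created earlier, if the variable is set.
results_dir = os.environ.get("HOST_RESULTS_DIR")
if results_dir and (Path(results_dir) / "export").is_dir():
    for name, size_mib in list_exports(Path(results_dir) / "export"):
        print(f"{name}: {size_mib:.1f} MiB")
```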


This is the end of the notebook. You can continue by deploying this RGB model to [DeepStream](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_3D_Action.html).