# Convert and Optimize Temporal Shift Module (TSM) with OpenVINO™

The last decade has seen an exponential growth in video data. According to [Statista](https://www.statista.com/statistics/259477/hours-of-video-uploaded-to-youtube-every-minute/), the number of hours of video uploaded to YouTube every minute grew by about 40% between 2014 and 2020. As of June 2022, more than 500 hours of video were uploaded to YouTube every minute. One of the most important tasks that the YouTube platform engages in is the efficient (and automatic) removal of harmful content.

This requires the accurate detection and recognition of actions, events, and/or context in the videos. This task is called *video understanding* and is one of the grand challenges in the field of Computer Vision as the temporal order of actions is crucial for accurate classification. For example, an algorithm should be able to differentiate between the action of opening and closing a door.

Lin et al. (2019) proposed the [Temporal Shift Module](https://arxiv.org/abs/1811.08383) (TSM) for efficient video understanding. The module allows for joint spatial-temporal modeling by shifting part of the channels along the temporal dimension to exchange information with neighboring frames. The TSM achieves state-of-the-art performance, at the level of 3D convolutional neural networks (CNNs), but at the lower computational cost of 2D CNNs. For more details of the TSM model, see the [paper](https://arxiv.org/abs/1811.08383) and [repository](https://github.com/mit-han-lab/temporal-shift-module).
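The shift itself has no parameters and needs no multiplications. The NumPy sketch below illustrates the bidirectional (offline) variant under simplifying assumptions: activations laid out as `[N*T, C, H, W]` with the frames of each clip stored contiguously, one-eighth of the channels shifted in each direction (matching the `Using fold div: 8` lines the model prints when it is built later in this tutorial), and zero padding at the clip boundaries. It is written for illustration only and is not the repository's PyTorch implementation.

```python
import numpy as np


def temporal_shift(x: np.ndarray, n_segment: int, fold_div: int = 8) -> np.ndarray:
    """Shift a fraction of the channels along the temporal axis.

    :param x: Activations of shape [N*T, C, H, W], frames of each clip stored contiguously
    :param n_segment: Number of frames T per clip
    :param fold_div: 1/fold_div of the channels is shifted in each direction
    :returns: Shifted activations with the same shape as x
    """
    nt, c, h, w = x.shape
    x = x.reshape(nt // n_segment, n_segment, c, h, w)
    fold = c // fold_div

    out = np.zeros_like(x)
    # First fold: frame t receives these channels from frame t+1 (last frame gets zeros)
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Second fold: frame t receives these channels from frame t-1 (first frame gets zeros)
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels are not shifted
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out.reshape(nt, c, h, w)


T, C = 4, 8
# Every channel of frame t holds the value t, so the shifts are easy to read
x = np.repeat(np.arange(T, dtype=np.float32), C).reshape(T, C, 1, 1)
print(temporal_shift(x, n_segment=T)[:, :, 0, 0])
```

Only 2/8 of the channels move, which is why the module adds temporal context at essentially the cost of the 2D backbone.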

This tutorial provides step-by-step instructions on how to run and optimize the PyTorch TSM model with OpenVINO.

The tutorial consists of the following steps:
- Prepare and load PyTorch TSM model
- Convert PyTorch model to ONNX
- Convert ONNX model to OpenVINO IR
- Download and prepare dataset
- Compare accuracy of PyTorch, ONNX, and OpenVINO IR models
- Optimize the OpenVINO IR model using post-training 8-bit integer quantization
- Compare accuracy and performance of the FP32 and quantized models


## Preparation

As a first step, we will import and install some of the required libraries, download the TSM repository, and set up some constants that will be used throughout the tutorial.

### Prerequisites

In [4]:
import sys
import warnings
from os import PathLike
from pathlib import Path
from typing import Dict, Tuple, Optional, Union

import numpy as np
import torch

sys.path.append("../../notebooks/utils")
from notebook_utils import download_file

In [2]:
# Install the pytube library for downloading YouTube videos
%pip install -q pytube



In [11]:
# Clone TSM repo
if not Path('temporal-shift-module').exists():
    !git clone https://github.com/mit-han-lab/temporal-shift-module
%cd temporal-shift-module

c:\Code\Internships\OpenVINO\openvino_notebooks\playground\101-tsm-quantize\temporal-shift-module


### Settings

In [12]:
# Directory settings
MODEL_DIR = Path("../model/")
DATA_ROOT_DIR = Path("../data/")
DATA_DIR = DATA_ROOT_DIR / 'kinetics400'
IMAGES_DIR = DATA_DIR / 'images'

MODEL_DIR.mkdir(exist_ok=True)
DATA_ROOT_DIR.mkdir(exist_ok=True)

# Paths where PyTorch, ONNX, and OpenVINO IR models will be stored.
weights_filename = 'TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth'
weights_path = Path(MODEL_DIR) / weights_filename
onnx_path = weights_path.with_suffix('.onnx')
ir_path = onnx_path.with_suffix(".xml")

### Load Model

Generally, PyTorch models are instances of the [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class, initialized by a state dictionary with model weights.
We will load a TSM model pre-trained on the Kinetics-400 dataset. This TSM variant uses ResNet-50 as its 2D-CNN backbone. The [repo](https://github.com/mit-han-lab/temporal-shift-module) contains a list of TSM model weights trained on various temporal datasets with different model backbones.

Typical steps to obtain a pre-trained model:
1. Download the pre-trained weights
2. Create an instance of the model class
3. Load the checkpoint state dict, which contains the pre-trained model weights
4. Switch the model to evaluation mode so that layers such as dropout and batch normalization behave as they should during inference
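Step 3 in the code below strips a `module.` prefix from the checkpoint keys. That prefix is typical of checkpoints saved from a model wrapped in `torch.nn.DataParallel` (common for multi-GPU training), because the wrapper stores the original network in its `module` attribute. The notebook uses `k.replace('module.', '', 1)`, which removes the first occurrence anywhere in the key. The plain-Python sketch below, with a hypothetical `strip_prefix` helper and made-up keys, is a slightly stricter variant that only touches keys that actually start with the prefix.

```python
from typing import Dict, TypeVar

V = TypeVar("V")


def strip_prefix(state_dict: Dict[str, V], prefix: str = "module.") -> Dict[str, V]:
    """Return a copy of state_dict with `prefix` removed from the start of each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Hypothetical keys; a real checkpoint maps parameter names to torch tensors
checkpoint = {
    "module.base_model.conv1.weight": "w0",
    "module.new_fc.bias": "b0",
    "new_fc.weight": "w1",
}
print(strip_prefix(checkpoint))
# {'base_model.conv1.weight': 'w0', 'new_fc.bias': 'b0', 'new_fc.weight': 'w1'}
```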

In [None]:
# Step 1: Download pre-trained model weights
MODEL_LINK = f"https://hanlab.mit.edu/projects/tsm/models/{weights_filename}" 
print(f'Downloading TSM pretrained weights from URL: {MODEL_LINK}')
download_file(MODEL_LINK, directory=MODEL_DIR, show_progress=True)

We instantiate a TSM model with the following arguments:
* `num_class` - Number of classes (or labels) in the dataset
* `num_segments` - Temporal size of the input image sequence
* `modality` - Type of input image used (e.g. *RGB*, *Flow*, *RGBDiff*)
* `base_model` - 2D-CNN model into which the TSM will be injected
* `is_shift` - Apply temporal shift
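The model log below reports `consensus_module: avg`: the 2D backbone scores each of the `num_segments` frames independently, and the per-frame class scores are averaged into one prediction per video. A minimal NumPy sketch of that averaging step, with random scores standing in for real network outputs (the helper name is hypothetical):

```python
import numpy as np


def average_consensus(frame_logits: np.ndarray, num_segments: int) -> np.ndarray:
    """Average per-frame class scores into per-video scores.

    :param frame_logits: Scores of shape [N*T, num_class], frames of each video contiguous
    :param num_segments: Number of frames T sampled per video
    :returns: Video-level scores of shape [N, num_class]
    """
    n_frames, num_class = frame_logits.shape
    per_video = frame_logits.reshape(n_frames // num_segments, num_segments, num_class)
    return per_video.mean(axis=1)


rng = np.random.default_rng(0)
logits = rng.standard_normal((2 * 8, 400))  # 2 videos x 8 segments, 400 Kinetics classes
video_logits = average_consensus(logits, num_segments=8)
print(video_logits.shape)  # (2, 400)
```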

In [13]:
from ops.models import TSN

# Input settings
SEGMENT_SIZE = 8
IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224
CHANNELS = 3
MODALITY = 'RGB'

# Step 2: Create instance of TSM model
model = TSN(
    num_class=400,
    num_segments=SEGMENT_SIZE,
    modality=MODALITY,
    base_model='resnet50',    
    is_shift=True
)

# Step 3: Load checkpoint state dict
checkpoint = torch.load(weights_path, map_location='cpu')['state_dict']

# Remove prefix 'module.' from model structure names
base_dict = {k.replace('module.', '', 1): v for k, v in checkpoint.items()}
model.load_state_dict(base_dict)

# Step 4: Set the model to inference mode
model.eval()

# Record some model attributes that will be used later
model_attr = {
    'scale_size': model.scale_size,
    'input_size': model.input_size,
    'input_mean': model.input_mean,
    'input_std': model.input_std
}


    Initializing TSN with base model: resnet50.
    TSN Configurations:
        input_modality:     RGB
        num_segments:       8
        new_length:         1
        consensus_module:   avg
        dropout_ratio:      0.8
        img_feature_dim:    256
            
=> base model: resnet50




Adding temporal shift...
=> n_segment per stage: [8, 8, 8, 8]
=> Processing stage with 3 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Processing stage with 4 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Processing stage with 6 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8
=> Processing stage with 3 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Using fold div: 8


## ONNX and OpenVINO IR Model Conversion

### Convert PyTorch model to ONNX

OpenVINO supports PyTorch models that are exported to ONNX format. We will use the `torch.onnx.export` function to obtain the ONNX model; you can learn more about this feature in the [PyTorch documentation](https://pytorch.org/docs/stable/onnx.html). We need to provide a model object, an example input for model tracing, and the path where the model will be saved. The example input does not need to be real data: dummy input data with the correct shape is sufficient. Optionally, we can specify a target ONNX opset for conversion and other parameters described in the documentation (e.g. input and output names or dynamic shapes). Marking axis 0 of the input and output as dynamic, as done below, lets the batch size vary at inference time.

The export may print warnings; these are usually harmless, so we filter them out. When the conversion is successful, the last line of the output will read: `ONNX model exported to ..\model\TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.onnx`.

In [8]:
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")

    if not onnx_path.exists():
        dummy_input = torch.randn(1, SEGMENT_SIZE, CHANNELS, IMAGE_HEIGHT, IMAGE_WIDTH)                
        torch.onnx.export(
            model,
            dummy_input,
            onnx_path,
            opset_version=11,           # the ONNX opset version to export the model with
            input_names=['input'],      # the model's input names
            output_names=['output'],    # the model's output names
            dynamic_axes={
                'input': {0: 'batch_size'},  # variable length axes
                'output': {0: 'batch_size'}
            }
        )
        print(f"ONNX model exported to {onnx_path}.")
    else:
        print(f"ONNX model {onnx_path} already exists.")

ONNX model exported to ..\model\TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.onnx.


### Convert ONNX Model to OpenVINO Intermediate Representation (IR)

While OpenVINO Runtime can read ONNX models directly, converting them to the OpenVINO IR format lets you take advantage of OpenVINO optimization tools and features. The `mo.convert_model` Python function from OpenVINO Model Optimizer performs the conversion. It returns an instance of the OpenVINO `Model` class, which is ready to use through the Python API and can also be serialized to OpenVINO IR format for future execution.

In [14]:
from openvino.tools import mo
from openvino.runtime import serialize

model_ir = mo.convert_model(onnx_path)

# Save IR model for future use
serialize(model_ir, str(ir_path))

## Verify model accuracy

To confirm the successful conversion of the models, we will compare the accuracy of the converted models with that of the PyTorch model. We will evaluate the models on a subset of the [Kinetics-400](https://www.deepmind.com/open-source/kinetics) dataset.

### Download and prepare dataset

[Kinetics-400](https://www.deepmind.com/open-source/kinetics) is a human action video dataset that contains 400 human action classes, such as *making tea*, *shaking hands*, and *playing saxophone*. The videos are taken from YouTube and have variable resolution and frame rate.
The dataset is split into three partitions: *train*, *validate*, and *test*. We will use the *validate* split for the evaluation task.

The *validate* split consists of 17,727 video records. For demonstration purposes, we will use a very small portion of this split.

We will follow the steps below for preparing the dataset for evaluation:
1. Download the [dataset files](https://storage.googleapis.com/deepmind-media/Datasets/kinetics400.tar.gz), which contain the list of videos for each partition and the metadata for each video.
2. Download a limited number of YouTube videos listed in the *validate* partition file.
3. Convert each video (or a portion thereof) to image frames, and save these in folders corresponding to their class labels.
4. Generate mappings between each video sequence and class indices.
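Steps 1 and 2 boil down to reading the partition CSV and taking its first `max_videos` rows. The sketch below runs on an in-memory CSV with made-up YouTube IDs; it assumes the column names `label`, `youtube_id`, `time_start`, and `time_end` that `prepare_dataset` reads further down, and applies the same label sanitisation (spaces become underscores, brackets and quotes are dropped). `read_video_list` is a hypothetical helper, not part of the TSM repository.

```python
import csv
import io
import re
from itertools import islice
from typing import Dict, List, Optional

# Hypothetical rows in the format of the Kinetics-400 partition files
SAMPLE_CSV = """label,youtube_id,time_start,time_end,split
javelin throw,abc123,10,20,validate
"passing American football (in game)",xyz789,0,10,validate
"""


def read_video_list(text: str, max_videos: Optional[int] = None) -> List[Dict[str, object]]:
    """Parse partition CSV text into records with sanitised labels and segment bounds."""
    records = []
    # islice(..., None) yields every row; an integer keeps only the first rows
    for row in islice(csv.DictReader(io.StringIO(text)), max_videos):
        label = re.sub(r"[()'\"]", "", row["label"].replace(" ", "_"))
        records.append({
            "label": label,
            "youtube_id": row["youtube_id"],
            "start": int(row["time_start"]),
            "end": int(row["time_end"]),
        })
    return records


for record in read_video_list(SAMPLE_CSV, max_videos=1):
    print(record)
# {'label': 'javelin_throw', 'youtube_id': 'abc123', 'start': 10, 'end': 20}
```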

The code below is adapted from *vid2img_kinetics.py* and *gen_label_kinetics.py* files in the TSM [repo](https://github.com/mit-han-lab/temporal-shift-module).

In [27]:
import tarfile

DATA_URL = 'https://storage.googleapis.com/deepmind-media/Datasets/kinetics400.tar.gz'

# Download dataset partition information
download_file(DATA_URL, directory=DATA_ROOT_DIR, show_progress=True)

with tarfile.open(DATA_ROOT_DIR / 'kinetics400.tar.gz', "r") as tar_ref:
    tar_ref.extractall(DATA_ROOT_DIR)

In [15]:
import re
from csv import DictReader
from pytube import YouTube
from pytube.exceptions import PytubeError


def sanitise_class_labels(label: str) -> str:
    """ 
    Filter out unwanted characters from class names/labels.
    
    :param label: class label
    :returns: sanitised class label  
    """
    label = re.sub(' ', '_', label)
    label = re.sub(r"[()'\"]", '', label)
    return label


def download_youtube_video(video_id: str, filename: PathLike, directory: PathLike):
    """
    Download a YouTube video to a given file path

    :param video_id: ID of YouTube video
    :param filename: Name of the local file to save
    :param directory: Directory to save the file to
    """
    # Create output directory if it doesn't exist
    directory = Path(directory)
    if not directory.exists():
        directory.mkdir(parents=True)

    yt = YouTube.from_id(video_id)

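    try:
        # get_highest_resolution() returns None when no suitable stream exists
        stream = yt.streams.filter(file_extension='mp4').get_highest_resolution()
        if stream is None:
            raise PytubeError("no downloadable MP4 stream found")
        stream.download(output_path=directory, filename=filename)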
    except PytubeError as err:
        raise Exception(f"Downloading of video {video_id} failed with error: {err}") from None


def prepare_dataset(video_list_file: Path, data_dir: Path, max_videos: Optional[int] = None):
    """
    Download videos listed in file and convert them to image frames.
    
    :param video_list_file: Full path to file that contains information of videos 
                            to be downloaded. Each line should contain the following:
                            [label,youtube_id,time_start,time_end,dataset_split]
    :param data_dir: Root directory to save videos and extracted image frames to
    :param max_videos: If provided, the maximum number of videos to download
    """

    # Create output directories
    video_root_dir = data_dir / 'videos/'
    imgs_root_dir = data_dir / 'images/'
    
    with open(video_list_file, 'r') as f:
        # Read in the list of videos to download and additional information
        dict_reader = DictReader(f)
        video_list = list(dict_reader)
        print(f'Found {len(video_list)} video records in file "{video_list_file.name}"')
        
        print(f'\nProcessing videos:')
        for i, info in enumerate(video_list):
            if max_videos is not None and i >= max_videos:
                break
            
            youtube_id = info['youtube_id']

            # Extract and sanitise class name
            class_name = sanitise_class_labels(info['label'])           

            # Save video to the associated class directory
            video_path = video_root_dir / class_name / f"v{youtube_id}.mp4"
            if not video_path.parent.exists():
                video_path.parent.mkdir(parents=True)

            # Download YouTube video
            print(f"\t[{class_name}] Downloading '{video_path.name}' ... ", end='')
            try:                
                download_youtube_video(
                    video_id=youtube_id,
                    filename=video_path.name,
                    directory=video_path.parent
                )
                print('downloaded')
            except Exception as err:
                print(f"\n\t[ERROR]: {err}")
                continue
            
            # Extract image frames from video segment
            seg_start = int(info['time_start'])
            seg_end = int(info['time_end'])

            # Create images directory with video name
            imgs_out_dir = imgs_root_dir / class_name / video_path.stem
            if imgs_out_dir.exists() and len(list(imgs_out_dir.glob('*.jpg'))) > 0:
                print('\t** Conversion already done **\n')
                continue
            # exist_ok avoids an error if an earlier run left an empty directory behind
            imgs_out_dir.mkdir(parents=True, exist_ok=True)
            
            print(f'\tExtracting video frames (segment: {seg_start} - {seg_end} seconds)\n')
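            # With -ss before -i, output timestamps restart at 0, so -to would act as a
            # duration; pass the segment length with -t instead
            extract_cmd = f'ffmpeg -ss {seg_start} -i "{str(video_path)}" ' \
                            f'-t {seg_end - seg_start} -loglevel error ' \
                            f'-threads 1 -vf scale=-1:331 ' \
                            f'-q:v 0 "{str(imgs_out_dir)}/img_%05d.jpg"'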
            ! $extract_cmd            

    print('Dataset preparation complete.')
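The frame extraction above assembles a shell string and runs it with IPython's `!` escape. As an alternative, the hypothetical helpers below build an equivalent ffmpeg call as an argument list for `subprocess.run`, which avoids shell quoting problems with unusual file names. They pass the segment length with `-t`: when `-ss` appears before `-i`, ffmpeg resets timestamps to zero at the seek point, so a duration is the unambiguous way to stop at the end of the Kinetics segment. Running `extract_frames` requires `ffmpeg` on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_cmd(video_path: Path, out_dir: Path, start: int, end: int) -> List[str]:
    """Build an ffmpeg argument list that extracts JPEG frames from [start, end) seconds."""
    return [
        "ffmpeg",
        "-ss", str(start),          # input seeking: jump to the segment start
        "-i", str(video_path),
        "-t", str(end - start),     # segment duration, not an absolute end time
        "-loglevel", "error",
        "-threads", "1",
        "-vf", "scale=-1:331",      # height 331 px, width chosen to keep the aspect ratio
        "-q:v", "0",
        str(out_dir / "img_%05d.jpg"),
    ]


def extract_frames(video_path: Path, out_dir: Path, start: int, end: int) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if it exits with an error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(video_path, out_dir, start, end), check=True)


print(build_extract_cmd(Path("videos/clip.mp4"), Path("images/clip"), 14, 24))
```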


We limit the number of videos to download to **10**. Feel free to adjust the `max_videos` value or set it to `None` if you wish to download the entire split.

In [23]:
prepare_dataset(
    video_list_file=DATA_DIR / 'validate.csv',
    data_dir=DATA_DIR,
    max_videos=10
)

Found 17727 video records in file "validate.csv"

Processing videos:
	[javelin_throw] Downloading 'v--07WQ2iBlw.mp4' ... downloaded
	** Conversion already done **

	[flipping_pancake] Downloading 'v--33Lscn6sk.mp4' ... downloaded
	** Conversion already done **

Dataset preparation complete.


In [16]:
def generate_video_class_mappings(labels_file: PathLike, imgs_root_dir: PathLike, out_file: PathLike):
    """ 
    Generate a file containing image paths mapped to corresponding class indices and frame counts.

    :param labels_file: Path to file that contains a list of all dataset labels
    :param imgs_root_dir: Directory containing extracted video image frames
    :param out_file: Path to output file
    """

    # Read Kinetics-400 class list from file
    with open(labels_file) as f:
        classes = f.readlines()
        classes = [sanitise_class_labels(c.strip()) for c in classes]
        print(f'{len(classes)} classes found')

    # Map classes to numeric indices
    class_mapping = {c: i for i, c in enumerate(classes)}

    # Loop through image frame folders and map each video to a class index
    imgs_root_dir = Path(imgs_root_dir)

    output = []

    for class_dir in imgs_root_dir.iterdir():
        class_name = class_dir.name
        class_index = class_mapping[class_name]

        for imgs_dir in class_dir.iterdir():
            if imgs_dir.is_dir():
                # Count the number of frames in each video folder
                num_frames = len(list(imgs_dir.iterdir()))
                img_rel_path = '/'.join(imgs_dir.parts[-2:])
                info = f'{img_rel_path} {num_frames} {class_index}'
                output.append(info)

    if len(output) > 0:
        with open(out_file, 'w') as f:
            f.write('\n'.join(output))

In [17]:
# File containing all 400 labels of the Kinetics-400 dataset
labels_file = './tools/kinetics_label_map.txt'
# The full path to the output file
video_mapping_file = DATA_DIR / 'labels/video_class_mappings.txt'

# Create parent directory for output file
Path(video_mapping_file).parent.mkdir(parents=True, exist_ok=True)

generate_video_class_mappings(labels_file, IMAGES_DIR, video_mapping_file)

400 classes found


### Create Dataloader

In [18]:
import torchvision
from torch.utils.data import DataLoader
from ops.dataset import TSNDataSet
from ops.transforms import *

# Initialise model transforms
transforms = torchvision.transforms.Compose([
    GroupScale(model_attr['scale_size']),
    GroupCenterCrop(model_attr['input_size']),
    Stack(roll=False), 
    ToTorchFormatTensor(div=True),
    GroupNormalize(model_attr['input_mean'], model_attr['input_std']),
])

# Create dataset and data loader
dataset = TSNDataSet(
    root_path=IMAGES_DIR,
    list_file=video_mapping_file,
    num_segments=SEGMENT_SIZE,
    new_length=1,
    modality=MODALITY,
    test_mode=True,
    remove_missing=True,
    transform=transforms
)

data_loader = DataLoader(dataset, batch_size=32, pin_memory=True, shuffle=False)

video number:3
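Each line written by `generate_video_class_mappings` has the form `<class>/<video> <num_frames> <class_index>`, and the loader then picks `SEGMENT_SIZE` frames per video. In test mode, TSN-style loaders take roughly one frame from the centre of each of `num_segments` equal chunks. The pure-Python sketch below illustrates both steps with a hypothetical mapping line; `VideoEntry` and the helper functions are illustrative names, and the exact offsets used by `ops.dataset.TSNDataSet` may differ.

```python
from typing import List, NamedTuple


class VideoEntry(NamedTuple):
    """One line of the video/class mapping file (illustrative, not the TSM class)."""
    path: str
    num_frames: int
    label: int


def parse_mapping_line(line: str) -> VideoEntry:
    # Split from the right so the relative path itself could contain spaces
    path, num_frames, label = line.strip().rsplit(" ", 2)
    return VideoEntry(path, int(num_frames), int(label))


def center_segment_indices(num_frames: int, num_segments: int) -> List[int]:
    """1-based frame indices near the centre of `num_segments` equal chunks."""
    tick = num_frames / num_segments
    return [int(tick / 2 + tick * i) + 1 for i in range(num_segments)]


entry = parse_mapping_line("class_name/video_name 300 42")
print(entry)  # VideoEntry(path='class_name/video_name', num_frames=300, label=42)
print(center_segment_indices(entry.num_frames, num_segments=8))
# [19, 57, 94, 132, 169, 207, 244, 282]
```

Indices start at 1 because ffmpeg numbers the `img_%05d.jpg` files from 1.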




### Define validation functions

We will adopt the evaluation metrics used in the TSM [paper](https://arxiv.org/abs/1811.08383), namely the Top-1 and Top-5 accuracy scores. We will use scikit-learn's [top_k_accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.top_k_accuracy_score.html) function, which computes the fraction of samples whose correct label is among the top `k` predictions (ranked by predicted score).
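To make the metric concrete, here is a minimal NumPy sketch of top-k accuracy on three hand-made score rows. It is for intuition only; the notebook itself uses scikit-learn, which handles tied scores differently from the `np.argsort` ordering used here.

```python
import numpy as np


def top_k_accuracy(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (their order within the top k is irrelevant)
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())


scores = np.array([
    [0.1, 0.6, 0.3],   # true class 1 is ranked 1st
    [0.5, 0.2, 0.3],   # true class 2 is ranked 2nd
    [0.2, 0.7, 0.1],   # true class 2 is ranked 3rd
])
y_true = np.array([1, 2, 2])
print(top_k_accuracy(y_true, scores, k=1))  # 0.3333333333333333
print(top_k_accuracy(y_true, scores, k=2))  # 0.6666666666666666
```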

In [34]:
def prepare_inputs(images: torch.Tensor) -> torch.Tensor:
    """
    Expand the input image sequence shape to include a temporal dimension.
        
    :param images: Image sequence of shape [N,T*C,H,W]
    :returns: Image sequence reshaped to [N,T,C,H,W]
    """
    
    [batch_size, tc, height, width] = images.size()

    # Decompose the second dimension into channels and number of segments (temporal size)
    channels = 3  # RGB
    n_segments = tc // channels
    assert channels * n_segments == tc

    images = images.view(batch_size, n_segments, channels, height, width)
    return images

In [35]:
import openvino.runtime as ov
from sklearn.metrics import top_k_accuracy_score

def validate(model: Union[ov.CompiledModel, torch.nn.Module],
             validation_loader: DataLoader,
             top_k: Tuple[int, ...] = (1,)) -> Dict[int, float]:
    """ 
    Evaluate TSM model and compute accuracy metrics.

    :param model: Model to validate
    :param validation_loader: Validation dataset
    :param top_k: Number of top elements to look at for computing accuracy. Allows
                  for the computation of multiple top_k metrics.
    :returns: Accuracy scores for all `k`s provided
    """
    predictions = []
    references = []

    is_compiled = isinstance(model, ov.CompiledModel)
    output = model.outputs[0] if is_compiled else None
  
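    for images, target in validation_loader:
        images = prepare_inputs(images)

        if is_compiled:
            # OpenVINO compiled models consume NumPy arrays
            pred = model(images.numpy())[output]
        else:
            # Disable gradient tracking to save memory during PyTorch inference
            with torch.no_grad():
                pred = model(images).numpy()

        predictions.append(pred)
        references.append(target.numpy())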

    predictions = np.concatenate(predictions, axis=0)
    references = np.concatenate(references, axis=0)

    # Generate a list of all class "labels" (i.e. indices)
    class_indices = list(range(predictions.shape[-1]))

    scores = {}
    for k in top_k:
        scores[k] = top_k_accuracy_score(references, predictions, k=k, labels=class_indices)        
    return scores

### Evaluate PyTorch, ONNX, and IR models

In [24]:
from openvino.runtime import Core

core = Core()

# Read converted models
model_ir = core.read_model(ir_path)
model_onnx = core.read_model(onnx_path)

# Compile models on CPU device
compiled_model_ir = core.compile_model(model_ir, 'CPU')
compiled_model_onnx = core.compile_model(model_onnx, 'CPU')

In [33]:
val_models = {
    'OpenVINO IR': compiled_model_ir,
    'ONNX': compiled_model_onnx,
    'PyTorch': model
}

print(f'Model Accuracy Results\n{"-" * 60}')
for name, val_model in val_models.items():
    scores = validate(val_model, data_loader, top_k=(1, 5))

    # Print results
    print(f'{name + " Model":17s}', end='')
    for k, score in scores.items():
        print(f'  |  Prec@{k}: {score * 100:-6.2f} %', end='')
    print('')


Model Accuracy Results
------------------------------------------------------------
OpenVINO IR Model  |  Prec@1:  66.67 %  |  Prec@5: 100.00 %
ONNX Model         |  Prec@1:  66.67 %  |  Prec@5: 100.00 %
PyTorch Model      |  Prec@1:  66.67 %  |  Prec@5: 100.00 %


## Optimize model using NNCF Post-training Quantization API

The OpenVINO™ [Neural Network Compression Framework](https://docs.openvino.ai/latest/nncf_ptq_introduction.html) (NNCF) provides a suite of advanced algorithms for optimizing neural network inference with minimal accuracy drop.

We will apply the post-training 8-bit integer quantization method which converts weights and activations from floating-point precision to integer precision, thus reducing the model size, memory footprint, and latency, as well as improving the computational efficiency using integer arithmetic.
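To illustrate the arithmetic, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization, with the scale taken from the largest absolute value seen in calibration data. It is a simplified illustration, not what NNCF runs: NNCF selects its own schemes (for example, the `MIXED` preset used below applies symmetric quantization to weights and asymmetric quantization to activations). The sketch shows the 4x storage reduction and that, for values inside the calibrated range, the rounding error is at most half a quantization step (`scale / 2`).

```python
from typing import Tuple

import numpy as np


def quantize_symmetric_int8(x: np.ndarray, calib: np.ndarray) -> Tuple[np.ndarray, float]:
    """Quantize x to int8 using one scale derived from calibration data."""
    # Map the largest absolute value seen during calibration to 127
    scale = float(np.abs(calib).max()) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
activations = rng.standard_normal(1000).astype(np.float32)  # stand-in for real activations
q, scale = quantize_symmetric_int8(activations, calib=activations)
error = float(np.abs(dequantize(q, scale) - activations).max())
print(q.dtype, q.nbytes, activations.nbytes)  # int8 1000 4000
print(f"scale={scale:.5f}, max abs error={error:.5f}")
```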

The optimization process contains the following steps:
* Prepare the calibration dataset that is used to estimate quantization parameters of the activations within the model.
* Call `nncf.quantize` to apply 8-bit quantization to the model.
* Save the quantized model using `openvino.runtime.serialize`.

We will re-use the validation data loader for the quantization process. To do so, we wrap it in an `nncf.Dataset` object together with a transformation function that extracts the input data from each data loader item and returns it in the format the model expects.
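The role of the transformation function can be illustrated without NNCF. The hypothetical `LazyTransformDataset` class below (plain Python, not NNCF code) pairs an iterable with a function that is applied to each item only when it is consumed: the data loader yields `(images, target)` tuples and the transform keeps only what the model needs. The optional `limit` loosely corresponds to capping the number of calibration items, as `subset_size` does in `nncf.quantize`; NNCF's internals are not reproduced here.

```python
from itertools import islice
from typing import Callable, Generic, Iterable, Iterator, Optional, TypeVar

ItemT = TypeVar("ItemT")
OutT = TypeVar("OutT")


class LazyTransformDataset(Generic[ItemT, OutT]):
    """Hypothetical wrapper: applies `transform` to items as they are iterated."""

    def __init__(self, source: Iterable[ItemT], transform: Callable[[ItemT], OutT]):
        self.source = source
        self.transform = transform

    def iterate(self, limit: Optional[int] = None) -> Iterator[OutT]:
        """Yield transformed items, stopping after `limit` items if given."""
        for item in islice(self.source, limit):
            yield self.transform(item)


# Each item mimics a DataLoader batch: (inputs, targets)
batches = [([1, 2], 0), ([3, 4], 1), ([5, 6], 2)]
calibration_items = LazyTransformDataset(batches, transform=lambda item: item[0])
print(list(calibration_items.iterate(limit=1)))  # [[1, 2]]
```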

Note that the quantization process may take quite some time.

In [35]:
def transform_fn(data_item):
    """
    Quantization transform function. Extracts and preprocesses input data from dataloader item
    for quantization.

    :param data_item: Tuple with data item produced by DataLoader during iteration
    :returns: Input data for quantization
    """
    images, _ = data_item
    return prepare_inputs(images)


In [60]:
import nncf

calibration_dataset = nncf.Dataset(data_loader, transform_fn)
print('Calibration dataset created.')

quantized_model = nncf.quantize(model_ir, calibration_dataset, subset_size=1, preset=nncf.QuantizationPreset.MIXED)
print('Model successfully quantized.')

# Create filename by appending '_int8' to IR model filename 
quant_path = ir_path.parent / f'{ir_path.stem}_int8{ir_path.suffix}'

# Save quantized model
serialize(quantized_model, str(quant_path))
print(f'Quantized model saved to {quant_path}.')

INFO:openvino.tools.pot.pipeline.pipeline:Inference Engine version:                2022.3.0-9052-9752fafe8eb-releases/2022/3
INFO:openvino.tools.pot.pipeline.pipeline:Model Optimizer version:                 2022.3.0-9052-9752fafe8eb-releases/2022/3
INFO:openvino.tools.pot.pipeline.pipeline:Post-Training Optimization Tool version: 2022.3.0-9052-9752fafe8eb-releases/2022/3
INFO:openvino.tools.pot.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:openvino.tools.pot.statistics.collector:Computing statistics finished
INFO:openvino.tools.pot.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Start computing statistics for algorithm : ActivationChannelAlignment
INFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Start computing statistics for algorithms : MinMaxQuantization,F

### Validate quantized model accuracy

In [64]:
int8_compiled_model = core.compile_model(quantized_model, 'CPU')

int8_scores = validate(int8_compiled_model, data_loader, top_k=(1, 5))

# Print results
print('Quantized Model Accuracy Results:')
for k, score in int8_scores.items():
    print(f'\tPrec@{k}: {score * 100:-6.2f} %')


Quantized Model Accuracy Results:
	Prec@1:  66.67 %
	Prec@5: 100.00 %


## Compare Performance of the Original and Quantized Models
Finally, use the OpenVINO [Benchmark Tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html) to measure the inference performance of the `FP32` and `INT8` models.

> **NOTE**: For more accurate performance, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app -m model.xml -d CPU` to benchmark async inference on CPU for one minute. Change `CPU` to `GPU` to benchmark on GPU. Run `benchmark_app --help` to see an overview of all command-line options.
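`benchmark_app` remains the reliable way to measure performance. For a quick sanity check from Python, a rough synchronous latency measurement can be sketched with `time.perf_counter`. The hypothetical helper below accepts any zero-argument callable (for example, a lambda calling a compiled OpenVINO model on a NumPy input of shape `[1, 8, 3, 224, 224]`), skips warm-up runs, and reports the median, which is less sensitive to outliers than the mean. It does not reproduce `benchmark_app`'s asynchronous mode, so its numbers are not directly comparable.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time repeated synchronous calls of fn and summarise latency in milliseconds."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls (caches, lazy initialisation)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    median = statistics.median(timings)
    return {
        "median_ms": median,
        "min_ms": min(timings),
        "calls_per_s": 1000.0 / median if median > 0 else float("inf"),
    }


# Dummy workload; with OpenVINO this could be e.g. lambda: compiled_model_ir(dummy_array)
print(measure_latency(lambda: sum(range(100_000))))
```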

In [89]:
# Inference FP32 model (OpenVINO IR)
input_shape = f'"{[1, SEGMENT_SIZE, CHANNELS, IMAGE_HEIGHT, IMAGE_WIDTH]}"'
print(f"!benchmark_app -m {ir_path} -shape {input_shape} -d CPU -api async")

!benchmark_app -m {ir_path} -shape {input_shape} -d CPU -api async


input_shape: "[1, 8, 3, 224, 224]"
!benchmark_app -m ..\model\TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.xml -shape "[1, 8, 3, 224, 224]" -d CPU -api async
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 1204.65 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input (node: input) : f32 / [...] / [?,8,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [?,400]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 

In [91]:
# Inference INT8 model (OpenVINO IR)
print(f"!benchmark_app -m {quant_path} -shape {input_shape} -d CPU -api async")
!benchmark_app -m {quant_path} -shape {input_shape} -d CPU -api async

!benchmark_app -m ..\model\TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50_int8.xml -shape "[1, 8, 3, 224, 224]" -d CPU -api async
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 1259.10 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input (node: input) : f32 / [...] / [?,8,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [?,400]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[ INFO ] Reshaping model: 'i