In [None]:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

## Introduction
This tutorial shows how to use the model zoo provided by PyTorchVideo/Accelerator. Using a model from the zoo generally involves the following steps:
- Use the model builder to build the selected model;
- Load the pretrained checkpoint;
- (Optional) Fine-tune;
- Deploy.

Before we start, let's install PyTorchVideo.

In [None]:
!pip install pytorchvideo

## Use the model builder to build the selected model
We use the model builder in the PyTorchVideo/Accelerator model zoo to build a predefined efficient model. Here we use EfficientX3D-XS (for mobile_cpu) as an example. For other available models and details, please refer to [this page].

EfficientX3D-XS is an implementation of the X3D-XS network, as described in the [X3D paper](https://arxiv.org/abs/2004.04730), built from efficient blocks. It is arithmetically equivalent to X3D-XS, but our benchmark on a mobile phone shows a 4.6X latency reduction compared with the vanilla implementation.

To build EfficientX3D-XS, we simply do the following:

In [1]:
from pytorchvideo.models.accelerator.mobile_cpu.efficient_x3d import EfficientX3d
model_efficient_x3d_xs = EfficientX3d(expansion='XS', head_act='identity')

Note that the efficient blocks in the model are still in their original form at this point, so the model is suitable for further training.

## Load pretrained checkpoint and (optional) fine-tune
For each model in the model zoo, we provide a pretrained checkpoint `state_dict` for the model in its original form. See [this page] for details about the checkpoints and where to download them.

In [2]:
from torch.hub import load_state_dict_from_url
checkpoint_path = 'https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/efficient_x3d_xs_original_form.pyth'
checkpoint = load_state_dict_from_url(checkpoint_path)

model_efficient_x3d_xs.load_state_dict(checkpoint)

Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/efficient_x3d_xs_original_form.pyth" to /home/thisiswooyeol/.cache/torch/hub/checkpoints/efficient_x3d_xs_original_form.pyth


<All keys matched successfully>
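
The `<All keys matched successfully>` message means the checkpoint and the model agree on every parameter name. As a minimal sketch of that check in plain Python, the key sets can be compared before loading to see what would be missing or unexpected. The helper `diff_state_dict_keys` and the example key names are hypothetical, not part of PyTorch or PyTorchVideo.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Report parameter names that only one side knows about."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint.
        "missing": sorted(model_set - ckpt_set),
        # Present in the checkpoint but unknown to the model.
        "unexpected": sorted(ckpt_set - model_set),
    }


# Hypothetical key names, for illustration only.
model_keys = ["stem.conv.weight", "head.proj.weight", "head.proj.bias"]
checkpoint_keys = ["stem.conv.weight", "head.proj.weight", "head.fc.bias"]
report = diff_state_dict_keys(model_keys, checkpoint_keys)
print(report)
```

With the objects from this tutorial, the inputs would be `model_efficient_x3d_xs.state_dict().keys()` and `checkpoint.keys()`; both lists come back empty when all keys match.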

Now the model is ready for fine-tuning.
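
The tutorial does not prescribe a training recipe. What follows is only a minimal fine-tuning sketch, assuming a classification task with integer class labels. The function `fine_tune`, the stand-in model, and the random data are placeholders for illustration; in practice `model_efficient_x3d_xs` and a real video data loader would take their place. Because the model above was built with `head_act='identity'`, it is assumed here to output raw logits, which is what `nn.CrossEntropyLoss` expects.

```python
import torch
import torch.nn as nn


def fine_tune(model, batches, lr=1e-3, epochs=1):
    """Run a plain SGD loop and return the mean loss of each epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for clips, labels in batches:
            optimizer.zero_grad()
            loss = criterion(model(clips), labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        epoch_losses.append(total / len(batches))
    return epoch_losses


# Stand-in model and random clips so the sketch runs without a dataset.
torch.manual_seed(0)
stand_in_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4 * 8 * 8, 5))
batches = [
    (torch.randn(2, 3, 4, 8, 8), torch.randint(0, 5, (2,)))
    for _ in range(3)
]
losses = fine_tune(stand_in_model, batches, epochs=2)
print(losses)
```

Fine-tuning should happen before the conversion to deploy form described next, since the original form is the one intended for training.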

## Deploy
Now the model is ready to deploy. First, let's convert the model into deploy form. To do that, we use the `convert_to_deployable_form` utility and provide an example input tensor for the model. Note that once the model is converted into deploy form, its input size must match the size of the example input tensor used during conversion.

In [3]:
import torch
from pytorchvideo.accelerator.deployment.mobile_cpu.utils.model_conversion import (
    convert_to_deployable_form,
)
input_blob_size = (1, 3, 4, 160, 160)
input_tensor = torch.randn(input_blob_size)
model_efficient_x3d_xs_deploy = convert_to_deployable_form(model_efficient_x3d_xs, input_tensor)
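
Since the deploy-form model only accepts the input size used during conversion, it can help to record that size and check inputs against it before inference. The class below is a hypothetical helper written for this tutorial, not a PyTorchVideo API; it works on plain shape tuples so it runs without any dependencies.

```python
from typing import Sequence, Tuple


class InputShapeGuard:
    """Remembers the shape used at conversion time and rejects any other shape."""

    def __init__(self, conversion_shape: Sequence[int]) -> None:
        self.conversion_shape: Tuple[int, ...] = tuple(conversion_shape)

    def check(self, shape: Sequence[int]) -> Tuple[int, ...]:
        shape = tuple(shape)
        if shape != self.conversion_shape:
            raise ValueError(
                f"expected input shape {self.conversion_shape}, got {shape}"
            )
        return shape


guard = InputShapeGuard((1, 3, 4, 160, 160))  # same shape as input_blob_size above
guard.check((1, 3, 4, 160, 160))  # accepted

try:
    guard.check((1, 3, 8, 160, 160))  # different temporal length
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```

With tensors, `guard.check(tensor.shape)` works as well, because `torch.Size` converts cleanly to a tuple.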

Printing the converted model shows that the network graph has changed: the conversion applied kernel and graph optimizations.

In [4]:
print(model_efficient_x3d_xs_deploy)

EfficientX3d(
  (s1): Sequential(
    (pathway0_stem_conv_xy): Conv3dTemporalKernel1BnAct(
      (kernel): Sequential(
        (conv): _Conv3dTemporalKernel1Decomposed(
          (conv2d_eq): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        )
        (act): Identity(
          (act): Identity()
        )
      )
    )
    (pathway0_stem_conv): Conv3d5x1x1BnAct(
      (kernel): Sequential(
        (conv): _Conv3dTemporalKernel5Decomposed(
          (_conv2d_0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=24, bias=False)
          (_conv2d_1): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=24, bias=False)
          (_conv2d_2): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=24)
          (_conv2d_3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=24, bias=False)
          (_conv2d_4): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=24, bias=False)
          (_add_funcs): ModuleList(
            (0): F

Next we have two options: deploy the floating-point model, or quantize the model to int8 and then deploy it.

Let's first assume we want to deploy the floating-point model. In this case, all we need to do is export a JIT trace and then apply `optimize_for_mobile` for final optimization.

In [7]:
from torch.utils.mobile_optimizer import (
    optimize_for_mobile,
)
traced_model = torch.jit.trace(model_efficient_x3d_xs_deploy, input_tensor, strict=False)
traced_model_opt = optimize_for_mobile(traced_model)
# Save the optimized trace to a JIT file. The relative path is a placeholder; adjust it as needed.
traced_model_opt.save('efficient_x3d_xs_tutorial_float.pt')
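
Before shipping a saved trace, it is worth confirming that the file loads and reproduces the original outputs. The sketch below shows that round trip with `torch.jit.trace`, `save`, and `torch.jit.load` on a tiny stand-in module, so it runs quickly without PyTorchVideo. The stand-in model and the temporary file name are assumptions for illustration, and the `optimize_for_mobile` step is left out to keep the example small.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in for the deploy-form model (illustration only).
torch.manual_seed(0)
stand_in = nn.Sequential(nn.Conv3d(3, 4, kernel_size=1), nn.ReLU()).eval()
example = torch.randn(1, 3, 4, 8, 8)

traced = torch.jit.trace(stand_in, example)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in_float.pt")
    traced.save(path)
    reloaded = torch.jit.load(path)

# Compare the reloaded trace against the eager model on the same input.
with torch.no_grad():
    max_abs_diff = (reloaded(example) - stand_in(example)).abs().max().item()
print(max_abs_diff)
```

The same comparison can be run with `model_efficient_x3d_xs_deploy`, `input_tensor`, and the saved tutorial file in place of the stand-ins.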

Alternatively, we may want to deploy a quantized model. Efficient blocks are quantization-friendly by design: just wrap the model in deploy form with `QuantStub`/`DeQuantStub` and it is ready for PyTorch eager mode quantization.

In [8]:
import torch.nn as nn
# Wrapper class for adding QuantStub/DeQuantStub.
class quant_stub_wrapper(nn.Module):
    def __init__(self, module_in):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.model = module_in
        self.dequant = torch.quantization.DeQuantStub()
    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        x = self.dequant(x)
        return x

In [9]:
model_efficient_x3d_xs_deploy_quant_stub_wrapper = quant_stub_wrapper(model_efficient_x3d_xs_deploy)

Next is the preparation step of quantization. Fusion was already done automatically for the efficient blocks during `convert_to_deployable_form`, so we can proceed directly to `torch.quantization.prepare`.

In [10]:
model_efficient_x3d_xs_deploy_quant_stub_wrapper.qconfig = torch.quantization.default_qconfig
model_efficient_x3d_xs_deploy_quant_stub_wrapper_prepared = torch.quantization.prepare(model_efficient_x3d_xs_deploy_quant_stub_wrapper)

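Calibration and quantization. After preparation, we would normally calibrate the quantization by feeding a calibration dataset through the prepared model, and then convert it to a quantized model. Calibration is skipped here, so the observers have collected no statistics and the conversion falls back to default scale and zero-point values; for a real deployment, run representative inputs through the prepared model before converting.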

In [11]:
# calibration is skipped here.
model_efficient_x3d_xs_deploy_quant_stub_wrapper_quantized = torch.quantization.convert(model_efficient_x3d_xs_deploy_quant_stub_wrapper_prepared)


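To make the role of calibration concrete, here is a simplified plain-Python sketch of what a min/max observer conceptually computes: it tracks the range of the values it sees and derives a scale and zero point for 8-bit affine quantization. This is an illustration under stated assumptions, not PyTorch's actual implementation: the unsigned range 0 to 255 and the fallback values (scale 1.0, zero point 0) used when no data was observed are assumptions of this sketch, and all function names are hypothetical.

```python
from typing import Iterable, Tuple

QMIN, QMAX = 0, 255  # unsigned 8-bit range assumed for this sketch


def calibrate_min_max(values: Iterable[float]) -> Tuple[float, int]:
    """Derive (scale, zero_point) from the min/max of the observed values."""
    values = list(values)
    if not values:
        # No calibration data: fall back to defaults (assumed 1.0 and 0 here).
        return 1.0, 0
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = QMIN - round(lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a float to the nearest integer level, clamped to the 8-bit range."""
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer level back to its approximate float value."""
    return (q - zero_point) * scale


scale, zero_point = calibrate_min_max([-1.0, 0.25, 3.0])
q = quantize(1.0, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

In the tutorial's flow, calibration would mean running representative clips through `model_efficient_x3d_xs_deploy_quant_stub_wrapper_prepared` before calling `torch.quantization.convert`, so that the observers record realistic ranges instead of falling back to defaults.
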
Then we can export a trace of the int8 model and deploy it on mobile devices.

In [13]:
traced_model_int8 = torch.jit.trace(model_efficient_x3d_xs_deploy_quant_stub_wrapper_quantized, input_tensor, strict=False)
traced_model_int8_opt = optimize_for_mobile(traced_model_int8)
# Save the optimized int8 trace to a JIT file. The relative path is a placeholder; adjust it as needed.
traced_model_int8_opt.save('efficient_x3d_xs_tutorial_int8.pt')