## Set up environments

- `docker run --gpus '"device=0"' -it --rm -p 8887:8887 -v $(pwd):/hands_on nvcr.io/nvidia/pytorch:22.03-py3`
- `cd /hands_on`
- `jupyter notebook --ip 0.0.0.0 --port 8887`

## Make TorchScript

In [1]:
import torch
from torch import nn
from torchvision import models


model = models.wide_resnet101_2(pretrained=True).eval().cuda()
script_model = torch.jit.script(model)
script_model.save('model.pt')



### TorchScript: trace vs. script

1. `torch.jit.trace`: you provide example inputs; the tracer runs the model once and records the tensor operations it performs.
   (*Warning: data-dependent control flow and Python data structures are not captured; only the path taken for the example inputs is recorded.*)

2. `torch.jit.script`: compiles the model's Python source directly to TorchScript, so control flow is preserved.
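The difference is easiest to see on a module with a data-dependent branch. The sketch below uses a hypothetical toy module, `SignGate` (not part of this hands-on), and runs on CPU; it assumes only that `torch` is installed.

```python
import warnings

import torch
from torch import nn


class SignGate(nn.Module):
    """Hypothetical toy module whose output depends on a data-dependent branch."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x + 10
        return x - 10


module = SignGate().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

# Tracing runs the module once on `positive` and records only the ops it saw,
# so the `x + 10` branch is baked in. PyTorch emits a TracerWarning here,
# which is silenced to keep the output readable.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(module, positive)

# Scripting compiles the Python source, so the `if` statement survives.
scripted = torch.jit.script(module)

print(traced(negative))    # follows the recorded branch: x + 10
print(scripted(negative))  # evaluates the condition: x - 10
```

For a model like `wide_resnet101_2`, whose forward pass has no data-dependent branches, both approaches should produce equivalent graphs.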


## Make ONNX file

In [2]:
input_names = ["actual_input_1"]
output_names = ["output_1"]
torch.onnx.export(model, torch.randn(1, 3, 224, 224).cuda(), 'model.onnx',
                  input_names=input_names, output_names=output_names,
                  dynamic_axes={'actual_input_1':{0:'batch_size'}, 'output_1': {0:'batch_size'}})
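The `dynamic_axes` argument maps each input or output name to a dictionary of axis index to symbolic name; any axis not listed is exported with the fixed size of the example tensor. As a small sketch, the mapping used above could be generated by a helper (the name `batch_dynamic_axes` is hypothetical, not a PyTorch API):

```python
from typing import Dict, Iterable


def batch_dynamic_axes(names: Iterable[str], axis: int = 0,
                       label: str = "batch_size") -> Dict[str, Dict[int, str]]:
    """Mark one axis of every named tensor as dynamic (hypothetical helper)."""
    return {name: {axis: label} for name in names}


input_names = ["actual_input_1"]
output_names = ["output_1"]
dynamic_axes = batch_dynamic_axes(input_names + output_names)
print(dynamic_axes)
# {'actual_input_1': {0: 'batch_size'}, 'output_1': {0: 'batch_size'}}
```

The resulting dictionary can be passed as `dynamic_axes=dynamic_axes` to `torch.onnx.export`.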

## ONNX Image

Open `model.onnx` in Netron (https://netron.app/) to inspect the exported graph:

![title](src/onnx.jpg)

## Build TensorRT (ONNX -> TRT)

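trtexec reference: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-serialized-timing-cache

Note: `--explicitBatch` is deprecated in this TensorRT release and has no effect (see the warning in the log output), so it can be omitted.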

In [4]:
!trtexec \
  --onnx=model.onnx \
  --explicitBatch \
  --optShapes=actual_input_1:16x3x224x224 \
  --maxShapes=actual_input_1:32x3x224x224 \
  --minShapes=actual_input_1:1x3x224x224 \
  --best \
  --saveEngine=model.plan

&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # trtexec --onnx=model.onnx --explicitBatch --optShapes=actual_input_1:16x3x224x224 --maxShapes=actual_input_1:32x3x224x224 --minShapes=actual_input_1:1x3x224x224 --best --saveEngine=model.plan
[04/19/2022-05:44:24] [W] --explicitBatch flag has been deprecated and has no effect!
[04/19/2022-05:44:24] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[04/19/2022-05:44:24] [I] === Model Options ===
[04/19/2022-05:44:24] [I] Format: ONNX
[04/19/2022-05:44:24] [I] Model: model.onnx
[04/19/2022-05:44:24] [I] Output:
[04/19/2022-05:44:24] [I] === Build Options ===
[04/19/2022-05:44:24] [I] Max batch: explicit batch
[04/19/2022-05:44:24] [I] Workspace: 16 MiB
[04/19/2022-05:44:24] [I] minTiming: 1
[04/19/2022-05:44:24] [I] avgTiming: 8
[04/19/2022-05:44:24] [I] Precision: FP32+FP16+INT8
[04/19/2022-05:44:24] [I] Calibration: Dynamic
[04/19/2022-05:44:24] [I] Refit: D
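The same build can be driven from Python, which helps when several shape profiles need to be tried. The sketch below only assembles the argument list; `shape_arg` and `trtexec_command` are hypothetical helper names, and the flags are the ones used in the cell above (minus `--explicitBatch`, which the log reports as deprecated).

```python
import shlex
from typing import Dict, List, Sequence


def shape_arg(shapes: Dict[str, Sequence[int]]) -> str:
    """Format {"input": (1, 3, 224, 224)} as "input:1x3x224x224"."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


def trtexec_command(onnx_path: str, engine_path: str, input_name: str,
                    min_shape: Sequence[int], opt_shape: Sequence[int],
                    max_shape: Sequence[int]) -> List[str]:
    """Build the trtexec argument list for one dynamic-shape input."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--minShapes={shape_arg({input_name: min_shape})}",
        f"--optShapes={shape_arg({input_name: opt_shape})}",
        f"--maxShapes={shape_arg({input_name: max_shape})}",
        "--best",
        f"--saveEngine={engine_path}",
    ]


cmd = trtexec_command(
    "model.onnx", "model.plan", "actual_input_1",
    (1, 3, 224, 224), (16, 3, 224, 224), (32, 3, 224, 224),
)
print(shlex.join(cmd))
# To run it inside the container: subprocess.run(cmd, check=True)
```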

In [14]:
!trtexec --loadEngine=model.plan --dumpOutput

&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # trtexec --loadEngine=model.plan --dumpOutput
[04/18/2022-21:49:00] [I] === Model Options ===
[04/18/2022-21:49:00] [I] Format: *
[04/18/2022-21:49:00] [I] Model: 
[04/18/2022-21:49:00] [I] Output:
[04/18/2022-21:49:00] [I] === Build Options ===
[04/18/2022-21:49:00] [I] Max batch: 1
[04/18/2022-21:49:00] [I] Workspace: 16 MiB
[04/18/2022-21:49:00] [I] minTiming: 1
[04/18/2022-21:49:00] [I] avgTiming: 8
[04/18/2022-21:49:00] [I] Precision: FP32
[04/18/2022-21:49:00] [I] Calibration: 
[04/18/2022-21:49:00] [I] Refit: Disabled
[04/18/2022-21:49:00] [I] Sparsity: Disabled
[04/18/2022-21:49:00] [I] Safe mode: Disabled
[04/18/2022-21:49:00] [I] DirectIO mode: Disabled
[04/18/2022-21:49:00] [I] Restricted mode: Disabled
[04/18/2022-21:49:00] [I] Save engine: 
[04/18/2022-21:49:00] [I] Load engine: model.plan
[04/18/2022-21:49:00] [I] Profiling verbosity: 0
[04/18/2022-21:49:00] [I] Tactic sources: Using default tactic sources
[04/18/2022-21:49:

In [15]:
!trtexec --loadEngine=model.plan --dumpProfile

&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # trtexec --loadEngine=model.plan --dumpProfile
[04/18/2022-21:49:09] [I] === Model Options ===
[04/18/2022-21:49:09] [I] Format: *
[04/18/2022-21:49:09] [I] Model: 
[04/18/2022-21:49:09] [I] Output:
[04/18/2022-21:49:09] [I] === Build Options ===
[04/18/2022-21:49:09] [I] Max batch: 1
[04/18/2022-21:49:09] [I] Workspace: 16 MiB
[04/18/2022-21:49:09] [I] minTiming: 1
[04/18/2022-21:49:09] [I] avgTiming: 8
[04/18/2022-21:49:09] [I] Precision: FP32
[04/18/2022-21:49:09] [I] Calibration: 
[04/18/2022-21:49:09] [I] Refit: Disabled
[04/18/2022-21:49:09] [I] Sparsity: Disabled
[04/18/2022-21:49:09] [I] Safe mode: Disabled
[04/18/2022-21:49:09] [I] DirectIO mode: Disabled
[04/18/2022-21:49:09] [I] Restricted mode: Disabled
[04/18/2022-21:49:09] [I] Save engine: 
[04/18/2022-21:49:09] [I] Load engine: model.plan
[04/18/2022-21:49:09] [I] Profiling verbosity: 0
[04/18/2022-21:49:09] [I] Tactic sources: Using default tactic sources
[04/18/2022-21:49
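The option dumps above follow a regular `[timestamp] [level] key: value` pattern, so they are easy to turn into a dictionary when comparing runs. This is a minimal sketch that assumes the log format shown above; `parse_options` is a hypothetical helper, not part of TensorRT.

```python
import re
from typing import Dict

# Matches lines such as "[04/18/2022-21:49:09] [I] Precision: FP32".
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z])\] (?P<key>[^:]+): ?(?P<value>.*)$"
)


def parse_options(log_text: str) -> Dict[str, str]:
    """Collect "key: value" pairs from info-level trtexec log lines."""
    options: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        # Section banners ("=== Build Options ===") have no colon and are skipped.
        if match and match["level"] == "I":
            options[match["key"].strip()] = match["value"].strip()
    return options


sample = """\
[04/18/2022-21:49:09] [I] === Build Options ===
[04/18/2022-21:49:09] [I] Max batch: 1
[04/18/2022-21:49:09] [I] Precision: FP32
[04/18/2022-21:49:09] [I] Load engine: model.plan
"""
opts = parse_options(sample)
print(opts["Precision"])  # FP32
```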

## What if I get error messages during this step?

1. Try [ONNX Simplifier](https://github.com/daquexian/onnx-simplifier) to fold constants and remove redundant nodes:
   `python3 -m onnxsim model.onnx simplified_model.onnx`

2. For unsupported operators, write a [custom plugin](https://github.com/NVIDIA/TensorRT) or rewrite the graph with [ONNX-GraphSurgeon](https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon).

3. Use a framework-integrated version (TF-TRT, Torch-TRT), which runs unsupported operators in the original framework.
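Option 1 can also be run from Python, which is convenient inside the notebook. The sketch below assumes the `onnx` and `onnxsim` packages are installed in the container; `simplify_onnx` is a hypothetical wrapper name, and the imports are kept inside the function so the optional dependencies are only needed when it is called.

```python
def simplify_onnx(src_path: str, dst_path: str) -> None:
    """Simplify an ONNX file with onnx-simplifier and save the result."""
    import onnx
    from onnxsim import simplify

    model = onnx.load(src_path)
    # simplify() returns the simplified model and a flag telling whether
    # its outputs matched the original model on random inputs.
    simplified, ok = simplify(model)
    if not ok:
        raise RuntimeError("simplified model failed onnxsim's validation check")
    onnx.save(simplified, dst_path)


# Example call (paths taken from the command-line form above):
# simplify_onnx("model.onnx", "simplified_model.onnx")
```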

## Support Matrix

https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#supported-ops

## Build TensorRT (Torch-TRT)

- Torch-TRT
https://nvidia.github.io/Torch-TensorRT/tutorials/getting_started_with_python_api.html

- TF-TRT
https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/image_classification

In [6]:
# Optional
import torch
import torch_tensorrt as torchtrt
from torchvision import models

model = models.wide_resnet101_2(pretrained=True).eval().cuda()
# Alternatively, load a saved TorchScript file directly:
# model = torch.jit.load('ts_model_path')

trt_module = torchtrt.compile(model, inputs=[torchtrt.Input(
                                min_shape=[1, 3, 224, 224],
                                opt_shape=[16, 3, 224, 224],
                                max_shape=[32, 3, 224, 224], )], enabled_precisions={torch.half})

trt_module.save('test.ts')



In [5]:
!rm -rf models
!mkdir -p models/torch_model/1
!mkdir -p models/onnx_model/1
!mkdir -p models/trt_model/1

!mv model.pt models/torch_model/1
!mv model.onnx models/onnx_model/1
!mv model.plan models/trt_model/1

!cp src/onnx_config.pbtxt models/onnx_model/config.pbtxt
!cp src/torch_config.pbtxt models/torch_model/config.pbtxt
!cp src/trt_config.pbtxt models/trt_model/config.pbtxt
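The shell commands above produce a `models/<model_name>/<version>/<artifact>` layout with one `config.pbtxt` per model. The same layout can be sketched in Python with `pathlib`; `build_model_repository` is a hypothetical helper, and unlike the `mv` commands above it copies the artifacts instead of moving them.

```python
import shutil
from pathlib import Path
from typing import Dict


def build_model_repository(root: Path, artifacts: Dict[str, Path],
                           configs: Dict[str, Path], version: str = "1") -> None:
    """Copy each artifact to <root>/<model>/<version>/ and its config next to it."""
    for model_name, artifact in artifacts.items():
        version_dir = root / model_name / version
        version_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(artifact, version_dir / artifact.name)
        shutil.copy2(configs[model_name], root / model_name / "config.pbtxt")


# Example call mirroring the cell above (files must already exist):
# build_model_repository(
#     Path("models"),
#     {"torch_model": Path("model.pt"), "onnx_model": Path("model.onnx"),
#      "trt_model": Path("model.plan")},
#     {"torch_model": Path("src/torch_config.pbtxt"),
#      "onnx_model": Path("src/onnx_config.pbtxt"),
#      "trt_model": Path("src/trt_config.pbtxt")},
# )
```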

## Let's go to Chapter 2 (Triton)

### Additional Topic: PTQ & QAT

Low-level Interface
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Int8/EntropyCalibrator2.html

Polygraphy
https://github.com/NVIDIA/TensorRT/tree/master/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt

Great example (EfficientDet)
https://github.com/NVIDIA/TensorRT/tree/main/samples/python/efficientdet

Torch-TRT example
https://github.com/NVIDIA/Torch-TensorRT/blob/master/tests/py/test_ptq_dataloader_calibrator.py
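For intuition about what calibration computes in post-training quantization (PTQ), the sketch below shows symmetric per-tensor INT8 quantization with a simple max calibrator in plain Python. This is an illustrative assumption, not the entropy calibrator from the links above: the helper names are hypothetical, and values are clamped to [-127, 127] for simplicity.

```python
from typing import List, Sequence

INT8_MAX = 127


def calibrate_amax(samples: Sequence[float]) -> float:
    """Max calibration: the largest absolute value seen in the calibration data."""
    return max(abs(v) for v in samples)


def quantize(values: Sequence[float], amax: float) -> List[int]:
    """Map floats to INT8 with scale = amax / 127, clamping out-of-range values."""
    quantized = []
    for v in values:
        q = round(v * INT8_MAX / amax)
        quantized.append(max(-INT8_MAX, min(INT8_MAX, q)))
    return quantized


def dequantize(quantized: Sequence[int], amax: float) -> List[float]:
    """Map INT8 values back to floats using the same scale."""
    return [q * amax / INT8_MAX for q in quantized]


amax = calibrate_amax([0.1, -0.4, 2.0, -1.2])  # calibration data -> amax = 2.0
q = quantize([0.0, 0.5, -2.0, 3.0], amax)
print(q)  # [0, 32, -127, 127]; 3.0 exceeds amax and saturates
print(dequantize(q, amax))
```

Quantization-aware training (QAT) differs in that this rounding is simulated during training, so the weights can adapt to it.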