# Convert your trained models to ONNX format
Dlacc can accelerate trained deep learning models. First, you must convert your trained model (PyTorch, TensorFlow, MXNet, etc.) to ONNX format. Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models. It is supported by a community of partners who have implemented it in many frameworks and tools.

## PyTorch to ONNX
First, we load a pre-trained ResNet-18 model from PyTorch Hub.

In [1]:
import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model.eval()

Using cache found in /home/mac_yuan/.cache/torch/hub/pytorch_vision_v0.10.0


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Next, we need to construct an example input. The contruct_dummy_input() helper below is useful for this: you only need to specify a list of input shapes and their corresponding data types.

In [2]:
def contruct_dummy_input(input_shape, input_dtype):
    dummy_input = tuple(
        [
            torch.randn(*v).type(
                {
                    "int32": torch.int32,
                    "int64": torch.int64,
                    "float32": torch.float32,
                    "float64": torch.float64,
                }[input_dtype[i]]
            )
            for i, v in enumerate(input_shape)
        ]
    )
    return dummy_input

input_shape = [[10,3,224,224]]
input_dtype = ["float32"]

dummy_input = contruct_dummy_input(input_shape, input_dtype)
model.eval()
# Export the model
torch.onnx.export(
    model,  # model being run
    dummy_input,  # model input (or a tuple for multiple inputs)
    "resnet18.onnx",
    export_params=True,  # store the trained parameter weights inside the model file
    do_constant_folding=True,  # whether to execute constant folding for optimization
    verbose=True,
)

graph(%input.1 : Float(10, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %fc.weight : Float(1000, 512, strides=[512, 1], requires_grad=1, device=cpu),
      %fc.bias : Float(1000, strides=[1], requires_grad=1, device=cpu),
      %193 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu),
      %194 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %196 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %197 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %199 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %200 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %202 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %203 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %205 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %206 : Float(64, strides=[1], requir
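
Because the shapes and data types are passed as two parallel lists, a mismatch between them only shows up as an error inside the helper. The sketch below checks the two lists before any tensor is built. The name `validate_input_spec` and the `SUPPORTED_DTYPES` set are assumptions made for illustration (the set mirrors the four dtype strings handled by the helper above); they are not part of Dlacc or PyTorch.

```python
# Dtype strings accepted by the dummy-input helper shown earlier.
SUPPORTED_DTYPES = {"int32", "int64", "float32", "float64"}


def validate_input_spec(input_shape, input_dtype):
    """Hypothetical pre-check: raise ValueError if the two lists cannot be paired."""
    if len(input_shape) != len(input_dtype):
        raise ValueError(
            f"{len(input_shape)} shape(s) given, but {len(input_dtype)} dtype(s)"
        )
    for i, (shape, dtype) in enumerate(zip(input_shape, input_dtype)):
        if dtype not in SUPPORTED_DTYPES:
            raise ValueError(f"input {i}: unsupported dtype {dtype!r}")
        # Every dimension must be a positive integer.
        if not shape or any(not isinstance(d, int) or d <= 0 for d in shape):
            raise ValueError(f"input {i}: invalid shape {shape!r}")
    return True


# Same values as in the export cell above.
validate_input_spec([[10, 3, 224, 224]], ["float32"])
```

Calling this check first gives a readable message instead of a `KeyError` or `IndexError` from the helper.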

# Parameter configuration
Next, we create a global JSON-style configuration, written here as a Python dictionary. The tool runs the optimization process according to this configuration.

In [1]:
config = {
    "job_id": "100000",
    "status": 0,
    "model_name" : "resnet",
    "model_path": "./resnet18.onnx",
    "platform_type": 0, 
    "model_type" : 2,
    "target": "llvm -mcpu cascadelake",
    "model_config":{
        "input_shape":{
            "input.1": [10,3,224,224],
        },
        "input_dtype":{
            "input.1": "float32",
        }
    },
    "tuning_config": {
        "mode": "ansor",
        "num_measure_trials": 24,
        "verbose_print": 0
    },
    "tuned_log":"",
    "need_benchmark": True
}
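
If the configuration needs to live in an actual JSON file, the dictionary can be serialized with the standard `json` module. This is only a sketch: `save_config`, `load_config`, the shortened `example_config`, and the file name `config.json` are assumptions for illustration, and whether the tool accepts a file path instead of a dictionary is not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path

# Shortened stand-in for the configuration dictionary defined above.
example_config = {
    "job_id": "100000",
    "model_path": "./resnet18.onnx",
    "tuning_config": {"mode": "ansor", "num_measure_trials": 24},
    "need_benchmark": True,
}


def save_config(config, path):
    """Serialize the configuration dictionary to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=4), encoding="utf-8")


def load_config(path):
    """Read a JSON configuration file back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round-trip through a temporary directory; "config.json" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    save_config(example_config, config_path)
    restored = load_config(config_path)

print(restored == example_config)
```

Note that the Python literal `True` is written as `true` in the file, and that trailing commas, which Python allows inside the dictionary, are not valid in JSON.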

The configuration fields are:
- job_id: ID of the job, a random integer
- status:
    - 0: ready
    - 1: import to ONNX finished
    - 2: Ansor tuning finished (time-consuming; skippable if tuned_log is specified)
    - 3: compile finished
    - 4: job done
    - -1: error
- model_name: name of the model
- model_path: path to the model.
    - an absolute local path if platform_type==LOCAL
    - a Google Cloud Storage bucket link if platform_type==GOOGLESTORAGE
- platform_type: type of source platform that stores the model file and input JSON.
    
    ```python
    class PlateformType(enum.IntEnum):
    	LOCAL = 0
    	GOOGLESTORAGE = 1
    	AWSSTORAGE = 2
    ```
    
- model_type: type of model. Only the ONNX format is supported for now.

```python
class ModelType(enum.IntEnum):
    PT = 0
    TF = 1
    ONNX = 2
    KERAS = 3
```

- target: target hardware backend information
- model_config
    - input_shape: shape of each input. **The first dimension must be batch size.**
    - input_dtype: datatype of each input.
- tuning_config: tuning parameter configuration
    - mode: string value, either ansor or autotvm. Only ansor is supported for now.
    - num_measure_trials: an int value. More trials give better performance but take longer.
        - when testing, use 10 for a quick run
        - in production, use 20000 for best performance
    - verbose_print: whether to enable verbose printing
- tuned_log: dev only. Tuning will not be executed if a tuned log is passed.
- error_info: dev only. Exception information raised during execution.
- need_benchmark: bool value. Whether to benchmark the optimized model against the original model.
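
The numeric `status` codes listed above can be made easier to read in logs with an enum, in the same style as the platform and model type enums. This is a sketch: the class name `JobStatus`, its member names, and `describe_status` are assumptions for illustration; only the integer values come from the list above.

```python
import enum


class JobStatus(enum.IntEnum):
    """Hypothetical names for the status codes described above."""

    ERROR = -1
    READY = 0
    IMPORT_FINISHED = 1
    TUNING_FINISHED = 2
    COMPILE_FINISHED = 3
    DONE = 4


def describe_status(config):
    """Return a readable name for the `status` field of a configuration."""
    return JobStatus(config["status"]).name


print(describe_status({"status": 0}))  # READY
```

An unknown code, such as 7, raises `ValueError`, which catches typos in hand-written configurations early.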

## Run optimization

In [3]:
from optimum import Optimum
import onnx

# Load the exported ONNX model
model = onnx.load("./resnet18.onnx")

# Names of the graph outputs
output = [node.name for node in model.graph.output]

# graph.input may also list initializers (weights), so subtract them
# to keep only the inputs that must be fed at run time
input_all = [node.name for node in model.graph.input]
input_initializer = [node.name for node in model.graph.initializer]
net_feed_input = list(set(input_all) - set(input_initializer))

print('Inputs: ', net_feed_input)
print('Outputs: ', output)



Inputs:  ['input.1']
Outputs:  ['191']
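
The input name printed here (`input.1`) is the same key used under `input_shape` and `input_dtype` in the configuration. Assuming the tool expects these keys to match the graph's input names, a small check like the one below can catch a mismatch before a long tuning run starts. `check_input_names` is a hypothetical helper, not part of the tool.

```python
def check_input_names(graph_inputs, config):
    """Hypothetical check: config keys must match the ONNX graph input names."""
    expected = set(graph_inputs)
    model_config = config["model_config"]
    for key in ("input_shape", "input_dtype"):
        declared = set(model_config[key])
        if declared != expected:
            raise ValueError(
                f"{key} declares {sorted(declared)}, "
                f"but the graph expects {sorted(expected)}"
            )


# Minimal stand-in for the full configuration dictionary.
example_config = {
    "model_config": {
        "input_shape": {"input.1": [10, 3, 224, 224]},
        "input_dtype": {"input.1": "float32"},
    }
}

# With the notebook's variables this would be check_input_names(net_feed_input, config).
check_input_names(["input.1"], example_config)
```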


The optimization process may produce many error messages. This is normal, because the optimization engine tries some invalid schedules. You can safely ignore them as long as tuning continues, because these errors are isolated from the main process. After optimization, the optimized model, statistics, and logs are saved in the ./outputs folder. The optimized model is saved in the ./outputs/optimized_model folder, which contains three files: deploy_graph.json, deploy_lib.tar, and deploy_param.params. You can reuse these three files later in your own production environment.

In [4]:
optimum = Optimum("myresnet")
optimum.run(model, config)

[31-05-2022-11:40:48][AnsorEngine] Run tuning for network=myresnet
[31-05-2022-11:40:48][AnsorEngine] Extract tasks...
[31-05-2022-11:40:55][AnsorEngine] Begin tuning...


  from pandas import MultiIndex, Int64Index
Placeholder: placeholder
tensor.repl auto_unroll: 16
parallel ax0@ax1@ (None)
  for ax0 (None)
    for ax1 (None)
      for ax2 (None)
        for ax3 (None)
          for ax4 (None)
            for rv0_rv1_fused_o (None)
              vectorize rv0_rv1_fused_i (None)
                tensor.rf = ...
  for ax2 (None)
    for ax0 (None)
      for ax1 (None)
        for ax2 (None)
          for ax3 (None)
            vectorize ax4 (None)
              pad_temp = ...
    for ax3 (None)
      for ax4 (None)
        for rv0_rv1_fused_i_v (None)
          tensor.repl = ...

with: [11:41:00] ../src/te/schedule/bound.cc:175: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Sched

[31-05-2022-11:55:19][AnsorEngine] Tuning Success, configuration file saved at outputs/tuninglog_network_name=myresnet--target=llvm -mcpu cascadelake_finished.json
[31-05-2022-11:55:19][AnsorEngine] Compile from outputs/tuninglog_network_name=myresnet--target=llvm -mcpu cascadelake_finished.json
[31-05-2022-11:55:50][AnsorEngine] Compile success.
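
Once the log reports that compilation succeeded, it can be worth confirming that the three deployment files are actually on disk before copying them elsewhere. The sketch below uses only the standard library; `missing_artifacts` is a hypothetical helper, and the default directory assumes the `./outputs/optimized_model` location used when loading the model.

```python
from pathlib import Path

# File names produced by the optimization step, as described earlier.
EXPECTED_FILES = ("deploy_graph.json", "deploy_lib.tar", "deploy_param.params")


def missing_artifacts(model_dir="./outputs/optimized_model"):
    """Return the expected deployment files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing files:", missing)
else:
    print("All deployment files are present.")
```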


# Load optimized model and make a single prediction

In [5]:
inputs_dict = {}
predict_model = optimum.load_model("./outputs/optimized_model/", "llvm -mcpu cascadelake")
result = predict_model.predict(inputs_dict)
print(result)

[31-05-2022-11:55:58][Optimum] Load module from ./outputs/optimized_model/
[31-05-2022-11:55:58][Optimum] Compile success.
{'output_0': array([[-30.162037  ,  35.15393   , -11.729017  , ..., -39.02967   ,
         19.354353  ,   4.572182  ],
       [ -1.2263709 ,   0.95507854,   1.9208261 , ...,  -0.93116546,
         -0.04104741,   2.6640966 ],
       [ -1.2506486 ,   0.9526835 ,   1.652109  , ...,  -0.86913544,
          0.15493488,   2.5679255 ],
       ...,
       [ -1.5307267 ,   1.0663735 ,   1.5047631 , ...,  -1.2226639 ,
         -0.07349799,   3.0583618 ],
       [ -0.5771091 ,   1.3409051 ,   1.6112486 , ...,  -0.77533424,
         -0.20042115,   2.7553709 ],
       [ -1.5438777 ,   1.453372  ,   1.6685232 , ...,  -1.2125641 ,
          0.0426856 ,   2.9649062 ]], dtype=float32)}
