# Run a Tool from Arachne CLI

In this notebook, we explain how to use the Arachne CLI (i.e., `arachne.driver.cli`) to run a tool in Arachne.
Here, we will use the TVM tool to compile ResNet-50 v2 from the TensorFlow Keras Applications.
TVM is a deep learning compiler that supports various DNN models as its input.
ResNet-50 v2 is a well-known convolutional neural network for image classification.

## Prepare a Model

First, we prepare an input model by using the TensorFlow Keras API.

In [1]:
import tensorflow as tf

model = tf.keras.applications.resnet_v2.ResNet50V2()
model.summary()
model.save("/tmp/resnet50-v2.h5")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50v2_weights_tf_dim_ordering_tf_kernels.h5
Model: "resnet50v2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
pool1_pad (ZeroPadding2D)       (None, 114, 114, 



## Apply a Tool to the Input Model

Next, let's run TVM from the Arachne CLI with the prepared model as its input.
Typically, you can start with the following command.

In [2]:
%%bash

python -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar



2022-03-18 01:39:19.116422: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-18 01:39:24.365843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 373 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2022-03-18 01:39:24.368968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30554 MB memory:  -> device: 1, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2022-03-18 01:39:24.371667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /jo

CalledProcessError: Command 'b'\npython -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar\n'' returned non-zero exit status 1.

### Deal with the Dynamic Shape
The command above fails because TVM cannot handle a negative shape value (that is, a dynamic shape).
TVM requires a static shape to be specified for networks that have dynamic shapes.
To address this problem, we provide an option (i.e., `input_spec`) to specify the tensor specification of the input model.
Users can pass a path to a YAML file that describes this information.
For example, the file for this case looks like the following.

In [None]:
# /tmp/resnet50-v2.yaml
inputs:
  - dtype: float32
    name: input
    shape:
    - 1
    - 224
    - 224
    - 3
outputs:
  - dtype: float32
    name: Identity
    shape:
    - 1
    - 1000
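
A specification file like the one above can also be generated from the shapes that Keras reports. The following is a minimal sketch, not part of Arachne: the helper names `to_static_shape`, `render_tensor`, and `render_spec` are hypothetical, and the output shape `(None, 1000)` is assumed from the 1000-class output listed in the file above. It replaces each dynamic (`None`) dimension with a fixed batch size and renders the same YAML layout using only the standard library.

```python
from typing import Optional, Sequence


def to_static_shape(shape: Sequence[Optional[int]], batch_size: int = 1) -> list:
    """Replace every dynamic (None) dimension with a fixed size."""
    return [batch_size if dim is None else dim for dim in shape]


def render_tensor(name: str, dtype: str, shape: Sequence[int]) -> str:
    """Render one tensor entry in the layout of the spec file shown above."""
    lines = [f"  - dtype: {dtype}", f"    name: {name}", "    shape:"]
    lines += [f"    - {dim}" for dim in shape]
    return "\n".join(lines)


def render_spec(inputs, outputs) -> str:
    """Render a whole spec; inputs/outputs are lists of (name, dtype, shape)."""
    parts = ["inputs:"] + [render_tensor(*tensor) for tensor in inputs]
    parts += ["outputs:"] + [render_tensor(*tensor) for tensor in outputs]
    return "\n".join(parts) + "\n"


# Keras reports the input as (None, 224, 224, 3): the batch axis is dynamic.
input_shape = to_static_shape((None, 224, 224, 3))
# Assumption: the classifier output is (None, 1000).
output_shape = to_static_shape((None, 1000))

spec_text = render_spec(
    inputs=[("input", "float32", input_shape)],
    outputs=[("Identity", "float32", output_shape)],
)
print(spec_text)
```

Writing `spec_text` to `/tmp/resnet50-v2.yaml` (for example with `pathlib.Path.write_text`) produces a file with the same content as the one shown above.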

Finally, you can compile the model by passing the path of this file to `input_spec`.

In [3]:

%%bash

python -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar input_spec=/tmp/resnet50-v2.yaml



2022-03-18 01:39:59.099641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-18 01:40:04.527953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 373 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2022-03-18 01:40:04.531002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30554 MB memory:  -> device: 1, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2022-03-18 01:40:04.533601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /jo

### Try Tool-Specific Configurations

You can configure the tool's behavior by passing values to its options.
To see which options are available, add `--help` to the previous command.

In [6]:

%%bash

python -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar input_spec=/tmp/resnet50-v2.yaml --help

cli is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

tools: onnx_simplifier, openvino2tf, openvino_mo, tflite_converter, tftrt, torch2onnx, torch2trt, tvm
tvm_target: dgx-1, dgx-s, jetson-nano, jetson-xavier-nx, rasp4b64


== Config ==
Override anything in the config (foo.bar=value)

input: /tmp/resnet50-v2.h5
input_spec: /tmp/resnet50-v2.yaml
output: /tmp/output.tar
tools:
  tvm:
    cpu_target: x86-64
    cpu_attr: []
    cpu_name: null
    cuda_target_device: cuda
    composite_target:
    - cpu
    target: null
    target_host: null
    desired_layout: null
    disabled_pass: null
    opt_level: 3
    export_format: tar
    cross_compiler: null
    cross_compiler_options: null


Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
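
The help text says that anything in the config can be overridden with the `foo.bar=value` form. To illustrate what such a dotted override means, here is a small sketch that applies one override to a nested dictionary. It is an illustration only and not how Hydra is implemented: `parse_value` and `apply_override` are hypothetical helpers, value parsing is reduced to "integer or string", and the example override `tools.tvm.opt_level=2` is an assumed use of an option listed in the help output.

```python
import copy


def parse_value(raw: str):
    """Turn a raw override value into an int when possible, else keep the string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one `a.b.c=value` override applied."""
    key_path, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key_path.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for key in parents:
        node = node[key]  # a KeyError here means the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown option: {key_path}")
    node[leaf] = parse_value(raw_value)
    return updated


# A small excerpt of the config printed by --help above.
config = {"tools": {"tvm": {"opt_level": 3, "export_format": "tar"}}}
updated = apply_override(config, "tools.tvm.opt_level=2")
print(updated["tools"]["tvm"]["opt_level"])
```

The original `config` is left untouched, so the defaults and the overridden values can be compared side by side.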




For brevity, we only show how to compile for TensorRT and CUDA targets here. Please refer to the API documentation for `arachne.tools` for details.
To compile for TensorRT and CUDA targets, set the `tools.tvm.*` options appropriately, as in the following command:

In [7]:

%%bash

python -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar input_spec=/tmp/resnet50-v2.yaml \
    tools.tvm.composite_target=[tensorrt,cuda]

call_node: 
free_var %input_2: Tensor[(1, 224, 224, 3), float32];
%0 = (%input_2,);
%1 = call_lowered(@tvmgen_default_fused_nn_pad, %0, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="95b394356d414c2f"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad']}) /* ty=Tensor[(1, 230, 230, 3), float32] */;
%2 = @tvmgen_default_tensorrt_main_0(%1) /* ty=Tensor[(1, 64, 112, 112), float32] */;
%3 = (%2,);
%4 = call_lowered(@tvmgen_default_fused_nn_pad_1, %3, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="fa69ab88c1d556e6"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad_1']}) /* ty=Tensor[(1, 64, 114, 114), float32] */;
%5 = @tvmgen_default_tensorrt_main_3(%4) /* ty=(Tensor[(1, 256, 56, 56), float32], Tensor[(1, 64, 56, 56), float32]) */;
%6 = (%5,);
%7 = %5.0;
%8 = call_lowered(@tvmgen_default_fused_nn_pad_2, %6, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="5e2d9995767441e8"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad_2']}) /* ty=Tensor[(1, 64, 58, 

2022-03-15 08:13:55.786154: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-15 08:14:00.910580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 373 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2022-03-15 08:14:00.913654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30554 MB memory:  -> device: 1, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2022-03-15 08:14:00.916149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /jo

### Pre-defined Configs for TVM Target

To make it easier to set up the TVM target, we provide pre-defined configurations for some devices.
For example, you can pass `+tvm_target=dgx-1` for NVIDIA DGX-1 instead of specifying multiple options.

In [8]:

%%bash

python -m arachne.driver.cli +tools=tvm input=/tmp/resnet50-v2.h5 output=/tmp/output.tar input_spec=/tmp/resnet50-v2.yaml \
    +tvm_target=dgx-1

call_node: 
free_var %input_2: Tensor[(1, 224, 224, 3), float32];
%0 = (%input_2,);
%1 = call_lowered(@tvmgen_default_fused_nn_pad, %0, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="95b394356d414c2f"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad']}) /* ty=Tensor[(1, 230, 230, 3), float32] */;
%2 = @tvmgen_default_tensorrt_main_0(%1) /* ty=Tensor[(1, 64, 112, 112), float32] */;
%3 = (%2,);
%4 = call_lowered(@tvmgen_default_fused_nn_pad_1, %3, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="fa69ab88c1d556e6"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad_1']}) /* ty=Tensor[(1, 64, 114, 114), float32] */;
%5 = @tvmgen_default_tensorrt_main_3(%4) /* ty=(Tensor[(1, 256, 56, 56), float32], Tensor[(1, 64, 56, 56), float32]) */;
%6 = (%5,);
%7 = %5.0;
%8 = call_lowered(@tvmgen_default_fused_nn_pad_2, %6, metadata={"relay_attrs"={__dict__={"Primitive"=1, "hash"="5e2d9995767441e8"}}, "all_prim_fn_vars"=['tvmgen_default_fused_nn_pad_2']}) /* ty=Tensor[(1, 64, 58, 

2022-03-15 08:19:08.041347: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-15 08:19:13.459450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 373 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2022-03-15 08:19:13.462763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30554 MB memory:  -> device: 1, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2022-03-15 08:19:13.465341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /jo
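
The commands in this section all share the same shape: the module name, `+tools=...`, `input=...`, `output=...`, an optional `input_spec=...`, and any further overrides. When driving the CLI from a Python script instead of a `%%bash` cell, that shape can be captured in a small helper. The sketch below is an assumption about how one might wrap the commands shown above; `build_arachne_command` is a hypothetical name and not part of Arachne.

```python
import shlex


def build_arachne_command(tool, input_path, output_path,
                          input_spec=None, extra_overrides=()):
    """Assemble the argument list for `python -m arachne.driver.cli`."""
    command = [
        "python", "-m", "arachne.driver.cli",
        f"+tools={tool}",
        f"input={input_path}",
        f"output={output_path}",
    ]
    if input_spec is not None:
        command.append(f"input_spec={input_spec}")
    command.extend(extra_overrides)
    return command


# Arguments shared by the commands in this notebook.
common = {
    "tool": "tvm",
    "input_path": "/tmp/resnet50-v2.h5",
    "output_path": "/tmp/output.tar",
    "input_spec": "/tmp/resnet50-v2.yaml",
}

tensorrt_cmd = build_arachne_command(
    **common, extra_overrides=["tools.tvm.composite_target=[tensorrt,cuda]"]
)
dgx1_cmd = build_arachne_command(**common, extra_overrides=["+tvm_target=dgx-1"])

# Print shell-quoted versions for inspection.
print(shlex.join(tensorrt_cmd))
print(shlex.join(dgx1_cmd))

# To actually run a command (requires Arachne to be installed):
#   import subprocess
#   subprocess.run(dgx1_cmd, check=True)
```

Passing the argument list to `subprocess.run` bypasses the shell, so the brackets in `[tensorrt,cuda]` need no quoting, and `check=True` raises `CalledProcessError` on a non-zero exit status, like the failed run earlier in this notebook.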

## Check Output TAR File

Every Arachne tool outputs a TAR file.
The file contains the converted or compiled DNN model and a YAML file that describes the runtime dependencies and the tensor information of the model.

```
  output.tar
  ├── env.yaml
  └── tvm_package_0.tar
```

```
# env.yaml
dependencies:
- pip:
  - tvm: 0.8.0
model_spec:
  inputs:
  - dtype: float32
    name: input0
    shape:
    - 1
    - 3
    - 224
    - 224
  outputs:
  - dtype: float32
    name: output0
    shape:
    - 1
    - 1000
tvm_device: cpu
```
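
To check the contents of such an archive programmatically, Python's standard `tarfile` module is sufficient. The sketch below is a minimal example under stated assumptions: `list_members`, `read_text_member`, and `add_bytes` are hypothetical helper names, and the script first builds a small stand-in archive with the layout shown above so that it runs without Arachne installed. For a real result, point `list_members` at `/tmp/output.tar` instead; the member names in a real archive may differ (for example, they may carry a `./` prefix).

```python
import io
import os
import tarfile
import tempfile


def list_members(tar_path):
    """Return the sorted member names of a TAR archive."""
    with tarfile.open(tar_path) as tar:
        return sorted(tar.getnames())


def read_text_member(tar_path, member_name):
    """Return the UTF-8 text of one regular file stored in the archive."""
    with tarfile.open(tar_path) as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode("utf-8")


def add_bytes(tar, name, payload):
    """Add an in-memory payload to an open archive under the given name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


# Stand-in archive with the layout shown above (placeholder contents only).
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "output.tar")
with tarfile.open(demo_path, "w") as tar:
    add_bytes(tar, "env.yaml", b"tvm_device: cpu\n")
    add_bytes(tar, "tvm_package_0.tar", b"")

names = list_members(demo_path)
env_text = read_text_member(demo_path, "env.yaml")
print(names)
print(env_text)
```

Reading `env.yaml` this way, without extracting the archive to disk, is a quick way to confirm the recorded dependencies and tensor shapes before deploying the compiled model.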