# Use nn-Meter for latency prediction

## Use nn_meter as a Python package
After installing nn-Meter, import the `nn_meter` package in Python:

```python
import nn_meter
print(f"nn_meter version: {nn_meter.__version__}")

project_path = "/home/jiahang/nnmeter-demo/"
```

```
nn_meter version: 1.1
```


When nn-Meter is used, the predictor models are downloaded automatically to the user's local device. Four predictors are currently provided, covering four popular platforms: mobile CPU (`"cortexA76cpu_tflite21"`), mobile Adreno 640 GPU (`"adreno640gpu_tflite21"`), mobile Adreno 630 GPU (`"adreno630gpu_tflite21"`), and Intel VPU (`"myriadvpu_openvino2019r2"`). Together, the four predictors take up about 6.33GB. They are stored in `~/.nn_meter/data/` by default. To change the target directory, run:

```python
nn_meter.change_user_data_folder(new_folder=project_path)  # path to the new folder
```
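Because the four predictors need about 6.33GB, it can be worth checking the free space behind the chosen folder before the first download. The sketch below uses only the standard library; `has_room_for_predictors` is a hypothetical helper, not part of nn-Meter, and it assumes 1 GB means 1024**3 bytes (the more conservative reading).

```python
import shutil
from pathlib import Path


def has_room_for_predictors(folder="~/.nn_meter/data/", required_gb=6.33):
    """Return True if the disk holding `folder` has at least `required_gb` free."""
    path = Path(folder).expanduser().resolve()
    # The target folder may not exist before the first download,
    # so measure the nearest existing parent directory instead.
    while not path.exists():
        if path.parent == path:  # reached the filesystem root
            break
        path = path.parent
    free_bytes = shutil.disk_usage(path).free
    # Assumption: treat 1 GB as 1024**3 bytes.
    return free_bytes >= required_gb * 1024 ** 3


print(has_room_for_predictors())
```

Calling `has_room_for_predictors(project_path)` before `change_user_data_folder` checks the new location instead of the default one.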

To list all supported latency predictors, run:

```python
# list all supported latency predictors
predictors = nn_meter.list_latency_predictors()
for p in predictors:
    print(f"[Predictor] {p['name']}: version={p['version']}")
```

```
[Predictor] cortexA76cpu_tflite21: version=1.0
[Predictor] adreno640gpu_tflite21: version=1.0
[Predictor] adreno630gpu_tflite21: version=1.0
[Predictor] myriadvpu_openvino2019r2: version=1.0
```
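Since predictor names are long and easy to mistype, a lookup against the listed predictors can catch a wrong name before loading. This is a minimal sketch assuming each listed entry is a mapping with `name` and `version` keys, as the loop above suggests; `find_predictor` is a hypothetical helper, and the sample data is copied from the output above.

```python
def _version_key(version):
    """Turn "1.10" into (1, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in str(version).split("."))


def find_predictor(predictors, name, version=None):
    """Return the entry for `name`, choosing the newest version unless one is given."""
    matches = [
        p for p in predictors
        if p["name"] == name and (version is None or str(p["version"]) == str(version))
    ]
    if not matches:
        available = sorted({p["name"] for p in predictors})
        raise ValueError(f"unknown predictor {name!r}; available: {available}")
    return max(matches, key=lambda p: _version_key(p["version"]))


# Sample entries copied from the output above; in practice, pass
# nn_meter.list_latency_predictors() instead.
listed = [
    {"name": "cortexA76cpu_tflite21", "version": "1.0"},
    {"name": "adreno640gpu_tflite21", "version": "1.0"},
    {"name": "adreno630gpu_tflite21", "version": "1.0"},
    {"name": "myriadvpu_openvino2019r2", "version": "1.0"},
]
print(find_predictor(listed, "adreno630gpu_tflite21"))
```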


nn-Meter can predict latency for TensorFlow models (`.pb` files), ONNX models (`.onnx` files), and PyTorch models (`nn.Module` objects). Example model files are provided so that users can try nn-Meter quickly; they can be downloaded from [this link]().

The first step is to load a predictor by specifying its name.

```python
predictor_name = "adreno640gpu_tflite21" # user can change text here to test other predictors

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name)
```



The first time a predictor is used, nn-Meter takes a while to download and unzip the predictor model.

Once the predictor is loaded, predict latency by calling `predictor.predict()`. Each model type also requires its own packages; the well-tested versions are listed below:

| Model type | Tested requirements |
| :--------: | :-----------------: |
| TensorFlow | `tensorflow==2.6.0` |
| PyTorch | `torch==1.9.0`, `torchvision==0.10.0`, plus either `onnx==1.9.0` and `onnx-simplifier==0.3.6`, or [`nni>=2.4`][1] |
| ONNX | `onnx==1.9.0` |
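To compare a local environment with the table, the installed versions can be read with the standard library's `importlib.metadata` (Python 3.8+). This is a sketch, not an nn-Meter feature: `compare_with_tested` is a hypothetical helper, `nni` is left out because the table gives only a lower bound for it, and local build tags such as `+cu111` are stripped before the plain string comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins taken from the table above.
TESTED_VERSIONS = {
    "tensorflow": "2.6.0",
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "onnx": "1.9.0",
    "onnx-simplifier": "0.3.6",
}


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each package to (installed version or None, tested version, match?)."""
    report = {}
    for package, expected in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu111" before comparing.
        base = installed.split("+")[0] if installed else None
        report[package] = (installed, expected, base == expected)
    return report


for package, (installed, expected, ok) in compare_with_tested().items():
    status = "matches" if ok else "differs"
    print(f"{package}: installed={installed}, tested={expected} ({status})")
```

A mismatch does not necessarily mean nn-Meter will fail; it only flags versions outside the tested set.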

For a TensorFlow `.pb` file:

```python
test_model = project_path + "testmodel/mobilenetv3small_0.pb"

# predict latency
latency = predictor.predict(model=test_model, model_type="pb") # result is in unit of ms
print(f'[RESULT] predict latency for {test_model}: {latency} ms')
```

```
[RESULT] predict latency for /home/jiahang/nnmeter-demo/testmodel/mobilenetv3small_0.pb: 4.489849402954042 ms
```


For an ONNX `.onnx` file:

```python
test_model = project_path + "testmodel/mobilenetv3small_0.onnx"

# predict latency
latency = predictor.predict(model=test_model, model_type="onnx") # result is in unit of ms
print(f'[RESULT] predict latency for {test_model}: {latency} ms')
```

```
[RESULT] predict latency for /home/jiahang/nnmeter-demo/testmodel/mobilenetv3small_0.onnx: 6.705541180860482 ms
```
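When a folder holds a mix of `.pb` and `.onnx` files, the `model_type` argument can be derived from the file extension, reusing only the `predictor.predict(model=..., model_type=...)` call shown in the two cells above. The helpers `infer_model_type` and `predict_many` are hypothetical, and the extension mapping is an assumption based on the `"pb"` and `"onnx"` values used above.

```python
from pathlib import Path

# Assumed mapping from file extension to the model_type strings used above.
EXTENSION_TO_MODEL_TYPE = {".pb": "pb", ".onnx": "onnx"}


def infer_model_type(path):
    """Return the nn-Meter model_type for a model file, or None if unsupported."""
    return EXTENSION_TO_MODEL_TYPE.get(Path(path).suffix.lower())


def predict_many(predictor, paths):
    """Predict latency (ms) for every supported model file in `paths`."""
    results = {}
    for path in paths:
        model_type = infer_model_type(path)
        if model_type is None:
            continue  # skip files such as README.md
        results[str(path)] = predictor.predict(model=str(path), model_type=model_type)
    return results
```

With a loaded predictor, `predict_many(predictor, sorted(Path(project_path, "testmodel").iterdir()))` returns a dictionary from file path to predicted latency.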


PyTorch models are handled slightly differently. nn-Meter needs the torch model as an `nn.Module` instance, and the input shape must be specified. Here we define a simple torch model for the demo. Users can choose either of two dependency groups: [`onnx==1.9.0`, `onnx-simplifier==0.3.6`], referred to as the "onnx_based way", or [`nni>=2.4`], referred to as the "nni_based way". The "onnx_based way" is used by default.

```python
import torch.nn as nn

class VGG(nn.Module):

    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
```

An input shape must also be specified, because nn-Meter cannot infer it from an `nn.Module`. The prediction code is:

```python
vgg11 = VGG(make_layers([64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'])) # VGG 11-layer model

# predict latency
latency = predictor.predict(vgg11, model_type="torch", input_shape=(1, 3, 224, 224))
print(f'[RESULT] predict latency for vgg11: {latency} ms')
```

```
[RESULT] predict latency for vgg11: 109.77864175998361 ms
```
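To see why `input_shape` matters, the sketch below hand-propagates an NCHW input shape through the layers built by `make_layers`: each 3x3 convolution with padding 1 keeps the height and width, and each 2x2 max-pool with stride 2 halves them. For `(1, 3, 224, 224)` the flattened feature size is 512 * 7 * 7 = 25088, the input size of the first `nn.Linear`; the per-layer tuples also line up with the `conv-relu`/`maxpool` entries (`inputh`, `inputw`, `cin`, `cout`) in the NNI-based log further below. `trace_features` is a hypothetical helper written for illustration, not part of nn-Meter.

```python
# VGG-11 configuration from the cell above: numbers are conv output channels,
# 'M' is a 2x2 max-pool with stride 2.
VGG11_CFG = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']


def trace_features(cfg, input_shape=(1, 3, 224, 224)):
    """Follow (op, H, W, C_in, C_out) through make_layers(cfg) for an NCHW shape."""
    _, c, h, w = input_shape
    trace = []
    for v in cfg:
        if v == 'M':
            trace.append(("maxpool", h, w, c, c))
            h, w = h // 2, w // 2  # kernel 2, stride 2, no padding
        else:
            trace.append(("conv-relu", h, w, c, v))  # 3x3 conv, padding 1 keeps H, W
            c = v
    return trace, c * h * w  # flattened size seen by the classifier


trace, flat = trace_features(VGG11_CFG)
print(flat, 512 * 7 * 7)

# A 256x256 input flattens to a different size, which nn.Linear(512 * 7 * 7, ...)
# in this VGG class would reject.
print(trace_features(VGG11_CFG, (1, 3, 256, 256))[1])
```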


For the "nni_based way", define the PyTorch modules with NNI's `nn` interface, `import nni.retiarii.nn.pytorch as nn` (see the [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model) for more information), and set `apply_nni=True` when calling `predictor.predict()`.

```python
import nni.retiarii.nn.pytorch as nn  # different from "onnx_based way"

class VGG(nn.Module):

    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


vgg11 = VGG(make_layers([64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'])) # VGG 11-layer model

# predict latency
latency = predictor.predict(
    vgg11, model_type="torch", input_shape=(1, 3, 224, 224),
    apply_nni=True # different from "onnx_based way"
    )
print(f'[RESULT] predict latency for vgg11: {latency} ms')
```

```
[2021-11-16 20:35:00] INFO (root/MainThread) Start latency prediction ...
[2021-11-16 20:35:00] INFO (root/MainThread) NNI-based Torch Converter is applied for model conversion
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#0', 'input_tensors': [[1, 25088]], 'cin': 25088, 'cout': 4096, 'inbounds': [], 'outbounds': ['relu#1']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'relu', 'name': 'relu#1', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#0'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#2', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#1'], 'outbounds': ['fc#3']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#3', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2'], 'outbounds': ['relu#4']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'relu', 'name': 'relu#4', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#3'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#5', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#4'], 'outbounds': ['fc#6']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#6', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 1000, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5'], 'outbounds': []}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#7', 'input_tensors': [[1, 224, 224, 3]], 'ks': [3, 3], 'inputh': 224, 'inputw': 224, 'cin': 3, 'cout': 64, 'inbounds': [], 'outbounds': ['maxpool#8']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#8', 'input_tensors': [[1, 224, 224, 64]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 224, 'inputw': 224, 'cin': 64, 'cout': 64, 'inbounds': ['conv-relu#7'], 'outbounds': ['conv-relu#9']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#9', 'input_tensors': [[1, 112, 112, 64]], 'ks': [3, 3], 'inputh': 112, 'inputw': 112, 'cin': 64, 'cout': 128, 'inbounds': ['maxpool#8'], 'outbounds': ['maxpool#10']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#10', 'input_tensors': [[1, 112, 112, 128]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 112, 'inputw': 112, 'cin': 128, 'cout': 128, 'inbounds': ['conv-relu#9'], 'outbounds': ['conv-relu#11']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#11', 'input_tensors': [[1, 56, 56, 128]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 128, 'cout': 256, 'inbounds': ['maxpool#10'], 'outbounds': ['conv-relu#12']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#12', 'input_tensors': [[1, 56, 56, 256]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#11'], 'outbounds': ['maxpool#13']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#13', 'input_tensors': [[1, 56, 56, 256]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#12'], 'outbounds': ['conv-relu#14']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#14', 'input_tensors': [[1, 28, 28, 256]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 256, 'cout': 512, 'inbounds': ['maxpool#13'], 'outbounds': ['conv-relu#15']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#15', 'input_tensors': [[1, 28, 28, 512]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#14'], 'outbounds': ['maxpool#16']}
[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#16', 'input_tensors': [[1, 28, 28, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#15'], 'outbounds': ['conv-relu#17']}
[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#17', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['maxpool#16'], 'outbounds': ['conv-relu#18']}
[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#18', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#17'], 'outbounds': ['maxpool#19']}
[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#19', 'input_tensors': [[1, 14, 14, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#18'], 'outbounds': []}
[2021-11-16 20:35:04] INFO (root/MainThread) Predict latency: 109.77864175998363 ms
[RESULT] predict latency for vgg11: 109.77864175998363 ms
```

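Each dict in the log describes one operator of the converted graph; convolution and ReLU appear merged into single `conv-relu` entries. To summarize such a log, the dict part of each line can be parsed with the standard library's `ast.literal_eval`. This sketch assumes the exact line format shown above; `parse_kernel_log` is a hypothetical helper, and the sample lines are copied from that log.

```python
import ast
from collections import Counter


def parse_kernel_log(lines):
    """Return the operator dicts found in nn-Meter INFO log lines."""
    kernels = []
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # status lines such as "Start latency prediction ..."
        kernels.append(ast.literal_eval(line[start:]))
    return kernels


# A few lines copied from the log above.
SAMPLE_LOG = """\
[2021-11-16 20:35:00] INFO (root/MainThread) Start latency prediction ...
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#0', 'input_tensors': [[1, 25088]], 'cin': 25088, 'cout': 4096, 'inbounds': [], 'outbounds': ['relu#1']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#7', 'input_tensors': [[1, 224, 224, 3]], 'ks': [3, 3], 'inputh': 224, 'inputw': 224, 'cin': 3, 'cout': 64, 'inbounds': [], 'outbounds': ['maxpool#8']}
[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#8', 'input_tensors': [[1, 224, 224, 64]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 224, 'inputw': 224, 'cin': 64, 'cout': 64, 'inbounds': ['conv-relu#7'], 'outbounds': ['conv-relu#9']}
"""

kernels = parse_kernel_log(SAMPLE_LOG.splitlines())
print(Counter(k["op"] for k in kernels))
```

Since the output above shows these messages going through the `root` logger, one option for collecting them programmatically (an assumption, not a documented nn-Meter feature) is to attach a `logging.Handler` to `logging.getLogger()` before calling `predictor.predict()`.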

## Use nn-Meter from the command line

nn-Meter can also be run from the command line.

Installing nn-Meter adds an `nn-meter` command. Predict latency with:
```Bash
# for Tensorflow (.pb) file
nn-meter predict --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 

# for ONNX (*.onnx) file
nn-meter predict --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>

# for torch model from torchvision model zoo (str)
nn-meter predict --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 
```

Here are some concrete examples:
```Bash
project_path="/home/jiahang/nnmeter-demo/testmodel"

nn-meter predict --predictor adreno640gpu_tflite21 --tensorflow $project_path

nn-meter predict --predictor adreno640gpu_tflite21 --onnx $project_path

nn-meter predict --predictor adreno640gpu_tflite21 --torchvision mobilenet_v2
```
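When the command line is driven from a Python script, building the arguments as a list avoids shell-quoting problems with paths that contain spaces. The sketch below only uses the flags documented above (`--predictor`, `--predictor-version`, `--tensorflow`, `--onnx`, `--torchvision`); `build_predict_command` is a hypothetical helper, and it assumes one model source per invocation, as in the examples.

```python
import shlex


def build_predict_command(predictor, version=None, tensorflow=None,
                          onnx=None, torchvision=None):
    """Build an `nn-meter predict` argument list for exactly one model source."""
    if isinstance(torchvision, str):
        torchvision = [torchvision]  # allow a single model name
    sources = [s for s in (tensorflow, onnx, torchvision) if s]
    if len(sources) != 1:
        raise ValueError("pass exactly one of tensorflow, onnx, torchvision")
    cmd = ["nn-meter", "predict", "--predictor", predictor]
    if version is not None:
        cmd += ["--predictor-version", str(version)]
    if tensorflow:
        cmd += ["--tensorflow", str(tensorflow)]
    elif onnx:
        cmd += ["--onnx", str(onnx)]
    else:
        cmd += ["--torchvision", *torchvision]
    return cmd


cmd = build_predict_command("adreno640gpu_tflite21", torchvision=["mobilenet_v2"])
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` to run the prediction.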
