## A short tutorial on how to use the MLPerf inference reference benchmark

We wrapped all inference models into a single benchmark app. The benchmark app reads the proper dataset, preprocesses it, and interfaces with the backend. Traffic is generated by loadgen, which drives the desired traffic to the benchmark app according to the selected mode.
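
The division of labour described above can be pictured with a toy pipeline. This is only an illustrative sketch: `FakeDataset`, `FakeBackend` and `run_queries` are hypothetical names, and neither the real benchmark app nor loadgen is structured exactly like this.

```python
class FakeDataset:
    """Stands in for the dataset reader and preprocessor (hypothetical)."""

    def __init__(self, items):
        self.items = items

    def get_sample(self, index):
        # "Preprocess" by scaling raw byte values to the range [0, 1].
        return [value / 255.0 for value in self.items[index]]


class FakeBackend:
    """Stands in for an inference backend such as onnxruntime (hypothetical)."""

    def predict(self, sample):
        # Pretend the predicted class is the position of the largest input.
        return max(range(len(sample)), key=sample.__getitem__)


def run_queries(dataset, backend, query_indices):
    """Play the role of the traffic generator: one query per sample index."""
    return [backend.predict(dataset.get_sample(i)) for i in query_indices]


dataset = FakeDataset([[0, 255, 3], [9, 1, 2]])
print(run_queries(dataset, FakeBackend(), [0, 1, 0]))
```

In the real setup the traffic generator also decides when each query is issued and records latencies; the toy loop above only shows who calls whom.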

To run this notebook, pick a directory and clone the mlperf source tree:
```
cd /tmp
git clone --recurse-submodules https://github.com/mlcommons/inference.git --depth 1
cd inference/vision/classification_and_detection
jupyter notebook 
```

In [1]:
import os
root = os.getcwd()

In [None]:
!pip install pybind11==2.2

In [3]:
!cd ../../loadgen; CFLAGS="-std=c++14" python setup.py develop; cd {root}
!python setup.py develop

running develop
running egg_info
writing mlperf_loadgen.egg-info/PKG-INFO
writing dependency_links to mlperf_loadgen.egg-info/dependency_links.txt
writing top-level names to mlperf_loadgen.egg-info/top_level.txt
reading manifest file 'mlperf_loadgen.egg-info/SOURCES.txt'
writing manifest file 'mlperf_loadgen.egg-info/SOURCES.txt'
running build_ext
building 'mlperf_loadgen' extension
gcc -pthread -B /home/pliu/opt/miniconda3/envs/mlperf/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -std=c++14 -fPIC -DMAJOR_VERSION=3 -DMINOR_VERSION=0 -I. -I../third_party/pybind/include -I/home/pliu/opt/miniconda3/envs/mlperf/include/python3.8 -c bindings/python_api.cc -o build/temp.linux-x86_64-cpython-38/bindings/python_api.o
In file included from /home/pliu/opt/miniconda3/envs/mlperf/include/python3.8/pybind11/cast.h:16,
                 from /home/pliu/opt/miniconda3/envs/mlperf/include/python3.8/pybind11/attr.h:13,
      

The benchmark app uses a shell script to simplify the command line options; the user can pick the backend, model and device:

In [4]:
!./run_local.sh

usage: ./run_local.sh tf|onnxruntime|pytorch|tflite|tvm-onnx|tvm-pytorch [resnet50|mobilenet|ssd-mobilenet|ssd-resnet34|retinanet] [cpu|gpu]


Before running the benchmark, decide on a model and dataset and set the environment variables ```MODEL_DIR``` and ```DATA_DIR```.

For this tutorial we use onnxruntime (tensorflow and pytorch will work as well), mobilenet and a fake imagenet dataset with a few images.

In [4]:
!pip install onnxruntime pycocotools opencv-python



#### Step 1 - download the model. You can find the links to the models [here](https://github.com/mlperf/inference/tree/master/v0.5/classification_and_detection#supported-models).

In [18]:
!wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx

#### Step 2 - download the dataset. For this tutorial we create a small, fake dataset that pretends to be imagenet.
Normally you'd need to download imagenet2012/validation for image classification or coco2017/validation for object detection.

Links and instructions on how to download the datasets can be found in the [README](https://github.com/mlperf/inference/tree/master/v0.5/classification_and_detection#datasets).

In [6]:
!tools/make_fake_imagenet.sh

#### Step 3 - tell the benchmark where to find model and data 

In [9]:
import os
os.environ['MODEL_DIR'] = root
os.environ['DATA_DIR'] = os.path.join(root, "fake_imagenet")
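
Before launching a run it can help to confirm that both variables point at something that exists. The following is a minimal sketch; `check_benchmark_env` is a hypothetical helper written for this tutorial and is not part of the benchmark app.

```python
import os


def check_benchmark_env(environ=os.environ):
    """Return a list of problems with MODEL_DIR / DATA_DIR (empty if none).

    Hypothetical helper for this tutorial; not part of the benchmark app.
    """
    problems = []
    for name in ("MODEL_DIR", "DATA_DIR"):
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value!r} is not a directory")
    return problems


if __name__ == "__main__":
    for problem in check_benchmark_env():
        print("warning:", problem)
```

An empty list means both directories were found; anything else is worth fixing before starting a longer run.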

For an MLPerf submission, the number of queries, time, latencies and percentiles are prescribed, and we default to those settings. For this tutorial we pass in some extra options to make things go quicker.
run_local.sh looks for the environment variable EXTRA_OPS and adds its contents to the arguments. You can also add additional arguments on the command line.
The options below limit the benchmark run to 10 seconds and pass a maximum latency of 0.2; accuracy reporting is enabled separately with `--accuracy` on the command line in Step 4.

In [19]:
os.environ['EXTRA_OPS'] = "--time 10 --max-latency 0.2"

#### Step 4 - run the benchmark.

In [1]:
!echo "$MODEL_DIR"

/home/pliu/opt/inference/vision/classification_and_detection/model


In [2]:
!echo "$DATA_DIR"

/home/pliu/opt/open-mmlab/data/imagenet


In [3]:
!./run_local.sh onnxruntime mobilenet cpu --accuracy 

python3 python/main.py --profile mobilenet-onnxruntime --mlperf_conf ../../mlperf.conf --model "/home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx" --dataset-path /home/pliu/opt/open-mmlab/data/imagenet --output "/home/pliu/opt/inference/vision/classification_and_detection/output/onnxruntime-cpu/mobilenet" --accuracy
INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='onnxruntime', cache=0, cache_dir=None, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=None, max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='/home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx', model_name='mobilenet', output='/home/pliu/opt/inference/vision/classification_and_detection/output/onnxruntime-cpu/mobilenet', outputs=['MobilenetV1/Predictions/Reshap

The line ```Accuracy``` reports accuracy or mAP together with latencies at various percentiles, so you can gain insight into how the run went. In the run above, accuracy was 87.5%.

The line ```TestScenario.SingleStream-1.0``` reports the latency and qps seen during the benchmark.
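
If you want these numbers in a script rather than on screen, the summary line is easy to pick apart. The sketch below assumes the `key=value, ...` layout seen in one logged line from a later cell of this notebook (its output is cut off after the 90th percentile); `parse_result_line` is a hypothetical helper, and the layout may differ between versions.

```python
def parse_result_line(line):
    """Parse a 'TestScenario.<name> key=value, ...' summary line into a dict.

    The field layout is inferred from one logged line in this notebook,
    so treat it as an assumption rather than a stable format.
    """
    name, _, rest = line.strip().partition(" ")
    result = {"scenario": name}
    for field in rest.split(", "):
        key, _, value = field.partition("=")
        if key == "tiles":
            # Percentile:latency pairs, separated by commas without spaces.
            result[key] = {
                float(pct): float(latency)
                for pct, latency in (pair.split(":") for pair in value.split(","))
            }
        elif value.endswith("%"):
            result[key] = float(value[:-1])
        elif value.isdigit():
            result[key] = int(value)
        else:
            result[key] = float(value)
    return result


line = ("TestScenario.Server qps=0.43, mean=34.6969, time=116.877, "
        "acc=64.000%, queries=50, tiles=50.0:36.0099,80.0:51.3212,90.0:57.44")
summary = parse_result_line(line)
print(summary["scenario"], summary["qps"], summary["tiles"])
```

For anything submission-related, rely on the official log files rather than on a parser like this one.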

For submission, the official logs are found in [mlperf_log_summary.txt](mlperf_log_summary.txt) and [mlperf_log_detail.txt](mlperf_log_detail.txt).

If you read over the mlperf inference rules guide you'll find multiple scenarios to be run for the inference benchmarks:

|scenario|description|
|:---|:---|
|SingleStream|The LoadGen sends the next query as soon as the SUT completes the previous one.|
|MultiStream|The LoadGen sends the next query as soon as the SUT completes the previous one. Each query contains multiple samples.|
|Server|The LoadGen sends new queries to the SUT according to a Poisson distribution. Overtime queries must not exceed 2x the latency bound.|
|Offline|The LoadGen sends all queries to the SUT at one time.|

We can run those scenarios using the ```--scenario``` option on the command line, for example:

In [6]:
!./run_local.sh onnxruntime mobilenet cpu --scenario Offline --accuracy 

python3 python/main.py --profile mobilenet-onnxruntime --mlperf_conf ../../mlperf.conf --model "/home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx" --dataset-path /home/pliu/opt/open-mmlab/data/imagenet --output "/home/pliu/opt/inference/vision/classification_and_detection/output/onnxruntime-cpu/mobilenet" --scenario Offline --accuracy
INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='onnxruntime', cache=0, cache_dir=None, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=None, max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='/home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx', model_name='mobilenet', output='/home/pliu/opt/inference/vision/classification_and_detection/output/onnxruntime-cpu/mobilenet', outputs=['MobilenetV1

### Additional logfiles

We log some additional information [here](output/mobilenet-onnxruntime-cpu/results.json) which can be used to plot graphs.

### Under the hood

In case you wonder what run_local.sh does: it only assembles the command line for the Python-based benchmark app. Command line options for the app are documented [here](https://github.com/mlperf/inference/blob/master/cloud/image_classification).

Calling
```
!bash -x ./run_local.sh onnxruntime mobilenet cpu  --accuracy 
```
will result in the following command line:
```
python python/main.py --profile mobilenet-onnxruntime --model /tmp/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx --dataset-path /tmp/inference/cloud/image_classification/fake_imagenet --output /tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json --queries-offline 20 --time 10 --max-latency 0.2 --accuracy
```
During testing you can change some of the options to have faster test cycles but for final submission use the defaults.
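
The assembly step can be mimicked in a few lines of Python. This is only a sketch inferred from the command line shown above, not a port of run_local.sh: `build_command` is a hypothetical helper, the model file name is passed in explicitly rather than looked up, and options such as `--output` are left out.

```python
import os
import shlex


def build_command(backend, model, model_file, extra_args=()):
    """Assemble a main.py command line similar to the one shown above."""
    model_dir = os.environ.get("MODEL_DIR", ".")
    data_dir = os.environ.get("DATA_DIR", ".")
    cmd = [
        "python", "python/main.py",
        "--profile", f"{model}-{backend}",
        "--model", os.path.join(model_dir, model_file),
        "--dataset-path", data_dir,
    ]
    # EXTRA_OPS is appended first, then anything given on the command line.
    cmd += shlex.split(os.environ.get("EXTRA_OPS", ""))
    cmd += list(extra_args)
    return cmd


print(shlex.join(build_command("onnxruntime", "mobilenet",
                               "mobilenet_v1_1.0_224.onnx", ["--accuracy"])))
```

Printing the command before running it serves the same purpose as `bash -x`: you can see exactly which options reach the benchmark app.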

### Using docker

Instead of run_local.sh you can use run_and_time.sh, which has the same options but runs the benchmark under docker instead of locally.

In [13]:
!./run_and_time.sh onnxruntime mobilenet cpu 

Sending build context to Docker daemon  18.54MB
Step 1/15 : FROM ubuntu:16.04
 ---> bd3d4369aebc
Step 2/15 : ENV PYTHON_VERSION=3.7
 ---> Using cache
 ---> e25f214201a2
Step 3/15 : ENV LANG C.UTF-8
 ---> Using cache
 ---> 12986ee696e1
Step 4/15 : ENV LC_ALL C.UTF-8
 ---> Using cache
 ---> 1460535b24e1
Step 5/15 : ENV PATH /opt/anaconda3/bin:$PATH
 ---> Using cache
 ---> f4c922578fdf
Step 6/15 : WORKDIR /root
 ---> Using cache
 ---> fb0ec9a436a5
Step 7/15 : ENV HOME /root
 ---> Using cache
 ---> edeb7c15ebfb
Step 8/15 : RUN apt-get update
 ---> Using cache
 ---> 42da1a4fa3fd
Step 9/15 : RUN apt-get install -y --no-install-recommends       git       build-essential       software-properties-common       ca-certificates       wget       curl       htop       zip       unzip
 ---> Using cache
 ---> a1de66a3c7bd
Step 10/15 : RUN cd /opt &&     wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O miniconda.sh &&     /bin/bash ./miniconda.sh -b -p /opt/anacond

### Preparing for official submission

TODO.

In [13]:
!python --version

Python 3.8.16


In [15]:
!python python/main.py \
    --profile mobilenet-onnxruntime \
    --model /home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --output /tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json \
    --scenario Offline --accuracy

INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='onnxruntime', cache=0, cache_dir=None, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=None, max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='/home/pliu/opt/inference/vision/classification_and_detection/model/mobilenet_v1_1.0_224.onnx', model_name='mobilenet', output='/tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json', outputs=['MobilenetV1/Predictions/Reshape_1:0'], performance_sample_count=None, preprocessed_dir=None, profile='mobilenet-onnxruntime', qps=None, samples_per_query=8, scenario='Offline', threads=144, time=None, use_preprocessed_dataset=False, user_conf='user.conf')
INFO:imagenet:Preprocessing 50000 images using 144 threads
INFO:imagenet:loaded 50000 images, cache=0, already_preprocessed=False, took=1.4se

In [17]:
!python python/main.py -h

usage: main.py [-h]
               [--dataset {imagenet,imagenet_mobilenet,imagenet_pytorch,coco-300,coco-300-pt,openimages-300-retinanet,openimages-800-retinanet,openimages-1200-retinanet,openimages-800-retinanet-onnx,coco-1200,coco-1200-onnx,coco-1200-pt,coco-1200-tf}]
               --dataset-path DATASET_PATH [--dataset-list DATASET_LIST]
               [--data-format {NCHW,NHWC}]
               [--profile {defaults,resnet50-tf,resnet50-pytorch,resnet50-onnxruntime,mobilenet-tf,mobilenet-onnxruntime,ssd-mobilenet-tf,ssd-mobilenet-pytorch,ssd-mobilenet-onnxruntime,ssd-resnet34-tf,ssd-resnet34-pytorch,ssd-resnet34-onnxruntime,ssd-resnet34-onnxruntime-tf,retinanet-pytorch,retinanet-onnxruntime}]
               [--scenario SCENARIO] [--max-batchsize MAX_BATCHSIZE] --model
               MODEL [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
               [--backend BACKEND] [--model-name MODEL_NAME]
               [--threads THREADS] [--qps QPS] [--cache CACHE]
               [

In [72]:
!python python/image_demo.py \
/home/pliu/opt/open-mmlab/data/imagenet/val/ILSVRC2012_val_00000001.JPEG \
/home/pliu/opt/open-mmlab/mmclassification/configs/resnet/resnet50_8xb32_in1k.py \
/home/pliu/opt/open-mmlab/mmclassification/models/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth

load checkpoint from local path: /home/pliu/opt/open-mmlab/mmclassification/models/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth
/home/pliu/opt/open-mmlab/data/imagenet/val/ILSVRC2012_val_00000001.JPEG
65
{
    "pred_label": 65,
    "pred_score": 0.6649361252784729,
    "pred_class": "sea snake"
}


In [None]:
!python python/main.py \
    --profile resnet50-pytorchlocal \
    --inputs /home/pliu/opt/open-mmlab/mmclassification/configs/resnet/resnet50_8xb32_in1k.py \
    --model /home/pliu/opt/open-mmlab/mmclassification/models/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --use_preprocessed_data \
    --preprocessed_dir /home/pliu/opt/open-mmlab/data/imagenet \
    --output /home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchlocal-cpu-server-baseline \
    --scenario Server --count 50000 --qps 200

INFO:main:Namespace(accuracy=False, audit_conf='audit.config', backend='pytorch-local', cache=0, cache_dir=None, count=50000, data_format=None, dataset='imagenet_img', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=['/home/pliu/opt/open-mmlab/mmclassification/configs/resnet/resnet50_8xb32_in1k.py'], max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='/home/pliu/opt/open-mmlab/mmclassification/models/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth', model_name='resnet50', output='/home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchlocal-cpu-server-baseline', outputs=None, performance_sample_count=None, preprocessed_dir='/home/pliu/opt/open-mmlab/data/imagenet', profile='resnet50-pytorchlocal', qps=200, samples_per_query=8, scenario='Server', threads=72, time=None, use_preprocessed_dataset=True, user_conf='user.conf')
load checkpoint from local path: /home

In [2]:
!python python/main.py \
    --profile resnet50-pytorchserver \
    --inputs /home/pliu/opt/open-mmlab/mmclassification/configs/resnet/resnet50_8xb32_in1k.py \
    --model resnet50_8xb32_in1k \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --use_preprocessed_data \
    --preprocessed_dir /home/pliu/opt/open-mmlab/data/imagenet \
    --output /home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline \
    --scenario Server --accuracy --count 1000 --qps 1

INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='pytorch-server', cache=0, cache_dir=None, count=1000, data_format=None, dataset='imagenet_img', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=['/home/pliu/opt/open-mmlab/mmclassification/configs/resnet/resnet50_8xb32_in1k.py'], max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='resnet50_8xb32_in1k', model_name='resnet50', output='/home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline', outputs=None, performance_sample_count=None, preprocessed_dir='/home/pliu/opt/open-mmlab/data/imagenet', profile='resnet50-pytorchserver', qps=1, samples_per_query=8, scenario='Server', threads=72, time=None, use_preprocessed_dataset=True, user_conf='user.conf')
ERROR:main:thread: failed on contentid=['/home/pliu/opt/open-mmlab/data/imagenet/val/ILSVRC2012_val_00000835.JPEG'], '

In [10]:
!python python/main.py \
    --profile resnet50-pytorchserver \
    --inputs 127.0.0.1:8080 \
    --model resnet18_8xb32_in1k \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --use_preprocessed_data \
    --preprocessed_dir /home/pliu/opt/open-mmlab/data/imagenet \
    --output /home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline \
    --scenario Server --accuracy --count 50

INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='pytorch-server', cache=0, cache_dir=None, count=50, data_format=None, dataset='imagenet_img', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=['127.0.0.1:8080'], max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='resnet18_8xb32_in1k', model_name='resnet50', output='/home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline', outputs=None, performance_sample_count=None, preprocessed_dir='/home/pliu/opt/open-mmlab/data/imagenet', profile='resnet50-pytorchserver', qps=None, samples_per_query=8, scenario='Server', threads=72, time=None, use_preprocessed_dataset=True, user_conf='user.conf')
['http://127.0.0.1:8080/predictions/resnet18_8xb32_in1k']
TestScenario.Server qps=0.43, mean=34.6969, time=116.877, acc=64.000%, queries=50, tiles=50.0:36.0099,80.0:51.3212,90.0:57.44

In [12]:
!python python/main.py \
    --profile resnet50-pytorchserver \
    --inputs 127.0.0.1:8080,127.0.0.1:8083 \
    --model resnet18_8xb32_in1k \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --use_preprocessed_data \
    --preprocessed_dir /home/pliu/opt/open-mmlab/data/imagenet \
    --output /home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline \
    --scenario Server --accuracy --count 50

INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='pytorch-server', cache=0, cache_dir=None, count=50, data_format=None, dataset='imagenet_img', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=['127.0.0.1:8080', '127.0.0.1:8083'], max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='resnet18_8xb32_in1k', model_name='resnet50', output='/home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline', outputs=None, performance_sample_count=None, preprocessed_dir='/home/pliu/opt/open-mmlab/data/imagenet', profile='resnet50-pytorchserver', qps=None, samples_per_query=8, scenario='Server', threads=72, time=None, use_preprocessed_dataset=True, user_conf='user.conf')
['http://127.0.0.1:8080/predictions/resnet18_8xb32_in1k', 'http://127.0.0.1:8083/predictions/resnet18_8xb32_in1k']
http://127.0.0.1:8083/predictions/resnet18_8xb32_i

In [23]:
!python python/main.py \
    --profile resnet50-pytorchserver \
    --inputs 127.0.0.1:8083 \
    --model resnet18_8xb32_in1k \
    --dataset-path /home/pliu/opt/open-mmlab/data/imagenet \
    --use_preprocessed_data \
    --preprocessed_dir /home/pliu/opt/open-mmlab/data/imagenet \
    --output /home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline \
    --scenario Server --accuracy --count 10 --qps 1

INFO:main:Namespace(accuracy=True, audit_conf='audit.config', backend='pytorch-server', cache=0, cache_dir=None, count=10, data_format=None, dataset='imagenet_img', dataset_list=None, dataset_path='/home/pliu/opt/open-mmlab/data/imagenet', debug=False, find_peak_performance=False, inputs=['127.0.0.1:8083'], max_batchsize=32, max_latency=None, mlperf_conf='../../mlperf.conf', model='mobilenet_v2', model_name='resnet50', output='/home/pliu/opt/inference/vision/classification_and_detection/output/resnet50-pytorchserver-cpu-server-baseline', outputs=None, performance_sample_count=None, preprocessed_dir='/home/pliu/opt/open-mmlab/data/imagenet', profile='resnet50-pytorchserver', qps=1, samples_per_query=8, scenario='Server', threads=72, time=None, use_preprocessed_dataset=True, user_conf='user.conf')
['http://127.0.0.1:8083/predictions/mobilenet_v2']
http://127.0.0.1:8083/predictions/mobilenet_v2 {'pred_label': 57, 'pred_score': 0.6789546012878418, 'pred_class': 'garter snake, grass snake'}