## A short tutorial on how to use the MLPerf inference reference benchmark

We wrapped all inference models into a single benchmark app. The benchmark app reads the proper dataset, preprocesses it, and interfaces with the backend. Traffic is generated by LoadGen, which drives traffic to the benchmark app according to the desired mode.

A shell script wraps the benchmark app to simplify its command line options and lets you pick the backend, model and device:

In [1]:
!./run_local.sh

usage: ./run_local.sh tf|onnxruntime|pytorch|tflite [resnet50|mobilenet|ssd-mobilenet|ssd-resnet34] [cpu|gpu]


Before running the benchmark, decide on a model and dataset and set the environment variables ```MODEL_DIR``` and ```DATA_DIR```.

For this tutorial we use onnxruntime (tensorflow and pytorch will work as well), mobilenet and a fake imagenet dataset with a few images.

In [2]:
!pip install onnxruntime



#### Step 1 - download the model. You can find the links to the models [here](https://github.com/mlperf/inference/blob/master/cloud/image_classification/#Supported%20Models).

In [3]:
!wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx

#### Step 2 - download the dataset. For this tutorial we create a small, fake dataset that pretends to be imagenet.
Normally you'd need to download imagenet2012/validation for image classification or coco2017/validation for object detection.

Links and instructions on how to download the datasets can be found in the [README](https://github.com/mlperf/inference/blob/master/cloud/image_classification/#Datasets).

In [4]:
!tools/make_fake_imagenet.sh

#### Step 3 - tell the benchmark where to find the model and data

In [5]:
import os
os.environ['MODEL_DIR'] = os.getcwd()
os.environ['DATA_DIR'] = os.path.join(os.getcwd(), "fake_imagenet")
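
Before starting a run it can help to confirm that both variables point at something real. The following is a minimal sketch, not part of the benchmark: `check_benchmark_paths` is a hypothetical helper, and it only assumes that the model file downloaded in Step 1 sits directly in `MODEL_DIR` and that `DATA_DIR` is a directory.

```python
import os


def check_benchmark_paths(model_dir, data_dir, model_file):
    """Return a list of problems; an empty list means both paths exist."""
    problems = []
    model_path = os.path.join(model_dir, model_file)
    if not os.path.isfile(model_path):
        problems.append(f"model file not found: {model_path}")
    if not os.path.isdir(data_dir):
        problems.append(f"dataset directory not found: {data_dir}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: the file name is the one downloaded in Step 1.
    issues = check_benchmark_paths(
        os.environ.get("MODEL_DIR", ""),
        os.environ.get("DATA_DIR", ""),
        "mobilenet_v1_1.0_224.onnx",
    )
    print("\n".join(issues) if issues else "MODEL_DIR and DATA_DIR look good")
```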

For an MLPerf submission the number of queries, time, latencies and percentiles are prescribed, and we default to those settings. For this tutorial we pass in some extra options to make things go quicker.
run_local.sh looks for the environment variable EXTRA_OPS and adds its contents to the arguments. You can also pass additional arguments on the command line.
The options below limit the benchmark run time to 10 seconds. Accuracy reporting is enabled separately with the ```--accuracy``` flag in Step 4.

In [6]:
os.environ['EXTRA_OPS'] ="--queries-offline 20 --time 10 --max-latency 0.2"
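
If you change these values often, it may be easier to keep them in a dictionary and build the string from it. This is only a convenience sketch: `format_extra_ops` is a hypothetical helper, and the flags are the three used above.

```python
import os


def format_extra_ops(options):
    """Join {flag: value} pairs into one command-line string."""
    return " ".join(f"--{flag} {value}" for flag, value in options.items())


# Same quick-run settings as in the cell above.
quick_run = {"queries-offline": 20, "time": 10, "max-latency": 0.2}
os.environ["EXTRA_OPS"] = format_extra_ops(quick_run)
print(os.environ["EXTRA_OPS"])
# prints: --queries-offline 20 --time 10 --max-latency 0.2
```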

#### Step 4 - run the benchmark.

In [7]:
!./run_local.sh onnxruntime mobilenet cpu --accuracy 

INFO:main:Namespace(accuracy=True, backend='onnxruntime', cache=0, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='/home/gs/inference/cloud/image_classification/fake_imagenet', inputs=None, max_batchsize=128, max_latency=[0.2], mode='Performance', model='/home/gs/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx', output='/home/gs/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json', outputs=['MobilenetV1/Predictions/Reshape_1:0'], profile='mobilenet-onnxruntime', qps=10, queries_multi=24576, queries_offline=20, queries_single=1024, scenario=[TestScenario.SingleStream], threads=8, time=10)
INFO:imagenet:loaded 8 images, cache=0, took=0.0sec
INFO:main:starting accuracy pass on 8 items
Accuracy qps=48.12, mean=0.019353, time=0.17, acc=87.50, queries=8, tiles=50.0:0.0152,80.0:0.0198,90.0:0.0278,95.0:0.0366,99.0:0.0436,99.9:0.0451
INFO:main:starting TestScenario.SingleStream, latency=1.0
TestScenario.S

The line ```Accuracy``` reports accuracy or mAP together with latencies at various percentiles, so you can get a sense of how the run went. In the run above, accuracy was 87.5%.

The line ```TestScenario.SingleStream-1.0``` reports the latency and qps seen during the benchmark.
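
To compare runs programmatically, a result line can be split into its fields. The sketch below assumes the layout seen in the output above (a name, then `key=value` pairs separated by `, `, with `tiles` holding comma-separated `percentile:latency` pairs); `parse_result_line` is a hypothetical helper, not something the benchmark provides.

```python
def parse_result_line(line):
    """Split a result line into (name, metrics, percentiles)."""
    name, rest = line.strip().split(" ", 1)
    metrics, percentiles = {}, {}
    # Top-level fields are separated by ", "; the tiles use "," without a space.
    for field in rest.split(", "):
        key, value = field.split("=", 1)
        if key == "tiles":
            for item in value.split(","):
                pct, latency = item.split(":")
                percentiles[float(pct)] = float(latency)
        else:
            metrics[key] = float(value)
    return name, metrics, percentiles


# The Accuracy line from the run above.
line = ("Accuracy qps=48.12, mean=0.019353, time=0.17, acc=87.50, queries=8, "
        "tiles=50.0:0.0152,80.0:0.0198,90.0:0.0278,95.0:0.0366,99.0:0.0436,99.9:0.0451")
name, metrics, percentiles = parse_result_line(line)
print(name, metrics["acc"], percentiles[99.0])
# prints: Accuracy 87.5 0.0436
```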

For submission, the official logs are found in [mlperf_log_summary.txt](mlperf_log_summary.txt) and [mlperf_log_detail.txt](mlperf_log_detail.txt).

If you read over the MLPerf inference rules guide, you'll find multiple scenarios to be run for the inference benchmarks:

|scenario|description|
|:---|:---|
|SingleStream|The LoadGen sends the next query as soon as the SUT completes the previous one|
|MultiStream|The LoadGen sends a new query every Latency Constraint, if the SUT has completed the prior query. Otherwise, the new query is dropped. Such an event is one overtime query.|
|Server|The LoadGen sends new queries to the SUT according to a Poisson distribution. Overtime queries must not exceed 2x the latency bound.|
|Offline|The LoadGen sends all queries to the SUT at one time.|
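
To get a feel for the Server scenario, the sketch below generates query issue times for a Poisson process by drawing exponentially distributed gaps between queries. It is an illustration only and not how LoadGen is implemented; `poisson_arrival_times` and its parameters are hypothetical names.

```python
import random


def poisson_arrival_times(qps, duration, seed=0):
    """Return issue times in seconds for a Poisson process with rate `qps`."""
    rng = random.Random(seed)
    now, times = 0.0, []
    while True:
        # Gaps between consecutive Poisson arrivals are exponentially distributed.
        now += rng.expovariate(qps)
        if now >= duration:
            return times
        times.append(now)


# Roughly qps * duration queries are expected, here about 100.
arrivals = poisson_arrival_times(qps=10, duration=10)
print(f"{len(arrivals)} queries issued in 10 seconds")
```

In contrast, SingleStream would issue each query only after the previous one completes, and Offline would issue all of them at time zero.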

We can run those scenarios using the ```--scenario``` option on the command line:

In [8]:
!./run_local.sh onnxruntime mobilenet cpu --scenario SingleStream,Server,Offline # ,MultiStream 

INFO:main:Namespace(accuracy=False, backend='onnxruntime', cache=0, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='/home/gs/inference/cloud/image_classification/fake_imagenet', inputs=None, max_batchsize=128, max_latency=[0.2], mode='Performance', model='/home/gs/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx', output='/home/gs/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json', outputs=['MobilenetV1/Predictions/Reshape_1:0'], profile='mobilenet-onnxruntime', qps=10, queries_multi=24576, queries_offline=20, queries_single=1024, scenario=[TestScenario.SingleStream, TestScenario.Server, TestScenario.Offline], threads=8, time=10)
INFO:imagenet:loaded 8 images, cache=0, took=0.0sec
INFO:main:starting TestScenario.SingleStream, latency=1.0
TestScenario.SingleStream-1.0 qps=67.94, mean=0.014653, time=10.05, acc=0.00, queries=683, tiles=50.0:0.0138,80.0:0.0154,90.0:0.0173,95.0:0.0191,99.0:0.0256,99.

### Additional logfiles

We log some additional information [here](output/mobilenet-onnxruntime-cpu/results.json) which can be used to plot graphs.
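
The layout of results.json is not described here, so a safe first step is to load it and look at what it contains before plotting anything. A minimal sketch, with `top_level_keys` as a hypothetical helper that makes no assumption about the schema:

```python
import json
import os


def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if it is not an object."""
    with open(path) as f:
        data = json.load(f)
    return sorted(data) if isinstance(data, dict) else []


path = "output/mobilenet-onnxruntime-cpu/results.json"
if os.path.exists(path):  # only present after a benchmark run
    print(top_level_keys(path))
```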

### Under the hood

In case you wonder what run_local.sh does: it only assembles the command line for the Python-based benchmark app. Command line options for the app are documented [here](https://github.com/mlperf/inference/blob/master/cloud/image_classification).

Calling
```
!bash -x ./run_local.sh onnxruntime mobilenet cpu  --accuracy 
```
results in the following command line:
```
python python/main.py --profile mobilenet-onnxruntime --model /tmp/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx --dataset-path /tmp/inference/cloud/image_classification/fake_imagenet --output /tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json --queries-offline 20 --time 10 --max-latency 0.2 --accuracy
```
During testing you can change some of the options to get faster test cycles, but for final submission use the defaults.
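
The same assembly can be sketched in Python, which may be handy if you want to launch runs from a notebook. This mirrors the command line shown above and nothing more; `build_command` is a hypothetical helper, and how run_local.sh derives the profile and output path is not reproduced here.

```python
import shlex


def build_command(profile, model_path, dataset_path, output_path,
                  extra_ops="", cli_args=()):
    """Assemble the argument list for python/main.py."""
    cmd = [
        "python", "python/main.py",
        "--profile", profile,
        "--model", model_path,
        "--dataset-path", dataset_path,
        "--output", output_path,
    ]
    cmd += shlex.split(extra_ops)  # contents of EXTRA_OPS
    cmd += list(cli_args)          # extra arguments such as --accuracy
    return cmd


base = "/tmp/inference/cloud/image_classification"
cmd = build_command(
    "mobilenet-onnxruntime",
    f"{base}/mobilenet_v1_1.0_224.onnx",
    f"{base}/fake_imagenet",
    f"{base}/output/mobilenet-onnxruntime-cpu/results.json",
    extra_ops="--queries-offline 20 --time 10 --max-latency 0.2",
    cli_args=["--accuracy"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` from the directory that contains `python/main.py`.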

### Using docker

Instead of run_local.sh you can use run_and_time.sh, which takes the same options but runs the benchmark under docker instead of locally.

In [10]:
!./run_and_time.sh onnxruntime mobilenet cpu 

Sending build context to Docker daemon  54.76MB
Step 1/16 : FROM ubuntu:16.04
 ---> bd3d4369aebc
Step 2/16 : ENV PYTHON_VERSION=3.7
 ---> Using cache
 ---> e25f214201a2
Step 3/16 : ENV LANG C.UTF-8
 ---> Using cache
 ---> 12986ee696e1
Step 4/16 : ENV LC_ALL C.UTF-8
 ---> Using cache
 ---> 1460535b24e1
Step 5/16 : ENV PATH /opt/anaconda3/bin:$PATH
 ---> Using cache
 ---> f4c922578fdf
Step 6/16 : WORKDIR /root
 ---> Using cache
 ---> fb0ec9a436a5
Step 7/16 : ENV HOME /root
 ---> Using cache
 ---> edeb7c15ebfb
Step 8/16 : RUN apt-get update
 ---> Using cache
 ---> 42da1a4fa3fd
Step 9/16 : RUN apt-get install -y --no-install-recommends       git       build-essential       software-properties-common       ca-certificates       wget       curl       htop       zip       unzip
 ---> Using cache
 ---> a1de66a3c7bd
Step 10/16 : RUN cd /opt &&     wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O miniconda.sh &&     /bin/bash ./miniconda.sh -b -p /opt/anacond

### Preparing for official submission

TODO.