## A short tutorial on how to use the MLPerf inference reference benchmark

We wrapped all inference models into a single benchmark app. The benchmark app reads the proper dataset, preprocesses it, and interfaces with the backend. Traffic is generated by loadgen, which drives traffic to the benchmark app according to the selected mode.

A shell script wraps the benchmark app to simplify the command-line options. With it you pick the backend, model, and device:

In [2]:
!./run_local.sh

usage: ./run_local.sh tf|onnxruntime|pytorch|tflite [resnet50|mobilenet|ssd-mobilenet|ssd-resnet34] [cpu|gpu]


Before running the benchmark, decide on a model and dataset and set the environment variables ```MODEL_DIR``` and ```DATA_DIR```. For this tutorial we use onnxruntime, mobilenet, and a fake imagenet dataset with a few images.
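
The choices listed in the usage message can be checked before the script is launched. The following is a minimal sketch, not part of the benchmark: `build_run_command` is a hypothetical helper, the allowed values are copied from the usage string above, and the defaults (`resnet50`, `cpu`) are an assumption. It does not check whether a particular backend and model combination is actually supported.

```python
# Allowed values copied from the usage string printed by ./run_local.sh.
BACKENDS = ("tf", "onnxruntime", "pytorch", "tflite")
MODELS = ("resnet50", "mobilenet", "ssd-mobilenet", "ssd-resnet34")
DEVICES = ("cpu", "gpu")


def build_run_command(backend, model="resnet50", device="cpu", extra_args=()):
    """Return the argv list for ./run_local.sh, rejecting unknown choices.

    Hypothetical helper for illustration; the defaults are an assumption.
    """
    for value, allowed, label in (
        (backend, BACKENDS, "backend"),
        (model, MODELS, "model"),
        (device, DEVICES, "device"),
    ):
        if value not in allowed:
            raise ValueError(f"unknown {label} {value!r}; expected one of {allowed}")
    return ["./run_local.sh", backend, model, device, *extra_args]


cmd = build_run_command("onnxruntime", "mobilenet", "cpu", ["--accuracy"])
print(" ".join(cmd))  # ./run_local.sh onnxruntime mobilenet cpu --accuracy
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `run_local.sh`.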

In [4]:
!pip install onnxruntime



#### Step 1 - download the model. You can find links to the models [here](https://github.com/mlperf/inference/tree/master/cloud/image_classification).

In [5]:
!wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx

#### Step 2 - download the dataset. For this tutorial we create a small, fake dataset that pretends to be imagenet.
Normally you'd need to download imagenet2012/validation for image classification or coco2017 for object detection. Links and instructions can be found in the [README](README.md).

In [6]:
!tools/make_fake_imagenet.sh

/bin/sh: 1: tools/make_fake_imagetnet.sh: not found


#### Step 3 - tell the benchmark where to find the model and data

In [31]:
import os
os.environ['MODEL_DIR'] = os.getcwd()
os.environ['DATA_DIR'] = "fake_imagenet"

In [39]:
os.environ['EXTRA_OPS'] = "--queries-offline 20 --time 10 --max-latency 0.2"
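
Before launching the benchmark it can help to confirm that both variables are set and point at real directories. The sketch below is not part of the benchmark: `check_benchmark_env` is a hypothetical helper, and the assumption that both variables must name existing directories is ours. Relative paths such as `fake_imagenet` are resolved against the current working directory.

```python
import os


def check_benchmark_env(env=None):
    """Return a list of problems found with MODEL_DIR and DATA_DIR.

    Hypothetical helper for illustration; an empty list means both
    variables are set and name existing directories.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("MODEL_DIR", "DATA_DIR"):
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value!r} is not an existing directory")
    return problems


for problem in check_benchmark_env():
    print("warning:", problem)
```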

#### Step 4 - run the benchmark.
We add the ```--time 10``` option (through ```EXTRA_OPS```, set above) to limit the run to 10 seconds in this tutorial. For a submission you must keep the default options. ```--accuracy``` is required for an MLPerf submission to validate the correctness of the inference results; for testing it is not needed.
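
When several options are overridden for a quick test, the option string can be assembled from a dictionary instead of being typed by hand. This is a minimal sketch: `format_extra_ops` is a hypothetical helper, and it assumes that the run script passes the contents of ```EXTRA_OPS``` on to the benchmark command line unchanged, which this tutorial does not show. The option names are the ones used in this tutorial.

```python
import os


def format_extra_ops(options):
    """Turn a dict such as {"time": 10} into a "--time 10" style string.

    Underscores in keys become dashes; a value of True emits a bare flag.
    Hypothetical helper for illustration, not part of the benchmark.
    """
    parts = []
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            parts.append(flag)
        else:
            parts.extend([flag, str(value)])
    return " ".join(parts)


os.environ["EXTRA_OPS"] = format_extra_ops(
    {"queries_offline": 20, "time": 10, "max_latency": 0.2}
)
print(os.environ["EXTRA_OPS"])  # --queries-offline 20 --time 10 --max-latency 0.2
```

Deleting the variable, or setting it to an empty string, removes the overrides again for a run with default options.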

In [40]:
!bash -x ./run_local.sh onnxruntime mobilenet cpu  --accuracy 

+ source ./run_common.sh
++ '[' 4 -lt 1 ']'
++ '[' xfake_imagenet == x ']'
++ '[' x/home/gs/inference/cloud/image_classification == x ']'
++ backend=tf
++ model=resnet50
++ device=cpu
++ for i in '$*'
++ case $i in
++ backend=onnxruntime
++ shift
++ for i in '$*'
++ case $i in
++ model=mobilenet
++ shift
++ for i in '$*'
++ case $i in
++ device=cpu
++ shift
++ for i in '$*'
++ case $i in
++ '[' cpu == cpu ']'
++ export CUDA_VISIBLE_DEVICES=
++ CUDA_VISIBLE_DEVICES=
++ name=mobilenet-onnxruntime
++ extra_args=
++ '[' mobilenet-onnxruntime == resnet50-tf ']'
++ '[' mobilenet-onnxruntime == mobilenet-tf ']'
++ '[' mobilenet-onnxruntime == ssd-mobilenet-tf ']'
++ '[' mobilenet-onnxruntime == ssd-resnet34-tf ']'
++ '[' mobilenet-onnxruntime == resnet50-onnxruntime ']'
++ '[' mobilenet-onnxruntime == mobilenet-onnxruntime ']'
++ model_path=/home/gs/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx
++ profile=mobilenet-onnxruntime
++ '[' mobilenet-onnxruntime == ssd-mobilenet-onn

In [13]:
!./run_local.sh onnxruntime mobilenet cpu --accuracy --scenario SingleStream,MultiStream,Server,Offline 

INFO:main:Namespace(accuracy=True, backend='onnxruntime', cache=0, count=None, data_format=None, dataset='imagenet_mobilenet', dataset_list=None, dataset_path='fake_imagenet', inputs=None, max_batchsize=128, max_latency=[0.01, 0.05, 0.1], model='/home/gs/resnet_for_mlperf/mobilenet_v1_1.0_224.onnx', output='/home/gs/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json', outputs=['MobilenetV1/Predictions/Reshape_1:0'], profile='mobilenet-onnxruntime', qps=10, queries_multi=24576, queries_offline=24576, queries_single=1024, scenario=[TestScenario.SingleStream, TestScenario.MultiStream, TestScenario.Server, TestScenario.Offline], threads=2, time=10)
INFO:imagenet:loaded 8 images, cache=0, took=0.0sec
INFO:main:starting accuracy pass on 8 items
Accuracy qps=16.48, mean=0.059025, time=0.49, acc=87.50, queries=8, tiles=50.0:0.0558,80.0:0.0618,90.0:0.0672,95.0:0.0700,99.0:0.0722,99.9:0.0728
INFO:main:starting TestScenario.SingleStream, latency=1.0
TestScenario.Si