# End-to-End MLPerf Submission Example

This notebook follows the [General MLPerf Submission Rules](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc).


### Get the MLPerf Inference Benchmark Suite source code

Run this notebook from the root of the `mlcommons/inference` repo, cloned with:
```
git clone --recurse-submodules https://github.com/mlcommons/inference.git --depth 1
```

### Build loadgen

In [None]:
# build loadgen
!pip install pybind11
!cd loadgen; CFLAGS="-std=c++14 -O3" python setup.py develop

In [None]:
!cd vision/classification_and_detection; python setup.py develop

### Set Working Directory

In [None]:
%cd vision/classification_and_detection

### Download data

For this example, the ImageNet and/or COCO validation data should already be on the host system. See the [MLPerf Image Classification task](https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection#datasets) for details on obtaining the datasets. The following step assumes each validation dataset is stored under `/workspace/data/`; change this to the location in your setup.

In [None]:
%%bash

# -p and -f keep this cell safe to re-run
mkdir -p data
ln -sf /workspace/data/imagenet2012 data/
ln -sf /workspace/data/coco data/
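
Before starting a long benchmark run, it can help to confirm that the links resolve. The following is a minimal sketch, assuming the `data/imagenet2012` and `data/coco` layout used in this notebook and the `val_map.txt` file that the accuracy script reads later. The helper name `missing_paths` and the `REQUIRED` list are hypothetical; trim the list to the datasets you actually use.

```python
from pathlib import Path


def missing_paths(data_root, required):
    """Return the entries of `required` that do not exist under `data_root`."""
    root = Path(data_root)
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    return [rel for rel in required if not (root / rel).exists()]


# Entries referenced later in this notebook (assumption: adjust to your setup).
REQUIRED = ["imagenet2012/val_map.txt", "coco"]

missing = missing_paths("data", REQUIRED)
print("missing:", missing if missing else "nothing")
```

An empty result only means the paths exist; it does not verify the dataset contents.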

### Download models

In [None]:
%%bash

mkdir -p models

# resnet50
wget -q https://zenodo.org/record/2535873/files/resnet50_v1.pb -O models/resnet50_v1.pb 
wget -q https://zenodo.org/record/2592612/files/resnet50_v1.onnx -O models/resnet50_v1.onnx

# ssd-mobilenet
wget -q http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz -O models/ssd_mobilenet_v1_coco_2018_01_28.tar.gz
tar zxvf ./models/ssd_mobilenet_v1_coco_2018_01_28.tar.gz -C ./models; mv models/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb ./models/ssd_mobilenet_v1_coco_2018_01_28.pb
wget -q https://zenodo.org/record/3163026/files/ssd_mobilenet_v1_coco_2018_01_28.onnx -O models/ssd_mobilenet_v1_coco_2018_01_28.onnx 

# ssd-resnet34
wget -q https://zenodo.org/record/3345892/files/tf_ssd_resnet34_22.1.zip -O models/tf_ssd_resnet34_22.1.zip
unzip ./models/tf_ssd_resnet34_22.1.zip -d ./models; mv models/tf_ssd_resnet34_22.1/resnet34_tf.22.1.pb ./models
wget -q https://zenodo.org/record/3228411/files/resnet34-ssd1200.onnx -O models/resnet34-ssd1200.onnx

### Run benchmarks using the reference implementation

Let's prepare a submission for ResNet-50 on a cloud datacenter server with an NVIDIA T4 GPU using TensorFlow.

The following script runs each scenario for that combination and prepares a submission directory, following the general submission rules documented [here](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc).

In [None]:
import logging
import os
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

# final results go here
ORG = "mlperf-org"
DIVISION = "closed"
SUBMISSION_ROOT = "/tmp/mlperf-submission"
SUBMISSION_DIR = os.path.join(SUBMISSION_ROOT, DIVISION, ORG)
os.environ['SUBMISSION_ROOT'] = SUBMISSION_ROOT
os.environ['SUBMISSION_DIR'] = SUBMISSION_DIR
os.makedirs(SUBMISSION_DIR, exist_ok=True)
os.makedirs(os.path.join(SUBMISSION_DIR, "measurements"), exist_ok=True)
os.makedirs(os.path.join(SUBMISSION_DIR, "code"), exist_ok=True)
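
The cells in this notebook write into a fixed layout beneath `SUBMISSION_ROOT/DIVISION/ORG`. As a rough sketch of that layout, the hypothetical helper `submission_paths` below rebuilds the paths this notebook uses for one system, benchmark, and scenario. It mirrors this notebook only and is not an authoritative statement of the submission rules.

```python
import os


def submission_paths(root, division, org, system_id, benchmark, scenario):
    """Map the kinds of artifact written by this notebook to their locations."""
    base = os.path.join(root, division, org)
    results = os.path.join(base, "results", system_id, benchmark, scenario)
    return {
        # output of the accuracy run plus accuracy.txt
        "accuracy": os.path.join(results, "accuracy"),
        # performance runs land in run_1, run_2, ... below this directory
        "performance": os.path.join(results, "performance"),
        # mlperf.conf, user.conf, README.md and the per-scenario JSON file
        "measurements": os.path.join(base, "measurements", system_id, benchmark, scenario),
        # VERSION.txt for the reference implementation
        "code": os.path.join(base, "code", benchmark, "reference"),
        # system description JSON
        "system": os.path.join(base, "systems", system_id + ".json"),
    }


paths = submission_paths("/tmp/mlperf-submission", "closed", "mlperf-org",
                         "tf-gpu", "resnet", "Offline")
for kind, path in paths.items():
    print(f"{kind:13s} {path}")
```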

In [8]:
%%bash

# where to find stuff
export DATA_ROOT=`pwd`/data
export MODEL_DIR=`pwd`/models

# options for official runs
gopt="--max-batchsize 8 --samples-per-query 40 --threads 2 --qps 145"


function one_run {
    # args: scenario count framework device model ...
    scenario=$1; shift
    count=$1; shift
    framework=$1
    device=$2
    model=$3
    system_id=$framework-$device
    echo "====== $model/$scenario ====="

    case $model in 
    resnet50) 
        cmd="tools/accuracy-imagenet.py --imagenet-val-file $DATA_ROOT/imagenet2012/val_map.txt"
        offical_name="resnet";;
    ssd-mobilenet) 
        cmd="tools/accuracy-coco.py --coco-dir $DATA_ROOT/coco"
        offical_name="ssd-small";;
    ssd-resnet34) 
        cmd="tools/accuracy-coco.py --coco-dir $DATA_ROOT/coco"
        offical_name="ssd-large";;
    esac
    output_dir=$SUBMISSION_DIR/results/$system_id/$offical_name
    
    # accuracy run
    ./run_local.sh $@ --scenario $scenario --accuracy --output $output_dir/$scenario/accuracy
    python $cmd --mlperf-accuracy-file $output_dir/$scenario/accuracy/mlperf_log_accuracy.json \
            >  $output_dir/$scenario/accuracy/accuracy.txt
    cat $output_dir/$scenario/accuracy/accuracy.txt

    # performance run
    cnt=0
    while [ $cnt -lt $count ]; do
        let cnt=cnt+1
        ./run_local.sh $@ --scenario $scenario --output $output_dir/$scenario/performance/run_$cnt
    done
    
    # setup the measurements directory
    mdir=$SUBMISSION_DIR/measurements/$system_id/$offical_name/$scenario
    mkdir -p $mdir
    cp ../../mlperf.conf $mdir

    # reference app uses command line instead of user.conf
    echo "# empty" > $mdir/user.conf
    touch $mdir/README.md
    impid="reference"
    cat > $mdir/$system_id"_"$impid"_"$scenario".json" <<EOF
    {
        "input_data_types": "fp32",
        "retraining": "none",
        "starting_weights_filename": "https://zenodo.org/record/2535873/files/resnet50_v1.pb",
        "weight_data_types": "fp32",
        "weight_transformations": "none"
    }
EOF
}

function one_model {
    # args: framework device model ...
    one_run SingleStream 1 $@ --max-latency 0.0005
    one_run Server 1 $@
    one_run Offline 1 $@ --qps 1000
    one_run MultiStream 1 $@
}


# run image classifier benchmarks 
export DATA_DIR=$DATA_ROOT/imagenet2012
one_model tf gpu resnet50 $gopt

TestScenario.SingleStream qps=7322.28, mean=0.0078, time=6.828, acc=76.456%, queries=50000, tiles=50.0:0.0077,80.0:0.0078,90.0:0.0078,95.0:0.0079,99.0:0.0131,99.9:0.0135
accuracy=76.456%, good=38228, total=50000
TestScenario.SingleStream qps=125.88, mean=0.0079, time=600.138, queries=75546, tiles=50.0:0.0079,80.0:0.0080,90.0:0.0080,95.0:0.0081,99.0:0.0081,99.9:0.0082
TestScenario.Server qps=7528.79, mean=0.0832, time=6.641, acc=76.456%, queries=50000, tiles=50.0:0.0809,80.0:0.0922,90.0:0.0932,95.0:0.0941,99.0:0.0963,99.9:0.1022
accuracy=76.456%, good=38228, total=50000
TestScenario.Server qps=128.84, mean=116.7138, time=2098.285, queries=270336, tiles=50.0:115.9511,80.0:185.2868,90.0:209.0362,95.0:220.8464,99.0:230.0520,99.9:231.5965
TestScenario.Offline qps=2008.52, mean=0.3050, time=3.112, acc=76.456%, queries=6250, tiles=50.0:0.3017,80.0:0.3416,90.0:0.3465,95.0:0.3525,99.0:0.3646,99.9:1.2464
accuracy=76.456%, good=38228, total=50000
TestScenario.Offline qps=285.33, mean=1157.2775, t

INFO:main:Namespace(accuracy=True, backend='tensorflow', cache=0, count=None, data_format=None, dataset='imagenet', dataset_list=None, dataset_path='/workspace/inference/vision/classification_and_detection/data/imagenet2012', find_peak_performance=False, inputs=['input_tensor:0'], max_batchsize=8, max_latency=0.0005, mlperf_conf='../../mlperf.conf', model='/workspace/inference/vision/classification_and_detection/models/resnet50_v1.pb', model_name='resnet50', output='/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/resnet/SingleStream/accuracy', outputs=['ArgMax:0'], profile='resnet50-tf', qps=145, samples_per_query=40, scenario='SingleStream', threads=2, time=None, user_conf='user.conf')
INFO:imagenet:loaded 50000 images, cache=0, took=419.7sec
INFO:main:starting TestScenario.SingleStream
INFO:main:Namespace(accuracy=False, backend='tensorflow', cache=0, count=None, data_format=None, dataset='imagenet', dataset_list=None, dataset_path='/workspace/inference/vision/classification_
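
The `TestScenario.*` lines in the output above are plain `key=value` text, so they can be collected into a table for comparison across scenarios. The sketch below assumes the line format shown in that output; the function name `parse_result_line` is hypothetical, and fields without a value (such as the truncated line above) are skipped rather than guessed.

```python
import re

RESULT_RE = re.compile(r"^TestScenario\.(\w+) (.+)$")


def parse_result_line(line):
    """Parse one 'TestScenario.X key=value, ...' line; return None for other lines."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        return None
    result = {"scenario": match.group(1)}
    # Everything before ", tiles=" is a list of key=value pairs.
    head, _, tiles = match.group(2).partition(", tiles=")
    for item in head.split(", "):
        key, sep, value = item.partition("=")
        if not sep or not value:
            continue  # incomplete field, e.g. from truncated output
        result[key] = float(value.rstrip("%"))
    if tiles:
        # tiles is a comma-separated list of percentile:value pairs
        result["tiles"] = {}
        for pair in tiles.split(","):
            pct, sep, value = pair.partition(":")
            if sep and value:
                result["tiles"][float(pct)] = float(value)
    return result


sample = ("TestScenario.SingleStream qps=125.88, mean=0.0079, time=600.138, "
          "queries=75546, tiles=50.0:0.0079,80.0:0.0080,90.0:0.0080,"
          "95.0:0.0081,99.0:0.0081,99.9:0.0082")
print(parse_result_line(sample))
```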

The submission directory may contain large trace files, which we can delete.

In [9]:
!find {SUBMISSION_DIR}/ -name mlperf_log_trace.json -delete
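
If you would rather see what will be removed first, the same cleanup can be done from Python. This is a minimal sketch with a hypothetical helper, `delete_traces`, that lists each `mlperf_log_trace.json` with its size and deletes it only when `dry_run` is set to `False`.

```python
from pathlib import Path


def delete_traces(submission_dir, name="mlperf_log_trace.json", dry_run=True):
    """Return (path, size_in_bytes) for each trace file; delete unless dry_run."""
    found = []
    # sorted() materializes the matches first, so deleting does not disturb the walk.
    for path in sorted(Path(submission_dir).rglob(name)):
        if path.is_file():
            found.append((path, path.stat().st_size))
            if not dry_run:
                path.unlink()
    return found


for path, size in delete_traces("/tmp/mlperf-submission"):
    print(f"{size:>12d}  {path}")
```

Call `delete_traces(SUBMISSION_DIR, dry_run=False)` once the list looks right.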

### Complete submission directory

Add the required metadata to the submission.

In [10]:
%%bash

#
# setup systems directory
#
if [ ! -d ${SUBMISSION_DIR}/systems ]; then
    mkdir ${SUBMISSION_DIR}/systems
fi

cat > ${SUBMISSION_DIR}/systems/tf-gpu.json <<EOF
{
        "division": "closed",
        "status": "available",
        "submitter": "mlperf-org",
        "system_name": "tf-gpu",
        "system_type": "datacenter",
        
        "number_of_nodes": 1,
        "host_memory_capacity": "32GB",
        "host_processor_core_count": 1,
        "host_processor_frequency": "3.50GHz",
        "host_processor_model_name": "Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz",
        "host_processors_per_node": 1,
        "host_storage_capacity": "512GB",
        "host_storage_type": "SSD",
        
        "accelerator_frequency": "-",
        "accelerator_host_interconnect": "-",
        "accelerator_interconnect": "-",
        "accelerator_interconnect_topology": "-",
        "accelerator_memory_capacity": "16GB",
        "accelerator_memory_configuration": "none",
        "accelerator_model_name": "T4",
        "accelerator_on-chip_memories": "-",
        "accelerators_per_node": 1,

        "framework": "v1.14.0-rc1-22-gaf24dc9",
        "operating_system": "ubuntu-18.04",
        "other_software_stack": "cuda-11.2",
        "sw_notes": ""
}
EOF
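
A hand-written JSON file is easy to break with a stray comma, so a quick parse check can save a failed checker run. The sketch below loads the system description and reports keys that are absent. The `EXAMPLE_KEYS` set is an illustrative subset copied from the example above, not the checker's authoritative list of required fields, and `missing_system_keys` is a hypothetical helper.

```python
import json

# Illustrative subset taken from the example above (assumption: not exhaustive).
EXAMPLE_KEYS = {
    "division", "status", "submitter", "system_name", "system_type",
    "framework", "operating_system",
}


def missing_system_keys(path, expected=EXAMPLE_KEYS):
    """Parse the JSON file at `path` and return the expected keys it lacks."""
    with open(path) as f:
        description = json.load(f)  # raises json.JSONDecodeError on malformed JSON
    return sorted(set(expected) - set(description))
```

For example, `missing_system_keys(os.path.join(SUBMISSION_DIR, "systems", "tf-gpu.json"))` should return an empty list for the file written above.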

In [11]:
%%bash

#
# setup code directory
#
dir=${SUBMISSION_DIR}/code/resnet/reference
mkdir -p $dir
echo "git clone https://github.com/mlcommons/inference.git" > $dir/VERSION.txt
git rev-parse HEAD >> $dir/VERSION.txt

### What's in the submission directory now?


In [12]:
!find {SUBMISSION_ROOT}/ -type f

/tmp/mlperf-submission/closed/mlperf-org/systems/tf-gpu.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/Offline/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/Offline/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/Offline/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/Offline/tf-gpu_reference_Offline.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/SingleStream/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/SingleStream/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/SingleStream/tf-gpu_reference_SingleStream.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/SingleStream/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/MultiStream/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/resnet/MultiStream/ml

Let's look at some of the files:

In [13]:
!echo "-- SingleStream Accuracy"; head {SUBMISSION_DIR}/results/tf-gpu/resnet/SingleStream/accuracy/accuracy.txt
!echo "\n-- SingleStream Summary"; head {SUBMISSION_DIR}/results/tf-gpu/resnet/SingleStream/performance/run_1/mlperf_log_summary.txt
!echo "\n-- Server Summary"; head {SUBMISSION_DIR}/results/tf-gpu/resnet/Server/performance/run_1/mlperf_log_summary.txt

-- SingleStream Accuracy
accuracy=76.456%, good=38228, total=50000

-- SingleStream Summary
MLPerf Results Summary
SUT name : PySUT
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 8030958
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

-- Server Summary
MLPerf Results Summary
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Scheduled samples per second : 144.87
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : Yes
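
Note that the Server summary above reports `INVALID` because the performance constraints were not satisfied, so that run would need to be repeated with different settings (for example a lower `--qps`) before submitting. To catch this without reading every summary by hand, the sketch below extracts the `Result is` line from a summary's text. It assumes the summary format shown above; `summary_is_valid` is a hypothetical helper.

```python
def summary_is_valid(text):
    """Return True/False from the 'Result is : ...' line, or None if it is absent."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Result is":
            return value.strip() == "VALID"
    return None


server_summary = """MLPerf Results Summary
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Scheduled samples per second : 144.87
Result is : INVALID
  Performance constraints satisfied : NO
"""
print(summary_is_valid(server_summary))
```

To check a whole submission, read each `mlperf_log_summary.txt` under `results/` and pass its contents to this function.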


### Run the submission checker

Finally, run the submission checker, which performs sanity checks on the submission.
We run it last and attach its output to the submission.

In [None]:
!python ../../tools/submission/submission-checker.py --input {SUBMISSION_ROOT} > {SUBMISSION_DIR}/submission-checker.log 2>&1 
!cat {SUBMISSION_DIR}/submission-checker.log
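
The checker log can be long, so it may be convenient to pull out only the problem lines. The sketch below assumes that failures appear on lines containing the word `ERROR`; that is an assumption about the log format, so compare it against your own log before relying on it. `error_lines` is a hypothetical helper.

```python
def error_lines(log_text, marker="ERROR"):
    """Return the log lines that contain `marker` (assumed to flag failures)."""
    return [line for line in log_text.splitlines() if marker in line]


# Made-up lines for illustration only, not real checker output.
example_log = "INFO first check passed\nERROR second check failed\nINFO done\n"
for line in error_lines(example_log):
    print(line)
```

To apply it to the real log, read `submission-checker.log` from `SUBMISSION_DIR` and pass its contents to `error_lines`.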