This benchmark performs image classification using the ResNet-50 network and the ImageNet dataset.
The dataset used for this benchmark is the ImageNet 2012 validation set. Please manually download the dataset and unzip the images to `$MLPERF_SCRATCH_PATH/data/imagenet/`. You can run `bash code/resnet50/tensorrt/download_data.sh` to verify that the images are in the expected location.
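As a quick sanity check, you can also count the images yourself. The 50,000-image count is a property of the ImageNet 2012 validation set; the flat `*.JPEG` layout is an assumption about how the archive was unzipped:

```python
import os
from pathlib import Path

# Expect all 50,000 ImageNet 2012 validation images directly under the data dir.
data_dir = Path(os.environ["MLPERF_SCRATCH_PATH"]) / "data" / "imagenet"
n = sum(1 for _ in data_dir.glob("*.JPEG"))
assert n == 50000, f"expected 50000 validation images, found {n}"
```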
To process the input images to INT8 NCHW format, please run `python3 code/resnet50/tensorrt/preprocess_data.py`. The preprocessed data will be saved to `$MLPERF_SCRATCH_PATH/preprocessed_data/imagenet/ResNet50`.
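The preprocessing is conceptually along the lines of the sketch below. The function name, resize/crop parameters, channel means, and quantization scale here are illustrative assumptions; `preprocess_data.py` is the authoritative source for the actual values:

```python
# Minimal sketch of ImageNet preprocessing to INT8 NCHW (assumed parameters).
import numpy as np
from PIL import Image

def preprocess(path, scale=0.87):  # hypothetical symmetric INT8 scale
    img = Image.open(path).convert("RGB")
    # Resize the shorter side to 256, then center-crop to 224x224.
    w, h = img.size
    ratio = 256 / min(w, h)
    img = img.resize((round(w * ratio), round(h * ratio)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - 224) // 2, (h - 224) // 2
    img = img.crop((left, top, left + 224, top + 224))
    # HWC uint8 -> CHW float32, subtract per-channel means (assumed values).
    x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)
    mean = np.array([123.68, 116.78, 103.94], dtype=np.float32).reshape(3, 1, 1)
    x -= mean
    # Quantize to INT8 with a symmetric scale.
    return np.clip(np.rint(x * scale), -128, 127).astype(np.int8)
```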
The ONNX model `resnet50_v1.onnx` is downloaded from the Zenodo link provided by the MLPerf inference repository. It can also be downloaded by running `bash code/resnet50/tensorrt/download_model.sh`.
The following TensorRT plugin is used to optimize the ResNet50 benchmark:

- `RES2_FULL_FUSION`: fuses all the res2* layers into one CUDA kernel.

This plugin is available in TensorRT starting with the TensorRT 7.2 release.
To further optimize performance, with minimal impact on classification accuracy, we run the computations in INT8 precision. The Softmax layer is removed since it does not affect the predicted label.
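For illustration, a trailing Softmax node can be pruned from an ONNX graph roughly as follows. This is a sketch using the `onnx` Python package, not this repository's build code, and it assumes the Softmax output is the sole graph output:

```python
import onnx

model = onnx.load("build/models/ResNet50/resnet50_v1.onnx")
graph = model.graph

# Locate the trailing Softmax node (assumed to feed the graph output).
softmax = next(n for n in graph.node if n.op_type == "Softmax")
assert softmax.output[0] == graph.output[0].name

# Rewire the graph output to the Softmax input, then drop the node.
graph.output[0].name = softmax.input[0]
graph.node.remove(softmax)
onnx.save(model, "build/models/ResNet50/resnet50_v1_no_softmax.onnx")
```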
ResNet50 INT8 is calibrated on a subset of the ImageNet validation set. The indices of this subset can be found at `data_maps/imagenet/cal_map.txt`. We use TensorRT symmetric calibration, and store the scaling factors in `code/resnet50/tensorrt/calibrator.cache`.
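For reference, a TensorRT INT8 entropy calibrator that produces such a cache has roughly the following shape. This is a minimal sketch, not the calibrator this repository ships: the class name, batch size, and input handling are assumptions.

```python
# Sketch of a TensorRT INT8 calibrator that feeds preprocessed batches to the
# builder and persists the resulting scaling factors as a calibration cache.
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import tensorrt as trt

class ResNet50Calibrator(trt.IInt8EntropyCalibrator2):  # hypothetical name
    def __init__(self, batches, cache_path="code/resnet50/tensorrt/calibrator.cache"):
        super().__init__()
        self.batches = iter(batches)  # iterable of contiguous NCHW float32 arrays
        self.cache_path = cache_path
        self.device_mem = None

    def get_batch_size(self):
        return 1  # assumed calibration batch size

    def get_batch(self, names):
        batch = next(self.batches, None)
        if batch is None:
            return None  # returning None signals the end of calibration
        if self.device_mem is None:
            self.device_mem = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        # Reuse an existing cache so calibration only runs once.
        if os.path.exists(self.cache_path):
            with open(self.cache_path, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_path, "wb") as f:
            f.write(cache)
```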
Run the following commands from within the container to run inference through LoadGen:

```
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO> --test_mode=PerformanceOnly"
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO> --test_mode=AccuracyOnly"
```
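Here `<SCENARIO>` is the MLPerf scenario to run, e.g. `Offline` or `Server`; which scenarios apply depends on the system under test.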
To run inference through the Triton Inference Server and LoadGen:

```
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO> --config_ver=triton --test_mode=PerformanceOnly"
make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO> --config_ver=triton --test_mode=AccuracyOnly"
```
The performance and accuracy results will be printed to stdout, and the LoadGen logs can be found in `build/logs`.
Follow these steps to run inference with new weights:

- If the new weights are in TensorFlow frozen graph format, please use `resnet50-to-onnx.sh` in the official MLPerf repository to convert them to ONNX format.
- Replace `build/models/ResNet50/resnet50_v1.onnx` with the new ONNX model.
- Run `make calibrate RUN_ARGS="--benchmarks=resnet50"` to generate a new calibration cache.
- Run inference with `make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO>"`.
Follow these steps to run inference with a new validation dataset:

- Put the validation dataset under `build/data/imagenet`.
- Modify `data_maps/imagenet/val_map.txt` to contain all the file names and the corresponding labels of the new validation dataset (see the format example after this list).
- Preprocess the data with `python3 code/resnet50/tensorrt/preprocess_data.py --val_only`.
- Run inference with `make run RUN_ARGS="--benchmarks=resnet50 --scenarios=<SCENARIO>"`.
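Each line of `val_map.txt` pairs an image file name with its integer label index, separated by a space. The entries below are illustrative:

```
ILSVRC2012_val_00000001.JPEG 65
ILSVRC2012_val_00000002.JPEG 970
```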
Follow these steps to generate a new calibration cache with a new calibration dataset:

- Put the calibration dataset under `build/data/imagenet`.
- Modify `data_maps/imagenet/cal_map.txt` to contain all the file names of the new calibration dataset.
- Preprocess the data with `python3 code/resnet50/tensorrt/preprocess_data.py --cal_only`.
- Run `make calibrate RUN_ARGS="--benchmarks=resnet50"` to generate a new calibration cache.