diff --git a/README.md b/README.md
index d369ba9e4..1c66fb66e 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,15 @@
 # Model Zoo for Intel® Architecture
 
-This repository contains **links to pre-trained models, benchmarking scripts, best practices, and step-by-step tutorials** for many popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors.
+This repository contains **links to pre-trained models, sample scripts, best practices, and step-by-step tutorials** for many popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors.
 
 ## Purpose of the Model Zoo
 
   - Demonstrate the AI workloads and deep learning models Intel has optimized and validated to run on Intel hardware
   - Show how to efficiently execute, train, and deploy Intel-optimized models
-  - Make it easy to benchmark model performance on Intel hardware
   - Make it easy to get started running Intel-optimized models on Intel hardware in the cloud or on bare metal
 
+***DISCLAIMER: These scripts are not intended for benchmarking Intel platforms. Please see [https://www.intel.ai/blog](https://www.intel.ai/blog) for performance and/or benchmarking information on specific Intel platforms.***
+
 ## How to Use the Model Zoo
 
 ### Getting Started
@@ -17,10 +18,10 @@ This repository contains **links to pre-trained models, benchmarking scripts, be
 ### Directory Structure
 The Model Zoo is divided into four main directories:
-- **[benchmarks](/benchmarks)**: Look here for benchmarking scripts and complete instructions on downloading and benchmarking each Intel-optimized pre-trained model.
+- **[benchmarks](/benchmarks)**: Look here for sample scripts and complete instructions on downloading and running each Intel-optimized pre-trained model.
 - **[docs](/docs)**: General best practices and detailed tutorials for a selection of models and frameworks can be found in this part of the repo.
 - **[models](/models)**: This directory contains optimized model code that has not yet been upstreamed to its respective official repository, such as dataset processing routines.
-  There are no user-friendly READMEs in this directory, but many supporting modules used for benchmarking are here.
+  There are no user-friendly READMEs in this directory, but many supporting modules are here.
 - **[tests](/tests)**: Look here for unit tests and information on how to run them.
 
 The benchmarks, models, and docs folders share a common structure. Each model (or document) is organized first by *use case* and then by *framework*.
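For orientation, the per-model READMEs updated in the rest of this changeset all follow the same basic workflow; a minimal sketch is shown here, assuming a local clone of the repo (the ResNet50 path is just one example of the use-case/framework layout described above).

```
# Minimal sketch of the workflow described above (paths shown are illustrative).
$ git clone https://github.com/IntelAI/models.git
$ cd models/benchmarks        # sample scripts and launch_benchmark.py live here
# Model directories are organized by use case and then by framework, e.g.:
$ ls image_recognition/tensorflow/resnet50/
```
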
diff --git a/benchmarks/README.md b/benchmarks/README.md index d48642837..787949b75 100644 --- a/benchmarks/README.md +++ b/benchmarks/README.md @@ -1,10 +1,10 @@ -# Benchmark scripts +# Model Zoo Scripts Training and inference scripts with Intel-optimized MKL ## Prerequisites -The benchmarking scripts can be run on Linux and require the following +The model scripts can be run on Linux and require the following dependencies to be installed: * [Docker](https://docs.docker.com/install/) * [Python](https://www.python.org/downloads/) 2.7 or later @@ -13,7 +13,7 @@ dependencies to be installed: ## Use Cases -| Use Case | Framework | Model | Mode | Benchmarking Instructions | +| Use Case | Framework | Model | Mode | Instructions | | -----------------------| --------------| ------------------- | --------- |------------------------------| | Adversarial Networks | TensorFlow | [DCGAN](https://arxiv.org/pdf/1511.06434.pdf) | Inference | [FP32](adversarial_networks/tensorflow/dcgan/README.md#fp32-inference-instructions) | | Content Creation | TensorFlow | [DRAW](https://arxiv.org/pdf/1502.04623.pdf) | Inference | [FP32](content_creation/tensorflow/draw/README.md#fp32-inference-instructions) | @@ -31,9 +31,9 @@ dependencies to be installed: | Language Translation | TensorFlow | [GNMT](https://arxiv.org/pdf/1609.08144.pdf) | Inference | [FP32](language_translation/tensorflow/gnmt/README.md#fp32-inference-instructions) | | Language Translation | TensorFlow | [Transformer Language](https://arxiv.org/pdf/1706.03762.pdf)| Inference | [FP32](language_translation/tensorflow/transformer_language/README.md#fp32-inference-instructions) | | Language Translation | TensorFlow | [Transformer_LT_Official ](https://arxiv.org/pdf/1706.03762.pdf)| Inference | [FP32](language_translation/tensorflow/transformer_lt_official/README.md#fp32-inference-instructions) | -| Object Detection | TensorFlow | [R-FCN](https://arxiv.org/pdf/1605.06409.pdf) | Inference | [Int8](object_detection/tensorflow/rfcn/README.md#int8-inference-instructions) [FP32](object_detection/tensorflow/rfcn/README.md#fp32-inference-instructions) | +| Object Detection | TensorFlow | [R-FCN](https://arxiv.org/pdf/1605.06409.pdf) | Inference | [FP32](object_detection/tensorflow/rfcn/README.md#fp32-inference-instructions) | | Object Detection | TensorFlow | [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf) | Inference | [Int8](object_detection/tensorflow/faster_rcnn/README.md#int8-inference-instructions) [FP32](object_detection/tensorflow/faster_rcnn/README.md#fp32-inference-instructions) | -| Object Detection | TensorFlow | [SSD-MobileNet](https://arxiv.org/pdf/1704.04861.pdf) | Inference | [Int8](object_detection/tensorflow/ssd-mobilenet/README.md#int8-inference-instructions) [FP32](object_detection/tensorflow/ssd-mobilenet/README.md#fp32-inference-instructions) | +| Object Detection | TensorFlow | [SSD-MobileNet](https://arxiv.org/pdf/1704.04861.pdf) | Inference | [FP32](object_detection/tensorflow/ssd-mobilenet/README.md#fp32-inference-instructions) | | Object Detection | TensorFlow | [SSD-ResNet34](https://arxiv.org/pdf/1512.02325.pdf) | Inference | [FP32](object_detection/tensorflow/ssd-resnet34/README.md#fp32-inference-instructions) | | Recommendation | TensorFlow | [NCF](https://arxiv.org/pdf/1708.05031.pdf) | Inference | [FP32](recommendation/tensorflow/ncf/README.md#fp32-inference-instructions) | | Recommendation | TensorFlow | [Wide & Deep Large Dataset](https://arxiv.org/pdf/1606.07792.pdf) | Inference | 
[Int8](recommendation/tensorflow/wide_deep_large_ds/README.md#int8-inference-instructions) [FP32](recommendation/tensorflow/wide_deep_large_ds/README.md#fp32-inference-instructions) | diff --git a/benchmarks/adversarial_networks/tensorflow/dcgan/README.md b/benchmarks/adversarial_networks/tensorflow/dcgan/README.md index 7852bcae3..e23fc9c6a 100644 --- a/benchmarks/adversarial_networks/tensorflow/dcgan/README.md +++ b/benchmarks/adversarial_networks/tensorflow/dcgan/README.md @@ -4,7 +4,7 @@ This document has instructions for how to run DCGAN for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference. +Script instructions for model training and inference. ## FP32 Inference Instructions @@ -35,19 +35,18 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the -an optimized version of the DCGAN model code. +This repository includes launch scripts for running an optimized version of the DCGAN model code. 5. Navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 4. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model script run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the external model directory for `--model-source-dir` (from step 1) `--data-location` (from step 2), and `--checkpoint` (from step 3). -Run benchmarking for throughput and latency with `--batch-size=100` : +Run the model script for batch and online inference with `--batch-size=100` : ``` $ cd /home//models/benchmarks @@ -66,7 +65,7 @@ $ python launch_benchmark.py \ 5. Log files are located at the value of `--output-dir`. -Below is a sample log file tail when running benchmarking for throughput: +Below is a sample log file tail when running for batch inference: ``` Batch size: 100 Batches number: 500 diff --git a/benchmarks/content_creation/tensorflow/draw/README.md b/benchmarks/content_creation/tensorflow/draw/README.md index c56c08712..f3ea0732f 100644 --- a/benchmarks/content_creation/tensorflow/draw/README.md +++ b/benchmarks/content_creation/tensorflow/draw/README.md @@ -18,7 +18,7 @@ modes/precisions: ``` The mnist directory will be passed as the dataset location when we - run the benchmarking script in step 4. + run the model script in step 4. 2. Download and extract the pretrained model: ``` @@ -27,8 +27,8 @@ modes/precisions: ``` 3. Clone this [intelai/models](https://github.com/IntelAI/models) repo, - which contains the scripts that we will be using to run benchmarking - for DRAW. After the clone has completed, navigate to the `benchmarks` + which contains the DRAW model scripts. + After the clone has completed, navigate to the `benchmarks` directory in the repository. ``` @@ -36,12 +36,12 @@ modes/precisions: $ cd models/benchmarks ``` -4. Run benchmarking for either throughput or latency using the commands +4. Run the model for either batch or online inference using the commands below. Replace in the path to the `--data-location` with your `mnist` dataset directory from step 1 and the `--checkpoint` files that you downloaded and extracted in step 2. 
- * Run benchmarking for latency (with `--batch-size 1`): + * Run DRAW for online inference (with `--batch-size 1`): ``` python launch_benchmark.py \ --precision fp32 \ @@ -54,7 +54,7 @@ modes/precisions: --batch-size 1 \ --socket-id 0 ``` - * Run benchmarking for throughput (with `--batch-size 100`): + * Run DRAW for batch inference (with `--batch-size 100`): ``` python launch_benchmark.py \ --precision fp32 \ @@ -70,9 +70,9 @@ modes/precisions: Note that the `--verbose` or `--output-dir` flag can be added to any of the above commands to get additional debug output or change the default output location. -5. The log files for each benchmarking run are saved at the value of `--output-dir`. +5. The log files for each run are saved at the value of `--output-dir`. - * Below is a sample log file tail when benchmarking latency: + * Below is a sample log file tail when testing online inference: ``` ... Elapsed Time 0.006622 @@ -88,7 +88,7 @@ modes/precisions: Log location outside container: {--output-dir value}/benchmark_draw_inference_fp32_20190123_012947.log ``` - * Below is a sample log file tail when benchmarking throughput: + * Below is a sample log file tail when testing batch inference: ``` Elapsed Time 0.028355 Elapsed Time 0.028221 diff --git a/benchmarks/face_detection_and_alignment/tensorflow/facenet/README.md b/benchmarks/face_detection_and_alignment/tensorflow/facenet/README.md index 7d30e25f2..0a3659d20 100644 --- a/benchmarks/face_detection_and_alignment/tensorflow/facenet/README.md +++ b/benchmarks/face_detection_and_alignment/tensorflow/facenet/README.md @@ -4,8 +4,7 @@ This document has instructions for how to run FaceNet for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Script instructions for model training and inference for other precisions are coming later. ## FP32 Inference Instructions @@ -37,18 +36,17 @@ Instructions for downloading the dataset and converting it can be found in the d 5. Navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 2. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image. Substitute in your own `--checkpoint` pretrained model file path (from step 3), and `--data-location` (from step 4). -FaceNet can be run for latency benchmarking, throughput -benchmarking, or accuracy. Use one of the following examples below, -depending on your use case. +FaceNet can be run for testing online inference, batch inference, or accuracy. +Use one of the following examples below, depending on your use case. 
-* For latency (using `--batch-size 1`): +* For online inference (using `--batch-size 1`): ``` python launch_benchmark.py \ @@ -63,7 +61,7 @@ python launch_benchmark.py \ --model-source-dir /home//facenet/ \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl ``` -Example log tail when benchmarking for latency: +Example log tail for online inference: ``` Batch 979 elapsed Time 0.0297989845276 Batch 989 elapsed Time 0.029657125473 @@ -85,7 +83,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_facenet_inference_fp32_20190328_205911.log ``` -* For throughput (using `--batch-size 100`): +* For batch inference (using `--batch-size 100`): ``` python launch_benchmark.py \ @@ -100,7 +98,7 @@ python launch_benchmark.py \ --model-source-dir /home//facenet/ \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl ``` -Example log tail when benchmarking for throughput: +Example log tail for batch inference: ``` Batch 219 elapsed Time 0.446497917175 Batch 229 elapsed Time 0.422048091888 @@ -134,7 +132,7 @@ python launch_benchmark.py \ --model-source-dir /home//facenet/ \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl ``` -Example log tail when benchmarking for accuracy: +Example log tail for accuracy: ``` Batch 219 elapsed Time 0.398629188538 Batch 229 elapsed Time 0.354953050613 diff --git a/benchmarks/face_detection_and_alignment/tensorflow/mtcc/README.md b/benchmarks/face_detection_and_alignment/tensorflow/mtcc/README.md index a659f397f..1963f9cbc 100644 --- a/benchmarks/face_detection_and_alignment/tensorflow/mtcc/README.md +++ b/benchmarks/face_detection_and_alignment/tensorflow/mtcc/README.md @@ -4,8 +4,7 @@ This document has instructions for how to run MTCC for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for the MTCC model training and inference -other precisions are coming later. +Instructions for MTCC model training and inference for other precisions are coming later. ## FP32 Inference Instructions @@ -33,7 +32,7 @@ other precisions are coming later. ``` 4. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking. +This repo has the launch script for running models. ``` $ git clone https://github.com/IntelAI/models.git @@ -43,7 +42,7 @@ This repo has the launch script for running benchmarking. 5. Run the `launch_benchmark.py` script from the intelai/models repo with the appropriate parameters including: the `--model-source-dir` from step 1, `--data-location` from step 2, and the `--checkpoint` from step 3. -Run benchmarking: +Run: ``` $ cd /home//models/benchmarks @@ -61,7 +60,7 @@ Run benchmarking: 6. The log file is saved to the value of `--output-dir`. 
-Below is a sample log file tail when running benchmarking for throughput,latency and accuracy: +Below is a sample log file tail when running for batch inference, online inference, and accuracy: ``` time cost 0.459 pnet 0.166 rnet 0.144 onet 0.149 diff --git a/benchmarks/image_recognition/tensorflow/inception_resnet_v2/README.md b/benchmarks/image_recognition/tensorflow/inception_resnet_v2/README.md index 83577516f..3cc4fdccb 100644 --- a/benchmarks/image_recognition/tensorflow/inception_resnet_v2/README.md +++ b/benchmarks/image_recognition/tensorflow/inception_resnet_v2/README.md @@ -14,8 +14,7 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the -an optimized version of the Inception ResNet V2 model code. +This repository includes launch scripts for running an optimized version of the Inception ResNet V2 model code. 2. Download the pretrained model: ``` @@ -23,8 +22,7 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/inceptio ``` 3. If you would like to run Inception ResNet V2 inference and test for -accuracy, you will need the full ImageNet dataset. Benchmarking for latency -and throughput do not require the ImageNet dataset. +accuracy, you will need the full ImageNet dataset. Running for online and batch inference performance do not require the ImageNet dataset. Register and download the [ImageNet dataset](http://image-net.org/download-images). @@ -57,7 +55,7 @@ $ ll /home//datasets/ImageNet_TFRecords 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3. @@ -67,9 +65,8 @@ only) and `--in-graph` pre-trained model file path (from step 2). Note that the docker image in the commands below is built using MKL PRs that are required to run Inception ResNet V2 Int8. -Inception ResNet V2 can be run for accuracy, latency benchmarking, or throughput -benchmarking. Use one of the following examples below, depending on -your use case. +Inception ResNet V2 can be run for accuracy, online inference, or batch inference. +Use one of the following examples below, depending on your use case. 
For accuracy (using your `--data-location`, `--accuracy-only` and `--batch-size 100`): @@ -87,7 +84,7 @@ python launch_benchmark.py \ --data-location /home//datasets/ImageNet_TFRecords ``` -For latency (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -102,7 +99,7 @@ python launch_benchmark.py \ --in-graph /home//inception_resnet_v2_int8_pretrained_model.pb ``` -For throughput (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): +For batch inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): ``` python launch_benchmark.py \ @@ -136,7 +133,7 @@ Ran inference with batch size 100 Log location outside container: /benchmark_inception_resnet_v2_inference_int8_20190330_012925.log ``` -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` ... Iteration 37: 0.046 sec @@ -151,7 +148,7 @@ Ran inference with batch size 1 Log location outside container: /benchmark_inception_resnet_v2_inference_int8_20190330_012557.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` ... Iteration 37: 0.975 sec @@ -175,8 +172,7 @@ repository: $ git clone git@github.com:IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the -an optimized version of the Inception ResNet V2 model code. +This repository includes launch scripts for running an optimized version of the Inception ResNet V2 model code. 2. Download the pre-trained Inception ResNet V2 model files: @@ -186,7 +182,7 @@ For accuracy: $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/inception_resnet_v2_fp32_pretrained_model.pb ``` -For throughput and latency: +For batch and online inference: ``` $ wget http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz @@ -194,8 +190,8 @@ $ mkdir -p checkpoints && tar -C ./checkpoints/ -zxf inception_resnet_v2_2016_08 ``` 3. If you would like to run Inception ResNet V2 inference and test for -accuracy, you will need the full ImageNet dataset. Benchmarking for latency -and throughput do not require the ImageNet dataset. +accuracy, you will need the full ImageNet dataset. Running for online +and batch inference do not require the ImageNet dataset. Register and download the [ImageNet dataset](http://image-net.org/download-images). @@ -228,7 +224,7 @@ $ ll /home//datasets/ImageNet_TFRecords 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3. @@ -236,9 +232,8 @@ TF Records that you generated in step 3. Substitute in your own `--data-location` (from step 3, for accuracy only), `--checkpoint` pre-trained model checkpoint file path (from step 2). -Inception ResNet V2 can be run for accuracy, latency benchmarking, or throughput -benchmarking. Use one of the following examples below, depending on -your use case. +Inception ResNet V2 can be run for accuracy, online inference, or batch inference. 
+Use one of the following examples below, depending on your use case. For accuracy (using your `--data-location`, `--accuracy-only` and `--batch-size 100`): @@ -256,7 +251,7 @@ python launch_benchmark.py \ --data-location /home//datasets/ImageNet_TFRecords ``` -For latency (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -272,7 +267,7 @@ python launch_benchmark.py \ --data-location /home//datasets/ImageNet_TFRecords ``` -For throughput (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): +For batch inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): ``` python launch_benchmark.py \ @@ -308,7 +303,7 @@ Ran inference with batch size 100 Log location outside container: {--output-dir value}/benchmark_inception_resnet_v2_inference_fp32_20190109_081637.log ``` -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` eval/Accuracy[0] eval/Recall_5[0.01] @@ -323,7 +318,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_inception_resnet_v2_inference_fp32_20190108_015057.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` eval/Accuracy[0.00078125] eval/Recall_5[0.00375] diff --git a/benchmarks/image_recognition/tensorflow/inceptionv3/README.md b/benchmarks/image_recognition/tensorflow/inceptionv3/README.md index 512d4fd1e..9de17c994 100644 --- a/benchmarks/image_recognition/tensorflow/inceptionv3/README.md +++ b/benchmarks/image_recognition/tensorflow/inceptionv3/README.md @@ -5,8 +5,7 @@ following modes/precisions: * [Int8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Instructions for model training and inference for other precisions are coming later. ## Int8 Inference Instructions @@ -17,8 +16,7 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the -an optimized version of the inceptionv3 model code. +This repository includes launch scripts for running an optimized version of the Inception V3 model code. 2. Clone the [tensorflow/models](https://github.com/tensorflow/models) repository: @@ -69,7 +67,7 @@ $ ll /home//datasets/ImageNet_TFRecords 5. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 4. @@ -80,9 +78,8 @@ only), `--in-graph` pretrained model file path (from step 3) and [tensorflow/models](https://github.com/tensorflow/models) repo (from step 2). -Inception V3 can be run for accuracy, latency benchmarking, or throughput -benchmarking. Use one of the following examples below, depending on -your use case. +Inception V3 can be run for accuracy, online inference, or batch inference. +Use one of the following examples below, depending on your use case. 
For accuracy (using your `--data-location`, `--accuracy-only` and `--batch-size 100`): @@ -100,12 +97,12 @@ python launch_benchmark.py \ --data-location /home//datasets/ImageNet_TFRecords ``` -When running performance benchmarking, it is optional to specify the +When testing performance, it is optional to specify the number of `warmup_steps` and `steps` as extra args, as shown in the commands below. If these values are not specified, the script will default to use `warmup_steps=10` and `steps=50`. -For latency with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -122,7 +119,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For latency with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`), remove `--data-location` argument: +For online inference with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`), remove `--data-location` argument: ``` python launch_benchmark.py \ @@ -138,7 +135,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For throughput with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): +For batch inference with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): ``` python launch_benchmark.py \ @@ -155,7 +152,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For throughput with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`), remove `--data-location` argument:: +For batch inference with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`), remove `--data-location` argument:: ``` python launch_benchmark.py \ @@ -196,7 +193,7 @@ Ran inference with batch size 100 Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_int8_20190104_013246.log ``` -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` ... steps = 470, 53.7256017113 images/sec @@ -209,7 +206,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_int8_20190223_194002.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` ... steps = 470, 370.435654276 images/sec @@ -237,8 +234,8 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/inceptio ``` 3. If you would like to run Inception V3 FP32 inference and test for -accuracy, you will need the ImageNet dataset. Benchmarking for latency -and throughput do not require the ImageNet dataset. Instructions for +accuracy, you will need the ImageNet dataset. Running for online +and batch inference do not require the ImageNet dataset. Instructions for downloading the dataset and converting it to the TF Records format can be found in the TensorFlow documentation [here](https://github.com/tensorflow/models/tree/master/research/slim#an-automated-script-for-processing-imagenet-data). @@ -246,17 +243,16 @@ be found in the TensorFlow documentation 4. Navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. 
The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image. Substitute in your own `--in-graph` pretrained model file path (from step 2). -Inception V3 can be run for latency benchmarking, throughput -benchmarking, or accuracy. Use one of the following examples below, +Inception V3 can be run for online inference, batch inference, or accuracy. Use one of the following examples below, depending on your use case. -* For latency with dummy data (using `--batch-size 1`): +* For online inference with dummy data (using `--batch-size 1`): ``` python launch_benchmark.py \ @@ -269,7 +265,7 @@ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \ --in-graph /home//inceptionv3_fp32_pretrained_model.pb ``` -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` Inference with dummy data. Iteration 1: 1.075 sec @@ -289,7 +285,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_fp32_20190104_025220.log ``` -* For throughput with dummy data (using `--batch-size 128`): +* For batch inference with dummy data (using `--batch-size 128`): ``` python launch_benchmark.py \ @@ -302,7 +298,7 @@ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \ --in-graph /home//inceptionv3_fp32_pretrained_model.pb ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` Inference with dummy data. Iteration 1: 2.024 sec @@ -336,7 +332,7 @@ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \ --in-graph /home//inceptionv3_fp32_pretrained_model.pb ``` -Example log tail when benchmarking for accuracy: +Example log tail when running for accuracy: ``` Processed 49800 images. (Top1 accuracy, Top5 accuracy) = (0.7673, 0.9341) Processed 49900 images. (Top1 accuracy, Top5 accuracy) = (0.7674, 0.9341) diff --git a/benchmarks/image_recognition/tensorflow/inceptionv4/README.md b/benchmarks/image_recognition/tensorflow/inceptionv4/README.md index 14f1bed98..a1228c3ad 100644 --- a/benchmarks/image_recognition/tensorflow/inceptionv4/README.md +++ b/benchmarks/image_recognition/tensorflow/inceptionv4/README.md @@ -5,7 +5,7 @@ following modes/precisions: * [Int8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference +Instructions and scripts for model training and inference for other precisions are coming later. ## Int8 Inference Instructions @@ -15,7 +15,7 @@ other precisions are coming later. ``` $ git clone https://github.com/IntelAI/models.git ``` - This repository includes launch scripts for running benchmarks. + This repository includes launch scripts for running the model. 2. Download the pretrained model: ``` @@ -23,8 +23,7 @@ other precisions are coming later. ``` 3. If you would like to run Inception V4 inference and test for - accuracy, you will need the ImageNet dataset. Benchmarking for latency - and throughput do not require the ImageNet dataset. Instructions for + accuracy, you will need the ImageNet dataset. 
It is not necessary for batch or online inference, you have the option of using synthetic data instead. Instructions for downloading the ImageNet dataset and converting it to the TF Records format and be found [here](https://github.com/tensorflow/models/tree/master/research/slim#an-automated-script-for-processing-imagenet-data). @@ -32,13 +31,13 @@ other precisions are coming later. 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is - used for starting a benchmarking run in a optimized TensorFlow docker + used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3. - Inception V4 can be run to test accuracy or benchmarking throughput or - latency. Use one of the following examples below, depending on your use + Inception V4 can be run to test accuracy, batch inference, or + online inference. Use one of the following examples below, depending on your use case. For accuracy (using your `--data-location`, `--accuracy-only` and @@ -57,7 +56,7 @@ other precisions are coming later. --data-location /home//ImageNet_TFRecords ``` - For throughput benchmarking (using `--benchmark-only`, `--socket-id 0` and `--batch-size 240`): + For batch inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 240`): ``` python launch_benchmark.py \ --model-name inceptionv4 \ @@ -71,7 +70,7 @@ other precisions are coming later. --in-graph /home//inceptionv4_int8_pretrained_model.pb ``` - For latency (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): + For online inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ --model-name inceptionv4 \ @@ -112,7 +111,7 @@ other precisions are coming later. Log location outside container: /benchmark_inceptionv4_inference_int8_20190306_221608.log ``` - Example log tail when benchmarking for throughput: + Example log tail when running for batch inference: ``` [Running warmup steps...] steps = 10, 185.108768528 images/sec @@ -128,7 +127,7 @@ other precisions are coming later. Log location outside container: /benchmark_inceptionv4_inference_int8_20190306_215858.log ``` - Example log tail when benchmarking for latency: + Example log tail when running for online inference: ``` [Running warmup steps...] steps = 10, 30.8738415788 images/sec @@ -152,7 +151,7 @@ other precisions are coming later. ``` $ git clone https://github.com/IntelAI/models.git ``` - This repository includes launch scripts for running benchmarks. + This repository includes launch scripts for running the model. 2. Download the pretrained model: ``` @@ -160,8 +159,8 @@ other precisions are coming later. ``` 3. If you would like to run Inception V4 inference and test for - accuracy, you will need the ImageNet dataset. Benchmarking for latency - and throughput do not require the ImageNet dataset. Instructions for + accuracy, you will need the ImageNet dataset. Running for online + and batch inference do not require the ImageNet dataset. Instructions for downloading the ImageNet dataset and converting it to the TF Records format and be found [here](https://github.com/tensorflow/models/tree/master/research/slim#an-automated-script-for-processing-imagenet-data). 
@@ -169,13 +168,13 @@ other precisions are coming later. 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is - used for starting a benchmarking run in a optimized TensorFlow docker + used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3. - Inception V4 can be run to test accuracy or benchmarking throughput or - latency. Use one of the following examples below, depending on your use + Inception V4 can be run to test accuracy, batch inference, or + online inference. Use one of the following examples below, depending on your use case. For accuracy (using your `--data-location`, `--accuracy-only` and @@ -194,7 +193,7 @@ other precisions are coming later. --data-location /home//ImageNet_TFRecords ``` - For throughput benchmarking (using `--benchmark-only`, `--socket-id 0` and `--batch-size 240`): + For batch inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 240`): ``` python launch_benchmark.py \ --model-name inceptionv4 \ @@ -208,7 +207,7 @@ other precisions are coming later. --in-graph /home//inceptionv4_fp32_pretrained_model.pb ``` - For latency (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): + For online inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ --model-name inceptionv4 \ @@ -242,7 +241,7 @@ other precisions are coming later. Log location outside container: /benchmark_inceptionv4_inference_fp32_20190308_182729.log ``` - Example log tail when benchmarking for throughput: + Example log tail when running for batch inference: ``` [Running warmup steps...] steps = 10, 91.4372832625 images/sec @@ -256,7 +255,7 @@ other precisions are coming later. Log location outside container: /benchmark_inceptionv4_inference_fp32_20190308_184431.log ``` - Example log tail when benchmarking for latency: + Example log tail when running for online inference: ``` [Running warmup steps...] steps = 10, 15.6993019295 images/sec @@ -269,4 +268,4 @@ other precisions are coming later. Latency: 63.534 ms Ran inference with batch size 1 Log location outside container: /benchmark_inceptionv4_inference_fp32_20190307_221954.log - ``` \ No newline at end of file + ``` diff --git a/benchmarks/image_recognition/tensorflow/mobilenet_v1/README.md b/benchmarks/image_recognition/tensorflow/mobilenet_v1/README.md index 5b2d8e64d..17e274aa9 100644 --- a/benchmarks/image_recognition/tensorflow/mobilenet_v1/README.md +++ b/benchmarks/image_recognition/tensorflow/mobilenet_v1/README.md @@ -4,7 +4,7 @@ This document has instructions for how to run MobileNet V1 for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training is coming +Instructions and scripts for model training are coming later. ## FP32 Inference Instructions @@ -38,7 +38,7 @@ later. ``` The [tensorflow/models](https://github.com/tensorflow/models) files - are used for dependencies when running benchmarking. + are used for dependencies when running the model. 4. Clone the [intelai/models](https://github.com/IntelAI/models) repo and then navigate to the benchmarks directory: @@ -48,7 +48,7 @@ later. 
$ cd models/benchmarks ``` - Benchmarking can be run for either latency or throughput using the + MobileNet V1 can be run for either online or batch inference using the commands below. The `--data-location` should be the path to the ImageNet validation data from step 1, the `--checkpoint` arg should be the path to the checkpoint files from step 2, and the @@ -56,7 +56,7 @@ later. [tensorflow/models](https://github.com/tensorflow/models) repo that was cloned in step 3. - * Run benchmarking for latency (with `--batch-size 1` and `--checkpoint` with a path to the checkpoint file directory): + * Run for online inference (with `--batch-size 1` and `--checkpoint` with a path to the checkpoint file directory): ``` python launch_benchmark.py \ --precision fp32 \ @@ -70,7 +70,7 @@ later. --data-location /dataset/Imagenet_Validation \ --checkpoint /home//mobilenet_v1_fp32_pretrained_model ``` - * Run benchmarking for throughput (with `--batch-size 100` and `--checkpoint` with a path to the checkpoint file directory): + * Run for batch inference (with `--batch-size 100` and `--checkpoint` with a path to the checkpoint file directory): ``` python launch_benchmark.py \ --precision fp32 \ @@ -84,7 +84,7 @@ later. --data-location /dataset/Imagenet_Validation \ --checkpoint /home//mobilenet_v1_fp32_pretrained_model ``` - * Run benchmarking for accuracy (with `--batch-size 100`, `--accuracy-only` and `--in-graph` with a path to the frozen graph .pb file): + * Run for accuracy (with `--batch-size 100`, `--accuracy-only` and `--in-graph` with a path to the frozen graph .pb file): ``` python launch_benchmark.py \ --precision fp32 \ @@ -101,9 +101,9 @@ later. Note that the `--verbose` or `--output-dir` flag can be added to any of the above commands to get additional debug output or change the default output location. -5. The log files for each benchmarking run are saved at the value of `--output-dir`. +5. The log files for each run are saved at the value of `--output-dir`. - * Below is a sample log file snippet when benchmarking latency: + * Below is a sample log file snippet when testing online inference: ``` 2019-01-04 20:02:23.855441: step 80, 78.3 images/sec 2019-01-04 20:02:23.974862: step 90, 83.7 images/sec @@ -121,7 +121,7 @@ later. Log location outside container: {--output-dir value}/benchmark_mobilenet_v1_inference_fp32_20190104_200218.log ``` - * Below is a sample log file snippet when benchmarking throughput: + * Below is a sample log file snippet when testing batch inference: ``` 2019-01-04 20:06:01.151312: step 80, 184.0 images/sec 2019-01-04 20:06:06.719081: step 90, 180.5 images/sec diff --git a/benchmarks/image_recognition/tensorflow/resnet101/README.md b/benchmarks/image_recognition/tensorflow/resnet101/README.md index 4e25a41f1..442c9cb21 100644 --- a/benchmarks/image_recognition/tensorflow/resnet101/README.md +++ b/benchmarks/image_recognition/tensorflow/resnet101/README.md @@ -14,7 +14,7 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the +This repository includes launch scripts for running an optimized version of the ResNet101 model code. 2. Download the pre-trained model. @@ -56,7 +56,7 @@ $ ll /home//datasets/ImageNet_TFRecords 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. 
The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3. @@ -64,7 +64,7 @@ TF Records that you generated in step 3. Substitute in your own `--data-location` (from step 3, for accuracy only) and `--in-graph` pre-trained model file path (from step 2). -ResNet101 can be run for accuracy or performance benchmarking. Use one of +ResNet101 can be run for testing accuracy or performance. Use one of the following examples below, depending on your use case. For accuracy (using your `--data-location`,`--in-graph`, `--accuracy-only` and @@ -85,12 +85,12 @@ $ python launch_benchmark.py \ --in-graph=/home//resnet101_int8_pretrained_model.pb ``` -When running performance benchmarking, it is optional to specify the +When running for performance, it is optional to specify the number of `warmup_steps` and `steps` as extra args, as shown in the commands below. If these values are not specified, the script will default to use `warmup_steps=40` and `steps=100`. -For latency with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -106,7 +106,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For latency with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -123,7 +123,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For throughput with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): +For batch inference with dummy data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): ``` python launch_benchmark.py \ @@ -139,7 +139,7 @@ python launch_benchmark.py \ -- warmup_steps=50 steps=500 ``` -For throughput with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): +For batch inference with ImageNet data (using `--benchmark-only`, `--socket-id 0` and `--batch-size 128`): ``` python launch_benchmark.py \ @@ -182,7 +182,7 @@ Ran inference with batch size 100 Log location outside container: {--output-dir value}/benchmark_resnet101_inference_int8_20190104_205838.log ``` -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` ... steps = 470, 48.3195530058 images/sec @@ -195,7 +195,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_resnet101_inference_int8_20190223_191406.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` ... steps = 470, 328.906266308 images/sec @@ -224,7 +224,7 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/resnet10 3. Download ImageNet dataset. - This step is required only required for running accuracy, for running benchmark we do not need to provide dataset. + This step is only required for running accuracy, for running online and batch inference we do not need to provide dataset. Register and download the ImageNet dataset. 
Once you have the raw ImageNet dataset downloaded, we need to convert it to the TFRecord format. The TensorFlow models repo provides @@ -246,9 +246,9 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/resnet10 -rw-r--r--. 1 user 52508270 Jun 20 15:09 validation-00126-of-00128 -rw-r--r--. 1 user 55292089 Jun 20 15:09 validation-00127-of-00128 ``` -4. Run the benchmark. +4. Run the script. - For latency measurements with dummy data set `--batch-size 1` and for throughput benchmarking set `--batch-size 128` + For online inference measurements with dummy data set `--batch-size 1` and for batch inference set `--batch-size 128` ``` $ cd /home//models/benchmarks @@ -265,7 +265,7 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/resnet10 The log file is saved to the value of `--output-dir`. - The tail of the log output when the benchmarking completes should look something like this: + The tail of the log output when the run completes should look something like this: ``` steps = 70, 193.428695737 images/sec @@ -296,7 +296,7 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/resnet10 The log file is saved to the value of `--output-dir`. - The tail of the log output when the benchmarking completes should look something like this: + The tail of the log output when the run completes should look something like this: ``` Processed 49600 images. (Top1 accuracy, Top5 accuracy) = (0.7639, 0.9289) diff --git a/benchmarks/image_recognition/tensorflow/resnet50/README.md b/benchmarks/image_recognition/tensorflow/resnet50/README.md index 8389c041f..fec96a4f2 100644 --- a/benchmarks/image_recognition/tensorflow/resnet50/README.md +++ b/benchmarks/image_recognition/tensorflow/resnet50/README.md @@ -5,7 +5,7 @@ following precisions: * [Int8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for ResNet50 model inference on `Int8` and `FP32` +Instructions and scripts for ResNet50 model inference on `Int8` and `FP32` precisions. ## Int8 Inference Instructions @@ -63,7 +63,7 @@ $ python launch_benchmark.py \ ``` The log file is saved to the value of `--output-dir`. -The tail of the log output when the benchmarking completes should look +The tail of the log output when the script completes should look something like this: ``` Processed 49600 images. (Top1 accuracy, Top5 accuracy) = (0.7361, 0.9155) @@ -79,7 +79,7 @@ Log location outside container: {--output-dir value}/benchmark_resnet50_inferenc * Evaluate the model performance: If just evaluate performance for dummy data, the `--data-location` is not needed. Otherwise `--data-location` argument needs to be specified: -Calculate the model throughput `images/sec`, the required parameters to run the inference script would include: +Calculate the model batch inference `images/sec`, the required parameters to run the inference script would include: the pre-trained `resnet50_int8_pretrained_model.pb` input graph file (from step 2), and the `--benchmark-only` flag. It is optional to specify the number of `warmup_steps` and `steps` as extra @@ -100,7 +100,7 @@ $ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl -- warmup_steps=50 steps=500 ``` -The tail of the log output when the benchmarking completes should look +The tail of the log output when the script completes should look something like this: ``` ... 
@@ -132,7 +132,7 @@ $ git clone https://github.com/IntelAI/models.git ``` 3. If running resnet50 for accuracy, the ImageNet dataset will be -required (if running benchmarking for throughput/latency, then dummy +required (if running for batch or online inference performance, then dummy data will be used). The TensorFlow models repo provides @@ -142,10 +142,10 @@ to download, process, and convert the ImageNet dataset to the TF records format. 4. Run the inference script `launch_benchmark.py` with the appropriate parameters to evaluate the model performance. The optimized ResNet50 model files are attached to the [intelai/models](https://github.com/intelai/models) repo and located at `models/models/image_recognition/tensorflow/resnet50/`. -If benchmarking uses dummy data for inference, `--data-location` flag is not required. Otherwise, +If using dummy data for inference, `--data-location` flag is not required. Otherwise, `--data-location` needs to point to point to ImageNet dataset location. -* To measure the model latency, set `--batch-size=1` and run the benchmark script as shown: +* To measure online inference, set `--batch-size=1` and run the script as shown: ``` $ cd /home//models/benchmarks @@ -162,7 +162,7 @@ $ python launch_benchmark.py \ The log file is saved to the value of `--output-dir`. -The tail of the log output when the benchmarking completes should look +The tail of the log output when the script completes should look something like this: ``` Inference with dummy data. @@ -182,7 +182,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_resnet50_inference_fp32_20190104_215326.log ``` -* To measure the model Throughput, set `--batch-size=128` and run the benchmark script as shown: +* To measure batch inference, set `--batch-size=128` and run the launch script as shown: ``` $ cd /home//models/benchmarks @@ -199,7 +199,7 @@ $ python launch_benchmark.py \ The log file is saved to the value of `--output-dir`. -The tail of the log output when the benchmarking completes should look +The tail of the log output when the script completes should look something like this: ``` Inference with dummy data. @@ -249,8 +249,7 @@ Ran inference with batch size 100 Log location outside container: {--output-dir value}/benchmark_resnet50_inference_fp32_20190104_213452.log ``` -* The `--output-results` flag can be used along with above benchmarking -or accuracy test, in order to also output a file with the inference +* The `--output-results` flag can be used to also output a file with the inference results (file name, actual label, and the predicted label). The results output can only be used with real data. diff --git a/benchmarks/image_recognition/tensorflow/squeezenet/README.md b/benchmarks/image_recognition/tensorflow/squeezenet/README.md index 21bcf3fb0..e5e9cc86c 100644 --- a/benchmarks/image_recognition/tensorflow/squeezenet/README.md +++ b/benchmarks/image_recognition/tensorflow/squeezenet/README.md @@ -4,7 +4,7 @@ This document has instructions for how to run SqueezeNet for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference +Instructions and scripts for model training and inference for other precisions are coming later. 
## FP32 Inference Instructions @@ -16,7 +16,7 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks, +This repository includes launch scripts for running SqueezeNet, checkpoint files for restoring a pre-trained SqueezeNet model, and CPU optimized SqueezeNet model scripts. @@ -62,14 +62,14 @@ $ cd /home//models/benchmarks ``` The `launch_benchmark.py` script in the `benchmarks` directory is used -for starting a benchmarking run in a TensorFlow docker container. It has +for starting a model run in a TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the ImageNet TF Records that you generated in step 3 and the checkpoint files that you downloaded in step 4. Substitute in your own `--data-location` and follow the steps in the -following example for throughput (using `--batch-size 64`): +following example for batch inference (using `--batch-size 64`): ``` $ python launch_benchmark.py \ @@ -84,7 +84,7 @@ $ python launch_benchmark.py \ --data-location /home//datasets/ImageNet_TFRecords ``` -Or for latency (using `--batch-size 1`): +Or for online inference (using `--batch-size 1`): ``` $ python launch_benchmark.py \ @@ -104,8 +104,8 @@ to get additional debug output or change the default output location. 6. The log file is saved to the value of `--output-dir`. -The tail of the log output when the benchmarking completes should look -something like this, when running for throughput with `--batch-size 64`: +The tail of the log output when the script completes should look +something like this, when running for batch inference with `--batch-size 64`: ``` SqueezeNet Inference Summary: @@ -120,7 +120,7 @@ Ran inference with batch size 64 Log location outside container: {--output-dir value}/benchmark_squeezenet_inference_fp32_20190104_220051.log ``` -Or for latency (with `--batch-size 1`): +Or for online inference (with `--batch-size 1`): ``` SqueezeNet Inference Summary: diff --git a/benchmarks/image_segmentation/tensorflow/maskrcnn/README.md b/benchmarks/image_segmentation/tensorflow/maskrcnn/README.md index 3c377ea30..15edaebba 100644 --- a/benchmarks/image_segmentation/tensorflow/maskrcnn/README.md +++ b/benchmarks/image_segmentation/tensorflow/maskrcnn/README.md @@ -4,7 +4,7 @@ This document has instructions for how to run Mask R-CNN for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference. +Instructions and scripts for model training and inference. ## FP32 Inference Instructions @@ -37,19 +37,18 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the -an optimized version of the Mask R-CNN model code. +This repository includes launch scripts for running an optimized version of the Mask R-CNN model code. 5. Navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 4. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. 
It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the external model directory for `--model-source-dir` (from step 2) and `--data-location` (from step 1). -Run benchmarking for throughput and latency with `--batch-size=1` : +Run for batch and online inference with `--batch-size=1` : ``` $ cd /home//models/benchmarks @@ -67,8 +66,8 @@ $ python launch_benchmark.py \ 5. Log files are located at the value of `--output-dir`. -Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch +and online inference: ``` Running per image evaluation... Evaluate annotation type *bbox* diff --git a/benchmarks/image_segmentation/tensorflow/unet/README.md b/benchmarks/image_segmentation/tensorflow/unet/README.md index 7660771f9..e7d9693e4 100644 --- a/benchmarks/image_segmentation/tensorflow/unet/README.md +++ b/benchmarks/image_segmentation/tensorflow/unet/README.md @@ -11,7 +11,7 @@ modes/precisions: ``` $ git clone git@github.com:IntelAI/models.git ``` - This repository includes launch scripts for running benchmarks. + This repository includes launch scripts for running Unet. 2. Download and extract the pretrained model: ``` @@ -39,13 +39,13 @@ modes/precisions: 4. Navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is - used for starting a benchmarking run in a optimized TensorFlow docker + used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with the checkpoint files that were downloaded in step 2 and the path to the UNet model repository that you cloned in step 3. - UNet benchmarking can be run to test throughput and latency using the + UNet can be run to test batch and online inference using the following command with your checkpoint and model-source-dir paths: ``` diff --git a/benchmarks/language_translation/tensorflow/gnmt/README.md b/benchmarks/language_translation/tensorflow/gnmt/README.md index 285df8ee5..fd92755b2 100644 --- a/benchmarks/language_translation/tensorflow/gnmt/README.md +++ b/benchmarks/language_translation/tensorflow/gnmt/README.md @@ -13,7 +13,7 @@ repository: $ git clone https://github.com/IntelAI/models.git ``` -This repository includes launch scripts for running benchmarks and the +This repository includes launch scripts for running an optimized version of the GNMT model code. 2. Download the pre-trained model. @@ -58,7 +58,7 @@ newstest2010.tok.bpe.32000.en newstest2012.tok.de newstest2014.tok.e 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo from step 1. The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in a optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, platform, and docker image to use, along with your path to the dataset that you generated in step 3. @@ -66,10 +66,10 @@ that you generated in step 3. Substitute in your own `--data-location` (from step 3), `--checkpoint` pre-trained model file path (from step 2), and the name/tag for your docker image. 
-GNMT can be run for latency benchmarking and throughput benchmarking. Use one of +GNMT can be run for online and batch inference. Use one of the following examples below, depending on your use case. -For latency (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): +For online inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -86,7 +86,7 @@ python launch_benchmark.py \ -- infer_mode=beam_search ``` -For throughput (using `--benchmark-only`, `--socket-id 0` and `--batch-size 32`): +For batch inference (using `--benchmark-only`, `--socket-id 0` and `--batch-size 32`): ``` python launch_benchmark.py \ @@ -108,7 +108,7 @@ python launch_benchmark.py \ examples of what the tail of your log file should look like for the different configs. -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/bias:0, (4096,), /device:CPU:0 dynamic_seq2seq/decoder/output_projection/kernel:0, (1024, 36548), @@ -124,7 +124,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_gnmt_inference_fp32_20190206_011740.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/bias:0, (4096,), /device:CPU:0 dynamic_seq2seq/decoder/output_projection/kernel:0, (1024, 36548), diff --git a/benchmarks/language_translation/tensorflow/transformer_language/README.md b/benchmarks/language_translation/tensorflow/transformer_language/README.md index d548bb0aa..100b1e16d 100644 --- a/benchmarks/language_translation/tensorflow/transformer_language/README.md +++ b/benchmarks/language_translation/tensorflow/transformer_language/README.md @@ -1,10 +1,10 @@ # Transformer Language -This document has instructions for how to run Transformer Language benchmark for the +This document has instructions for how to run Transformer Language for the following modes/platforms: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference for +Instructions and scripts for model training and inference for other platforms are coming later. ## FP32 Inference Instructions @@ -58,7 +58,7 @@ $ git clone https://github.com/IntelAI/models.git 5. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo (from step 4). The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in an optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the dataset location (from step 2), and the checkpoint directory (from step 3). @@ -67,11 +67,10 @@ Substitute the `--model-source-dir` for the location where you cloned the [tensorflow/tensor2tensor](https://github.com/tensorflow/tensor2tensor) repo (from step 1). -Transformer Language can run for latency or throughput -benchmarking. Use one of the following examples below, depending on +Transformer Language can run for online or batch inference. Use one of the following examples below, depending on your use case. 
-For latency (using `--socket-id 0` and `--batch-size 1`): +For online inference (using `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -88,7 +87,7 @@ python launch_benchmark.py \ -- decode_from_file=newstest2015.en reference=newstest2015.de ``` -For throughput (using `--socket-id 0` and `--batch-size 32`): +For batch inference (using `--socket-id 0` and `--batch-size 32`): ``` python launch_benchmark.py \ @@ -113,7 +112,7 @@ to get additional debug output. examples of what the tail of your log file should look like for the different configs. -Example log tail when benchmarking for latency: +Example log tail when running for online inference: ``` INFO:tensorflow:Decoding batch 2167 INFO:tensorflow:Inference results INPUT: Move! @@ -131,7 +130,7 @@ Ran inference with batch size 1 Log location outside container: {--output-dir value}/benchmark_transformer_language_inference_fp32_20190210_050451.log ``` -Example log tail when benchmarking for throughput: +Example log tail when running for batch inference: ``` INFO:tensorflow:Inference results INPUT: Move! INFO:tensorflow:Inference results OUTPUT: Move! diff --git a/benchmarks/language_translation/tensorflow/transformer_lt_official/README.md b/benchmarks/language_translation/tensorflow/transformer_lt_official/README.md index 0237cf8e3..f0d79e4e3 100644 --- a/benchmarks/language_translation/tensorflow/transformer_lt_official/README.md +++ b/benchmarks/language_translation/tensorflow/transformer_lt_official/README.md @@ -1,10 +1,10 @@ # Transformer Language Translation (LT) Official -This document has instructions for how to run Transformer Language official benchmark from TensorFlow models +This document has instructions for how to run Transformer Language official from TensorFlow models for the following modes/platforms: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model inference for other platforms are coming later. +Instructions and scripts for model inference for other platforms are coming later. ## FP32 Inference Instructions 1. Clone an older commit from the [tensorflow/models](https://github.com/tensorflow/models.git) repository: @@ -44,7 +44,7 @@ $ git clone https://github.com/IntelAI/models.git 4. Next, navigate to the `benchmarks` directory in your local clone of the [intelai/models](https://github.com/IntelAI/models) repo (from step 3). The `launch_benchmark.py` script in the `benchmarks` directory is -used for starting a benchmarking run in a optimized TensorFlow docker +used for starting a model run in an optimized TensorFlow docker container. It has arguments to specify which model, framework, mode, precision, and docker image to use, along with your path to the dataset location (from step 2). @@ -52,11 +52,10 @@ Substitute the `--model-source-dir` for the location where you cloned the [tensorflow/models](https://github.com/tensorflow/models.git) repo (from step 1). -Transformer LT official can run for latency or throughput -benchmarking. Use one of the following examples below, depending on +Transformer LT official can run for online or batch inference. Use one of the following examples below, depending on your use case. 
-For latency (using `--socket-id 0` and `--batch-size 1`): +For online inference (using `--socket-id 0` and `--batch-size 1`): ``` python launch_benchmark.py \ @@ -76,7 +75,7 @@ python launch_benchmark.py \ vocab_file=vocab.txt ``` -For throughput (using `--socket-id 0` and `--batch-size 64`): +For batch inference (using `--socket-id 0` and `--batch-size 64`): ``` python launch_benchmark.py \ diff --git a/benchmarks/object_detection/tensorflow/faster_rcnn/README.md b/benchmarks/object_detection/tensorflow/faster_rcnn/README.md index 89dc463f9..c5a64aecf 100644 --- a/benchmarks/object_detection/tensorflow/faster_rcnn/README.md +++ b/benchmarks/object_detection/tensorflow/faster_rcnn/README.md @@ -5,8 +5,8 @@ following modes/precisions: * [FP32 inference](#fp32-inference-instructions) * [Int8 inference](#int8-inference-instructions) -Benchmarking instructions and scripts for the Faster R-CNN ResNet50 model training and inference -other precisions are coming later. +Instructions and scripts for the Faster R-CNN ResNet50 model training and inference +for other precisions are coming later. ## FP32 Inference Instructions @@ -119,7 +119,7 @@ $ tar -xzvf faster_rcnn_resnet50_fp32_coco_pretrained_model.tar.gz ``` 5. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking. +This repo has the launch script for running the model. ``` $ git clone https://github.com/IntelAI/models.git @@ -138,7 +138,7 @@ Resolving deltas: 100% (3/3), done. `pipeline.config` file and the checkpoint location (from step 4, and the location of your `tensorflow/models` clone (from step 1). -Run benchmarking for throughput and latency: +Run for batch and online inference: ``` $ cd /home//models/benchmarks @@ -173,8 +173,8 @@ python launch_benchmark.py \ 7. The log file is saved to the value of `--output-dir`. -Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch +and online inference: ``` Time spent : 167.353 seconds. @@ -224,7 +224,7 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/faster_r ``` 3. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking. +This repo has the launch script for running the model. ``` $ git clone https://github.com/IntelAI/models.git @@ -242,7 +242,7 @@ with the appropriate parameters. To run on single socket use `--socket_id` switc by default it will be using all available sockets. Optional parameter `number_of_steps` (default value = 5000) can be added at the end of command after `--` as shown below: -Run benchmarking for throughput and latency: +Run for batch and online inference: ``` $ cd /home//models/benchmarks @@ -285,8 +285,8 @@ used in the commands above were built using 5. The log file is saved to the value of `--output-dir`. 
-Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch +and online inference: ``` Step 4950: 0.0722849369049 seconds diff --git a/benchmarks/object_detection/tensorflow/rfcn/README.md b/benchmarks/object_detection/tensorflow/rfcn/README.md index 39e6ac3be..02db3210a 100644 --- a/benchmarks/object_detection/tensorflow/rfcn/README.md +++ b/benchmarks/object_detection/tensorflow/rfcn/README.md @@ -5,8 +5,8 @@ following modes/precisions: * [Int8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for the R-FCN ResNet101 model training and inference -other precisions are coming later. +Instructions and scripts for the R-FCN ResNet101 model training and inference +for other precisions are coming later. ## Int8 Inference Instructions @@ -118,15 +118,14 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/rfcn_res ``` 5. Clone the [intelai/models](https://github.com/intelai/models) repo -and then run the benchmarking scripts for either benchmarking throughput -and latency or accuracy. +and then run the scripts for either batch/online inference performance or accuracy. ``` $ git clone https://github.com/IntelAI/models.git $ cd models/benchmarks ``` -Run benchmarking for throughput and latency where the `--data-location` +Run for batch and online inference where the `--data-location` is the path to the directory with the raw coco validation images: ``` python launch_benchmark.py \ @@ -171,8 +170,8 @@ to get additional debug output or change the default output location. 7. Log files are located at the value of `--output-dir` (or `models/benchmarks/common/tensorflow/logs` if no path has been specified): -Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch +and online inference: ``` Step 0: 10.6923000813 seconds Step 10: 0.168856859207 seconds @@ -302,7 +301,7 @@ $ tar -xzvf rfcn_resnet101_fp32_coco_pretrained_model.tar.gz ``` 5. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking. +This repo has the launch script for running the model. ``` $ git clone https://github.com/IntelAI/models.git @@ -322,7 +321,7 @@ Resolving deltas: 100% (3/3), done. `pipeline.config` file and the checkpoint location (from step 4), and the location of your `tensorflow/models` clone (from step 1). -Run benchmarking for throughput and latency: +Run for batch and online inference: ``` $ cd /home//models/benchmarks @@ -359,8 +358,8 @@ python launch_benchmark.py \ 7. 
Log files are located at the value of `--output-dir` (or `models/benchmarks/common/tensorflow/logs` if no path has been specified): -Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch and +online inference: ``` Average time per step: 0.262 sec diff --git a/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md b/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md index c6688f159..8dc015d61 100644 --- a/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md +++ b/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md @@ -5,8 +5,8 @@ following modes/precisions: * [Int8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Instructions and scripts for model training and inference +for other precisions are coming later. ## Int8 Inference Instructions @@ -33,7 +33,7 @@ $ unzip val2017.zip $ cd .. ``` -If you would like to run benchmarks for throughput and latency, the +If you would like to run the model for batch and online inference, the validation dataset is all that you will need. If you would like to get accuracy metrics, then continue the instructions below to generate the TF record file as well. @@ -99,14 +99,14 @@ $ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/ssdmobil ``` 5. Clone the [intelai/models](https://github.com/intelai/models) repo -and then run the benchmarking scripts for either benchmarking throughput -and latency or accuracy. +and then run the scripts for either batch/online inference performance +or accuracy. ``` $ git clone git@github.com:IntelAI/models.git $ cd benchmarks ``` -Run benchmarking for throughput and latency where the `--data-location` +Run for batch and online inference where the `--data-location` is the path to the directory with the unzipped coco validation images: ``` python launch_benchmark.py \ @@ -150,8 +150,8 @@ to get additional debug output or change the default output location. 6. The log file is saved to the value of `--output-dir`. -Below is a sample log file tail when running benchmarking for throughput -and latency: +Below is a sample log file tail when running for batch +and online inference: ``` Step 4970: 0.0340421199799 seconds @@ -313,7 +313,7 @@ drwxr-sr-x. 3 4096 Feb 1 2018 saved_model ``` 6. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking, which we will +This repo has the launch script for running the model, which we will use in the next step. ``` @@ -329,11 +329,11 @@ Resolving deltas: 100% (3/3), done. 7. Next, navigate to the `benchmarks` directory of the [intelai/models](https://github.com/intelai/models) repo that was just -cloned in the previous step. SSD-MobileNet can be run for benchmarking -throughput and latency, or testing accuracy. Note that we are running +cloned in the previous step. SSD-MobileNet can be run for testing +batch and online inference, or testing accuracy. Note that we are running SSD-MobileNet with a TensorFlow 1.12 docker image. 
-To benchmarking for throughput and latency, use the following command, +To run for batch and online inference, use the following command, but replace in your path to the unzipped coco dataset images from step 3 for the `--dataset-location`, the path to the frozen graph that you downloaded in step 5 as the `--in-graph`, and use the `--benchmark-only` @@ -376,7 +376,7 @@ $ python launch_benchmark.py \ 8. The log file is saved to the value of `--output-dir`. -Below is a sample log file tail when running benchmarking: +Below is a sample log file tail when running for performance: ``` INFO:tensorflow:Processed 5001 images... moving average latency 37 ms diff --git a/benchmarks/object_detection/tensorflow/ssd-resnet34/README.md b/benchmarks/object_detection/tensorflow/ssd-resnet34/README.md index 7cf4b2339..4171e1984 100644 --- a/benchmarks/object_detection/tensorflow/ssd-resnet34/README.md +++ b/benchmarks/object_detection/tensorflow/ssd-resnet34/README.md @@ -4,8 +4,8 @@ This document has instructions for how to run SSD-ResNet34 for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Instructions and scripts for model training and inference +for other precisions are coming later. ## FP32 Inference Instructions @@ -98,7 +98,7 @@ The `coco_val.record` file is what we will use in this inference example. 5. A link to download the pre-trained model is coming soon. 6. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking, which we will +This repo has the launch script for running the model, which we will use in the next step. ``` @@ -107,11 +107,11 @@ $ git clone https://github.com/IntelAI/models.git 7. Next, navigate to the `benchmarks` directory of the [intelai/models](https://github.com/intelai/models) repo that was just -cloned in the previous step. SSD-ResNet34 can be run for benchmarking -throughput and latency, or testing accuracy. Note that we are running +cloned in the previous step. SSD-ResNet34 can be run for +batch and online inference, or accuracy. Note that we are running SSD-ResNet34 with a TensorFlow 1.13 docker image. -To benchmarking for throughput and latency, use the following command, +To run for batch and online inference, use the following command, but replace in your path to the unzipped coco dataset images from step 3 for the `--dataset-location`, the path to the frozen graph that you downloaded in step 5 as the `--in-graph`, and use the `--benchmark-only` @@ -156,7 +156,7 @@ $ python launch_benchmark.py \ 8. The log file is saved to the value of `--output-dir`. -Below is a sample log file tail when running benchmarking: +Below is a sample log file tail when running for performance: ``` Batchsize: 1 diff --git a/benchmarks/recommendation/tensorflow/ncf/README.md b/benchmarks/recommendation/tensorflow/ncf/README.md index c6b92d938..2ee7b070b 100644 --- a/benchmarks/recommendation/tensorflow/ncf/README.md +++ b/benchmarks/recommendation/tensorflow/ncf/README.md @@ -1,10 +1,10 @@ -## Benchmark Neural Collaborative Filtering (NCF) ## +## Neural Collaborative Filtering (NCF) ## This document has instructions for how to run NCF for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference. +Instructions and scripts for model training and inference. 
## FP32 Inference Instructions @@ -41,7 +41,7 @@ $ tar -xzvf ncf_fp32_pretrained_model.tar.gz * `--checkpoint` - Path to checkpoint directory for the Pre-trained model from step4 -For Throughput, `--batch-size 256`, `--socket-id 0`, `--checkpoint` path from step5, `--model-source-dir` path from step2 +For batch inference, use `--batch-size 256`, `--socket-id 0`, the `--checkpoint` path from step 5, and the `--model-source-dir` path from step 2: ``` $ python launch_benchmark.py \ @@ -56,7 +56,7 @@ $ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl ``` -The tail of Throughput log, looks as below. +The tail of the batch inference log should look something like this: ``` ... 2018-11-12 19:42:44.851050: step 22900, 931259.2 recommendations/sec, 0.27490 msec/batch @@ -71,7 +71,7 @@ Average recommendations/sec across 23594 steps: 903932.8 (0.28381 msec/batch) ... ``` -For Latency, `--batch-size 1`, `--socket-id 0`, `--checkpoint` path from step5, `--model-source-dir` path from step2 +For online inference, use `--batch-size 1`, `--socket-id 0`, the `--checkpoint` path from step 5, and the `--model-source-dir` path from step 2: ``` $ python launch_benchmark.py \ @@ -86,7 +86,7 @@ $ python launch_benchmark.py \ --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl ``` -The tail of Latency log, looks as below. +The tail of the online inference log should look something like this: ``` ... 2018-11-12 20:24:24.986641: step 6039100, 4629.5 recommendations/sec, 0.21601 msec/batch diff --git a/benchmarks/recommendation/tensorflow/wide_deep/README.md b/benchmarks/recommendation/tensorflow/wide_deep/README.md index 2f5229907..e6698bd5d 100644 --- a/benchmarks/recommendation/tensorflow/wide_deep/README.md +++ b/benchmarks/recommendation/tensorflow/wide_deep/README.md @@ -1,11 +1,11 @@ # Wide & Deep -This document has instructions for how to run Wide & Deep benchmark for the +This document has instructions for how to run Wide & Deep for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Instructions and scripts for model training and inference +for other precisions are coming later. ## FP32 Inference Instructions @@ -25,7 +25,7 @@ other precisions are coming later. ``` 3. Clone the [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking, which we will +This repo has the launch script for running the model, which we will use in the next step. ``` @@ -41,9 +41,9 @@ use in the next step. $ python benchmarks/recommendation/tensorflow/wide_deep/inference/fp32/data_download.py --data_dir /home//widedeep_dataset ``` -5. How to run benchmarks +5. How to run - * Running benchmarks in latency mode, set `--batch-size` = `1` + * Running the model in online inference mode, set `--batch-size` = `1` ``` $ cd /home//models/benchmarks @@ -59,7 +59,7 @@ use in the next step. --docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \ --verbose ``` - * Running benchmarks in throughput mode, set `--batch-size` = `1024` + * Running the model in batch inference mode, set `--batch-size` = `1024` ``` $ cd /home//models/benchmarks @@ -77,7 +77,7 @@ use in the next step. ``` 6. The log file is saved to the value of `--output-dir`. 
- The tail of the log output when the benchmarking completes should look + The tail of the log output when the script completes should look something like this: ``` diff --git a/benchmarks/recommendation/tensorflow/wide_deep_large_ds/README.md b/benchmarks/recommendation/tensorflow/wide_deep_large_ds/README.md index 89fb2b244..d4fb5fef4 100755 --- a/benchmarks/recommendation/tensorflow/wide_deep_large_ds/README.md +++ b/benchmarks/recommendation/tensorflow/wide_deep_large_ds/README.md @@ -1,13 +1,13 @@ # Wide & Deep -This document has instructions for how to run Wide & Deep benchmark for the +This document has instructions for how to run Wide & Deep for the following modes/precisions: * [Prepare dataset](#Prepare-dataset) * [INT8 inference](#int8-inference-instructions) * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training coming later. +Instructions and scripts for model training are coming later. ## Prepare dataset @@ -61,7 +61,7 @@ Benchmarking instructions and scripts for model training coming later. ``` 2. Clone the [intelai/models](https://github.com/intelai/models) repo. - This repo has the launch script for running benchmarks, which we will + This repo has the launch script for running the model, which we will use in the next step. ``` git clone https://github.com/IntelAI/models.git @@ -87,7 +87,7 @@ Benchmarking instructions and scripts for model training coming later. 4. Run Performance test - * Running benchmarks in latency mode, set `--batch-size 1` + * Running in online inference mode, set `--batch-size 1` ``` cd /home//models/benchmarks @@ -105,7 +105,7 @@ Benchmarking instructions and scripts for model training coming later. --data-location /root/user/wide_deep_files/dataset_preprocessed_test.tfrecords \ -- num_parallel_batches=1 ``` - * Running benchmarks in throughput mode, set `--batch-size 512` + * Running in batch inference mode, set `--batch-size 512` ``` cd /home//models/benchmarks @@ -121,7 +121,7 @@ Benchmarking instructions and scripts for model training coming later. --in-graph /root/user/wide_deep_files/wide_deep_int8_pretrained_model.pb \ --data-location /root/user/wide_deep_files/dataset_preprocessed_test.tfrecords ``` - * The log file is saved to the value of `--output-dir`. The tail of the log output when the benchmarking completes + * The log file is saved to the value of `--output-dir`. The tail of the log output when the script completes should look something like this: ``` -------------------------------------------------- @@ -146,7 +146,7 @@ Benchmarking instructions and scripts for model training coming later. ``` 2. Clone the [intelai/models](https://github.com/intelai/models) repo. - This repo has the launch script for running benchmarks, which we will + This repo has the launch script for running the model, which we will use in the next step. ``` @@ -173,7 +173,7 @@ Benchmarking instructions and scripts for model training coming later. 4. Run Performance test - * Running benchmarks in latency mode, set `--batch-size 1` + * Running in online inference mode, set `--batch-size 1` ``` cd /home//models/benchmarks @@ -191,7 +191,7 @@ Benchmarking instructions and scripts for model training coming later. 
--data-location /root/user/wide_deep_files/dataset_preprocessed_test.tfrecords \ -- num_parallel_batches=1 ``` - * Running benchmarks in throughput mode, set `--batch-size 512` + * Running in batch inference mode, set `--batch-size 512` ``` cd /home//models/benchmarks @@ -207,7 +207,7 @@ Benchmarking instructions and scripts for model training coming later. --in-graph /root/user/wide_deep_files/wide_deep_fp32_pretrained_model.pb \ --data-location /root/user/wide_deep_files/dataset_preprocessed_test.tfrecords ``` - * The log file is saved to the value of `--output-dir`. The tail of the log output when the benchmarking completes + * The log file is saved to the value of `--output-dir`. The tail of the log output when the script completes should look something like this: ``` -------------------------------------------------- diff --git a/benchmarks/text_to_speech/tensorflow/wavenet/README.md b/benchmarks/text_to_speech/tensorflow/wavenet/README.md index 49c1a47fb..fa193aa07 100644 --- a/benchmarks/text_to_speech/tensorflow/wavenet/README.md +++ b/benchmarks/text_to_speech/tensorflow/wavenet/README.md @@ -4,15 +4,15 @@ This document has instructions for how to run WaveNet for the following modes/precisions: * [FP32 inference](#fp32-inference-instructions) -Benchmarking instructions and scripts for model training and inference -other precisions are coming later. +Instructions and scripts for model training and inference +for other precisions are coming later. ## FP32 Inference Instructions 1. Clone the [tensorflow-wavenet](https://github.com/ibab/tensorflow-wavenet) repo and get pull request #352 for the CPU optimizations. The path to the cloned repo will be passed as the model source directory when -running the benchmarking script. +running the launch script. ``` $ git clone https://github.com/ibab/tensorflow-wavenet.git @@ -39,7 +39,7 @@ $ pwd ``` 2. Clone this [intelai/models](https://github.com/intelai/models) repo. -This repo has the launch script for running benchmarking, as well as +This repo has the launch script for running the model, as well as checkpoint files for a pre-trained model. After cloning the repo, navigate to the benchmarks directory, which is where the launch script is located. @@ -56,7 +56,7 @@ $ cd models/benchmarks $ tar -xvf wavenet_fp32_pretrained_model.tar.gz ``` -4. Start benchmarking by executing the launch script and passing args +4. Start a model run by executing the launch script and passing args specifying that we are running wavenet fp32 inference using TensorFlow, along with a dockerfile that includes Intel Optimizations for TensorFlow and the path to the model source dir (from step 1) and the checkpoint diff --git a/docs/general/tensorflow/GeneralBestPractices.md b/docs/general/tensorflow/GeneralBestPractices.md index e71648442..98ced496a 100644 --- a/docs/general/tensorflow/GeneralBestPractices.md +++ b/docs/general/tensorflow/GeneralBestPractices.md @@ -9,10 +9,10 @@ Please see the [install guide](https://software.intel.com/en-us/articles/intel-o ## Performance Metrics -* **Throughput** measures how many input tensors can be processed per second with batches of size greater than one. -Typically for maximum throughput, optimal performance is achieved by exercising all the physical cores on a socket. -* **Latency** (also called real-time inference) is a measurement of the time it takes to process a single input tensor, i.e. a batch of size one. 
-In a real-time inference scenario, optimal latency is achieved by minimizing thread launching and orchestration between concurrent processes. +* **Batch Inference** measures how many input tensors can be processed per second with batches of size greater than one. +Typically for batch inference, optimal performance is achieved by exercising all the physical cores on a socket. +* **Online Inference** (also called real-time inference) is a measurement of the time it takes to process a single input tensor, i.e. a batch of size one. +In a real-time inference scenario, optimal performance is achieved by minimizing thread launching and orchestration between concurrent processes. This guide will help you set your TensorFlow runtime options for good balanced performance over both metrics. However, if you want to prioritize one metric over the other or further tune Tensorflow for your specific model, please see the tutorials. A link to these can be found in the [Model Zoo docs readme](/docs/README.md). diff --git a/docs/general/tensorflow/LaunchBenchmark.md b/docs/general/tensorflow/LaunchBenchmark.md index 082bcbc8f..ad358b6aa 100644 --- a/docs/general/tensorflow/LaunchBenchmark.md +++ b/docs/general/tensorflow/LaunchBenchmark.md @@ -10,7 +10,7 @@ Below the general description is an [index of links](#model-scripts-for-tensorfl ## How it Works 1. The script [`launch_benchmark.py`](/benchmarks/launch_benchmark.py) pulls a docker image specified by the script's `--docker-image` argument and runs a container. - [Here](#launch_benchmarkpy-flags) is the full list of available flags. To run benchmarking without a docker container, + [Here](#launch_benchmarkpy-flags) is the full list of available flags. To run a model without a docker container, see the [bare metal instructions](#alpha-feature-running-on-bare-metal). 2. The container's entrypoint script [`start.sh`](/benchmarks/common/tensorflow/start.sh) installs required dependencies, e.g. python packages and `numactl`, and sets the PYTHONPATH environment variable to point to the required dependencies. [Here](#startsh-flags) is the full list of available flags. @@ -57,14 +57,14 @@ optional arguments: -mo {training,inference}, --mode {training,inference} Specify the type training or inference -m MODEL_NAME, --model-name MODEL_NAME - model name to run benchmarks for + model name to run -b BATCH_SIZE, --batch-size BATCH_SIZE Specify the batch size. If this parameter is not specified or is -1, the largest ideal batch size for the model will be used -d DATA_LOCATION, --data-location DATA_LOCATION Specify the location of the data. If this parameter is - not specified, the benchmark will use random/dummy + not specified, the script will use random/dummy data. -i SOCKET_ID, --socket-id SOCKET_ID Specify which socket to use. Only one socket will be @@ -91,12 +91,12 @@ optional arguments: written to this location. If mode=inference assumes that the location points to a model that has already been trained. - -k, --benchmark-only For benchmark measurement only. If neither + -k, --benchmark-only For performance measurement only. If neither --benchmark-only or --accuracy-only are specified, it - will default to run benchmarking. + will default to run for performance. --accuracy-only For accuracy measurement only. If neither --benchmark- only or --accuracy-only are specified, it will default - to run benchmarking. + to run for performance. --output-results Writes inference output to a file, when used in conjunction with --accuracy-only and --mode=inference. 
--output-dir OUTPUT_DIR @@ -109,7 +109,7 @@ optional arguments: ## Alpha feature: Running on bare metal We recommend using [Docker](https://www.docker.com) to run the -benchmarking scripts, as that provides a consistent environment where +model scripts, as that provides a consistent environment where the script can install all the necessary dependencies to run the models in this repo. For this reason, the tutorials and model README files provide instructions on how to run the model in a Docker container. @@ -122,8 +122,8 @@ Since the `launch_benchmark.py` is intended to run in an Ubuntu-based Docker container, running on bare metal also will only work when running on Ubuntu. -Before running benchmarking, you must also install all the dependencies -that are required to run the model. +Before running a model, you must also install all the dependencies +that are required to run that model. Basic requirements for running all models include: * python (If the model's README file specifies to use a python3 TensorFlow docker image, then use python 3 on bare metal, otherwise use python 2.7) @@ -160,7 +160,7 @@ you previously used with docker, you may need to change the owner on your log directory, or run with `sudo` in order for the `tee` commands writing to the log file to work properly. -For example, in order to run ResNet50 FP32 benchmarking on bare metal, +For example, in order to run ResNet50 FP32 on bare metal, the following command can be used: ``` diff --git a/docs/general/tensorflow_serving/GeneralBestPractices.md b/docs/general/tensorflow_serving/GeneralBestPractices.md index 887613736..2597f1b8d 100644 --- a/docs/general/tensorflow_serving/GeneralBestPractices.md +++ b/docs/general/tensorflow_serving/GeneralBestPractices.md @@ -11,10 +11,10 @@ but the following information will help you get started. ## Performance Metrics -* **Throughput** measures how many input tensors can be processed per second with batches of size greater than one. -Typically for maximum throughput, optimal performance is achieved by exercising all the physical cores on a socket. -* **Latency** (also called real-time inference) is a measurement of the time it takes to process a single input tensor, i.e. a batch of size one. -In a real-time inference scenario, optimal latency is achieved by minimizing thread launching and orchestration between concurrent processes. +* **Batch Inference** measures how many input tensors can be processed per second with batches of size greater than one. +Typically for batch inference, optimal performance is achieved by exercising all the physical cores on a socket. +* **Online Inference** (also called real-time inference) is a measurement of the time it takes to process a single input tensor, i.e. a batch of size one. +In a real-time inference scenario, optimal performance is achieved by minimizing thread launching and orchestration between concurrent processes. This guide will help you set your TensorFlow model server options for good balanced performance over both metrics. However, if you want to prioritize one metric over the other or further tune TensorFlow Serving for your specific model, see the [tutorials](/docs#tutorials-by-use-case). 
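As a quick sanity check of how the two renamed metrics relate, the figures already shown in the NCF batch inference log earlier in this patch (256 recommendations per batch at 0.27490 msec/batch) can be plugged into a one-line calculation: throughput is simply batch size divided by per-batch time. The snippet below is only an illustration; the variable names and the awk call are ours and are not part of any README in the patch.

```
# Batch inference throughput implied by the NCF log above:
# 256 samples per batch at 0.27490 msec/batch.
batch_size=256
msec_per_batch=0.27490
awk -v b="$batch_size" -v ms="$msec_per_batch" \
    'BEGIN { printf "%.1f samples/sec\n", b / (ms / 1000) }'
# Prints roughly 931248 samples/sec, matching the ~931259 recommendations/sec
# reported in that log. With --batch-size 1, the same msec/batch figure is
# read directly as the online inference time per sample.
```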
diff --git a/docs/general/tensorflow_serving/InstallationGuide.md b/docs/general/tensorflow_serving/InstallationGuide.md index 3c6b84c2f..2ef0489eb 100644 --- a/docs/general/tensorflow_serving/InstallationGuide.md +++ b/docs/general/tensorflow_serving/InstallationGuide.md @@ -220,7 +220,7 @@ $ curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmo ``` Prediction class: 286, avg latency: 34.7315 ms ``` - **Note:** The real avg latency you see will depend on your hardware, environment, and whether or not you have configured the server parameters optimally. See the [General Best Practices](GeneralBestPractices.md) for more information. + **Note:** The real performance you see will depend on your hardware, environment, and whether or not you have configured the server parameters optimally. See the [General Best Practices](GeneralBestPractices.md) for more information. * After you are fininshed with querying, you can stop the container which is running in the background. To restart the container with the same name, you need to stop and remove the container from the registry. To view your running containers run `docker ps`. ``` diff --git a/docs/image_recognition/quantization/Tutorial.md b/docs/image_recognition/quantization/Tutorial.md index 040668765..72409a1db 100644 --- a/docs/image_recognition/quantization/Tutorial.md +++ b/docs/image_recognition/quantization/Tutorial.md @@ -11,8 +11,8 @@ Content: ## Goal Post-training model quantization and optimization objective is to: * Reduce the model size, -* Run faster inference (less latency), -* Maintain the model performance (throughput and accuracy). +* Run faster online inference (batch size = 1), +* Maintain the model performance (batch inference and accuracy). This is highly recommended in the case of mobile applications and systems of constrained memory and processing power. Usually, there will be some loss in performance, but it has to be within the [acceptable range](#performance-evaluation). @@ -210,18 +210,18 @@ Validating the model performance is required after each step to verify if the ou * The quantized `Int8` graph accuracy should not drop more than ~0.5-1%. -This section explains how to run ResNet50 inference and calculate the model accuracy using [Intel Model Zoo Benchmarks](https://github.com/IntelAI/models). +This section explains how to run ResNet50 inference and calculate the model accuracy using the [Intel Model Zoo](https://github.com/IntelAI/models). Clone the [IntelAI/models](https://github.com/IntelAI/models) repository, and follow the [documented steps](/benchmarks/image_recognition/tensorflow/resnet50/README.md#int8-inference-instructions) -to benchmark `ResNet50` inference performance for both FP32 and Int8 cases. +to measure `ResNet50` inference performance for both FP32 and Int8 cases. -**Note that the benchmarking script should be run outside of the quantization docker container -and that some inputs to the benchmarking script are slightly different for `FP32` and `Int8` models (i.e. `--precision` and `--docker-image`).** +**Note that the script should be run outside of the quantization docker container +and that some inputs to the script are slightly different for `FP32` and `Int8` models (i.e. 
`--precision` and `--docker-image`).** ### Accuracy for FP32 Optimized Graph -Clone the [IntelAI/models](https://github.com/IntelAI/models) repository and follow the steps to run the FP32 benchmark +Clone the [IntelAI/models](https://github.com/IntelAI/models) repository and follow the steps to run the FP32 script to calculate `accuracy` and use the optimized FP32 graph in `--in-graph`. ``` $ git clone https://github.com/IntelAI/models.git @@ -251,7 +251,7 @@ The tail of the log output when the accuracy run completes should look something like this: ### Accuracy for Int8 Optimized Graph -Clone the [IntelAI/models](https://github.com/IntelAI/models) repository and follow the steps to run the Int8 benchmark +Clone the [IntelAI/models](https://github.com/IntelAI/models) repository and follow the steps to run the Int8 script to calculate `accuracy` and use the Int8 graph in `--in-graph`. ``` $ git clone https://github.com/IntelAI/models.git diff --git a/docs/image_recognition/tensorflow/Tutorial.md b/docs/image_recognition/tensorflow/Tutorial.md index 5088a2ac2..235ea4109 100644 --- a/docs/image_recognition/tensorflow/Tutorial.md +++ b/docs/image_recognition/tensorflow/Tutorial.md @@ -162,7 +162,7 @@ but if you choose to set your own options, refer to the full list of available f explanation of the ```launch_benchmark.py``` script [here](/docs/general/tensorflow/LaunchBenchmark.md). This step will automatically launch a new container on every run and terminate. Go to the [Step 4](#step_4) to interactively run the script on the container. -3.1. *Real Time inference*(batch_size=1 for latency) +3.1. *Online inference* (or real-time inference, batch_size=1) 3.1.1 ResNet50 @@ -255,7 +255,7 @@ Note: As per the recommended settings `socket-id` is set to 0 for InceptionV3. T --socket-id 0 \ --docker-image intelaipg/intel-optimized-tensorflow:latest -3.2. *Best Throughput inference*(batch_size=128 for throughput) +3.2. *Best Batch inference* (batch_size=128) 3.2.1 ResNet50 @@ -368,7 +368,7 @@ Note: As per the recommended settings `socket-id` is set to 0 for InceptionV3. T The logs are captured in a directory outside of the container.
-4. If you want to run the benchmarking script interactively within the docker container, run ```launch_benchmark.py``` with ```--debug``` flag. This will launch a docker container based on the ```--docker_image```, +4. If you want to run the model script interactively within the docker container, run ```launch_benchmark.py``` with the ```--debug``` flag. This will launch a docker container based on the ```--docker_image```, performs necessary installs, runs the ```launch_benchmark.py``` script and does not terminate the container process. As an example, this step will demonstrate ResNet50 Real Time inference on Synthetic Data use case, you can implement the same strategy on different use cases demoed in Step 3. @@ -389,7 +389,7 @@ you can implement the same strategy on different use cases demoed in Step 3. lscpu located here: b'/usr/bin/lscpu' root@a78677f56d69:/workspace/benchmarks/common/tensorflow# -To rerun the bechmarking script, execute the ```start.sh``` bash script from your existing directory with additional or modified flags. For e.g to rerun with the best max throughput (batch size=128) settings run with ```BATCH_SIZE``` +To rerun the model script, execute the ```start.sh``` bash script from your existing directory with additional or modified flags. For example, to rerun with the best batch inference (batch size=128) settings, run with ```BATCH_SIZE``` and to skip the run from reinstalling packages pass ```True``` to ```NOINSTALL```. chmod +x ./start.sh diff --git a/docs/image_recognition/tensorflow_serving/Tutorial.md b/docs/image_recognition/tensorflow_serving/Tutorial.md index f7c325686..0c9ad527f 100644 --- a/docs/image_recognition/tensorflow_serving/Tutorial.md +++ b/docs/image_recognition/tensorflow_serving/Tutorial.md @@ -1,5 +1,5 @@ # Image Recognition with TensorFlow Serving on CPU -### Real-time and Max Throughput Inference +### Online and Batch Inference Models: ResNet50, InceptionV3 ## Goal @@ -24,18 +24,18 @@ The Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) offer Tuning TensorFlow Serving to take full advantage of your hardware for image recognition deep learning inference involves: 1. Working through this tutorial to set up servable versions of the well-known [ResNet50](https://arxiv.org/pdf/1512.03385.pdf) and [InceptionV3](https://arxiv.org/pdf/1512.00567v1.pdf) CNN models 2. Running a TensorFlow Serving docker container configured for performance given your hardware resources -3. Running a client script to measure latency and throughput +3. Running a client script to measure online and batch inference performance 4. Experimenting with the TensorFlow Serving settings on your own to further optimize for your model and use case ## Hands-on Tutorial - ResNet50 or InceptionV3 -For steps 1 and 2, refer to the Intel Model Zoo FP32 benchmarks: +For steps 1 and 2, refer to the Intel Model Zoo FP32 READMEs: * [ResNet50 README](/benchmarks/image_recognition/tensorflow/resnet50#fp32-inference-instructions) * [InceptionV3 README](/benchmarks/image_recognition/tensorflow/inceptionv3#fp32-inference-instructions) 1. **Download the Model**: Download and extract the ResNet50 or InceptionV3 pre-trained model (FP32), using the instructions in one of the READMEs above. -2. **(Optional) Download Data**: If you are interested only in testing latency and throughput, not accuracy, you can skip this step and use synthetic data. +2. 
**(Optional) Download Data**: If you are interested only in testing performance, not accuracy, you can skip this step and use synthetic data. If you want to verify prediction accuracy by testing on real data, follow the instructions in one of the READMEs above to download the ImageNet dataset. 3. **Clone this repository**: Clone the [intelai/models](https://github.com/intelai/models) repository and `cd` into the `docs/image_recognition/tensorflow_serving/src` directory. @@ -114,8 +114,8 @@ For steps 1 and 2, refer to the Intel Model Zoo FP32 benchmarks: ``` The output should be a tensor of class probabilities and `Predicted class: 286`. -9. **Real-time inference**: Real-time inference is measured by latency and is usually defined as batch size 1. - To see average inference latency (in ms), run the benchmark script `image_recognition_benchmark.py` using batch_size 1: +9. **Online inference**: Online (or real-time) inference is usually defined as the time it takes to return a prediction for batch size 1. + To see average online inference performance (in ms), run the script `image_recognition_benchmark.py` using batch_size 1: ``` (venv)$ python image_recognition_benchmark.py --batch_size 1 --model inceptionv3 Iteration 1: 0.017 sec @@ -152,8 +152,8 @@ For steps 1 and 2, refer to the Intel Model Zoo FP32 benchmarks: tensorflow/serving:mkl ``` -10. **Maximum throughput**: Regardless of hardware, the best batch size for throughput is 128. - To see average throughput (in images/sec), run the benchmark script `image_recognition_benchmark.py` using batch_size 128: +10. **Batch inference**: Regardless of hardware, the best batch size for batch inference is 128. + To see average batch inference performance (in images/sec), run the script `image_recognition_benchmark.py` using batch_size 128: ``` (venv)$ python image_recognition_benchmark.py --batch_size 128 --model inceptionv3 Iteration 1: 1.706 sec @@ -175,7 +175,7 @@ You have now seen two end-to-end examples of serving an image recognition model 1. How to create a SavedModel from a TensorFlow model graph 2. How to choose good values for the performance-related runtime parameters exposed by the `docker run` command 3. How to verify that the served model can correctly classify an image using a GRPC client -4. How to benchmark latency and throughput metrics using a GRPC client +4. How to measure online and batch inference metrics using a GRPC client With this knowledge and the example code provided, you should be able to get started serving your own custom image recognition model with good performance. diff --git a/docs/object_detection/tensorflow_serving/Tutorial.md b/docs/object_detection/tensorflow_serving/Tutorial.md index 479a34aea..dfb3d724f 100644 --- a/docs/object_detection/tensorflow_serving/Tutorial.md +++ b/docs/object_detection/tensorflow_serving/Tutorial.md @@ -5,10 +5,10 @@ This tutorial will introduce you to the CPU performance considerations for object detection in deep learning models and how to use [Intel® Optimizations for TensorFlow Serving](https://www.tensorflow.org/serving/) to improve inference time on CPUs. This tutorial uses a pre-trained Region-based Fully Convolutional Network (R-FCN) model for object detection and provides sample code that you can use to get your optimized TensorFlow model server and REST client up and running quickly. In this tutorial using R-FCN, you will measure inference performance in two situations: -* **Real-Time**, where batch_size=1. In this case, lower latency means better runtime performance. 
-* **Throughput**, where batch_size>1. In this case, higher throughput means better runtime performance. +* **Online inference**, where batch_size=1. In this case, lower time to result means better runtime performance. +* **Batch inference**, where batch_size>1. In this case, a higher number of images processed per second means better runtime performance. -**NOTE about REST vs. GRPC**: This tutorial is focused on optimizing the model server, not the client that sends requests. For optimal client-side serialization and de-serialization, you may want to use TensorFlow Serving's GRPC option instead of the REST API, especially if you are optimizing for maximum throughput (here is one [article](https://medium.com/@avidaneran/tensorflow-serving-rest-vs-grpc-e8cef9d4ff62) with a relevant analysis). +**NOTE about REST vs. GRPC**: This tutorial is focused on optimizing the model server, not the client that sends requests. For optimal client-side serialization and de-serialization, you may want to use TensorFlow Serving's GRPC option instead of the REST API, especially if you are optimizing for batch inference (here is one [article](https://medium.com/@avidaneran/tensorflow-serving-rest-vs-grpc-e8cef9d4ff62) with a relevant analysis). We use REST in this tutorial for illustration, not as a best practice, and offer another [tutorial](/docs/image_recognition/tensorflow_serving/Tutorial.md) that illustrates the use of GRPC with TensorFlow Serving. ## Prerequisites @@ -25,7 +25,7 @@ This tutorial assumes you have already: [Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)](https://github.com/intel/mkl-dnn) offers significant performance improvements for convolution, pooling, normalization, activation, and other operations for object detection, using efficient vectorization and multi-threading. Tuning TensorFlow Serving to take full advantage of your hardware for object detection deep learning inference involves: 1. Running a TensorFlow Serving docker container configured for performance given your hardware resources -2. Running a REST client notebook to verify object detection and measure latency and throughput +2. Running a REST client notebook to verify object detection and measure online and batch inference performance 3. Experimenting with the TensorFlow Serving settings on your own to further optimize for your model and use case ## Hands-on Tutorial with pre-trained R-FCN model @@ -134,7 +134,7 @@ To optimize overall performance, use the following recommended settings from the **Note**: For some models, playing around with these settings values can improve performance even further. We recommend that you experiment with your own hardware and model if you have strict performance requirements. -6. **Benchmark Real-Time and Throughput performance**: Clone the Intel Model Zoo into a directory called `intel-models` and run `rfcn-benchmark.py` [python script](/docs/object_detection/tensorflow_serving/rfcn-benchmark.py), which will benchmark both Real-Time and Throughput performance. +6. **Measure Online and Batch inference performance**: Clone the Intel Model Zoo into a directory called `intel-models` and run `rfcn-benchmark.py` [python script](/docs/object_detection/tensorflow_serving/rfcn-benchmark.py), which will test both Online and Batch performance. 
``` (rfcn_venv)$ git clone https://github.com/IntelAI/models.git intel-models (rfcn_venv)$ python intel-models/docs/object_detection/tensorflow_serving/rfcn-benchmark.py \ @@ -186,7 +186,7 @@ For example, with a GCP VM, add `--ssh-flag="-L 8888:localhost:8888"` to your ss You have now seen an end-to-end example of serving an object detection model for inference using TensorFlow Serving, and learned: 1. How to choose good values for the performance-related runtime parameters exposed by the `docker run` command 2. How to verify that the served model can correctly detect objects in an image using a sample Jupyter notebook -3. How to benchmark latency and throughput metrics using a REST client +3. How to measure online and batch inference metrics using a REST client With this knowledge and the example code provided, you should be able to get started serving your own custom object detection model with good performance. If desired, you should also be able to investigate a variety of different settings combinations to see if further performance improvement are possible. diff --git a/docs/recommendation/tensorflow/Tutorial.md b/docs/recommendation/tensorflow/Tutorial.md index f814daac1..aa33a6643 100644 --- a/docs/recommendation/tensorflow/Tutorial.md +++ b/docs/recommendation/tensorflow/Tutorial.md @@ -5,7 +5,7 @@ This tutorial will introduce CPU performance considerations for the popular [Wide and Deep](https://arxiv.org/abs/1606.07792) model to solve recommendation system problems and how to tune run-time parameters to maximize performance using Intel® Optimizations for TensorFlow. This tutorial also includes a hands-on demo on Intel Model Zoo's Wide and Deep pretrained model built using a dataset from [Kaggle's Display Advertising Challenge](http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) -to run real-time and max throughput inference. +to run online (real-time) and batch inference. ## Background Google's latest innovation to solve some of the shortcomings in traditional recommendation systems is the @@ -115,7 +115,7 @@ cd ~/wide_deep_files wget https://storage.googleapis.com/intel-optimized-tensorflow/models/wide_deep_fp32_pretrained_model.pb ``` -Refer to the Wide and Deep benchmarks [README](/benchmarks/recommendation/tensorflow/wide_deep_large_ds) to get the latest location of the pretrained model. +Refer to the Wide and Deep [README](/benchmarks/recommendation/tensorflow/wide_deep_large_ds) to get the latest location of the pretrained model. 3. Install [Docker](https://docs.docker.com/v17.09/engine/installation/) since the tutorial runs on a Docker container. @@ -170,7 +170,7 @@ but if you choose to set your own options, refer to the full list of available f explanation of the ```launch_benchmark.py``` script [here](/docs/general/tensorflow/LaunchBenchmark.md). This step will automatically launch a new container on every run and terminate. Go to [Step 4](#step_4) to interactively run the script in the container. -    3.1. *Real Time Inference* (batch_size=1 for latency) +    3.1. *Online Inference* (also called real-time inference, batch_size=1) Note: As per the recommended settings `socket-id` is set to -1 to run on all sockets. Set this parameter to a socket id to run the workload on a single socket. @@ -188,7 +188,7 @@ Set this parameter to a socket id to run the workload on a single socket. --data-location ~/models/models/eval_preprocessed.tfrecords \ --verbose -    3.2. *Max Throughput Inference* (batch_size=512 for throughput) +    3.2. 
*Batch Inference* (batch_size=512) Note: As per the recommended settings `socket-id` is set to -1 to run on all sockets. Set this parameter to a socket id to run the workload on a single socket. @@ -281,7 +281,7 @@ perform necessary installs, run the ```launch_benchmark.py``` script, and does n lscpu located here: b'/usr/bin/lscpu' root@a78677f56d69:/workspace/benchmarks/common/tensorflow# -To rerun the benchmarking script, execute the ```start.sh``` bash script from your existing directory with additional or modified flags. For example, to rerun with the best max throughput (batch size=512) settings, run with ```BATCH_SIZE``` +To rerun the model script, execute the ```start.sh``` bash script from your existing directory with additional or modified flags. For example, to rerun with the best batch inference (batch size=512) settings, run with ```BATCH_SIZE``` and to skip the run from reinstalling packages pass ```True``` to ```NOINSTALL```. chmod +x ./start.sh @@ -290,9 +290,9 @@ and to skip the run from reinstalling packages pass ```True``` to ```NOINSTALL`` All other flags will be defaulted to values passed in the first ```launch_benchmark.py``` that starts the container. [See here](/docs/general/tensorflow/LaunchBenchmark.md) to get the full list of flags. -5. Inference benchmarking on a large dataset (optional) +5. Inference on a large dataset (optional) -To run inference benchmarking on a large dataset, download the test dataset in `~/wide_deep_files/real_dataset`. Note that this dataset supports only `benchmark-only` flag. +To run inference on a large dataset, download the test dataset in `~/wide_deep_files/real_dataset`. Note that this dataset supports only the `benchmark-only` flag. ``` cd ~/wide_deep_files/real_dataset @@ -330,7 +330,7 @@ Untar the file to create three files: ``` - Exit the docker container and find the processed dataset `test_preprocessed.tfrecords` in the location `~/models/models`. -    5.1. *Max Throughput or Real-Time Inference* +    5.1. *Batch or Online Inference* cd ~/models/benchmarks @@ -346,4 +346,4 @@ Untar the file to create three files: --data-location ~/models/models/test_preprocessed.tfrecords \ --verbose -Set batch_size to 1 to run for real-time inference +Set batch_size to 1 to run for online (real-time) inference.
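Since the large-dataset run above switches between batch and online inference purely by changing `--batch-size`, a small wrapper like the sketch below can cover both in one pass. `WIDE_DEEP_FLAGS` is a placeholder of ours (not a `launch_benchmark.py` option) and should be filled with the remaining arguments exactly as shown in section 5.1.

```
# Sketch only: fill WIDE_DEEP_FLAGS with the remaining launch_benchmark.py
# arguments from section 5.1 (model name, framework, precision, in-graph,
# docker image, data location, etc.) before running.
WIDE_DEEP_FLAGS="--verbose"   # placeholder, replace with the full flag list
cd ~/models/benchmarks
for bs in 512 1; do           # 512 = batch inference, 1 = online inference
  python launch_benchmark.py --batch-size "$bs" $WIDE_DEEP_FLAGS
done
```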