benchmark

Utilities for benchmarking deep learning models (number of parameters, FLOPS, and inference latency).

Model inference efficiency is a major concern when deploying deep learning models. Efficiency is quantified as the Pareto-optimality between the target metric (e.g. accuracy) and the model's cost: number of parameters, computational complexity (FLOPS), and latency. benchmark is a tool to compute parameters, FLOPS, and latency. The sample results below show the number of parameters and FLOPS of ResNet18, together with the latency improvements obtained from different accelerators and model formats. The fastest configuration uses both ONNX and TensorRT.

FLOPS, Parameters and Latency of ResNet18

Experiments performed on a Quadro RTX 6000 24GB GPU and an AMD Ryzen Threadripper 3970X 32-Core CPU, assuming 1,000 classes, a 224x224x3 input image, and a batch size of 1.

FLOPS: 1,819,065,856
Parameters: 11,689,512
| Accelerator | Latency (usec) | Speed up (x) |
|---|---|---|
| CPU | 8,550 | 1 |
| CPU + ONNX | 3,830 | 2.7 |
| GPU | 1,982 | 5.4 |
| GPU + ONNX | 1,218 | 8.8 |
| GPU + ONNX + TensorRT | 917 | 11.7 |
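
These numbers can be sanity-checked outside benchmark.py with a few lines of PyTorch. The sketch below is an independent illustration (not the repo's implementation) that counts parameters and times a forward pass of resnet18 on the GPU:

```python
# Minimal sketch (not benchmark.py itself): count parameters and time inference.
import time
import torch
import torchvision

model = torchvision.models.resnet18().eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")  # batch size 1, 224x224x3 input

# Number of parameters (~11.7M for resnet18).
print("Parameters:", sum(p.numel() for p in model.parameters()))

# Average latency over repeated runs, after a warm-up.
with torch.no_grad():
    for _ in range(10):            # warm-up
        model(x)
    torch.cuda.synchronize()
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    print("Latency (usec):", (time.perf_counter() - start) / runs * 1e6)
```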

Install requirements

pip3 install -r requirements.txt

Additional packages:

  • CUDA: remove the old toolkit.
conda uninstall cudatoolkit

Then install the new cuDNN and the TensorRT Python packages:

conda install cudnn
python3 -m pip install --upgrade setuptools pip
python3 -m pip install nvidia-pyindex
python3 -m pip install --upgrade nvidia-tensorrt
  • (Optional) Torch-TensorRT
pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases

Warning: the following requires super user access.

sudo apt install python3-libnvinfer-dev python3-libnvinfer 

Sample benchmarking of resnet18

  • GPU + ONNX + TensorRT
python3 benchmark.py --model resnet18 --onnx --tensorrt
  • GPU + ONNX
python3 benchmark.py --model resnet18 --onnx
  • GPU
python3 benchmark.py --model resnet18 
  • CPU
python3 benchmark.py --model resnet18  --device cpu
  • CPU + ONNX
python3 benchmark.py --model resnet18 --device cpu --onnx
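
The --onnx path corresponds to exporting the PyTorch model to ONNX and running it with ONNX Runtime, and combining it with TensorRT corresponds to a TensorRT-backed execution provider. Below is a sketch of that general flow, purely as an illustration; it is not a description of benchmark.py's internals:

```python
# Sketch: export resnet18 to ONNX and run it with ONNX Runtime.
# Illustrates the general ONNX / TensorRT flow, not benchmark.py's internals.
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet18().eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

# Pick an execution provider: CPU, CUDA, or TensorRT
# (the latter requires an onnxruntime-gpu build with TensorRT support).
providers = ["CPUExecutionProvider"]           # CPU + ONNX
# providers = ["CUDAExecutionProvider"]        # GPU + ONNX
# providers = ["TensorrtExecutionProvider", "CUDAExecutionProvider"]  # + TensorRT

session = ort.InferenceSession("resnet18.onnx", providers=providers)
out = session.run(None, {"input": dummy.numpy().astype(np.float32)})[0]
print(out.shape)  # (1, 1000)
```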

Compute model accuracy on ImageNet1k

The ImageNet dataset folder is assumed to be /data/imagenet. If it is located elsewhere, point to it with the --imagenet option.

python3 benchmark.py --model resnet18 --compute-accuracy
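
For reference, top-1 accuracy on the ImageNet validation split amounts to a standard evaluation loop. A minimal sketch, assuming a torchvision-style ImageNet folder at /data/imagenet (this is an illustration, not the repo's exact evaluation code):

```python
# Sketch: top-1 accuracy of resnet18 on the ImageNet validation set.
# Assumes an ImageNet layout readable by torchvision at /data/imagenet.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = torchvision.datasets.ImageNet("/data/imagenet", split="val", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)

# weights= requires a recent torchvision; older versions use pretrained=True.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval().cuda()
correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.cuda()).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"Top-1 accuracy: {100 * correct / total:.2f}%")
```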

List all supported models

All torchvision.models and timm models will be listed:

python3 benchmark.py --list-models

Find a specific model

python3 benchmark.py --find-model xcit_tiny_24_p16_224
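
The supported model names come from the torchvision and timm registries, which can also be queried directly. A sketch (assuming timm is installed and a torchvision recent enough to provide torchvision.models.list_models):

```python
# Sketch: query the model registries that back --list-models / --find-model.
import timm
import torchvision

# All timm models, and those matching an exact name or pattern.
print(len(timm.list_models()))
print(timm.list_models("xcit_tiny_24_p16_224"))

# torchvision >= 0.14 exposes a similar registry.
print(torchvision.models.list_models()[:5])
```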

Other models

  • Latency in usec
| Accelerator | R50 | MV2 | MV3 | SV2 | Sq | SwV2 | De | Ef0 | CNext | RN4X | RN64X |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CPU | 29,840 | 11,870 | 6,498 | 6,607 | 8,717 | 52,120 | 14,952 | 14,089 | 33,182 | 11,068 | 41,301 |
| CPU + ONNX | 10,666 | 2,564 | 4,484 | 2,479 | 3,136 | 50,094 | 10,484 | 8,356 | 28,055 | 1,990 | 14,358 |
| GPU | 1,982 | 4,781 | 3,689 | 4,135 | 1,741 | 6,963 | 3,526 | 5,817 | 3,588 | 5,886 | 6,050 |
| GPU + ONNX | 2,715 | 1,107 | 1,128 | 1,392 | 851 | 3,731 | 1,650 | 2,175 | 2,789 | 1,525 | 3,280 |
| GPU + ONNX + TensorRT | 1,881 | 670 | 570 | 404 | 443 | 3,327 | 1,170 | 1,250 | 2,630 | 1,137 | 2,283 |

R50 - resnet50, MV2 - mobilenet_v2, MV3 - mobilenet_v3_small, SV2 - shufflenet_v2_x0_5, Sq - squeezenet1_0, SwV2 - swinv2_cr_tiny_ns_224, De - deit_tiny_patch16_224, Ef0 - efficientnet_b0, CNext - convnext_tiny, RN4X - regnetx_004, RN64X - regnetx_064

  • Parameters and FLOPS
| Model | Parameters (M) | GFLOPS | Top1 (%) | Top5 (%) |
|---|---|---|---|---|
| resnet18 | 11.7 | 1.8 | 69.76 | 89.08 |
| resnet50 | 25.6 | 4.1 | 80.11 | 94.49 |
| mobilenet_v2 | 3.5 | 0.3 | 71.87 | 90.29 |
| mobilenet_v3_small | 2.5 | 0.06 | 67.67 | 87.41 |
| shufflenet_v2_x0_5 | 1.4 | 0.04 | 60.55 | 81.74 |
| squeezenet1_0 | 1.2 | 0.8 | 58.10 | 80.42 |
| swinv2_cr_tiny_ns_224 | 28.3 | 4.7 | 81.54 | 95.77 |
| deit_tiny_patch16_224 | 5.7 | 1.3 | 72.02 | 91.10 |
| efficientnet_b0 | 5.3 | 0.4 | 77.67 | 93.58 |
| convnext_tiny | 28.6 | 4.5 | 82.13 | 95.95 |
| regnetx_004 | 5.2 | 0.4 | 72.30 | 90.59 |
| regnetx_064 | 26.2 | 6.5 | 78.90 | 94.44 |
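
The parameter and GFLOPS columns can be cross-checked with an off-the-shelf FLOP counter. The sketch below uses fvcore purely as an illustration; it is an assumption that such a counter gives comparable numbers, not a statement about which library benchmark.py uses internally:

```python
# Sketch: cross-check GFLOPS and parameter counts with fvcore
# (illustrative only; not necessarily the counter used by benchmark.py).
import torch
import torchvision
from fvcore.nn import FlopCountAnalysis

model = torchvision.models.resnet18().eval()
x = torch.randn(1, 3, 224, 224)

flops = FlopCountAnalysis(model, x).total()
params = sum(p.numel() for p in model.parameters())
print(f"GFLOPS: {flops / 1e9:.1f}, Parameters (M): {params / 1e6:.1f}")
```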
