OpenCV 5 DNN Benchmarks
Abhishek Gola edited this page Jun 5, 2026 · 4 revisions
This page holds the complete CPU inference benchmarks for the new OpenCV 5.x DNN engine, compared against ONNX Runtime. For the summary and context, see the Performance section of the OpenCV 5 page.
- All numbers are inference time in milliseconds (lower is better).
- Columns:
  - OpenCV: the new OpenCV 5.x DNN engine.
  - ONNX Runtime (1.25.1): the official ONNX Runtime binaries published on the project's GitHub releases page.
- `error` marks a model that failed to run in that configuration; `—` marks a measurement that is not yet available.
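This page does not describe how the timings were collected. As a rough illustration only, the sketch below shows one common way to obtain a per-inference time in milliseconds: discard a few warm-up runs, then take the median of repeated wall-clock measurements. The helper name `time_inference_ms`, the warm-up and run counts, and the `dummy_forward` workload are all assumptions made for this example, not the procedure used for the tables.

```python
import statistics
import time
from typing import Callable


def time_inference_ms(run_once: Callable[[], object],
                      warmup: int = 5,
                      runs: int = 20) -> float:
    """Return the median wall-clock time of run_once() in milliseconds.

    Hypothetical helper: the warm-up and run counts are illustrative defaults.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls let caches, thread pools, and lazy initialisation settle.
    # Their timings are discarded.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional scheduling spikes than the mean.
    return statistics.median(samples)


def dummy_forward() -> int:
    # Stand-in workload so the sketch runs without a model file.
    return sum(i * i for i in range(10_000))


elapsed_ms = time_inference_ms(dummy_forward)
print(f"median: {elapsed_ms:.3f} ms")
```

To time a real model, `run_once` could wrap a forward pass, for example `net.forward()` on a network loaded with `cv2.dnn.readNet`, or `session.run(None, feed)` on an `onnxruntime.InferenceSession`. Thread counts and input shapes would need to match between the two engines for the comparison to be fair.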
| Model | OpenCV | ONNX Runtime (1.25.1) |
|---|---|---|
| BiRefNet | 7178 | 9503.1 |
| BlazeFace | 1.11 | 1.03 |
| depth_anything | 135.35 | 126.5 |
| Dinov2 | 21 | 26.5 |
| Dinov3 | 16 | 14.5 |
| NAFNet | 1826 | 1518.85 |
| NanoTrack | 0.67 | 0.71 |
| owlv2 | 1074.73 | 1542.6 |
| ppocr | 152.9 | 161.8 |
| RAFT | 1193 | 685.2 |
| RealEsrGAN | 488 | 580 |
| RetinaFACE | 23.63 | 21 |
| RTMPose | 6.1 | 4.8 |
| SAM2 ENCODER | 2350.57 | 2280.78 |
| SAM2 DECODER | 22.77 | 10.94 |
| SAM3 (Encoder) | 7753.16 | 6194.47 |
| SAMURAI (Encoder) | 1104.8 | 1074.7 |
| SAMURAI (Decoder) | 22 | 30.7 |
| Segformer | 172 | 152.6 |
| SigLip | 23.67 | 26.28 |
| SwinIR | 225.7 | 164.7 |
| trOCR (Encoder) | 120 | 106.8 |
| trOCR (Decoder) | 47.3 | 38.9 |
| U2 Net | 105 | 100 |
| X-Feat | 6.2 | 10.3 |
| yolo26m_seg | 110.2 | 105 |
| yolo26n_seg | 17 | 15.9 |
| yolo26n | 9.9 | 9.6 |
| yunet | 4.04 | 2.34 |
| RF-DETR | 93.3 | 97.4 |
| RT-DETR | 118.6 | 99.9 |
| Grounding DINO | 1845 | 1862 |
| Fast Neural Style | 14.86 | 40.89 |
| AlexNet_ONNX | 6.67 | 6.23 |
| GoogLeNet_ONNX | 4.04 | 4.58 |
| SqueezeNet_ONNX | 1.12 | 0.86 |
| DenseNet121_ONNX | 15.14 | 10.92 |
| Inception_ONNX | 3.37 | 4.27 |
| ResNet_18_v1 | 3.46 | 3.06 |
| ResNet_50_v1 | 7.28 | 6.78 |
| ResNet50_QDQ | 4.39 | 5.53 |
| MobileNetv2 | 1.35 | 1.06 |
| EfficientNet | 6.32 | 3.49 |
| VIT_B_32 | 13.99 | 12.57 |
| VIT_Base_Patch16_224 | 38.2 | 35.68 |
| DeiT_Tiny_Patch16_224 | 6.41 | 4.89 |
| MobileViT_XS | 5.81 | 5.39 |
| MobileViTv2_100 | 7.07 | 7.09 |
| BEiT_Base_Patch16_224 | 42.36 | 35.83 |
| YOLOv3 | 13.35 | 14.78 |
| YOLOv4 | 106.09 | 179.31 |
| YOLOv4_tiny | 6.94 | 6.47 |
| YOLOv5 | 9.39 | 8.67 |
| YOLOv8 | 12.42 | 12.98 |
| YOLOX | 24.16 | 23.8 |
| MobileNet_SSD_v1_ONNX | 10.56 | 7.15 |
| SFace | 1.92 | 3.87 |
| MPPalm | 1.9 | 2.12 |
| MPHand | 1.94 | 1.32 |
| MPPose | 6.17 | 2.79 |
| PPHumanSeg | 2.16 | 2.45 |
| CRNN | 15.56 | 1.79 |
| VitTrack | 1.31 | 1.04 |
| BERT | 11.33 | 11.71 |
| FacePaint | 422.39 | 478.35 |

| Model | OpenCV | ONNX Runtime (1.25.1) |
|---|---|---|
| BiRefNet | 14649.64 | 18964.991 |
| BlazeFace | 1.657 | 0.9 |
| depth_anything | 324.342 | 332.495 |
| Dinov2 | 40.463 | 72.477 |
| Dinov3 | 34.924 | 27.967 |
| NAFNet | 5231.817 | 3424.311 |
| NanoTrack | 1.71 | 1.12 |
| owlv2 | 2761.63 | 3809.29 |
| ppocr | 340.8 | 235 |
| RAFT | error | 1041.5 |
| RealEsrGAN | 1664.339 | 1394.293 |
| RetinaFACE | 73.02 | 77.54 |
| RTMPose | 15.723 | 12.87 |
| SAM2 ENCODER | 5669.38 | 7261.39 |
| SAM2 DECODER | 52.03 | 30.29 |
| SAM3 (Encoder) | 20133.3 | 15859.25 |
| SAMURAI (Encoder) | 2457.7 | 2222.2 |
| SAMURAI (Decoder) | 48.3 | 24.1 |
| Segformer | 503.91 | 453.72 |
| SigLip | 91.35 | 72.51 |
| SwinIR | 688.242 | 485.465 |
| trOCR (Encoder) | 234.3 | 217.3 |
| trOCR (Decoder) | 94.4 | 75.9 |
| U2 Net | 306.478 | 249.562 |
| X-Feat | 14.358 | 22.187 |
| yolo26m_seg | 373.5 | 391.9 |
| yolo26n_seg | 43.7 | 43.4 |
| yolo26n | 24.9 | 19 |
| yunet | 10.7 | 4.68 |
| RF-DETR | 226.6 | 227.4 |
| RT-DETR | 313.1 | 243.8 |
| Grounding DINO | 4277.66 | 4459.09 |
| Fast Neural Style | 43.759 | 75.517 |
| AlexNet_ONNX | 14.08 | 14.88 |
| GoogLeNet_ONNX | 10.44 | 11.59 |
| SqueezeNet_ONNX | 3.58 | 1.6 |
| DenseNet121_ONNX | 35.03 | 17.71 |
| Inception_ONNX | 11.02 | 10.91 |
| ResNet_18_v1 | 10.34 | 6.15 |
| ResNet_50_v1 | 18.51 | 13.28 |
| ResNet50_QDQ | 64.39 | 11.79 |
| MobileNetv2 | 3.51 | 2.11 |
| EfficientNet | 14.03 | 7.01 |
| VIT_B_32 | 33.83 | 33.12 |
| VIT_Base_Patch16_224 | 76 | 69.1 |
| DeiT_Tiny_Patch16_224 | 11.02 | 7.4 |
| MobileViT_XS | 12.13 | 10.32 |
| MobileViTv2_100 | 17.09 | 13.55 |
| BEiT_Base_Patch16_224 | 80.89 | 85.87 |
| YOLOv3 | 38.8 | 31.09 |
| YOLOv4 | 331.39 | 794.21 |
| YOLOv4_tiny | 20.18 | 21.35 |
| YOLOv5 | 22.86 | 16.44 |
| YOLOv8 | 28.82 | 25.54 |
| YOLOX | 68.33 | 83.79 |
| MobileNet_SSD_v1_ONNX | 29.75 | 15.64 |
| SFace | 5.1 | 9.15 |
| MPPalm | 4.46 | 3.4 |
| MPHand | 4.39 | 1.27 |
| MPPose | 11.73 | 5.37 |
| PPHumanSeg | 6.43 | 3.95 |
| CRNN | 33.08 | 2.99 |
| VitTrack | 3.38 | 1.1 |
| BERT | 23.92 | 18.98 |
| FacePaint | 877.13 | 1221.76 |

| Model | OpenCV | ONNX Runtime (1.25.1) |
|---|---|---|
| BiRefNet | 32483.861 | 26350.284 |
| BlazeFace | 1.755 | 0.669 |
| depth_anything | 398.525 | 554.114 |
| Dinov2 | 98.259 | 112.244 |
| Dinov3 | 86.7 | 82.6 |
| NAFNet | 3730 | 3809.2 |
| NanoTrack | 2.4 | 2.32 |
| owlv2 | 7635.68 | 9513.23 |
| ppocr | 323.7 | 273.9 |
| RAFT | 643.9 | — |
| RealEsrGAN | 1868.913 | 2771.789 |
| RetinaFACE | 166.67 | 217.91 |
| RTMPose | 33.3 | 45.1 |
| SAM2 ENCODER | 7454.963 | 8555.085 |
| SAM2 DECODER | 42.826 | 25.444 |
| SAM3 (Encoder) | error | 7222.96 |
| SAMURAI (Encoder) | 2717.2 | 3376.2 |
| SAMURAI (Decoder) | 43.5 | 28.7 |
| Segformer | 448.4 | 523 |
| SigLip | 125.5 | 130.6 |
| SwinIR | 994 | 1006.5 |
| trOCR (Encoder) | 353.4 | 409 |
| trOCR (Decoder) | 95.7 | 107.3 |
| U2 Net | 816.039 | 1005.142 |
| X-Feat | 38.21 | 36.017 |
| yolo26m_seg | 974.3 | 950.1 |
| yolo26n_seg | 85.6 | 107.5 |
| yolo26n | 55.6 | 63.4 |
| yunet | 11.95 | 12.82 |
| RF-DETR | 297.8 | 301.8 |
| RT-DETR | 408.4 | 363.8 |
| Grounding DINO | 10319.89 | 3941.3 |
| Fast Neural Style | 49.761 | 71.228 |
| AlexNet_ONNX | 10.29 | 11.89 |
| GoogLeNet_ONNX | 15.05 | 17.52 |
| SqueezeNet_ONNX | 3.8 | 3.53 |
| DenseNet121_ONNX | 38.23 | 32.36 |
| Inception_ONNX | 14.79 | 16.95 |
| ResNet_18_v1 | 13.25 | 18.18 |
| ResNet_50_v1 | 27.58 | 37.19 |
| ResNet50_QDQ | 16.52 | 11.19 |
| MobileNetv2 | 5.61 | 6.16 |
| EfficientNet | 20.67 | 23.19 |
| VIT_B_32 | 38.11 | 44.76 |
| VIT_Base_Patch16_224 | 126.94 | 143.61 |
| DeiT_Tiny_Patch16_224 | 16.62 | 13.33 |
| MobileViT_XS | 16.23 | 16.6 |
| MobileViTv2_100 | 22.34 | 23.33 |
| BEiT_Base_Patch16_224 | 131.79 | 147.36 |
| YOLOv3 | 51.76 | 72.7 |
| YOLOv4 | 482.87 | 1001.12 |
| YOLOv4_tiny | 26.76 | 37.3 |
| YOLOv5 | 34.35 | 34.26 |
| YOLOv8 | 41.13 | 54.71 |
| YOLOX | 106.91 | 144.73 |
| MobileNet_SSD_v1_ONNX | 21.34 | 17.86 |
| SFace | 5.68 | 7.71 |
| MPPalm | 5.87 | 7.08 |
| MPHand | 5.35 | 4.11 |
| MPPose | 10.71 | 6.17 |
| PPHumanSeg | 7.85 | 8.46 |
| CRNN | 29.3 | 7.71 |
| VitTrack | 2.71 | 1.45 |
| BERT | 16.84 | 17.77 |
| FacePaint | 1079.25 | 1231.11 |

| Model | OpenCV | ONNX Runtime (1.25.1) |
|---|---|---|
| BiRefNet | 13746.997 | 15144.133 |
| BlazeFace | 1.406 | 1.39 |
| depth_anything | 220.546 | 228.585 |
| Dinov2 | 37.745 | 38.512 |
| Dinov3 | 25.51 | 25.33 |
| NAFNet | 2581.816 | 2649.351 |
| NanoTrack | 1.54 | 1.61 |
| owlv2 | 2181.32 | 2427.6 |
| ppocr | 139.3 | 139.5 |
| RAFT | 4336 | 4415.1 |
| RealEsrGAN | 1138.484 | 1210.61 |
| RetinaFACE | 45.8 | 45.86 |
| RTMPose | 12.572 | 12.479 |
| SAM2 ENCODER | 3900.4 | 4072.64 |
| SAM2 DECODER | 27.18 | 27.11 |
| Segformer | 292.8 | 297.31 |
| SigLip | 81.73 | 81.61 |
| SwinIR | 351.618 | 398.654 |
| trOCR (Encoder) | 204 | 209.4 |
| trOCR (Decoder) | 58.9 | 60.1 |
| U2 Net | 309.222 | 299.946 |
| X-Feat | 24.095 | 24.149 |
| yolo26m_seg | 252.1 | 252.1 |
| yolo26n_seg | 31.7 | 31.7 |
| yolo26n | 22.7 | 21.6 |
| yunet | 5.18 | 5.17 |
| RF-DETR | 213.3 | 205.4 |
| RT-DETR | 302.8 | 301.9 |
| Grounding DINO | 7697.07 | 7740.43 |
| Fast Neural Style | 31.426 | 32.235 |
| AlexNet_ONNX | 5.85 | 5.96 |
| GoogLeNet_ONNX | 11.4 | 11 |
| SqueezeNet_ONNX | 3.47 | 3.83 |
| DenseNet121_ONNX | 39.85 | 39.75 |
| Inception_ONNX | 10.41 | 10.04 |
| ResNet_18_v1 | 7.71 | 7.75 |
| ResNet_50_v1 | 16.05 | 16.07 |
| ResNet50_QDQ | 11.27 | 11.25 |
| MobileNetv2 | 4.96 | 4.91 |
| EfficientNet | 19.89 | 20.06 |
| VIT_B_32 | 28.13 | 28.2 |
| VIT_Base_Patch16_224 | 76.68 | 76.86 |
| DeiT_Tiny_Patch16_224 | 16.94 | 17.16 |
| MobileViT_XS | 13.57 | 14.07 |
| MobileViTv2_100 | 16.59 | 16.58 |
| BEiT_Base_Patch16_224 | 82.97 | 83.49 |
| YOLOv3 | 27.26 | 27.31 |
| YOLOv4 | 238.16 | 241.8 |
| YOLOv4_tiny | 14.87 | 14.73 |
| YOLOv5 | 22.48 | 22.4 |
| YOLOv8 | 24.25 | 24.91 |
| YOLOX | 54.38 | 54.33 |
| MobileNet_SSD_v1_ONNX | 19.05 | 18.8 |
| SFace | 3.88 | 3.95 |
| MPPalm | 5.02 | 5 |
| MPHand | 4.94 | 5.03 |
| MPPose | 9.65 | 9.69 |
| PPHumanSeg | 6.15 | 5.98 |
| CRNN | 24.06 | 24.14 |
| VitTrack | 2.36 | 2.33 |
| BERT | 13.78 | 13.54 |
| FacePaint | 604.23 | 613.01 |

| Model | OpenCV | ONNX Runtime (1.25.1) |
|---|---|---|
| BiRefNet | 5812.648 | 9830.294 |
| BlazeFace | 0.94 | 3.21 |
| depth_anything | 93.037 | 240.39 |
| Dinov2 | 13.761 | 45.186 |
| Dinov3 | 14.313 | 14.594 |
| NAFNet | 2009.587 | 2794.632 |
| NanoTrack | 0.88 | 0.78 |
| owlv2 | 716.71 | 1722.6 |
| ppocr | 158.7 | 207 |
| RAFT | — | — |
| RealEsrGAN | 561.708 | 1162.763 |
| RetinaFACE | 23.23 | 27.24 |
| RTMPose | 5.828 | 7.347 |
| SAM2 ENCODER | 1813.17 | 2331.81 |
| SAM2 DECODER | 18.44 | 12.99 |
| SAM3 (Encoder) | 7619.52 | — |
| SAMURAI (Encoder) | 979.3 | 1246.1 |
| SAMURAI (Decoder) | 19.8 | 16.3 |
| Segformer | 170.09 | 307.94 |
| SigLip | 31.11 | 73.06 |
| SwinIR | 249.553 | 270.99 |
| trOCR (Encoder) | — | — |
| trOCR (Decoder) | — | — |
| U2 Net | 102.212 | 161.191 |
| X-Feat | 6.226 | 11.463 |
| yolo26m_seg | 110.8 | 154.9 |
| yolo26n_seg | 20.5 | 40.6 |
| yolo26n | 13.6 | 30.7 |
| yunet | 4.87 | 5.16 |
| RF-DETR | 73.5 | 176.9 |
| RT-DETR | 117.9 | 154.1 |
| Grounding DINO | 1690.48 | 2986.07 |
| Fast Neural Style | 15.449 | 27.634 |
| AlexNet_ONNX | 6.21 | 13.45 |
| GoogLeNet_ONNX | 3.82 | 8.66 |
| SqueezeNet_ONNX | 1.42 | 2.62 |
| DenseNet121_ONNX | 22.41 | 27.09 |
| Inception_ONNX | 3.59 | 6.76 |
| ResNet_18_v1 | 3.25 | 5.11 |
| ResNet_50_v1 | 8.1 | 6.73 |
| ResNet50_QDQ | 3.69 | 7.13 |
| MobileNetv2 | 2.13 | 3.48 |
| EfficientNet | 10.02 | 7.26 |
| VIT_B_32 | 11.93 | 17.29 |
| VIT_Base_Patch16_224 | 27.69 | 55.4 |
| DeiT_Tiny_Patch16_224 | 5.21 | 10.4 |
| MobileViT_XS | 6.45 | 14.86 |
| MobileViTv2_100 | 7.92 | 15.72 |
| BEiT_Base_Patch16_224 | 29.48 | 62.29 |
| YOLOv3 | 13.72 | 24.78 |
| YOLOv4 | 121.24 | 338.87 |
| YOLOv4_tiny | 7.48 | 12.94 |
| YOLOv5 | 12.03 | 28.4 |
| YOLOv8 | 14.65 | 21.95 |
| YOLOX | 25.37 | 38.93 |
| MobileNet_SSD_v1_ONNX | 14.27 | 14.87 |
| SFace | 1.82 | 3.43 |
| MPPalm | 2.04 | 8.7 |
| MPHand | 2.65 | 2.43 |
| MPPose | 6.01 | 10.81 |
| PPHumanSeg | 2.45 | 5.02 |
| CRNN | 12.28 | 2.62 |
| VitTrack | 1.36 | 3.18 |
| BERT | 10.95 | 25.74 |
| FacePaint | 347.4 | 652.3 |
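
Because lower is better, a convenient way to read a row is the ratio of the ONNX Runtime time to the OpenCV time: values above 1 mean OpenCV was faster for that model. The sketch below parses a few rows copied from the tables on this page and computes that ratio, treating `error` and `—` cells as missing. The function names are hypothetical and the ratio is just one possible summary, not a metric defined by this page.

```python
from typing import Optional, Tuple

# Rows copied verbatim from the tables on this page.
ROWS = """\
| BiRefNet | 14649.64 | 18964.991 |
| BlazeFace | 1.657 | 0.9 |
| RAFT | error | 1041.5 |
| RAFT | 643.9 | — |
"""


def parse_ms(cell: str) -> Optional[float]:
    """Convert a table cell to milliseconds, or None for `error` / `—`."""
    cell = cell.strip()
    if cell in ("error", "—"):
        return None
    return float(cell)


def parse_row(line: str) -> Tuple[str, Optional[float], Optional[float]]:
    """Split one markdown table row into (model, opencv_ms, ort_ms)."""
    model, opencv, ort = (c.strip() for c in line.strip().strip("|").split("|"))
    return model, parse_ms(opencv), parse_ms(ort)


def ort_over_opencv(opencv_ms: Optional[float],
                    ort_ms: Optional[float]) -> Optional[float]:
    """Ratio above 1 means OpenCV was faster; None if either value is missing."""
    if opencv_ms is None or ort_ms is None:
        return None
    return ort_ms / opencv_ms


results = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for model, opencv_ms, ort_ms in results:
    ratio = ort_over_opencv(opencv_ms, ort_ms)
    label = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{model:<10} {label}")
```

Rows with a missing value are reported as `n/a` rather than dropped, so a failed or pending measurement stays visible in any summary built on top of this.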
Back to the OpenCV 5 release notes.