Up to 50% longer inference time for ONNX model on the same hardware, compiled the same way. What can be the reason? #23223
Docker and WSL are not perfect candidates for performance benchmarking. Neither has full access to the system resources. So this is not an OpenCV issue. |
I guess you missed the part where I wrote "I've tested it also on Windows. This time I wrote a Python script. I have two Python environments, in one I get 25 FPS in the other it is 15 FPS." Both times I compiled the OpenCV Python bindings the same way (the same set of options). So I've already excluded the system, compiler, language, OpenCV version, and CUDA version. I will work on reproducing, I just hoped that maybe someone has an idea what the issue can be here. |
Hi @TrueWodzu, can you try to reproduce this issue with the OpenCV 4.x branch? |
Hi @zihaomu, thank you for your interest. The same issue happens on the 4.x branch. I did some more tests: I took the compiled application which runs slower on Docker and moved it to WSL, and the application was running faster under WSL. I've measured GPU clocks and I can see that under Docker the GPU is less utilized while the application runs:
(GPU clock screenshots for WSL and Docker were attached here.)
See how on WSL the clock rises to 1875 MHz while on Docker it stays at 1005 MHz? However, I want to stress that in my opinion it is not a Docker issue. I've run Nvidia examples on both systems and their performance is exactly the same. |
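For reference, a minimal way to log the SM clock and utilization while the benchmark runs is sketched below; it uses the pynvml package and is an illustration only, not the exact tooling used for the readings above.

```python
# Sketch: poll GPU SM clock and utilization once per second (pynvml package
# assumed installed; not the exact tool used for the readings in this thread).
import time
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetClockInfo, nvmlDeviceGetUtilizationRates,
                    NVML_CLOCK_SM)

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)  # first GPU
try:
    for _ in range(30):  # sample for ~30 s while inference is running
        sm_clock = nvmlDeviceGetClockInfo(handle, NVML_CLOCK_SM)   # MHz
        gpu_util = nvmlDeviceGetUtilizationRates(handle).gpu       # percent
        print(f"SM clock: {sm_clock} MHz, GPU utilization: {gpu_util}%")
        time.sleep(1)
finally:
    nvmlShutdown()
```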
Question 1: Could you share the result of the following script? (WSL2 and Docker)

```python
import cv2
print(cv2.getBuildInformation())
```

Question 2: Could you share the benchmark script and ONNX model? |
Answer 1
I won't be sending the second config because it is identical; I diffed it.
Answer 2
See the updated "steps to reproduce" section. |
@alalek Since I've completed the missing steps, is this issue now viable to look into? |
I was able to recreate this on a different machine. My laptop has an RTX 2060; I've also recreated this on a GTX 1650. |
I could reproduce on the same hardware.
Software
Hardware
Execution time (TrueWodzu's code)
Ubuntu 22.04 (native)
Ubuntu 22.04 (docker container)
|
@atinfinity Thank you so much for your time. Just to make sure, you did this without the Windows and WSL layer? You did this on a pure Ubuntu system? |
I also checked inference of YOLOv4 (Darknet).
In this case, there is no difference in execution time between "Ubuntu 22.04 (native)" and "Ubuntu 22.04 (docker container)". |
I tried on a pure Ubuntu system. |
I tried to use the cuDNN sample (mnistCUDNN):

```
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
make
./mnistCUDNN
```

As a result, CUDA kernel processing time is slower than on Ubuntu 22.04 (native). |
@atinfinity I've run the mnistCUDNN example but I don't see any time differences. Also, I think these tests are way too short to draw any conclusions. |
@TrueWodzu The 50% performance degradation is very similar to the phenomenon in my issue. |
@ZJDATY I've read your post, and I see you have put a lot of energy into finding/reproducing the problem. The thing is, in my case this happens on the GPU, while your case is on the CPU. |
I just stumbled across an issue that might be very similar, or might be the reason. The same inference on 4.8.0 compared to 4.5.2 is sometimes slower. I traced it down to 4.5.2 using a maximum of 553.6 MB RAM while 4.8.0 is using 1.89 GB RAM. Could anybody seeing the problem be swapping? Is the very high RAM usage a known issue? |
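To check the swapping theory, the process footprint can be logged around the forward pass; a minimal sketch using the psutil package (the model path is a placeholder, and this is not code from the report above):

```python
# Sketch: log resident memory and swap usage around a DNN forward pass
# (psutil package assumed; the model path is a placeholder).
import psutil
import numpy as np
import cv2

proc = psutil.Process()
print("RSS before load: %.1f MB" % (proc.memory_info().rss / 1e6))

net = cv2.dnn.readNetFromONNX("model.onnx")   # placeholder model
blob = cv2.dnn.blobFromImage(np.zeros((640, 640, 3), np.uint8),
                             1 / 255.0, (640, 640))
net.setInput(blob)
net.forward()
print("RSS after first forward: %.1f MB" % (proc.memory_info().rss / 1e6))
print("System swap in use: %.1f MB" % (psutil.swap_memory().used / 1e6))
```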
@ukoehler, please add a link to the model. If possible, provide per-layer timings using getPerfProfile on both 4.5.2 and 4.8.0. This might help determine which layer has a regression.

```cpp
std::vector<double> timings;
net.getPerfProfile(timings);
std::vector<String> names = net.getLayerNames();
CV_Assert(names.size() == timings.size());
for (int i = 0; i < names.size(); ++i)
{
    Ptr<dnn::Layer> l = net.getLayer(net.getLayerId(names[i]));
    std::cout << names[i] << " " << l->type << " " << timings[i] << std::endl;
}
```
I just ran more tests and see an increase from 1.368 s for version 4.5.2 to 4.351 s for version 4.8.0. This version is just collecting show-stopper bugs. |
System Information
OpenCV version: 4.6.0
Operating System / Platform: Ubuntu 20.04 / Windows / WSL2 / Docker
Compiler & compiler version: GCC 9/11, MSVC 2017/2019; Python 3.10
CUDA: 11.4, 11.8
Detailed description
A word of preface:
I am observing up to 50% longer inference times in one environment compared to the other. Both environments run on the same hardware and OpenCV has been compiled the same way. What is interesting is that the behaviour is consistent regardless of operating system.
For example: I wrote a C++ executable and ran it on a WSL2 image (Ubuntu 20.04/CUDA 11.8) where my inference rate is 25 FPS, and I have a Docker image (Ubuntu 20.04/CUDA 11.8) where I compiled OpenCV exactly the same way and the inference rate is 15 FPS.
I've also tested it on Windows. This time I wrote a Python script. I have two Python environments; in one I get 25 FPS, in the other 15 FPS.
So the problem is not within my code, and it does not depend on the operating system. Each time I used the same OpenCV version with the same set of options. I suspect that maybe during compilation of OpenCV something is sometimes compiled differently?
I've dug deeper and profiled OpenCV; here are the results:
This is the major difference, so I know the problem lies within dnn. I looked at the source and discovered that I can time "layers", so I did that.
Here are the top 20 worst times (seconds) where inference is slower:
And here is where it is faster.
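For reference, per-layer timings of this kind can also be collected from the Python bindings; a minimal sketch (the model path, input size, and CUDA target are assumptions based on the setup described in this issue):

```python
# Sketch: print the 20 slowest layers reported by getPerfProfile
# (model path and input size are placeholders).
import numpy as np
import cv2

net = cv2.dnn.readNetFromONNX("yolov5m.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

blob = cv2.dnn.blobFromImage(np.zeros((640, 640, 3), np.uint8),
                             1 / 255.0, (640, 640))
net.setInput(blob)
net.forward()  # run once so per-layer timings are populated

total, timings = net.getPerfProfile()          # values are in ticks
names = net.getLayerNames()
freq = cv2.getTickFrequency()
worst = sorted(zip(names, timings.ravel()), key=lambda x: -x[1])[:20]
for name, ticks in worst:
    print(f"{name}: {ticks / freq:.6f} s")
print(f"total: {total / freq:.6f} s")
```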
I am not sure if this is a bug or not, but the drop in performance is quite serious and it would be good to know what can cause it, so it can be documented.
Steps to reproduce
This happens for any YOLOv5 model converted to ONNX. The difference can be observed on any model; the bigger the model, the bigger the difference. On my machine I can reproduce this every time by installing a fresh WSL2 image and a fresh Docker image based on Ubuntu 20.04. However, as said earlier, Docker is not the problem here, nor is the operating system or compiler.
Build settings:
Benchmarking code:
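A minimal sketch of this kind of FPS measurement (not the exact script from the report; the model path, input size, and iteration counts are placeholders):

```python
# Sketch of the FPS measurement described above (placeholder model path,
# input size, and iteration counts; not the original benchmarking script).
import time
import numpy as np
import cv2

net = cv2.dnn.readNetFromONNX("yolov5m.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

blob = cv2.dnn.blobFromImage(np.zeros((640, 640, 3), np.uint8),
                             1 / 255.0, (640, 640))
net.setInput(blob)
for _ in range(10):      # warm-up so CUDA kernel selection does not skew timing
    net.forward()

runs = 200
start = time.perf_counter()
for _ in range(runs):
    net.forward()
elapsed = time.perf_counter() - start
print(f"{runs / elapsed:.1f} FPS ({elapsed / runs * 1000.0:.2f} ms per inference)")
```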
Of course there will be some fluctuations in the times, but they will be very small.
You can obtain the model from here.