cv2 much slower as you increase number of cores. Server and Personal Laptop comparison. #11107

Closed
shubhvachher opened this issue Mar 18, 2018 · 16 comments
Labels
question (invalid tracker) ask questions and other "no action" items here: https://forum.opencv.org

Comments

shubhvachher commented Mar 18, 2018

System information (version)

Server :

  • OpenCV => 3.3.1
  • Operating System / Platform => Ubuntu 64 Bit
  • Using Python Anaconda fresh install

Personal Laptop :

  • OpenCV => 3.1
  • Operating System / Platform => Ubuntu 64 Bit
  • Python Anaconda but quite an old install. Works perfectly.
Detailed description

Running a Python script on a server with 8 Intel Kaby Lake cores (and 2 Nvidia Ti GPUs) was slower than on my personal laptop (4 Haswell(?) cores)! Profiling showed that most of the time was spent in cv2 functions, and the effect is most visible with the cv2.findContours function.

I tried limiting python to 1 core using
taskset -c 1 python <program-name>.py
and the server blew my personal PC away (roughly 2x faster).

Allowing two cores using taskset -c 1,2 python ... massively hurt the program's performance on the server, and also reduced performance on my laptop (though not nearly as much as on the server).

On two cores my server was 1.5 times slower than my personal PC.

I have given an example with cv2.findContours below :

Steps to reproduce

Fresh install of Anaconda, followed by conda install opencv.

hand.png (attached test image)

# Running in ipython
import cv2
image = cv2.imread("hand.png", 0)  # 0 = load as grayscale; findContours expects a single-channel image
%timeit -n5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

Run ipython with
taskset -c 1 ipython

and then
taskset -c 1,2 ipython

and even just
ipython ; #Uses all available cores
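
For repeating the same measurement outside IPython, a minimal sketch of a standalone benchmark could look like the following. The helper names (time_call, sweep_threads, run_findcontours_benchmark) are made up for illustration; only cv2.imread, cv2.findContours and cv2.setNumThreads are OpenCV calls. It times the call once per requested OpenCV thread count, so it can be combined with taskset to separate the effect of CPU affinity from the effect of OpenCV's own thread pool.

```python
import timeit


def time_call(func, number=1000, repeat=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number


def sweep_threads(func, set_threads, thread_counts, number=1000):
    """Time `func` once per thread count; returns {thread_count: seconds_per_call}."""
    results = {}
    for n in thread_counts:
        set_threads(n)  # e.g. cv2.setNumThreads
        results[n] = time_call(func, number=number)
    return results


def run_findcontours_benchmark(path="hand.png", thread_counts=(1, 2, 4, 8)):
    """Hypothetical driver: assumes OpenCV is installed and `path` is readable."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    image = cv2.imread(path, 0)  # load as grayscale
    if image is None:
        raise FileNotFoundError(path)

    def call():
        # The return value is not unpacked, so this works on OpenCV 3.x and 4.x.
        return cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    timings = sweep_threads(call, cv2.setNumThreads, thread_counts)
    for n, seconds in timings.items():
        print(f"{n} thread(s): {seconds * 1e6:.0f} µs per call")
    return timings
```

Calling run_findcontours_benchmark() once under taskset -c 1 and once without taskset should show whether the slowdown follows the affinity mask or the OpenCV thread count.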

Some results on my server :

$ ipython
Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cv2

In [2]: image = cv2.imread("hand.png", 0)

In [3]: _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

In [4]: %timeit -n 5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
445 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 5000 loops each)

In [5]: exit

$ taskset -c 1 ipython
...same as above

In [4]: %timeit -n 5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
386 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 5000 loops each)

In [5]: exit

$ taskset -c 1,2 ipython
...
In [4]: %timeit -n 5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
385 µs ± 851 ns per loop (mean ± std. dev. of 7 runs, 5000 loops each)

In [5]: exit

$ taskset -c 1,2,3,4 ipython
...
In [4]: %timeit -n 5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
388 µs ± 824 ns per loop (mean ± std. dev. of 7 runs, 5000 loops each)

In [5]: exit

(old_camera) tsinghuapcg@tsinguapcg-kabylake-nvidiati:~/shubh$ ipython
...
In [4]: _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

In [5]: %timeit -n 5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
448 µs ± 7.2 µs per loop (mean ± std. dev. of 7 runs, 5000 loops each)

On my personal laptop :


$ taskset -c 1 ipython
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:53:06) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import cv2

In [2]: image = cv2.imread("hand.png", 0)

In [3]: image
Out[3]: 
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

In [4]: _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

In [5]: %timeit _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
1000 loops, best of 3: 648 µs per loop

In [11]: %timeit -n5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
5000 loops, best of 3: 653 µs per loop

In [12]: exit

$ taskset -c 1,2 ipython
...
In [4]: %timeit -n5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
5000 loops, best of 3: 714 µs per loop

In [5]: exit

$ ipython
...
In [4]: %timeit -n5000 _, cnt, hier = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
5000 loops, best of 3: 1.15 ms per loop

Another example :

    import cv2
    import numpy as np

    # img is a BGR image, e.g. as returned by cv2.imread()
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    backgroundRange = np.array([[30, 0, 30], [255, 70, 255]])  # take6 white (low saturation) background
    backgroundMask = cv2.inRange(img_hsv, backgroundRange[0], backgroundRange[1])

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray_masked = cv2.bitwise_and(gray, gray, mask = np.invert(backgroundMask))  # Remove background from image

    pinkRange = np.array([[165, 110, 120] , [175, 160, 255]])
    greenRange = np.array([[33, 160, 110] , [38, 230, 255]])
    #yellowRange = np.array([[32, 135, 165] , [36, 150, 180]])  # Not used in take6. Not updated.
    orangeRange = np.array([[13, 165, 100] , [20, 255, 255]])

Running the above code over 895 files gives these timings:
3.6901626586914062 : Laptop with only 1 core
4.2052507400512695 : Laptop with 2 cores active

2.1227364540100098 : Server with only 1 core
20.15206742286682 : Server with 2 cores active
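
To make the comparison explicit, the four timings above can be turned into slowdown ratios. This is only arithmetic on the reported numbers; the dictionary layout and the function name slowdown are illustrative, and the values are taken as printed, in whatever unit the original timing code used.

```python
# Timings copied from the report, keyed by (machine, active cores).
timings = {
    ("laptop", 1): 3.6901626586914062,
    ("laptop", 2): 4.2052507400512695,
    ("server", 1): 2.1227364540100098,
    ("server", 2): 20.15206742286682,
}


def slowdown(machine):
    """How many times slower the 2-core run is than the 1-core run."""
    return timings[(machine, 2)] / timings[(machine, 1)]


for machine in ("laptop", "server"):
    print(f"{machine}: 2 cores is {slowdown(machine):.2f}x the 1-core time")
```

On these numbers the laptop gets about 1.14x slower when the second core is enabled, while the server gets about 9.5x slower.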

Any help would be appreciated!

Is this a problem with the cores of my server not being able to communicate at full speed?

Are there any free speedup methods for inter core communication or something?

@shubhvachher shubhvachher changed the title cv2 much slower as you increase number of cores. Server vs Personal Laptop comparison. cv2 much slower as you increase number of cores. Server and Personal Laptop comparison. Mar 18, 2018
@shubhvachher
Author

Can anyone verify this issue on their machine? I'm unsure whether this is just a problem with my two systems.

@mshabunin
Contributor

@shubhvachher, OpenCV's internal parallel execution probably does not know about the current CPU affinity and will spawn 8 threads even if you enable only 1 core. Try experimenting with the getNumThreads() and setNumThreads() functions.
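
One way to experiment with this, sketched under the assumption of a Linux host: read the affinity mask of the current process and hand its size to cv2.setNumThreads. The function names usable_cpu_count and match_opencv_threads_to_affinity are hypothetical; os.sched_getaffinity, cv2.setNumThreads and cv2.getNumThreads are real calls. Whether a given backend already respects the affinity mask on its own is something to check rather than assume.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may run on (respects taskset on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 means the current process
    return os.cpu_count() or 1  # fallback where affinity is not exposed


def match_opencv_threads_to_affinity():
    """Set OpenCV's thread count to the affinity size; returns the new value."""
    import cv2  # imported lazily; assumes OpenCV is installed

    cv2.setNumThreads(usable_cpu_count())
    return cv2.getNumThreads()
```

Under taskset -c 1, usable_cpu_count() returns 1 even though cv2.getNumberOfCPUs() may still report every core in the machine.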

Also, pay attention to the parallel execution backend that is actually being used: on Linux we have pthreads and TBB, each with its own specifics (check the output of cv2.getBuildInformation()). The same backend must be used on both machines to get consistent results.
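
Since the build information is long, a small helper can pull out just the backend line. This is a sketch that assumes the report contains a line starting with "Parallel framework:", as the OpenCV 3.3.1 output in this thread does; the function name parallel_framework is made up for illustration.

```python
import re


def parallel_framework(build_info):
    """Return the value of the 'Parallel framework' line, or None if absent."""
    match = re.search(
        r"^[ \t]*Parallel framework:[ \t]*(.*?)[ \t]*$",
        build_info,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Typical use, assuming OpenCV is installed:
#   import cv2
#   print(parallel_framework(cv2.getBuildInformation()))
```

Running this on both machines gives a quick check that they use the same backend before comparing timings.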

Finally, you should try the latest OpenCV version (3.4.1); it has some improvements in this area, e.g. #10691.

@shubhvachher
Author

Hey, thanks for the direction. Here is the output on the server, which shows a massive slowdown when using more than 1 CPU:

`All 8 CPUs`

$ ipython
Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cv2

In [2]: cv2.getNumThreads()
Out[2]: 8

In [3]: cv2.getNumberOfCPUs()
Out[3]: 8

In [6]: print(cv2.getBuildInformation())

General configuration for OpenCV 3.3.1 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            /tmp/build/80754af9/opencv_1512687413662/work/opencv_contrib-3.3.1/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2017-12-07T23:03:41Z
    Host:                        Linux 4.4.0-62-generic x86_64
    CMake:                       3.9.4
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/gmake
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2
      SSE4_1 (3 files):          + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (5 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (8 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

  C/C++:
    Built as dynamic libs?:      YES
    C++11:                       YES
    C++ Compiler:                /home/t/anaconda3/envs/old_camera/bin/x86_64-conda_cos6-linux-gnu-c++  (ver 7.2.0)
    C++ flags (Release):         -fvisibility-inlines-hidden -std=c++11 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/t/anaconda3/envs/old_camera/include -I/home/t/anaconda3/envs/old_camera/include   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections  -msse -msse2 -msse3 -fopenmp -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fvisibility-inlines-hidden -std=c++11 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/t/anaconda3/envs/old_camera/include -I/home/t/anaconda3/envs/old_camera/include   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections  -msse -msse2 -msse3 -fopenmp -g  -DDEBUG -D_DEBUG
    C Compiler:                  /home/t/anaconda3/envs/old_camera/bin/x86_64-conda_cos6-linux-gnu-cc
    C flags (Release):           -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/t/anaconda3/envs/old_camera/include -I/home/t/anaconda3/envs/old_camera/include   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections  -msse -msse2 -msse3 -fopenmp -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/home/t/anaconda3/envs/old_camera/include -I/home/t/anaconda3/envs/old_camera/include   -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections  -msse -msse2 -msse3 -fopenmp -g  -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/t/anaconda3/envs/old_camera/lib -L/home/t/anaconda3/envs/old_camera/lib
    Linker flags (Debug):        -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/t/anaconda3/envs/old_camera/lib -L/home/t/anaconda3/envs/old_camera/lib
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          dl m pthread rt
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 core flann hdf imgproc ml objdetect phase_unwrapping photo plot reg surface_matching video xphoto bgsegm dnn face freetype fuzzy img_hash imgcodecs shape videoio xobjdetect highgui superres bioinspired dpm features2d line_descriptor saliency text calib3d ccalib datasets rgbd stereo structured_light tracking videostab xfeatures2d ximgproc aruco optflow stitching python3
    Disabled:                    js world contrib_world
    Disabled by dependency:      -
    Unavailable:                 cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev java python2 ts viz cnn_3dobj cvv dnn_modern matlab sfm

  GUI: 
    QT:                          NO
    GTK+:                        NO
    GThread :                    NO
    GtkGlExt:                    NO
    OpenGL support:              NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /home/t/anaconda3/envs/old_camera/lib/libz.so (ver 1.2.11)
    JPEG:                        /home/t/anaconda3/envs/old_camera/lib/libjpeg.so (ver 90)
    WEBP:                        build (ver encoder: 0x020e)
    PNG:                         /home/t/anaconda3/envs/old_camera/lib/libpng.so (ver 1.6.32)
    TIFF:                        /home/t/anaconda3/envs/old_camera/lib/libtiff.so (ver 42 - 4.0.9)
    JPEG 2000:                   /home/t/anaconda3/envs/old_camera/lib/libjasper.so (ver 1.900.1)
    OpenEXR:                     build (ver 1.7.1)
    GDAL:                        NO
    GDCM:                        NO

  Video I/O:
    DC1394 1.x:                  NO
    DC1394 2.x:                  NO
    FFMPEG:                      YES
      avcodec:                   YES (ver 57.107.100)
      avformat:                  YES (ver 57.83.100)
      avutil:                    YES (ver 55.78.100)
      swscale:                   YES (ver 4.8.100)
      avresample:                NO
    GStreamer:                   NO
    OpenNI:                      NO
    OpenNI PrimeSensor Modules:  NO
    OpenNI2:                     NO
    PvAPI:                       NO
    GigEVisionSDK:               NO
    Aravis SDK:                  NO
    UniCap:                      NO
    UniCap ucil:                 NO
    V4L/V4L2:                    YES/YES
    XIMEA:                       NO
    Xine:                        NO
    Intel Media SDK:             NO
    gPhoto2:                     NO

  Parallel framework:            OpenMP

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Use Intel IPP:               2017.0.3 [2017.0.3]
               at:               /tmp/build/80754af9/opencv_1512687413662/work/build/3rdparty/ippicv/ippicv_lnx
    Use Intel IPP IW:            sources (2017.0.3)
                  at:            /tmp/build/80754af9/opencv_1512687413662/work/build/3rdparty/ippicv/ippiw_lnx
    Use VA:                      NO
    Use Intel VA-API/OpenCL:     NO
    Use Lapack:                  NO
    Use Eigen:                   YES (ver 3.3.3)
    Use Cuda:                    NO
    Use OpenCL:                  NO
    Use OpenVX:                  NO
    Use custom HAL:              NO

  Python 2:
    Interpreter:                 NO

  Python 3:
    Interpreter:                 /home/t/anaconda3/envs/old_camera/bin/python (ver 3.6.3)
    Libraries:                   /home/t/anaconda3/envs/old_camera/lib/libpython3.6m.so (ver 3.6.3)
    numpy:                       /home/t/anaconda3/envs/old_camera/lib/python3.6/site-packages/numpy/core/include (ver 1.9.3)
    packages path:               /home/t/anaconda3/envs/old_camera/lib/python3.6/site-packages

  Python (for build):            /home/t/anaconda3/envs/old_camera/bin/python

  Java:
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Matlab:                        NO

  Tests and samples:
    Tests:                       NO
    Performance tests:           NO
    C/C++ Examples:              NO

  Install path:                  /home/t/anaconda3/envs/old_camera

  cvconfig.h is in:              /tmp/build/80754af9/opencv_1512687413662/work/build
-----------------------------------------------------------------



In [7]: exit

`Only 1 CPU using taskset`

$ taskset -c 1 ipython
Python 3.6.4 |Anaconda, Inc.| (default, Mar 13 2018, 01:15:57) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cv2

In [2]: cv2.getNumThreads()
Out[2]: 1

In [3]: cv2.getNumberOfCPUs()
Out[3]: 8

In [5]: print(cv2.getBuildInformation())

...same build information as above (Parallel framework: OpenMP)

@alalek
Member

alalek commented May 31, 2018

Parallel framework: OpenMP

OpenMP uses an active-threads approach: worker threads waste CPU resources while idle (for several ms after parallel processing). Refer to the OMP_WAIT_POLICY documentation.
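
A minimal sketch of trying this from Python: OMP_WAIT_POLICY is a standard OpenMP environment variable, and it has to be set before the OpenMP runtime starts, which in practice means before the first import of cv2. The helper name request_passive_omp_wait is hypothetical. Reports in this thread are mixed on whether the setting helps, so treat it as an experiment.

```python
import os


def request_passive_omp_wait(env=os.environ):
    """Ask OpenMP to let idle workers sleep instead of spinning.

    Leaves an existing OMP_WAIT_POLICY untouched and returns the value in effect.
    """
    env.setdefault("OMP_WAIT_POLICY", "PASSIVE")
    return env["OMP_WAIT_POLICY"]


# Call this at the very top of the script, before `import cv2`:
#   request_passive_omp_wait()
#   import cv2
```

The same effect can be had from the shell by prefixing the command, e.g. OMP_WAIT_POLICY=PASSIVE python <program-name>.py.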

@adityapatadia

Tried OMP_WAIT_POLICY=PASSIVE. It does not have any impact.

@shubhvachher
Author

Well... One workaround is to install opencv using the pip distribution. It uses
Parallel framework: pthreads

@adityapatadia

Tried the same. It suffers from the same problem. I tried this one: https://pypi.org/project/opencv-contrib-python/

@shubhvachher
Author

shubhvachher commented Jan 7, 2019

Nope. Just pip install opencv-python

https://pypi.org/project/opencv-python/

@adityapatadia

Sadly that does not have some of the functions I need. For example, the medianflow tracker.

@shubhvachher
Author

Ah, I see your package is just an extension of the above. I'm unsure why it doesn't have comparable multi-core performance. Try cv2.getBuildInformation() in your current installation and check which parallel framework is being used.

@adityapatadia

It's pthreads. I still see degraded performance on the machine with more cores.

@alalek
Member

alalek commented Jan 8, 2019

@adityapatadia Did you observe the problem with the OpenCV performance tests (they have a --perf_threads=<N> option to limit the number of threads used)?
If yes, then please file a separate issue with a description of your configuration and attached logs (Google Test .xml files via --gtest_output=xml:result.xml).

@phausamann

FWIW, OMP_WAIT_POLICY=PASSIVE has solved the problem in my case (pupil-detectors library built against OpenCV 4.2.0 with OpenMP as parallel framework).

@lbf4616

lbf4616 commented Jun 5, 2020

Same issue here, using OpenVINO-OpenCV with TBB. An E5 (40 threads) is two times slower than an i7 8700 for the cv2.HoughLinesP function.

@rivergold

I ran into the same issue. Using OpenCV 4.2.0 from conda-forge, resize is sometimes much slower:
usually 7 ms to resize 1920 * 1080 -> 512 * 512,
sometimes more than 40 ms.
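
Occasional slow calls like this are hidden by an average, so it can help to time every call separately and look at the tail. The following is a generic sketch using only the standard library; the function name latency_profile and the chosen summary fields are illustrative.

```python
import statistics
import time


def latency_profile(func, calls=200):
    """Time each call on its own and summarize the distribution in milliseconds."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }
```

For the resize case it could be called as latency_profile(lambda: cv2.resize(frame, (512, 512))), where frame is assumed to be a 1920 * 1080 image that is already loaded; a large gap between median_ms and max_ms would match the behaviour described here.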

@harshilbhavsar7

I also ran into the same issue: I am getting far too much delay (40 sec) on an Amazon EC2 medium-size instance compared to my personal laptop.
