cv2 much slower as you increase number of cores. Server and Personal Laptop comparison. #11107
Comments
Can anyone verify this issue on their machine? I'm unsure if this is just a problem with my two systems.
@shubhvachher, OpenCV's internal parallel execution probably does not know about the current affinity and will spawn 8 threads even if you enable only 1 core. Try to experiment with Also, you should pay attention to the actual parallel execution backend being used: on Linux we have pthreads and TBB, each with its own specifics (check output of Finally, you should try the latest OpenCV version (3.4.1); it has some improvements in this area, e.g. #10691.
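A minimal sketch of the checks suggested above, for anyone who wants to try them. It assumes the relevant calls are `cv2.getBuildInformation()`, `cv2.setNumThreads()` and `cv2.getNumThreads()` (that is my assumption, the comment does not name them here), and the helper names `usable_cpu_count` and `parallel_framework` are made up for illustration.

```python
import os


def usable_cpu_count() -> int:
    """CPUs this process may actually run on (honors taskset / affinity)."""
    try:
        return len(os.sched_getaffinity(0))  # available on Linux
    except AttributeError:
        return os.cpu_count() or 1  # fallback: all CPUs in the machine


def parallel_framework(build_info: str) -> str:
    """Pull the 'Parallel framework' entry out of OpenCV build-information text."""
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Parallel framework":
            return value.strip()
    return "unknown"


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        cv2 = None

    n = usable_cpu_count()
    print("usable CPUs:", n)
    if cv2 is not None:
        print("parallel backend:", parallel_framework(cv2.getBuildInformation()))
        # Cap OpenCV's worker threads at what the affinity mask allows.
        cv2.setNumThreads(n)
        print("OpenCV threads now:", cv2.getNumThreads())
```

Running this under `taskset -c 1` versus plain `python` should show whether the thread count OpenCV ends up with tracks the cores the process is allowed to use.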
Hey, thanks for the direction. Here is the output on the server, which slows down massively with more than 1 CPU:
OpenMP uses an active-threads approach: worker threads waste CPU resources when idle (for several ms after parallel processing). Refer to
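One way to experiment with this, as a sketch only: the standard OpenMP environment variable `OMP_WAIT_POLICY` asks idle worker threads to sleep rather than spin. Whether this is the setting referred to above is my assumption, and it only matters if the OpenCV build in use really is backed by OpenMP.

```python
import os

# Assumption: OpenCV here is built with OpenMP. The OpenMP runtime reads this
# variable once at startup, so it must be set before the first `import cv2`.
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"  # idle workers sleep instead of spinning

try:
    import cv2
except ImportError:
    cv2 = None  # nothing to configure without OpenCV installed

if cv2 is not None:
    print("OpenCV threads:", cv2.getNumThreads())
```

Setting the variable in the shell (`OMP_WAIT_POLICY=PASSIVE python script.py`) has the same effect and avoids any import-order concerns.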
Tried
Well... One workaround is to install opencv using the pip distribution. It uses
Tried the same. It suffers from the same problem. I tried this one: https://pypi.org/project/opencv-contrib-python/
Nope. Just
Sadly that does not have some of the functions I need. For example, the medianflow tracker.
Ah, I see your package is just an extension of the above... I'm unsure why it doesn't have comparable multi-core performance. Try
It's pthread. I still see degraded performance on the higher-core machine.
@adityapatadia Did you observe the problem via OpenCV performance tests (they have
FWIW,
Same issue here, using OpenVINO OpenCV with TBB. An E5 (40 threads) is two times slower than an i7 8700 on the cv2.HoughLinesP function.
I met the same issue. Using opencv 4.2.0 from conda-forge, the
I also hit the same issue: I'm getting too much delay (40 sec) on an Amazon EC2 medium-size instance compared to my personal laptop.
System information (version)
Server :
Personal Laptop :
Detailed description
Running a Python script on a server with 8 Intel Kaby Lake cores (and 2 Nvidia Ti GPUs) was slower than on my personal laptop (4 Haswell(?) cores)! Profiling the code showed that the problem was mostly in the cv2 functions, and it shows up most strongly in the cv2.findContours function.
I tried limiting Python to 1 core using
taskset -c 1 python <program-name>.py
and the server blew my personal PC away (almost 2 times faster).
Allowing two cores using
taskset -c 1,2 python ...
massively hurt the program's performance on the server, while also reducing performance on my laptop (but not nearly as much as on the server). On two cores my server was 1.5 times slower than my personal PC.
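For reference, the same pinning can be done from inside Python instead of through taskset. This is a minimal sketch using `os.sched_setaffinity`, which is Linux-only; the helper name `pin_to_cpus` is hypothetical.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the current process to the given CPU ids (like `taskset -c`).

    Returns the resulting affinity set, or None where the OS offers no
    affinity API (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    wanted = set(cpus) & allowed  # ignore CPUs we are not permitted to use
    if not wanted:
        raise ValueError(f"none of {sorted(cpus)} is in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpus([1])` before importing cv2 roughly mirrors `taskset -c 1 python ...`, and `pin_to_cpus([1, 2])` mirrors the two-core run, which makes it easier to compare both cases from one script.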
I have given an example with cv2.findContours below:
Steps to reproduce
Fresh install of anaconda and conda install opencv.
hand.png
Run ipython with
taskset -c 1 ipython
and then
taskset -c 1,2 ipython
and even just
ipython  # uses all available cores
Some results on my server :
On my personal laptop :
Another example :
The above code, run over 895 files, takes:
3.6901626586914062 : Laptop with only 1 core
4.2052507400512695 : Laptop with 2 cores active
2.1227364540100098 : Server with only 1 core
20.15206742286682 : Server with 2 cores active
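A comparison like the one above can be repeated without the original image files. The following is only a sketch: it times cv2.findContours on a synthetic image (an assumption, not the hand.png input used here) while varying OpenCV's thread count through `cv2.setNumThreads`. The helper names `best_time` and `sweep_threads` are made up for illustration.

```python
import time


def best_time(fn, repeats=5):
    """Smallest wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def sweep_threads(workload, set_threads, thread_counts, repeats=5):
    """Time `workload` once per thread count; returns {count: seconds}."""
    results = {}
    for n in thread_counts:
        set_threads(n)  # e.g. cv2.setNumThreads
        results[n] = best_time(workload, repeats)
    return results


if __name__ == "__main__":
    try:
        import cv2
        import numpy as np
    except ImportError:
        print("cv2/numpy not installed; nothing to benchmark")
    else:
        # Synthetic stand-in image: a grid of filled circles on black.
        img = np.zeros((2000, 2000), dtype=np.uint8)
        for x in range(50, 2000, 100):
            for y in range(50, 2000, 100):
                cv2.circle(img, (x, y), 30, 255, -1)

        def work():
            # Copy because older OpenCV versions modify the input image.
            cv2.findContours(img.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

        for n, secs in sweep_threads(work, cv2.setNumThreads, [1, 2, 4]).items():
            print(f"{n} thread(s): {secs * 1000:.2f} ms")
```

Running it once under `taskset -c 1` and once without should help separate the effect of the affinity mask from the effect of OpenCV's own thread count.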
Any help would be appreciated!
Is this a problem with the cores of my server not being able to communicate at full speed?
Are there any free methods to speed up inter-core communication, or something similar?