WIP: HPX backend for OpenCV #11897
My name is Jakub Golinowski and I am a student participating in Google Summer of Code 2018. I am working with the Ste||ar Group on an HPX backend for parallelism in OpenCV. Here is the link to the GSoC project: https://summerofcode.withgoogle.com/projects/#5375652711104512. The main goal of the project is to improve interoperability for HPX applications that use OpenCV. The primary use-case we have in mind is an application that already uses the HPX runtime environment and also performs computer vision operations using OpenCV. Including the HPX backend in OpenCV makes such an application easier to implement and lets its parallelization behaviour be controlled using the standard HPX approach.
So far there are 2 versions of HPX backend:
The backend was tested locally using small applications (they can be found here: https://github.com/Jakub-Golinowski/opencv_hpx_backend). The applications were behaving as expected.
This pull request changes
This is still work in progress, but I would like to ask for feedback early, especially on the style of the changes introduced to the CMake config files.
Also, I would like to ask what would be required to have the HPX backend tested using OpenCV's CI? For now I added the CMake option WITH_HPX to choose HPX as the parallel backend and WITH_HPX_STARTSTOP to choose its back-up version (described above).
Answers to questions about this PR:
HPX supports distributed computation, so it would be possible to distribute algorithms among multiple worker machines, but the current implementation is analogous to the other backends and only uses a local parallel_for_(). However, this is a very interesting idea; it requires some work to figure out, for example, whether it is necessary to back the cv::Mat data structure with HPX distributed data structures (like partitioned_vector).
As for the HPX_STARTSTOP, static initialization is possible with HPX and can be considered an alternative to the current HPX_STARTSTOP backend (the current STARTSTOP version is just the simplest version of the HPX backend that does not require any changes to user code).
I was using the mandelbrot example from OpenCV documentation, the results for 4 cores can be found here:
As for the primary backend version (runtime assumed to be started by the user), it is true that the user has to launch the HPX runtime explicitly and make calls to cv::parallel_for_() from within the runtime. One way of achieving this is to put #include "...hpx_main.hpp" into the cpp file of the application that contains the main() function; then all the user code runs within the HPX runtime. We realize this might be a limitation for some applications, but on the other hand this approach makes the use-case we have in mind easier.
HPX focuses on parallelization by letting the user build a DAG of their workflow instead of the classic fork-join approach, so the use-case we are thinking about is an HPX user who wants to use OpenCV functionality as part of that workflow DAG. In this use-case the primary version of the backend is far easier and more intuitive for the user, whereas any type of STARTSTOP backend would make it really difficult to write such an application. Therefore, we propose two versions selectable by a build option: the first for the user who is already using HPX and adds OpenCV functionality, and the second (STARTSTOP) for the user who simply wants to use the HPX backend for cv::parallel_for_().
As for the timeout policies (when the worker threads are put to sleep), they can also be aggressive in HPX but they are configurable so this aspect can be easily tuned. However, I am not sure if I understand the example with multiple backends, I thought that parallel backend is a compile-time option and there cannot be multiple backends in OpenCV? Or are we talking about more general case?
referenced this pull request
Jul 5, 2018
@Jakub-Golinowski, the "with start/stop" option looks very inconvenient for OpenCV users. Basically, as I said, your PR is far from being complete in terms of support of this option - you put the include directive just to opencv_test_core without putting it to all the other tests and samples. And I don't think that would be a good idea to add this thing to every single test app and sample. Even with this option, if I interpreted the performance charts you provided correctly, HPX is slower than the existing parallel backends. With start/stop option it's even slower. So, what's the benefit of this parallel framework then?
@vpisarev The main benefit of including the “primary” version of HPX backend in OpenCV is solving the problem of competing parallel backends. Currently if a user develops an application using parallel capabilities of HPX runtime and wants to use OpenCV functionality as part of his application, then his application will spawn 2 parallel backends competing for the resources. Achieving high and predictable performance in this case is not trivial and introduces extra work for the user, discouraging him from combining HPX with OpenCV. However, with “primary” version of HPX backend in place the user would be able to easily include OpenCV in his HPX-based application and fully control its parallel behaviour.
As for the “start/stop” version of the backend, I would like to clarify that it is not at all inconvenient for OpenCV users: it does not require runtime management (no need to include hpx_init.hpp), and calls to cv::parallel_for_() can be made as with the other backends. However, the main downside of this backend version is that starting and stopping the HPX runtime environment for each cv::parallel_for_() call introduces overhead and, as you noted, makes it the slowest backend.
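To make the difference between the two lifecycles concrete, here is a minimal sketch. It deliberately uses a toy stand-in instead of HPX or OpenCV: ToyRuntime, ScopedRun, parallel_for_startstop and parallel_for_primary are hypothetical names invented for this illustration, and the loop runs sequentially where a real runtime would use worker threads. It only shows who owns the runtime lifecycle and how often the start-up cost is paid.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a task runtime; this is NOT the HPX API.
// It counts how often it is started so the two lifecycles can be compared.
class ToyRuntime {
public:
    void start() {
        if (running_) throw std::logic_error("runtime already running");
        running_ = true;
        ++starts_;
    }
    void stop() { running_ = false; }
    std::size_t starts() const { return starts_; }

    // Calls body(i) for every i in [begin, end). Sequential here; a real
    // runtime would distribute the indices over its worker threads.
    void for_each_index(int begin, int end,
                        const std::function<void(int)>& body) const {
        if (!running_) throw std::runtime_error("runtime is not running");
        for (int i = begin; i < end; ++i) body(i);
    }

private:
    bool running_ = false;
    std::size_t starts_ = 0;
};

// RAII helper: starts the runtime on construction, stops it on destruction.
struct ScopedRun {
    explicit ScopedRun(ToyRuntime& rt) : rt_(rt) { rt_.start(); }
    ~ScopedRun() { rt_.stop(); }
    ToyRuntime& rt_;
};

// "start/stop" flavour: the backend owns the lifecycle, so every call pays
// for one start and one stop.
void parallel_for_startstop(ToyRuntime& rt, int begin, int end,
                            const std::function<void(int)>& body) {
    ScopedRun scope(rt);
    rt.for_each_index(begin, end, body);
}

// "primary" flavour: the caller started the runtime once beforehand; the
// backend only submits work to it.
void parallel_for_primary(const ToyRuntime& rt, int begin, int end,
                          const std::function<void(int)>& body) {
    rt.for_each_index(begin, end, body);
}
```

With this toy, three calls through parallel_for_startstop start the runtime three times, while three calls through parallel_for_primary reuse a single start made by the caller. How expensive each start actually is in HPX is not modelled here.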
I would like to also mention that in the benchmark I presented, the backend providing highest performance is the dedicated “pthreads” backend which was developed specifically to support cv::parallel_for_() calls. Other backends (tbb, omp, hpx) achieve lower performance - I see it as a trade-off between specialized implementation for highest performance and general implementation for serving multiple purposes.
As for the inclusion of hpx_main.hpp to accuracy and performance tests I am currently taking care of that and at the same time running tests locally. In my latest commit I added conditional #include of hpx_main.hpp to all the accuracy and performance tests that were built on my machine.
Summing up, the HPX “primary” backend version allows for “parallel compatibility” between OpenCV and HPX preserving the full control over the runtime environment in hands of the user. Additionally, we propose HPX “start/stop” backend allowing for calls to cv::parallel_for_() in the same way it is done for other backends for completeness.
@Jakub-Golinowski, according to the chart, even though tbb and omp are slower than pthreads, they are still faster than HPX. I think, if you want some real heavy workload, you may want to run the opencv_perf_dnn (and we (opencv team) would be very interested to see the results); for that you need to clone opencv_extra repository, run opencv_extra/testdata/dnn/download_models.py, set environment variable
Also, you are saying that using HPX has the advantage that the utilization of CPU cores is balanced between different components. Is the advantage preserved when
To help clarify this HPX PR, I'd like to explain further ...
OpenMP and TBB are parallelism libraries that specify parallel regions in which for_loops and such like can be run, and within which, make use of threading resources. The OpenCV pthreads backend is a special case that is 'hand-implemented'.
HPX does not use parallel regions as such. When the user starts his/her application, the whole of
Starting the runtime before
The use case for this PR is for a user who wishes to use HPX in their code, has already started the HPX runtime and created tasks, but may also want to run an OpenCV algorithm and make use of the existing thread pools on HPX threads. The OpenCV algorithm will generally be run within an existing HPX task and will be run on all threads assigned to the HPX runtime (or subset thereof defined by an executor/pool etc).
The secondary use-case is when a user does not already use HPX in their code, but wishes to start/stop the HPX runtime for each OpenCV algorithm (analogous to parallel regions for TBB/OpenMP). (In practice this is unlikely to be used if performance is worse than OpenMP and we could drop this support if it improved the chances of the first use-case being accepted).
We shall investigate the performance of both operating modes relative to OpenMP/pthreads. This PR represents the first version of HPX integration and may be subject to performance improvements.
@biddisco, sorry for delay with followup. So, what will happen if start/stop option is not used and one forgot to add this magic clause into an app?
#if defined(HAVE_HPX) && !defined(HPX_STARTSTOP)
#include <hpx/hpx_main.hpp>
#endif
I think we should eliminate start/stop option support, since it's extremely slow. And then think about how to make the non-start/stop variant more convenient (or at least report a proper error when one forgets to add that include).
I believe that an exception would be thrown to the effect of "Trying to call an HPX routine before the runtime has been started". It's possible that the exception would be a bit more obscure, like "runtime pointer undefined" or something of that sort. Perhaps @Jakub-Golinowski could try it and see what actually happens.
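As a rough idea of what "report a proper error" could look like, here is a minimal sketch of a guard that a backend might run before submitting any work. It is an assumption-laden toy, not the HPX or OpenCV implementation: FakeRuntime, g_runtime, fake_runtime_start, require_runtime and guarded_sum are all hypothetical names, and the real check would have to use whatever HPX exposes for querying the runtime state.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these names come from HPX or OpenCV.
struct FakeRuntime {};

// Null until the application has started the (toy) runtime.
static FakeRuntime* g_runtime = nullptr;

// Models the application starting the runtime once, e.g. at program start.
void fake_runtime_start() {
    static FakeRuntime instance;
    g_runtime = &instance;
}

// Guard to call at the top of every backend entry point. It turns an
// obscure failure deep inside the runtime into a message that names the
// caller and says what the user has to do.
void require_runtime(const char* caller) {
    if (g_runtime == nullptr) {
        throw std::runtime_error(
            std::string(caller) +
            ": the runtime is not running; start it before making this call");
    }
}

// Toy backend entry point: sums the indices in [begin, end) after the guard.
int guarded_sum(int begin, int end) {
    require_runtime("guarded_sum");
    int sum = 0;
    for (int i = begin; i < end; ++i) sum += i;
    return sum;
}
```

Calling guarded_sum before fake_runtime_start throws a std::runtime_error whose message names the caller; after the runtime has been started the same call simply returns the result.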
As for the error reported when one forgets to include hpx_main.hpp, it depends on what will be the first call to functions depending on the runtime. For example in the mandelbrot opencv benchmark (link) one gets the following error when hpx_main.hpp is not included:
At the moment we have the opportunity to rework the error messages caused by not including hpx_main.hpp, as another GSoC student is working on improvements to the hpx_main implementation and I can collaborate with him on this. Moreover, including hpx_main is not the only way of starting the runtime, and we assume that users are familiar with the following chapter of the HPX documentation: link. It describes different ways of starting the runtime; some of them are easier to use and others give the user greater control over the runtime. Finally, since the runtime is the core construct in HPX, a user who decides to build OpenCV with the HPX backend will most likely be aware of the above-mentioned documentation chapter and will not try to write an application without starting the runtime.
As for the dnn benchmark, the results are presented in the following html file (produced with the OpenCV run.py and summary.py scripts): link. The run was on a 4-core machine with fixed CPU frequency (ensuring that the results are comparable). As can be seen in the summary, OpenCV with the HPX backend performs comparably to or better than the pthreads backend in roughly 60% of the tests. In the remaining tests HPX is slower than pthreads, mostly by 5%-25%, with a few outliers.
We agree that the start/stop backend could be removed.
@vpisarev The results presented in the DNN perf test comparison table were obtained without start/stop, i.e. with the primary version, which requires the user to manage the runtime themselves and therefore gives them maximum control over what happens in their application.
As discussed before, I dropped the start/stop backend in the most recent commit. As the tests have shown, for the common use-case the pthreads backend is the optimal choice and the start/stop version is superfluous.
Summing up, the primary (and now the only) version of the HPX backend is suitable for an HPX application that uses the OpenCV library. The user can seamlessly integrate OpenCV calls into their execution DAG and does not have to worry about competing backends.