[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js) #15371
50aee2c to 1d12a88
@Wenzhao-Xiang , please use
@huningxin Thanks! I will fix the trailing whitespace issues and merge the two commits into one to use as my GSoC final commit.
1097238 to 82e98fa
Thank you for the contribution! Great job 👍
@huningxin @terfendail I just found, almost all the implementation of
Update the performance analysis #15371 (comment)
Could you please specify what they are? I think it will help the decision. Thanks.
Thank you for update!
@@ -152,6 +152,11 @@
# define CV_VSX3 1
#endif

#if defined(EMSCRIPTEN)
# define CV_WASM_SIMD 1
OpenCV usually uses this macro for the check:
defined(__EMSCRIPTEN__)
Is there any difference?
How can the SIMD feature be disabled (via CMake/.py script parameters)? It is useful for debugging purposes.
Thanks for the review!
According to "Detecting Emscripten in preprocessor", the correct define to use is `__EMSCRIPTEN__`.
emscripten-core/emscripten#4665 introduced a strict build mode and removed the `EMSCRIPTEN` define. Therefore, using `EMSCRIPTEN` is not recommended, even though it still works in non-strict build mode.
I'll fix that then.
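For reference, a minimal sketch of the check being discussed. `DEMO_TARGET_IS_WASM` and `built_with_emscripten()` are hypothetical names used only for illustration; the only thing taken from the discussion is that `__EMSCRIPTEN__` is the define to test.

```cpp
// __EMSCRIPTEN__ is the define the discussion above settles on; the legacy
// EMSCRIPTEN name only works in non-strict build mode, so it is not used here.
#if defined(__EMSCRIPTEN__)
#  define DEMO_TARGET_IS_WASM 1   // hypothetical macro, for illustration only
#else
#  define DEMO_TARGET_IS_WASM 0
#endif

// Reports which branch the preprocessor selected (hypothetical helper).
constexpr bool built_with_emscripten() {
    return DEMO_TARGET_IS_WASM == 1;
}
```

Built with a native compiler, `built_with_emscripten()` returns `false`; built with Emscripten it should return `true` in both strict and non-strict mode.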
As for how to disable the SIMD feature: it is controlled by the .py script flag `--simd`. If you build with this flag, `CV_ENABLE_INTRINSICS` is turned on and the SIMD feature is detected. Otherwise, only the scalar version is built.
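To illustrate the general pattern (not the actual OpenCV code), here is a small sketch of a kernel whose vector path is compiled in only when a build-time macro is set. `DEMO_ENABLE_INTRINSICS` is a hypothetical stand-in for the role `CV_ENABLE_INTRINSICS` plays, and the "vector" body is plain C++ that processes 16 elements per iteration, as a placeholder for real intrinsics.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical build switch; a build script flag would define it to 1.
#ifndef DEMO_ENABLE_INTRINSICS
#  define DEMO_ENABLE_INTRINSICS 0
#endif

// THRESH_BINARY-style kernel: dst = (src > thresh) ? maxval : 0.
void threshold_binary(const std::uint8_t* src, std::uint8_t* dst, std::size_t n,
                      std::uint8_t thresh, std::uint8_t maxval) {
    std::size_t i = 0;
#if DEMO_ENABLE_INTRINSICS
    // Placeholder for the vector body: 16 lanes per iteration.
    // A real backend would use SIMD intrinsics here instead of the inner loop.
    for (; i + 16 <= n; i += 16) {
        for (std::size_t lane = 0; lane < 16; ++lane) {
            dst[i + lane] = src[i + lane] > thresh ? maxval : 0;
        }
    }
#endif
    // Scalar path: handles the tail, or everything when the switch is off.
    for (; i < n; ++i) {
        dst[i] = src[i] > thresh ? maxval : 0;
    }
}
```

Both configurations produce the same output, which makes the scalar-only build a convenient baseline when debugging the SIMD path.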
@huningxin
They are almost for
@alalek updated it. Are there any issues?
+1. Thanks for the information.
I suppose that retaining a few more fallback functions shouldn't significantly affect the size of the library. So let's keep them.
I agree! Thanks!
Any updates here? @alalek @terfendail @huningxin
Looks good to me 👍
Thanks! @alalek
👍
ecdc729 to c15f138
Rebased this branch to resolve the conflicts.
@Wenzhao-Xiang thanks for the rebase. @terfendail @alalek , is it OK to merge now? Otherwise we need Wenzhao to keep rebasing this PR.
I rebased PR onto 3.4 branch: https://github.com/alalek/opencv/commits/pr15371_r Please pull these changes into |
Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test: This perf test is based on benchmark.js (https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads: This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD: Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.
1. use short license header 2. fix documentation node issue 3. remove the unused `hasSIMD128()` api
1. fix emscripten define 2. use fallback function for f16
Fix rebase issue
7186dbb to b6467d0
@alalek
Awesome! Thanks @alalek @terfendail @Wenzhao-Xiang .
Guys, begging you to release. This will be so dope!
Overview
This pull request makes the following changes:
1. Create the base of the OpenCV.js performance test: the perf test is based on benchmark.js. We first add `cvtColor`, `Resize`, and `Threshold` to it. Both the browser and Node.js versions are supported for testing.
2. Optimize OpenCV.js performance with WASM threads: this optimization is based on the Web Worker API and SharedArrayBuffer, so it can only be used in the browser. We expose two new APIs, `cv.parallel_pthreads_set_threads_num(number)` and `cv.parallel_pthreads_get_threads_num()`: the former sets the number of threads dynamically and the latter returns the current number of threads. The default number of threads is the number of logical cores of the device.
3. Optimize OpenCV.js performance with WASM SIMD: add a WASM SIMD backend for OpenCV Universal Intrinsics. It is experimental, as WASM SIMD is still in development. The SIMD version of OpenCV.js built with the latest LLVM upstream may not work with stable browsers or old versions of Node.js. To get the new features, please use the latest unstable browser (such as Chrome Dev) or the latest Node.js.
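As a rough illustration of the thread-count behaviour described above (a default equal to the number of logical cores, plus a setter and a getter), here is a small C++ sketch. Everything in the `demo` namespace is hypothetical and is not the OpenCV.js implementation; `split_rows` only shows one plausible way a row range could be divided into one stripe per worker.

```cpp
#include <algorithm>
#include <thread>
#include <utility>
#include <vector>

namespace demo {  // hypothetical names, for illustration only

// Default: the number of logical cores. hardware_concurrency() may return 0
// when the value is unknown, so fall back to a single thread.
inline int default_threads_num() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
}

inline int& threads_num_storage() {
    static int value = default_threads_num();
    return value;
}

// Setter and getter modelled on the behaviour described for the two new APIs.
inline void set_threads_num(int n) { threads_num_storage() = std::max(1, n); }
inline int get_threads_num() { return threads_num_storage(); }

// Splits [0, rows) into at most get_threads_num() contiguous stripes and
// returns them as half-open [begin, end) ranges, one per worker.
inline std::vector<std::pair<int, int>> split_rows(int rows) {
    std::vector<std::pair<int, int>> stripes;
    if (rows <= 0) return stripes;
    const int workers = std::min(get_threads_num(), rows);
    const int base = rows / workers;
    const int extra = rows % workers;  // the first `extra` stripes get one more row
    int begin = 0;
    for (int w = 0; w < workers; ++w) {
        const int end = begin + base + (w < extra ? 1 : 0);
        stripes.emplace_back(begin, end);
        begin = end;
    }
    return stripes;
}

}  // namespace demo
```

For a 1920x1080 image and `demo::set_threads_num(4)`, `demo::split_rows(1080)` yields four stripes of 270 rows each, and each stripe could then be handed to its own worker.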
The Test
Test Environment:
Results
`Threshold` kernel with parameter `(1920x1080, CV_8UC1, THRESH_BINARY)` as example:
Performance Analysis
Kernel performance(ms)
Test Environment:
OS: Ubuntu 16.04
Emscripten: 1.38.42, LLVM upstream backend
Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
Hardware: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with 8 logical cores:
Analysis
With the current optimizations, the threads optimization works as expected. However, WASM SIMD still has some issues. As the kernel performance results show, `resize` is now only 1.34x faster than the scalar version, and `cvtColor` is even 2-3x slower than the scalar version, which is still a big gap compared with native SIMD optimization.
Thanks @huningxin for the investigation; here are some analysis results:
1. `shift` is used to simulate `integer widening` instructions in `v_dotprod`. We have opened an emscripten issue, and we can continue to optimize the `resize` kernel after this new feature is enabled.
2. `pshufb` instructions with memory operands are generated by V8 for the current implementation. One solution is to follow the SSE implementation, which uses `punpcklbw` and `punpckhqdq`. We tried this, but it still fails due to an emscripten issue that prevents V8 from generating those instructions. Let's see the response from the emscripten community.
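To make the first point concrete, here is a scalar C++ model of simulating integer widening with shifts inside a dot product. It is only a sketch under stated assumptions: all function names are hypothetical, one 32-bit lane is assumed to hold two packed 16-bit values (first value in the low half), and the real backend operates on whole 128-bit vectors rather than single lanes.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets 32 bits as a signed value without relying on
// implementation-defined integer conversions.
inline std::int32_t as_signed(std::uint32_t bits) {
    std::int32_t out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Packs two 16-bit values into one 32-bit lane (first value in the low half).
inline std::int32_t pack_pair(std::int16_t lo, std::int16_t hi) {
    const std::uint32_t bits =
        static_cast<std::uint32_t>(static_cast<std::uint16_t>(lo)) |
        (static_cast<std::uint32_t>(static_cast<std::uint16_t>(hi)) << 16);
    return as_signed(bits);
}

// Widening by shifts: move the low half to the top, then shift back down.
// This assumes >> on a negative value is an arithmetic shift, which holds on
// mainstream compilers and is guaranteed from C++20 on.
inline std::int32_t widen_low(std::int32_t lane) {
    return as_signed(static_cast<std::uint32_t>(lane) << 16) >> 16;
}

// The high half only needs the arithmetic right shift.
inline std::int32_t widen_high(std::int32_t lane) {
    return lane >> 16;
}

// One output lane of a 16-bit dot product: a.lo * b.lo + a.hi * b.hi.
// The final addition is done on unsigned values so that it wraps instead of
// overflowing a signed integer.
inline std::int32_t dotprod_lane(std::int32_t a, std::int32_t b) {
    const std::int32_t p_lo = widen_low(a) * widen_low(b);
    const std::int32_t p_hi = widen_high(a) * widen_high(b);
    return as_signed(static_cast<std::uint32_t>(p_lo) +
                     static_cast<std::uint32_t>(p_hi));
}
```

For example, `dotprod_lane(pack_pair(3, -4), pack_pair(5, 6))` computes 3*5 + (-4)*6 = -9. Each widening costs one or two shifts per lane, which is the extra work a dedicated integer widening instruction would remove.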