[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js) #15371

Wenzhao-Xiang · 2019-08-22T10:40:48Z

Overview

Proposal: Improve the performance of JavaScript version of OpenCV (OpenCV.js)
Project Report
Mentors: Ningxin Hu @huningxin, Vitaly Tuzov @terfendail
Student: Wenzhao Xiang @Wenzhao-Xiang

This pull request changes

Create the base of the OpenCV.js performance test:
The perf test is based on benchmark.js. It initially covers cvtColor, resize, and threshold, and it can be run in both the browser and Node.js.
Optimize OpenCV.js performance with WASM threads:
This optimization is based on the Web Worker API and SharedArrayBuffer, so it can only be used in the browser. We expose two new APIs, cv.parallel_pthreads_set_threads_num(number) and cv.parallel_pthreads_get_threads_num(): the former sets the number of threads dynamically and the latter returns the current number of threads. The default number of threads is the number of logical cores on the device.
Optimize OpenCV.js performance with WASM SIMD:
Add a WASM SIMD backend for OpenCV Universal Intrinsics. It is experimental, as WASM SIMD is still in development. The SIMD build of OpenCV.js produced with the latest LLVM upstream may not work with stable browsers or older versions of Node.js. Please use the latest unstable browser (such as Chrome Dev) or the latest Node.js to get the new features.
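
A minimal sketch of how the two thread-count functions described above might be called from application code. It assumes `cv` is an already-initialized threads build of OpenCV.js. The helper name `setOpenCvThreads`, the clamping logic, and the assumption that non-threaded builds do not expose these functions are illustrative and not part of this PR; the stub object exists only so the sketch runs without OpenCV.js loaded.

```javascript
// Hypothetical helper. Only parallel_pthreads_set_threads_num and
// parallel_pthreads_get_threads_num come from this PR; the rest is illustrative.
function setOpenCvThreads(cv, requested, maxThreads) {
  // Assumption: builds without thread support do not expose these functions.
  if (typeof cv.parallel_pthreads_set_threads_num !== 'function' ||
      typeof cv.parallel_pthreads_get_threads_num !== 'function') {
    return null;
  }
  // Clamp the request to the range [1, maxThreads].
  const n = Math.max(1, Math.min(Math.floor(requested), maxThreads));
  cv.parallel_pthreads_set_threads_num(n);
  // Read the value back so the caller sees what is actually in effect.
  return cv.parallel_pthreads_get_threads_num();
}

// Stand-in for the real module, used only to make this sketch runnable.
function makeStubCv(initialThreads) {
  let threads = initialThreads;
  return {
    parallel_pthreads_set_threads_num: (n) => { threads = n; },
    parallel_pthreads_get_threads_num: () => threads,
  };
}

const stub = makeStubCv(8);
console.log(setOpenCvThreads(stub, 4, 8));   // 4
console.log(setOpenCvThreads(stub, 64, 8));  // 8 (clamped)
console.log(setOpenCvThreads({}, 4, 8));     // null (functions not present)
```

In a browser, `navigator.hardwareConcurrency` would be a natural value to pass as `maxThreads`.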

The Test

Test Environment:

  OS: Ubuntu 16.04
  Emscripten: 1.38.42, LLVM upstream backend
  Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
  Hardware: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with 8 logical cores

OpenCV.js tests: all passed
Universal Intrinsics WASM backend test: all passed

Results

For OpenCV kernels, take the threshold kernel with parameters (1920x1080, CV_8UC1, THRESH_BINARY) as an example:

OS: Ubuntu 16.04
Emscripten: 1.38.42, LLVM upstream backend
Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
Hardware: Core(TM) i7-8700 CPU @ 3.20GHz with 12 logical cores

build	mean time (ms)	speed up (to scalar)
scalar	1.164	1
threads	0.261	4.45
simd	0.123	9.46
threads + simd	0.039	29.84
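
The speed-up column is the scalar mean time divided by each build's mean time. The sketch below redoes that arithmetic from the mean times in the table above; the function name `speedups` is made up for illustration. Recomputed ratios may differ from the posted column in the last digit, presumably because the posted values were derived from unrounded timings.

```javascript
// Mean times (ms) copied from the table above.
const meanTimeMs = {
  'scalar': 1.164,
  'threads': 0.261,
  'simd': 0.123,
  'threads + simd': 0.039,
};

// Speed-up of each build relative to a baseline build:
// baseline mean time divided by the build's mean time.
function speedups(times, baseline) {
  const base = times[baseline];
  const result = {};
  for (const [build, t] of Object.entries(times)) {
    result[build] = base / t;
  }
  return result;
}

for (const [build, s] of Object.entries(speedups(meanTimeMs, 'scalar'))) {
  console.log(`${build}: ${s.toFixed(2)}x`);
}
```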

For a real-world case, take the OpenCV.js face recognition sample as an example:

OS: Ubuntu Linux 16.04.5
Emscripten: 1.38.42, LLVM upstream backend
Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
Hardware: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 12 logical cores

OpenCV.js build	FPS	Speedup (to scalar)
scalar	3	1
threads	10	3.33
simd	12	4
threads + simd	26	8.6

The OpenCV.js setup tutorial
The OpenCV.js demos (May need the latest version of Chrome-Dev)
GSoC report video

Performance Analysis

Kernel performance (ms)

Test Environment:
OS: Ubuntu 16.04
Emscripten: 1.38.42, LLVM upstream backend
Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
Hardware: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with 8 logical cores

Kernel (type)	cvtColor (BGR2GRAY)	cvtColor (YUV2BGR)	cvtColor (BGR2YUV)	resize (LINEAR_EXACT)	threshold (THRESH_BINARY)
WASM(scalar)	2.747	12.04	10.685	5.425	1.347
WASM(threads=8)	0.722	2.636	2.618	2.529	0.353
WASM(SIMD128)	11.355	27.02	27.245	4.309	0.14
WASM(threads+SIMD)	1.869	4.757	4.769	1.79	0.043
Native(scalar)	2.11	7.89	8.68	6.45	0.6
Native(threads=8)	0.58	1.97	2.19	2.48	0.17
Native(SIMD128)	1.91	3.4	4.15	1.03	0.11
Native(threads+SIMD)	0.43	0.86	0.91	0.65	0.04

Analysis

With the current optimization, the threads optimization works as we expected. However, WASM SIMD still has some issues. As the kernel performance results show, resize is currently only 1.34x faster than the scalar version, and cvtColor is even 2-3x slower than the scalar version, which leaves a big gap compared with the native SIMD optimization.
Thanks to @huningxin for the investigation. Here are some analysis results:

For resize, the main reason is that we currently use shifts to simulate integer widening instructions in v_dotprod. We have opened an emscripten issue, and we can continue to optimize the resize kernel once this new feature is enabled.
For cvtColor, the root cause is that V8 generates inefficient pshufb instructions with memory operands for the current implementation.
One solution is to follow the SSE implementation, which uses punpcklbw and punpckhqdq. We tried this, but it still fails because of an emscripten issue that prevents V8 from generating those instructions. Let's see the response from the emscripten community.
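
To illustrate what simulating integer widening with shifts means, here is a scalar sketch on a single 32-bit lane that holds two signed 16-bit values. It is not the code in intrin_wasm.hpp, and all function names are hypothetical. It only shows the trick itself: a left shift followed by an arithmetic right shift by 16 sign-extends the low half, and a single arithmetic right shift yields the high half.

```javascript
// Pack two signed 16-bit values into one 32-bit integer (lo in bits 0-15).
function packI16Pair(lo, hi) {
  return ((hi & 0xffff) << 16) | (lo & 0xffff);
}

// Widen the low 16 bits to a signed 32-bit value: move them to the top,
// then arithmetic-shift back so the sign bit is replicated.
function widenLow(packed) {
  return (packed << 16) >> 16;
}

// Widen the high 16 bits: one arithmetic right shift is enough.
function widenHigh(packed) {
  return packed >> 16;
}

// Dot product of two packed pairs: a.lo * b.lo + a.hi * b.hi,
// kept in 32-bit integer arithmetic.
function dotPair(a, b) {
  return (Math.imul(widenLow(a), widenLow(b)) +
          Math.imul(widenHigh(a), widenHigh(b))) | 0;
}

const a = packI16Pair(-3, 7);
const b = packI16Pair(5, -2);
console.log(widenLow(a), widenHigh(a)); // -3 7
console.log(dotPair(a, b));             // -29
```

Each widening here costs one or two shifts per operand, which is the overhead a dedicated widening instruction would avoid.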

force_builders=Custom
buildworker:Custom=linux-1,linux-2,linux-4
build_image:Docs=docs-js
build_image:Custom=javascript

huningxin · 2019-08-23T02:59:03Z

@Wenzhao-Xiang , please use git diff --check to find and fix the trailing white space issues.

Wenzhao-Xiang · 2019-08-23T05:21:42Z

@huningxin Thanks! I will fix the trailing white space issues and merge the two commits into one to take it as my GSoC final commit.

alalek

Thank you for the contribution! Great job 👍

modules/core/include/opencv2/core/hal/intrin_wasm.hpp

modules/core/src/mathfuncs_core.simd.hpp

Wenzhao-Xiang · 2019-08-28T09:38:58Z

@huningxin @terfendail I just found that almost all of the f64/u64/i64 functions use the fallback implementation because of the limitations of the WASM intrinsics, which means only a few functions could be removed. So I'd suggest we just keep all the fallback functions. WDYT?

Wenzhao-Xiang · 2019-08-29T04:11:51Z

Update the performance analysis #15371 (comment)

huningxin · 2019-08-29T06:06:31Z

which means only a few functions could be removed. So I'd suggest we just keep all the fallback functions. WDYT?

Could you please specify what they are? I think it will help the decision. Thanks.

alalek

Thank you for update!

alalek · 2019-08-29T08:17:04Z

modules/core/include/opencv2/core/cv_cpu_dispatch.h

@@ -152,6 +152,11 @@
 #  define CV_VSX3 1
 #endif

+#if defined(EMSCRIPTEN)
+#  define CV_WASM_SIMD 1


OpenCV usually uses this macro for the check:

defined(__EMSCRIPTEN__)

Is there any difference?

How can the SIMD feature be disabled (via CMake/.py script parameters)? This is useful for debugging purposes.

Wenzhao-Xiang · 2019-08-29

Thanks for the review!
According to "Detecting Emscripten in preprocessor", the correct define to use is __EMSCRIPTEN__.
emscripten-core/emscripten#4665 introduced a strict build mode and removed the EMSCRIPTEN define. It is therefore not recommended to use EMSCRIPTEN, even though it still works in non-strict build mode.
I'll fix that.

As for disabling the SIMD feature, it is controlled by the --simd flag of the .py build script. If you build with this flag, CV_ENABLE_INTRINSICS is turned on and the SIMD feature is detected. Otherwise, only the scalar version is built.

Wenzhao-Xiang · 2019-09-03T07:34:12Z

@huningxin
See here:

v_rshr_pack, v_rshr_pack_store, v_pack_b, + , -, * , /, v_mul_expand, v_sqrt, v_invsqrt, 
v_abs, min, max, ==, !=, <, >, <=, >=, v_not_nan, v_absdiff, v_magnitude, <<, >>, v_shl, 
v_shr, v_store_low, v_store_high, v_reduce_sum, v_popcount, v_signmask, v_check_all, 
v_check_any, v_round, v_floor, v_ceil, v_trunc, v_cvt_f32, v_cvt_f64, v_cvt_f64_high,

They are almost all for f64, i64, and u64.
Keeping all the fallback functions will also help debug, I think.
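
For readers unfamiliar with the term, a fallback for a 64-bit lane type can be pictured as a plain per-lane scalar loop. The sketch below assumes that model and uses hypothetical names (`makeI64x2`, `vAddI64Fallback`, `vReduceSumI64Fallback`); it mirrors only the idea, not the C++ in intrin_wasm.hpp.

```javascript
// Two signed 64-bit lanes, matching the width of a 128-bit vector.
function makeI64x2(a, b) {
  return BigInt64Array.of(BigInt(a), BigInt(b));
}

// Lane-wise add done one element at a time (no SIMD instruction involved).
// Storing into a BigInt64Array wraps each result to 64 bits.
function vAddI64Fallback(x, y) {
  const out = new BigInt64Array(2);
  for (let i = 0; i < 2; i++) {
    out[i] = x[i] + y[i];
  }
  return out;
}

// Horizontal sum of the two lanes, wrapped to a signed 64-bit value.
function vReduceSumI64Fallback(x) {
  return BigInt.asIntN(64, x[0] + x[1]);
}

const x = makeI64x2(1, -2);
const y = makeI64x2(40, 2);
const sum = vAddI64Fallback(x, y);
console.log(sum[0], sum[1]);              // 41n 0n
console.log(vReduceSumI64Fallback(sum));  // 41n
```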

Wenzhao-Xiang · 2019-09-03T14:41:26Z

@alalek Updated it. Are there any issues?

huningxin · 2019-09-04T00:34:34Z

Keeping all the fallback functions will also help debug, I think.

+1. Thanks for the information.

terfendail · 2019-09-04T13:07:32Z

I suppose that retaining a few more fallback functions shouldn't essentially affect the size of the library. So let's keep them.

Wenzhao-Xiang · 2019-09-05T02:12:42Z

I suppose that retaining a few more fallback functions shouldn't essentially affect the size of the library.

I agree! Thanks!

Wenzhao-Xiang · 2019-09-06T02:54:24Z

Any updates here? @alalek @terfendail @huningxin

alalek

Looks good to me 👍

Wenzhao-Xiang · 2019-09-19T12:01:26Z

Thanks! @alalek
Any inputs? @terfendail

terfendail

👍

Wenzhao-Xiang · 2019-09-21T17:54:27Z

Rebased this branch to solve the conflicts.

huningxin · 2019-09-23T00:37:14Z

@Wenzhao-Xiang thanks for the rebase. @terfendail @alalek , is it OK to merge now? Otherwise we need Wenzhao to keep rebasing this PR.

alalek · 2019-09-23T11:25:58Z

I rebased PR onto 3.4 branch: https://github.com/alalek/opencv/commits/pr15371_r

Please pull these changes into Wenzhao-Xiang:gsoc_2019 or follow the instructions here: https://github.com/opencv/opencv/wiki/Branches

Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test: This perf test is based on benchmark.js (https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads: This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD: Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.

1. use short license header 2. fix documentation node issue 3. remove the unused `hasSIMD128()` api

1. fix emscripten define 2. use fallback function for f16

Fix rebase issue

Wenzhao-Xiang · 2019-09-24T04:15:18Z

@alalek
Done! PTAL, Thanks!

huningxin · 2019-09-25T07:19:27Z

Awesome! Thanks @alalek @terfendail @Wenzhao-Xiang .

waliurjs · 2020-04-24T08:12:53Z

Guys, begging you to release. This will be so dope!

Wenzhao-Xiang force-pushed the gsoc_2019 branch from 50aee2c to 1d12a88 on August 22, 2019 11:46

Wenzhao-Xiang force-pushed the gsoc_2019 branch from 1097238 to 82e98fa on August 23, 2019 06:31

Wenzhao-Xiang mentioned this pull request Aug 23, 2019

Create PR to main opencv repo huningxin/opencv#295

Closed

5 tasks

alalek reviewed Aug 23, 2019

alalek reviewed Aug 29, 2019

alalek approved these changes Sep 18, 2019

terfendail approved these changes Sep 19, 2019

Wenzhao-Xiang force-pushed the gsoc_2019 branch from ecdc729 to c15f138 on September 21, 2019 17:52

Wenzhao-Xiang changed the base branch from master to 3.4 September 23, 2019 15:40

alalek mentioned this pull request Sep 23, 2019

(3.4) build: eliminate cxx11 warnings #15573

Merged

Wenzhao-Xiang added 4 commits September 24, 2019 00:13

[GSoC2019]

b3bf7e9

1. use short license header 2. fix documentation node issue 3. remove the unused `hasSIMD128()` api

[GSoC2019]

45d64b2

1. fix emscripten define 2. use fallback function for f16

[GSoC2019]

b6467d0

Fix rebase issue

Wenzhao-Xiang force-pushed the gsoc_2019 branch from 7186dbb to b6467d0 on September 24, 2019 04:11

alalek assigned terfendail Sep 24, 2019

alalek merged commit c209677 into opencv:3.4 Sep 24, 2019

alalek mentioned this pull request Sep 25, 2019

Merge 3.4 #15593

Merged

Neuroforge mentioned this pull request Oct 2, 2019

Opencvjs 4.0 WASM haoking/opencvjs#3

Open

This was referenced Oct 15, 2019

JS(SIMD): support Emscripten 1.38.48-upstream #15708

Merged

JS(SIMD): v_reverse implementation #15709

Merged

huningxin mentioned this pull request Jul 24, 2020

[SIMD] optimize the cvtColor kernel huningxin/opencv#301

Closed

alalek mentioned this pull request Mar 20, 2021

Wasm perf tests RuntimeError: abort(TypeError: cv.Size is not a constructor). #19754

Closed

alalek mentioned this pull request May 11, 2021

Problem with OpenCV.js when load with web worker #20064

Closed

4 tasks
