Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of applyMaskImageData #14766

Merged
merged 1 commit into from
Apr 9, 2022

Conversation

calixteman
Copy link
Contributor

  • write some uint32 instead of uint8 to avoid the check before clamping;
  • unroll the loop to write data in the buffer
  • but keep a loop for the last element of a line: it likely doesn't hurt that much since it's executed only for one time for each line;
  • I tested on a macbook with an Apple chip, and on Firefox nightly the new code is almost 3.5x faster than before (~1.8x with Chrome).

@calixteman
Copy link
Contributor Author

I measured the performances improvement with the pdf in https://bugzilla.mozilla.org/show_bug.cgi?id=878397.
I just open the first page, run each functions 100 times and computed the average time.
I got the following results in Firefox nightly (macbook pro M1):

For 100 runs, old: 13.760410000000084ms, new: 3.9957212499999697ms => 3.44x faster. 
For 100 runs, old: 11.18828541666674ms, new: 3.2841725000000044ms => 3.41x faster. 
For 100 runs, old: 11.076615416666673ms, new: 3.210079583333181ms => 3.45x faster.
For 100 runs, old: 11.120658750000075ms, new: 3.244162500000166ms => 3.43x faster. 
For 100 runs, old: 11.040089583333465ms, new: 3.299125833333292ms => 3.35x faster. 
For 100 runs, old: 11.082967083333351ms, new: 3.2127058333334206ms => 3.45x faster. 
For 100 runs, old: 11.006012083333443ms, new: 3.182824166666851ms => 3.46x faster. 
For 100 runs, old: 11.142017916666319ms, new: 3.218462083333325ms => 3.46x faster. 
For 100 runs, old: 13.580457500000248ms, new: 3.903052916667075ms => 3.48x faster. 
For 100 runs, old: 13.65221250000006ms, new: 3.9218341666663763ms => 3.48x faster. 
For 100 runs, old: 13.57868208333326ms, new: 3.9597995833329334ms => 3.43x faster. 
For 100 runs, old: 19.828106666666898ms, new: 5.813075000000026ms => 3.41x faster. 
For 100 runs, old: 13.406442500000121ms, new: 3.8805799999998998ms => 3.45x faster. 
For 100 runs, old: 10.512800416666504ms, new: 3.088225416666719ms => 3.40x faster. 
For 100 runs, old: 10.579237499999982ms, new: 3.078336666667019ms => 3.44x faster. 
For 100 runs, old: 10.544124583333796ms, new: 3.0949412499999016ms => 3.41x faster. 
For 100 runs, old: 13.468600833333257ms, new: 3.8730854166666178ms => 3.48x faster. 
For 100 runs, old: 13.375664999999717ms, new: 3.8885295833331472ms => 3.44x faster. 
For 100 runs, old: 13.551173333333063ms, new: 3.8657320833329867ms => 3.51x faster. 
For 100 runs, old: 13.32980250000015ms, new: 3.867634999999791ms => 3.45x faster. 
For 100 runs, old: 13.403179166665941ms, new: 3.862169583332943ms => 3.47x faster. 
For 100 runs, old: 13.52051583333312ms, new: 3.8448604166670703ms => 3.52x faster. 
For 100 runs, old: 13.358693333333285ms, new: 3.8543900000001305ms => 3.47x faster.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you!

mask >>= 1;
for (let i = 0; i < height; i++) {
for (const max = srcPos + widthInSource; srcPos < max; srcPos++) {
const elem = srcPos < src.length ? src[srcPos] : 255;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we computed this once at the start of the function, which seems more efficient overall. Hence, please re-introduce the srcLength variable and use that here and below.

Comment on lines 36 to 44
dest[destPos] = elem & 0b10000000 ? oneMapping : zeroMapping;
dest[destPos + 1] = elem & 0b1000000 ? oneMapping : zeroMapping;
dest[destPos + 2] = elem & 0b100000 ? oneMapping : zeroMapping;
dest[destPos + 3] = elem & 0b10000 ? oneMapping : zeroMapping;
dest[destPos + 4] = elem & 0b1000 ? oneMapping : zeroMapping;
dest[destPos + 5] = elem & 0b100 ? oneMapping : zeroMapping;
dest[destPos + 6] = elem & 0b10 ? oneMapping : zeroMapping;
dest[destPos + 7] = elem & 0b1 ? oneMapping : zeroMapping;
destPos += 8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Similar to this code in canvas.js, can we use the dest[destPos++] = ... format here as well since that (visually) aligns the code a tiny bit nicer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is mainly to have some good perfs: a dest++ induces that the new value is stored in dest which is useless.
With this code we've only one store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the dest++ and it doesn't make any significative difference, let's go for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the counter can probably live in a register while it's in the loop scope and then the new value is stored only at the end of the main loop.

- write some uint32 instead of uint8 to avoid the check before clamping;
- unroll the loop to write data in the buffer
- but keep a loop for the last element of a line: it likely doesn't hurt
  that much since it's executed only for one time for each line;
- I tested on a macbook with an Apple chip, and on Firefox nightly the new
  code is almost 3.5x faster than before (~1.8x with Chrome).
@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Apr 9, 2022

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/b60892538e2232d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 9, 2022

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/939e6a85ec26ae6/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 9, 2022

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/b60892538e2232d/output.txt

Total script time: 24.78 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 5

Image differences available at: http://54.241.84.105:8877/b60892538e2232d/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 9, 2022

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/939e6a85ec26ae6/output.txt

Total script time: 26.42 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 1

Image differences available at: http://54.193.163.58:8877/939e6a85ec26ae6/reftest-analyzer.html#web=eq.log

@calixteman calixteman merged commit 2c135b0 into mozilla:master Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants