Make the `ObjectLoader` use more efficient methods when determining if data needs to be loaded #11284

Snuffleupagus · 2019-10-28T21:48:42Z

Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which *specific* chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over *all* chunks.

All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessarily "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a *single* simple check, which seems like a perfect fit for the `ObjectLoader` use cases.
In particular, once the *entire* PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local file), which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual, larger PDF documents, in terms of both file size and number of pages, should benefit the most.)
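To illustrate the difference between the two approaches described above, here is a minimal sketch. It is not the actual `src/core/chunked_stream.js` code: the `SketchChunkedStream` class, its `markLoaded` method, and its fields are hypothetical stand-ins, and only the method names `getMissingChunks` and `allChunksLoaded` are taken from this description.

```javascript
// Hypothetical, simplified stand-in for a chunked stream; not the pdf.js source.
class SketchChunkedStream {
  constructor(numChunks) {
    this.numChunks = numChunks;
    this.loadedChunks = new Array(numChunks).fill(false);
    this.numChunksLoaded = 0;
  }

  // Record that a chunk has arrived, counting each chunk only once.
  markLoaded(chunk) {
    if (!this.loadedChunks[chunk]) {
      this.loadedChunks[chunk] = true;
      this.numChunksLoaded++;
    }
  }

  // Walks every chunk and builds an array, even when the caller only
  // wants to know whether anything is missing.
  getMissingChunks() {
    const missing = [];
    for (let chunk = 0; chunk < this.numChunks; chunk++) {
      if (!this.loadedChunks[chunk]) {
        missing.push(chunk);
      }
    }
    return missing;
  }

  // A single comparison, with no iteration and no allocation.
  allChunksLoaded() {
    return this.numChunksLoaded === this.numChunks;
  }
}

const stream = new SketchChunkedStream(4);
stream.markLoaded(0);
stream.markLoaded(2);
console.log(stream.getMissingChunks()); // [ 1, 3 ]
console.log(stream.allChunksLoaded()); // false
```

For a caller that only needs the yes/no answer, the second method returns the same information as checking whether the array from the first is empty, without the per-chunk loop.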

pdfjsbot · 2019-10-28T21:57:18Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/bdd5bfc4eac57de/output.txt

pdfjsbot · 2019-10-28T21:57:18Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f2ae79d712d9720/output.txt

pdfjsbot · 2019-10-28T22:16:01Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/f2ae79d712d9720/output.txt

Total script time: 18.71 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2019-10-28T22:23:55Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/bdd5bfc4eac57de/output.txt

Total script time: 26.60 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2019-10-29T18:58:19Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/67269d611c752a1/output.txt

pdfjsbot · 2019-10-29T18:58:19Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3af6099ceb42557/output.txt

pdfjsbot · 2019-10-29T19:17:00Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/3af6099ceb42557/output.txt

Total script time: 18.66 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2019-10-29T19:24:56Z

From: Bot.io (Windows)

Failed

Full output at http://54.215.176.217:8877/67269d611c752a1/output.txt

Total script time: 26.60 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/67269d611c752a1/reftest-analyzer.html#web=eq.log

src/core/obj.js

src/core/chunked_stream.js

timvandermeij · 2019-10-30T21:48:59Z

Good improvement!

Snuffleupagus force-pushed the ObjectLoader-allChunksLoaded branch from d6211a9 to 14dc318 Compare October 28, 2019 21:53

timvandermeij added core performance labels Oct 28, 2019

Snuffleupagus force-pushed the ObjectLoader-allChunksLoaded branch from 14dc318 to 92b2891 Compare October 29, 2019 17:15

Snuffleupagus marked this pull request as ready for review October 29, 2019 19:27

timvandermeij reviewed Oct 29, 2019

Snuffleupagus added 2 commits October 29, 2019 23:20

Inline a couple of isRef/isDict checks in the ObjectLoader code

2d35a49

As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).
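As a rough illustration of what inlining such a check can look like, here is a small sketch. The `Ref` and `Dict` classes and the `isRef` helper below are hypothetical stand-ins written for this example, assuming the helper is a thin wrapper around an `instanceof` test; they are not copied from `src/core/obj.js`.

```javascript
// Hypothetical stand-ins for primitive types; not the pdf.js classes.
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
}

class Dict {
  constructor() {
    this.map = new Map();
  }
}

// Helper-style check: one extra function call per test.
function isRef(value) {
  return value instanceof Ref;
}

// Version that goes through the helper for every value.
function countRefsWithHelper(values) {
  let count = 0;
  for (const value of values) {
    if (isRef(value)) {
      count++;
    }
  }
  return count;
}

// Version with the check inlined; same result, no helper call.
function countRefsInline(values) {
  let count = 0;
  for (const value of values) {
    if (value instanceof Ref) {
      count++;
    }
  }
  return count;
}
```

Both functions return the same count for any input; the inlined form only skips the intermediate call, so any gain is expected to be small.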

Snuffleupagus force-pushed the ObjectLoader-allChunksLoaded branch from 92b2891 to 2d35a49 Compare October 29, 2019 22:20

timvandermeij approved these changes Oct 30, 2019

timvandermeij merged commit 72bd8e8 into mozilla:master Oct 30, 2019

Snuffleupagus deleted the ObjectLoader-allChunksLoaded branch October 30, 2019 22:22
