Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the ObjectLoader use more efficient methods when determining if data needs to be loaded #11284

Merged

Conversation

Snuffleupagus
Copy link
Collaborator

Currently, for data in ChunkedStream instances, the getMissingChunks method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how ChunkedStream.getMissingChunks is being used in the ObjectLoader you'll notice that we don't actually care about which specific chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how ChunkedStream.getMissingChunks itself is implemented you'll notice that it (somewhat expectedly) always iterates over all chunks.

All in all, using ChunkedStream.getMissingChunks in the ObjectLoader seems like an unnecessary "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a ChunkedStream.allChunksLoaded method, consisting of a single simple check, which seems like a perfect fit for the ObjectLoader use cases.
In particular, once the entire PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the ObjectLoader to be as simple/quick as possible (similar to e.g. loading a local files) which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the ObjectLoader is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/bdd5bfc4eac57de/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f2ae79d712d9720/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/f2ae79d712d9720/output.txt

Total script time: 18.71 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/bdd5bfc4eac57de/output.txt

Total script time: 26.60 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/67269d611c752a1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3af6099ceb42557/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/3af6099ceb42557/output.txt

Total script time: 18.66 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/67269d611c752a1/output.txt

Total script time: 26.60 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/67269d611c752a1/reftest-analyzer.html#web=eq.log

@Snuffleupagus Snuffleupagus marked this pull request as ready for review October 29, 2019 19:27
src/core/obj.js Outdated Show resolved Hide resolved
src/core/chunked_stream.js Show resolved Hide resolved
…f data needs to be loaded

Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which *specific* chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over *all* chunks.

All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessary "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a *single* simple check, which seems like a perfect fit for the `ObjectLoader` use cases.
In particular, once the *entire* PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local files) which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).
@timvandermeij timvandermeij merged commit 72bd8e8 into mozilla:master Oct 30, 2019
@timvandermeij
Copy link
Contributor

Good improvement!

@Snuffleupagus Snuffleupagus deleted the ObjectLoader-allChunksLoaded branch October 30, 2019 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants