Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644) #5655

Merged
merged 1 commit into from Apr 20, 2015

Conversation

Snuffleupagus
Copy link
Collaborator

This is a tentative patch that fixes #5644.

The issue with the referenced PDF file is that the Pages dictionary contains a number of empty Kids, and we were getting stuck in those nodes before reaching the actual pages.

As far as I can tell this code is just an optimization, to avoid what is usually a number of unnecessary iterations, but isn't strictly necessary. The only way I could find to solve the particular issue referenced, and to avoid similar issues in the future, was to remove that code path. This might mean that the lookup becomes a tiny bit slower in very large files, but I unfortunately found no other way to fix the issue (hence why I'm labelling this as a tentative patch).

Edit: Removing this optimization, as the first version of the patch did, would be bad for performance reasons (especially with ranged loading). Hence I've submitted a new, and hopefully better, version of the patch that only checks all Kids if we've actually encountered an empty node.

I've checked a large number of the PDF files in the test suite, especially the longer ones (where the performance penalty would be especially bad), and didn't find any file where the special case introduced in this patch was actually hit.

Note: I'm adding an eq test, to ensure that we actually find the correct page for each pageIndex.

@Snuffleupagus Snuffleupagus added this to the 2015 Q1 milestone Jan 17, 2015
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c6bdd383996e251/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/77970bc91c955d0/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/c6bdd383996e251/output.txt

Total script time: 17.52 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/77970bc91c955d0/output.txt

Total script time: 22.52 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@brendandahl
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.22.172.223:8877/7ac9fda05041cc6/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/7ac9fda05041cc6/output.txt

Total script time: 18.43 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt

Total script time: 22.72 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij
Copy link
Contributor

@brendandahl Good to merge, right?

brendandahl added a commit that referenced this pull request Apr 20, 2015
Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644)
@brendandahl brendandahl merged commit 846eb96 into mozilla:master Apr 20, 2015
@Snuffleupagus Snuffleupagus deleted the issue-5644 branch April 20, 2015 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to render PDF with many pictures
4 participants