
Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614) #10615

Merged


Snuffleupagus
Collaborator

There are a number of things wrong with the PDF document. First of all, its inline images are a lot larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045).

Furthermore, the actual ASCII85Decode data is interspersed with a lot of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker, which completely breaks the EOD detection.
Note that this patch should be safe, since the specification (see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130) explicitly mentions that all whitespace should be ignored.

Fixes #10614.
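
For illustration only, here is a minimal sketch of an EOD search that tolerates whitespace between the `~` and the `>`. It is not the code from this patch: `findAscii85End`, `isWhiteSpace` and `toBytes` are hypothetical names, and the whitespace set is assumed to be the six white-space characters listed in the PDF specification.

```javascript
// Hypothetical helpers for illustration; not the pdf.js implementation.
const TILDE = 0x7e; // "~"
const GT = 0x3e; // ">"

// Assumption: the six white-space characters from the PDF specification
// (NUL, TAB, LF, FF, CR, SPACE).
function isWhiteSpace(ch) {
  return (
    ch === 0x00 || ch === 0x09 || ch === 0x0a ||
    ch === 0x0c || ch === 0x0d || ch === 0x20
  );
}

// Returns the index just past the EOD marker, or -1 if no marker is found.
// Whitespace between "~" and ">" is skipped instead of breaking the match.
function findAscii85End(bytes, start = 0) {
  for (let i = start; i < bytes.length; i++) {
    if (bytes[i] !== TILDE) {
      continue;
    }
    let j = i + 1;
    while (j < bytes.length && isWhiteSpace(bytes[j])) {
      j++;
    }
    if (j < bytes.length && bytes[j] === GT) {
      return j + 1;
    }
  }
  return -1;
}

// Small helper to build test input from an ASCII string.
const toBytes = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));

console.log(findAscii85End(toBytes("87cURD]~>EI"))); // → 9
console.log(findAscii85End(toBytes("87cURD]~ \n >EI"))); // → 12
```

Since `~` is not a valid ASCII85 data character, a `~` followed by anything other than whitespace and `>` is simply not treated as a marker in this sketch.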

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

pdfjsbot commented Mar 4, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/6357392be4e4e1d/output.txt

@pdfjsbot

pdfjsbot commented Mar 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/bee5316a2709db1/output.txt

@pdfjsbot

pdfjsbot commented Mar 4, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/bee5316a2709db1/output.txt

Total script time: 18.08 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

pdfjsbot commented Mar 4, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/6357392be4e4e1d/output.txt

Total script time: 25.69 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Collaborator Author

Snuffleupagus commented Mar 8, 2019

Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that all whitespace should be ignored.

Note also the Ascii85Stream decoding in

pdf.js/src/core/stream.js

Lines 963 to 965 in e1b01a6

while (isSpace(c)) {
  c = str.getByte();
}
and

pdf.js/src/core/stream.js

Lines 987 to 989 in e1b01a6

while (isSpace(c)) {
  c = str.getByte();
}
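
To show how whitespace skipping of this kind fits into the decoding as a whole, here is a rough, self-contained sketch of an ASCII85 decoder that ignores whitespace everywhere. It is not the `Ascii85Stream` code: `decodeAscii85` and `isWhiteSpace` are hypothetical names, the input is assumed to be a plain ASCII string, and the sketch stops at the first `~` without checking the `>` that follows.

```javascript
// Hypothetical sketch for illustration; not the pdf.js Ascii85Stream.
function isWhiteSpace(ch) {
  return (
    ch === 0x00 || ch === 0x09 || ch === 0x0a ||
    ch === 0x0c || ch === 0x0d || ch === 0x20
  );
}

function decodeAscii85(text) {
  const out = [];
  let digits = [];

  // Convert the collected base-85 digits into `count` output bytes.
  const flush = (count) => {
    let value = 0;
    for (const d of digits) {
      value = value * 85 + d;
    }
    const bytes = [
      Math.floor(value / 0x1000000) % 256,
      Math.floor(value / 0x10000) % 256,
      Math.floor(value / 0x100) % 256,
      value % 256,
    ];
    out.push(...bytes.slice(0, count));
    digits = [];
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text.charCodeAt(i);
    if (isWhiteSpace(ch)) {
      continue; // Whitespace is ignored wherever it appears.
    }
    if (ch === 0x7e) {
      break; // "~" starts the EOD marker.
    }
    if (ch === 0x7a && digits.length === 0) {
      out.push(0, 0, 0, 0); // "z" is shorthand for four zero bytes.
      continue;
    }
    if (ch < 0x21 || ch > 0x75) {
      throw new Error("Invalid ASCII85 character at index " + i);
    }
    digits.push(ch - 0x21);
    if (digits.length === 5) {
      flush(4);
    }
  }

  if (digits.length === 1) {
    throw new Error("Truncated final ASCII85 group");
  }
  if (digits.length > 1) {
    // A final group of n + 1 digits encodes n bytes; pad with "u" (84).
    const count = digits.length - 1;
    while (digits.length < 5) {
      digits.push(84);
    }
    flush(count);
  }
  return Uint8Array.from(out);
}

console.log(String.fromCharCode(...decodeAscii85("9j qo\n^ ~ >"))); // → "Man "
```

Because whitespace is dropped before any other check, the same input decodes identically with or without line breaks, which is the behaviour the quoted loops rely on.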

/cc @brendandahl Do you have time to review this patch?

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/8c2d4fd4f3187da/output.txt

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/8c2d4fd4f3187da/output.txt

Total script time: 1.81 mins

Published

@timvandermeij
Contributor

/botio makeref

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/1c1e797f38c045d/output.txt

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/676618f9b7704bf/output.txt

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/1c1e797f38c045d/output.txt

Total script time: 16.42 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot

pdfjsbot commented Mar 8, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/676618f9b7704bf/output.txt

Total script time: 23.47 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit 8b149b8 into mozilla:master Mar 8, 2019
@timvandermeij
Contributor

Thank you for finding and fixing this bug!

@Snuffleupagus Snuffleupagus deleted the corrupt-inline-ASCII85Decode branch March 9, 2019 08:00