Patch for dealing with badly formed pdfs made on an ipad by comqdhb · Pull Request #73 · itext/itext-java

comqdhb · 2021-06-22T08:46:17Z

While reading a PDF annotated on an iPad, the reader failed on a dictionary whose key had been written as a String token ("Name") and not a Name token. If we accept a String as a valid name, the reader can continue.

This makes me think that there should be a way of continuing when an error occurs.

yulian-gaponenko · 2022-06-01T17:14:35Z

Thank you for the PR, @comqdhb!

I'm afraid we think that this change makes the parsing too lenient and could have many unexpected and undesirable side effects. It allows basically any key to be a string (still parsing it as if it were a name), and this alters the parsing behavior for every dictionary in every PDF file. There are a lot of sharp edges here:

- Names and strings have different encoding rules, so it's not clear how a string key should be decoded into a name.
- What happens if there is a (Name) key and a /Name key in the same dictionary? There are many ways such a conflict could be handled, and none of them is clearly right.
- Another possible scenario is that some software forgot to add an element, or added one element too many, between << and >>. We have encountered such PDFs as well, and one element more or less means that all following keys become values and all following values become key candidates. In dictionaries that are string-valued, such leniency can produce more unexpected results than a clear exception would.
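
The key/value shift in the last point can be illustrated with a small toy model. This is only a sketch and not iText code: the class `KeyShiftDemo`, its `read` method, and the sample tokens are all hypothetical, and tokens are assumed to be pre-split strings where names start with `/` and strings are wrapped in parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a dictionary reader; it is not iText code. */
class KeyShiftDemo {

    /**
     * Walks the tokens found between "<<" and ">>" and returns the key/value
     * pairs accepted so far, followed by "ok" or the reason parsing stopped.
     * Assumption: names start with '/', strings start with '('.
     */
    static List<String> read(List<String> tokens, boolean acceptStringKeys) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += 2) {
            String key = tokens.get(i);
            boolean isName = key.startsWith("/");
            boolean isString = key.startsWith("(");
            if (!isName && !(acceptStringKeys && isString)) {
                log.add("error: dictionary key is not a name: " + key);
                return log;
            }
            if (i + 1 >= tokens.size()) {
                // The key has no value: the dictionary ended too early.
                log.add("error: unexpected >> after key " + key);
                return log;
            }
            log.add(key + " -> " + tokens.get(i + 1));
        }
        log.add("ok");
        return log;
    }

    public static void main(String[] args) {
        // Intended: << /Title (Report) /Author (Alice) /Producer (App) >>
        // The hypothetical writer forgot the value of /Title.
        List<String> broken =
                Arrays.asList("/Title", "/Author", "(Alice)", "/Producer", "(App)");
        System.out.println("strict:  " + read(broken, false));
        System.out.println("lenient: " + read(broken, true));
    }
}
```

In this model the strict mode stops at `(Alice)`, the first token that is in a key position without being a name. The lenient mode additionally records the shifted pair `(Alice) -> /Producer` and only fails at the end of the dictionary, with an error that no longer points at the real defect.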

We've also checked how other PDF processors handle such invalid PDF files, and we see that a strict approach is commonly applied.

If this scenario is critical, we suggest customizing the implementation in client code instead. The dictionary structure check is done in PdfReader#readDictionary, which is protected, so one can quite easily introduce a custom PdfReader subclass that overrides that method.
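
The shape of such a client-side override might look like the sketch below. It deliberately uses hypothetical stand-in classes (`StrictReader`, `StringKeyTolerantReader`) over pre-split string tokens and does not call the iText API; a real subclass would extend PdfReader and override readDictionary with iText's own tokenizer. The two policies marked in the comments (reusing the raw string content as the name, rejecting duplicate keys) are assumptions chosen for illustration and are not something the maintainers prescribe.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the override pattern with stand-in classes; not iText code. */
class LenientReaderSketch {

    /** Hypothetical stand-in for a strict reader: every key must be a name. */
    static class StrictReader {
        protected Map<String, String> readDictionary(List<String> tokens) {
            Map<String, String> dict = new LinkedHashMap<>();
            for (int i = 0; i + 1 < tokens.size(); i += 2) {
                String key = tokens.get(i);
                if (!key.startsWith("/")) {
                    throw new IllegalStateException("Dictionary key is not a name: " + key);
                }
                dict.put(key, tokens.get(i + 1));
            }
            return dict;
        }

        public Map<String, String> parse(String... tokens) {
            return readDictionary(Arrays.asList(tokens));
        }
    }

    /** Client-side subclass that opts in to string keys for its own documents. */
    static class StringKeyTolerantReader extends StrictReader {
        @Override
        protected Map<String, String> readDictionary(List<String> tokens) {
            Map<String, String> dict = new LinkedHashMap<>();
            for (int i = 0; i + 1 < tokens.size(); i += 2) {
                String key = tokens.get(i);
                if (key.length() >= 2 && key.startsWith("(") && key.endsWith(")")) {
                    // Assumed policy: reuse the raw string content as the name, undecoded.
                    key = "/" + key.substring(1, key.length() - 1);
                } else if (!key.startsWith("/")) {
                    throw new IllegalStateException("Dictionary key is not a name: " + key);
                }
                // Assumed policy: refuse (Name) and /Name in the same dictionary.
                if (dict.put(key, tokens.get(i + 1)) != null) {
                    throw new IllegalStateException("Duplicate key after conversion: " + key);
                }
            }
            return dict;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"(Type)", "/Annot", "/Subtype", "/Ink"};
        System.out.println(new StringKeyTolerantReader().parse(tokens));
        try {
            new StrictReader().parse(tokens);
        } catch (IllegalStateException e) {
            System.out.println("strict reader: " + e.getMessage());
        }
    }
}
```

Keeping the leniency in a subclass means the application that needs it has to make the encoding and conflict decisions explicitly, while every other user of the strict reader still gets a clear exception.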

I'm also attaching some examples of such invalid documents that we generated, to keep them here for reference:
Font-Entry_String-Key.pdf
ModDate_String-Key.pdf
Random_String-Key.pdf

yulian-gaponenko closed this Jun 1, 2022

comqdhb wants to merge 1 commit into itext:develop from comqdhb:develop
