Dictionary with undefined indirect object and direct object for same key fails #325

Closed
rwirth opened this issue Feb 13, 2017 · 4 comments
Labels
- is-robustness-issue: From a user's perspective, this is about robustness
- needs-example-code: The issue needs a minimal and complete (e.g. all imports) example showing the problem
- needs-pdf: The issue needs a PDF file to show the problem

rwirth commented Feb 13, 2017

I have a PDF file with a funny-looking page dict (produced using pdfLaTeX):

<<
/Type /Page
% [...]
/Group << /S /Transparency /I true /CS /DeviceRGB>>
/Parent 61 0 R
/Group 60 0 R % obj 60 is not defined
>>

The file displays fine, and running pdfinfo on it works without issues. When reading it, PyPDF2 (in strict mode) bails out with:

PyPDF2.utils.PdfReadError: Multiple definitions in dictionary at byte 0x1559 for key /Group

The PDF 1.7 reference is not entirely clear about this situation. On the one hand, it states [3.2.6]:

A dictionary entry whose value is null (see Section 3.2.8, “Null Object”) is equivalent to an absent entry.

and [3.2.7]

An indirect reference to an undefined object is not an error; it is simply treated as a reference to the null object.

On the other hand it says that

No two entries in the same dictionary should have the same key. If a key does appear more than once, its value is undefined.

One way of reading this is that an entry referencing a nonexistent object should be treated as if it were absent from the dictionary, which is IMO the most sensible interpretation.
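A minimal sketch of that interpretation in plain Python. It assumes a hypothetical `Ref` class for indirect references, Python `None` for the PDF null object, and a plain dict standing in for the file's object table; none of these are PyPDF2 names. Each entry is resolved first, and entries that resolve to null are dropped, so the dangling `/Group 60 0 R` simply disappears:

```python
from typing import Any, NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for an indirect reference such as `60 0 R`."""
    num: int
    gen: int = 0


def resolve(value: Any, objects: dict) -> Any:
    """Follow an indirect reference; an undefined object resolves to null (None)."""
    if isinstance(value, Ref):
        return objects.get((value.num, value.gen))
    return value


def drop_null_entries(entries: list, objects: dict) -> list:
    """Keep only (key, value) pairs whose value does not resolve to null,
    since a null-valued entry is equivalent to an absent one."""
    return [(key, value) for key, value in entries
            if resolve(value, objects) is not None]


# The page dictionary from the issue; object 61 exists, object 60 does not.
objects = {(61, 0): {"/Type": "/Pages"}}
entries = [
    ("/Type", "/Page"),
    ("/Group", {"/S": "/Transparency", "/I": True, "/CS": "/DeviceRGB"}),
    ("/Parent", Ref(61)),
    ("/Group", Ref(60)),
]
kept = drop_null_entries(entries, objects)
print([key for key, _ in kept])  # ['/Type', '/Group', '/Parent']
```

With the null entry gone, only one `/Group` remains, so there is nothing left to report as a multiple definition.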

Also, the current implementation always keeps the first value, so in non-strict mode the outcome depends on the order of the entries: one might end up with the indirect reference even though a valid direct object is present. This is also connected to issue #236.
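The order dependence is easy to see with a toy model of a keep-first policy. This is a hypothetical sketch, not PyPDF2's actual parser; `Ref` is an illustrative stand-in for an indirect reference:

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for an indirect reference such as `60 0 R`."""
    num: int
    gen: int = 0


def keep_first(entries: list) -> dict:
    """Build a dictionary that silently ignores repeated keys (first value wins)."""
    result: dict = {}
    for key, value in entries:
        result.setdefault(key, value)
    return result


direct = {"/S": "/Transparency", "/I": True, "/CS": "/DeviceRGB"}
dangling = Ref(60)  # points at an object that does not exist

# Same two entries, opposite order, different outcome.
print(keep_first([("/Group", direct), ("/Group", dangling)])["/Group"])  # -> the direct dictionary
print(keep_first([("/Group", dangling), ("/Group", direct)])["/Group"])  # -> Ref(num=60, gen=0)
```

Dropping entries that resolve to null before applying such a policy would make both orders yield the direct object.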

rwirth added a commit to rwirth/PyPDF2 that referenced this issue Feb 13, 2017
…g them

Entries with null values (either explicit or through references to nonexistent
objects) are completely ignored and do not cause multiple definition errors
if two entries are present for the same key and one has a null value.

Fixes issue py-pdf#325 and proposes a behavior for issue py-pdf#236.
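The behavior described in this commit message can be sketched as follows. This is an illustration, not the actual patch: `Ref`, `read_dict`, the exact semantics of the `strict` flag, and the use of `warnings.warn` plus keep-first in non-strict mode are assumptions. Null entries (explicit `None` or references to missing objects) are skipped before the duplicate check, so only two non-null values for the same key count as a multiple definition:

```python
import warnings
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for an indirect reference such as `60 0 R`."""
    num: int
    gen: int = 0


def read_dict(entries: list, objects: dict, strict: bool = True) -> dict:
    """Hypothetical dictionary builder: null entries are ignored entirely;
    a key with two non-null values raises in strict mode and warns otherwise."""
    result: dict = {}
    for key, value in entries:
        resolved = objects.get((value.num, value.gen)) if isinstance(value, Ref) else value
        if resolved is None:
            continue  # explicit null or dangling reference: treated as absent
        if key in result:
            message = f"Multiple definitions in dictionary for key {key}"
            if strict:
                raise ValueError(message)
            warnings.warn(message)
            continue  # non-strict: keep the first non-null value (assumption)
        result[key] = value
    return result


objects = {(61, 0): {"/Type": "/Pages"}}
page = read_dict(
    [("/Group", Ref(60)), ("/Group", {"/S": "/Transparency"}), ("/Parent", Ref(61))],
    objects,
)
print(page["/Group"])  # {'/S': '/Transparency'}
```

Under this policy, the page dictionary from the issue reads cleanly even in strict mode, while a genuine clash of two non-null values is still reported.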
rwirth added a commit to rwirth/PyPDF2 that referenced this issue Feb 13, 2017
MartinThoma added the is-bug label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) Apr 7, 2022
MartinThoma (Member) commented

Can you share a PDF that causes the issue + code to show the issue?

MartinThoma added the needs-change label (The PR/issue cannot be handled as issue and needs to be improved) Apr 22, 2022
MartinThoma added the needs-pdf, needs-example-code, and is-robustness-issue labels and removed the needs-change and is-bug labels Jun 26, 2022
MartinThoma (Member) commented

I guess this was an issue with a non-conformant PDF. As we don't have a good way to verify the issue or any solution without an example PDF file, I'll close this. Please let me know if you have one or a potential solution.

rwirth (Author) commented Jul 15, 2022

Sorry, I don't remember much about the details; it has been 5 years. There is a test case and an example file that illustrate the issue. The file contains a page dictionary with two /Group entries. Both contain indirect references, but one of the referenced objects does not exist.

@MartinThoma MartinThoma reopened this Jul 15, 2022
pubpub-zz (Collaborator) commented

Attaching the test file from @rwirth (for local reference):
multipledefs.pdf

I've checked the file with the latest version, and only a warning is reported. I've created a PR to check that.

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Jul 24, 2022
no issue to be solved
MartinThoma added a commit that referenced this issue Jul 25, 2022
Bug Fixes (BUG):
-  u_hash in AlgV4.compute_key (#1170)

Robustness (ROB):
-  Fix loading of file from #134 (#1167)
-  Cope with empty DecodeParams (#1165)

Documentation (DOC):
-  Typo in warning message (#1166)

Maintenance (MAINT):
-  Package updates; solve mypy strict remarks (#1163)

Testing (TST):
-  Add test from #325 (#1169)

Full Changelog: 2.8.0...2.8.1