Resolve IndirectObject when it refers to a free entry. #1054

Merged
merged 1 commit into from Jul 5, 2022

Conversation

Hatell
Contributor

@Hatell Hatell commented Jul 4, 2022

Example to resolve IndirectObject when it refers to a free entry.

This could resolve issues #1034 and #521.

@codecov

codecov bot commented Jul 4, 2022

Codecov Report

Merging #1054 (daea3e1) into main (04d576c) will decrease coverage by 0.07%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main    #1054      +/-   ##
==========================================
- Coverage   90.92%   90.85%   -0.08%     
==========================================
  Files          24       24              
  Lines        4508     4516       +8     
  Branches      921      923       +2     
==========================================
+ Hits         4099     4103       +4     
- Misses        267      269       +2     
- Partials      142      144       +2     
Impacted Files Coverage Δ
PyPDF2/_reader.py 91.40% <50.00%> (-0.48%) ⬇️


@MartinThoma
Member

Thank you for the PR ❤️ I'll review it tomorrow 🤞

@MartinThoma
Member

I think your comment on #1034 is super helpful. I would use the following commit message:

BUG: Resolve IndirectObject when it refers to a free entry

From the PDF 1.7 docs https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf:

Section 7.3.10 Indirect Objects:
An indirect reference to an undefined object shall not be considered an error by a conforming reader;
it shall be treated as a reference to the null object.

And section 7.5.4 Cross-Reference Table:
There are two ways an entry may be a member of the free entries list. Using the basic mechanism the free
entries in the cross-reference table may form a linked list, with each free entry containing the object number of
the next. The first entry in the table (object number 0) shall always be free and shall have a generation number
of 65,535; it is shall be the head of the linked list of free objects. The last free entry (the tail of the linked list)
links back to object number 0. Using the second mechanism, the table may contain other free entries that link
back to object number 0 and have a generation number of 65,535, even though these entries are not in the
linked list itself.

Those entries form a linked list. The correct way to handle this is to resolve the indirect reference to the NullObject.

See "3.4.3 Cross-Reference Table" in the Adobe PDF Reference 1.7 (section 7.5.4 in the ISO 32000-1 document linked above) for free cross-reference entries in general.
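
The rule in the commit message above can be sketched as a toy model. This is not PyPDF2's actual implementation: `XrefEntry`, `parse_xref_entry`, and `resolve` are hypothetical names, `None` stands in for the NullObject, and objects are assumed to be already parsed and keyed by byte offset.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class XrefEntry:
    """One entry of a classic cross-reference table (hypothetical model)."""
    field1: int      # byte offset if in use; next free object number if free
    generation: int
    in_use: bool     # True for an 'n' entry, False for an 'f' entry


def parse_xref_entry(line: str) -> XrefEntry:
    """Parse an entry such as '0000000000 65535 f'."""
    first, gen, kind = line.split()
    if kind not in ("n", "f"):
        raise ValueError(f"unknown xref entry type: {kind!r}")
    return XrefEntry(int(first), int(gen), kind == "n")


def resolve(
    xref: Dict[int, XrefEntry],
    objects_by_offset: Dict[int, object],
    idnum: int,
    generation: int,
) -> Optional[object]:
    """Resolve an indirect reference; None plays the role of the null object."""
    entry = xref.get(idnum)
    # Missing entry, free entry, or (assumption of this sketch) a generation
    # mismatch: treat the reference as undefined and return the null object.
    if entry is None or not entry.in_use or entry.generation != generation:
        return None
    return objects_by_offset[entry.field1]


xref = {
    0: parse_xref_entry("0000000000 65535 f"),  # head of the free list
    1: parse_xref_entry("0000000017 00000 n"),  # in-use object at offset 17
    2: parse_xref_entry("0000000000 00001 f"),  # free entry
}
objects_by_offset = {17: "catalog"}

in_use_result = resolve(xref, objects_by_offset, 1, 0)
free_result = resolve(xref, objects_by_offset, 2, 0)
undefined_result = resolve(xref, objects_by_offset, 9, 0)
```

Returning a null value instead of raising is the point of the sketch: a reference to a free or undefined entry is not an error for a conforming reader.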

@MartinThoma
Member

The part that I really don't get is what the free entries are good for. They are not used, right? So why are they stored in the first place? What are those generation numbers good for?

@Hatell
Contributor Author

Hatell commented Jul 5, 2022

I think, for example, that if you want to delete something quickly, you don't need to load the whole document to renumber it. You only delete the data you want gone, mark its entry as free, and update the other offsets in the xref table to account for the deleted data.

And as I understand it, you can use the generation number when you edit a PDF file, to show that an object has changed, etc. If the same entry appears multiple times in the xref, you should pick the greatest generation by default. I didn't go deep into that section.
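
A minimal sketch of the bookkeeping described above, assuming a classic xref table held in memory as a dict. The names `delete_object` and `free_list` are hypothetical and not part of PyPDF2. The generation bump follows the PDF 1.7 description of deleting an object: the entry is marked free, added to the free list, and its generation number is incremented for the next object that reuses the number.

```python
from typing import Dict, List

# Each entry is [field1, generation, kind]. For a free entry ('f'),
# field1 is the object number of the next free entry in the linked list.
Table = Dict[int, List]


def delete_object(table: Table, num: int) -> None:
    """Mark object `num` free and push it onto the free list headed by object 0."""
    if num == 0 or table[num][2] != "n":
        raise ValueError(f"object {num} is not an in-use entry")
    head = table[0]
    generation = table[num][1] + 1           # generation for a future reuse
    table[num] = [head[0], generation, "f"]  # link to the previous first free entry
    head[0] = num                            # object 0 now points at the new entry


def free_list(table: Table) -> List[int]:
    """Walk the linked list of free entries, starting after object 0."""
    result: List[int] = []
    current = table[0][0]
    # Stop at the tail (link back to 0) or on a cycle in a malformed table.
    while current != 0 and current not in result:
        result.append(current)
        current = table[current][0]
    return result


table: Table = {
    0: [0, 65535, "f"],
    1: [17, 0, "n"],
    2: [81, 0, "n"],
    3: [120, 0, "n"],
}
delete_object(table, 2)
delete_object(table, 3)
```

No other object is renumbered here; only the freed entries and the head entry change, which is why a deletion does not require rewriting every reference in the document.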

@MartinThoma MartinThoma merged commit 02c601c into py-pdf:main Jul 5, 2022
@MartinThoma
Member

Thank you for the contribution ❤️ This fix will be part of PyPDF2==2.4.2 (probably this evening)

MartinThoma added a commit that referenced this pull request Jul 5, 2022
New Features (ENH):
-  Add PdfReader.xfa attribute (#1026)

Bug Fixes (BUG):
-  Wrong page inserted when PdfMerger.merge is done (#1063)
-  Resolve IndirectObject when it refers to a free entry (#1054)

Developer Experience (DEV):
-  Added {posargs} to tox.ini (#1055)

Maintenance (MAINT):
-  Remove PyPDF2._utils.bytes_type (#1053)

Testing (TST):
-  Scale page (indirect rect object) (#1057)
-  Simplify pathlib PdfReader test (#1056)
-  IndexError of VirtualList (#1052)
-  Invalid XML in xmp information (#1051)
-  No pycryptodome (#1050)
-  Increase test coverage (#1045)

Code Style (STY):
-  DOC of compress_content_streams (#1061)
-  Minimize diff for #879 (#1049)

Full Changelog: 2.4.1...2.4.2

2 participants