readObjectHeader: Allow extra whitespace before "obj" #567

malthejorgensen
Contributor

The header being read by readObjectHeader has the format:

<idnum> <generation> obj

where <idnum> and <generation> are integers.
Previously, an arbitrary number of spaces was allowed between <idnum> and <generation>, but not between <generation> and obj.

With this pull request, an arbitrary number of spaces is also allowed between <generation> and obj. Extra spaces there raise a warning, in the same way other extraneous whitespace is handled.
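
For readers who want to see the behaviour in isolation, here is a minimal standalone sketch of lenient header parsing. It is not the PyPDF2 implementation: the function names, the whitespace set, and the warning text are all assumptions made for illustration. It reads `<idnum> <generation> obj` from a binary stream, tolerates runs of whitespace between the tokens, and warns when it finds more than a single separator.

```python
import io
import warnings

# Bytes treated as whitespace in this sketch (an assumption, not PyPDF2's constant).
WHITESPACE = b" \t\n\r\x00\x0c"


def _skip_whitespace(stream):
    """Consume whitespace bytes and return how many were skipped."""
    count = 0
    while True:
        byte = stream.read(1)
        if byte == b"":
            return count  # end of stream
        if byte not in WHITESPACE:
            stream.seek(-1, io.SEEK_CUR)  # put the non-whitespace byte back
            return count
        count += 1


def _read_integer(stream):
    """Read a run of ASCII digits and return it as an int."""
    digits = b""
    while True:
        byte = stream.read(1)
        if byte.isdigit():
            digits += byte
            continue
        if byte:
            stream.seek(-1, io.SEEK_CUR)  # put the non-digit byte back
        break
    if not digits:
        raise ValueError("expected an integer in object header")
    return int(digits.decode("ascii"))


def read_object_header(stream):
    """Hypothetical lenient parser for '<idnum> <generation> obj'."""
    extra = _skip_whitespace(stream) > 0       # leading whitespace
    idnum = _read_integer(stream)
    extra |= _skip_whitespace(stream) > 1      # between <idnum> and <generation>
    generation = _read_integer(stream)
    extra |= _skip_whitespace(stream) > 1      # between <generation> and obj
    if stream.read(3) != b"obj":
        raise ValueError("expected 'obj' keyword in object header")
    if extra:
        warnings.warn("superfluous whitespace in object header")
    return idnum, generation


if __name__ == "__main__":
    print(read_object_header(io.BytesIO(b"12 0 obj")))
    print(read_object_header(io.BytesIO(b"12 0    obj")))  # also emits a warning
```

Warning instead of raising keeps slightly malformed files readable while still making the irregularity visible to the caller.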

@MartinThoma MartinThoma added the Tiny Pull requests that make a tiny change - and thus should be easy to merge label Apr 6, 2022
@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Apr 16, 2022
@MartinThoma
Member

@malthejorgensen Sorry that it took so long. The PR looks good to me, but it has merge conflicts, so I can't merge it at the moment. Would you mind resolving them (or opening a new PR, which might be simpler)?

I would understand if you don't want to do this. In that case I'd add the change myself and give you credit via GitHub's co-authored-by feature.

@malthejorgensen malthejorgensen force-pushed the readObjectHeader-allow-extra-spaces-before-obj branch from 5b3d04f to 0184fda Compare April 16, 2022 06:51
The header being read has the format:

    <idnum> <generation> obj

where `<idnum>` and `<generation>` are integers.
Previously an arbitrary number of spaces was being allowed between `<idnum>` and `<generation>`, but not between `<generation>` and `obj`.
We now allow arbitrary spaces between `<generation>` and `obj`.
@malthejorgensen malthejorgensen force-pushed the readObjectHeader-allow-extra-spaces-before-obj branch from 0184fda to 74c573c Compare April 16, 2022 06:51
@malthejorgensen
Contributor Author

@MartinThoma No problem :) – hereby rebased.

@codecov-commenter

Codecov Report

Merging #567 (74c573c) into main (a5875c5) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #567   +/-   ##
=======================================
  Coverage   69.63%   69.64%           
=======================================
  Files           9        9           
  Lines        3316     3317    +1     
  Branches      783      783           
=======================================
+ Hits         2309     2310    +1     
  Misses        763      763           
  Partials      244      244           
| Impacted Files | Coverage Δ |
|---|---|
| PyPDF2/pdf.py | 72.38% <100.00%> (+0.01%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@MartinThoma MartinThoma added the is-robustness-issue From a users perspective, this is about robustness label Apr 16, 2022
@MartinThoma MartinThoma merged commit cf20f92 into py-pdf:main Apr 16, 2022
@MartinThoma
Member

Thank you very much! It was merged and will be part of the next release (sometime this month).

MartinThoma added a commit that referenced this pull request Apr 18, 2022
Deprecations (DEP):
-  Remove support for Python 2.6 and older (#776)

New Features (ENH):
-  Extract document permissions (#320)

Bug Fixes (BUG):
-  Clip by trimBox when merging pages, which would otherwise be ignored (#240)
-  Add overwriteWarnings parameter to PdfFileMerger (#243)
-  IndexError for getPage() of decrypted file (#359)
-  Handle cases where decodeParms is an ArrayObject (#405)
-  Updated PDF fields don't show up when page is written (#412)
-  Set Linked Form Value (#414)
-  Fix zlib -5 error for corrupt files (#603)
-  Fix reading more than last1K for EOF (#642)
-  Accidental import

Robustness (ROB):
-  Allow extra whitespace before "obj" in readObjectHeader (#567)

Documentation (DOC):
-  Link to pdftoc in Sample_Code (#628)
-  Working with annotations (#764)
-  Structure history

Developer Experience (DEV):
-  Add issue templates (#765)
-  Add tool to generate changelog

Maintenance (MAINT):
-  Use grouped constants instead of string literals (#745)
-  Add error module (#768)
-  Use decorators for @staticmethod (#775)
-  Split long functions (#777)

Testing (TST):
-  Run tests in CI once with -OO Flags (#770)
-  Filling out forms (#771)
-  Add tests for Writer (#772)
-  Error cases (#773)
-  Check Error messages (#769)
-  Regression test for issue #88
-  Regression test for issue #327

Code Style (STY):
-  Make variable naming more consistent in tests

All changes: 1.27.5...1.27.6
VictorCarlquist pushed a commit to VictorCarlquist/PyPDF2 that referenced this pull request Apr 29, 2022