Fix IndexError for getPage() of decrypted file #359

Merged: 4 commits merged into py-pdf:main on Apr 16, 2022


@denis-osipov denis-osipov commented Jul 12, 2017

Issue #327
When getPage() is called, the _flatten() method sets flattenedPages to an empty list and only then checks whether the file is encrypted and tries to decrypt it. If that fails, PdfReadError("file has not been decrypted") is raised and flattenedPages is not reverted to None.
This fix makes it possible to get a page from a PdfFileReader object after decryption.

denis-osipov changed the title from "Fix IndexError for getPage() of decrypted file" to "Fix IndexError for getPage() of decrypted file (Issue #327)" Jul 12, 2017
ameybh commented Jul 3, 2019

This works for me. I hope they merge it soon.

MartinThoma added the is-bug and Tiny labels Apr 6, 2022
MartinThoma (Member) commented

I'm sorry that it took so long. I just became a maintainer of this project.

Do you @ameybhavsar24 / @denis-osipov have an example file / code snippet that shows the issue?

MartinThoma (Member) commented

I don't understand why this changes anything

denis-osipov commented Apr 7, 2022

Hi, @MartinThoma

Here is an example that uses an encrypted PDF file:

import PyPDF2

pdfFileObj = open('example.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

try:
    # Can't get access to content of encrypted file. It's OK.
    print(pdfReader.getPage(0))
except PyPDF2.utils.PdfReadError as error:
    print('Expected error:', error)

# Password is correct:)
pdfReader.decrypt('test')

try:
    # Current behaviour is unexpected (for me): we decrypted file and now should
    # have access to its content. But we'll get an error here.
    print(pdfReader.getPage(0))
except IndexError as error:
    print('Unexpected error (file is decrypted now):', error)

The problem appears because the _flatten() method sets self.flattenedPages before it tries to get the pages and doesn't set it back to None if an error occurs. This PR simply makes _flatten() set self.flattenedPages to an empty list only after it has successfully got the pages.

If there is a better solution, or the current behaviour is correct, please just close the PR.
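
The ordering problem described above can be reproduced without PyPDF2. The following is a minimal sketch, not the library's code: `ToyReader`, `ReadError`, `_load_page_tree` and the `fixed` flag are hypothetical names invented for illustration, and the page tree is just a list of strings. It only shows why assigning the cache before the step that can fail leaves an empty list behind, and why assigning it afterwards avoids that.

```python
class ReadError(Exception):
    """Stand-in for PdfReadError (hypothetical, for this sketch only)."""


class ToyReader:
    """Toy model of a reader with a lazily built page cache."""

    def __init__(self, pages, fixed):
        self._pages = pages          # pretend page tree
        self._decrypted = False
        self._fixed = fixed          # selects the buggy or the fixed ordering
        self.flattened_pages = None  # None means "cache not built yet"

    def decrypt(self):
        self._decrypted = True

    def _load_page_tree(self):
        # Reading the page tree fails while the file is still encrypted.
        if not self._decrypted:
            raise ReadError("file has not been decrypted")
        return self._pages

    def _flatten(self):
        if self._fixed:
            tree = self._load_page_tree()  # may raise; cache is still None
            self.flattened_pages = []
        else:
            self.flattened_pages = []      # cache marked as built too early
            tree = self._load_page_tree()  # may raise; cache stays []
        self.flattened_pages.extend(tree)

    def get_page(self, index):
        if self.flattened_pages is None:
            self._flatten()
        return self.flattened_pages[index]


def first_page_after_failed_read(fixed):
    """Read before decrypting, then decrypt and read again."""
    reader = ToyReader(["page 0"], fixed=fixed)
    try:
        reader.get_page(0)
    except ReadError:
        pass  # expected: the file is still encrypted
    reader.decrypt()
    try:
        return reader.get_page(0)
    except IndexError:
        return "IndexError"


print(first_page_after_failed_read(fixed=False))
print(first_page_after_failed_read(fixed=True))
```

With the buggy ordering the second read hits the stale empty list and fails with IndexError; with the fixed ordering the cache is still None after the failed read, so it is rebuilt once the file has been decrypted.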

codecov-commenter commented

Codecov Report

Merging #359 (0417c1c) into main (733989a) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #359   +/-   ##
=======================================
  Coverage   69.57%   69.57%           
=======================================
  Files           9        9           
  Lines        3316     3316           
  Branches      783      783           
=======================================
  Hits         2307     2307           
  Misses        766      766           
  Partials      243      243           
Impacted Files    Coverage Δ
PyPDF2/pdf.py     72.36% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

MartinThoma added the soon label Apr 16, 2022
MartinThoma merged commit bd7500d into py-pdf:main Apr 16, 2022
MartinThoma added a commit that referenced this pull request Apr 16, 2022
Credits to Denis Osipov:
#359 (comment)

Co-authored-by: Denis Osipov <osipov_d@list.ru>

MartinThoma (Member) commented

@denis-osipov Thank you for all the time and patience 🙏

This PR was just merged and will go into the next release (some time this month)

@ameybhavsar24 It's merged :-) It was far from soon, but as the new maintainer I hope such things get resolved more quickly in the future.

MartinThoma added a commit that referenced this pull request Apr 18, 2022
Deprecations (DEP):
-  Remove support for Python 2.6 and older (#776)

New Features (ENH):
-  Extract document permissions (#320)

Bug Fixes (BUG):
-  Clip by trimBox when merging pages, which would otherwise be ignored (#240)
-  Add overwriteWarnings parameter PdfFileMerger (#243)
-  IndexError for getPage() of decrypted file (#359)
-  Handle cases where decodeParms is an ArrayObject (#405)
-  Updated PDF fields don't show up when page is written (#412)
-  Set Linked Form Value (#414)
-  Fix zlib -5 error for corrupt files (#603)
-  Fix reading more than last1K for EOF (#642)
-  Accidental import

Robustness (ROB):
-  Allow extra whitespace before "obj" in readObjectHeader (#567)

Documentation (DOC):
-  Link to pdftoc in Sample_Code (#628)
-  Working with annotations (#764)
-  Structure history

Developer Experience (DEV):
-  Add issue templates (#765)
-  Add tool to generate changelog

Maintenance (MAINT):
-  Use grouped constants instead of string literals (#745)
-  Add error module (#768)
-  Use decorators for @staticmethod (#775)
-  Split long functions (#777)

Testing (TST):
-  Run tests in CI once with -OO Flags (#770)
-  Filling out forms (#771)
-  Add tests for Writer (#772)
-  Error cases (#773)
-  Check Error messages (#769)
-  Regression test for issue #88
-  Regression test for issue #327

Code Style (STY):
-  Make variable naming more consistent in tests

All changes: 1.27.5...1.27.6
VictorCarlquist pushed a commit to VictorCarlquist/PyPDF2 that referenced this pull request Apr 29, 2022
MartinThoma changed the title from "Fix IndexError for getPage() of decrypted file (Issue #327)" to "Fix IndexError for getPage() of decrypted file" Jun 8, 2022