PyPDF2.utils.PdfReadError: file has not been decrypted #416

Closed
manish59 opened this issue Apr 13, 2018 · 10 comments · Fixed by #1015
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@manish59
manish59 commented Apr 13, 2018

I was trying to read the fields in a PDF form, but PyPDF2 fails with this error. I tried to check whether the file is encrypted using PyPDF2's isEncrypted, but it reports that the file is not encrypted. Please let me know if anyone knows how to fix this.

MCVE

File: https://www.fda.gov/downloads/AboutFDA/ReportsManualsForms/Forms/UCM074728.pdf

from PyPDF2 import PdfFileReader

reader = PdfFileReader("FDA-1572_508_R6_FINAL.pdf")
fields = reader.get_form_text_fields()
print(fields)
@oscardssmith
Contributor

Can you provide a copy of your PDF and the code you are using? Both will be very helpful for troubleshooting.

@manish59
Author

MartinThoma added the is-bug label Apr 7, 2022
@MartinThoma
Member

@manish59 I know it's been years since you asked, but do you maybe still have a minimal Python script that shows the issue?

Can somebody else create a minimal example?

MartinThoma added the workflow-encryption (From a users perspective, encryption is the affected feature/workflow) and Has MCVE labels Jun 10, 2022
MartinThoma added a commit that referenced this issue Jun 12, 2022
MartinThoma removed the workflow-encryption label Jun 12, 2022
@MartinThoma
Member

Note to myself: The file is NOT encrypted. PyPDF2 only thinks it is.

@pubpub-zz
Collaborator

If you open the document in Acrobat Reader and look at the protections/permissions, you will see that some permissions have been removed (e.g. assembling). When permissions are set like that, the file has an admin (owner) password, and to open it as a user you supply an empty password. This is typically the case for the PDF reference: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
If you check with Acrobat Reader you will get:
[image]

I tried to open your file with "" as the password, but the encryption is AES-128, which does not seem to be supported by PyPDF2 yet 😌
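
For readers following the thread, the empty-password approach described above can be sketched roughly as follows. This is a minimal sketch, assuming the PyPDF2 2.x `PdfReader` API (`is_encrypted`, `decrypt`, `get_form_text_fields`) and a PyPDF2 version that can decrypt the file's algorithm; the helper names `unlock_with_empty_password` and `read_form_text_fields` are made up for illustration and are not part of PyPDF2.

```python
def unlock_with_empty_password(reader):
    """Try the empty user password on a PdfReader-like object.

    Returns True if the reader is usable afterwards (it was not encrypted,
    or the empty password was accepted), False otherwise.
    """
    if not reader.is_encrypted:
        return True
    # decrypt() returns a falsy value (0) when the password does not match.
    return bool(reader.decrypt(""))


def read_form_text_fields(path):
    """Hypothetical helper: read form text fields, trying "" as the password."""
    from PyPDF2 import PdfReader  # imported lazily; assumes PyPDF2 2.x

    reader = PdfReader(path)
    if not unlock_with_empty_password(reader):
        raise ValueError(f"{path!r} needs a non-empty password")
    return reader.get_form_text_fields()
```

Calling `read_form_text_fields("FDA-1572_508_R6_FINAL.pdf")` would be the equivalent of the MCVE with an explicit decryption step. On versions that lack support for the file's algorithm, `decrypt` may raise instead of returning, so this sketch does not work around the limitation reported here.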

MartinThoma added a commit that referenced this issue Jun 12, 2022
New Features (ENH):
-  Add support for pathlib as input for PdfReader (#979)

Performance Improvements (PI):
-  Optimize read_next_end_line (#646)

Bug Fixes (BUG):
-  Adobe Acrobat 'Would you like to save this file?' (#970)

Documentation (DOC):
-  Notes on annotations (#982)
-  Who uses PyPDF2
-  intendet ➔ in robustness page  (#958)

Maintenance (MAINT):
-  pre-commit / requirements.txt updates (#977)
-  Mark read_next_end_line as deprecated (#965)
-  Export `PageObject` in PyPDF2 root (#960)

Testing (TST):
-  Add MCVE of issue #416 (#980)
-  FlateDecode.decode decodeParms (#964)
-  Xmp module (#962)
-  utils.paeth_predictor (#959)

Code Style (STY):
-  Use more tuples and list/dict comprehensions (#976)

Full Changelog: 2.1.0...2.1.1
@bchandos

Is it confirmed that PyPDF2 does not support AES-128 encryption? I have an AES-encrypted PDF file generated by Acrobat Sign (which unfortunately I cannot share) that will not decrypt. The PDF document linked above initially exhibits the same behavior, but after using the "trick" from issue 652 of accessing len(reader.pages) to force decryption, it appears to work.

However, the "trick" doesn't work on my file - I cannot compel PyPDF2 to read or merge it.

The other differences between the example file above and my file, as reported by pdfinfo:
My file is v1.7 (vs. 1.6)
My file has encryption protecting change and addNotes (vs. change)

$ python -m platform
Linux-5.17.6-300.mbp.fc33.x86_64-x86_64-with-glibc2.35

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.3.1

Here's what I mean, which is hopefully clearer:

Python 3.10.5 (main, Jun  9 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyPDF2 import PdfReader
>>>
>>> reader1 = PdfReader("FDA-1572_508_R6_FINAL.pdf")
>>> reader2 = PdfReader("my-file.pdf")
>>> reader1.metadata
Traceback (most recent call last):
  <snip>
PyPDF2.errors.PdfReadError: file has not been decrypted
>>> reader2.metadata
Traceback (most recent call last):
  <snip>
PyPDF2.errors.PdfReadError: file has not been decrypted
>>> len(reader1.pages)
2
>>> reader1.metadata
{'/Author': 'PSC Publishing Services', '/CreationDate': "D:20130522104413-04'00'", '/Creator': 'PScript5.dll Version 5.2.2', '/ModDate': "D:20190405072352-04'00'", '/Producer': 'Acrobat Distiller 9.0.0 (Windows)', '/Subject': 'Statement of Investigator', '/Title': 'FORM FDA 1572'}
>>> len(reader2.pages)
Traceback (most recent call last):
  <snip>
PyPDF2.errors.PdfReadError: file has not been decrypted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  <snip>
PyPDF2.errors.PdfReadError: File has not been decrypted
>>> reader2.metadata
Traceback (most recent call last):
  <snip>
PyPDF2.errors.PdfReadError: file has not been decrypted
>>>
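
The pattern in the session above (metadata fails, `len(reader.pages)` is evaluated, metadata then succeeds) can be wrapped in a small helper. This is only a sketch of the workaround as described in this thread, not a recommended API: the function name `metadata_with_pages_workaround` is hypothetical, and the fallback exception class exists only so the snippet can be tried without PyPDF2 installed.

```python
try:
    from PyPDF2.errors import PdfReadError
except ImportError:
    # Fallback so the sketch can be exercised without PyPDF2 installed.
    class PdfReadError(Exception):
        pass


def metadata_with_pages_workaround(reader):
    """Return reader.metadata, touching reader.pages first if needed."""
    try:
        return reader.metadata
    except PdfReadError:
        # Workaround from issue 652: accessing the page list can trigger
        # decryption of files protected with an empty user password.
        len(reader.pages)
        return reader.metadata
```

For a file like `my-file.pdf` above, where `len(reader.pages)` itself raises, the error simply propagates to the caller, so the helper does not hide the failure.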

@MartinThoma
Member

> Is it confirmed that PyPDF2 does not support AES-128 encryption?

PyPDF2>=2.3.1 does support AES-128 and AES-256 decryption (not encryption though).

@bchandos

Thank you for confirming. As noted above, I do have an AES-encrypted file that fails to decrypt on 2.3.1. Unfortunately I can't share it, and I can't find any Linux application that will generate an equivalent example file (PDF 1.7, AES-128). If I do, I will create a new issue.

@MartinThoma
Member

There are several aspects to encryption in PDF and I cannot claim that I understand all of them.

  • One is the encryption algorithm. PyPDF2 only supported RC4 for a while, but since 2.3.1 it also supports AES-128 and AES-256 for decryption.
  • Another is the revision /R of the standard security handler. PyPDF2 supports 2-5, but there is 6 as well, which will be added with ENH: Support R6 decrypting #1015.
  • There are also Crypt filters.

The PDF 1.7 reference has the details in Chapter 3 "Password Algorithms"
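
To make the three aspects above concrete, here is a rough sketch that summarises an /Encrypt dictionary. It deliberately works on a plain Python dict rather than on a PyPDF2 object, and the function name `describe_encrypt_dict` is hypothetical. The key names (/Filter, /V, /R, /CF, /StmF, /CFM) and the value meanings are taken from the PDF specification as I understand it, so treat the mapping as an assumption to check against the reference.

```python
# Algorithm implied by /V alone (values as given in the PDF specification).
_ALGORITHM_BY_V = {1: "RC4 (40-bit)", 2: "RC4 (40 to 128-bit)"}
# Crypt filter methods (/CFM) used when /V is 4 or 5.
_ALGORITHM_BY_CFM = {
    "/None": "no encryption",
    "/V2": "RC4",
    "/AESV2": "AES-128",
    "/AESV3": "AES-256",
}


def describe_encrypt_dict(encrypt):
    """Summarise a PDF /Encrypt dictionary given as a plain dict."""
    v = encrypt.get("/V", 0)
    if v in (4, 5):
        # The stream filter name (/StmF) selects an entry in /CF.
        stream_filter = encrypt.get("/StmF", "/Identity")
        cfm = encrypt.get("/CF", {}).get(stream_filter, {}).get("/CFM", "/None")
        algorithm = _ALGORITHM_BY_CFM.get(cfm, f"unknown ({cfm})")
    else:
        algorithm = _ALGORITHM_BY_V.get(v, f"unknown (/V {v})")
    return {
        "handler": encrypt.get("/Filter"),
        "revision": encrypt.get("/R"),
        "algorithm": algorithm,
    }
```

A dictionary with /V 4, /R 4 and a crypt filter whose /CFM is /AESV2 would be reported as AES-128 with revision 4, which falls inside the 2-5 range mentioned above.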

@exiledkingcc is the expert on those topics :-)

@exiledkingcc
Contributor

This is a PDF encrypted with an empty password, like #652.
My last commit to #1015 will fix this.

MartinThoma pushed a commit that referenced this issue Jun 26, 2022
MartinThoma added a commit that referenced this issue Jul 9, 2022
MartinThoma added a commit that referenced this issue Jul 9, 2022
MartinThoma added a commit that referenced this issue Jul 9, 2022
MartinThoma added a commit that referenced this issue Jul 10, 2022
New Features (ENH):
-  Add PageObject._get_fonts (#1083)
-  Add support for indexed color spaces / BitsPerComponent for decoding PNGs (#1067)

Performance Improvements (PI):
-  Use iterative DFS in PdfWriter._sweep_indirect_references (#1072)

Bug Fixes (BUG):
-  Let Page.scale also scale the crop-/trim-/bleed-/artbox (#1066)
-  Column default for CCITTFaxDecode (#1079)

Robustness (ROB):
-  Guard against None-value in _get_outlines (#1060)

Documentation (DOC):
-  Stamps and watermarks (#1082)
-  OCR vs PDF text extraction (#1081)
-  Python Version support
-  Formatting of CHANGELOG

Developer Experience (DEV):
-  Cache downloaded files (#1070)
-  Speed-up for CI (#1069)

Maintenance (MAINT):
-  Set page.rotate(angle: int) (#1092)
-  Issue #416 was fixed by #1015 (#1078)

Testing (TST):
-  Image extraction (#1080)
-  Image extraction (#1077)

Code Style (STY):
-  Apply black
-  Typo in Changelog

Full Changelog: 2.4.2...2.4.3