Regular PDF detected as encrypted and decryption with empty string fails #245

cycomanic · 2016-01-22T13:11:52Z

Hi,
I have a problem with a regular PDF that somehow gets detected as encrypted. I tried the method mentioned in #51, i.e.

import PyPDF2

inpdf = PyPDF2.PdfFileReader("your_file.pdf")  # placeholder path
if inpdf.isEncrypted:
    inpdf.decrypt('')

but then I get:

----> 1 inpdf.decrypt(b'')

/home/jschrod/Downloads/Python/PyPDF2/build/lib/PyPDF2/pdf.py in decrypt(self, password)
   1971         self._override_encryption = True
   1972         try:
-> 1973             return self._decrypt(password)
   1974         finally:
   1975             self._override_encryption = False

/home/jschrod/Downloads/Python/PyPDF2/build/lib/PyPDF2/pdf.py in _decrypt(self, password)
   1977     def _decrypt(self, password):
   1978         encrypt = self.trailer['/Encrypt'].getObject()
-> 1979         if encrypt['/Filter'] != '/Standard':
   1980             raise NotImplementedError("only Standard PDF encryption handler is available")
   1981         if not (encrypt['/V'] in (1, 2)):

TypeError: 'NullObject' object has no attribute '__getitem__'

I seem to get around the issue if I do

inpdf._override_encryption=True
inpdf._flatten()

after which a

inpdf.getPage(X)

succeeds and allows me to access the pdf normally (and seemingly without issues), which seems to demonstrate that it is not really encrypted.

Cheers
Jochen
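
For anyone landing here with the same traceback, the two approaches in the post above (decrypt with an empty password, then the override-and-flatten workaround) can be combined into one guarded helper. This is only a sketch: `unlock_reader` is a hypothetical name, it assumes the PyPDF2 1.x `PdfFileReader` API shown in the post, and `_override_encryption` and `_flatten` are private attributes that may change or disappear between versions.

```python
def unlock_reader(reader, password=""):
    """Try to make a PyPDF2 1.x PdfFileReader usable.

    `unlock_reader` is a hypothetical helper, not part of PyPDF2.
    `reader` is expected to be a PdfFileReader (or any object exposing
    the same attributes). Returns a short string naming the path taken.
    """
    if not reader.isEncrypted:
        return "not encrypted"
    try:
        # PyPDF2 1.x documents a return value of 0 for a failed password.
        result = reader.decrypt(password)
    except TypeError:
        # /Encrypt resolved to a NullObject, as in the traceback above:
        # fall back to the private-attribute workaround from this thread.
        reader._override_encryption = True
        reader._flatten()
        return "override"
    return "decrypted" if result else "wrong password"
```

With a real file this would be called as `unlock_reader(PyPDF2.PdfFileReader("your_file.pdf"))` before `getPage(...)`. Catching only `TypeError` is deliberate, so that genuinely encrypted files with an unsupported handler still raise their own error.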

michi88 · 2016-02-12T13:39:19Z

Thanks for the workaround @cycomanic. I'm having the same issue.

fsiordia · 2018-07-02T20:40:07Z

Thank you so much @cycomanic!
That seems to work for me!

MartinThoma · 2022-06-26T08:33:16Z

@fsiordia @michi88 @cycomanic Do you still encounter the same issue with the latest PyPDF2 version? Could you upload an example PDF file?

Thomas-Boi · 2022-06-28T23:39:21Z

I have a similar problem that couldn't be solved by the workarounds described above.

I run this in both Windows and AWS Linux environments, using Python 3.9.

I'm trying to get the content of this PDF: fdo-fundingapplication-demandedefinancement.pdf

This is what I'm trying to do:

import requests
from PyPDF2 import PdfReader

# url and pdf_path are defined elsewhere in the script
response = requests.get(url)
with open(pdf_path, "wb+") as file:
    file.write(response.content)

    # read from the beginning
    file.seek(0)

    reader = PdfReader(file)
    text = "".join([page.extract_text() for page in reader.pages])

However, I run into this error when I run the above code:

Traceback (most recent call last):
  File "D:\project\scripts\get_page_hashes.py", line 104, in get_page_hashes
    text = "".join([page.extract_text() for page in reader.pages])
  File "D:\project\scripts\get_page_hashes.py", line 104, in <listcomp>
    text = "".join([page.extract_text() for page in reader.pages])
  File "D:\project\venv\lib\site-packages\PyPDF2\_page.py", line 1483, in __iter__
    for i in range(len(self)):
  File "D:\project\venv\lib\site-packages\PyPDF2\_page.py", line 1465, in __len__
    return self.length_function()
  File "D:\project\venv\lib\site-packages\PyPDF2\_reader.py", line 373, in _get_num_pages
    return self.trailer[TK.ROOT]["/Pages"]["/Count"]  # type: ignore
  File "D:\project\venv\lib\site-packages\PyPDF2\generic.py", line 650, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "D:\project\venv\lib\site-packages\PyPDF2\generic.py", line 221, in get_object
    obj = self.pdf.get_object(self)
  File "D:\project\venv\lib\site-packages\PyPDF2\_reader.py", line 1077, in get_object
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted

The decrypt-with-empty-string trick yielded the same problem. I also tried the override and flatten trick:

if reader.is_encrypted:
    reader._override_encryption = True
    reader._flatten()

However, the code now yielded this error:

Traceback (most recent call last):
  File "D:\project\scripts\get_page_hashes.py", line 101, in get_page_hashes
    reader._flatten()
  File "D:\project\venv\lib\site-packages\PyPDF2\_reader.py", line 952, in _flatten
    pages = catalog["/Pages"].get_object()  # type: ignore
  File "D:\project\venv\lib\site-packages\PyPDF2\generic.py", line 650, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "D:\project\venv\lib\site-packages\PyPDF2\generic.py", line 221, in get_object
    obj = self.pdf.get_object(self)
  File "D:\project\venv\lib\site-packages\PyPDF2\_reader.py", line 1044, in get_object
    retval = self._get_object_from_stream(indirect_reference)  # type: ignore
  File "D:\project\venv\lib\site-packages\PyPDF2\_reader.py", line 995, in _get_object_from_stream
    objnum = NumberObject.read_from_stream(stream_data)
  File "D:\project\venv\lib\site-packages\PyPDF2\generic.py", line 355, in read_from_stream
    num = read_until_regex(stream, NumberObject.NumberPattern)
  File "D:\project\venv\lib\site-packages\PyPDF2\_utils.py", line 131, in read_until_regex
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

EDIT:
My problem has been resolved by using pikepdf. I used the linked step before I read the file and it seems to work.
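
The linked step is not preserved in this copy of the thread. As a rough sketch of one common way to do this kind of pre-processing with pikepdf, and only as an assumption about what was meant, the file can be opened and re-saved so that PyPDF2 reads the rewritten copy. `resave_without_encryption` is a hypothetical helper name; `pikepdf.open` and `Pdf.save` are the pikepdf calls it relies on.

```python
def resave_without_encryption(src_path, dst_path, password=""):
    """Open a PDF with pikepdf and write a fresh copy to dst_path.

    Hypothetical helper, not part of pikepdf or PyPDF2. By default
    pikepdf does not carry existing encryption over when saving.
    """
    # Imported lazily so this module loads even where pikepdf is absent.
    import pikepdf

    with pikepdf.open(src_path, password=password) as pdf:
        pdf.save(dst_path)
    return dst_path
```

The copy at `dst_path` would then be passed to `PdfReader` in place of the downloaded file. Writing to a separate path avoids overwriting the input while it is still open.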

MartinThoma · 2022-07-10T05:31:21Z

I believe this was solved with #1015

Without an example PDF I'm not able to verify this, so I'm closing the issue for now.

If anybody still encounters it with the latest version of PyPDF2, please let me know.

mstamy2 added the workflow-encryption label (From a user's perspective, encryption is the affected feature/workflow) on Aug 2, 2016

rbares mentioned this issue Oct 28, 2018

[MRG + 1] Add basic support for encrypted PDF files atlanhq/camelot#180

Merged

MartinThoma added the is-bug label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Jun 26, 2022

MartinThoma added the needs-pdf label (The issue needs a PDF file to show the problem) on Jul 10, 2022

MartinThoma closed this as completed Jul 10, 2022

pubpub-zz mentioned this issue Aug 11, 2022

XFA forms: internally protected document #1224

Closed
