ENH: Support R6 decrypting #1015

exiledkingcc · 2022-06-20T03:42:28Z

Fixes #416

codecov · 2022-06-20T07:58:59Z

Codecov Report

Merging #1015 (dfaa735) into main (a40946c) will increase coverage by 0.09%.
The diff coverage is 87.64%.

@@            Coverage Diff             @@
##             main    #1015      +/-   ##
==========================================
+ Coverage   89.50%   89.60%   +0.09%     
==========================================
  Files          24       24              
  Lines        4432     4425       -7     
  Branches      919      914       -5     
==========================================
- Hits         3967     3965       -2     
+ Misses        318      314       -4     
+ Partials      147      146       -1

Impacted Files	Coverage Δ
PyPDF2/_reader.py	`91.74% <81.81%> (-0.40%)`	⬇️
PyPDF2/_encryption.py	`74.86% <89.23%> (+2.00%)`	⬆️
PyPDF2/__init__.py	`100.00% <100.00%> (ø)`
PyPDF2/_merger.py	`88.80% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a40946c...dfaa735.

MartinThoma · 2022-06-20T11:54:52Z

You're on fire, @exiledkingcc 👍 🚀

The tests pass - is this ready from your perspective, @exiledkingcc ?

exiledkingcc · 2022-06-20T13:38:55Z

This works thanks to qpdf, although its comments say the algorithm is NOT EXACTLY as described in the specification.
They say:

the wording of the specification is very unclear...

Anyway, I think you can merge this. It will not make things worse, and at least we can decrypt PDFs encrypted by qpdf.
Then I will start to rewrite the encryption part if there is time.
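The qpdf comment quoted above concerns the R6 password hash (Algorithm 2.B in ISO 32000-2). Below is a minimal, dependency-free sketch of that hash and of the user-password check, for illustration only. It is not the code merged in this PR, and the function names are hypothetical. The pure-Python AES-128 exists only to avoid a crypto dependency: it is slow and not constant-time. The sketch also assumes SASLprep normalization has already been applied to the password. The round counting (at least 64 rounds, then stop once the last byte of E is at most the round number minus 32) is exactly the part qpdf calls unclearly worded, so treat it as an assumption to test against real files.

```python
import hashlib


def _xtime(b: int) -> int:
    """Multiply b by x (i.e. by 2) in GF(2^8), reducing by the AES polynomial."""
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1


def _gmul(a: int, b: int) -> int:
    """Multiply a and b in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = _xtime(a), b >> 1
    return result


def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _make_sbox() -> list:
    """AES S-box: multiplicative inverse (0 -> 0) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if _gmul(x, y) == 1), 0)
        affine = inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4)
        sbox.append(affine ^ 0x63)
    return sbox


SBOX = _make_sbox()
XTIME = [_xtime(b) for b in range(256)]


def _round_keys(key: bytes) -> list:
    """AES-128 key schedule: 11 round keys of 16 bytes each."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = _xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def _encrypt_block(block: bytes, rk: list) -> bytes:
    s = [b ^ k for b, k in zip(block, rk[0])]  # state, column-major: s[4*col + row]
    for r in range(1, 11):
        s = [SBOX[b] for b in s]  # SubBytes
        s = [s[4 * ((c + row) % 4) + row] for c in range(4) for row in range(4)]  # ShiftRows
        if r < 10:  # MixColumns is skipped in the final round
            mixed = []
            for c in range(0, 16, 4):
                a = s[c:c + 4]
                t = a[0] ^ a[1] ^ a[2] ^ a[3]
                mixed += [a[i] ^ t ^ XTIME[a[i] ^ a[(i + 1) % 4]] for i in range(4)]
            s = mixed
        s = [b ^ k for b, k in zip(s, rk[r])]  # AddRoundKey
    return bytes(s)


def aes128_cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """AES-128-CBC without padding; len(data) must be a multiple of 16."""
    rk, prev, out = _round_keys(key), iv, bytearray()
    for i in range(0, len(data), 16):
        prev = _encrypt_block(bytes(a ^ b for a, b in zip(data[i:i + 16], prev)), rk)
        out += prev
    return bytes(out)


def r6_hash(password: bytes, salt: bytes, udata: bytes = b"") -> bytes:
    """Hardened password hash for /R 6 (ISO 32000-2, Algorithm 2.B)."""
    password = password[:127]
    k = hashlib.sha256(password + salt + udata).digest()  # this alone is the R5 hash
    rnd = 0
    while True:
        rnd += 1
        e = aes128_cbc_encrypt(k[:16], k[16:32], (password + k + udata) * 64)
        # sum(e[:16]) % 3 equals int.from_bytes(e[:16], "big") % 3, since 256 % 3 == 1
        k = hashlib.new(("sha256", "sha384", "sha512")[sum(e[:16]) % 3], e).digest()
        if rnd >= 64 and e[-1] <= rnd - 32:  # at least 64 rounds, then this stop rule
            return k[:32]


def check_user_password(password: bytes, u_entry: bytes) -> bool:
    """/U holds a 32-byte hash, an 8-byte validation salt and an 8-byte key salt."""
    return r6_hash(password, u_entry[32:40]) == u_entry[:32]
```

For the owner password, the same hash would be computed with the validation salt from /O and the full 48-byte /U string passed as `udata`.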

MartinThoma · 2022-06-20T20:12:29Z

I was just checking the PDF files. While the r2 / r3 / r4 files ask for passwords, the r5 and r6 files don't. I can directly see the content. Are you sure that they are encrypted?

exiledkingcc · 2022-06-20T23:07:34Z

Yes, they are encrypted, but not password-protected.
For R5 and R6, either the user password or the owner password can be used to decrypt the file. If one of them is empty, PDF software will not ask for a password.
I'm confused too: PDF software asks for one password to decrypt the file, but two passwords are used for encryption.
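The behaviour described above (either password decrypts the file, and an empty one means no prompt) can be modelled with two password checks. This is a minimal sketch that assumes the checks are supplied as callables; for R6 they would compare the hardened hash against /U and /O. `authenticate` and `opens_without_prompt` are hypothetical helpers. The enum mirrors the PasswordType values mentioned later in this thread (USER_PASSWORD = 1, OWNER_PASSWORD = 2), with NOT_DECRYPTED as an assumed name for 0. Trying the owner password first is also an assumption.

```python
from enum import IntEnum
from typing import Callable

Check = Callable[[bytes], bool]


class PasswordType(IntEnum):
    NOT_DECRYPTED = 0  # assumed name for the "wrong password" result
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2


def authenticate(password: bytes, check_user: Check, check_owner: Check) -> PasswordType:
    # Assumption: test the owner password first, so a password that is valid
    # for both roles is reported with the higher privilege.
    if check_owner(password):
        return PasswordType.OWNER_PASSWORD
    if check_user(password):
        return PasswordType.USER_PASSWORD
    return PasswordType.NOT_DECRYPTED


def opens_without_prompt(check_user: Check, check_owner: Check) -> bool:
    # If the empty password matches either role, the file can be decrypted
    # without asking the user, which matches what viewers do with these files.
    return authenticate(b"", check_user, check_owner) != PasswordType.NOT_DECRYPTED
```

A file like the `r6-both-passwords.pdf` suggested later in the thread (both passwords non-empty) is the case where `opens_without_prompt` returns `False`.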

exiledkingcc · 2022-06-20T23:36:42Z

This is how qpdf handles passwords; maybe we should do it the same way:
https://qpdf.readthedocs.io/en/stable/encryption.html#user-and-owner-passwords
So do not merge this for now. I will make the decryption process clearer and deal with the empty password.

MartinThoma · 2022-06-25T08:44:27Z

@exiledkingcc Do you think this is ready to be merged? If yes, I would review it again this evening :-) I'm excited about it :-)

exiledkingcc · 2022-06-25T09:20:01Z

I think so. 😊

MartinThoma · 2022-06-26T05:23:43Z

PyPDF2/_merger.py

@@ -137,7 +137,7 @@ def merge(
        reader = PdfReader(stream, strict=self.strict)  # type: ignore[arg-type]
        self.inputs.append((stream, reader, my_file))
        if encryption_obj is not None:
-            reader._encryption = encryption_obj
+            reader.encryption = encryption_obj


Every attribute that does not start with an underscore is part of the public interface. That means we cannot change the behavior / name without deprecation warnings.

Is there a reason why a user would want to access the encryption attribute directly?
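The usual way to retire a public attribute in Python is to move the state to a private name and keep a read-only property that emits a `DeprecationWarning` for a release cycle or two. Below is a minimal sketch using a hypothetical `ExampleReader` class rather than `PdfReader`. In this PR the question did not come up in practice, because the attribute was made private (commit "make Encryption private for PdfReader") before the PR was merged.

```python
import warnings
from typing import Optional


class ExampleReader:
    """Hypothetical reader showing how a public attribute can be retired."""

    def __init__(self) -> None:
        self._encryption: Optional[object] = None  # private: free to rename later

    @property
    def encryption(self) -> Optional[object]:
        # Old public name still works, but tells callers it is going away.
        warnings.warn(
            "ExampleReader.encryption is deprecated and will become private.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._encryption
```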

MartinThoma · 2022-06-26T05:30:07Z

PyPDF2/_reader.py

@@ -254,8 +254,26 @@ def __init__(
        self.stream = stream

        self._override_encryption = False
-        if password is not None and self.decrypt(password) == 0:
-            raise PdfReadError("Wrong password")
+        self.encryption: Optional[Encryption] = None


Consistent with my question in `_merger`, I would prefer this to be the private attribute `_encryption` instead of the public attribute `encryption`, unless, of course, there is a reason why users would want to access it.

MartinThoma · 2022-06-26T05:30:40Z

PyPDF2/_reader.py

+            encryptEntry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object())
+            self.encryption = Encryption.read(encryptEntry, id1_entry)


Suggested change

-            encryptEntry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object())
-            self.encryption = Encryption.read(encryptEntry, id1_entry)
+            encrypt_entry = cast(DictionaryObject, self.trailer[TK.ENCRYPT].get_object())
+            self.encryption = Encryption.read(encrypt_entry, id1_entry)

MartinThoma · 2022-06-26T05:32:29Z

PyPDF2/_encryption.py

-        self._user_keys: Dict = {}
-        self._owner_keys: Dict = {}
+
+    def verified(self) -> bool:


I guess verified means that the file was decrypted? Would is_decrypted be better?

Also, I think this should rather be _verified if the user does not need it.

PyPDF2/_encryption.py

MartinThoma · 2022-06-26T06:00:08Z

I've tried to cross-check the decryption, but...

https://www.adobe.com/de/acrobat/online/password-protect-pdf.html : Uses algV=4, algR=4, /AESV2, 128
https://smallpdf.com/de/pdf-schuetzen : same as adobe
https://www.ilovepdf.com/ and https://www.pdf2go.com/: Uses algV=2, algR=3

I guess we are ahead of our time 😄
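For reference, the algV/algR pairs above correspond to the standard security handler roughly as shown below. This sketch lists only the common pairings from the PDF specification; the table and function names are made up for illustration and are not part of PyPDF2. None of the tools tried above produced V=5/R=6.

```python
# Common /V (algorithm) and /R (revision) pairings of the standard security handler.
SECURITY_HANDLERS = {
    (1, 2): "RC4, 40-bit key",
    (2, 3): "RC4, 40- to 128-bit key",
    (4, 4): "crypt filters: /V2 (RC4) or /AESV2 (AES-128)",
    (5, 5): "AES-256, Adobe extension level 3 (deprecated)",
    (5, 6): "AES-256, PDF 2.0 (ISO 32000-2)",
}


def describe_security_handler(v: int, r: int) -> str:
    """Return a short description of a /V, /R pair from an /Encrypt dictionary."""
    return SECURITY_HANDLERS.get((v, r), f"unusual combination V={v}, R={r}")
```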

PyPDF2/_encryption.py

MartinThoma · 2022-06-26T06:08:30Z

@exiledkingcc Overall, it looks good to me. Good work 👏 👍 🎉

There are a couple of things I want to be changed before I release it. I could do them myself after merging this PR or you could do them now - whatever you prefer. Just let me know :-)

I've also noticed that if either the user password or the owner password is set to the empty password, typical viewers will directly show the contents. For this reason I would add the following test PDF:

qpdf --encrypt "foo" "bar" 256 -- unencrypted.pdf r6-both-passwords.pdf

exiledkingcc · 2022-06-26T08:22:28Z

@MartinThoma All your review comments are great; I will do the update tonight.

MartinThoma · 2022-06-26T11:11:11Z

I just saw https://github.com/pdfminer/pdfminer.six/tree/master/samples and especially:

Files in the encryption folder have been generated with cpdf 1.7 [http://www.coherentpdf.com/]
from the base.pdf file generated with LibreOffice 4.1.1.2 as follows:

cpdf -encrypt 40bit foo baz base.pdf -o rc4-40.pdf
cpdf -encrypt 128bit foo baz base.pdf -o rc4-128.pdf
cpdf -encrypt AES foo baz base.pdf -o aes-128.pdf
cpdf -encrypt AES foo baz base.pdf -no-encrypt-metadata -o aes-128-m.pdf
cpdf -encrypt AES256 foo baz base.pdf -o aes-256.pdf
cpdf -encrypt AES256 foo baz base.pdf -no-encrypt-metadata -o aes-256-m.pdf

I think I'll run PyPDF2 over those examples some time today :-)

exiledkingcc · 2022-06-26T15:47:29Z

Updated,
but kept PasswordType.USER_PASSWORD = 1 and PasswordType.OWNER_PASSWORD = 2.

MartinThoma · 2022-06-26T17:32:05Z

PyPDF2/_encryption.py

+            try:
+                pwd = password.encode("latin-1")
+            except Exception:  # noqa
+                pwd = password.encode("utf-8")


Why did you choose latin-1 as the default?

To be honest, I didn't think much about it; I just copied it from the previous code. 😀

Hehe, ok. Thanks for the honesty ❤️
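The choice of encoding matters because the older revisions and R5/R6 expect different byte strings. Below is a minimal sketch that assumes the rules from the PDF specification as understood here: PDFDocEncoding padded to 32 bytes for R2-R4, and SASLprep-normalized UTF-8 truncated to 127 bytes for R5/R6. `encode_password` and `PASSWORD_PADDING` are hypothetical names. Latin-1 is only an approximation of PDFDocEncoding, and the sketch does not apply SASLprep.

```python
# The 32-byte padding string from the PDF specification (used by R2-R4).
PASSWORD_PADDING = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def encode_password(password: str, revision: int) -> bytes:
    """Turn a user-supplied password into the bytes fed to key derivation."""
    if revision >= 5:
        # R5/R6: UTF-8 (after SASLprep, which this sketch does NOT apply),
        # truncated to 127 bytes.
        return password.encode("utf-8")[:127]
    # R2-R4: PDFDocEncoding, approximated by latin-1 with the UTF-8 fallback
    # from the reviewed code, then padded/truncated to exactly 32 bytes.
    try:
        pwd = password.encode("latin-1")
    except UnicodeEncodeError:
        pwd = password.encode("utf-8")
    return (pwd + PASSWORD_PADDING)[:32]
```

Catching `UnicodeEncodeError` rather than a bare `Exception` keeps unrelated errors from being silently turned into a UTF-8 retry.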

PyPDF2/__init__.py

MartinThoma · 2022-06-26T17:39:31Z

@exiledkingcc Very nice work! Thank you 🤗 I'll release it today (just have to eat something now 😄 )

New Features (ENH):
- Support R6 decrypting (#1015)
- Add PdfReader.pdf_header (#1013)

Performance Improvements (PI):
- Remove ord_ calls (#1014)

Bug Fixes (BUG):
- Fix missing page for bookmark (#1016)

Robustness (ROB):
- Deal with invalid Destinations (#1028)

Documentation (DOC):
- get_form_text_fields does not extract dropdown data (#1029)
- Adjust PdfWriter.add_uri docstring
- Mention crypto extra_requires for installation (#1017)

Developer Experience (DEV):
- Use /n line endings everywhere (#1027)
- Adjust string formatting to be able to use mutmut (#1020)
- Update Bug report template

Full Changelog: 2.3.1...2.4.0

New Features (ENH):
- Add PageObject._get_fonts (#1083)
- Add support for indexed color spaces / BitsPerComponent for decoding PNGs (#1067)

Performance Improvements (PI):
- Use iterative DFS in PdfWriter._sweep_indirect_references (#1072)

Bug Fixes (BUG):
- Let Page.scale also scale the crop-/trim-/bleed-/artbox (#1066)
- Column default for CCITTFaxDecode (#1079)

Robustness (ROB):
- Guard against None-value in _get_outlines (#1060)

Documentation (DOC):
- Stamps and watermarks (#1082)
- OCR vs PDF text extraction (#1081)
- Python Version support
- Formatting of CHANGELOG

Developer Experience (DEV):
- Cache downloaded files (#1070)
- Speed-up for CI (#1069)

Maintenance (MAINT):
- Set page.rotate(angle: int) (#1092)
- Issue #416 was fixed by #1015 (#1078)

Testing (TST):
- Image extraction (#1080)
- Image extraction (#1077)

Code Style (STY):
- Apply black
- Typo in Changelog

Full Changelog: 2.4.2...2.4.3

exiledkingcc added 2 commits June 20, 2022 11:40

support R6 decrypting

c992563

make encryption test file names clear

b4b74a4

MartinThoma mentioned this pull request Jun 22, 2022

PyPDF2.utils.PdfReadError: file has not been decrypted #416

Closed

make decrypt clear

6cbf6e9

MartinThoma changed the title ~~support R6 decrypting~~ ENH: Support R6 decrypting Jun 25, 2022

MartinThoma reviewed Jun 26, 2022

PyPDF2/_encryption.py (outdated, resolved)

MartinThoma reviewed Jun 26, 2022

PyPDF2/_encryption.py (outdated, resolved)

MartinThoma added the workflow-encryption and soon labels Jun 26, 2022

MartinThoma mentioned this pull request Jun 26, 2022

PdfReadError: Could not read Boolean object #377

Closed

exiledkingcc added 4 commits June 26, 2022 23:16

make Encryption private for PdfReader

532930b

add more tests

5d076fe

Merge branch 'master' into encryption

10ff49d

fix test with issue 327

dfaa735

MartinThoma reviewed Jun 26, 2022

PyPDF2/__init__.py (resolved)

MartinThoma merged commit e83cdbe into py-pdf:main Jun 26, 2022

MartinThoma mentioned this pull request Jun 28, 2022

PyPDF2 can't decrypt PDF files with Acrobat 6.0 or higher password security compatibility #378

Closed

MartinThoma added a commit that referenced this pull request Jul 9, 2022

MAINT: Issue #416 was fixed by #1015

cd44e79

MartinThoma added a commit that referenced this pull request Jul 9, 2022

MAINT: Issue #416 was fixed by #1015

ae0d26b

MartinThoma added a commit that referenced this pull request Jul 9, 2022

MAINT: Issue #416 was fixed by #1015 (#1078)

9b048a2

This was referenced Jul 9, 2022

Decryption error in script mode, works in ptpython/shell #652

Closed

Not able to open a PDF with security - Adobe opens it fine without password entry #249

Closed

Regular PDF detected as encrypted and decryption with empty string fails #245

Closed

exiledkingcc deleted the encryption branch July 24, 2022 03:55

bchandos mentioned this pull request Aug 10, 2022

PyCryptoDome padding issue, AES encryption CBC mode #1221

Closed

exiledkingcc mentioned this pull request May 3, 2023

TypeError: can only concatenate list (not "str") to list #978

Closed
