Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 1] Add basic support for encrypted PDF files #180

Merged
merged 11 commits into from
Oct 28, 2018

Conversation

rbares
Copy link
Contributor

@rbares rbares commented Oct 27, 2018

Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.

Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf

Partially resolves issue #162, may want to keep open to discuss additional support

Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.

Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf

Issue atlanhq#162
@rbares rbares changed the title [MRG] Add basic support for encrypted PDF files [WIP] Add basic support for encrypted PDF files Oct 27, 2018
@codecov-io
Copy link

codecov-io commented Oct 27, 2018

Codecov Report

Merging #180 into master will decrease coverage by 0.04%.
The diff coverage is 64.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   84.17%   84.12%   -0.05%     
==========================================
  Files          13       13              
  Lines        1226     1235       +9     
  Branches      294      297       +3     
==========================================
+ Hits         1032     1039       +7     
- Misses        150      151       +1     
- Partials       44       45       +1
Impacted Files Coverage Δ
camelot/cli.py 93.75% <100%> (+0.06%) ⬆️
camelot/io.py 100% <100%> (ø) ⬆️
camelot/handlers.py 82.5% <54.54%> (-0.84%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4366313...c010449. Read the comment docs.

@rbares rbares changed the title [WIP] Add basic support for encrypted PDF files [MRG] Add basic support for encrypted PDF files Oct 27, 2018
camelot/handlers.py Outdated Show resolved Hide resolved
camelot/handlers.py Outdated Show resolved Hide resolved
camelot/handlers.py Outdated Show resolved Hide resolved
camelot/handlers.py Outdated Show resolved Hide resolved
camelot/io.py Outdated Show resolved Hide resolved
@vinayak-mehta
Copy link
Contributor

@rbares This is a great PR! Apart from the changes I requested, LGTM.

Explicitly check passwords for None rather than falsey.
Correct read_pdf documentation for Owner/User password.

Issue atlanhq#162
camelot/handlers.py Outdated Show resolved Hide resolved
tests/test_common.py Outdated Show resolved Hide resolved
@vinayak-mehta vinayak-mehta changed the title [MRG] Add basic support for encrypted PDF files [MRG + 1] Add basic support for encrypted PDF files Oct 28, 2018
@@ -83,8 +92,8 @@ def _save_page(self, filename, page, temp):
"""
with open(filename, 'rb') as fileobj:
infile = PdfFileReader(fileobj, strict=False)
if infile.isEncrypted:
infile.decrypt('')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbares Should we change the default value for password to an empty string for this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove the is not None check whenever infile.isEncrypted is True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! Was initially unsure why .decrypt('') was being used. I see from py-pdf/pypdf#51 that there are situations where normal PDFs are detected as encrypted, so defaulting to empty string seems sensible. It seems more awkward workarounds are still needed in some cases: py-pdf/pypdf#245 , but that feeds into the discussion around which library to use going forward.

Attempting to decrypt with None leads to TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' rather than the more helpful PdfReadError: File has not been decrypted, so I would be inclined to keep the None checks throughout.

Are you happy to change the defaults or shall I?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempting to decrypt with None leads to TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' rather than the more helpful PdfReadError: File has not been decrypted, so I would be inclined to keep the None checks throughout.

But the TypeError won't be there if we change the default value of password from None to an empty string right? I can change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless someone passes it through explicitly to mean no password is provided, which could be seen as cleaner use of an API. Would almost be nicer to change None to empty string if encrypted rather than making that visible at the interface, but will defer to your judgement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is what I was proposing, I think password=None at the interface level makes sense, we can set password to an empty string when None is passed. Changing it.

self.filename = filename
if not filename.lower().endswith('.pdf'):
raise NotImplementedError("File format not supported")
self.pages = self._get_pages(self.filename, pages)
if password is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbares What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@vinayak-mehta vinayak-mehta merged commit 429640f into atlanhq:master Oct 28, 2018
@vinayak-mehta
Copy link
Contributor

@rbares Thanks for the contribution!

@rbares
Copy link
Contributor Author

rbares commented Oct 28, 2018

@vinayak-mehta my pleasure, thanks for review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants