Support password-protected PDFs #30

Closed
gco opened this issue Sep 24, 2021 · 5 comments

Comments

@gco

gco commented Sep 24, 2021

Opening an encrypted PDF without a password (bank statement) results in:

NotImplementedError: password-protected PDFs are currently not supported

This is a TODO mentioned here

@jorisschellekens
Owner

I'm currently working on supporting encrypted (that is to say password protected) PDF documents.

The PDF standard lists various possible algorithms. I will not implement all of them in the next release, but the general framework and the most basic cases should be present.

Kind regards,
Joris Schellekens

@joshuisken

I see quite a bit of code for this, but I would like to see a small example of how to open a password-protected PDF file.
Providing a password to PDF.loads() is not enough:

AssertionError: R is not 2 or 3. A number specifying which revision of the standard security handler shall be used to interpret this dictionary.

How can I set R?
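
For context on the question above: as the assertion message itself says, R is an entry of the file's encryption dictionary, so it is read from the PDF rather than set by the caller. The assertion suggests that, at the time, only revisions 2 and 3 were handled. The snippet below is a rough, standard-library-only sketch for checking which revision a given file declares. The function name `find_security_revision` is hypothetical and not part of borb, and the byte scan is a heuristic that assumes the encryption dictionary is stored uncompressed; it is not a real PDF parser.

```python
import re
from typing import Optional

# Heuristic patterns: the standard security handler is named by
# "/Filter /Standard", and its revision is stored under the key "/R".
_STANDARD_FILTER = re.compile(rb"/Filter\s*/Standard\b")
_REVISION = re.compile(rb"/R\s+(\d+)")


def find_security_revision(pdf_bytes: bytes) -> Optional[int]:
    """Return the /R value of the standard security handler, or None.

    Hypothetical helper (not a borb API). It scans raw bytes, so it can
    miss the entry or be fooled by binary string contents.
    """
    match = _STANDARD_FILTER.search(pdf_bytes)
    if match is None:
        return None  # no standard security handler found in plain form

    # Limit the search to the object that contains the match; fall back to
    # the start/end of the data when the dictionary sits in the trailer.
    start = pdf_bytes.rfind(b"obj", 0, match.start())
    if start == -1:
        start = 0
    end = pdf_bytes.find(b"endobj", match.end())
    if end == -1:
        end = len(pdf_bytes)

    revision = _REVISION.search(pdf_bytes, start, end)
    return int(revision.group(1)) if revision else None
```

Calling it as `find_security_revision(Path("statement.pdf").read_bytes())` (the file name is a placeholder) gives the revision number to quote when reporting which kind of encryption a document uses.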

@jorisschellekens
Owner

Currently, password-protected PDF documents (reading and/or writing thereof) are not supported.

I do have a useful contact from way back when I still worked at another PDF related company. He has already helped me with a bug in borb and was going to help me on the whole encryption thing.

I'll see if I can't poke him again.

Kind regards,
Joris Schellekens

@mfikrin

mfikrin commented Sep 29, 2023

Is there an update on password-protected PDF documents? I'm looking forward to an example of that. Thank you.

@jorisschellekens
Owner

jorisschellekens commented Nov 29, 2023

Update: It should be possible for some of the supported types of password-protected PDF documents to be opened by borb in the upcoming release. I am going to close this issue. Further issues regarding this functionality will need to be more specific (so that I can get an idea of which type of encryption needs to be supported specifically).

The following TestCase will also be added to the tests:

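from pathlib import Path

from borb.pdf import PDF
from borb.toolkit.text.simple_text_extraction import SimpleTextExtraction

# TestCase is borb's own test base class (it provides get_artifacts_directory),
# not unittest.TestCase. The import path below is an assumption; the thread does not show it.
from tests.test_case import TestCase


class TestReadEncryptedPDF(TestCase):

    def test_open_encrypted_pdf(self):
        # loading should succeed when the correct password is supplied
        input_file: Path = self.get_artifacts_directory(True) / "input_001.pdf"
        with open(input_file, "rb") as fh:
            PDF.loads(fh, password="appeltje")

    def test_read_encrypted_pdf(self):
        input_file: Path = self.get_artifacts_directory(True) / "input_001.pdf"

        # open
        l: SimpleTextExtraction = SimpleTextExtraction()
        with open(input_file, "rb") as fh:
            PDF.loads(fh, password="appeltje", event_listeners=[l])

        # compare text (page 0 of the decrypted document)
        text: str = l.get_text()[0]
        assert text.startswith("Video provides a powerful way to help you prove your point.")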

Which attempts to read the following PDF:
input_001.pdf

Which is protected by the password appeltje (it means "little apple" in Dutch)

Kind regards,
Joris Schellekens
