New feature: produce encrypted PDFs with fpdf2 #194

Closed
Lucas-C opened this issue Jul 16, 2021 · 34 comments
Comments

@Lucas-C
Member

Lucas-C commented Jul 16, 2021

The scope of this feature is to add support in fpdf2 for producing encrypted PDFs.

More information about this PDF functionality & code samples can be found:

By implementing this feature you, as a benevolent FLOSS developer, will give the large community of fpdf2 users access to a standard and useful PDF functionality.
You will also be added into the contributors list & map.
Moreover, by working on this feature, you will learn about PDF encryption and the lifecycle & structure of a popular Python library.

In terms of API / library interface, this feature could be provided to fpdf2 users by introducing a new optional password parameter to the FPDF.output() method.
This is just a starting point though, and as a contributor you will be entirely free to design and craft this feature as you want.

This issue can count as part of Hacktoberfest

@rjsu26

rjsu26 commented Aug 18, 2021

I want to work on this feature. Is it up for grabs?

@Lucas-C
Member Author

Lucas-C commented Aug 18, 2021

Yes, go for it!

And feel free to ask all the questions you need 😉

@rjsu26

rjsu26 commented Aug 19, 2021

Thanks @Lucas-C. On it!

@Brianrmendes

I would like to work on the issue

@rjsu26

rjsu26 commented Aug 21, 2021

Hi @Brianrmendes I am already half-way into the task. We can team up to work on 2 different issues but with collaboration. What say?

@Brianrmendes

Is there any way I could reach you?

@rjsu26

rjsu26 commented Aug 21, 2021

DM on discord @rjs#2013

@Brianrmendes

Check DM

@siddheshbandgar

I would like to work on this issue, is it still open?

@Brianrmendes

Please check #204

@alexp1917
Collaborator

anyone can submit a pr and the best one will be merged

@alexp1917
Collaborator

that said, i see that the issue is assigned to @rjsu26, so you might want to check with them first; but we will review all PRs that are created. thanks, everyone, for the interest in contributing!

@Lucas-C
Member Author

Lucas-C commented Aug 23, 2021

Note for people interested in implementing this feature:
some guidelines have been provided in PR #204

@rjsu26

rjsu26 commented Aug 24, 2021

Thanks for the guidelines @alexp1917 and @Lucas-C. After reading the review comments and the specification file, I still wonder why we can't simply AES-encrypt the whole PDF instead of leaving some fields, such as the trailer or the Encrypt dictionary, unencrypted. The command line --decrypt "password" would still work. With this approach, a PDF reader won't prompt for a password, since it won't be able to parse the input (the AES-encrypted PDF), but as mentioned in the reviews, the server would use the command line to decrypt the document anyway.

I have a feeling that I am missing something trivial here and I would be grateful to get a clarification from you guys.

@andrei-polukhin

andrei-polukhin commented Sep 5, 2021

Left some comments for #204. Let me know if they are relevant as I initially wanted to implement this feature. Although I have read docs on encryption and have acquainted myself with similar implementations, I do not have enough free time to implement this.

@alexp1917
Collaborator

also do not have enough time :)

@Lucas-C
Member Author

Lucas-C commented Sep 28, 2021

I closed PR #204

To be clear with people interested in this: this feature is still up for grabs!

Just be sure to check the comments made in #204 if you want to submit a PR 😉

@Agent-Hellboy

Hi @Lucas-C
As per your suggestion to add a parameter named password to the output method of the FPDF object, I hope I need to encrypt this byte array if the password or/and encrypt field is provided in the output method:

bytearray(b'%PDF-1.3\n3 0 obj\n<</Type /Page\n/Parent 1 0 R\n/Resources 2 0 R\n/Contents 4 0 R>>\nendobj\n4 0 obj\n<</Filter /FlateDecode /Length 73>>\nstream\nx\x9c3R\xf0\xe22\xd035W(\xe7r\nQ\xd0w3T04\xd230P\x08ISp\r\x01\t\x19\x1b\xea\x19Z(X\x18\x18\xebY\x9a(\x84\xa4(hd\xa4\xe6\xe4\xe4+\x94\xe7\x17\xe5\xa4h*\x84d\x81\x94\x01\x00\t\xd4\x10i\nendstream\nendobj\n1 0 obj\n<</Type /Pages\n/Kids [3 0 R]\n/Count 1\n/MediaBox [0 0 595.28 841.89]\n>>\nendobj\n5 0 obj\n<</Type /Font\n/BaseFont /Helvetica\n/Subtype /Type1\n/Encoding /WinAnsiEncoding\n>>\nendobj\n2 0 obj\n<<\n/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]\n/Font <<\n/F1 5 0 R\n>>\n/XObject <<\n>>\n>>\nendobj\n6 0 obj\n<<\n/CreationDate (D:20220415122655)\n>>\nendobj\n7 0 obj\n<<\n/Type /Catalog\n/Pages 1 0 R\n/OpenAction [3 0 R /FitH null]\n/PageLayout /OneColumn\n>>\nendobj\nxref\n0 8\n0000000000 65535 f \n0000000229 00000 n \n0000000411 00000 n \n0000000009 00000 n \n0000000087 00000 n \n0000000315 00000 n \n0000000515 00000 n \n0000000569 00000 n \ntrailer\n<<\n/Size 8\n/Root 7 0 R\n/Info 6 0 R\n>>\nstartxref\n672\n%%EOF\n')

As per the discussion in this thread and the PR, we need to avoid some fields like the trailer while encrypting. I can understand why: the trailer is read by PDF readers such as Adobe's.

some must-do things

Encryption-related information shall be stored in a document’s encryption dictionary, which shall be the value of
the Encrypt entry in the document’s trailer dictionary (see Table 15). The absence of this entry from the trailer
dictionary means that a conforming reader shall consider the document to be not encrypted. The entries shown
in Table 20 are common to all encryption dictionaries.

according to this, we need to add a new entry Encrypt in the trailer dictionary
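
As a rough illustration of that requirement, here is a minimal sketch of a trailer that points at an encryption dictionary. The helper name build_trailer and the object numbers are made up for this example and are not fpdf2 code. The /ID entry is included because, as far as I can tell from Table 15 of the PDF 1.7 spec, it becomes required as soon as an Encrypt entry is present.

```python
def build_trailer(size, root_obj, info_obj, encrypt_obj=None, file_id=None):
    """Return the text of a PDF trailer dictionary (hypothetical helper, not fpdf2 API)."""
    lines = [
        "trailer",
        "<<",
        f"/Size {size}",
        f"/Root {root_obj} 0 R",
        f"/Info {info_obj} 0 R",
    ]
    if encrypt_obj is not None:
        # Assumption based on the spec: an encrypted file also needs a file identifier.
        if file_id is None:
            raise ValueError("an encrypted PDF also needs a file identifier (/ID)")
        hex_id = file_id.hex().upper()
        lines.append(f"/Encrypt {encrypt_obj} 0 R")
        lines.append(f"/ID [<{hex_id}> <{hex_id}>]")
    lines.append(">>")
    return "\n".join(lines)


# Object 8 would hold the encryption dictionary in this made-up layout.
print(build_trailer(9, 7, 6, encrypt_obj=8, file_id=bytes(range(16))))
```

The trailer itself stays in clear text, which is what lets a reader discover that the file is encrypted in the first place.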

From the below para of the pdf specification, I assume I need to add only a filter inside the encrypt dictionary as other fields are optional

Screenshot from 2022-04-15 13-10-46

what are the other things I must read from pdf spec to implement this?

@Lucas-C
Member Author

Lucas-C commented Apr 17, 2022

Hi @Agent-Hellboy

Thanks for writing this detailed comment.
I'm a bit busy right now, but I'll try to answer you properly next week.

@Lucas-C
Member Author

Lucas-C commented Apr 19, 2022

First, there is a general methodology I have used frequently while adding features to fpdf2, which I would recommend adopting here:

  1. Find or craft a reference encrypted PDF. LibreOffice Writer is one example of software that can easily generate encrypted PDFs
  2. Use qpdf --qdf --compress-streams=n $in_file.pdf $out_file.pdf to produce a "pretty-formatted" PDF
  3. Open the "pretty-formatted" PDF in a text editor or IDE in order to study its structure
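
Step 2 can be scripted if you have several reference files to compare. This is only a sketch: the function names are invented for the example, and it assumes the qpdf executable is on your PATH.

```python
import shutil
import subprocess


def build_qpdf_command(in_file, out_file):
    """Build the qpdf command line quoted in step 2."""
    return ["qpdf", "--qdf", "--compress-streams=n", in_file, out_file]


def prettify_pdf(in_file, out_file):
    """Run qpdf if it is installed; return False when the executable is missing."""
    if shutil.which("qpdf") is None:
        return False
    # check=True raises CalledProcessError when qpdf exits with a non-zero status.
    subprocess.run(build_qpdf_command(in_file, out_file), check=True)
    return True


print(" ".join(build_qpdf_command("reference.pdf", "reference_pretty.pdf")))
```

If the reference file was saved with a user password, qpdf will need that password as well (it has a --password option for this).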

Now answering your questions:

I hope I need to encrypt this byte array if the password or/and encrypt field is provided

Yes, I think you are on the right track!

we need to avoid some fields like the trailer while encrypting

Agreed. For reference, the content of this trailer dictionary is currently generated by FPDF._puttrailer().
The call stack is output() -> close() -> _enddoc() -> _puttrailer()

I assume I need to add only a filter inside the encrypt dictionary as other fields are optional

Maybe... Regarding this, I'd really recommend checking the structure of encrypted PDFs generated by other software. Comparing how 2 or 3 programs generate encrypted PDFs would be ideal. If you check other code libraries, like PikePDF, you may simply look at their source code or reference PDF test files.

what are the other things I must read from pdf spec to implement this?

Having a basic understanding of PDF syntax will help you: object dictionaries (<< /KeyA /Value1 >>), streams (stream / endstream), the XRef trailer dictionary... and how we generate them in fpdf2.
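
To make those two building blocks concrete, here is a small sketch that writes a dictionary and a stream object by hand. The helpers pdf_dict and pdf_stream_object are hypothetical names for this example and do not mirror how fpdf2 is organised internally.

```python
def pdf_dict(entries):
    """Render a mapping as a PDF dictionary, e.g. {"KeyA": "/Value1"} gives << /KeyA /Value1 >>."""
    body = " ".join(f"/{key} {value}" for key, value in entries.items())
    return f"<< {body} >>"


def pdf_stream_object(obj_id, data, extra=None):
    """Render an indirect object holding a stream: its dictionary, then stream ... endstream."""
    entries = dict(extra or {})
    entries["Length"] = len(data)  # the stream dictionary must state the byte length
    header = f"{obj_id} 0 obj\n{pdf_dict(entries)}\nstream\n".encode("latin-1")
    return header + data + b"\nendstream\nendobj\n"


print(pdf_dict({"KeyA": "/Value1"}))
print(pdf_stream_object(4, b"BT /F1 12 Tf (hello world) Tj ET").decode("latin-1"))
```

With encryption enabled, it is the bytes between stream and endstream (and the string values) that get encrypted, while the surrounding syntax stays readable.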

Reading the PDF spec is not always very engaging. Depending on how you best learn things, you may want instead to check "pretty formatted" (using qpdf) example PDF files, or directly look at how we implement things in fpdf2 with Python code.

Note that there is a useful example of PDF syntax at the end of section 7.6, "Encryption", of the 1.7 PDF spec (just before the start of section 7.7), demonstrating a basic use of encryption in a PDF file.

Good luck working on this! 💪

@Lucas-C
Member Author

Lucas-C commented Aug 18, 2022

As it has been a few months now without any update, I guess this issue is up-for-grabs 😊

Anybody is welcome to give it a try!

@andersonhc
Collaborator

I'll give it a try

@andersonhc
Collaborator

andersonhc commented Nov 16, 2022

My changes are progressing well. If you want to take a look at how it's going, it's all on my fork.
I still need to go through all the functions and make sure everything works (especially linearization), and add the tutorial and tests.

This is an example:

from fpdf import FPDF
from fpdf.encryption import EncryptionMethod

pdf = FPDF()
pdf.add_page()
pdf.title = 'test'
pdf.set_font('helvetica', size=12)
pdf.cell(txt="hello world")
#em = EncryptionMethod.AES_128
em = None
pdf.set_encryption(owner_password="123",
            user_password=None,
            encryption_method=em,
            allow_print_lowres=False, 
            allow_modify=False,
            allow_copy=False,
            allow_annotation=False,
            allow_fill_forms=False,
            allow_copy_accessibility=False,
            allow_assemble=False,
            allow_print_highres=False)
pdf.output("hello_world.pdf")
print('complete')

@Lucas-C
Member Author

Lucas-C commented Nov 16, 2022

My changes are progressing well. If you want to take a look at how it's going, it's all on my fork.
I still need to go through all the functions and make sure everything works (especially linearization), and add the tutorial and tests.

Wow, I checked your fork, you have already done some solid work! Congrats!

Don't worry about linearization, it is not supported yet by fpdf2 (cf. #62)

Regarding the method API you suggested, it seems nice.
I think you may be better off using enum.Flag to implement permissions, and putting it in fpdf.enums.
This way you could have a single permissions param for set_encryption():

pdf.set_encryption(owner_password="123", permissions=PrintLowres | Modify | FillForms)
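
For illustration, a flag-based permissions type could look like the sketch below. The class name Permissions, its member names and the p_entry helper are placeholders, not the final fpdf2 API. The bit positions and the reserved bits are taken from my reading of the user access permissions table in the PDF 1.7 spec, so double-check them before relying on this.

```python
from enum import Flag


class Permissions(Flag):
    """Hypothetical permission flags; values follow the spec's 1-based bit positions."""
    PRINT_LOW_RES = 1 << 2            # bit 3
    MODIFY = 1 << 3                   # bit 4
    COPY = 1 << 4                     # bit 5
    ANNOTATION = 1 << 5               # bit 6
    FILL_FORMS = 1 << 8               # bit 9
    COPY_FOR_ACCESSIBILITY = 1 << 9   # bit 10
    ASSEMBLE = 1 << 10                # bit 11
    PRINT_HIGH_RES = 1 << 11          # bit 12


# Assumption from the spec: bits 7-8 and 13-32 are reserved and shall be 1.
RESERVED_BITS = 0xFFFFF0C0


def p_entry(permissions):
    """Return the signed 32-bit integer to store under /P in the encryption dictionary."""
    unsigned = permissions.value | RESERVED_BITS
    # The top bit is always set, so the signed value is always negative.
    return unsigned - (1 << 32)


allowed = Permissions.PRINT_LOW_RES | Permissions.MODIFY | Permissions.FILL_FORMS
print(allowed, p_entry(allowed))
```

A single Flag value keeps the method signature short and still lets callers combine permissions with the | operator.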

@andersonhc andersonhc mentioned this issue Nov 21, 2022
@Lucas-C
Member Author

Lucas-C commented Dec 28, 2022

This has been implemented by @andersonhc in #609 - thank you!

Documentation: https://pyfpdf.github.io/fpdf2/Encryption.html

This will be released in version 2.6.1

Closing this issue now

@Lucas-C Lucas-C closed this as completed Dec 28, 2022
@kangik0817

A PDF file with a password set using the EncryptionMethod.NO_ENCRYPTION option opened without a password in Google Chrome on macOS, but a native PDF app asks for the password.

output.pdf

Files with a password set in an app other than fpdf2 ask for a password in both cases.

@andersonhc
Collaborator

Did you try with another encryption method? Either rc4 or aes should require a password even on chrome

@kangik0817

kangik0817 commented Dec 30, 2022

Yes, those are working fine.

But those have another issue: when using a non-ASCII font, characters are broken.

password: 1113
none-ascii-font.pdf

@andersonhc
Collaborator

Does it work if you don't encrypt the file? Can you provide a minimum reproducible example?

@andersonhc
Collaborator

I was able to reproduce the problem. The font content stream is not being encrypted and causing the problem.
I'm starting a pull request to fix it.
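
For anyone wondering why one unencrypted stream is enough to break rendering: a reader decrypts every stream with a key derived from the object number, so a stream left in clear text gets "decrypted" into garbage. The sketch below is not the fix itself, only an illustration of that per-object key derivation (Algorithm 1 of the PDF 1.7 spec, as I read it), with a plain RC4 routine. The file key and the font bytes are placeholders.

```python
import hashlib


def object_key(file_key, obj_num, gen_num=0, aes=False):
    """Derive the key for one object from the file-wide encryption key."""
    material = file_key + obj_num.to_bytes(3, "little") + gen_num.to_bytes(2, "little")
    if aes:
        material += b"sAlT"  # extra salt the spec adds for AES
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]


def rc4(key, data):
    """Plain RC4; the same call both encrypts and decrypts."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


file_key = bytes(range(16))          # placeholder; a real key is derived from the password
font_stream = b"embedded font bytes"  # placeholder for the font program
key = object_key(file_key, obj_num=12)

# What a reader sees when the writer forgot to encrypt the stream of object 12:
print(rc4(key, font_stream))
# What it sees when the stream was encrypted with the matching object key:
print(rc4(key, rc4(key, font_stream)))
```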

@SfinxV

SfinxV commented Jan 28, 2023

I was able.......

Hi! I want to report a problem when enabling encryption (RC4 and AES128 methods). After encryption, hyperlinks on objects are corrupted and turn into a link to the file itself. The link "http://google.com" turns into something like "file:///G|....path..../1.pdf"

@Lucas-C
Member Author

Lucas-C commented Jan 28, 2023

I was able to reproduce the problem. The font content stream is not being encrypted and causing the problem.
I'm starting a pull request to fix it.

For information, this was fixed by @andersonhc in #655

Hi! I want to report a problem when enabling encryption (RC4 and AES128 methods). After encryption, hyperlinks on objects are corrupted and turn into a link to the file itself. The link "http://google.com" turns into something like "file:///G|....path..../1.pdf"

Thank you for the report @SfinxV.

I was able to reproduce this bug and opened this dedicated issue: #672

Everyone is welcome to have a look and try to fix this.

@Lucas-C
Member Author

Lucas-C commented Jan 28, 2023

@allcontributors please add @SfinxV for bug

@allcontributors

@Lucas-C

I've put up a pull request to add @SfinxV! 🎉
