Bug in add_attachment() when attaching several files #2090

Open
alexis-via opened this issue Aug 15, 2023 · 0 comments · May be fixed by #2197
Labels
workflow-annotation Everything about annotating PDF files

alexis-via commented Aug 15, 2023

While reading the source code of add_attachment(), I found out that it doesn't comply with the PDF specifications when it is called multiple times to attach multiple files.

In PDF Reference 1.7, section 3.8.5 "Name Trees", it says:
<< The Names entries in the leaf (or root) nodes contain the tree’s keys and their associated values, arranged in key-value pairs and sorted lexically in ascending order by key.>>

In the current implementation, the order of the keys corresponds to the order of the calls to add_attachment().
You can test with the following code:

from pypdf import PdfWriter

writer = PdfWriter()
writer.add_blank_page(100, 100)
writer.add_attachment("zz.txt", b"ZZ file content")
writer.add_attachment("aa.txt", b"AA file content")

# The with block closes the file on exit, so no explicit close() is needed.
with open("two_attachments.pdf", "wb") as f:
    writer.write(f)

When you look at the generated PDF file, in /Names/EmbeddedFiles/Names, you will have:

  1. zz.txt
  2. aa.txt

The PDF spec says that the keys must be sorted lexically in ascending order, so you should have:

  1. aa.txt
  2. zz.txt
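
To check a given Names array against that rule, something like the sketch below might help. It assumes the array is the flat [key1, value1, key2, value2, ...] list described in the spec; the helper names embedded_file_keys and keys_are_sorted are hypothetical and are not part of pypdf. Python compares str values by code point, which should match the spec's lexical order for plain ASCII filenames, but that is an assumption for other encodings.

```python
def embedded_file_keys(names_array):
    """Return the keys of a flat name-tree array [key1, value1, key2, value2, ...]."""
    # Keys sit at the even indices; the associated values sit at the odd ones.
    return [str(key) for key in names_array[0::2]]


def keys_are_sorted(names_array):
    """Return True if the keys are in ascending order."""
    keys = embedded_file_keys(names_array)
    # Compare each key with its successor; an empty or single-entry array passes.
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Order produced by the test script above (values replaced by placeholders).
print(keys_are_sorted(["zz.txt", None, "aa.txt", None]))
# Order required by the spec.
print(keys_are_sorted(["aa.txt", None, "zz.txt", None]))
```

With pypdf, the array could presumably be read from the file via the catalog path quoted above (/Names/EmbeddedFiles/Names) and passed to keys_are_sorted().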

In my experience, many PDF readers (evince, PDF Studio viewer) don't care about this, but Acrobat Reader DC is affected: it displays the attachments, but when the user tries to save an attachment to disk or open it, nothing happens and no error message is shown.

I discovered this in 2018 when someone reported a bug in my factur-x lib: they had used the option to add additional attachments and then opened the resulting file in Acrobat Reader DC. I remember going crazy while working on this bug, because I could reproduce it with some filenames and it would disappear just by changing the filename! Eventually, the person who reported the bug found this small detail in the PDF reference about sorting the keys and, with that information, I was able to fix it. The fix just involved calling sorted() to put the filenames in alphabetical order:
akretion/factur-x@a3ebfa4
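
Until add_attachment() sorts the keys itself, a caller-side workaround along the same lines might look like the sketch below. It is not the factur-x fix and not the change proposed for pypdf. The helper add_attachments_sorted is hypothetical, and it assumes that the writer stores keys in call order (as described above) and that it holds no attachments yet.

```python
def add_attachments_sorted(writer, attachments):
    """Attach files in ascending filename order.

    writer: any object with an add_attachment(filename, data) method,
            for example a pypdf PdfWriter.
    attachments: dict mapping filename (str) to file content (bytes).
    """
    # sorted() on a dict yields its keys in ascending order, so the calls
    # below reach the writer already ordered by filename.
    for filename in sorted(attachments):
        writer.add_attachment(filename, attachments[filename])
```

For the test case above, this would be called as add_attachments_sorted(writer, {"zz.txt": b"ZZ file content", "aa.txt": b"AA file content"}), so that aa.txt is attached before zz.txt.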

The ability to add multiple attachments was added in PR #1611 by @pubpub-zz.

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 17, 2023
@MartinThoma MartinThoma added the workflow-annotation Everything about annotating PDF files label Oct 28, 2023