Closed
When XMP metadata is generated in drafthorse/pdf.py:158, the values are embedded in the XML template string without escaping. If any value contains characters that are not XML-safe, the resulting XMP is invalid and the output PDF cannot be parsed as Factur-X.
This will produce something like this when reading the file back:
pypdf.errors.PdfReadError: XML in XmpInformation was invalid: not well-formed (invalid token)
Or when the file is handled with Mustang:
WARNING: Problems with parsing metadata. XML parsing failure
org.verapdf.xmp.XMPException: XML parsing failure
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 17; columnNumber: 28; The entity name must immediately follow the '&' in the entity reference.
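The failure can be reproduced outside of drafthorse with a minimal sketch. The template and the `seller` element below are hypothetical stand-ins, not the actual XMP template from drafthorse/pdf.py; the point is only that formatting a raw `&` into XML text yields a document that a parser rejects.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the XMP template string.
XMP_TEMPLATE = "<meta><seller>{seller}</seller></meta>"


def render_unescaped(seller):
    """Embed the value as-is, mirroring the behaviour described above."""
    return XMP_TEMPLATE.format(seller=seller)


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name in ("ACME GmbH", "Michael & Son", "A&A-1"):
        print(name, is_well_formed(render_unescaped(name)))
```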
There are two easy ways to trigger this in practice, because the metadata is extracted automatically from the Factur-X payload in drafthorse/pdf.py:294:
- Selling company name in ApplicableHeaderTradeAgreement/SellerTradePartyName/Name. Generating invoices for Michael & Son will fail.
- Invoice number in ExchangedDocument/ID. Less likely, but still possible. Fails with an invoice number like A&A-1.
All metadata going to XMP generation should be escaped.
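One possible shape for the fix, as a sketch only: pass every value through `xml.sax.saxutils.escape` from the standard library before it is formatted into the template. The template, the field names, and the helper `render_escaped` are assumptions made for illustration and do not reflect the actual code in drafthorse/pdf.py.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, heavily simplified stand-in for the XMP template string.
XMP_TEMPLATE = "<meta><seller>{seller}</seller><number>{number}</number></meta>"


def render_escaped(fields):
    """Escape &, < and > in every value before embedding it in the template."""
    safe = {key: escape(value) for key, value in fields.items()}
    return XMP_TEMPLATE.format(**safe)


if __name__ == "__main__":
    xml_text = render_escaped({"seller": "Michael & Son", "number": "A&A-1"})
    # The document now parses, and the parser restores the original text.
    root = ET.fromstring(xml_text)
    print(root.find("seller").text)
    print(root.find("number").text)
```

`escape` covers `&`, `<` and `>`, which is enough for element text; values placed inside attributes would additionally need quote handling, for example via `xml.sax.saxutils.quoteattr`.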
I'll be creating a PR to fix it.