Unescaped data going to XMP metadata makes output documents unparseable #79

When generating XMP metadata in [drafthorse/pdf.py:158](https://github.com/pretix/python-drafthorse/blob/2af5f671ba0d513f3519c889b9909642e05aba0f/drafthorse/pdf.py#L158), the data is embedded in the XML template string without escaping. If any value contains characters that are not XML-safe (such as `&` or `<`), the resulting XMP is not well-formed, and the output PDF can no longer be parsed as Factur-X.

Reading the file back then fails with an error like this:
`pypdf.errors.PdfReadError: XML in XmpInformation was invalid: not well-formed (invalid token)`

Or when the file is handled with Mustang:
```
WARNING: Problems with parsing metadata. XML parsing failure
org.verapdf.xmp.XMPException: XML parsing failure
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 17; columnNumber: 28; The entity name must immediately follow the '&' in the entity reference.
```

There are two easy ways to trigger this in practice, both involving metadata that is automatically extracted from the Factur-X payload in [drafthorse/pdf.py:294](https://github.com/pretix/python-drafthorse/blob/2af5f671ba0d513f3519c889b9909642e05aba0f/drafthorse/pdf.py#L294):
- Selling company name in `ApplicableHeaderTradeAgreement/SellerTradeParty/Name`. Generating invoices for `Michael & Son` will fail
- Invoice number in `ExchangedDocument/ID`. Less likely, but still possible. Fails with an invoice number like `A&A-1`
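
Both cases can be reproduced without drafthorse or a PDF. The sketch below uses only the standard library and a hypothetical, heavily simplified `TEMPLATE` (not the real template from `drafthorse/pdf.py`); the field names `author` and `title` are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an XMP template.
# The real template in drafthorse/pdf.py is larger and differs in detail.
TEMPLATE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<rdf:Description>"
    "<dc:creator>{author}</dc:creator>"
    "<dc:title>{title}</dc:title>"
    "</rdf:Description>"
    "</rdf:RDF>"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Values are substituted into the template without any escaping.
safe = TEMPLATE.format(author="Michael and Son", title="Invoice A-1")
broken_name = TEMPLATE.format(author="Michael & Son", title="Invoice A-1")
broken_id = TEMPLATE.format(author="Michael and Son", title="Invoice A&A-1")

for label, doc in [("safe", safe), ("name", broken_name), ("id", broken_id)]:
    print(label, is_well_formed(doc))
```

Only the first document parses; a bare `&` in either field is enough to make the whole packet not well-formed.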

All metadata going to XMP generation should be escaped.
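
One possible shape for this, as a minimal sketch rather than the actual patch: escape every value with `xml.sax.saxutils.escape` before it is substituted into the template. The `render_xmp` helper, the `TEMPLATE` string, and the field names are hypothetical and do not exist in drafthorse.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, simplified stand-in for an XMP template (not drafthorse's).
TEMPLATE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<rdf:Description>"
    "<dc:creator>{author}</dc:creator>"
    "<dc:title>{title}</dc:title>"
    "</rdf:Description>"
    "</rdf:RDF>"
)


def render_xmp(template, **fields):
    """Hypothetical helper: escape &, < and > in every value, then substitute."""
    escaped = {key: escape(str(value)) for key, value in fields.items()}
    return template.format(**escaped)


xmp = render_xmp(TEMPLATE, author="Michael & Son", title="Invoice A&A-1")

# The packet now parses, and the parser restores the original text.
root = ET.fromstring(xmp)
DC = "{http://purl.org/dc/elements/1.1/}"
creator = root.find(f".//{DC}creator").text
title = root.find(f".//{DC}title").text
print(creator, "|", title)
```

`escape` covers `&`, `<` and `>`, which is enough for values placed in element content. If a value ends up inside an XML attribute, quotes need escaping as well, for example with `xml.sax.saxutils.quoteattr`.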

I'll be creating a PR to fix it.
