Use WinAnsiEncoding for built-in fonts #68

Merged
merged 1 commit into from Oct 17, 2020

robinkrahl
Contributor

Previously, PdfLayerReference::write_text would use the UTF-8 encoding
for built-in fonts. But the built-in fonts only support a very limited
set of encodings, see Appendix D of the PDF Reference [0]. When adding
the font to the PDF document, printpdf already selects the
WinAnsiEncoding (Windows-1252, see the Into<LoDictionary> implementation
for BuiltinFont). This mismatch led to encoding issues for all non-ASCII
characters, see also issue #30.

With this patch, we use lopdf::Document::encode_text to encode the text
for built-in fonts with the WinAnsiEncoding. This fixes the encoding
issues for built-in fonts. Note that that function just drops
characters that are not supported by the encoding.
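
To illustrate the "drops unsupported characters" behaviour, here is a minimal, dependency-free sketch of a lossy Windows-1252 encoder. It is not the lopdf implementation: the names `win1252_byte`, `encode_win1252_lossy` and `HIGH` are hypothetical, and the table follows the Windows-1252 code page, which the PDF WinAnsiEncoding may differ from in a few code points.

```rust
/// Windows-1252 assignments for bytes 0x80..=0x9F, the only range where the
/// byte value differs from the Unicode code point.
/// (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned.)
const HIGH: [(char, u8); 27] = [
    ('\u{20AC}', 0x80), ('\u{201A}', 0x82), ('\u{0192}', 0x83),
    ('\u{201E}', 0x84), ('\u{2026}', 0x85), ('\u{2020}', 0x86),
    ('\u{2021}', 0x87), ('\u{02C6}', 0x88), ('\u{2030}', 0x89),
    ('\u{0160}', 0x8A), ('\u{2039}', 0x8B), ('\u{0152}', 0x8C),
    ('\u{017D}', 0x8E), ('\u{2018}', 0x91), ('\u{2019}', 0x92),
    ('\u{201C}', 0x93), ('\u{201D}', 0x94), ('\u{2022}', 0x95),
    ('\u{2013}', 0x96), ('\u{2014}', 0x97), ('\u{02DC}', 0x98),
    ('\u{2122}', 0x99), ('\u{0161}', 0x9A), ('\u{203A}', 0x9B),
    ('\u{0153}', 0x9C), ('\u{017E}', 0x9E), ('\u{0178}', 0x9F),
];

/// Returns the Windows-1252 byte for `c`, or `None` if the code page
/// has no representation for it.
fn win1252_byte(c: char) -> Option<u8> {
    let cp = c as u32;
    match cp {
        // ASCII and the Latin-1 upper half map to the same byte value.
        0x00..=0x7F | 0xA0..=0xFF => Some(cp as u8),
        // Everything else is either in the special table or unsupported.
        _ => HIGH.iter().find(|&&(ch, _)| ch == c).map(|&(_, b)| b),
    }
}

/// Encodes `text` as Windows-1252, silently dropping unsupported characters.
fn encode_win1252_lossy(text: &str) -> Vec<u8> {
    text.chars().filter_map(win1252_byte).collect()
}
```

With this sketch, "é" becomes the single byte 0xE9, whereas its UTF-8 form is the two bytes 0xC3 0xA9. A viewer that decodes those two bytes as WinAnsiEncoding shows two wrong glyphs, which matches the kind of issue described above. A character outside the code page, such as a CJK ideograph, simply disappears from the output.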

This patch also adds notes about this behaviour to the documentation of
the add_builtin_font, use_text and write_text methods.

Fixes #30.

[0] https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf

@fschutt fschutt merged commit a7fc34c into fschutt:master Oct 17, 2020
@robinkrahl
Contributor Author

Thanks for the merge! Do you have a schedule for the next release?

@robinkrahl robinkrahl deleted the win-ansi-encoding branch October 17, 2020 12:07
@fschutt
Owner

fschutt commented Oct 17, 2020

@robinkrahl 0.3.3 is released

@robinkrahl
Contributor Author

Great, thank you!

Development

Successfully merging this pull request may close these issues.

UTF-8 vs PDF built-in fonts