contradicting information about the encoding of TrueType fonts #316

Open
seehuhn opened this issue Aug 7, 2023 · 6 comments
Labels
bug Something isn't correct

Comments

@seehuhn

seehuhn commented Aug 7, 2023

Table 112 (Entries in an encoding dictionary) in section 9.6.5.1 states, regarding the Differences entry of an encoding dictionary, that the entry "should not be used with TrueType fonts".

Section 9.6.5.4 (Encodings for TrueType fonts) contains a passage beginning with "The following paragraphs describe the treatment of TrueType font encodings beginning with PDF 1.3." This passage describes how a table that maps character codes to glyph names is constructed. As part of this process, it states "Any entries in the Differences array shall be used to update the table."
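The table-update step quoted above can be illustrated with a small Python sketch. It is not taken from the spec: glyph names are plain strings without the leading slash, and the base table is a tiny hypothetical excerpt rather than a complete standard encoding. In a Differences array, an integer sets the current character code, and each following name is assigned to that code before the code is incremented.

```python
from typing import Dict, List, Union

# A Differences array mixes integer codes and glyph names, e.g. [66, "Aring", "ccedilla"].
DifferencesArray = List[Union[int, str]]


def apply_differences(base: Dict[int, str], differences: DifferencesArray) -> Dict[int, str]:
    """Return a copy of `base` (code -> glyph name) updated by a Differences array."""
    table = dict(base)  # leave the caller's base table untouched
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # an integer starts a new run of consecutive codes
            continue
        if code is None:
            raise ValueError("Differences array must start with a character code")
        if not 0 <= code <= 255:
            raise ValueError(f"character code {code} out of range")
        table[code] = item  # assign the glyph name to the current code
        code += 1           # the next name goes to the following code


    return table


# Hypothetical excerpt of a base encoding, for illustration only.
base = {65: "A", 66: "B", 67: "C"}
table = apply_differences(base, [66, "Aring", "ccedilla", 200, "fi"])
print(table)  # {65: 'A', 66: 'Aring', 67: 'ccedilla', 200: 'fi'}
```

Codes not mentioned in the Differences array keep their entries from the base table, which is what "update the table" amounts to here.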

These two parts of the PDF spec seem to contradict each other: the table says not to use the Differences array, while the later section indicates that the Differences array can be used to describe the encoding.

The text should be clarified to remove this contradiction. Maybe the table is meant to say "should not be used with TrueType fonts for PDF versions before PDF 1.3"? Or maybe the text in section 9.6.5.4 should be updated to explain how to specify the encoding without using the Differences array?

Use of Differences arrays seems to be supported in practice. The attached PDF file includes a TrueType font that uses a Differences array, and the text displays correctly in Adobe Acrobat Reader, in the Preview app on macOS, and in the PDF viewer built into Google Chrome (also on macOS).

truetype.pdf

@seehuhn seehuhn added the bug Something isn't correct label Aug 7, 2023
@petervwyatt
Member

From an editorial (non-technical) PoV, this recommendation ("should") and this requirement ("shall") do not conflict when read with an understanding of "ISO-ese": Differences is not recommended ("should") for TrueType fonts, but when Differences is present for a TrueType font it must always ("shall") be used. In practice, this means Differences cannot be ignored in the presumably few cases where it is present for TrueType fonts.

Note: I have not addressed the technical logic behind why Differences is not recommended for TrueType fonts.

@seehuhn
Author

seehuhn commented Aug 7, 2023

Thank you for your quick response. I did indeed not fully appreciate the difference between "shall" and "should".

Even if the text of the specification is correct as is, it might still make sense to add some guidance for application writers about how new software should embed TrueType fonts. I am trying to generate PDF files that embed TrueType fonts (like the one attached to the issue, above). If Differences arrays were OK to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries. If, in practice, the encoding for this use case needs to be specified in the TrueType "cmap" table, this would require embedding a separate font program for each font dictionary.

@lrosenthol
Contributor

> If Differences arrays were ok to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries.

It was never envisioned that one could do that, and for good reason: it makes downstream PDF modification extremely difficult (or at least more difficult).

@seehuhn
Author

seehuhn commented Aug 8, 2023

> It was never envisioned that one could do that - and for good reason, it makes downstream PDF modification extremely difficult (or more difficult).

But this approach is explicitly mentioned as being possible in section 9.6.5.1: "Some character sets consist of more than 256 characters, including ligatures, accented characters, and other symbols required for high-quality typography or non-Latin writing systems. Different encodings may select different subsets of the same character set."

@seehuhn
Author

seehuhn commented Aug 12, 2023

Here are some thoughts about what could be done to make the text of the spec more consistent:

  • In section 9.6.5.4 (Encodings for TrueType fonts), second bullet point: If differences arrays should not be used, probably "A nonsymbolic font" should be replaced with something like "A font that is used to display glyphs that use either MacRomanEncoding or WinAnsiEncoding", to match up with the following bullet point.
  • Since for TrueType fonts the symbolic/nonsymbolic flag should be set also for fonts which use glyphs inside the Standard Latin character set, but are not restricted to either MacRomanEncoding or WinAnsiEncoding, the description of the Symbolic and Nonsymbolic flags in table 121 should maybe get an extra clause stating the rules for TrueType fonts.

There is also a potential contradiction between the rules on page 326, and the text underneath table 113. The text on page 326 gives the rules for the case when "the font has a named Encoding entry of either MacRomanEncoding or WinAnsiEncoding, or if the font descriptor’s Nonsymbolic flag [...] is set". On the following page, after table 113 the text gives rules for when "the font has no Encoding entry, or the font descriptor’s Symbolic flag is set (in which case the Encoding entry is ignored)". This leaves us with the following situation:

  • Symbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326 or rules after table 113 ???
  • Symbolic, other Encodings: rules after table 113
  • Symbolic, no Encoding: rules after table 113
  • Nonsymbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326
  • Nonsymbolic, other Encodings: rules from page 326
  • Nonsymbolic, no Encoding: rules from page 326 or rules after table 113 ???

It is not clear to me which set of rules applies for the first and last case in this list. Maybe this could be clarified in the spec?
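To make the case analysis above concrete, here is a small Python sketch that evaluates the two quoted trigger conditions for each combination of flag and Encoding entry. It assumes the Symbolic and Nonsymbolic flags are mutually exclusive (modeled by one boolean), uses the placeholder string "other" for any other Encoding entry, and labels the two rule sets by the page references used in this comment; none of these names come from the spec.

```python
from itertools import product
from typing import Optional, Set

NAMED_ENCODINGS = {"MacRomanEncoding", "WinAnsiEncoding"}


def applicable_rules(symbolic: bool, encoding: Optional[str]) -> Set[str]:
    """Return every rule set whose quoted trigger condition is satisfied."""
    rules = set()
    # "the font has a named Encoding entry of either MacRomanEncoding or
    # WinAnsiEncoding, or if the font descriptor's Nonsymbolic flag [...] is set"
    if encoding in NAMED_ENCODINGS or not symbolic:
        rules.add("page 326")
    # "the font has no Encoding entry, or the font descriptor's Symbolic flag is set"
    if encoding is None or symbolic:
        rules.add("after table 113")
    return rules


# Enumerate the six cases; "???" marks cases where both rule sets apply.
for symbolic, encoding in product([True, False], ["WinAnsiEncoding", "other", None]):
    rules = applicable_rules(symbolic, encoding)
    marker = "  ???" if len(rules) > 1 else ""
    flag = "Symbolic" if symbolic else "Nonsymbolic"
    print(f"{flag:11} {encoding!s:15} {sorted(rules)}{marker}")
```

Running the loop flags exactly the two cases marked "???" in the list above: Symbolic with a named encoding, and Nonsymbolic with no Encoding entry.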

@seehuhn
Author

seehuhn commented Aug 16, 2023

I looked at older versions of the spec. In the PDF 1.4 spec, the description does not yet make use of the symbolic/nonsymbolic flags. There, it just says (in many words): if an /Encoding entry is given, it is used; otherwise, the "cmap" subtable with platform ID 1 and encoding ID 0 is used. At the time, they also still allowed MacExpertEncoding, which is no longer allowed in the current spec. Thus, if the intention was to be backwards compatible, the two problematic cases above would be resolved as follows:

  • Symbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326
  • Nonsymbolic, no Encoding: rules after table 113
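Under this reading, every case maps to exactly one rule set. A minimal Python sketch of that resolution follows; it again assumes a single boolean for the Symbolic/Nonsymbolic flags, the placeholder "other" for any other Encoding entry, and the page-based labels used in this thread. It encodes the backwards-compatibility interpretation proposed here, not wording from the spec.

```python
from typing import Optional

NAMED_ENCODINGS = {"MacRomanEncoding", "WinAnsiEncoding"}


def resolve_rules(symbolic: bool, encoding: Optional[str]) -> str:
    """Pick a single rule set, following the PDF 1.4-compatible reading."""
    if encoding in NAMED_ENCODINGS:
        # An explicit named encoding is used, even if the Symbolic flag is set.
        return "page 326"
    if encoding is None:
        # No Encoding entry: rely on the font's own "cmap" (rules after
        # table 113), even if the Nonsymbolic flag is set.
        return "after table 113"
    # The remaining cases were already unambiguous.
    return "after table 113" if symbolic else "page 326"


for symbolic, encoding in [(True, "WinAnsiEncoding"), (False, None)]:
    print(symbolic, encoding, "->", resolve_rules(symbolic, encoding))
```

The design choice is simply that the presence or absence of an Encoding entry takes priority over the flags, mirroring the PDF 1.4 behavior where only the Encoding entry mattered.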
