contradicting information about the encoding of TrueType fonts #316

seehuhn · 2023-08-07T10:52:00Z

Table 112 (Entries in an encoding dictionary) in section 9.6.5.1 states about the Differences entry of an encoding dictionary, that the entry "should not be used with TrueType fonts".

Section 9.6.5.4 (Encodings for TrueType fonts) contains a passage beginning with "The following paragraphs describe the treatment of TrueType font encodings beginning with PDF 1.3." This passage describes how to construct a table that maps character codes to glyph names. As part of this process, the description states "Any entries in the Differences array shall be used to update the table."

These two parts of the PDF spec seem to contradict each other, since the table states not to use the differences array, and the later section indicates the differences array can be used to describe the encoding.
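To make the table construction concrete, here is a minimal sketch of the "update the table" step, not an implementation taken from the spec. It assumes the usual layout of a Differences array, where an integer sets the current character code and each following name is assigned to consecutive codes. The function name `apply_differences` and the tiny `base` table are hypothetical and only serve as illustration.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) updated by a Differences array.

    Each integer in the array sets the current character code; each name
    that follows is assigned to the current code, which then increases by one.
    """
    table = dict(base)  # leave the base encoding untouched
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("Differences array must start with a code")
            table[code] = item
            code += 1
    return table


# Hypothetical three-entry excerpt of a base encoding, for illustration only.
base = {65: "A", 66: "B", 67: "C"}
differences = [66, "Adieresis", "Ccedilla", 128, "Euro"]
table = apply_differences(base, differences)
```

With these inputs, codes 66 and 67 are overridden, code 128 is added, and code 65 keeps its base entry.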

The text should be clarified to remove this contradiction. Maybe the table is meant to say "should not be used with TrueType fonts for PDF versions before PDF 1.3"? Or maybe the text in section 9.6.5.4 should be updated to explain how to describe the encoding without using the differences array?

Use of differences arrays seems to be supported in practice. The attached PDF file includes a TrueType font which uses a differences array, and the text displays correctly in Adobe Acrobat Reader, in the Preview app on macOS, and in the PDF viewer built into Google Chrome (also on macOS).

truetype.pdf

petervwyatt · 2023-08-07T11:16:09Z

From an editorial (non-technical) PoV this recommendation ("should") and requirement ("shall") are not conflicting when read with an understanding of "ISO-ese": Differences is not recommended ("should") for TrueType but when Differences is present for TrueType then it must always ("shall") be used. Practically that means Differences cannot be ignored on the assumed few times it will be present for TrueType fonts.

Note: I have not addressed the technical logic behind why Differences is not recommended for TrueType font.

seehuhn · 2023-08-07T11:26:58Z

Thank you for your quick response. I did indeed not fully appreciate the difference between "shall" and "should".

Even if the text of the specification is correct as is, it might still make sense to add some guidance for application writers about how TrueType fonts should be embedded by new software. I am trying to generate PDF files which embed TrueType fonts (like the one attached to the issue, above). If Differences arrays were ok to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries. If the encoding for this use case in practice needs to be specified in the TrueType "cmap" table, this would require embedding a separate font program for each font dictionary.
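As a rough sketch of the use case (assuming Differences arrays were acceptable for TrueType fonts, which is exactly the open question here), one Differences array could be built per font dictionary, each covering up to 256 glyphs of the same font program. The function name `differences_for_subsets` and the glyph names are hypothetical.

```python
def differences_for_subsets(glyph_names, codes_per_font=256):
    """Split glyph names into groups and build one Differences array per group.

    Each array starts at code 0 and lists the glyph names of one group,
    so every group can be addressed with single-byte character codes.
    """
    arrays = []
    for start in range(0, len(glyph_names), codes_per_font):
        chunk = glyph_names[start:start + codes_per_font]
        arrays.append([0, *chunk])
    return arrays


# Hypothetical glyph names standing in for a font with more than 256 glyphs.
glyphs = [f"glyph{i}" for i in range(600)]
arrays = differences_for_subsets(glyphs)
```

In this example, 600 glyphs yield three arrays, which would correspond to three font dictionaries sharing one embedded font program.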

lrosenthol · 2023-08-08T12:32:37Z

> If Differences arrays were ok to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries.

It was never envisioned that one could do that - and for good reason, it makes downstream PDF modification extremely difficult (or more difficult).

seehuhn · 2023-08-08T14:14:13Z

> It was never envisioned that one could do that - and for good reason, it makes downstream PDF modification extremely difficult (or more difficult).

But this approach is explicitly mentioned as being possible in section 9.6.5.1: "Some character sets consist of more than 256 characters, including ligatures, accented characters, and other symbols required for high-quality typography or non-Latin writing systems. Different encodings may select different subsets of the same character set."

seehuhn · 2023-08-12T13:13:00Z

Here are some thoughts about what could be done to make the text of the spec more consistent:

- In section 9.6.5.4 (Encodings for TrueType fonts), second bullet point: If differences arrays should not be used, "A nonsymbolic font" should probably be replaced with something like "A font that is used to display glyphs that use either MacRomanEncoding or WinAnsiEncoding", to match up with the following bullet point.
- Since for TrueType fonts the symbolic/nonsymbolic flag should also be set for fonts which use glyphs inside the Standard Latin character set but are not restricted to either MacRomanEncoding or WinAnsiEncoding, the description of the Symbolic and Nonsymbolic flags in table 121 should maybe get an extra clause stating the rules for TrueType fonts.

There is also a potential contradiction between the rules on page 326, and the text underneath table 113. The text on page 326 gives the rules for the case when "the font has a named Encoding entry of either MacRomanEncoding or WinAnsiEncoding, or if the font descriptor’s Nonsymbolic flag [...] is set". On the following page, after table 113 the text gives rules for when "the font has no Encoding entry, or the font descriptor’s Symbolic flag is set (in which case the Encoding entry is ignored)". This leaves us with the following situation:

1. Symbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326 or rules after table 113 ???
2. Symbolic, other Encodings: rules after table 113
3. Symbolic, no Encoding: rules after table 113
4. Nonsymbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326
5. Nonsymbolic, other Encodings: rules from page 326
6. Nonsymbolic, no Encoding: rules from page 326 or rules after table 113 ???

It is not clear to me which set of rules applies for the first and last case in this list. Maybe this could be clarified in the spec?
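The case list can be reproduced mechanically from the two quoted conditions. The following is only a sketch of my reading of the text, assuming that exactly one of the Symbolic and Nonsymbolic flags is set, and using the placeholder string "Other" for any Encoding entry that is not a named MacRomanEncoding or WinAnsiEncoding. All names in the code are hypothetical.

```python
from itertools import product

NAMED = {"MacRomanEncoding", "WinAnsiEncoding"}


def applicable_rules(symbolic, encoding):
    """List the rule sets whose stated condition matches a font.

    `encoding` is a name string, the placeholder "Other", or None when the
    font has no Encoding entry. Assumes Nonsymbolic is set iff Symbolic is not.
    """
    rules = []
    # Page 326: named MacRomanEncoding/WinAnsiEncoding, or Nonsymbolic flag set.
    if encoding in NAMED or not symbolic:
        rules.append("page 326")
    # After table 113: no Encoding entry, or Symbolic flag set.
    if encoding is None or symbolic:
        rules.append("after table 113")
    return rules


cases = {
    (symbolic, encoding): applicable_rules(symbolic, encoding)
    for symbolic, encoding in product([True, False], ["WinAnsiEncoding", "Other", None])
}
# Cases where both conditions match, so the text does not say which rules win.
ambiguous = [case for case, rules in cases.items() if len(rules) == 2]
```

Under these assumptions, `ambiguous` contains exactly the two cases marked "???" above: Symbolic with a named encoding, and Nonsymbolic with no Encoding entry.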

seehuhn · 2023-08-16T10:02:41Z

I looked at older versions of the spec. In the PDF 1.4 spec, the description does not yet make use of the symbolic/non-symbolic flags. It just says (in many words): if an /Encoding entry is given, it is used. Otherwise the “cmap” subtable with platform ID 1 and encoding 0 will be used. At the time they also still allowed MacExpertEncoding, which is no longer allowed in the current spec. Thus, if the intention was to be backwards compatible, the two problematic cases above would be resolved as follows:

Symbolic, Encoding entry of either MacRomanEncoding or WinAnsiEncoding: rules from page 326
Nonsymbolic, no Encoding: rules after table 113

seehuhn added the bug Something isn't correct label Aug 7, 2023
