Legacy fonts can have a NameRecord not encoded in UTF-16BE #643

moi15moi · 2022-08-22T19:13:37Z

Currently, libass always decodes the family name as UTF-16BE:

Lines 283 to 284 in a48c98c

    ass_utf16be_to_utf8(buf, sizeof(buf), (uint8_t *)name.string,
                        name.string_len);

However, Microsoft-platform NameRecords don't always use UTF-16BE.
For how libass should decode a NameRecord properly, see: MicrosoftDocs/typography-issues#956 (comment)

Something like this could be added in ass_utils.c:

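#include <stddef.h>
#include <ft2build.h>
#include FT_FREETYPE_H
#include FT_SFNT_NAMES_H
#include FT_TRUETYPE_IDS_H

/* Returns the encoding of a Microsoft-platform name string,
 * or NULL for other platforms, which are not handled here. */
const char *get_name_encoding(FT_SfntName name) {
    if (name.platform_id != TT_PLATFORM_MICROSOFT)
        return NULL;

    switch (name.encoding_id)
    {
        case TT_MS_ID_PRC:
            return "windows-936";

        case TT_MS_ID_BIG_5:
            /* The subfamily name stays UTF-16BE even with this encoding ID. */
            return (name.name_id == TT_NAME_ID_FONT_SUBFAMILY) ? "UTF-16BE" : "windows-950";

        case TT_MS_ID_WANSUNG:
            /* Same exception for the subfamily name as with Big5. */
            return (name.name_id == TT_NAME_ID_FONT_SUBFAMILY) ? "UTF-16BE" : "windows-949";

        default:
            return "UTF-16BE";
    }
}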
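
For quick experiments outside libass, the same mapping can be sketched in Python as a lookup table. This is only an illustration: the `Name` tuple and the constant names are hypothetical stand-ins for `FT_SfntName` and the FreeType macros, the codec names are Python's names for the Windows code pages, and returning `None` for other platforms is an assumption (the C sketch does not say what should happen there).

```python
from typing import NamedTuple, Optional

PLATFORM_MICROSOFT = 3      # same value as TT_PLATFORM_MICROSOFT
NAME_ID_FONT_SUBFAMILY = 2  # same value as TT_NAME_ID_FONT_SUBFAMILY

# Microsoft encoding ID -> Python codec for the legacy code page
# (3 = PRC, 4 = Big5, 5 = Wansung).
LEGACY_CODECS = {3: "cp936", 4: "cp950", 5: "cp949"}

# Encoding IDs whose subfamily name stays UTF-16BE in the C sketch above.
SUBFAMILY_IS_UTF16 = {4, 5}


class Name(NamedTuple):
    """Hypothetical stand-in for the FT_SfntName fields used here."""
    platform_id: int
    encoding_id: int
    name_id: int


def get_name_encoding(name: Name) -> Optional[str]:
    """Return a Python codec name, or None for non-Microsoft platforms."""
    if name.platform_id != PLATFORM_MICROSOFT:
        return None  # assumption: other platforms are left to the caller
    if (name.encoding_id in SUBFAMILY_IS_UTF16
            and name.name_id == NAME_ID_FONT_SUBFAMILY):
        return "utf_16_be"
    return LEGACY_CODECS.get(name.encoding_id, "utf_16_be")
```

A table keeps the per-encoding exceptions in one place, which may be easier to extend if more legacy encoding IDs turn out to need special handling.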

Finally, to convert the decoded bytes to UTF-8, libass could use ICU: https://unicode-org.github.io/icu/userguide/conversion/converters.html#1-single-string

PS: To test whether a NameRecord with BIG_5 is decoded properly, download the font in this issue: MicrosoftDocs/typography-issues#956 (comment)

astiob · 2022-08-22T21:07:17Z

ICU is waaaay too massive for libass to use, if I’m not mixing anything up.

But the issue is, of course, real; thanks for creating a dedicated ticket for it. I already have name decoding code in https://github.com/astiob/libass/tree/debug-fonts. It should probably be adapted into mainline libass.

moi15moi · 2022-08-22T21:38:33Z

> ICU is waaaay too massive for libass to use, if I’m not mixing anything up.

Ok. I don't have a good knowledge of C.

> I already have name decoding code in https://github.com/astiob/libass/tree/debug-fonts.

I don't think it is a good idea to use names with the Macintosh platform ID.

Here is what the Apple documentation mentions: Names with platformID 1 were required by earlier versions of macOS. Its use on modern platforms is discouraged.

astiob · 2022-08-22T21:56:40Z

> Here is what the Apple documentation mentions: Names with platformID 1 were required by earlier versions of macOS. Its use on modern platforms is discouraged.

That means about as much as the Microsoft docs not mentioning Windows 95 quirks that GDI nevertheless emulates to this day. It’s “discouraged” to use non-Unicode fonts at all, but that’s exactly what we’re trying to do here. (And IIRC macOS itself still preferred Macintosh-platform names when I last checked.)

That branch is (as the name suggests) meant for debugging font issues, so it dumps all the information it can find in a font. What we actually want (ideally) is what VSFilter’s GDI calls use:

For TrueType fonts, uses names with the same platform and encoding as the first valid Microsoft-platform cmap (if any) or MacRoman cmap (otherwise). Never uses Unicode-platform names.

so all Microsoft-platform encodings, as well as Macintosh-platform MacRoman (whatever version of it is implemented in Windows).
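
One possible reading of that selection rule, as a small Python sketch. The `Cmap` and `NameRec` tuples and both function names are hypothetical, and every listed cmap subtable is assumed to be valid, since the thread does not spell out what makes a cmap "valid" for GDI.

```python
from typing import List, NamedTuple, Optional, Tuple

PLATFORM_MACINTOSH = 1
PLATFORM_MICROSOFT = 3
MAC_ENCODING_ROMAN = 0


class Cmap(NamedTuple):
    """Hypothetical cmap subtable header; assumed to be valid."""
    platform_id: int
    encoding_id: int


class NameRec(NamedTuple):
    """Hypothetical name record; only the fields needed for selection."""
    platform_id: int
    encoding_id: int
    name_id: int


def pick_name_key(cmaps: List[Cmap]) -> Optional[Tuple[int, int]]:
    """Return the (platform, encoding) pair whose names should be used."""
    # The first Microsoft-platform cmap wins, whatever its encoding ID.
    for cmap in cmaps:
        if cmap.platform_id == PLATFORM_MICROSOFT:
            return (cmap.platform_id, cmap.encoding_id)
    # Otherwise fall back to a MacRoman cmap, if there is one.
    for cmap in cmaps:
        if (cmap.platform_id, cmap.encoding_id) == (PLATFORM_MACINTOSH,
                                                    MAC_ENCODING_ROMAN):
            return (PLATFORM_MACINTOSH, MAC_ENCODING_ROMAN)
    # Unicode-platform cmaps never select any names.
    return None


def usable_names(cmaps: List[Cmap], names: List[NameRec]) -> List[NameRec]:
    """Keep only names matching the platform and encoding of the chosen cmap."""
    key = pick_name_key(cmaps)
    if key is None:
        return []
    return [n for n in names if (n.platform_id, n.encoding_id) == key]
```

For example, a font whose cmaps are listed as Unicode-platform, then Microsoft Big5, then Microsoft Unicode BMP would use only its Microsoft Big5 names under this reading.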

astiob · 2022-08-22T22:01:17Z

Of course, we don’t currently support MacRoman cmaps, either, and I don’t remember if I’ve ever seen a font that lacked Microsoft-platform data and actually worked in VSFilter. (Zapfino lacks them, and it doesn’t work in VSFilter.) But anyway, just Microsoft-platform names for now would be plenty good, to match our support of Microsoft-platform cmaps.

moi15moi · 2023-03-20T15:34:19Z

Here are 2 fonts whose names should not be decoded as UTF-16BE:
fonts.zip

moi15moi · 2023-08-04T12:38:31Z

I spoke with a Microsoft employee and he told me that GDI performed this processing:

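# NameRecord is assumed to be fontTools' class (its attribute names match).
from fontTools.ttLib.tables._n_a_m_e import NameRecord

# This NameRecord is from 文鼎中特廣告體 - PlatEncID 4.ttf
name_record = NameRecord()
name_record.nameID = 1
name_record.string = b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf\x00S\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9"
name_record.platformID = 3  # Microsoft
name_record.platEncID = 4   # Big5
name_record.langID = 0

# get_name_record_encoding() is a helper that is not shown in this thread;
# it returns the Python codec name to use for the record.
encoding = get_name_record_encoding(name_record)

if name_record.platformID == 3 and encoding != "utf_16_be":
    # In this record every byte of the legacy-encoded string is preceded
    # by a NUL byte, so the NUL bytes are dropped before decoding.
    name_to_decode = name_record.string.replace(b"\x00", b"")
else:
    name_to_decode = name_record.string

decoded_name = name_to_decode.decode(encoding)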
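
The same processing can be tried without any font library. The sketch below uses only the standard library; `decode_gdi_style` is a hypothetical name, and the codec is passed in directly (here Python's `cp950` for Big5) instead of being looked up from the record, which is an assumption made to keep the example self-contained.

```python
PLATFORM_MICROSOFT = 3


def decode_gdi_style(raw: bytes, platform_id: int, encoding: str) -> str:
    """Decode a name string following the steps in the snippet above."""
    if platform_id == PLATFORM_MICROSOFT and encoding != "utf_16_be":
        # Legacy-encoded Microsoft names: drop the NUL bytes first.
        raw = raw.replace(b"\x00", b"")
    return raw.decode(encoding)


# Bytes of the nameID 1 record quoted above (platformID 3, platEncID 4).
raw_name = (b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf\x00S"
            b"\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9")

family_name = decode_gdi_style(raw_name, PLATFORM_MICROSOFT, "cp950")
print(family_name)

# A regular UTF-16BE name passes through untouched.
print(decode_gdi_style("Arial".encode("utf_16_be"), PLATFORM_MICROSOFT, "utf_16_be"))
```

Decoding the raw bytes as UTF-16BE instead would yield one Latin-1-range character per byte, which is the mojibake this issue is about.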

moi15moi changed the title ~~[Bug] Fail decode namerecord string~~ [Bug] Wrong encoding is been used to decode NameRecord Aug 22, 2022

TheOneric added bug fonts compatibility and removed bug labels Aug 22, 2022

TheOneric changed the title ~~[Bug] Wrong encoding is been used to decode NameRecord~~ Legacy fonts can have a NameRecord not encoded in UTF-16BE Aug 22, 2022

astiob mentioned this issue Jan 26, 2023

Fonts without Microsoft-platform cmaps use Mac NameRecord #679

Open

moi15moi mentioned this issue Feb 26, 2023

Correct cmap decoding, name decoding and wrap style moi15moi/FontCollector#17

Merged

moi15moi mentioned this issue Mar 27, 2023

Fallback decode name moi15moi/FontCollector#21

Merged
