Legacy fonts can have a NameRecord not encoded in UTF-16BE #643
Comments
ICU is waaaay too massive for libass to use, if I’m not mixing anything up. But the issue is, of course, real; thanks for creating a dedicated ticket for it. I already have name decoding code in https://github.com/astiob/libass/tree/debug-fonts. It should probably be adapted into mainline libass.
Ok. I don't have a good knowledge of C.
I don't think it is a good idea to use the Macintosh platform ID. Here is what the Apple documentation mentions:
That means about as much as the Microsoft docs not mentioning Windows 95 quirks that GDI nevertheless emulates to this day. It’s “discouraged” to use non-Unicode fonts at all, but that’s exactly what we’re trying to do here. (And IIRC macOS itself still preferred Macintosh-platform names when I last checked.) That branch is (as the name suggests) meant for debugging font issues, so it dumps all the information it can find in a font. What we actually want (ideally) is what VSFilter’s GDI calls use:
so all Microsoft-platform encodings, as well as Macintosh-platform MacRoman (whatever version of it is implemented in Windows).
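The selection rule described here (every Microsoft-platform encoding, plus Macintosh-platform MacRoman) can be sketched as a small filter. This is only an illustration: `is_gdi_visible_name` is a hypothetical name, not a libass or GDI function, and the numeric IDs are the platform and encoding IDs from the OpenType `name` table specification.

```python
# OpenType `name` table platform IDs.
PLATFORM_MACINTOSH = 1
PLATFORM_MICROSOFT = 3

# Macintosh-platform encoding ID 0 is Roman (MacRoman).
MAC_ENCODING_ROMAN = 0


def is_gdi_visible_name(platform_id: int, encoding_id: int) -> bool:
    """Return True if a name record falls under the rule described above.

    Hypothetical helper: accepts every Microsoft-platform record and
    Macintosh-platform records only when they are MacRoman.
    """
    if platform_id == PLATFORM_MICROSOFT:
        return True
    return platform_id == PLATFORM_MACINTOSH and encoding_id == MAC_ENCODING_ROMAN
```

Under this rule a Microsoft-platform Big5 record (3, 4) is accepted, while a Macintosh-platform Japanese record (1, 1) is not.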
Of course, we don’t currently support MacRoman cmaps, either, and I don’t remember if I’ve ever seen a font that lacked Microsoft-platform data and actually worked in VSFilter. (Zapfino lacks them, and it doesn’t work in VSFilter.) But anyway, just Microsoft-platform names for now would be plenty good, to match our support of Microsoft-platform cmaps.
Here are 2 fonts whose names should not be decoded as UTF-16BE.
I spoke with a Microsoft employee, and he told me that GDI performs this processing:

from fontTools.ttLib.tables._n_a_m_e import NameRecord

# This NameRecord is from 文鼎中特廣告體 - PlatEncID 4.ttf
name_record = NameRecord()
name_record.nameID = 1
name_record.string = b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf\x00S\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9"
name_record.platformID = 3
name_record.platEncID = 4
name_record.langID = 0
# get_name_record_encoding() is a helper that is not shown in this comment;
# it returns the codec name that matches the record's platEncID.
encoding = get_name_record_encoding(name_record)
if name_record.platformID == 3 and encoding != "utf_16_be":
    # Non-Unicode Microsoft names are padded with zero bytes; drop them before decoding.
    name_to_decode = name_record.string.replace(b"\x00", b"")
else:
    name_to_decode = name_record.string
decoded_name = name_to_decode.decode(encoding)
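For readers without the helper at hand, here is a minimal self-contained sketch of the same processing using only the Python standard library. The `MS_ENCODINGS` table and the `decode_ms_name` function are assumptions made for illustration: the codec chosen for each Microsoft-platform encoding ID is a guess at the matching Windows code page, not a mapping confirmed by GDI or by the linked discussion.

```python
# Assumed mapping from Microsoft-platform encoding ID (platEncID) to a Python codec.
MS_ENCODINGS = {
    0: "utf_16_be",   # Symbol
    1: "utf_16_be",   # Unicode BMP
    2: "cp932",       # ShiftJIS
    3: "cp936",       # PRC
    4: "cp950",       # Big5
    5: "cp949",       # Wansung
    6: "johab",       # Johab
    10: "utf_16_be",  # Unicode full repertoire
}


def decode_ms_name(raw: bytes, plat_enc_id: int) -> str:
    """Decode a Microsoft-platform (platformID 3) name string.

    Hypothetical helper mirroring the processing shown above: for a
    non-Unicode encoding, strip the zero padding bytes, then decode
    with the legacy codec; otherwise decode as UTF-16BE.
    """
    encoding = MS_ENCODINGS.get(plat_enc_id, "utf_16_be")
    if encoding != "utf_16_be":
        raw = raw.replace(b"\x00", b"")
    return raw.decode(encoding)


# The family name bytes quoted above (platformID 3, platEncID 4).
raw = (b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf"
       b"\x00S\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9")
print(decode_ms_name(raw, 4))
```

Stripping every zero byte is the simplest reading of the processing above; a stricter implementation might only drop the high byte of each 16-bit unit.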
Currently, libass always decodes the family name as UTF-16BE:
libass/libass/ass_fontselect.c
Lines 283 to 284 in a48c98c
But Microsoft-platform NameRecords don't always use UTF-16BE.
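A quick way to see the problem is to decode the same bytes both ways. The sketch below uses the family name bytes of a font whose record is marked platformID 3, platEncID 4, and assumes that encoding ID 4 corresponds to Python's `big5` codec.

```python
# Family name bytes from a Microsoft-platform record with platEncID 4.
raw = (b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf"
       b"\x00S\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9")

# Decoding as UTF-16BE turns every byte pair into one Latin-1 range
# character, which is mojibake.
as_utf16 = raw.decode("utf_16_be")

# Dropping the zero padding and decoding as Big5 (assumed codec for
# platEncID 4) yields CJK text instead.
as_big5 = raw.replace(b"\x00", b"").decode("big5")

print(repr(as_utf16))
print(as_big5)
```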
To see how libass should decode a NameRecord properly, see: MicrosoftDocs/typography-issues#956 (comment)
Something like this could be added in ass_utils.c:
Finally, to convert the name bytes to UTF-8, libass could use ICU: https://unicode-org.github.io/icu/userguide/conversion/converters.html#1-single-string
PS: To test whether a Big5 NameRecord is decoded properly, download the font in this issue: MicrosoftDocs/typography-issues#956 (comment)