New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
directwrite: fix match_fonts
#503
Conversation
libass/ass_directwrite.c
Outdated
|
||
add_font(font, fontFamily, provider); | ||
UINT32 fontCount = IDWriteFontFamily_GetFontCount(fontFamily); | ||
for (UINT32 fontIdx = 0; fontIdx < fontCount; ++fontIdx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of style nits: libass uses snake_case
instead of camelCase
, so it's better to follow that. Also I think that such long names for variables used in a couple of neighboring lines is overkill: n
is enough for fontCount
and i
for fontIdx
. And AFAIK we've decided to prefer postfix increment.
a81ef4f
to
f5f413d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good, although I haven’t tested. (I can easily test Windows binaries, but building my own is not very convenient.) There’s a naming bikeshed, but I’m not sure how long we should dwell on it.
Note: #502 implies this whole approach isn’t good enough for font managers that dynamically activate fonts, and native GDI calls are needed to cater for them in any case. This isn’t a regression—our DirectWrite provider has never worked with that—but if we do pursue that, then it seems to me that this code is effectively a stopgap.
My remaining worry is: is it possible that DirectWrite’s notion of “family” is too narrow in some situation, and the system has another font with the given name that it fails to count as part of the family and therefore fails to give us?
libass/ass_directwrite.c
Outdated
@@ -677,7 +677,7 @@ static void match_fonts(void *priv, ASS_Library *lib, | |||
{ | |||
ProviderPrivate *provider_priv = (ProviderPrivate *)priv; | |||
IDWriteFont *font = NULL; | |||
IDWriteFontFamily *fontFamily = NULL; | |||
IDWriteFontFamily *fontfamily = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a little naming war here 😄 add_font
spells this as fontFamily
, matching DirectWrite’s GetFontFamily(IDWriteFontFamily **fontFamily)
, so I took the liberty to change it in the first PR to match. @MrSmile wants snake_case
. You seem to like fontfamily
as a single word.
Currently, this file uses a mix of snake_case
and camelCase
in its existing names but seems to spell each individual name consistently. My impression is that it uses camelCase
for formal arguments in function prototypes and for local variables used as actual arguments in function calls, and snake_case
for everything else. This is likely intended to match DirectWrite’s parameter names and parameter naming convention.
Given this, to me, it (still) seems most consistent to name this fontFamily
as it’s used as the actual argument passed as fontFamily
in the add_font
calls. Your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that naming style should be consistent throughout libass, i.e. always snake_case
when possible (perhaps excluding copy-pasted prototypes, but I would have adapted even them). So I think that argument of add_font
should be font_family
(or maybe even simply family
). It's not a target for this PR to fix style of old code, but I think all new variables should follow libass style conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I always believe local consistency trumps global consistency. If this hunk is to introduce font_family
, we’ll end up with two different names for the same thing in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I always believe local consistency trumps global consistency.
In that case style violation once introduced risks to never get dedicated cosmetic PR to fix it. Also such restyle-only commits complicate code history and blaming. I think incremental fixing around new changes is superior. And in this case I believe there are no consistency at all, as even local variables in the same function have named differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses camel case for DirectWrite function arguments to match their declared formal parameter names, and it propagates these camel-case names throughout the code to ensure each name has exactly one spelling throughout the file. It uses snake case for other variables and fields to match libass’s default style.
This may have been a poor choice, but at least it makes some sense to me and is easy to explain and describe. If we start half-changing existing names, it will just be chaos instead.
For fontFamily
especially, just searching for fontFamily
within the file currently immediately brings up the full chain of control it goes through.
Also such restyle-only commits complicate code history and blaming.
When a logic change touches a line anyway, then fixing minor style nits at the same time reduces noise. But with larger changes, I find it to be the exact opposite: when blaming, if I can tell that the last commit to change the line is purely stylistic from the commit message, I can immediately skip over it; but if it’s a commit that mixes logic with style, I must spend time figuring out why that commit touched this line.
In this particular case, I see a change from fontFamily
as increasing entropy, not decreasing it. It would be interesting to hear the opinions of @rcombs and @TheOneric, as well as a word from @Apache553 who used neither option and wrote it as a single word :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there seems to be some regularity to the current naming and to preserve the ability to find all instances of a variable being used (even in other functions it's passed to), sticking to fontFamily
in this commit seems reasonable to me.
Also adjusting all other instances of fontFamily
in this commit would just introduce noise. If we want to change this, it should be done for the entire file and preferably in a separate commit.
For font managers that dynamically load fonts to avoid font installation, the possible cause that some programs using DirectWrite failed to find the font is the system font cache. The system font collection used by libass in previous verison to enumerate installed fonts may be not updated without delay (see msdn). It's like some kind of magic due to the lack of documentation of the font cache. So I don't have much idea. But at least my tools using
As the MSDN page tells:
This family name comes from truetype/opentype name table. So I think DirectWrite's "family" is what we expected. By the way, for some historical reasons, a font family in GDI can only hold up to 4 fonts. So the GDI family name may differ from its real family name. I am not sure whether it will be a problem. But I guess it's ok since we have done font selection by ourselves. |
Yes, but a single font can have many names in its name table :-) I don’t know how exactly DirectWrite groups them. But neither do you, I imagine.
Hmmmm. Are you saying GDI’s font selection can be weird when there are more than four same-family-name fonts available? Our goal in font selection is to emulate GDI’s |
IIRC the family name have its own id. id 1 for gdi-compatible family name, and id 16 for typographic family name (my real family name). DirectWrite provides almost each name according to the id as
Yes. I have once installed a third-party font which shared the family name with "Heiti", which is a basic system font. Then every place using GDI and "Heiti" is showing the newly installed font so that I have to remove it. I think it's an abnormal condition something like the undefined behavior of c++. The behavior was shown long time ago and I cannot try it again since I lost those troublesome fonts. I think there is no need to emulate that until someone complains. |
Yep, in this context I’m primarily worried about localized names (because I can’t really imagine what could go wrong other than localized names). Something stupid like, two fonts that share a Chinese family name but have different English family names. Then again, localized names probably pose a problem in any case, as Windows presumably still matches the English name and the current-system-locale name but not other names. Ultimately, we can’t fully 100% truly always match VSFilter on localized names anyway, as it’s using a GDI call that does exactly that kind of locale-sensitive font selection and we can never know what locale the script author used. At that point, the stupid worrying case rather becomes two fonts that share an English family name and have different Chinese family names.
Doesn’t that seem like merely one font overriding the other in a situation where they can’t be distinguished by any parameter? But in any case, we’ll try to investigate many-font families. |
It seems that Windows matches fonts with all available names. For example, I can call And for those two fonts share a Chinese family name but have different English family names or vice versa, I think it is the script writer's responsibility to choose proper font name used in the script, since those fonts are non-standard and hard to handle programmatically. The behavior of GDI under such condition is not well-known, too. So it will be not easy to emulate that. And the result of |
match_fonts is supposed to add all the fonts with the same family name for fontselect. Commit d325c63 adds only one font, which leads to inability to select (using family name) correct fonts from families with multiple styles.
At any rate, this is an improvement, and there are no substantial complaints, so merging this. Gonna give it a few days in Task to self, or any other interested party with Windows access: create (or find) a bunch of fonts that share family names in various combinations and test how this behaves (and how VSFilter behaves just to be sure). |
Family names, yes. I can find Chinese (with English and two Chinese names) and Japanese fonts (with English and Japanese names) by any family name in GDI/VSFilter and in DirectWrite/libass. Full names, no. In GDI/VSFilter on Russian Windows 7, I can only find bold Arial and Verdana by their Russian full names. Not even by their English full names! More importantly, our latest DirectWrite font provider seems to fail to find fonts by any full name or PostScript name. This is a problem. @Apache553 Any clue as to why this is happening? |
For full names, I have no idea about that. For Postscript names, it's a blind spot since I never use PS names to match fonts (some font contains bad ps names). After some testing, it turned out that DirectWrite would not match any PostScript names. So I think it would be best to use |
I made a patch to this. |
For reference, DirectWrite’s font families are supposedly based on WWS family names (which have an explicit name ID 21, as opposed to GDI [name ID 1] and typographic [name ID 16] family names). But so far I haven’t been able to verify this with a font actually equipped with explicit WWS names. Weirdly though, testing a random copy of Minion Pro Caption (which is used as an example on MSDN to explain WWS names) that does not have explicit WWS names and whose OS/2 table version predates the introduction of WWS names, DirectWrite on Windows 7 is somehow able to tell that it’s not part of “Minion Pro” even though its typographic family name says it is. Font browser shows it as “Minion Pro Caption ”. Opening the font shows “Minion Pro Capt” (its GDI family name). It’s worth noting that none of this font’s Microsoft-platform names are “Minion Pro Caption”, but it can be obtained either as the Macintosh-platform full name (doubtful) or as typographic family + typographic subfamily (more likely… but it gets an extra Regular!). So it seems there’s some kind of WWS name synthesis logic built in. |
As for the patch, FYI: the reason |
To be fair, it probably makes sense! In fact,
Going back to localized names for a moment, Microsoft seems to contradict itself in its
|
Turns out it doesn’t find those fonts that are in .ttc font collection files. Great. |
I suspect that it’s Aegisub’s fault Moving on. |
We need to clarify what name is supposed to appear in the script files(ass/ssa). After checking all script files I have (ass), the I also have done some testing with both As the doc of LOGFONT says: the |
You must be testing TrueType fonts. CreateFontIndirect/VSFilter finds them by family or full name, and not by PostScript name. But OpenType fonts with CFF (PostScript) outlines are found by family or PostScript name, and not by full name. Windows does not seem to come with any such fonts, at least Windows 7 (I can’t check 10 right now), so you may need to find a font elsewhere to test. For example, Source Sans Pro is a free OpenType/CFF font family. (See also:
It’s useless to declare what is “supposed” to appear. People can write and have written anything, and we just need to support exactly what VSFilter supports. Besides, PostScript and full names—when supported—are more exact than family names and hence provide authors better control over the font choice, so it makes sense to advise authors to prefer them. |
I’ve written up all that I know so far (I hope) about the behaviour of the various APIs on https://github.com/libass/libass/wiki/Fonts-across-platforms. Back to the proposed patch: for what it’s worth, it’s basically the same as #502. Which is cool (if a little ironic), but #502 says there are some issues, so we should investigate them. Let’s continue that discussion in #502, I think. But in any case, for maximum foolproof compatibility, it seems we should do
|
You are right, I didn't realize the differences between how GDI handle TrueType and OpenType CFF fonts.
So what we should implement now is:
Do I understand correctly? |
Yes… And now that I think about it, based on the Chrome patch linked in the other PR, we should most likely not use Somewhat unrelatedly, I want to keep DirectWrite family enumeration for #513, although it’s a QOL improvement rather than a compatibility fix. |
There is a way to get raw font data from a single However, if we don't get a |
Right; of course. I continued to read after posting that comment and concluded that the same code should work, but now I’m wondering about Type 1 fonts, which have multiple files.
We do that already in
So much for Microsoft’s spec saying, at the end of this section, that all Microsoft-platform names must be encoded in UTF-16. Welp.
WTF! I’ve obtained a copy and can confirm that’s what it has and it does work in WIndows 7. |
OK now I have read some codes from leaked Windows XP SP1 source. It seems that Windows does following to get names (using freetype names & contants, assuming
|
Oh, I had written a comment but didn’t post it! Here it is:
So it turns out: Microsoft reads Chinese and Wansung (not sure about Johab) Microsoft-platform names as sequences of uint16 (just like all other Microsoft-platform names), but before decoding them using the corresponding code page, it drops any leading zero byte from each uint16. And this (probably) doesn’t even apply to all names. The rest are just UTF-16BE as usual, and so are Japanese names. Funnily enough, while the correct name shows up in the font explorer and the font installation progress popup, when I open the font file, the font viewer shows incorrectly-decoded-as-UTF-16 Macintosh-platform names (mojibake), not Microsoft-platform names at all. Both GDI/VSFilter and DirectWrite/libass do find the font by its correct Microsoft-platform name, though. However, at least for this particular font, finding this font doesn’t help libass render it! libass is unable to map Chinese characters to glyphs in this font, so only replacement-character boxes show up.
I’m not so sure. Win2003 apparently fails the entire font decoding if this fails. Testing 微软繁标宋 with manually tweaked bytes on Windows 7, I haven’t been able to fail the GBK conversion at all. In fact, I’ve been able to make Windows do a buffer overread :-) |
Well, I didn't do many tests on that. But this routine really helped me on reading names for my own project.
According to the code, it seems that Windows restarts the whole decoding as UTF16BE if any decode procedure fails. |
XP code. Win2003 code is different. |
Oh I realized that's different. I downloaded Win2003 code then found that font loading will fail if a required name decoding fails instead of restarting decoding as UTF16BE. That's the point! |
So… testing on one of these days showed that So it seems that, for a truly compatible lookup, our only option is to expand |
By the way, going back to Chinese/Korean font name encodings: to actually handle those fonts, we’d also need to learn to use Chinese/Korean non-Unicode character maps. That is, to convert Unicode code points to those character sets (and the other Windows Asian character sets for good measure). This sounds somewhat painful. |
I have some good news, some bad news and some weird news.
Turns out that I got fooled by another Aegisub bug. I’ve now stopped using Aegisub for testing font lookup.
Unfortunately, it also turns out WinRT/UWP/Microsoft Store apparently forbids/lacks GDI APIs. Luckily, DirectWrite in Windows 10 (all versions) actually includes a new Also, Microsoft’s DirectWrite change log claims it supports fonts loaded on demand by font managers in Windows 10. I’ll poke some people to try and understand better what the actual situation is. Finally, and oddly, DirectWrite in current Windows 10 also finds fonts by full name (any localization) in its Putting it all together, we could:
|
match_fonts
is supposed to add all the fonts with the same family name for fontselect. The previous PR adds only one font, which leads to font with multiple styles cannot be correctly selected.