Investigate unexpected vsfilter behavior with \h on certain font #706

Open
rcombs opened this issue Sep 10, 2023 · 15 comments

@rcombs
Member

rcombs commented Sep 10, 2023

Sample script + font:
GosmickSample.zip

Lines in question:

Dialogue: 0,0:16:58.62,0:17:04.12,Song Insert Romaji,,0,0,0,,{\fad(200,200)\k33}A{\k22}ko{\k34}ga{\k35}re {\k32}no {\k23}se{\k14}ri{\k20}fu {\k68}o {\k51}"\h\h\h" (kak{\k39}ko) {\k34}ni {\k37}i{\k17}re{\k20}te {\k17}mi{\k54}you
Dialogue: 0,0:16:58.62,0:17:04.12,Song Insert TL,,0,0,0,,{\fad(200,200)}Let's try putting aspirational words in the "\h\h\h"

Reported MPC/vsfilter output:
[image: screenshot of the reported MPC/vsfilter output]

So, uh, it seems to be rendering \h (i.e. U+00A0/nbsp) with the glyph “a”, which is in the .notdef glyph slot? What?

The libass behavior here is sane (we render whitespace as expected), and I haven't seen any indication that anybody has deliberately relied on this behavior, so this might not be worth changing, but I figure it's at least worth understanding and documenting.

@frozenpandaman

Following since this is my screenshot + the error I ran into. Thanks for making the issue!

@TheOneric

This comment was marked as outdated.

@astiob
Member

astiob commented Sep 10, 2023

The Encoding is 1 in the sample archive. A space is displayed when the font is not installed, but indeed “aaa” is shown in XySubFilter when it is. I haven’t found an explanation yet.

@TheOneric
Member

Apologies, I somehow missed that the full script and not just the font is attached.

@astiob
Member

astiob commented Sep 10, 2023

Fallback works normally (for \h, Cyrillic, hiragana, and code points in the Latin Extended-A block that don’t have dedicated glyphs in the font) in runs rendered by Uniscribe (e. g. if hiragana is included), but absolutely everything (\h, Cyrillic and Latin alike) is displayed as .notdef in runs rendered by GDI.

@astiob
Member

astiob commented Sep 10, 2023

Another anomaly: I wanted to double-check that it’s using Uniscribe by verifying that I see kerning, but it doesn’t seem to be applying any kerning at all! But it should, per #237… According to fontTools’ TTX, the font has a version 0 kern table with a single format 0 subtable with coverage 1 (“horizontal”), which Microsoft says is supported on Windows 🤯
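For readers unfamiliar with the kern layout, here is a minimal sketch of what “version 0, one format 0 subtable, coverage 1” means at the byte level. It parses a synthetic table built in the same snippet, not the font attached to this issue; the glyph IDs and kerning value are made up for illustration, and only the version 0 header and format 0 subtables are handled.

```python
import struct

def parse_kern_v0(data: bytes):
    """Parse a version 0 'kern' table and return a list of subtable dicts."""
    version, n_tables = struct.unpack_from(">HH", data, 0)
    if version != 0:
        raise ValueError("only the version 0 header is handled in this sketch")
    offset = 4
    subtables = []
    for _ in range(n_tables):
        # Subtable header: version, length (including this header), coverage.
        _sub_version, length, coverage = struct.unpack_from(">HHH", data, offset)
        info = {
            "horizontal": bool(coverage & 0x0001),  # bit 0 of coverage
            "format": coverage >> 8,                # high byte of coverage
            "pairs": {},
        }
        if info["format"] == 0:
            (n_pairs,) = struct.unpack_from(">H", data, offset + 6)
            # Skip nPairs, searchRange, entrySelector, rangeShift (8 bytes).
            pair_offset = offset + 14
            for i in range(n_pairs):
                left, right, value = struct.unpack_from(">HHh", data, pair_offset + 6 * i)
                info["pairs"][(left, right)] = value
        subtables.append(info)
        offset += length
    return subtables

def build_sample() -> bytes:
    """Build a synthetic table: one horizontal format 0 subtable, one pair."""
    pairs = struct.pack(">HHh", 58, 36, -120)        # made-up glyph IDs and value
    body = struct.pack(">HHHH", 1, 6, 0, 0) + pairs  # nPairs, searchRange, entrySelector, rangeShift
    sub = struct.pack(">HHH", 0, 6 + len(body), 0x0001) + body
    return struct.pack(">HH", 0, 1) + sub

print(parse_kern_v0(build_sample()))
```

A coverage value of 1 therefore decodes to “horizontal, format 0”, which is the simplest kind of kern subtable.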

@astiob
Member

astiob commented Sep 10, 2023

Resaving from FontForge with the kerning additionally saved to GPOS confirms that kerning works and thus those are, indeed, Uniscribe runs. I’m surprised that kern wasn’t enough, but it’s probably irrelevant to this issue; I’ll just make note of this observation in #237.

@astiob
Member

astiob commented Sep 10, 2023

I’ve spent this hour transplanting bits and pieces from the font in #42 to the font here in hopes of finding which particular piece is preventing fallback/substitution, but nothing has helped so far :-(

@astiob
Member

astiob commented Nov 8, 2023

Finally managed to get the font to let \h fall back to another font:

It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯

I truly have no idea why this is. At first glance it might seem related to the quotation marks surrounding the \h in the ASS, but the behaviour (fallback or no fallback) stays the same even if I remove the quotation marks from the ASS.

@astiob
Member

astiob commented Nov 8, 2023

My gut feeling is that perhaps GDI probes some hardcoded list of code points (which includes U+0022) and if it sees glyphs for all of them, it blindly assumes the font supports a whole subset of Unicode (which includes U+00A0) and skips font fallback for any code points in that subset. And it’s weird, because like, don’t ASCII-only fonts exist? Why would you assume that the non-ASCII NBSP is supported?

In fact, I’m able to produce similar behaviour by removing all code points except the quotation mark and the Latin letters:

  • quotation mark, all uppercase & lowercase letters present: GDI uses this font and renders everything else (spaces and apostrophes) as .notdef
  • quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
  • further remove Y: same as above
  • remove X instead (even if Y is kept): GDI rejects this font entirely and renders everything in Arial

@astiob
Member

astiob commented Nov 8, 2023

  • quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
  • further remove Y: same as above

And now I can’t reproduce this! It worked just a few minutes ago! Now it’s rejecting the font and using Arial instead. I’ve been restarting MPC-HC between tests to clear any possible per-process caches, but maybe that’s not enough? Or did I make a mistake somewhere?

Meanwhile, a font with full sets of both uppercase and lowercase letters but without the quotation mark (or any other code points) is accepted, and \h uses a substitute font.

@astiob
Member

astiob commented Feb 22, 2024

Finally managed to get the font to let \h fall back to another font:

It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯

As noted in #237 (comment), U+0022 is actually among the four magical-constant code points that cause GDI to switch to Uniscribe if the font lacks glyphs for them. So the fallback works in this case because the string is rendered by Uniscribe, just as in the other Uniscribe cases in the earlier test above.
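The renderer selection described so far can be sketched roughly as follows. This is a toy model of the observed behaviour, not GDI’s actual code: the function names are invented, only U+0022 is named in this thread so the other three probe code points are left as a parameter, and the complex-script check covers only the kana blocks because those are the only “complex” characters tested here.

```python
QUOTATION_MARK = 0x22  # the one probe code point named in this thread

def is_complex_script(cp: int) -> bool:
    """Toy check: only hiragana and katakana count as complex here.

    The real list of complex scripts is much larger; this is an assumption
    limited to what was tested in this issue.
    """
    return 0x3040 <= cp <= 0x30FF

def choose_renderer(text, font_cmap, probe_code_points=frozenset({QUOTATION_MARK})):
    """Return "uniscribe" or "gdi" for a run, per the behaviour observed above.

    font_cmap is the set of code points the font maps to glyphs.
    probe_code_points stands in for the four magical constants; only
    U+0022 is filled in, the rest are not listed in this thread.
    """
    if any(is_complex_script(ord(ch)) for ch in text):
        return "uniscribe"
    if any(cp not in font_cmap for cp in probe_code_points):
        return "uniscribe"
    return "gdi"

full_ascii = set(range(0x20, 0x7F))
without_quote = full_ascii - {QUOTATION_MARK}

print(choose_renderer('"\u00a0\u00a0\u00a0"', full_ascii))     # font has U+0022
print(choose_renderer('"\u00a0\u00a0\u00a0"', without_quote))  # font lacks U+0022
print(choose_renderer("\u00a0\u00a0\u00a0", without_quote))    # no quotes in the text
```

Note that the check looks at the font’s cmap, not at the text, which matches the observation that removing the quotation marks from the ASS changes nothing.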

@astiob
Member

astiob commented Mar 1, 2024

Discovery of the century incoming…
You know what, I feel silly. This is so simple and stupid; how did we spend years not realizing this?

GDI doesn’t do inline font fallback. At all.

GDI uses font linking—and nothing more.

The Uniscribe-based code path does inline font fallback, and its fallback choice can differ from GDI’s font linking.
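Read as a glyph-lookup model, the three statements above amount to roughly the following sketch. It is a simplification under stated assumptions: fonts are reduced to a name plus a set of covered code points, the coverage sets are abbreviated from the targeted test described below, and `HypotheticalFallback` is a stand-in name because the thread does not say which font Uniscribe actually picks.

```python
NOTDEF = ".notdef"

def resolve_font(cp, renderer, primary, linked=(), fallback=None):
    """Return the name of the font expected to supply the glyph for cp.

    primary and fallback are (name, covered_code_points) tuples;
    linked is a sequence of such tuples in font-linking order.
    """
    name, covered = primary
    if cp in covered:
        return name
    if renderer == "gdi":
        # GDI path: font linking only, no inline fallback.
        for linked_name, linked_covered in linked:
            if cp in linked_covered:
                return linked_name
        return NOTDEF
    # Uniscribe path: inline fallback, independent of font linking.
    if fallback is not None and cp in fallback[1]:
        return fallback[0]
    return NOTDEF

BASIC = {ord(c) for c in "WAT."}
ARIAL = ("Arial", BASIC)                       # lacks U+2025, no linked fonts
TAHOMA = ("Tahoma", BASIC | {0x2025})          # has U+2025, lacks U+2196
MS_UI_GOTHIC = ("MS UI Gothic", {0x2196})      # first font linked from Tahoma
FALLBACK = ("HypotheticalFallback", {0x2025, 0x2196})  # stand-in, not a real font name

for cp in (0x2025, 0x2196):
    print(hex(cp),
          resolve_font(cp, "gdi", ARIAL, (), FALLBACK),
          resolve_font(cp, "uniscribe", ARIAL, (), FALLBACK),
          resolve_font(cp, "gdi", TAHOMA, [MS_UI_GOTHIC], FALLBACK),
          resolve_font(cp, "uniscribe", TAHOMA, [MS_UI_GOTHIC], FALLBACK))
```

In this model a font with no linked fonts can only ever produce .notdef for missing glyphs on the GDI path, which is the behaviour originally reported for \h.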

My proof:

  • Microsoft’s docs (which gave me the idea):

    • One page mentions:

      ExtTextOut will use Uniscribe when necessary resulting in font fallback. The ETO_IGNORELANGUAGE flag will inhibit this behavior and should not be passed.

      implying that without Uniscribe, font fallback doesn’t happen.

    • Another page mentions font linking together with GDI, whereas for font fallback, it mentions .NET and Uniscribe but not GDI.

      It also mentions:

      Font linking […] can be used [to] prevent […] text from being displayed as a default glyph (called tofu).

      implying that without font linking, undefined glyphs will indeed be displayed as tofu, not fall back to other fonts.

      (It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.)

  • Targeted test (after reading those docs):

    • Arial has no font linking defined on my machine. Arial has a surprisingly wide glyph coverage, but it lacks a glyph for U+2025 TWO DOT LEADER in the General Punctuation block and it lacks Japanese glyphs. Arial has GPOS kerning, which makes it easy to tell when Uniscribe is activated by including the heavily-kerned string “WAT.” in the test. As this test reaffirms, General Punctuation isn’t treated as a “complex script”, but Japanese kana is.

    • Tahoma has a long list of linked fonts on my machine, the first of them being MS UI Gothic. Tahoma also has a surprisingly wide glyph coverage, and it does have a glyph for U+2025 TWO DOT LEADER, but it lacks a glyph for U+2196 NORTH WEST ARROW in the Arrows block. On the other hand, MS UI Gothic does have a glyph for that arrow. The Arrows block isn’t treated as a “complex script”. Tahoma doesn’t have kerning outside of Arabic.

    This ASS:

    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial アリアル ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma アリアル ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma/MS UI Gothic ‥{\fs79.53398\fnMS UI Gothic}↖{\fs96\fnTahoma}WAT.
    

    (where the \fs for MS UI Gothic is calculated from Tahoma’s \fs and both fonts’ metrics in such a manner that the em size in pixels stays constant, as documented for font linking)

    displays:

    [image: a screenshot of the above ASS]

    As we can see, kerning is applied in the bottommost line (with the Japanese) but not the one above it (without the Japanese), so one is rendered by Uniscribe and the other by GDI itself. The two glyphs absent from Arial are shown in the Uniscribe rendering but replaced by tofu in the GDI rendering. In the Tahoma lines, all glyphs are visible but the arrow uses different glyphs in Uniscribe and in GDI. GDI’s glyph exactly matches the explicitly-requested MS UI Gothic glyph, so it must be that GDI is applying font linking whereas Uniscribe is applying font fallback that isn’t based on font linking.

@astiob
Member

astiob commented Mar 2, 2024

It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.

Turns out Uniscribe proper actually allows enabling and disabling both font linking and font fallback, but GDI’s various entry points configure it differently. TextOut (which VSFilter uses) enables font fallback, whereas GetCharacterPlacement (which VSFilter doesn’t use) doesn’t. My first thought was that perhaps this text means that if Uniscribe is called with both flags set, then linking indeed takes precedence over fallback, but some reverse engineering suggests that all of these paths enable linking, and yet it isn’t happening.

@astiob
Member

astiob commented Mar 2, 2024

Uniscribe proper actually allows enabling and disabling both font linking and font fallback

Well, further tests suggest that while the API allows this, SSA_LINK actually does nothing. So we’re back to what I said earlier: modern Uniscribe doesn’t do font linking. I’m not sure whether it’s me or Microsoft* that is doing something wrong, but this is the effective behaviour I’m seeing.

* Wouldn’t be a huge surprise, seeing as e. g. GetCharacterPlacement actually completely ignores GCP_USEKERNING and always applies Uniscribe’s kerning except in ancient Windows without an extended language support pack**. This thread suggests SSA_LINK did do something 18 years ago, so it could easily be that Microsoft has changed something and rendered it ineffective since then.

** Yes, this whole GDI/Uniscribe business depends on whether the language support pack is installed. That is, the delegation to Uniscribe only happens when the pack is installed. Fun. I haven’t been able to find conclusively what this pack is, but my guess is it’s one of (or perhaps a DLL included in both of) the optional installs available via checkboxes in old Regional Settings (for those who don’t know: [1], [2], [3 with pictures]). My hopeful understanding is that it’s been included unconditionally since Vista, but it was still optional in XP. And XP was still popular in the early softsub era. Welp.
