support "dist" feature for indic fonts in KernFeatureWriter #176
Comments
it would be nice to have this PR cleaned-up/merged before extending the feature writers #156 |
Are there actually any applications that support Indic scripts but turn off kern feature by default? |
I haven't tried, but I guess it's the usual suspects.. MS Word and the like. |
Yes. kern feature is not part of OT Indic specs. |
Do we want to extend the functionality of the existing KernFeatureWriter so that whenever it finds any "indic" glyphs in the font it splits the kerning.plist data into two sets, one to build the "dist" feature and the rest to build the normal "kern" feature? Or do we want to devise some way for clients (fontmake) to override the default choice of KernFeatureWriter and have ufo2ft use a DistFeatureWriter implementation that does both kern and dist? |
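The first option could be sketched roughly as below. This is only an illustration of the idea, not ufo2ft's implementation: `split_kerning`, the `glyph_scripts` mapping and `DIST_SCRIPTS` are hypothetical names, the two script codes are placeholders rather than an agreed list, and sending a pair to "dist" when at least one side is in such a script is an assumption.

```python
# Hypothetical sketch: split kerning pairs into "kern" and "dist" sets.
# DIST_SCRIPTS is an illustrative placeholder, not an agreed-upon list.
DIST_SCRIPTS = frozenset({"Knda", "Telu"})


def split_kerning(kerning, glyph_scripts, dist_scripts=DIST_SCRIPTS):
    """Return (kern_pairs, dist_pairs) from a {(left, right): value} dict.

    A pair goes to "dist" if at least one of its glyphs belongs to one of
    dist_scripts (an assumed policy); glyphs missing from glyph_scripts
    are treated as having no script.
    """
    kern_pairs, dist_pairs = {}, {}
    for (left, right), value in kerning.items():
        scripts = {glyph_scripts.get(left), glyph_scripts.get(right)}
        target = dist_pairs if scripts & dist_scripts else kern_pairs
        target[(left, right)] = value
    return kern_pairs, dist_pairs
```

The hard part is not the split itself but filling in `glyph_scripts`, especially for glyphs without a codepoint.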
For the question above, I prefer option 1) a single KernFeatureWriter that does both "kern" and "dist" by default. Another big problem is how do we define the "Indic-ness" of a glyph in a kerning pair? There are actually two related problems here:
I looked at the "Script-specific development" section on the Microsoft website, and collected all the script tags where "dist" is mentioned among the recommended features: Please tell me if this is correct or if you would like to add or remove from it:
Any comments or suggestions would be appreciated |
Eventually there will also be the Indic-3 tags so I suppose you could add those, but not a big deal at the moment. I don't know about Buginese, Javanese or Myanmar. At least for Noto we aren't using dist. Khmer also doesn't necessarily need it. I've used it just for adjustments, but the main kerning is still in kern. @ohbendy What do you think? Do you expect dist for Myanmar or Khmer or anything else? @schriftgestalt Which scripts does Glyphs apply dist to? /cc @kalapi |
I'm no expert in the engineering side, but dist is very helpful in the Southeast Asian scripts. I consider kerning as more to do with evening out the gaps, for improving the appearance of a glyph sequence, while dist is more about composing clusters so that components are positioned correctly which would otherwise not be correct or readable. We've used dist extensively in our latest Burmese fonts. I can also imagine it being used in Thai or Lao to shift clusters starting with โใไ/ໂໃໄ that follow clusters with above-marks. I gather in scripts like Telugu, where marks need to be anchored to a base, but also need to have an advance width for their post-base part, the dist feature is used. I think it's useful to consider and implement these kinds of adjustments in a different way than normal base-to-base kerning. @NorbertLindenberg would be able to expand. |
Thanks Zachary and Ben.
we can add these later once they are fully spec'ed.
For the scripts you mentioned, I believe that the corresponding Noto fonts all use MTI feature files instead of Adobe's FEA, so they shouldn't be affected by this.
I see. So does that mean one would like to be able to decide which part of the kerning pairs should be written out as "kern" feature, and which other part should compose the "dist" feature, instead of everything in either one or the other depending on the script tags?
I could also add Thai and Lao to the list. Myanmar and Telugu are already present. About the problem of how to split the data contained in groups.plist and kerning.plist into kern- vs dist-related. Or, yet another way could be to check for the presence of the script name (or 4-letter ISO 15924 code or OpenType script tag) in the kerning groups' names, and use that as a way to split up the kern vs dist "kerning". Of course any approach that relies on kerning classes' naming conventions can't fully account for the glyph-to-glyph kerning pairs (unless these are "exceptions" to class-based kerning). For these we would have to guess from the glyphs' codepoints or glyph names (or again, the lib key with script property). |
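A minimal sketch of the group-name idea, assuming the script tag appears as a separate token in the group name (delimited by ".", "_" or "-"). That naming convention and the `group_script_tag` helper are hypothetical; only the `public.kern1.` / `public.kern2.` prefixes come from the UFO 3 format.

```python
import re

# UFO 3 kerning group prefixes (first and second side of a pair).
KERN_PREFIXES = ("public.kern1.", "public.kern2.")


def group_script_tag(group_name, script_tags):
    """Return the first script tag found as a token in the group name.

    script_tags is a set of lowercase tags to look for. Returns None when
    no token matches, in which case the caller has to fall back to
    codepoints or glyph names.
    """
    for prefix in KERN_PREFIXES:
        if group_name.startswith(prefix):
            group_name = group_name[len(prefix):]
            break
    # Assumed convention: tokens are separated by ".", "_" or "-".
    for token in re.split(r"[._\-]", group_name.lower()):
        if token in script_tags:
            return token
    return None
```

As noted above, this says nothing about glyph-to-glyph pairs, which never pass through a group name.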
(Can’t but wonder if it is time to have a more rich kerning model in UFO that does not lump everything in one big table and is not limited by what some ancient versions of FontLab were capable of doing). |
@khaledhosny Any specific proposal you have in mind? |
From a usage point of view, I tend to think of kern as improving the spacing between base glyphs, while dist is primarily used to avoid collisions between or with above- or below-base marks, which in Brahmic scripts are often wider than the bases they sit above or below. Other people may use them differently. The rules in dist features are very often contextual and only look at selected marks (e.g., only above-base or only below-base) – I don’t know whether those can be generated automatically.
The list of script tags is missing all the ones listed in the Universal Shaping Engine specification: The list is correct in omitting Lao and Thai (and pre-Windows 10 Tibetan) because their specification doesn’t have dist. |
I'm even more confused now. What Norbert said about kern being about improving spacing between base glyphs vs dist about avoiding collision between or with marks, and often contextual, does not fit with the way (as I understood it) Glyphs.app auto-generates the dist feature for Indic scripts. Glyphs.app automatically creates (or appends) a "dist" feature whenever there are kerning pairs between glyphs that are classified as "Indic" (according to a list of scripts which I still haven't completely figured out). Whereas any other kerned glyphs are included in the normal "kern" feature. As simple as that. (The main reason I'd like to extend the kern feature writer in ufo2ft is to be able to match Glyphs.app behaviour, as some Indic Noto fonts have been engineered using FEA (instead of MTI features) inside Glyphs.app, and they rely on these Glyphs.app-specific automatic features) I don't have much expertise in making fonts for Indic scripts, that's why I'm asking for help. If Glyphs.app users and Indic fonts experts are happy with the way it auto-generates the dist feature for them, then this would be an argument for matching this behavior in our fontmake pipeline.
Does that mean I should consider all the scripts listed at the end of that page for this auto-generation of a dist feature (as defined above, as alternative to the regular kern feature)? (I see stuff like N'ko in there.. Is that considered an "Indic" script?) |
Also, I don't understand what @behdad means by
If I look at, for example, https://www.microsoft.com/typography/OpenTypeDev/kannada/intro.htm, I see that "kern" is listed among the positioning features alongside "dist", "abvm" and "blwm". Both "kern" and "dist" are characterized in those documents as intended to "adjust distances", the only difference is that "dist"
|
As you say, base-to-base spacing adjustments can be handled easily by just kerning, which can be done graphically in Glyphs now. As Norbert says, marks often need to trigger spacing adjustments, and these are likely to be contextual because different bases and marks are usually different shapes/widths. Currently Glyphs does not offer a way to preview other spacing adjustments, or write a dist feature for those, but things are often being updated so I wouldn't necessarily think it's a meaningful criterion (I mean, yes, we write them manually now, but that seems like something likely to be improved). I'm not sure of the reason why Glyphs would put all kerning (including base-to-base pairs) in dist for all Indic scripts, it's not the way I would do things, especially considering your observation that Kannada does include kern. I would not be happy for all kerning in a Burmese font to be moved into the dist feature behind the scenes.
The specifications are not always very comprehensive...does it mean that dist will not be activated in those scripts even if a font has the feature? If so I think I'd like to suggest to the specification-writer (who is it?) that it should be possible (though not essential) to use dist in Thai and Lao. |
I think there are two issues here:
The reason I started this whole issue was to get the same output that we get for some Indic scripts (like Kannada, Gurmukhi, Gujarati) when exporting from Glyphs. The use of kern vs dist in many situations can be personal preference. For a simple PairPos lookup it hardly matters which feature it's under. When looking at the 1st point the theory of what kern or dist are intended for or how somebody might use them is irrelevant. We want to match Glyphs (whether Glyphs is doing it correctly or not is another topic, but there's a lot of feedback directly to Georg and on the forums so I think it's looking pretty good in general). The reason for this issue and the reason Glyphs creates a dist feature is because for Indic scripts dist is expected to be on by default in apps like MS Word. kern, on the other hand, needs to be activated. For the general user who doesn't know these things they expect to open up an application and see text rendered correctly. It's pretty well known that dist is on by default and therefore desirable to use for adjustments required for correct shaping. I can give plenty of examples where dist is used in various scripts. Related to the 2nd point above, one thing I do have a huge problem with is the automatic generation that Glyphs does. Intent cannot be automated. There should be more control over how the user decides to use these features. This is the main reason I have never used Glyphs generated OpenType tables. I write my own because I don't like how Glyphs autogenerates everything. If I want kern for this and dist for that then I decide. The same goes for more complex tables like chained contextual lookups for dist. For some users the default is fine, though.
Norbert is absolutely right about the general use of dist. Using it instead of kern for Indic scripts is kind of an exception to that rule because dist is on by default and is necessary for correct shaping.
I don't think USE shaped scripts are necessarily relevant here except for the Indic 3 script tags, but if Glyphs is also generating dist for the others then I think we should do the same here. Again, my main goal here is matching Glyphs. If a designer works in Glyphs and generates with Glyphs and tests and then delivers and the pipeline outputs a totally different font then I'd say that's an issue with the pipeline. However, I 100% agree with the idea of not tying this so tightly to Glyphs because using Glyphs is not at all a requirement for using fontmake or ufo2ft. |
@adrientetar, off the top of my head: ability to control feature tags, how kerning is split into lookups, their order, language systems, and so on. Right now you either have nearly no control over how the kern feature will be written, or abandon UFO kerning completely and write feature code manually (and keep it up to date as glyphs are added, removed, or renamed). |
You are right. Maybe they were added later. I remember them not being there. What was the original bug report? Did anyone observe something not working, or just our output being different from Glyphs? |
Let's just say that the Microsoft specifications aren't really very accurate ;) |
However, I've also been thinking about an alternative way to assign all these Unicode character properties such as script, category, bidi type, etc. to every glyph in a UFO, especially those without a Unicode codepoint. The Glyphs.app way is to define a global database which maps predefined glyph names (including alternative equivalent names) to a set of Unicode character properties; if a font uses these "nice" names, the latter are looked up automatically in there; in addition, the user can override some of these on an individual glyph basis, and the overrides are stored in the .glyphs source file, under respective glyph properties like "script", "category", etc. However, these are properties of Unicode characters, not of glyphs as such. They are only assigned to glyphs indirectly because of their association with some Unicode character, either via their What I'm thinking is, instead of storing each of these properties separately as key/value pairs in a UFO GLIF's What do you guys think? Am I missing something? Can you think of a case where one would like to assign any of these Unicode properties to a glyph independently without also associating the glyph to a Unicode character? I can't. |
(perhaps one may argue that even that is redundant because technically it could be derived from the substitution rules defined in the |
@anthrotype the only objection that comes to mind is that the Unicode category for some characters is not necessarily "correct" for how OpenType deals with that character. So pointing a glyph to its associated character as represented in the Unicode Character Database will not give the desired category. Mostly this has to do with the various types of marks like non-spacing, spacing, spacing combining. That's usually where things can go wrong. For example, a character defined as a Spacing Mark or Spacing Combining mark might need to be classified as a Letter to allow spacing. I think in general your idea will work, especially for script, bidi etc, but category/subcategory is where I've always run into problems because the GDEF only understands 1: base, 2: ligature, 3: mark and the Unicode database has way more categories that don't always fit nicely into that. |
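To make the objection concrete, here is a small sketch of deriving a GDEF class from the Unicode general category, with a per-glyph override for the cases where the Unicode answer is not the desired one. `gdef_class` and its `override` argument are hypothetical, and treating every non-mark as a base is a deliberate simplification.

```python
import unicodedata

# GDEF glyph class values named in the comment above.
BASE, LIGATURE, MARK = 1, 2, 3


def gdef_class(codepoint, override=None):
    """Guess a GDEF glyph class from the Unicode general category.

    `override` stands in for a user-supplied per-glyph value and wins
    whenever it is given.
    """
    if override is not None:
        return override
    category = unicodedata.category(chr(codepoint))
    # Mn, Mc and Me all start with "M"; everything else is a base here.
    return MARK if category.startswith("M") else BASE
```

For example, U+093E DEVANAGARI VOWEL SIGN AA has category Mc, so `gdef_class(0x093E)` returns `MARK`, while `gdef_class(0x093E, override=BASE)` lets the designer treat it as a spacing base.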
I'm not certain whether I'm understanding correctly so please disregard if I've got the wrong end of the stick. Certain Unicode characters in Burmese are marks by default, but glyph alternates are spacing letters. So the spacing/letter alternate glyph would need to have different properties from the nonspacing/mark default character. And if a glyph is a ligature of a letter and a mark, which category should it inherit? That needs to be up to the user, sometimes depending on the design. |
@ohbendy You are understanding correctly. We are basically saying the same thing: it can't all be automated or guessed. |
Thanks guys. I had not considered that. So I guess we do need to allow users to override category/subCategory on a glyph basis, even when a Unicode codepoint is assigned. But I feel a bit uneasy with allowing users to override properties like script or bidirectionality for glyphs that do have a Unicode codepoint, and cannot be understood in a different way. One shouldn't be able to say this glyph with codepoint 0061 has script "Arabic"... |
also "garbage in, garbage out". If that glyph with codepoint 0061 has a wrong script name, it will end up in the wrong lookup, but then the user is to blame, not the tool. |
I think this issue is now fixed by #255. |
Some Noto fonts for Indic scripts don't have explicit MTI sources for the OpenType features, but have instead self-contained *.glyphs sources with FEA code, and these rely on a Glyphs.app feature that splits the kerning data at export time between a regular "kern" feature, and a "dist" feature only for kern pairs between Indic glyphs.
googlefonts/glyphsLib#223
Indic scripts expect "dist" because the latter is usually on by default, whereas "kern" isn't.
I think ufo2ft's KernFeatureWriter should do this by default as well: i.e. emit not just "kern", but also "dist" for Indic glyphs, if any.
The argument that this would be a Glyphs.app-only feature may be true, but:
Whether we extend the existing kern feature writer or add an additional dist feature writer subclass is not so important, as long as we agree this is needed here.
Georg said he has a heuristic to determine what's Indic or not, probably by splitting ligatures at "_" and dropping "." suffixes. We could do a similar thing (I think we already do to determine whether a kern pair is left-to-right or right-to-left).
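A guess at what such a heuristic might look like; Georg's actual implementation is not shown in this thread, and `base_components` is a hypothetical helper. Each returned component could then be looked up by name or codepoint to find its script.

```python
def base_components(glyph_name):
    """Split a glyph name into ligature components, ignoring any suffix.

    The part after the first "." is dropped and the rest is split at "_".
    Names that start with "." (such as ".notdef") are returned unchanged
    as a single component.
    """
    if glyph_name.startswith("."):
        return [glyph_name]
    base = glyph_name.split(".", 1)[0]
    # Drop empty pieces so a leading or doubled "_" does not yield "".
    return [part for part in base.split("_") if part]
```

With this, `base_components("ka_ssa.alt")` gives `["ka", "ssa"]`.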
Also, in *.glyphs source one can assign custom script property to a glyph (the default is given by the GlyphData.xml database used).
In UFO, there isn't such a thing. So I was thinking of defining a private UFO glyph lib entry that specifies a custom script (as well as one for category, which could be useful for other things) for glyphs that either don't have a Unicode codepoint assigned, or do have one but the user may want (for whatever reason) to override the default script value for them.
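Something along these lines, as a sketch only. The lib key name is made up for illustration and is not part of the UFO specification, and `script_for_codepoint` stands in for whatever real Unicode script lookup would be used.

```python
# Hypothetical private lib key; not part of the UFO specification.
SCRIPT_KEY = "com.example.ufo2ft.script"


def resolve_script(glyph_lib, codepoint, script_for_codepoint):
    """Return the script for a glyph, preferring an explicit lib override.

    glyph_lib: the glyph's lib dictionary (possibly empty).
    codepoint: the glyph's Unicode value, or None if unencoded.
    script_for_codepoint: callable mapping a codepoint to a script name;
        a stand-in for a real Unicode script lookup.
    """
    override = glyph_lib.get(SCRIPT_KEY)
    if override is not None:
        return override
    if codepoint is not None:
        return script_for_codepoint(codepoint)
    return None
```

The precedence here (lib entry first, codepoint second) is one possible choice; whether an override should be allowed at all for encoded glyphs is an open question.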
Comments?