kern pairs between Arabic numerals missing from kern feature #198

Closed
anthrotype opened this issue Jan 5, 2018 · 26 comments

@anthrotype
Member

anthrotype commented Jan 5, 2018

The kerning.plist in the UFOs generated from noto-source/NotoSansArabic-MM.glyphs only contains kerning pairs between Arabic numerals. These are written left-to-right and have the special bidirectional class "AN".
Our _glyphIsRtl method correctly returns False for them, so these kern pairs are not moved to the RTL kern lookup.
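
For reference, the bidi classes involved can be checked with Python's standard unicodedata module. This is only a sketch: char_is_rtl is a hypothetical stand-in that works on characters, not ufo2ft's actual _glyphIsRtl (which works on glyphs).

```python
import unicodedata

# Bidi classes treated as strongly right-to-left in this sketch.
STRONG_RTL = {"R", "AL"}


def char_is_rtl(char: str) -> bool:
    """Hypothetical helper: True if the character has a strong RTL bidi class."""
    return unicodedata.bidirectional(char) in STRONG_RTL


# Arabic alef, Hebrew alef, Arabic-Indic digit one, Latin capital A
for char in ("\u0627", "\u05D0", "\u0661", "A"):
    print(f"U+{ord(char):04X}", unicodedata.bidirectional(char), char_is_rtl(char))
```

The Arabic-Indic digit reports class "AN", so a check based only on "R" and "AL" leaves it out of the RTL group.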

The problem is the current KernFeatureWriter doesn't output any kerning for them, not even in the LTR kern lookup...

I believe it has to do with this commit, whereby we only output LTR lookup if the ltrScripts list is not empty, or if the latter is empty and the rtlScripts list is empty:
9757f20

But the font's features.fea has languagesystem DFLT dflt; languagesystem arab dflt;, so the ltrScripts list is empty and rtlScripts is non-empty.
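
The condition described above can be written out as a small sketch. The function name writes_ltr_lookup is hypothetical and this is not the code from the commit; it only restates the rule "LTR scripts present, or no LTR and no RTL scripts at all".

```python
def writes_ltr_lookup(ltr_scripts, rtl_scripts):
    """Hypothetical restatement of the rule: emit the LTR lookup if any LTR
    script is declared, or if neither LTR nor RTL scripts are declared."""
    return bool(ltr_scripts) or not rtl_scripts


# Only "arab" is declared: nothing counts as LTR, one script is RTL.
print(writes_ltr_lookup(ltr_scripts=[], rtl_scripts=["arab"]))
```

With only arab declared the function returns False, which matches the symptom: pairs that stay in the LTR group are never written.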

This is also related:
#112

I'll follow up with a failing test case

@anthrotype
Member Author

ping @behdad

@anthrotype
Member Author

anthrotype commented Jan 5, 2018

Actually I’m not even sure that the correct place for Arabic numerals is in the kern_ltr lookup.
Maybe they should go to the kern_rtl lookup, despite not being “right to left” as such...
If that is the case, then the bug is in that _glyphIsRtl method, which doesn’t treat the “AN” bidi class as right-to-left for the sake of splitting kerning pairs.

@khaledhosny could you advise?

@khaledhosny
Collaborator

I used to kern my Arabic numbers in LTR direction, but I’m not sure any more. I think HarfBuzz will apply the kerning in RTL direction (the script native direction) while MS implementation(s) will apply it in LTR direction (the text direction). That is one of the big annoying incompatibilities between HarfBuzz and MS implementation(s).

If this is the case (needs testing) then I think we need to duplicate the kerning in both LTR and RTL lookups (this also needs testing to see if each engine will not apply both).

@anthrotype
Member Author

more info on the difference between harfbuzz and MS as regards kerning of arabic numbers

harfbuzz/harfbuzz#501 (comment)

i'm not sure what to do...

Khaled suggested to "duplicate the kerning in both LTR and RTL lookups". Do we do that only for Arabic numerals (i.e. bidi class "AN") or for other characters too?

Let me summarise how ufo2ft currently splits the kerning.plist into two LTR and RTL groups, so we are on the same page:

  1. it first checks that any RTL script is defined in languagesystem statements;
  2. if so, it checks for each pair if any glyph is RTL (i.e. has bidirectional class "R" or "AL"), and then pops the pair from the global group and adds it to the RTL group; what is left from the original kerning is by exclusion considered LTR.
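
The two steps above can be sketched roughly as follows. This is not ufo2ft's code: split_kerning, the glyph names, the kerning values and the bidi_classes mapping are all made up for illustration.

```python
STRONG_RTL = {"R", "AL"}


def split_kerning(kerning, bidi_classes, has_rtl_script):
    """Return (ltr_pairs, rtl_pairs).

    kerning maps (first, second) glyph names to values; bidi_classes maps
    glyph names to Unicode bidi classes (hypothetical inputs).
    """
    ltr_pairs = dict(kerning)
    rtl_pairs = {}
    if not has_rtl_script:
        # Step 1 fails: no RTL languagesystem, so nothing is split off.
        return ltr_pairs, rtl_pairs
    for pair in kerning:
        # Step 2: a pair is RTL if any of its glyphs is "R" or "AL".
        if any(bidi_classes.get(glyph) in STRONG_RTL for glyph in pair):
            rtl_pairs[pair] = ltr_pairs.pop(pair)
    return ltr_pairs, rtl_pairs


bidi_classes = {"zero-ar": "AN", "six-ar": "AN", "reh-ar": "AL", "alef-ar": "AL"}
kerning = {("zero-ar", "six-ar"): -70, ("reh-ar", "alef-ar"): -135}
ltr, rtl = split_kerning(kerning, bidi_classes, has_rtl_script=True)
print(ltr)  # the numeral pair stays here, "by exclusion"
print(rtl)
```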

From these, it writes two standalone kern lookups, kern_ltr and kern_rtl, and registers the LTR lookup in the default scope of the kern feature block (so it applies to all languagesystems), and if there are any RTL scripts defined, it adds references to the LTR lookup to the LTR languagesystems (why wasn't it sufficient to add it once for all?), and does the same for the RTL lookup and the RTL languagesystems.

Following the above logic, Arabic numerals are left in the LTR group (because their bidi class is neither "R" nor "AL"). However (from commit 9757f20) the LTR lookup is only written if the languagesystem declarations contain any LTR script (and DFLT dflt is not considered LTR), or if they contain neither LTR nor RTL scripts. That's why in NotoSansArabic we get an empty kern feature: the font has some RTL languagesystem defined, no LTR languagesystem, and the only pairs in kerning.plist happen to be between Arabic numerals, which are not considered RTL according to the above definition.

I'd appreciate if anyone could help here.

@anthrotype
Member Author

Does anybody have examples of kerning.plist being used for kerning Arabic letters or another RTL script? I'd like to use them for my tests, but can't find much.

@adrientetar
Collaborator

cc @typoman

Btw how does glyphs handle the arabic numerals kerning? does it do the "duplicate the kerning in both LTR and RTL lookups" thing?

@anthrotype
Member Author

how does glyphs handle the arabic numerals kerning?

Looking at the features.fea generated by Glyphs.app in ~/Library/Application Support/Glyphs/Temp when exporting the Noto Sans Arabic, it appears that the arabic numerals are placed in a kern_DFLT lookup, registered for all langsyses, and with what appear to be left-to-right pos rules (e.g. pos zero-ar six-ar -70).

Although it's not consistent. As soon as I add a right-to-left kerning pair between two arabic non-numbers, and export again, I can see a new kern_arabic lookup, this one only registered for arab script, definitely RTL (the pos rules modify the x advance and x placement of the second glyph, e.g. pos question-ar beh-ar.fina <-20 0 -20 0>). The weird thing is that this new kern_arabic lookup also contains some kerning pairs between arabic numbers! And the pairs involving arabic numbers in kern_DFLT and kern_arabic do not seem to be a duplicate of each other; they actually don't intersect, as if the kerning data was split between the two lookups according to some logic which I still cannot discern.

Another thing I noticed is that, if the font has some kern feature already defined, as is the case for Noto Sans Arabic (it's got some contextual pos rules), then the kern_DFLT is placed above and the kern_arabic below the pre-existing kern snippet. So it's neither "append" nor "prepend" (see #202), but a mix of both...

It would be great if @schriftgestalt could give us some clue as to how Glyphs.app

@moyogo
Collaborator

moyogo commented Jan 9, 2018

@belluzj pointed out that this is related to unified-font-object/ufo-spec#16 (comment)

@anthrotype
Member Author

Thanks for reminding me of that link, with all the related links and discussions!

From #8, it seems that @graphicore first, and then @jamesgk (in #45), tried to apply the heuristic @behdad suggested here.

I'll write some tests to convince myself that ufo2ft is doing exactly that.

But the conclusion regarding arabic numbers' kerning is a little bit disappointing...

Easiest is to ignore them for now. Or if you wish, remove them from the kern pairs. I can't recommend one way or another as the "right" way at this time.

That would mean scrapping the entire kerning.plist in NotoSansArabic, which is made up only of pairs between Arabic figures.

@behdad
Collaborator

behdad commented Jan 9, 2018

But the conclusion regarding arabic numbers' kerning is a little bit disappointing...

Indeed. For reasons that I describe in:
harfbuzz/harfbuzz#501 (comment)

Let's also talk about that issue. Khaled, any suggestions?

@schriftgestalt

Kerning between Arabic numerals is fine. What can’t be done is kerning between letters and numerals. There is a layout direction change between them, and that means numbers and letters are laid out separately. Things get interesting with punctuation. It has no inherent direction, so there is some heuristic to determine if it is RTL or LTR. Things like the default script and what letters are next to it play a role. But you never know what will happen.

Glyphs sorts the pairs by the script of the first glyph. So Arabic numbers are LTR and thus end up in the default group (with all the Latin numbers). Most cross-script pairs are ignored, so Arabic letter + Arabic number pairs should be ignored, but you already saw that it is complicated...

@anthrotype
Copy link
Member Author

anthrotype commented Jan 17, 2018

I'm thinking that when a font contains both LTR and RTL kerning, we should divide the pairs into three main groups:

  1. pairs between "strong" LTR characters (bidirectional type "L"), or between "L" and characters with "weak" or "neutral" bidirectional type

  2. pairs between "strong" RTL characters (bidi type "R" or "AL"), or between the latter, on the one hand, and characters with "weak" or "neutral" bidi types, on the other;

  3. pairs between characters without a strong L or R|AL directionality (e.g. spaces, punctuation, etc. including the arabic figures "AN").

Then we produce two kern lookups:

  1. one lookup kern_LTR only registered for scripts (and all their languages) that are left-to-right; this will have the kerning values written as usual (value record format A in FEA syntax): i.e. pos y semicolon -10. It will contain all the pairs from the strong- or mixed-LTR group (group 1), as well as the pairs from group 3 (weak/neutral).

  2. another lookup kern_RTL registered only for scripts that are right-to-left; the kerning values are written as value record format B: pos six-ar four-ar <-20 0 -20 0> (i.e. modifying the x placement and x advance of the first glyph). This will contain all the pairs from the strong-RTL and mixed-RTL group (group 2), as well as the weak/neutral pairs from group 3.
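
A rough sketch of the proposal above, to make the grouping concrete. The names classify_pair and build_lookups, the glyph names, the values and the bidi_classes mapping are all hypothetical. Pairs that mix a strong L glyph with a strong R/AL glyph are not covered by the three groups, so this sketch simply skips them, which is an assumption.

```python
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def classify_pair(pair, bidi_classes):
    """Return "ltr" (group 1), "rtl" (group 2), "neutral" (group 3) or None."""
    classes = {bidi_classes.get(glyph) for glyph in pair}
    has_ltr = bool(classes & STRONG_LTR)
    has_rtl = bool(classes & STRONG_RTL)
    if has_ltr and has_rtl:
        return None  # assumption: mixed strong L + strong R/AL pairs are skipped
    if has_ltr:
        return "ltr"
    if has_rtl:
        return "rtl"
    return "neutral"


def build_lookups(kerning, bidi_classes):
    """Return (kern_LTR rules, kern_RTL rules) as lists of FEA pos statements."""
    ltr_rules, rtl_rules = [], []
    for (first, second), value in kerning.items():
        group = classify_pair((first, second), bidi_classes)
        if group in ("ltr", "neutral"):
            # value record format A: only the x advance
            ltr_rules.append(f"pos {first} {second} {value};")
        if group in ("rtl", "neutral"):
            # value record format B: x placement and x advance
            rtl_rules.append(f"pos {first} {second} <{value} 0 {value} 0>;")
    return ltr_rules, rtl_rules


bidi_classes = {"y": "L", "semicolon": "ON", "six-ar": "AN", "four-ar": "AN",
                "reh-ar": "AL", "alef-ar": "AL"}
kerning = {("y", "semicolon"): -10, ("six-ar", "four-ar"): -20,
           ("reh-ar", "alef-ar"): -135}
ltr_rules, rtl_rules = build_lookups(kerning, bidi_classes)
print("\n".join(ltr_rules))
print("\n".join(rtl_rules))
```

Note that the numeral pair ends up in both lists, once in each value record format.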


This is what the current ufo2ft kern writer does: if a RTL script is present in languagesystems statements, separate from the generic kerning dictionary all the RTL pairs (defined as "if any character in the pair has strong R or AL bidi type", thus excluding pairs between arabic numbers), then make a separate RTL lookup (with format B value records) and only register it under RTL scripts; the remainder (anything not RTL as defined above) is registered implicitly for all the languagesystems.

Any comments are highly appreciated, thanks.

@anthrotype
Member Author

ok, what I'm saying is exactly the same that Behdad wrote here

Exclude from RTL kern table all glyphs associated with Unicode characters that have Bidi_Type=L,
Exclude from LTR kern table all glyphs associated with Unicode characters that have Bidi_Type=R or Bidi_Type=AL.

@typoman

typoman commented Jan 17, 2018

Sorry to join late. RTL/RTL works in HarfBuzz; LTR/LTR Arabic numerals don't work there, although they work in CoreText. I remember LTR/LTR Arabic numerals working in InDesign too, despite its bad reputation, but I can't confirm. LTR/RTL is better put in an RTL lookup (punctuation next to an Arabic letter is an example), although at the moment I can't find a case where an LTR Arabic numeral kerned to an RTL letter works in either CoreText or HarfBuzz, given that the positioning is inside an RTL lookup. Below is the structure of a sample feature file I write.

feature kern {
    lookup kern_lookup_noflag_1 {
        pos haft hasht -70;                          # glyph, glyph
    } kern_lookup_noflag_1;
    lookup kern_lookup_nomark_rtl_1 {
        lookupflag IgnoreMarks RightToLeft;
        pos haft reh.isol <-140 0 -140 0>;           # glyph, glyph
        pos reh.isol alef.isol <-135 0 -135 0>;      # glyph, glyph
        pos reh.isol haft <-140 0 -140 0>;           # glyph, glyph
    } kern_lookup_nomark_rtl_1;
} kern;

haft and hasht are LTR Arabic numerals, reh.isol and alef.isol are Arabic RTL letters.

@typoman

typoman commented Jan 17, 2018

Maybe an issue should be opened on LTR/LTR Arabic numerals kerning that doesn't work in Harfbuzz?

@schriftgestalt

My experience is that the pair should be in the lookup that fits the first letter. So ‘pos one-ar alef-ar 20;’ would need to go in a LTR lookup.
But actually having the two glyphs next to each other visually and in the character string is very unlikely.

@khaledhosny
Collaborator

Just to summarize the situation:

  • Kerning between glyphs with different directions does not work in the current OpenType model since text is processed as single direction (and script and language) runs and no interaction happens between the runs. So LTR/RTL or RTL/LTR kerning (or any OpenType lookup for that matter) does not work by design. pos one-ar alef-ar 20; can’t possibly work in any OpenType implementation I’m aware of (unless the whole string of text is forced to be LTR or RTL which makes no sense).
  • Kerning between Arabic numbers should work LTR since their direction is LTR, and it works this way in Uniscribe/DirectWrite, and possibly in Core Text, but not in HarfBuzz, which is a known limitation; see Behdad’s comment above (#198 (comment)).
  • Duplicating the Arabic numbers’ kerning might not be ideal, as a pos one-ar two-ar 20; duplicated as pos two-ar one-ar 20; means ١٢ and ٢١ will be kerned by the same amount in all implementations, which might not be desired (that is like kerning af the same as fa).
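
To illustrate the last point, here is a small sketch of what duplicating a pair in reversed order does to the kerning data. The helper names with_reversed_duplicates and kern_value are hypothetical, and the value 20 is taken from the example above.

```python
def with_reversed_duplicates(kerning):
    """Hypothetical helper: add each pair again with the glyph order swapped,
    keeping any value already defined for the swapped pair."""
    duplicated = dict(kerning)
    for (first, second), value in kerning.items():
        duplicated.setdefault((second, first), value)
    return duplicated


def kern_value(kerning, first, second):
    """Kerning value for a pair, 0 if the pair is not kerned."""
    return kerning.get((first, second), 0)


kerning = {("one-ar", "two-ar"): 20}
duplicated = with_reversed_duplicates(kerning)

print(kern_value(kerning, "two-ar", "one-ar"))     # not kerned in the original data
print(kern_value(duplicated, "two-ar", "one-ar"))  # kerned after duplication
```

After duplication both orders carry the same value, which is the "af kerned the same as fa" effect.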

I obviously don’t have an answer to this incompatibility other than for HarfBuzz to find a way to handle this in a compatible way. As for ufo2ft, I guess it should just kern the Arabic digits LTR and warn users that this will give the wrong result in HarfBuzz, and let them decide if dropping the kerns entirely is a better trade-off (by editing the fonts, obviously; ufo2ft does not need to be too smart here).

@khaledhosny
Collaborator

BTW, Arabic numbers kerning in HarfBuzz works if you set the script to latn, so that might be one way its clients can work around this incompatibility (and hope that lookups involving numbers are not limited to the arab script).

@khaledhosny
Collaborator

Also, lookupflag RightToLeft; has no effect on pairwise or any kind of lookup other than cursive attachments where it determines the order of start and end anchors.

@typoman

typoman commented Jan 18, 2018

I don't think dropping the LTR/LTR Arabic numerals pairs would be a solution just because Harfbuzz doesn't do it, since most of the text engines do shape it correctly. In my experience Arabic numerals need kerning especially this pair: ۷۸.

@khaledhosny
Collaborator

Depends on your audience, HarfBuzz might be what the majority of your users are using so it doesn't matter whose bug it is until it is fixed.

Anyway, I’m not suggesting ufo2ft drop anything, just that it warn its users, i.e. the type designers, and let them decide for themselves what they want.

Personally I always default to tabular figures because they work in more situations than proportional ones, so digit kerning is limited to a non-default case, making it a not so important issue.

@behdad
Collaborator

behdad commented Jan 19, 2018

I definitely want to fix HarfBuzz to do similar to Uniscribe. But I want everyone to understand the issue completely: the LTR kern pair for digits GO INTO THE RTL KERNING TABLE! That's the problem.

@typoman

typoman commented Jan 28, 2018

Ok, I want to understand this correctly. Does it mean typical compilers like FDK put the digits in the RTL table, so it would be better for the compilers to change?

@schriftgestalt

If the number kerning is in an RTL lookup, then it doesn't work in InDesign and Safari. In an LTR lookup, it works as expected.

Chrome applies the kerning to the wrong pair. So, if I have the text ٧٨٧ and kern the seven to the eight, Chrome applies the kerning from the RTL lookup to the second pair, as if it had reversed the string. I tend to think that Safari and InDesign are correct; at least that makes the most sense to me.
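
One way to picture the Chrome behaviour described above is the toy model below. It only illustrates "as if it had reversed the string" and makes no claim about how Chrome or HarfBuzz actually process the buffer; the function names, glyph names and the value -70 are hypothetical.

```python
def kerned_positions(glyphs, kerning):
    """Indices i for which the pair (glyphs[i], glyphs[i + 1]) is kerned."""
    return [i for i in range(len(glyphs) - 1)
            if (glyphs[i], glyphs[i + 1]) in kerning]


def kerned_positions_if_reversed(glyphs, kerning):
    """Match pairs on the reversed glyph list, then map each match back to
    the index of the corresponding pair in the original (logical) order."""
    last_pair = len(glyphs) - 2
    return sorted(last_pair - i
                  for i in kerned_positions(glyphs[::-1], kerning))


glyphs = ["seven-ar", "eight-ar", "seven-ar"]   # the text ٧٨٧ in logical order
kerning = {("seven-ar", "eight-ar"): -70}       # seven kerned to eight

print(kerned_positions(glyphs, kerning))              # pair index in logical order
print(kerned_positions_if_reversed(glyphs, kerning))  # pair index if matched reversed
```

In logical order the first pair (index 0) is kerned; matching on the reversed list moves the kerning to the second pair (index 1).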

@anthrotype
Member Author

I just realised that while fixing the problem of LTR Arabic numerals, I have overlooked the rest of the RTL scripts that use the common digits, like Hebrew. The new KernFeatureWriter only creates two lookups, LTR and RTL, and registers the former only for the LTR scripts and the latter only for the RTL scripts. This means that digits (whose script is "Zyyy", i.e. Common) will end up in the LTR lookup, and kerning between them will not be applied when, say, Hebrew is active.
I think we may need a third DFLT lookup that is registered for all the scripts, and which contains only the kerning pairs between "Common" glyphs (associated with the Zyyy script code, like digits).
This third DFLT lookup will only be generated when we do the split (i.e. when there are any glyphs in the kerning which are associated with a globally RTL script).
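
A minimal sketch of that three-way split, under stated assumptions: split_by_script is a hypothetical name, the glyph-to-script mapping is supplied by hand, RTL_SCRIPTS is a small illustrative subset, the kerning values are made up, and the single-lookup name "kern" used when no split happens is also invented for the example.

```python
RTL_SCRIPTS = {"Arab", "Hebr"}  # assumption: illustrative subset only
COMMON = "Zyyy"


def split_by_script(kerning, glyph_scripts):
    """Split kerning pairs into lookups by script direction (sketch only)."""
    scripts_in_use = {glyph_scripts[g] for pair in kerning for g in pair}
    if not scripts_in_use & RTL_SCRIPTS:
        # No RTL glyphs in the kerning: no split, so no separate DFLT lookup.
        return {"kern": dict(kerning)}
    lookups = {"kern_dflt": {}, "kern_ltr": {}, "kern_rtl": {}}
    for pair, value in kerning.items():
        pair_scripts = {glyph_scripts[g] for g in pair}
        if pair_scripts == {COMMON}:
            name = "kern_dflt"   # both glyphs Common: registered for all scripts
        elif pair_scripts & RTL_SCRIPTS:
            name = "kern_rtl"
        else:
            name = "kern_ltr"
        lookups[name][pair] = value
    return lookups


glyph_scripts = {"one": "Zyyy", "seven": "Zyyy", "period": "Zyyy",
                 "A": "Latn", "V": "Latn", "alef-hb": "Hebr"}
kerning = {("one", "seven"): -30, ("A", "V"): -80, ("alef-hb", "period"): -15}
lookups = split_by_script(kerning, glyph_scripts)
for name, pairs in lookups.items():
    print(name, pairs)
```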

anthrotype added a commit that referenced this issue May 24, 2018
This 'kern_dflt' lookup includes all the pairs for the neutral-direction
glyphs (Common or Inherited script), which as such do not belong to
neither the LTR scripts' group, nor the RTL one.
The lookup is included in all defined languagesystems.
If we are not splitting up LTR/RTL lookups, then we do not need to
create a separate kern_dflt lookup.
Note that this lookup is only registered for the kern feature, and
does not affect the dist feature (if any).

#198 (comment)
@anthrotype
Member Author

ok I think this should be fixed now.. Phew.
Please test and let me know if it doesn't work as intended.
