Verse numbers of "KFGQPC Uthmanic Script HAFS" are not shaped right #501

ebraminio · 2017-06-23T21:29:35Z

Filed from this mailing list topic as a possible bug, steps to repro:

Download the font from http://fonts.qurancomplex.gov.sa/?page_id=42
./hb-view UthmanicHafs1\ Ver09.otf --direction=ltr ١٢

Actual:
21

Expected:
12

(this works if we issue ./hb-view UthmanicHafs1\ Ver09.otf --shapers=coretext --direction=ltr ١٢ on macOS and perhaps uniscribe, and with other fonts)

The text was updated successfully, but these errors were encountered:

khaledhosny · 2017-07-11T08:19:34Z

I think this is because HarfBuzz processes the characters in the “native” direction of the script, so Arabic-Indic digits are processed from right-to-left so gives the wrong ligature in the font. I think that is one of the major remaining differences between HarfBuzz and Uniscribe. I stumbled upon this issue years ago and after Behdad’s explanation I managed to come up with a workaround in my font, but I wasn’t using ligatures so it was doable unlike here.

ebraminio · 2017-10-02T17:23:22Z

Thanks for you insightful comment @khaledhosny,

@behdad, I remember you wanted to put your comment here also, perhaps it would be useful for the future developments of the project and other collaborators.

behdad · 2017-10-04T14:44:41Z

Here's full details of this issue:

In Unicode, one can force-change direction of text using RLO and LRO characters. HarfBuzz wants to support this correctly. Ie. if you have <RLO,e,f,f,i> you should get "i,f,f,e" rendered. If you try with Uniscribe, you get "ffi,e", ie. it wrongly forms the 'ffi' ligature. Ie. it processes lookups in whatever input text order was there. At least it used to work this way two years ago.

Now, Uniscribe does handle this correctly in their Arabic and I suppose Hebrew shapers. In HarfBuzz I want to support it correctly all the time. The way I do this is to have a per-script "native" horizontal direction, which reflects the natural order of the script. When hb_shape() and requested shaping direction is reverse of the natural horizontal direction of the script, we flip the buffer first such that text becomes in natural order before we apply the lookups, because the font designer designed lookups for the natural order.

Now, when I do the above, I do it just based on requested direction and script. Which means, I cannot distinguish between <LRO,Lam,Alef> ie. a forced non-natural direction, and <660,661> ie. natural, left-to-right, numbers in Arabic. As such, there's no way for HarfBuzz to apply lookups in the natural direction in both of these cases, short of looking at the input buffer text and if it starts with digits then assume a LTR natural direction...

While I would hate such a solution, I'm open to implementing it. We should go back to Unicode bidi spec and see what sequences of characters can naturally become LTR in otherwise RTL scripts. Note that even if we do this, kerning / ligature of digits and weak punctuation still will get confused. Eg, if you happen to have kerning between left-parenthesis and digit zero, it's not clear which direction the lookup should be applied...

Now, for years I was puzzled as to how Uniscribe gets this right. After some very heated exchanges with PeterCon, Sergey Malkin, and Dwayne Robinson, I finally found out. Sergey gave me the lead: they use a different script code for Arabic digits, such that they can differentiate later, and choose LTR natural order for Arabic digits and RTL natural order for rest of Arabic. This, obviously is out of scope for HarfBuzz as we don't do the itemization and I don't want to copy the different-script hack either.

My plan to fix this properly is to encoded intended-direction of the lookup in LookupFlags. I've talked about it for years. Will try to flesh it out and write the proposal before end of year.

behdad · 2018-01-05T16:59:33Z

CC @anthrotype

khaledhosny · 2018-01-18T11:16:39Z

I’m just wondering, can’t we solve this somehow by special casing numbers (based on the general category of the characters I suppose) and give them a LTR native direction always? Is the native direction a property of the whole buffer?

behdad · 2018-01-19T04:30:11Z

Is the native direction a property of the whole buffer?

Yes, and that's the problem. The best we can do though is indeed detect digits and use LTR native. That will screw up any punctuation stuff in the buffer, but that's a smaller issue.

What would be a nice exercise is to see what neutral characters can end up in a LTR run adjacent to other neutral characters... But then again, doesn't really matter.

khaledhosny · 2021-06-16T09:51:08Z

Firefox (cc @jfkthame) seems to be trying to workaround this in https://bugzilla.mozilla.org/show_bug.cgi?id=1716029, but I’m really concerned about it as it will mean HarfBuzz clients will be behaving differently.

Right now I’m working around this issue by using the private BUZZ feature to map the digits to an alternate version and handle them differently so that HarfBuzz and non-HarfBuzz engines work the same, but with the workaround in Firefox it will give the wrong output in Firefox without a way to handle it in the font.

jfkthame · 2021-06-16T10:26:03Z

Firefox (cc @jfkthame) seems to be trying to workaround this in https://bugzilla.mozilla.org/show_bug.cgi?id=1716029, but I’m really concerned about it as it will mean HarfBuzz clients will be behaving differently.

Yes, I'm experimenting with a possible approach to try and handle this; if it looks workable I was intending to bring it back here to discuss if we could incorporate it into HB more directly.

Right now I’m working around this issue by using the private BUZZ feature to map the digits to an alternate version and handle them differently so that HarfBuzz and non-HarfBuzz engines work the same, but with the workaround in Firefox it will give the wrong output in Firefox without a way to handle it in the font.

That's interesting, but doesn't seem like it helps with things like the existing KFGQPC font that wants to ligate digit sequences? Given that font works "as intended" in Safari, I was hoping we could achieve the same result in the HB-based engines.

khaledhosny · 2021-06-16T11:10:01Z

That's interesting, but doesn't seem like it helps with things like the existing KFGQPC font that wants to ligate digit sequences? Given that font works "as intended" in Safari, I was hoping we could achieve the same result in the HB-based engines.

Ideally it would be fixed in HarfBuzz for good, but if not I’d rather have HarfBuzz clients work in a consistent (thus predictable) way.

Now I think the solution Behdad suggested #501 (comment) might be that bad after all :

short of looking at the input buffer text and if it starts with digits then assume a LTR natural direction...

(though checking for the first digit only is not enough, as it will miss the case in the Firefox bug report). I’d say any buffer with LTR and RTL script and at least one BiDi AN or EN character should be treated as having natural LTR direction.

May be the long term fix is to implement script segmentation in HarfBuzz and make all clients use it (I was checking DirectWrite the other day and it appears that there is no way to manually set the script based on its code, one have to use its script analyzer to get opaque script number to pass to the shaper).

jfkthame · 2021-06-18T10:04:19Z

In https://bugzilla.mozilla.org/show_bug.cgi?id=1716029 we're working around this via a hack that basically checks for numeric runs in Arabic and shapes them with LTR directionality, but if we could move this (or something comparable) into harfbuzz itself for consistency, that would obviously be better.

As a strawman suggestion, here's a harfbuzz patch that would result in similar behavior to what we're trying in Gecko:

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -564,6 +565,42 @@ hb_ensure_native_direction (hb_buffer_t 
   hb_direction_t direction = buffer->props.direction;
   hb_direction_t horiz_dir = hb_script_get_horizontal_direction (buffer->props.script);
 
+  /* Numeric runs in natively-RTL scripts are actually native-LTR, so we reset the
+   * horiz_dir if the run contains at least one decimal-number char, and no chars with
+   * strong directionality or format-control chars (like the Arabic number/footnote/
+   * year signs) that may be trying to interact with digits.
+   *
+   * This allows digit sequences in Arabic etc to be shaped in "native" direction,
+   * so that features like ligatures will work as intended.
+   */
+  if (unlikely (horiz_dir == HB_DIRECTION_RTL && direction == HB_DIRECTION_LTR)) {
+    const hb_glyph_info_t* info = buffer->info;
+    bool found_number = false,
+         found_strong = false;
+    foreach_grapheme (buffer, start, end)
+    {
+      auto gc = _hb_glyph_info_get_general_category (&info[start]);
+      if (gc == HB_UNICODE_GENERAL_CATEGORY_DECIMAL_NUMBER)
+        found_number = true;
+      else if (unlikely (gc == HB_UNICODE_GENERAL_CATEGORY_FORMAT))
+      {
+        found_strong = true;
+        break;
+      }
+      else
+      {
+        auto bidi = mozilla::unicode::GetBidiCat(info[start].codepoint);
+        if (bidi == eCharType_RightToLeft || bidi == eCharType_RightToLeftArabic)
+        {
+          found_strong = true;
+          break;
+        }
+      }
+    }
+    if (found_number && !found_strong)
+      horiz_dir = HB_DIRECTION_LTR;
+  }
+
   /* TODO vertical:
    * The only BTT vertical script is Ogham, but it's not clear to me whether OpenType
    * Ogham fonts are supposed to be implemented BTT or not.  Need to research that

This works within Firefox, but depends on calling a mozilla API to look up bidi categories; I don't think we currently have that in harfbuzz, do we? So if we wanted to do something like that, we'd need to add bidi-category lookup to the unicode functions.

behdad · 2021-06-21T04:53:45Z

We don't have bidi categories currently. I'm open to adding those if we ever implement the bidi algo. In the mean time:

I think checking for GC=Letter* types is enough in this case. If not, can be combined with horiz-dir=RTL. But is that even necessary? I think checking for GC=Letter* should be enough.

My concern is with GC=Format. Doesn't that negate the point? Aren't the year-sign etc GC=Cf? Please adivse.

I'm excited about getting this in and closing the gap with Uniscribe.

jfkthame · 2021-06-23T10:55:17Z

My concern is with GC=Format. Doesn't that negate the point? Aren't the year-sign etc GC=Cf? Please adivse.

Yes, those signs are Cf.

Currently to get the desired rendering with e.g. Noto Nastaliq Urdu, the client needs to explicitly override directionality of the run. See for example https://searchfox.org/mozilla-central/source/layout/reftests/bugs/1599841-1.html, where unicode-bidi: bidi-override is used to force RTL directionality onto the list item numbers.

The tweak here breaks that rendering if we don't exclude cases where GC=Cf is present.

But there may be better ways to approach this... we should try to think it through more carefully.

khaledhosny · 2021-06-23T13:28:42Z

But if you are forcing RTL direction, the direction == HB_DIRECTION_LTR) will be false and this new code will not be applied?

khaledhosny · 2021-06-23T13:46:49Z

With the modified patch below, setting direction to LTR and RTL seems to produce visually identical output using Noto Nastaliq Urdu (for single digits of course, forcing RTL does not make sense for more than one digit), and it seems to match both Uniscribe and CoreText:

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 4fe148d67..1b72527f6 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -574,6 +574,33 @@ hb_ensure_native_direction (hb_buffer_t *buffer)
   hb_direction_t direction = buffer->props.direction;
   hb_direction_t horiz_dir = hb_script_get_horizontal_direction (buffer->props.script);

+  /* Numeric runs in natively-RTL scripts are actually native-LTR, so we reset the
+   * horiz_dir if the run contains at least one decimal-number char, and no chars with
+   * strong directionality or format-control chars (like the Arabic number/footnote/
+   * year signs) that may be trying to interact with digits.
+   *
+   * This allows digit sequences in Arabic etc to be shaped in "native" direction,
+   * so that features like ligatures will work as intended.
+   */
+  if (unlikely (horiz_dir == HB_DIRECTION_RTL && direction == HB_DIRECTION_LTR)) {
+    const hb_glyph_info_t* info = buffer->info;
+    bool found_number = false,
+         found_strong = false;
+    foreach_grapheme (buffer, start, end)
+    {
+      auto gc = _hb_glyph_info_get_general_category (&info[start]);
+      if (gc == HB_UNICODE_GENERAL_CATEGORY_DECIMAL_NUMBER)
+        found_number = true;
+      else if (HB_UNICODE_GENERAL_CATEGORY_IS_LETTER (gc))
+      {
+        found_strong = true;
+        break;
+      }
+    }
+    if (found_number && !found_strong)
+      horiz_dir = HB_DIRECTION_LTR;
+  }
+
   /* TODO vertical:
    * The only BTT vertical script is Ogham, but it's not clear to me whether OpenType
    * Ogham fonts are supposed to be implemented BTT or not.  Need to research that
diff --git a/src/hb-unicode.hh b/src/hb-unicode.hh
index 5c0cb859d..0a7994410 100644
--- a/src/hb-unicode.hh
+++ b/src/hb-unicode.hh
@@ -359,6 +359,13 @@ DECLARE_NULL_INSTANCE (hb_unicode_funcs_t);
          FLAG (HB_UNICODE_GENERAL_CATEGORY_ENCLOSING_MARK) | \
          FLAG (HB_UNICODE_GENERAL_CATEGORY_NON_SPACING_MARK)))

+#define HB_UNICODE_GENERAL_CATEGORY_IS_LETTER(gen_cat) \
+       (FLAG_UNSAFE (gen_cat) & \
+        (FLAG (HB_UNICODE_GENERAL_CATEGORY_LOWERCASE_LETTER) | \
+         FLAG (HB_UNICODE_GENERAL_CATEGORY_MODIFIER_LETTER) | \
+         FLAG (HB_UNICODE_GENERAL_CATEGORY_OTHER_LETTER) | \
+         FLAG (HB_UNICODE_GENERAL_CATEGORY_TITLECASE_LETTER) | \
+         FLAG (HB_UNICODE_GENERAL_CATEGORY_UPPERCASE_LETTER)))

 /*
  * Ranges, used for bsearch tables.

behdad · 2021-06-23T14:30:48Z

@khaledhosny didn't you have to do special hacks in Amiri for this as well? Are they unnecessary now? I'm more interested in number-sign, ayah-sign etc working, than the kerning applied being of a particular direction.

khaledhosny · 2021-06-23T15:23:50Z

@khaledhosny didn't you have to do special hacks in Amiri for this as well? Are they unnecessary now? I'm more interested in number-sign, ayah-sign etc working, than the kerning applied being of a particular direction.

They are not needed anymore, I already built a test version of Amiri without the hacks and it works equally well with the three engines. I probably can simplify things further and use anchor attachment instead of the manually calculated positioning I currently use.

behdad · 2021-06-23T15:27:09Z

They are not needed anymore, I already built a test version of Amiri without the hacks and it works equally well with the three engines. I probably can simplify things further and use anchor attachment instead of the manually calculated positioning I currently use.

Great. Thanks. Send a PR :).

behdad · 2021-06-23T15:27:52Z

+  if (unlikely (horiz_dir == HB_DIRECTION_RTL && direction == HB_DIRECTION_LTR)) {

Bracket on new line please.

+    const hb_glyph_info_t* info = buffer->info;

We typically try to keep this closest to the for loop, for better auditing.

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

khaledhosny · 2021-06-23T16:35:00Z

Great. Thanks. Send a PR :).

#3035

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

anthrotype · 2021-06-24T14:40:54Z

nice to see this old issue finally fixed! thank you :)

muhannad102 · 2021-09-27T02:49:50Z

I am still having the same issue in Chrome and Opera, but not with Microsoft Edge

alerque · 2021-09-27T07:34:05Z

Which version of Harfbuzz is baked into the version of Chrome & Opera you tested with? I assume this fix just hasn't landed in the branches of those projects you are using yet.

anthrotype mentioned this issue Jan 8, 2018

kern pairs between Arabic numerals missing from kern feature googlefonts/ufo2ft#198

Closed

ebraminio mentioned this issue Mar 31, 2018

Implement prop table parsing #906

Closed

madig mentioned this issue Jul 30, 2019

Check suggestion: flag kerning of Arabic numbers fonttools/fontbakery#2603

Open

khaledhosny mentioned this issue Jun 19, 2021

U+0600 U+0601 U+0602 U+0603 overlap the preceding letters in NotoNaskhArabic and NotoNaskhArabicUI notofonts/arabic#21

Closed

khaledhosny added a commit that referenced this issue Jun 23, 2021

[ot-shape] Numeric runs native direction is LTR

988d2d6

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

khaledhosny mentioned this issue Jun 23, 2021

[ot-shape] Numeric runs native direction is LTR #3035

Merged

khaledhosny added a commit that referenced this issue Jun 23, 2021

[ot-shape] Numeric runs native direction is LTR

3946ffc

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

khaledhosny added a commit that referenced this issue Jun 23, 2021

[ot-shape] Numeric runs native direction is LTR

def3ef9

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

behdad closed this as completed in #3035 Jun 23, 2021

behdad pushed a commit that referenced this issue Jun 23, 2021

[ot-shape] Numeric runs native direction is LTR

c3be28e

See inline comments. Slightly modified version of the code from Jonathan Kew on the linked issue. Fixes #501

Luciogi mentioned this issue Apr 18, 2023

Uthmani Hafs font verse count outside of circle AlfaazPlus/QuranApp#210

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verse numbers of "KFGQPC Uthmanic Script HAFS" are not shaped right #501

Verse numbers of "KFGQPC Uthmanic Script HAFS" are not shaped right #501

ebraminio commented Jun 23, 2017 •

edited

khaledhosny commented Jul 11, 2017

ebraminio commented Oct 2, 2017

behdad commented Oct 4, 2017

behdad commented Jan 5, 2018

khaledhosny commented Jan 18, 2018

behdad commented Jan 19, 2018

khaledhosny commented Jun 16, 2021

jfkthame commented Jun 16, 2021

khaledhosny commented Jun 16, 2021 •

edited

jfkthame commented Jun 18, 2021

behdad commented Jun 21, 2021

jfkthame commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

behdad commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

behdad commented Jun 23, 2021

behdad commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

anthrotype commented Jun 24, 2021

muhannad102 commented Sep 27, 2021

alerque commented Sep 27, 2021

Verse numbers of "KFGQPC Uthmanic Script HAFS" are not shaped right #501

Verse numbers of "KFGQPC Uthmanic Script HAFS" are not shaped right #501

Comments

ebraminio commented Jun 23, 2017 • edited

khaledhosny commented Jul 11, 2017

ebraminio commented Oct 2, 2017

behdad commented Oct 4, 2017

behdad commented Jan 5, 2018

khaledhosny commented Jan 18, 2018

behdad commented Jan 19, 2018

khaledhosny commented Jun 16, 2021

jfkthame commented Jun 16, 2021

khaledhosny commented Jun 16, 2021 • edited

jfkthame commented Jun 18, 2021

behdad commented Jun 21, 2021

jfkthame commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

behdad commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

behdad commented Jun 23, 2021

behdad commented Jun 23, 2021

khaledhosny commented Jun 23, 2021

anthrotype commented Jun 24, 2021

muhannad102 commented Sep 27, 2021

alerque commented Sep 27, 2021

ebraminio commented Jun 23, 2017 •

edited

khaledhosny commented Jun 16, 2021 •

edited