Wrong glyph for Initial and Medial form of Arabic Letter (ئ) YEH WITH HAMZA above #1504

mowais786 · 2018-12-21T10:07:49Z

I am using harfbuzz 2.2.0 with icu63 and icu layout, along with the icu-le-hb bridge. The initial and medial forms of 0x626 are not returned correctly; the base glyph is returned instead.
Here is the sample input and the wrong output:
المكتب الرئيسي
المكتب الرئ يسي (Intentionally added space to show the wrong output)

This was working fine with icu56 and layoutengine

khaledhosny · 2018-12-21T10:26:31Z

Does this happen with a specific font? If so, which font? If not, do you have some minimal code to reproduce this?

mowais786 · 2018-12-21T18:11:15Z

This happens with all fonts, even with Arial Unicode. I will try to produce some short code to reproduce it. In the meantime, here is a hint from debugging the code: it generates the font fallback map, which contains all four glyphs, but u.format2.get_coverage returns NOT_COVERED for the initial and medial glyphs and works fine for the final glyph. While building the range for 0x626, it seems that the RangeRecord is set to start = 0x626 and end = 0.
I hope this helps to identify the issue. I will keep trying to produce a small code example that reproduces it.
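
To illustrate why a range with end = 0 would make the lookup fail, here is a minimal sketch of a range-based coverage lookup. It is not HarfBuzz's actual code: the `RangeRecord` layout, the `get_coverage` function, and the `NOT_COVERED` value below are simplified stand-ins modeled on OpenType Coverage Format 2, and the field name `start_index` is an assumption made for this example.

```cpp
#include <cstdint>
#include <vector>

// Sentinel for "glyph is not in the coverage table" (illustrative value).
constexpr unsigned NOT_COVERED = 0xFFFFFFFFu;

// One contiguous run of glyph IDs (hypothetical, simplified layout).
struct RangeRecord {
  uint16_t start;        // first glyph ID in the range
  uint16_t end;          // last glyph ID in the range, inclusive
  uint16_t start_index;  // coverage index assigned to `start`
};

// Binary search over ranges sorted by `start`.
// Returns the coverage index of `glyph`, or NOT_COVERED if no range holds it.
unsigned get_coverage(const std::vector<RangeRecord>& ranges, uint16_t glyph) {
  int lo = 0;
  int hi = static_cast<int>(ranges.size()) - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    const RangeRecord& r = ranges[mid];
    if (glyph < r.start) {
      hi = mid - 1;
    } else if (glyph > r.end) {
      lo = mid + 1;  // a range with end == 0 always lands here
    } else {
      return static_cast<unsigned>(r.start_index + (glyph - r.start));
    }
  }
  return NOT_COVERED;
}
```

With a single record `{0x626, 0, 0}`, looking up 0x626 falls through the `glyph > r.end` branch and yields `NOT_COVERED`; with `{0x626, 0x626, 0}` the same lookup returns index 0.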

gilbahat · 2018-12-23T09:48:47Z

we are also having issues with the same glyph (0x0626) but it appears to be a shaper issue for us AND it doesn't work with icu52 or icu63. coretext returns a correct result but ot or fallback shapers don't.
tested with GNU unifont 11.

gilbahat@pasture:/$ cat 2.txt
تعبئة
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=ot --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uniFE94=4+512|uni0626=3+512|uniFE92=2+512|uniFECC=1+512|uniFE97=0+512]
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=fallback --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uni0629=4+512|uni0626=3+512|uni0628=2+512|uni0639=1+512|uni062A=0+512]
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=coretext --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uniFE94=4+512|uniFE8C=3+512|uniFE92=2+512|uniFECC=1+512|uniFE97=0+512]

please let me know if you think this should be split to another bug.

khaledhosny · 2018-12-25T14:56:00Z

Unifont does not have a GSUB table, so that is our fallback Arabic shaping (not to be confused with the fallback shaper). I can reproduce the issue with this font, but I can’t see why it is happening; the shaping table has the expected entries for U+0626 and the font has glyphs for these characters:

  {0xFE8Bu, 0xFE8Cu, 0xFE8Au, 0xFE89u}, /* U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE */
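
For readers unfamiliar with the row above, the following sketch shows how such an entry could be indexed by joining form. The column order (initial, medial, final, isolated) is inferred from the Unicode names of U+FE8B, U+FE8C, U+FE8A and U+FE89; the `JoiningForm` enum, `kYehHamzaForms` and `shaped_form` are hypothetical names for this example and not HarfBuzz identifiers.

```cpp
#include <cstdint>

// Joining forms in the same order as the columns of the row quoted above
// (assumed order: initial, medial, final, isolated).
enum JoiningForm { INIT = 0, MEDI = 1, FINA = 2, ISOL = 3 };

// Presentation forms for U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE,
// copied from the row in the thread.
constexpr uint16_t kYehHamzaForms[4] = {0xFE8Bu, 0xFE8Cu, 0xFE8Au, 0xFE89u};

// Map a code point to its presentation form for the given joining form.
// Only U+0626 is handled in this sketch; anything else is returned unchanged.
uint32_t shaped_form(uint32_t cp, JoiningForm form) {
  if (cp == 0x0626u) {
    return kYehHamzaForms[form];
  }
  return cp;
}
```

In the word from the hb-shape runs above, U+0626 sits between two joining letters, so the expected result is the medial form: `shaped_form(0x0626, MEDI)` gives 0xFE8C, which matches the `uniFE8C` glyph in the coretext output.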

gilbahat · 2018-12-26T07:00:00Z

Thank you for your prompt reply. I am not sure why coretext works in this case then... anyway, if there's anything I can do to further help, just let me know.

mowais786 · 2019-01-02T07:40:40Z

@khaledhosny do we have any update on the fix of this issue?

khaledhosny · 2019-01-02T09:20:03Z

No, all I know is that fallback shaping for U+0626 is not working for some reason. But this affects only fonts that lack a GSUB table, which is a rarity (and these fonts are broken in many other ways), not all Arabic fonts. So either all the fonts you tested with lack a GSUB table, or there are two different issues here.

gilbahat · 2019-01-06T06:16:11Z

well, we use gnu unifont as a fallback font, it is extremely useful for that purpose. furthermore, the absence of a GSUB table actually works well for us because we can pre-shape the text this way regardless of the presence of a correct GSUB table, for any arbitrary font.

is the lack of GSUB table a matter of prioritization or is that a WONTFIX?

gilbahat · 2019-02-24T06:39:59Z

hey, I would be happy if you could update regarding my last question (that is, what is the approach to this bug re: policy). I would like to believe that gnu unifont should be considered a primary harfbuzz correctness test target, as I suspect we're not the only ones needing to use such a font or falling back to it.

khaledhosny · 2019-03-18T19:20:17Z

I don’t really understand your use case, and Unifont is a pretty poor fallback choice in every possible way. Fixing it isn’t a priority for me, but I can speak only for myself.

gilbahat · 2019-03-31T13:31:28Z

Our use case is as follows:

we have to pre-render a piece of text (in this context, get its glyphs) prior to choice of a font. since we cannot assume anything about the font that we will end up using, we have to somehow accommodate fonts which are broken as well. using unifont allows us to assume nothing about the font itself, including the correctness of its GSUB table.

khaledhosny · 2019-03-31T13:37:17Z

I don’t see how it is even possible to render a piece of text before actually knowing the font. But I’m not trying to argue with you about what you should or shouldn’t do, I’m just pointing out that what you are doing is not that common.

jfkthame · 2019-03-31T16:17:05Z

> we have to pre-render a piece of text (in this context, get its glyphs) prior to choice of a font

In the general case, this isn't possible. There are plenty of cases (e.g. among Indic scripts, but also including some extended Arabic-script letters) where the various shaped forms required for proper rendering are not encoded with their own Unicode codepoints at all; they exist only as (font-specific) glyph IDs within the font. So the shaping is inherently font-specific, and cannot be performed without reference to the font's layout tables.

gilbahat · 2019-03-31T16:26:06Z

It's a bit more complex in our use case, but let's say that our approach does work even if it (obviously) won't support fonts with custom encodings and their own glyph substitution table.

Fixes a bug in CoverageFormat2::serialize whereby the first range was not serialized correctly if it consists of only a single glyph ID. This broke shaping of U+0626 in the Arabic fallback shaper, because it is not found in the coverage table of the 'init' and 'medi' lookups. Also fix similar bug in ClassDefFormat2::serialize, noted during code inspection (I haven't observed a case that was actually affected by this, but it looks broken). Fixes #1504
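
The description above can be illustrated with a simplified sketch of collapsing a sorted glyph list into ranges. This is an illustration under stated assumptions, not the code from HarfBuzz or from the fix: `build_ranges`, its `init_first_end` flag and the `RangeRecord` layout are hypothetical and exist only to contrast the described bug with the described fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified range record (not the HarfBuzz type).
struct RangeRecord {
  uint16_t start;        // first glyph ID in the range
  uint16_t end;          // last glyph ID in the range, inclusive
  uint16_t start_index;  // coverage index assigned to `start`
};

// Collapse a sorted list of glyph IDs into contiguous ranges.
// When `init_first_end` is false, the first range's `end` is left at 0 unless
// a later contiguous glyph extends it, which mimics the bug described above.
// When true, `end` starts out equal to `start`, mimicking the fix.
std::vector<RangeRecord> build_ranges(const std::vector<uint16_t>& glyphs,
                                      bool init_first_end) {
  std::vector<RangeRecord> ranges;
  if (glyphs.empty()) {
    return ranges;
  }
  uint16_t first_end = init_first_end ? glyphs[0] : 0;
  ranges.push_back(RangeRecord{glyphs[0], first_end, 0});
  for (std::size_t i = 1; i < glyphs.size(); ++i) {
    if (glyphs[i - 1] + 1 != glyphs[i]) {
      // Gap: open a new range that covers just this glyph for now.
      ranges.push_back(
          RangeRecord{glyphs[i], glyphs[i], static_cast<uint16_t>(i)});
    } else {
      // Contiguous: extend the current range.
      ranges.back().end = glyphs[i];
    }
  }
  return ranges;
}
```

For the glyph list {10, 20, 21}, the buggy variant produces a first range of start 10, end 0, while the fixed variant produces start 10, end 10. For {10, 11, 20} both variants agree, because the second glyph extends the first range and sets its end. That is consistent with the problem only appearing when the first range holds a single glyph ID.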

jfkthame · 2019-03-31T18:30:30Z

I believe I have identified the bug affecting U+0626, and proposed a fix in #1646.

(That doesn't alter the fact that this approach to shaping is inherently limited and there are many languages that simply cannot be rendered correctly in this way.)

jfkthame mentioned this issue Mar 31, 2019

Don't skip setting the .end field of the first range #1646

Merged

behdad closed this as completed in #1646 Apr 1, 2019
