Wrong glyph for Initial and Medial form of Arabic Letter (ئ) YEH WITH HAMZA above #1504

Closed
mowais786 opened this issue Dec 21, 2018 · 15 comments · Fixed by #1646

Comments

@mowais786

I am using harfbuzz 2.2.0 with icu63 and icu layout, along with the icu-le-hb bridge. The initial and medial forms of 0x626 are not returned correctly; the base glyph is returned instead.
Here is the sample input and wrong output
المكتب الرئيسي
المكتب الرئ يسي (Intentionally added space to show the wrong output)

This was working fine with icu56 and layoutengine

@khaledhosny
Collaborator

Does this happen with a specific font? If so, which font? If not, do you have some minimal code to reproduce this?

@mowais786
Author

This happens with all fonts, even with Arial Unicode. I will try to produce some short code to reproduce it. Here is a hint from debugging: the code generates the font fallback map, which contains all four glyphs, but u.format2.get_coverage returns NOT_COVERED for the initial and medial glyphs and works fine for the final glyph. While building the range for 0x626, it seems that the RangeRecord is set to start = 0x626 and end = 0.
Hope this will help to identify the issue. In the meanwhile I will try to produce some small code example to reproduce the issue.
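
To illustrate why a range record with start = 0x626 and end = 0 would produce NOT_COVERED, here is a minimal sketch of a range-based coverage lookup. It is a simplified model of an OpenType Coverage Format 2 table, not HarfBuzz's actual code; the `RangeRecord` layout, the `NOT_COVERED` value, and the `get_coverage` signature below are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of a Coverage Format 2 range record.
struct RangeRecord {
    uint16_t start;       // first glyph ID in the range
    uint16_t end;         // last glyph ID in the range (inclusive)
    uint16_t startIndex;  // coverage index assigned to `start`
};

// Sentinel for "glyph is not in the coverage table" (value is an assumption).
constexpr unsigned NOT_COVERED = 0xFFFFFFFFu;

// Binary search over range records sorted by start glyph.
unsigned get_coverage(const std::vector<RangeRecord>& ranges, uint16_t glyph) {
    int lo = 0;
    int hi = static_cast<int>(ranges.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const RangeRecord& r = ranges[mid];
        if (glyph < r.start) {
            hi = mid - 1;
        } else if (glyph > r.end) {
            lo = mid + 1;  // a record with end < start always lands here
        } else {
            return r.startIndex + (glyph - r.start);
        }
    }
    return NOT_COVERED;
}
```

With a well-formed record `{0x626, 0x626, 0}`, looking up 0x626 yields coverage index 0. With the record described above, `{0x626, 0, 0}`, the test `glyph > r.end` is true for 0x626, so the search skips the record and reports NOT_COVERED, while later, well-formed ranges are still found.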

@gilbahat

gilbahat commented Dec 23, 2018

we are also having issues with the same glyph (0x0626) but it appears to be a shaper issue for us AND it doesn't work with icu52 or icu63. coretext returns a correct result but ot or fallback shapers don't.
tested with GNU unifont 11.

gilbahat@pasture:/$ cat 2.txt
تعبئة
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=ot --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uniFE94=4+512|uni0626=3+512|uniFE92=2+512|uniFECC=1+512|uniFE97=0+512]
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=fallback --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uni0629=4+512|uni0626=3+512|uni0628=2+512|uni0639=1+512|uni062A=0+512]
gilbahat@pasture:/: hb-shape --verbose --trace --font-file=unifont.ttf --shapers=coretext --text-file 2.txt
1: (تعبئة)
1: <U+062A,U+0639,U+0628,U+0626,U+0629>
1: [uniFE94=4+512|uniFE8C=3+512|uniFE92=2+512|uniFECC=1+512|uniFE97=0+512]

please let me know if you think this should be split to another bug.

@khaledhosny
Collaborator

Unifont does not have a GSUB table, so that is our fallback Arabic shaping (not to be confused with the fallback shaper). I can reproduce the issue with this font, but I can’t see why it is happening; the shaping table has the expected entries for U+0626 and the font has glyphs for these characters:

  {0xFE8Bu, 0xFE8Cu, 0xFE8Au, 0xFE89u}, /* U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE */
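
For readers unfamiliar with this kind of table, the sketch below shows one way such a row can be consumed. It is an illustration only, not HarfBuzz's implementation. The column order (initial, medial, final, isolated) is inferred from the code points in the row above; the other four rows come from the Unicode Arabic Presentation Forms-B block and agree with the coretext output quoted earlier. The names `Form`, `FormsRow`, `kRows`, and `presentation_form` are hypothetical.

```cpp
#include <cstdint>

// Column order inferred from the U+0626 row: initial, medial, final, isolated.
enum Form { INIT = 0, MEDI = 1, FINA = 2, ISOL = 3 };

struct FormsRow {
    uint16_t base;      // nominal Arabic code point
    uint16_t forms[4];  // presentation forms; 0 means "no such form"
};

// Rows for the letters of the test word (U+062A U+0639 U+0628 U+0626 U+0629).
static const FormsRow kRows[] = {
    {0x0626, {0xFE8B, 0xFE8C, 0xFE8A, 0xFE89}},  // YEH WITH HAMZA ABOVE
    {0x0628, {0xFE91, 0xFE92, 0xFE90, 0xFE8F}},  // BEH
    {0x0629, {0x0000, 0x0000, 0xFE94, 0xFE93}},  // TEH MARBUTA (no init/medi)
    {0x062A, {0xFE97, 0xFE98, 0xFE96, 0xFE95}},  // TEH
    {0x0639, {0xFECB, 0xFECC, 0xFECA, 0xFEC9}},  // AIN
};

// Returns the presentation form for `base`, or `base` itself when the table
// has no entry (the "base glyph is returned" symptom reported in this issue).
uint16_t presentation_form(uint16_t base, Form form) {
    for (const FormsRow& row : kRows) {
        if (row.base == base && row.forms[form] != 0) {
            return row.forms[form];
        }
    }
    return base;
}
```

Under these assumptions, `presentation_form(0x0626, MEDI)` gives 0xFE8C, which is the glyph coretext produced (uniFE8C) and the one the ot shaper replaced with uni0626.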

@gilbahat

Thank you for your prompt reply. I am not sure why coretext works in this case then... anyway, if there's anything I can do to further help, just let me know.

@mowais786
Author

@khaledhosny do we have any update on the fix of this issue?

@khaledhosny
Collaborator

No, all I know is that fallback shaping for U+0626 is not working for some reason. But this affects only fonts that lack a GSUB table, which is a rarity (and these fonts are broken in many other ways), not all Arabic fonts. So either all the fonts you tested with lack a GSUB table, or there are two different issues here.

@gilbahat

gilbahat commented Jan 6, 2019

well, we use gnu unifont as a fallback font, it is extremely useful for that purpose. furthermore, the absence of a GSUB table actually works well for us because we can pre-shape the text this way regardless of the presence of a correct GSUB table, for any arbitrary font.

is the lack of GSUB table a matter of prioritization or is that a WONTFIX?

@gilbahat

hey, I would be happy if you could update regarding my last question (that is, what is the approach to this bug re: policy). I would like to believe that gnu unifont should be considered a primary harfbuzz correctness test target, as I suspect we're not the only ones needing to use such a font or falling back to it.

@khaledhosny
Collaborator

I don’t really understand your use case, and Unifont is a pretty poor fallback choice in every possible way. Fixing it isn’t a priority for me, but I can speak only for myself.

@gilbahat

Our use case is so:

we have to pre-render a piece of text (in this context, get its glyphs) prior to choice of a font. since we cannot assume anything about the font that we will end up using, we have to somehow accommodate fonts which are broken as well. using unifont allows us to assume nothing about the font itself, including the correctness of its GSUB table.

@khaledhosny
Collaborator

I don’t see how it is even possible to render a piece of text before actually knowing the font. But I’m not trying to argue with you about what you should or shouldn’t do; I’m just pointing out that what you are doing is not that common.

@jfkthame
Collaborator

> we have to pre-render a piece of text (in this context, get its glyphs) prior to choice of a font

In the general case, this isn't possible. There are plenty of cases (e.g. among Indic scripts, but also including some extended Arabic-script letters) where the various shaped forms required for proper rendering are not encoded with their own Unicode codepoints at all; they exist only as (font-specific) glyph IDs within the font. So the shaping is inherently font-specific, and cannot be performed without reference to the font's layout tables.

@gilbahat

It's a bit more complex in our use case, but let's say that our approach does work even if it (obviously) won't support fonts with custom encodings and their own glyph substitution table.

jfkthame added a commit that referenced this issue Mar 31, 2019
Fixes a bug in CoverageFormat2::serialize whereby the first range
was not serialized correctly if it consists of only a single glyph ID.
This broke shaping of U+0626 in the Arabic fallback shaper, because it
is not found in the coverage table of the 'init' and 'medi' lookups.

Also fix similar bug in ClassDefFormat2::serialize, noted during code
inspection (I haven't observed a case that was actually affected by
this, but it looks broken).

Fixes #1504

@jfkthame
Collaborator

I believe I have identified the bug affecting U+0626, and proposed a fix in #1646.

(That doesn't alter the fact that this approach to shaping is inherently limited and there are many languages that simply cannot be rendered correctly in this way.)
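
As a rough illustration of the kind of serialization the fix concerns, the sketch below collapses a sorted list of glyph IDs into range records. It is not the code from #1646 or from HarfBuzz; `CoverageRange` and `build_ranges` are hypothetical names. The point it tries to show is that every range, including a first range holding a single glyph ID, must have its end set when it is opened, so that it is never left with end = 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical range record: [start, end] inclusive, plus the coverage
// index of the first glyph in the range.
struct CoverageRange {
    uint16_t start;
    uint16_t end;
    uint16_t startIndex;
};

// Collapses glyph IDs (assumed sorted ascending, no duplicates) into ranges.
std::vector<CoverageRange> build_ranges(const std::vector<uint16_t>& glyphs) {
    std::vector<CoverageRange> ranges;
    for (std::size_t i = 0; i < glyphs.size(); ++i) {
        if (!ranges.empty() && glyphs[i] == ranges.back().end + 1) {
            // Consecutive glyph ID: extend the current range.
            ranges.back().end = glyphs[i];
        } else {
            // Open a new range with end == start, so a single-glyph range
            // (including the very first one) is already closed correctly.
            ranges.push_back({glyphs[i], glyphs[i], static_cast<uint16_t>(i)});
        }
    }
    return ranges;
}
```

For the input `{0x626, 0x628, 0x629, 0x62A}` this produces two ranges, `{0x626, 0x626, 0}` and `{0x628, 0x62A, 1}`. A serializer that only writes `end` when it moves on from a multi-glyph run could leave the first record as `{0x626, 0, 0}`, which is the state reported earlier in this thread.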

behdad pushed a commit that referenced this issue Apr 1, 2019