Wrong glyph for Initial and Medial form of Arabic Letter (ئ) YEH WITH HAMZA above #1504
Comments
Does this happen with a specific font? If so, which font? If not, do you have some minimal code to reproduce this?
This happens with all fonts, even with Arial Unicode. I will try to produce some short code to reproduce it. In the meantime, here is a hint: I debugged the code, and it generates the font's fallback map, which contains all four glyphs, but u.format2.get_coverage returns NOT_COVERED for the initial and medial glyphs and works fine for the final (end) glyph. While building the range for 0x626, it seems that the RangeRecord is set to start = 0x626 and end = 0.
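To illustrate why a record with start = 0x626 and end = 0 would produce NOT_COVERED, here is a minimal sketch of a range-based coverage lookup in the style of OpenType Coverage Format 2. It is not HarfBuzz's actual code: the struct layout, the free function `get_coverage`, and the `NOT_COVERED` value are simplified stand-ins chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel for "glyph is not in the coverage table" (assumed value for this sketch).
static const unsigned NOT_COVERED = 0xFFFFFFFFu;

// One contiguous run of glyph IDs, as in an OpenType Coverage Format 2 table.
struct RangeRecord {
  uint16_t start;                 // first glyph ID in the range
  uint16_t end;                   // last glyph ID in the range (inclusive)
  uint16_t start_coverage_index;  // coverage index assigned to `start`
};

// Binary search over ranges sorted by `start`.
unsigned get_coverage(const std::vector<RangeRecord>& ranges, uint16_t glyph) {
  std::size_t lo = 0, hi = ranges.size();
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    const RangeRecord& r = ranges[mid];
    if (glyph < r.start) {
      hi = mid;
    } else if (glyph > r.end) {
      lo = mid + 1;  // a record with end = 0 always lands here
    } else {
      return r.start_coverage_index + (glyph - r.start);
    }
  }
  return NOT_COVERED;
}
```

With a record of {0x626, 0, 0}, the glyph 0x626 is not below `start` but is above `end`, so the search moves past the record and reports the glyph as not covered. The same record with end = 0x626 matches and yields coverage index 0.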
We are also having issues with the same glyph (0x0626), but for us it appears to be a shaper issue, and it doesn't work with either icu52 or icu63. coretext returns a correct result, but the ot and fallback shapers don't.
gilbahat@pasture:/$ cat 2.txt
Please let me know if you think this should be split into a separate bug.
Unifont does not have a GSUB table, so that is our fallback Arabic shaping (not to be confused with the {0xFE8Bu, 0xFE8Cu, 0xFE8Au, 0xFE89u}, /* U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE */
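For readers unfamiliar with the row quoted above: its four values are the Unicode Arabic Presentation Forms-B code points for U+0626, in the order initial, medial, final, isolated. A minimal sketch of how such a row can be consulted follows. The names `Form`, `kYehHamzaForms`, and `fallback_form` are hypothetical and exist only for this illustration; a real table would cover the whole Arabic block rather than one letter.

```cpp
#include <cassert>
#include <cstdint>

// Column order of the quoted row: initial, medial, final, isolated.
enum Form { INIT = 0, MEDI = 1, FINA = 2, ISOL = 3 };

// Hypothetical one-row table holding only the U+0626 entry from the thread.
static const uint16_t kYehHamzaForms[4] = {0xFE8Bu, 0xFE8Cu, 0xFE8Au, 0xFE89u};

// Returns the presentation-form code point for U+0626 in the requested
// joining form; any other code point is returned unchanged.
uint32_t fallback_form(uint32_t cp, Form form) {
  assert(form >= INIT && form <= ISOL);
  if (cp != 0x0626u) return cp;
  return kYehHamzaForms[form];
}
```

For example, `fallback_form(0x0626, INIT)` yields U+FE8B, the initial form that the report says was not being selected.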
Thank you for your prompt reply. I am not sure why coretext works in this case, then. Anyway, if there's anything I can do to help further, just let me know.
@khaledhosny do we have any update on the fix for this issue?
No, all I know is that fallback shaping for U+0626 is not working for some reason. But this affects only fonts that lack a GSUB table, which is a rarity (and these fonts are broken in many other ways), not all Arabic fonts. So either all the fonts you tested with lack a GSUB table, or there are two different issues here.
Well, we use GNU Unifont as a fallback font, and it is extremely useful for that purpose. Furthermore, the absence of a GSUB table actually works well for us, because we can pre-shape the text this way regardless of the presence of a correct GSUB table, for any arbitrary font. Is the lack of a fix for fonts without a GSUB table a matter of prioritization, or is that a WONTFIX?
Hey, I would be happy if you could give an update regarding my last question (that is, what the policy approach to this bug is). I would like to believe that GNU Unifont should be considered a primary harfbuzz correctness test target, as I suspect we're not the only ones who need to use such a font or fall back to it.
I don’t really understand your use case, and Unifont is a pretty poor fallback choice in every possible way. Fixing it isn’t a priority for me, but I can speak only for myself.
Our use case is this: we have to pre-render a piece of text (in this context, get its glyphs) before a font is chosen. Since we cannot assume anything about the font that we will end up using, we have to somehow accommodate fonts that are broken as well. Using Unifont allows us to assume nothing about the font itself, including the correctness of its GSUB table.
I don’t see how it is even possible to render a piece of text before actually knowing the font. But I’m not trying to argue with you about what you should or shouldn’t do; I’m just pointing out that what you are doing is not that common.
In the general case, this isn't possible. There are plenty of cases (e.g. among Indic scripts, but also including some extended Arabic-script letters) where the various shaped forms required for proper rendering are not encoded with their own Unicode codepoints at all; they exist only as (font-specific) glyph IDs within the font. So the shaping is inherently font-specific, and cannot be performed without reference to the font's layout tables.
It's a bit more complex in our use case, but let's say that our approach does work, even if it (obviously) won't support fonts with custom encodings and their own glyph substitution tables.
Fixes a bug in CoverageFormat2::serialize whereby the first range was not serialized correctly if it consists of only a single glyph ID. This broke shaping of U+0626 in the Arabic fallback shaper, because it is not found in the coverage table of the 'init' and 'medi' lookups. Also fix similar bug in ClassDefFormat2::serialize, noted during code inspection (I haven't observed a case that was actually affected by this, but it looks broken). Fixes #1504
I believe I have identified the bug affecting U+0626, and proposed a fix in #1646. (That doesn't alter the fact that this approach to shaping is inherently limited and there are many languages that simply cannot be rendered correctly in this way.)
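As a rough illustration of the kind of range building involved, the sketch below turns a sorted glyph list into coverage ranges and makes sure every range, including a first range holding a single glyph ID, gets its end set. This is an assumption-laden example, not the patch in #1646 and not HarfBuzz's CoverageFormat2::serialize: `GlyphRange` and `build_ranges` are hypothetical names, and the real serializer works on its own buffer types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an OpenType Coverage Format 2 range record.
struct GlyphRange {
  uint16_t start;                 // first glyph ID in the range
  uint16_t end;                   // last glyph ID in the range (inclusive)
  uint16_t start_coverage_index;  // coverage index assigned to `start`
};

// Groups a strictly increasing glyph list into contiguous ranges.
std::vector<GlyphRange> build_ranges(const std::vector<uint16_t>& glyphs) {
  std::vector<GlyphRange> ranges;
  for (std::size_t i = 0; i < glyphs.size(); ++i) {
    assert(i == 0 || glyphs[i - 1] < glyphs[i]);  // input must be sorted, no duplicates
    if (!ranges.empty() && ranges.back().end + 1 == glyphs[i]) {
      // Glyph continues the current run: extend its end.
      ranges.back().end = glyphs[i];
    } else {
      // New run: set start AND end immediately, so a single-glyph
      // range (including the very first one) never keeps end = 0.
      ranges.push_back({glyphs[i], glyphs[i], static_cast<uint16_t>(i)});
    }
  }
  return ranges;
}
```

The design choice worth noting is that `end` is written when the range is opened rather than when it is closed. A builder that only writes `end` on closing a range needs a special case for single-glyph ranges, which is where the symptom reported earlier in the thread (start = 0x626, end = 0) could plausibly come from.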
I am using harfbuzz 2.2.0 with icu63 and ICU layout, along with the icu-le-hb bridge. The initial and medial forms of 0x626 are not returned correctly; the base glyph is returned instead.
Here is the sample input and the wrong output:
المكتب الرئيسي
المكتب الرئ يسي (Intentionally added space to show the wrong output)
This was working fine with icu56 and its layout engine.