Incorrect shaping with of <0644, 064E, 0670, 0653, ...> with "KFGQPC Uthmanic Script HAFS" #505

Closed
emuller-amazon opened this issue Jul 10, 2017 · 4 comments

@emuller-amazon
Contributor

Word (on Mac) and the coretext shaper produce different output than hb on the string <U+0644,U+064E,U+0670,U+0653,U+0626> with the font KFGQPC Uthmanic Script HAFS (available at http://fonts.qurancomplex.gov.sa/?page_id=42).

It is as if hb applies lookups in lookup order, while the Microsoft spec for Arabic specifies that the features be applied one by one, in a defined order.


Details:

U+0644 --cmap--> g+94 --init-->g+367
U+064E --cmap--> g+104
U+0670 --cmap--> g+138
U+0653 --cmap--> g+109

Under 'calt', lookup 8: replaces g+367 by g+615 in the context <g+367, g+104, g+138>.

Under 'liga', lookup 0: replaces <g+138, g+109> by g+290.

Finally, GPOS attaches g+290 to the left of the vertical stem of the lam, g+615.

With hb:

emuller> hb-shape --text-file=test.txt --font-file=UthmanicHafs1\ Ver09.otf --output-format=json --no-glyph-names --verbose
1: (لَٰٓئ)
1: <U+0644,U+064E,U+0670,U+0653,U+0626>
1: [{"g":306,"cl":4,"dx":0,"dy":0,"ax":1202,"ay":0},{"g":290,"cl":0,"dx":-75,"dy":1515,"ax":0,"ay":0},{"g":104,"cl":0,"dx":0,"dy":1425,"ax":0,"ay":0},{"g":367,"cl":0,"dx":0,"dy":0,"ax":518,"ay":0}]

It seems that lookup 0 is applied first, which means that the context of lookup 8 is no longer there, so the glyph for lam is not replaced, and the superscript alef + maddah is positioned above the vertical stem of the lam.

In Word, and with coretext shaping:

emuller> hb-shape --text-file=test.txt --font-file=UthmanicHafs1\ Ver09.otf --output-format=json --no-glyph-names --verbose --shaper=coretext
1: (لَٰٓئ)
1: <U+0644,U+064E,U+0670,U+0653,U+0626>
1: [{"g":306,"cl":4,"dx":0,"dy":0,"ax":1252,"ay":0},{"g":290,"cl":0,"dx":0,"dy":350,"ax":500,"ay":0},{"g":104,"cl":0,"dx":0,"dy":1425,"ax":-550,"ay":0},{"g":676,"cl":0,"dx":0,"dy":0,"ax":1065,"ay":0}]

It seems that calt is applied first, which means that the glyph for lam (g+367) is replaced (by g+676), and the superscript alef and maddah now attach to the left of the vertical stem of the lam.
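To make the ordering dependence concrete, here is a minimal sketch that models only the two rules described under Details (lookup 0 under 'liga', lookup 8 under 'calt'). The glyph IDs come from the report; the function names and the list-based representation are hypothetical, and this is not how HarfBuzz implements GSUB. The trailing U+0626 glyph is left out because neither rule touches it.

```python
# Toy model of the two GSUB rules described above; glyph IDs are taken from
# the report, everything else (function names, data layout) is hypothetical.

def apply_liga_lookup0(glyphs):
    """Lookup 0 ('liga'): replace the pair <138, 109> with the ligature 290."""
    out, i = [], 0
    while i < len(glyphs):
        if glyphs[i:i + 2] == [138, 109]:
            out.append(290)
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_calt_lookup8(glyphs):
    """Lookup 8 ('calt'): replace 367 with 615 when followed by <104, 138>."""
    out = list(glyphs)
    for i in range(len(out) - 2):
        if out[i:i + 3] == [367, 104, 138]:
            out[i] = 615
    return out


# Logical order after cmap and 'init': lam, fatha, superscript alef, maddah.
run = [367, 104, 138, 109]

lookup_order = apply_calt_lookup8(apply_liga_lookup0(run))   # lookup 0, then 8
feature_order = apply_liga_lookup0(apply_calt_lookup8(run))  # calt, then liga

print(lookup_order)   # [367, 104, 290]
print(feature_order)  # [615, 104, 290]
```

Running lookup 0 first consumes g+138, so the context that lookup 8 needs never matches and the lam keeps its original glyph.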

@khaledhosny
Collaborator

Core Text can’t be trusted to do the right thing IMO, it would be more interesting to compare with Uniscribe or DirectWrite on Windows.

@behdad
Member

behdad commented Jul 11, 2017

What Khaled said. We try hard to do what Uniscribe does. See:

https://github.com/behdad/harfbuzz/blob/master/src/hb-ot-shape-complex-arabic.cc#L181

Also. You can do hb-shape --debug to see what's going on. If you still think there's something that needs attention in HarfBuzz let me know and I'll debug.

@emuller-amazon
Contributor Author

emuller-amazon commented Jul 11, 2017

I tried on Windows, comparing with Uniscribe, and the problem and diagnosis are the same.

$ harfbuzz-1.4.6-win32/hb-shape --text-file=test.txt --font-file=UthmanicHafs1\ Ver09.otf --output-format=json --no-glyph-names --verbose
1: (لَٰٓئ)
1: <U+0644,U+064E,U+0670,U+0653,U+0626>
1: [{"g":306,"cl":4,"dx":0,"dy":0,"ax":1202,"ay":0},{"g":290,"cl":0,"dx":-75,"dy":1515,"ax":0,"ay":0},{"g":104,"cl":0,"dx":0,"dy":1425,"ax":0,"ay":0},{"g":367,"cl":0,"dx":0,"dy":0,"ax":518,"ay":0}]

hb

$ harfbuzz-1.4.6-win32/hb-shape --text-file=test.txt --font-file=UthmanicHafs1\ Ver09.otf --output-format=json --no-glyph-names --verbose --shaper=uniscribe
1: (لَٰٓئ)
1: <U+0644,U+064E,U+0670,U+0653,U+0626>
1: [{"g":306,"cl":4,"dx":0,"dy":0,"ax":1202,"ay":0},{"g":290,"cl":0,"dx":50,"dy":350,"ax":0,"ay":0},{"g":104,"cl":0,"dx":550,"dy":1425,"ax":0,"ay":0},{"g":676,"cl":0,"dx":0,"dy":0,"ax":1065,"ay":0}]

uniscribe
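For comparing the two runs mechanically, a small sketch along these lines may help. It parses the JSON lines copied from the hb and uniscribe outputs above and reports the positions where the glyph IDs differ; the helper name `glyph_ids` is hypothetical.

```python
import json

# JSON lines copied verbatim from the two runs above.
hb_json = '[{"g":306,"cl":4,"dx":0,"dy":0,"ax":1202,"ay":0},{"g":290,"cl":0,"dx":-75,"dy":1515,"ax":0,"ay":0},{"g":104,"cl":0,"dx":0,"dy":1425,"ax":0,"ay":0},{"g":367,"cl":0,"dx":0,"dy":0,"ax":518,"ay":0}]'
uniscribe_json = '[{"g":306,"cl":4,"dx":0,"dy":0,"ax":1202,"ay":0},{"g":290,"cl":0,"dx":50,"dy":350,"ax":0,"ay":0},{"g":104,"cl":0,"dx":550,"dy":1425,"ax":0,"ay":0},{"g":676,"cl":0,"dx":0,"dy":0,"ax":1065,"ay":0}]'


def glyph_ids(shaped_json):
    """Return the glyph IDs ("g" fields) from one hb-shape JSON output line."""
    return [item["g"] for item in json.loads(shaped_json)]


hb_glyphs = glyph_ids(hb_json)
us_glyphs = glyph_ids(uniscribe_json)

# (index, hb glyph, uniscribe glyph) for every position that differs.
diff = [(i, a, b) for i, (a, b) in enumerate(zip(hb_glyphs, us_glyphs)) if a != b]
print(diff)  # [(3, 367, 676)]
```

Only the last glyph in the output (the lam) differs in glyph ID; the remaining differences are in positioning.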

Running hb-shape with --debug shows that the GSUB lookups are applied in this order:

3 fina
2 medi
1 init
0 liga
4..16 calt
17..18 liga
19..28 calt
29..35 liga

Seems to me that fina, medi and init are applied separately, in that order (as specified by the MS spec), but then all the lookups of calt and liga are merged and applied in lookup order. The MS spec is pretty clear that each feature should be applied separately.
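The difference between the two schemes can be sketched as follows. The feature-to-lookup mapping is reconstructed from the --debug listing above; the notion of "stages" and the function `application_order` are hypothetical names used for illustration, and the calt-before-liga order is the one inferred in this report from the Uniscribe output.

```python
# Feature -> GSUB lookup indices, reconstructed from the --debug listing above.
FEATURE_LOOKUPS = {
    "fina": [3],
    "medi": [2],
    "init": [1],
    "calt": list(range(4, 17)) + list(range(19, 29)),
    "liga": [0, 17, 18] + list(range(29, 36)),
}


def application_order(stages):
    """Each stage is a list of features; lookups within a stage run in index order."""
    order = []
    for stage in stages:
        lookups = sorted(i for feature in stage for i in FEATURE_LOOKUPS[feature])
        order.extend(lookups)
    return order


# calt and liga merged into one stage (matches the observed --debug order).
merged = application_order([["fina"], ["medi"], ["init"], ["calt", "liga"]])
# calt and liga as separate stages, calt first.
separate = application_order([["fina"], ["medi"], ["init"], ["calt"], ["liga"]])

print(merged.index(0) < merged.index(8))      # True: liga lookup 0 runs before calt lookup 8
print(separate.index(0) < separate.index(8))  # False: every calt lookup runs first
```

With the merged stage, lookup 0 sorts ahead of all calt lookups, which is the behaviour that breaks the context of lookup 8.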

@behdad
Copy link
Member

behdad commented Jul 14, 2017

Thanks Eric. Fixing this.

@behdad behdad closed this as completed in c1432bc Jul 14, 2017
clrpackages pushed a commit to clearlinux-pkgs/harfbuzz that referenced this issue Jul 19, 2017
…1.4.7

Overview of changes leading to 1.4.7
Tuesday, July 18, 2017
====================================

- Multiple Indic, Tibetan, and Cham fixes.
- CoreText: Allow disabling kerning.
- Adjust Arabic feature order again.
- Misc build fixes.

    1.4.7

 NEWS         | 10 ++++++++++
 configure.ac |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

commit c1432bce3cfc1156d19b21892d4083afa8838d94
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Jul 14 17:34:47 2017 +0100

    [arabic] Adjust feature order again

    Fixes harfbuzz/harfbuzz#505

 src/hb-ot-shape-complex-arabic.cc                        |   7 ++++++-

(NEWS truncated at 15 lines)