Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in Indic shaping of दिसंबर after #3490 #3513

Closed
drott opened this issue Mar 28, 2022 · 16 comments · Fixed by #3515
Closed

Regression in Indic shaping of दिसंबर after #3490 #3513

drott opened this issue Mar 28, 2022 · 16 comments · Fixed by #3515

Comments

@drott
Copy link
Collaborator

drott commented Mar 28, 2022

Bisected to 90f09b1

$ build/util/hb-view ~/dev/blink/src/third_party/test_fonts/test_fonts/Lohit-Devanagari.ttf "दिसंबर"

image

Expected, at least that's what it was before:

image

@drott
Copy link
Collaborator Author

drott commented Mar 28, 2022

This change discussed in #3488 - so perhaps this is the kind of font that is expecting the previous order?

@drott
Copy link
Collaborator Author

drott commented Mar 28, 2022

    <namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
      2.95.4
    </namerecord>

lohit_devanagari.zip

That seems to be the "newest" version of the font, according to https://pagure.io/lohit/tree/master

@behdad
Copy link
Member

behdad commented Mar 28, 2022

We should test what other shapers do with this. cc @dscorbett

@jfkthame
Copy link
Collaborator

Looks like this is a result of GSUB lookup 25, which ligates the i-matra with the anusvara, as part of the pres feature:

      <Lookup index="25">
        <LookupType value="4"/>
        <LookupFlag value="2"/><!-- ignoreBaseGlyphs -->
        <!-- SubTableCount=1 -->
        <LigatureSubst index="0" Format="1">
          <LigatureSet glyph="isigndeva">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvaradeva"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.alt1">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.alt1"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.alt1"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvara.alt1"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.alt2">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.alt2"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.alt2"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvara.alt2"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.alt3">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.alt3"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.alt3"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvaradeva.alt3"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.alt4">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.alt4"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.alt4"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvaradeva.alt4"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.ja">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.ja"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.ja"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvaradeva.ja"/>
          </LigatureSet>
          <LigatureSet glyph="isigndeva.tha">
            <Ligature components="radeva_viramadeva,anusvaradeva" glyph="isign_ra_virama_anusvara.tha"/>
            <Ligature components="radeva_viramadeva" glyph="isign_ra_virama.tha"/>
            <Ligature components="anusvaradeva" glyph="isigndeva_anusvara.tha"/>
          </LigatureSet>
        </LigatureSubst>
      </Lookup>

This lookup ignores base glyphs (because it's expected that the syllable's base is between the i-matra and the anusvara), but since the syllable boundaries are lost, it can now reach across multiple base letters and "grab" an anusvara that doesn't belong here.

I haven't tested what other shapers do with this, but it's clear that this font, at least, is assuming pres operates only within the syllable.

@behdad
Copy link
Member

behdad commented Mar 28, 2022

We should test what other shapers do with this. cc @dscorbett

One possibility is that other shapers never clear clusters; that is, they apply all features, including user features, constrained to clusters.

@behdad
Copy link
Member

behdad commented Mar 28, 2022

One possibility is that other shapers never clear clusters; that is, they apply all features, including user features, constrained to clusters.

@tiroj have you had evidence that rejects this hypothesis?

@dscorbett
Copy link
Collaborator

With HarfBuzz 4.1.0 on Windows:

>hb-shape Lohit-Devanagari.ttf -u 0926,093F,0938,0902,092C,0930
[isigndeva_anusvaradeva=0+266|dadeva=0+541|sadeva=0+709|badeva=4+537|radeva=5+436]

>hb-shape Lohit-Devanagari.ttf -u 0926,093F,0938,0902,092C,0930 --shapers uniscribe
[isigndeva=0+266|dadeva=0+541|sadeva=2+709|anusvaradeva=2@0,-1+0|badeva=4+537|radeva=5+436]

@behdad
Copy link
Member

behdad commented Mar 28, 2022

Okay so we regressed.

@jfkthame
Copy link
Collaborator

FWIW, the Devanagari spec does say (under Indic input processing),

All shaping operations are done on a syllable-by-syllable basis, independent from other characters

and later under GSUB processing, it says that

the features for presentation forms are applied to the entire cluster simultaneously

which seems to reinforce the view that these features are still being processed on a per-cluster basis.

I think an argument could be made for clearing clusters after all the Indic-specific features have been applied, but before arbitrary user features (though I don't know if other engines do that).

(The other thing maybe worth noting is that according to the spec, GPOS features would also be applied on a per-cluster basis (it says that "[a]ll features are applied simultaneously to the entire cluster"). This would imply that GPOS can't be used to do cross-cluster kerning or other positioning, though, which seems like an unfortunate limitation. So we may want to continue clearing clusters before GPOS, if possible.)

@behdad
Copy link
Member

behdad commented Mar 28, 2022

I think an argument could be made for clearing clusters after all the Indic-specific features have been applied, but before arbitrary user features (though I don't know if other engines do that).

The reason we changed this last release is that @tiroj observed that other engines allow user-feature lookups to mix (ie. come before) presentation-form features. That would mean either that other engines don't clear syllables, or implement this in other ways, eg. allowing syllable-sensitivity per lookup.

behdad added a commit that referenced this issue Mar 28, 2022
This allows mix-and-matching per-syllable and other lookups.
In fact, removes the clear-syllables call completely.

Fixes #3513
@behdad
Copy link
Member

behdad commented Mar 28, 2022

or implement this in other ways, eg. allowing syllable-sensitivity per lookup.

I've implemented this now.

@behdad
Copy link
Member

behdad commented Mar 28, 2022

GPOS features would also be applied on a per-cluster basis

Since we have not been doing that so far, I prefer to keep it that way.

@behdad
Copy link
Member

behdad commented Mar 28, 2022

GPOS features would also be applied on a per-cluster basis

Since we have not been doing that so far, I prefer to keep it that way.

In my new implementation though, if pres etc are implemented as GPOS, they will be constrained to the syllable.

behdad added a commit that referenced this issue Mar 28, 2022
Alternate fix for #3513

This matches what Uniscribe does, but differs from what we chose
to do before, which was to allow user features to be applied across
syllable boundaries.
@behdad
Copy link
Member

behdad commented Mar 28, 2022

While we decide I'm going to revert the regressing commit.

behdad added a commit that referenced this issue Mar 28, 2022
This reverts commit 90f09b1.

This regressed Indic shaping. See:
#3513
behdad added a commit that referenced this issue Mar 28, 2022
This allows mix-and-matching per-syllable and other lookups.
In fact, removes the clear-syllables call completely.

Fixes #3513
behdad added a commit that referenced this issue Mar 28, 2022
This allows mix-and-matching per-syllable and other lookups.
In fact, removes the clear-syllables call completely.

Fixes #3513
@tiroj
Copy link

tiroj commented Mar 29, 2022

One possibility is that other shapers never clear clusters; that is, they apply all features, including user features, constrained to clusters.

That is what I documented in 2013: all GSUB in Indic shaping was being applied at the cluster level, which made it impossible to implement cross-cluster contextual GSUB behaviour. See http://tiro.com/John/Problems_for_Indic_Typography.pdf.

At what became the first meeting of the ad hoc OTL working group, in April 2014, we discussed this in detail, and decided that Indic v2 shaping engines should process the rclt feature across cluster boundaries, but to retain the cluster limitation for other GSUB features in case changing the existing behaviour broke existing fonts. The rclt feature was selected for the behaviour change because it was a relatively newly defined feature, so was a safer proposition. See http://tiro.com/John/Fixing_Indic2_OTL.pdf.

What happened after that is that Microsoft and Adobe failed to implement the rclt cross-cluster behaviour in their shaping engines. Apple implemented it in CoreText. HarfBuzz implemented it, and then since then has further relaxed the behaviour to apply additional GSUB across cluster boundaries. It looks like you might have landed on a font that breaks when this change is made, as was raised as a possibility at the 2014 meeting.

@behdad
Copy link
Member

behdad commented Mar 29, 2022

HarfBuzz implemented it, and then since then has further relaxed the behaviour to apply additional GSUB across cluster boundaries.

Correct. Previously the way we implemented that disallowed interleaving of lookups of presentation-forms features and user-features. I've now relaxed that limitation.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 30, 2022
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Mar 31, 2022
aarongable pushed a commit to chromium/chromium that referenced this issue Apr 11, 2022
Contains fixes for https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38800 and
https://bugs.chromium.org/p/chromium/issues/detail?id=1303552,
see harfbuzz/harfbuzz@0308513

Earlier regression in Indic shaping addressed, see
harfbuzz/harfbuzz#3513

Includes fix for Apple Color Emoji shaping, see issue 1310282.

https://chromium.googlesource.com/external/github.com/harfbuzz/harfbuzz.git/+log/965cf1d66589..61486746d3d8

$ git log 965cf1d66..61486746d --date=short --no-merges --format='%ad %ae %s'
2022-03-28 behdad Revert "[indic] Clear syllables before presentation features"
2022-03-28 behdad Add test for previous commit
2022-03-28 behdad [matcher] Simplify syllable initialization
2022-03-28 behdad [aat] Remove morx deleted-glyphs before GPOS processing
2022-03-26 corbett.dav [indic] Categorize U+0D04 as Consonant_Placeholder
2022-03-23 behdad [use] Avoid O(n^2) in the machine
2022-03-24 behdad [apply-lookup] Try to fix the logic for contextual lookups
2022-03-25 behdad [subset] Require exact harfbuzz version in .pc file
2022-03-25 behdad [subset] Adjust name in .pc file
2022-03-25 khaled [set] Fix annotation
2022-03-25 behdad [gvar] Fix decoding of private vs shared points
2022-03-25 behdad [glyf] Don't bail rendering glyf even if gvar failed
2022-03-25 behdad [set] Minor touch-up on the previous commit
2022-03-25 behdad Remove accidental files
2022-03-25 45769922+andyjgf [set] Add call to export set contents to an array. (#3500)
2022-03-24 100675750+aneejit1  Meson build writes to the source directory (issue #3507 ) (#3508)
2022-03-24 behdad [ot-layout] Comment
2022-03-24 behdad [ot-layout] Change max nesting level of lookups from 6 to 64
2022-03-24 khaled [build] Change how platform shaper tests are enable
2022-03-24 fanchunwei test: Dist the platform shaper test data
2022-03-23 grieger [reorg] Use relative includes for hb-ot-layout-gsubgpos.hh
2022-03-23 grieger [reorg] Move GSUB into OT::Layout::GSUB namespace.
2022-03-23 grieger [reorg] Move SubstLookup and GSUB into the new layout.
2022-03-23 grieger [reorg] Move ReverseChainSingleSubst to new layout.
2022-03-23 grieger [reorg] Move LigatureSubst to new layout.
2022-03-23 grieger [reorg] Move AlternateSubst to new layout.
2022-03-23 grieger [reorg] Move MultipleSubst into new layout.
2022-01-26 behdad [reorg] Use relative include
2022-01-26 behdad [reorg] Move sanitize/dispatch and size macros to top
2022-01-20 grieger [reorg] Fix check-* scripts to work with sources files in directories.
2022-01-13 grieger [reorg] Move SingleSubst opentype fields to top of the classes.
2022-01-13 grieger [reorganization] WIP move single substitution into separate files.
2022-03-23 khaled 4.1.0
2022-03-22 behdad Remove old TODO file
2022-03-22 behdad [buffer] Whitespace
2022-03-21 behdad Update tests for recent changes
2022-03-21 behdad [ot-font] Vertically center glyph in vertical writing fallback
2022-03-21 behdad [ot-font] Use ascent+descent for fallback vertical advance
2022-03-21 behdad [hmtx] Change default advance for horizontal direction to upem/2 again
2022-03-21 behdad [ot-font] Only use vmtx side-bearing if table exists
2022-03-21 behdad [cmap] In collect_unicodes() of format 12/13, limit to max Unicode
2022-03-21 behdad [buffer] Fix out-buffer under memory-alloc failure
2022-03-22 khaled [set] Fix documentation
2022-03-21 corbett.dav [indic] Test clearing syllables earlier
2022-03-21 behdad [indic] Clear syllables before presentation features
2022-03-21 behdad [set] Fix-up previous commits
2022-03-21 andyj Move fn, fix doc.
2022-03-21 andyj Remove null checks.
2022-03-21 andyj Add option to insert a sorted arrays of values to sets.
2022-03-21 andyj Fix typo.
2022-03-21 andyj Use bit shifting instead of multiplying and dividing.
2022-03-21 andyj Add log base 2 versions of constants.
2022-03-21 behdad [coretext] Remove dead code
2022-03-19 62665768+TheBluuDot restores unintended addition in 43be5ba
2022-03-14 qxliu [subset] bug fix in prune_langsys
2022-03-15 behdad [baseline] Fix HB_NO_METRICS build
2022-03-15 behdad [layout] Whitespace
2022-03-15 behdad [metrics] Simplify x-height fallback
2022-03-15 behdad [baseline] Use ot-metrics fallback API

Created with:
  roll-dep src/third_party/harfbuzz-ng/src
R=bashi@chromium.org,behdad@chromium.org,bungeman@chromium.org,drott@chromium.org,jshin@chromium.org,kojii@chromium.org

Fixed: 1303552, 1310282
Change-Id: Ib60a8733b9d97d35c1f9a70bf14451d88cb6dcb1
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3550975
Reviewed-by: Koji Ishii <kojii@chromium.org>
Commit-Queue: Dominik Röttsches <drott@chromium.org>
Cr-Commit-Position: refs/heads/main@{#990962}
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
Contains fixes for https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38800 and
https://bugs.chromium.org/p/chromium/issues/detail?id=1303552,
see harfbuzz/harfbuzz@0308513

Earlier regression in Indic shaping addressed, see
harfbuzz/harfbuzz#3513

Includes fix for Apple Color Emoji shaping, see issue 1310282.

https://chromium.googlesource.com/external/github.com/harfbuzz/harfbuzz.git/+log/965cf1d66589..61486746d3d8

$ git log 965cf1d66..61486746d --date=short --no-merges --format='%ad %ae %s'
2022-03-28 behdad Revert "[indic] Clear syllables before presentation features"
2022-03-28 behdad Add test for previous commit
2022-03-28 behdad [matcher] Simplify syllable initialization
2022-03-28 behdad [aat] Remove morx deleted-glyphs before GPOS processing
2022-03-26 corbett.dav [indic] Categorize U+0D04 as Consonant_Placeholder
2022-03-23 behdad [use] Avoid O(n^2) in the machine
2022-03-24 behdad [apply-lookup] Try to fix the logic for contextual lookups
2022-03-25 behdad [subset] Require exact harfbuzz version in .pc file
2022-03-25 behdad [subset] Adjust name in .pc file
2022-03-25 khaled [set] Fix annotation
2022-03-25 behdad [gvar] Fix decoding of private vs shared points
2022-03-25 behdad [glyf] Don't bail rendering glyf even if gvar failed
2022-03-25 behdad [set] Minor touch-up on the previous commit
2022-03-25 behdad Remove accidental files
2022-03-25 45769922+andyjgf [set] Add call to export set contents to an array. (#3500)
2022-03-24 100675750+aneejit1  Meson build writes to the source directory (issue #3507 ) (#3508)
2022-03-24 behdad [ot-layout] Comment
2022-03-24 behdad [ot-layout] Change max nesting level of lookups from 6 to 64
2022-03-24 khaled [build] Change how platform shaper tests are enable
2022-03-24 fanchunwei test: Dist the platform shaper test data
2022-03-23 grieger [reorg] Use relative includes for hb-ot-layout-gsubgpos.hh
2022-03-23 grieger [reorg] Move GSUB into OT::Layout::GSUB namespace.
2022-03-23 grieger [reorg] Move SubstLookup and GSUB into the new layout.
2022-03-23 grieger [reorg] Move ReverseChainSingleSubst to new layout.
2022-03-23 grieger [reorg] Move LigatureSubst to new layout.
2022-03-23 grieger [reorg] Move AlternateSubst to new layout.
2022-03-23 grieger [reorg] Move MultipleSubst into new layout.
2022-01-26 behdad [reorg] Use relative include
2022-01-26 behdad [reorg] Move sanitize/dispatch and size macros to top
2022-01-20 grieger [reorg] Fix check-* scripts to work with sources files in directories.
2022-01-13 grieger [reorg] Move SingleSubst opentype fields to top of the classes.
2022-01-13 grieger [reorganization] WIP move single substitution into separate files.
2022-03-23 khaled 4.1.0
2022-03-22 behdad Remove old TODO file
2022-03-22 behdad [buffer] Whitespace
2022-03-21 behdad Update tests for recent changes
2022-03-21 behdad [ot-font] Vertically center glyph in vertical writing fallback
2022-03-21 behdad [ot-font] Use ascent+descent for fallback vertical advance
2022-03-21 behdad [hmtx] Change default advance for horizontal direction to upem/2 again
2022-03-21 behdad [ot-font] Only use vmtx side-bearing if table exists
2022-03-21 behdad [cmap] In collect_unicodes() of format 12/13, limit to max Unicode
2022-03-21 behdad [buffer] Fix out-buffer under memory-alloc failure
2022-03-22 khaled [set] Fix documentation
2022-03-21 corbett.dav [indic] Test clearing syllables earlier
2022-03-21 behdad [indic] Clear syllables before presentation features
2022-03-21 behdad [set] Fix-up previous commits
2022-03-21 andyj Move fn, fix doc.
2022-03-21 andyj Remove null checks.
2022-03-21 andyj Add option to insert a sorted arrays of values to sets.
2022-03-21 andyj Fix typo.
2022-03-21 andyj Use bit shifting instead of multiplying and dividing.
2022-03-21 andyj Add log base 2 versions of constants.
2022-03-21 behdad [coretext] Remove dead code
2022-03-19 62665768+TheBluuDot restores unintended addition in 43be5ba
2022-03-14 qxliu [subset] bug fix in prune_langsys
2022-03-15 behdad [baseline] Fix HB_NO_METRICS build
2022-03-15 behdad [layout] Whitespace
2022-03-15 behdad [metrics] Simplify x-height fallback
2022-03-15 behdad [baseline] Use ot-metrics fallback API

Created with:
  roll-dep src/third_party/harfbuzz-ng/src
R=bashi@chromium.org,behdad@chromium.org,bungeman@chromium.org,drott@chromium.org,jshin@chromium.org,kojii@chromium.org

Fixed: 1303552, 1310282
Change-Id: Ib60a8733b9d97d35c1f9a70bf14451d88cb6dcb1
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3550975
Reviewed-by: Koji Ishii <kojii@chromium.org>
Commit-Queue: Dominik Röttsches <drott@chromium.org>
Cr-Commit-Position: refs/heads/main@{#990962}
NOKEYCHECK=True
GitOrigin-RevId: 46ea5057059e84f5116567dfa46b2c0cea288b86
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants