USE shaping with ZWJ/ZWNJ #542

@punchcutter

The USE spec isn't entirely clear on how ZWJ and ZWNJ behave as part of a cluster, since they aren't included in the cluster validation models. Including them in a sequence can have unexpected results.

The situation that revealed this was the Buginese ligature U+1A15 U+1A17 U+200D U+1A10, as mentioned in the Unicode Standard. The USE shaper first splits the sequence into USE clusters, which don't include ZWJ, so this becomes three clusters instead of one. Since ccmp is one of the features applied per cluster rather than per run, the ligature never forms. Move it to a liga feature and it's fine, because that is applied per run.
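To make the per-cluster versus per-run difference concrete, here is a minimal toy sketch in Python. It is not HarfBuzz's or any other shaper's actual code: `split_clusters`, `apply_ligature`, and the glyph name `iya_ligature` are hypothetical, and the splitter simply assumes, as described above, that ZWJ always ends up in a cluster of its own.

```python
ZWJ = 0x200D

def split_clusters(codepoints):
    """Toy splitter (assumption): ZWJ closes the current cluster and stands alone."""
    clusters, current = [], []
    for cp in codepoints:
        if cp == ZWJ:
            if current:
                clusters.append(current)
                current = []
            clusters.append([cp])
        else:
            current.append(cp)
    if current:
        clusters.append(current)
    return clusters

def apply_ligature(seq, pattern, lig):
    """Replace every occurrence of `pattern` in `seq` with the single item `lig`."""
    out, i, n = [], 0, len(pattern)
    while i < len(seq):
        if seq[i:i + n] == pattern:
            out.append(lig)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out

def shape_per_cluster(codepoints, pattern, lig):
    """Apply the rule inside each cluster separately (the ccmp-like case)."""
    out = []
    for cluster in split_clusters(codepoints):
        out.extend(apply_ligature(cluster, pattern, lig))
    return out

def shape_per_run(codepoints, pattern, lig):
    """Apply the rule across the whole run (the liga-like case)."""
    return apply_ligature(codepoints, pattern, lig)

SEQ = [0x1A15, 0x1A17, 0x200D, 0x1A10]
LIG = "iya_ligature"  # hypothetical glyph name

clusters = split_clusters(SEQ)
per_cluster = shape_per_cluster(SEQ, SEQ, LIG)
per_run = shape_per_run(SEQ, SEQ, LIG)
```

In this model the rule never sees all four code points at once in the per-cluster path, so the input comes back unchanged, while the per-run path matches the full sequence and produces the ligature.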

If ZWJ/ZWNJ are intended to be processed as part of a valid USE cluster, then something needs to change in HarfBuzz as well as in every other shaper I know of. It seems to be a rare situation, but here we had it, and looking at some Devanagari I also see half forms being produced by adding ZWJ to the end of a sequence. Currently that gets run through the Indic shaper, so there's no problem, but if any script gets run through USE and tries to use ZWJ or ZWNJ in any of the per-cluster features (locl, ccmp, nukt, akhn, rphf, pref, rkrf, abvf, blwf, half, pstf, vatu, cjct), then things might not work as expected.
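As a rough way to spot rules that could be affected, the following Python sketch flags substitution rules that sit in one of the per-cluster features listed above and mention ZWJ or ZWNJ in their input. The rule representation (a list of `(feature_tag, input_codepoints)` pairs) and the function name `risky_rules` are assumptions made for illustration; a real font would need its GSUB table parsed into this shape first.

```python
# Per-cluster features as listed in this issue.
PER_CLUSTER_FEATURES = {
    "locl", "ccmp", "nukt", "akhn", "rphf", "pref", "rkrf",
    "abvf", "blwf", "half", "pstf", "vatu", "cjct",
}
JOINERS = {0x200C, 0x200D}  # ZWNJ, ZWJ

def risky_rules(rules):
    """Return the rules whose feature is per-cluster and whose input contains a joiner.

    `rules` is a hypothetical, simplified view of a font's substitutions:
    an iterable of (feature_tag, input_codepoints) pairs.
    """
    return [
        (tag, seq)
        for tag, seq in rules
        if tag in PER_CLUSTER_FEATURES and JOINERS & set(seq)
    ]

example_rules = [
    ("ccmp", [0x1A15, 0x1A17, 0x200D, 0x1A10]),  # per-cluster, uses ZWJ
    ("liga", [0x1A15, 0x1A17, 0x200D, 0x1A10]),  # per-run, not flagged
    ("half", [0x0915, 0x094D]),                  # per-cluster, no joiner
]
flagged = risky_rules(example_rules)
```

Only the ccmp rule is flagged here, matching the case described above; the same rule under liga is not, because liga is applied per run.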

Labels: USE (Universal Shaping Engine)
