Whitespace and USE dotted circles #3718

Open
simoncozens opened this issue Jul 13, 2022 · 6 comments

@simoncozens
Collaborator

6e059a4 introduced a surprising change. Before that, a mark following a space glyph got a dotted circle:

$ ./util/hb-shape NotoSansLimbu-Regular.ttf -u 1921,20,1921
[uni25CC=0+721|uni1921=0@-86,0+0|space=1+260|uni25CC=1+721|uni1921=1@-86,0+0]

Afterwards, no added dotted circle:

$ ./util/hb-shape NotoSansLimbu-Regular.ttf -u 1921,20,1921
[uni25CC=0+721|uni1921=0@-86,0+0|space=1+260|uni1921=1+0]

Is this by design? It seems strange to me that a space should constitute a cluster head.
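
To compare the two outputs above mechanically, here is a minimal Python sketch that parses the default bracketed `hb-shape` output and reports which clusters contain a dotted circle glyph. The helper names (`parse_hb_shape`, `dotted_circle_clusters`) are hypothetical, and the parser only handles the `glyph=cluster[@x,y]+advance` form seen in this issue, not every format `hb-shape` can emit.

```python
import re

# Assumption: each item looks like glyph=cluster[@x_off,y_off]+advance,
# as in the two outputs quoted in this issue.
ITEM = re.compile(
    r"^(?P<glyph>[^=]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)$"
)

def parse_hb_shape(line):
    """Return a list of (glyph_name, cluster) pairs from one output line."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed hb-shape line")
    items = []
    for part in body[1:-1].split("|"):
        m = ITEM.match(part)
        if m is None:
            raise ValueError(f"unrecognized item: {part!r}")
        items.append((m["glyph"], int(m["cluster"])))
    return items

def dotted_circle_clusters(line, glyph="uni25CC"):
    """Clusters in which the dotted circle glyph appears."""
    return [cluster for name, cluster in parse_hb_shape(line) if name == glyph]

before = "[uni25CC=0+721|uni1921=0@-86,0+0|space=1+260|uni25CC=1+721|uni1921=1@-86,0+0]"
after = "[uni25CC=0+721|uni1921=0@-86,0+0|space=1+260|uni1921=1+0]"

print(dotted_circle_clusters(before))
print(dotted_circle_clusters(after))
```

The glyph name `uni25CC` is specific to this font; another font may name its dotted circle glyph differently.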

@behdad
Member

behdad commented Jul 13, 2022

> Is this by design?

I don't think so.

@behdad
Member

behdad commented Jul 13, 2022

Oops. That's because:

export O = 0; # OTHER

and:

symbol_cluster = (O | GB) tail?; 

@dscorbett I think the O there is wrong. WDYT?

@behdad
Member

behdad commented Jul 13, 2022

The spec says that instead of O, there should be:

UGC = So except U+25CC;

UGC = Sc
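
As a rough illustration of what that definition would exclude, the following Python sketch checks Unicode general categories with the standard `unicodedata` module. The function name `is_spec_ugc` is hypothetical and models only the two UGC lines quoted above, not the rest of the spec or HarfBuzz's actual category tables.

```python
import unicodedata

def is_spec_ugc(ch):
    """True if ch is So (other than U+25CC) or Sc, per the lines quoted above."""
    cat = unicodedata.category(ch)
    if cat == "So":
        return ch != "\u25cc"
    return cat == "Sc"

for ch in [" ", "\u00a0", "$", "\u00a9", "\u25cc", "\u1921"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_spec_ugc(ch)}")
```

U+0020 has general category Zs, so under this reading it would not qualify as a symbol cluster base, whereas the catch-all O class accepts it.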

@dscorbett
Collaborator

Using O there does indeed go against the spec. See #3249 for why I disagree with the spec.

Unicode recommends U+00A0 as the base for isolated combining marks. It used to recommend U+0020 but explicitly no longer does. It doesn’t say that U+0020 is invalid, just that there might be “potential conflicts” and “complications”.

One could make the case that HarfBuzz should interpret this discouragement as a prohibition, thus restoring the dotted circle. The simplest implementation is to override U+0020 to the class WJ. For other characters, though, I think dotted circles should only be added to text that is actually invalid, not merely strange.
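
The effect of such an override can be sketched with a toy model in Python. This is not HarfBuzz's implementation: the class names are borrowed from the thread, but `classify`, `insert_dotted_circles`, and the `space_as_wj` flag are hypothetical, and the model assumes (as the comment above implies) that a WJ character cannot serve as the base of a cluster while O and GB can.

```python
import unicodedata

DOTTED_CIRCLE = 0x25CC

def classify(cp, space_as_wj):
    """Toy classes: M (mark), GB (dotted circle), WJ, or O (everything else)."""
    if unicodedata.category(chr(cp)).startswith("M"):
        return "M"
    if cp == DOTTED_CIRCLE:
        return "GB"
    if cp == 0x0020 and space_as_wj:
        return "WJ"  # the proposed override for U+0020
    return "O"

def insert_dotted_circles(cps, space_as_wj):
    """Insert U+25CC before any mark that has no base or mark before it."""
    out = []
    prev = None
    for cp in cps:
        cls = classify(cp, space_as_wj)
        if cls == "M" and prev not in ("O", "GB", "M"):
            out.append(DOTTED_CIRCLE)
        out.append(cp)
        prev = cls
    return out

text = [0x1921, 0x0020, 0x1921]
print([f"{cp:04X}" for cp in insert_dotted_circles(text, space_as_wj=False)])
print([f"{cp:04X}" for cp in insert_dotted_circles(text, space_as_wj=True)])
```

With the flag off, the space acts as a base and only the leading mark gets a dotted circle; with it on, both marks do, which mirrors the two `hb-shape` outputs at the top of this issue.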

@behdad
Member

behdad commented Jul 18, 2022

I'm fine with separating U+0020 only.

Should we do the same in other syllabic shapers as well?

@dscorbett
Collaborator

It’s best to keep them consistent.
