Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

an odd HB_GLYPH_FLAG_UNSAFE_TO_BREAK case #3824

Closed
caolanm opened this issue Sep 21, 2022 · 15 comments · Fixed by #3874
Closed

an odd HB_GLYPH_FLAG_UNSAFE_TO_BREAK case #3824

caolanm opened this issue Sep 21, 2022 · 15 comments · Fixed by #3874

Comments

@caolanm
Copy link

caolanm commented Sep 21, 2022

In LibreOffice we're making use of HB_GLYPH_FLAG_UNSAFE_TO_BREAK to cache layout results but have found an odd edge case in one test document (of 250,000) which I've attempted to boil down to the test case here where on verifying that we could reuse the old layout I see different results on shaping one side of the original string which had no HB_GLYPH_FLAG_UNSAFE_TO_BREAK flags.

The output I get is:

full sequence: advance is 1235
full sequence: advance is 460
is unsafe to break set on any glyph? no, (expect no)
shape again without space at edge, expect same results as previous iteration
sub sequence: advance is 1135
sub sequence: advance is 460

which I expect is:

full sequence: advance is 1235
full sequence: advance is 460
is unsafe to break set on any glyph? no, (expect no)
shape again without space at edge, expect same results as previous iteration
sub sequence: advance is 1235
sub sequence: advance is 460

harfbuzz-demo.zip

@khaledhosny
Copy link
Collaborator

khaledhosny commented Sep 21, 2022

I’m not sure I follow what the C code is comparing, but if I try to shape the text with hb-shape, I get the same glyph advances:

$ hb-shape XBRoya34.ttf -u "20,D8,A7,D9,84,D8,A3" --direction=r --script=arab
[sterling=6+1134|Oslash=5+1588|.notdef=4+1239|Ugrave=3+1474|section=2+1134|Oslash=1+1588|space=0+600]
$ hb-shape XBRoya34.ttf -u "D8,A7,D9,84,D8,A3" --direction=r --script=arab
[sterling=5+1134|Oslash=4+1588|.notdef=3+1239|Ugrave=2+1474|section=1+1134|Oslash=0+1588]
$ hb-shape XBRoya34.ttf -u "20" --direction=r --script=arab
[space=0+600]

Note that HarfBuzz advances are relative not absolute positions, to measure the full with of a sequence of glyphs, you need to add the advances.

@khaledhosny
Copy link
Collaborator

Scratch that, I mis-decoded the UTF-8 string.

@khaledhosny
Copy link
Collaborator

khaledhosny commented Sep 21, 2022

More relevant output (I skipped the morx output for brevity):

$ hb-shape XBRoya34.ttf " الأ" --direction=r --script=arab --language=ar --show-flags -V
trace: start table GPOS	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
trace: start lookup 0	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
trace: kerning glyphs at 0,1	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
trace: kerned glyphs at 0,1	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
trace: end lookup 0	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
trace: end table GPOS	buffer: [space=0+600|afii57415=1+460|uniFEF7=2+1235]
[uniFEF7=2+1235|afii57415=1+460|space=0+600]

$ hb-shape XBRoya34.ttf "الأ" --direction=r --script=arab --language=ar --show-flags  -V
trace: start table GPOS	buffer: [afii57415=0+460|uniFEF7=1+1235]
trace: start lookup 0	buffer: [afii57415=0+460|uniFEF7=1+1235]
trace: kerning glyphs at 0,1	buffer: [afii57415=0+460|uniFEF7=1+1235]
trace: kerned glyphs at 0,1	buffer: [afii57415=0+460|uniFEF7=1+1135]
trace: end lookup 0	buffer: [afii57415=0+460|uniFEF7=1+1135#3]
trace: end table GPOS	buffer: [afii57415=0+460|uniFEF7=1+1135#3]
[uniFEF7=1+1135#1|afii57415=0+460]

So there is a GPOS lookup that gets applied only when the string does not start with space, so that is why the substring has different glyph advances.

@behdad
Copy link
Member

behdad commented Sep 21, 2022

there is a GPOS lookup that gets applied only when the string does not start with space

That should have resulted in a unsafe-to-break somewhere... I'll debug. Thanks.

@behdad
Copy link
Member

behdad commented Sep 21, 2022

there is a GPOS lookup that gets applied only when the string does not start with space

It looks like a simple case of PairPos table with non-zero ValueFormat2, which causes kerning to only be applied to every other glyph. If you add a second space char at the beginning then the kern is correctly applied again...

@behdad
Copy link
Member

behdad commented Sep 21, 2022

From the spec:

"valueFormat2 applies to the ValueRecords for the second glyph in each pair. The single ValueFormat field applies to ValueRecords for all second glyphs. If valueFormat2 is set to 0, then the ValueRecords for the second glyph of the pair will be empty, the second glyph is not repositioned, and it becomes the “next” glyph for which a lookup is performed."

@behdad
Copy link
Member

behdad commented Sep 21, 2022

@jfkthame do you have any input here?

@jfkthame
Copy link
Collaborator

I suppose when valueFormat2 is non-zero, we should record it as "unsafe" to break between the pair even if the values applied were zero?

@behdad
Copy link
Member

behdad commented Sep 21, 2022

I suppose when valueFormat2 is non-zero, we should record it as "unsafe" to break between the pair even if the values applied were zero?

The problem arises between the second glyph and the glyph after it, which are not tried for kerning now...

I'm inclined to go against the spec and always use second glyph as next glyph. The current behavior doesn't make sense. If a font kerns 'oo' and does that half-and-half using valueFormat2 non-zero, then a sequence of 'oooooooooooooo' will get every other pair kerned only.

@behdad
Copy link
Member

behdad commented Sep 21, 2022

Proposed WIP patch in #3825

@jfkthame
Copy link
Collaborator

Yeah, I'm pretty sure I remember seeing that sort of effect in a font. Whether it's OK to deviate here.... I'm not sure. I guess it'll be interesting to see if it affects any tests, for a start.

@behdad
Copy link
Member

behdad commented Sep 21, 2022

I guess it'll be interesting to see if it affects any tests, for a start.

It only broke two AOTS tests that specifically test this behavior.

@behdad
Copy link
Member

behdad commented Nov 11, 2022

I suppose when valueFormat2 is non-zero, we should record it as "unsafe" to break between the pair even if the values applied were zero?

The problem arises between the second glyph and the glyph after it, which are not tried for kerning now...

Since valueFormat2 of nonzero seems to be rare, I suppose we can mark unsafe-to-break between the second glyph and the glyph after that...

@behdad
Copy link
Member

behdad commented Nov 11, 2022

Since valueFormat2 of nonzero seems to be rare, I suppose we can mark unsafe-to-break between the second glyph and the glyph after that...

#3874

@behdad
Copy link
Member

behdad commented Nov 11, 2022

$ hb-shape XBRoya34.ttf " الأ" --direction=r --script=arab --language=ar --show-flags -V
[uniFEF7=2+1235|afii57415=1+460|space=0+600]

With the patch:

$ hb-shape XBRoya34.ttf " الأ" --direction=r --script=arab --language=ar --show-flags
[uniFEF7=2+1235#1|afii57415=1+460|space=0+600]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants