Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero offsets in parse_at_offset16 #110

Closed
laurmaedje opened this issue Jan 9, 2023 · 17 comments
Closed

Zero offsets in parse_at_offset16 #110

laurmaedje opened this issue Jan 9, 2023 · 17 comments

Comments

@laurmaedje
Copy link
Contributor

The parse_at_offset16 method currently does not check for NULL offsets, leading to bugs where 0 indicates the absence of a subtable. (I experienced the bug in the math table.) Not all OpenType offsets allow 0, but I don't think 0 is ever a valid offset, so I assume it should be fine to just always return None for zero offsets in parse_at_offset16. If you agree, I can send a PR with fix.

@RazrFalcon
Copy link
Owner

so I assume it should be fine to just always return None for zero offsets

I would not assume anything in case of TrueType.

As you correctly noted, not all offsets are allowed to be NULL and for now you have to check them manually.
I would suggest adding parse_at_non_null_offset16 and use it only where the spec allows NULL. Because technically, a NULL value set for an offset that doesn't allow NULL is a hard error and we have to skip the whole table.

In other parts of the codebase I specifically use s.read::<Option<Offset16>>() for this. Ugly, but this is how it is.

@laurmaedje
Copy link
Contributor Author

From the spec:

A NULL subtable offset always indicates that the given subtable is not present. Applications should never interpret a NULL offset value as the offset to subtable data. For cases in which subtable offset fields are not documented as permitting NULL values, font compilers must include a subtable of the indicated format, even if it is a header stub without further data (for example, a coverage table with no glyph IDs). Applications parsing font data should, however, anticipate non-conformant font data that has a NULL subtable offset where only a non-NULL value is expected.

Adding a new parse_at_non_null_offset16 and keeping the current parse_at_offset16 as-is conflicts with "applications should never interpret a NULL offset value as the offset to subtable data", doesn't it? And if we adapt parse_at_offset16 to conform with this, what would be the difference between the two methods?

@RazrFalcon
Copy link
Owner

Can you give a link to this doc? I'm not sure if it's talking about the specific table or all of them.

Ok, my bad. I got confused with parse_at_offset16 and read_at_offset16. The first one isn't used anywhere except for MATH. But I guess you're proposing to change read_at_offset16 as well?
This would be a breaking change with unknown consequences.

@laurmaedje
Copy link
Contributor Author

In the data types sections: https://learn.microsoft.com/en-us/typography/opentype/spec/otff
Yes, that would have been my proposal.

@RazrFalcon
Copy link
Owner

harfbuzz has a clear separation between nullable and non-nullable offsets: link

Sadly, I can't read CRTP, so I'm not sure how exactly they handle NULL for non-NULL. But we should do the same.
Maybe you will be able to unravel it.

Also, this is OpenType spec. Apple can do whatever they want with their fonts, aka AAT.

@laurmaedje
Copy link
Contributor Author

I'm also not able to parse that with confidence. From these lines, it seems that Harfbuzz produces null-pointers for non-nullable offsets (maybe this is never reached due to sanitization) and empty nulled objects for nullable offsets (which are reached). This makes sense insofar as that the GlyphAssembly table, whose offset may be null, is accessed as if it was always present. But I also can't find code in the sanitization that does that.

@RazrFalcon
Copy link
Owner

@behdad Hi! Sorry for bothering you, but maybe you could clarify how NNOffsetNTo works in harfbuzz? Specifically, do you mark just the offset as missing/null, or do you mark the whole table as invalid? I've tried reading the code, but it's too advance for me.
And maybe you have some other insight regarding offsets handling in TrueType.

@behdad
Copy link

behdad commented Jan 9, 2023

NNOffsetTo

That's something else. That's used for a few places (mostly AAT tables and a couple of places in CFF) where a 0 offset is allowed.

In other places we always use OffsetTo. Even when the spec says a NULL offset is disallowed, we always allow it, and return a null object.

@behdad
Copy link

behdad commented Jan 9, 2023

In other places we always use OffsetTo. Even when the spec says a NULL offset is disallowed, we always allow it, and return a null object.

Ie. we never treat those as hard error.

@RazrFalcon
Copy link
Owner

Thanks a lot!

@laurmaedje I guess we can invert the default behavior then. And those methods are not used in AAT anyway, at least for now.

@laurmaedje
Copy link
Contributor Author

To summarize:

  • read_at_offset16 is used across the OpenType tables, but not for AAT
  • parse_at_offset16 is only used for MATH, but could be used for lots of OpenType parsing
  • read_at_offset32 is only used in AAT, and only twice

Do you want to only change parse_at_offset16 or also read_at_offset16? Changing read_at_offset32 breaks some test for the ankr table and the Apple TrueType spec talks about how 0 is a valid value.

@RazrFalcon
Copy link
Owner

RazrFalcon commented Jan 9, 2023

I would suggest to:

  1. read_at_offset16 should handle 0 now.
  2. parse_at_offset16 should be moved to MATH as a private Stream extension used only by MATH.
  3. read_at_offset32 should be moved toankr as a private Stream extension used only by ankr.

After updating everything, try running rustybuzz tests with those changes.

@laurmaedje
Copy link
Contributor Author

Okay, I tried that. 786 rustybuzz tests passed, but a single one failed. :(

thread 'use_010' panicked at 'assertion failed: `(left == right)`
  left: `"Ga.icd=0+367|Gha.diag=0@100,0+386|O_dv.lg=0+190|Candrabindu.sm=0@48,32+0|Ga.icd=5+367|Gha.diag=5@100,0+386|AU_dv.lg=5+190|Candrabindu.sm=5@-64,187+0"`,
 right: `"E_dv.alt=0+275|Ga.icd=0+367|Gha.diag=0@100,0+386|AA_dv.alt=0+208|Candrabindu=0@17,-8+0|E_dv.alt=5+275|Ga.icd=5+367|Gha.diag=5@100,0+386|AU_dv_part.alt=5+213|Candrabindu.sm=5@-52,179+0"`', tests/shaping_in_house.rs:10712:5

I would have to take a closer look to see what's going on here as the assertion failure message isn't particularly helpful.

PS: Math doesn't really work differently from the rest of OpenType and both could make use of the parse_at_offset16 in the same way. Currently OpenType calls Coverage::parse(s.read_at_offset16(data)?)? manually, while MATH calls s.parse_at_offset16::<Coverage>(data)?

@RazrFalcon
Copy link
Owner

rustybuzz has about 1600 tests, I guess cargo stoped earlier.

Anyway, as I've said before - there are no easy changes in TrueType.
I will take a look what might be wrong here.

@RazrFalcon
Copy link
Owner

Can you publish your current changes so I could run them locally?

@laurmaedje
Copy link
Contributor Author

Sure: https://github.com/laurmaedje/ttf-parser
Thanks for looking into it!

@RazrFalcon
Copy link
Owner

Ok, so in case of use_010, ChainedContextLookup allows nulls. As per spec.

Specifically, in that font format2.backtrack_classes is set to 0, which is apparently just fine.

So instead of changing the old behavior, you could add a custom one just for MATH. Otherwise we should spend some time carefully checking all read_at_offset16 calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants