Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hb-subset] Failure to subset NotoSansCJK-Regular.ttc #3173

Closed
madig opened this issue Aug 26, 2021 · 10 comments · Fixed by #3177
Closed

[hb-subset] Failure to subset NotoSansCJK-Regular.ttc #3173

madig opened this issue Aug 26, 2021 · 10 comments · Fixed by #3177
Assignees
Labels
subset hb-subset related bugs

Comments

@madig
Copy link
Contributor

madig commented Aug 26, 2021

HarfBuzz 2.9.0 from the provided Windows binaries.

  1. Get https://github.com/googlefonts/noto-cjk/blob/165c01b46ea533872e002e0785ff17e44f6d97d8/Sans/OTC/NotoSansCJK-Regular.ttc
  2. hb-subset.exe --unicodes-file subset.txt --layout-features="*" NotoSansCJK-Regular.ttc --face-index 0 -o out.ttf
  3. Observe hb-subset exits with error code 1 and leaves a 0 bytes out.ttf.
  4. Remove last Unicode value from subset.txt and try again
  5. See it working this time?!

Some times it works with the last value, too, for some reason?!?! If you put the value as the only one into a new txt file and use it, hb-subset also finishes properly...

subset.txt

@khaledhosny khaledhosny added the subset hb-subset related bugs label Aug 26, 2021
@garretrieger
Copy link
Collaborator

I suspect this is failing due to an overflow writing out the cmap4 subtable. That's why removing certain characters from the subset allows it to work. I'll confirm if that's actually the case.

@garretrieger
Copy link
Collaborator

Alright, this is indeed a cmap4 overflow.

@garretrieger
Copy link
Collaborator

To explain the cmap4 subtable can be at most 65kb long since the "length" field is a uint16. When subsetting it can actually increase the size of the cmap4 table since the table is range based and the subset can fragment the ranges. If the new size exceeds 65kb the subset fails as there's no way the table can be serialized. I'll check with Behdad and see if there's a reasonable fallback strategy we could use, such as outputting only a cmap12 subtable.

@behdad
Copy link
Member

behdad commented Aug 26, 2021

Outputting only cmap12 SGTM. We should make that default sooner than later (outputting just one).

@madig
Copy link
Contributor Author

madig commented Aug 26, 2021

Random thought: Should hb-subset trap overflows and explode, even in release builds?

@garretrieger
Copy link
Collaborator

It does, that's what caused the subset command to return status code 1 instead of 0 in this case. Though we could certainly improve the debug messaging to better indicate why it failed.

@behdad
Copy link
Member

behdad commented Aug 26, 2021

What would be better? Crash or generate broken font?

@madig
Copy link
Contributor Author

madig commented Aug 26, 2021

Ah, ok. I was puzzled because there was no message and my shell by default does not show return codes.

Crash is better of course :)

@garretrieger
Copy link
Collaborator

I noticed that this subsetting call was quite slow, found some O(n^2) code in the cmap4 handling. Fix is in #3178

@behdad
Copy link
Member

behdad commented Aug 29, 2021

And I made it print an error message if operation failed, since we don't expect that to happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
subset hb-subset related bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants