
GPOS: feature writers should split lookups based on language system #619

Closed
cmyr opened this issue Dec 1, 2023 · 4 comments

Comments

@cmyr
Member

cmyr commented Dec 1, 2023

This is a significant project, and includes splitting based on writing direction. A good place to start will be kernFeatureWriter2.py.

@cmyr cmyr added this to the matching Oswald GPOS (mark/kern) milestone Dec 1, 2023
@rsheeter rsheeter modified the milestones: matching Oswald GPOS (mark/kern), Compile Oswald and compare to fontmake Dec 4, 2023
@belluzj

belluzj commented Jan 25, 2024

Hello, here I think you're first talking about splitting by direction (LTR vs RTL), which is necessary.

As a second step, or while you're doing this, you might be tempted to also split into one lookup per script, as we're doing in recent ufo2ft. After discussing this issue several times with Cosimo, here are my thoughts: splitting scripts into different lookups was a compile-time performance and file-size optimization, and it's not the best one available, so I don't think it should be adopted here.

Pros of splitting into one lookup per script:

  • it happens early in the pipeline (as opposed to GPOS compaction near the end) so it saves processing time early by keeping small lookup sizes and avoiding overflow resolution
  • in terms of file-size savings, it's a very good approximation of the probable best split (we observed that splitting into one lookup per script is as good for file size as GPOS compaction, which doesn't know about scripts and instead splits based on clustering the numeric data in the table)
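The per-script split can be caricatured as follows. This is a hypothetical sketch, not ufo2ft's actual API: the `split_kerning_by_script` function and the `glyph_script` mapping are invented for illustration.

```python
# Hypothetical sketch of per-script lookup splitting; names and the
# glyph-to-script mapping are illustrative, not ufo2ft's actual API.
from collections import defaultdict

def split_kerning_by_script(pairs, glyph_script):
    """Bucket (left, right, value) kern pairs into one lookup per script.

    Pairs whose two sides resolve to different concrete scripts are
    cross-script pairs; this naive split has no lookup to put them in,
    which is exactly the regression discussed below.
    """
    lookups = defaultdict(list)
    dropped = []
    for left, right, value in pairs:
        ls = glyph_script.get(left, "DFLT")
        rs = glyph_script.get(right, "DFLT")
        if ls == rs or "DFLT" in (ls, rs):
            # a DFLT side (e.g. punctuation) follows the concrete script
            script = ls if ls != "DFLT" else rs
            lookups[script].append((left, right, value))
        else:
            dropped.append((left, right, value))  # cross-script pair, lost
    return dict(lookups), dropped
```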

Con: It's not functionally equivalent to the old way of having just one big lookup for all LTR kerning, and another big lookup for all RTL kerning. The functional differences are minor but they are regressions:

  • it prevents cross-script kerning: Khaled implemented a fix here, but the fix is to go back to grouping scripts together, so I believe it loses the performance and file size benefits, in favour of correctness
  • it causes regressions in Adobe InDesign (one and two) which arguably are Adobe's fault, but unfortunately will matter to customers. I tried to fix these by registering the lookups everywhere, and while that preserves the compile-time performance and small file size, it creates a runtime performance issue, as profiled by Behdad, so it's not good either.

Proposed solution: instead of splitting into lookups, consider making one big lookup and splitting it into subtables. As done in the GPOS compaction, it's possible to split into subtables while preserving functional equivalence, and while driving down the file size. So in that respect, subtables are the best tool to split.

Making one big lookup will allow cross-script kerning, which turns out to be desirable, and will allow the Adobe InDesign dumb composer to keep working.
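The equivalence-preserving split can be sketched like this (a toy model, not fontTools' actual compaction code): if each subtable covers a disjoint set of first glyphs, every pair is handled by exactly one subtable, so the split lookup positions identically to the single big one.

```python
# Toy illustration of equivalence-preserving subtable splitting: partition
# the pairs by first glyph so subtable coverages are disjoint. The function
# name is hypothetical, not the fontTools API.

def split_by_first_glyph(kerning, buckets):
    """kerning: {(left, right): value}; buckets: disjoint sets of left glyphs."""
    subtables = []
    for bucket in buckets:
        sub = {pair: v for pair, v in kerning.items() if pair[0] in bucket}
        if sub:
            subtables.append(sub)
    return subtables
```

Because the coverages are disjoint, subtable order inside the lookup doesn't matter, unlike splits where coverages overlap.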

I'm not sure, however, which criterion is best for splitting into subtables: by script, or by clustering as in the GPOS compaction. Maybe in fontc you don't have the same constraints as in fontTools, and you could apply the GPOS compaction on the IR directly, skipping the cost of overflow resolution (assuming your IR supports bigger offsets than TTF does, which could be nice anyway: it would let big data pass at no cost from one step to the next, even if that data would require overflow resolution to be serialized to TTF).

Sorry about the long comment. FYI @anthrotype

@cmyr
Member Author

cmyr commented Jan 25, 2024

Okay that is very useful @belluzj, thank you for taking the time to spell that all out.

Doing one lookup per writing direction sounds reasonable, and we can use some heuristics to add subtable splits so as to minimize the chances that we're going to overflow at compile time. I also suspect that we want to minimize the number of subtables, since the shaper will potentially need to inspect each individual subtable, and that has runtime costs.
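One such heuristic could look like the sketch below. The per-pair byte count and header size are rough assumed figures (the real PairPos encoding is class-based and more compact), so this deliberately over-approximates to stay clear of the limit.

```python
# Sketch of a pre-emptive split heuristic: offsets within a GPOS lookup
# are 16 bits, so cap each subtable's estimated size below that limit.
# bytes_per_pair and header are rough assumptions, not exact OpenType sizes.

OFFSET_LIMIT = 0xFFFF  # 16-bit offsets within a GPOS lookup

def chunk_pairs(pairs, bytes_per_pair=10, header=64):
    """Split a flat pair list into chunks whose estimated size fits."""
    budget = OFFSET_LIMIT - header
    per_chunk = max(1, budget // bytes_per_pair)
    return [pairs[i:i + per_chunk] for i in range(0, len(pairs), per_chunk)]
```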

@belluzj

belluzj commented Jan 25, 2024

Yes exactly, you want to find the right middle ground between one big subtable and the other extreme, one tiny subtable per row of the original kerning table. The code in this PR: fonttools/fonttools#2326 finds that middle ground by starting with one tiny subtable per row and agglomerating them into bigger subtables as long as the file size goes down. It does not take shaping speed into account, only file size; I think at the time we found that shaping speed was not much affected.
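The agglomeration idea can be sketched as below. The cost model is a made-up stand-in, loosely imitating a PairPosFormat2-style class matrix (a header plus one 2-byte cell per left-by-right combination), not the real size estimate used in that PR.

```python
# Greedy agglomeration sketch: start with one subtable per kerning row
# and merge neighbours while the estimated total size strictly shrinks.
# The size model is a toy (header + one 2-byte cell per left x right
# combination), not fontTools' real cost function.

def est_size(subtable, header=8, cell=2):
    lefts = {l for l, _ in subtable}
    rights = {r for _, r in subtable}
    return header + cell * len(lefts) * len(rights)

def agglomerate(rows):
    """rows: list of ((left, right), value) kerning entries."""
    subtables = [dict([row]) for row in rows]
    merged = True
    while merged and len(subtables) > 1:
        merged = False
        for i in range(len(subtables) - 1):
            combined = {**subtables[i], **subtables[i + 1]}
            if est_size(combined) < est_size(subtables[i]) + est_size(subtables[i + 1]):
                subtables[i:i + 2] = [combined]
                merged = True
                break
    return subtables
```

With this toy model, rows that share left or right glyphs coalesce (the shared matrix amortizes the header), while an unrelated cluster stays in its own subtable once merging would inflate the matrix more than it saves in headers.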

@cmyr
Member Author

cmyr commented Mar 18, 2024

closed by #731

@cmyr cmyr closed this as completed Mar 18, 2024