
Classdef cache #3636

Merged
merged 6 commits into from
Jun 7, 2022
Conversation

behdad
Member

@behdad behdad commented Jun 5, 2022

[layout] Use a cache for main input ClassDef of ChainContextLookups

I expected this to speed up NotoNastaliqUrdu, but it didn't make a dent.
I need to profile on Linux to understand why. But it's mostly because
the cache is only effective when we enter the lookup from the main
loop (apply_forward), whereas NotoNastaliq mostly recurses...
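The idea can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`ClassDef`, `ClassCache`), not HarfBuzz's actual implementation; the real `ClassDef.get_class` does a binary search over class ranges, which the cache short-circuits for repeated glyphs:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a font's ClassDef table.  Here the class is just
// glyph % 4; in a real font get_class() is a range binary search.
struct ClassDef {
  unsigned get_class (uint32_t glyph) const { return glyph % 4; }
};

// Direct-mapped per-lookup cache of glyph -> class results.
struct ClassCache {
  static constexpr unsigned SIZE = 256;
  uint32_t keys[SIZE];
  uint8_t vals[SIZE];

  void clear () { memset (keys, 0xFF, sizeof keys); } // no valid keys

  unsigned get (const ClassDef &cd, uint32_t glyph) {
    unsigned slot = glyph & (SIZE - 1);          // direct-mapped slot
    if (keys[slot] == glyph) return vals[slot];  // cache hit
    unsigned klass = cd.get_class (glyph);       // miss: do the real lookup
    keys[slot] = glyph;
    vals[slot] = (uint8_t) klass;
    return klass;
  }
};
```

A cache like this only pays off when the same lookup keeps seeing the same glyphs, which is why entering from the main loop matters: recursion resets the locality the cache relies on.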

@behdad behdad self-assigned this Jun 5, 2022
@behdad behdad force-pushed the classdef-cache branch 2 times, most recently from 7f80147 to e374f32 Compare June 5, 2022 10:17
@behdad
Member Author

behdad commented Jun 5, 2022

The cache has a hit rate of ~10% on NotoNastaliqUrdu shaping. So I was baffled that I'm not seeing any speedup whatsoever; well, I see maybe ~1% only. But that's probably because the ClassDef2 lookups are only ~20% of shaping. So this makes sense. :'-(
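The arithmetic here is Amdahl's law: if the cached lookups are only ~20% of shaping time and the hit rate is ~10%, even free cache hits bound the overall win at about 2%. A sketch of that bound, using the numbers from this comment (the function name and the zero-cost-hit assumption are mine, for illustration):

```cpp
#include <cassert>

// Amdahl-style speedup bound.
//   lookup_fraction: share of total time spent in ClassDef lookups (~0.20)
//   hit_rate:        share of those lookups served by the cache (~0.10)
//   hit_cost_ratio:  cost of a cache hit relative to a full lookup
static double expected_speedup (double lookup_fraction,
                                double hit_rate,
                                double hit_cost_ratio)
{
  double new_lookup_time =
      lookup_fraction * ((1.0 - hit_rate) + hit_rate * hit_cost_ratio);
  double new_total = (1.0 - lookup_fraction) + new_lookup_time;
  return 1.0 / new_total;  // overall speedup factor
}
```

With `expected_speedup (0.20, 0.10, 0.0)` the bound comes out around 2%, consistent with the ~1% measured; even a perfect cache (100% hit rate) would cap out at 1/0.8 = 1.25x.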

@behdad
Member Author

behdad commented Jun 5, 2022

I see a ~2 percent speedup and ~1 percent overhead, adding up to a net ~1 percent speedup...

@behdad
Member Author

behdad commented Jun 5, 2022

Phew! Finally some progress!

Looks like NotoNastaliqUrdu spends most of its time in GPOS ContextFormat2 (dot positioning, etc.), and I was only applying the cache to ChainContextFormat2. Applying the cache to ContextFormat2 as well shows a 16% speedup in NotoNastaliqUrdu. I observe a slowdown of less than 3% in Amiri from the overhead of the new infrastructure, and much less than that in the other tests.

~~Interestingly, the Gulzar font does a lot more lookups than NotoNastaliqUrdu, but is 7 times faster. Need to investigate when I have a profiler (Linux machine) around.~~

Update: I was testing wrong. Gulzar is indeed 20 times slower than NotoNastaliqUrdu. This branch speeds up Gulzar by only 5%, because Gulzar still does a LOT of its work using ChainContextFormat3. It seems Simon's Format 3 to Format 2 converter does not do a full job yet. cc @simoncozens

@behdad behdad force-pushed the classdef-cache branch 3 times, most recently from 107cd7a to f69d227 Compare June 5, 2022 13:06
@behdad
Member Author

behdad commented Jun 5, 2022

cc @jfkthame @drott

@simoncozens
Collaborator

simoncozens commented Jun 5, 2022

I didn't use the format 3 to format 2 converter in Gulzar. I should try it.

@behdad
Member Author

behdad commented Jun 7, 2022

> I observe a slowdown of less than 3% in Amiri as the overhead of new infrastructure

Upon further measurement, the slowdown is nothing like that; that one must have been noise. I don't see any measurable slowdown whatsoever.

…ormat2

This commit adds a per-lookup caching infrastructure to GSUB/GPOS, and
uses it to cache the input ClassDef.get_class value for (Chain)ContextLookupFormat2.

For fonts that make heavy use of class-based (Format 2) context matching, this shows
a good speedup. For NotoNastaliqUrdu, for example, I observe a 17% speedup.

Unfortunately not many other lookups can use a cache like this :(.

#3636
@behdad
Member Author

behdad commented Jun 7, 2022

This caching opportunity was first pointed out to me by @mhosken many years ago.

@behdad behdad merged commit 845279c into main Jun 7, 2022
@behdad behdad deleted the classdef-cache branch June 7, 2022 15:40
behdad added a commit that referenced this pull request Jun 7, 2022
[layout-cache] Cache lookahead, not input, classdef in ChainContextFormat2

From the commit:

+    /* For ChainContextFormat2 we cache the LookaheadClassDef instead of InputClassDef.
+     * The reason is that most heavy fonts want to identify a glyph in context and apply
+     * a lookup to it. In this scenario, the length of the input sequence is one, whereas
+     * the lookahead / backtrack are typically longer.  The one glyph in input sequence is
+     * looked-up below and no input glyph is looked up in individual rules, whereas the
+     * lookahead and backtrack glyphs are tried.  Since we match lookahead before backtrack,
+     * we should cache lookahead.  This decision showed a 20% improvement in shaping of
+     * the Gulzar font.

#3636
@behdad
Member Author

behdad commented Jun 7, 2022

From last commit I pushed:

[layout-cache] Cache lookahead, not input, classdef in ChainContextFormat2

From the commit:

+    /* For ChainContextFormat2 we cache the LookaheadClassDef instead of InputClassDef.
+     * The reason is that most heavy fonts want to identify a glyph in context and apply
+     * a lookup to it. In this scenario, the length of the input sequence is one, whereas
+     * the lookahead / backtrack are typically longer.  The one glyph in input sequence is
+     * looked-up below and no input glyph is looked up in individual rules, whereas the
+     * lookahead and backtrack glyphs are tried.  Since we match lookahead before backtrack,
+     * we should cache lookahead.  This decision showed a 20% improvement in shaping of
+     * the Gulzar font.
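The rationale in the quoted comment can be made concrete by counting ClassDef queries in the common case. A sketch with hypothetical names and simplified counting (not HarfBuzz code): when the input sequence is one glyph, its class is looked up once to select the rule set, while the lookahead and backtrack classes are queried per candidate rule — so the lookahead cache absorbs far more queries:

```cpp
#include <cassert>

// Rough ClassDef query counts for a ChainContextFormat2 lookup,
// assuming `rules` candidate rules are tried against the context.
struct QueryCounts { unsigned input, lookahead, backtrack; };

static QueryCounts count_classdef_queries (unsigned rules,
                                           unsigned input_len,
                                           unsigned lookahead_len,
                                           unsigned backtrack_len)
{
  QueryCounts c;
  // First input glyph is classified once to pick the rule set;
  // rule matching re-queries only the remaining input glyphs.
  c.input = input_len + rules * (input_len - 1);
  // Lookahead is matched before backtrack, for every tried rule
  // (upper bound: backtrack is only reached if lookahead matches).
  c.lookahead = rules * lookahead_len;
  c.backtrack = rules * backtrack_len;
  return c;
}
```

With an input length of 1, a lookahead of 3, and 10 rules tried, the input side costs 1 query while the lookahead side costs 30 — which is the scenario the commit describes for Gulzar-style "identify a glyph in context" lookups.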
