New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cff2] Significantly slower than freetype #3559
Comments
This proof of concept patch speeds up CFF2 evaluation by around 8%: diff --git a/src/hb-cff-interp-common.hh b/src/hb-cff-interp-common.hh
index 2a9e60ffd..1f924d8b8 100644
--- a/src/hb-cff-interp-common.hh
+++ b/src/hb-cff-interp-common.hh
@@ -430,8 +430,8 @@ struct cff_stack_t
unsigned int get_count () const { return count; }
bool is_empty () const { return !count; }
- hb_array_t<const ELEM> get_subarray (unsigned int start) const
- { return hb_array_t<const ELEM> (elements).sub_array (start); }
+ hb_array_t<const ELEM> sub_array (unsigned start, unsigned length) const
+ { return hb_array_t<const ELEM> (elements).sub_array (start, length); }
private:
bool error;
diff --git a/src/hb-cff2-interp-cs.hh b/src/hb-cff2-interp-cs.hh
index 60a29e1ca..767f23d0c 100644
--- a/src/hb-cff2-interp-cs.hh
+++ b/src/hb-cff2-interp-cs.hh
@@ -151,6 +151,20 @@ struct cff2_cs_interp_env_t : cs_interp_env_t<blend_arg_t, CFF2Subrs>
void set_ivs (unsigned int ivs_) { ivs = ivs_; }
bool seen_vsindex () const { return seen_vsindex_; }
+ double blend_deltas (hb_array_t<const blend_arg_t> deltas)
+ {
+ double v = 0;
+ if (do_blend)
+ {
+ if (likely (scalars.length == deltas.length))
+ {
+ for (unsigned int i = 0; i < scalars.length; i++)
+ v += (double) scalars[i] * deltas[i].to_real ();
+ }
+ }
+ return v;
+ }
+
protected:
void blend_arg (blend_arg_t &arg)
{
@@ -234,8 +248,10 @@ struct cff2_cs_opset_t : cs_opset_t<blend_arg_t, OPSET, cff2_cs_interp_env_t, PA
}
for (unsigned int i = 0; i < n; i++)
{
- const hb_array_t<const blend_arg_t> blends = env.argStack.get_subarray (start + n + (i * k));
- env.argStack[start + i].set_blends (n, i, k, blends);
+ //const hb_array_t<const blend_arg_t> blends = env.argStack.subarray (start + n + (i * k));
+ //xxxxxxxxxxxxxxxxx
+ env.argStack[start + i].set_real (env.argStack[start + i].to_real () +
+ env.blend_deltas (env.argStack.sub_array (start + n + (i * k), k)));
}
/* pop off blend values leaving default values now adorned with blend values */ We might need to duplicate this type, as it seems like the subsetter relies on the old behavior. |
behdad
added a commit
that referenced
this issue
May 10, 2022
Do the blending immediately. Fixes #3559 Benchmark on AdobeVFPrototype shows 35% speedup. Now we're faster than FreeType: Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------ BM_Font/glyph_extents/AdobeVFPrototype.otf/hb -0.3792 -0.3792 1584 983 1581 982 BM_Font/glyph_extents/AdobeVFPrototype.otf/ft +0.0228 +0.0224 1220 1248 1218 1245 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/hb -0.3513 -0.3518 1616 1048 1613 1046 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/ft +0.0172 +0.0169 1232 1254 1230 1251
behdad
added a commit
that referenced
this issue
May 10, 2022
Do the blending immediately. Fixes #3559 Benchmark on AdobeVFPrototype shows 35% speedup. Now we're faster than FreeType: Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------ BM_Font/glyph_extents/AdobeVFPrototype.otf/hb -0.3792 -0.3792 1584 983 1581 982 BM_Font/glyph_extents/AdobeVFPrototype.otf/ft +0.0228 +0.0224 1220 1248 1218 1245 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/hb -0.3513 -0.3518 1616 1048 1613 1046 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/ft +0.0172 +0.0169 1232 1254 1230 1251
behdad
added a commit
that referenced
this issue
May 10, 2022
Do the blending immediately. Fixes #3559 Benchmark on AdobeVFPrototype shows 35% speedup. Now we're faster than FreeType: Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------ BM_Font/glyph_extents/AdobeVFPrototype.otf/hb -0.3792 -0.3792 1584 983 1581 982 BM_Font/glyph_extents/AdobeVFPrototype.otf/ft +0.0228 +0.0224 1220 1248 1218 1245 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/hb -0.3513 -0.3518 1616 1048 1613 1046 BM_Font/glyph_extents/AdobeVFPrototype.otf/var/ft +0.0172 +0.0169 1232 1254 1230 1251
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Our perf suite shows that our interpretation performance for variable CFF2 fonts is about 50% slower than freetype, even for the default master. I took a look, and I believe it's because of this: when interpretting
blend
operator, instead of just computing the blended value and storing that on the stack, we allocate a vector and store blend arguments in a vector, only to compute it later and free the vector. That amounts to a malloc/free per coordinate x/y for a variable font!!!I'm not sure why this code was written that way, but that's completely unnecessary and unfortunate. The code involved is
hb-cff2-interp-cs.hh::set_blends()
.The text was updated successfully, but these errors were encountered: