Optimize out divisions in shaping at upem size with hb-ot-font #1801
I measured this a bit. It's easy enough to optimize
Actually, I take that back. The instruction count changes only in the 1% range, but by not stalling the pipeline (removing the division), instructions-per-cycle goes up from 3.00 to 3.22. That suggests a ~5% speedup might be measurable.
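A back-of-the-envelope check of that estimate, assuming the loop is CPU-bound so that run time tracks cycle count, and taking "the 1% range" to mean the instruction count moves by at most about 1% in either direction:

$$
\text{cycles} \approx \frac{N_{\text{instr}}}{\text{IPC}}, \qquad
\frac{\text{cycles}_{\text{new}}}{\text{cycles}_{\text{old}}}
= \frac{N_{\text{new}}}{N_{\text{old}}}\cdot\frac{\text{IPC}_{\text{old}}}{\text{IPC}_{\text{new}}}
\approx (1 \pm 0.01)\cdot\frac{3.00}{3.22} \approx 0.92 \text{ to } 0.94
$$

That is roughly 6 to 8 percent fewer cycles, which is consistent with a ~5% speedup being measurable.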
For the record, the current function is:

```cpp
static void
hb_ot_get_glyph_h_advances (hb_font_t* font, void* font_data,
                            unsigned count,
                            const hb_codepoint_t *first_glyph,
                            unsigned glyph_stride,
                            hb_position_t *first_advance,
                            unsigned advance_stride,
                            void *user_data HB_UNUSED)
{
  const hb_ot_face_t *ot_face = (const hb_ot_face_t *) font_data;
  const OT::hmtx_accelerator_t &hmtx = *ot_face->hmtx;
  for (unsigned int i = 0; i < count; i++)
  {
    /* One em_scale_x () call, and therefore one division, per glyph. */
    *first_advance = font->em_scale_x (hmtx.get_advance (*first_glyph, font));
    first_glyph = &StructAtOffsetUnaligned<hb_codepoint_t> (first_glyph, glyph_stride);
    first_advance = &StructAtOffsetUnaligned<hb_position_t> (first_advance, advance_stride);
  }
}
```

where:

```cpp
/* Member of hb_font_t; `face` is the font's hb_face_t. */
hb_position_t em_scale (int16_t v, int scale)
{
  int upem = face->get_upem ();
  int64_t scaled = v * (int64_t) scale;
  scaled += scaled >= 0 ? upem/2 : -upem/2; /* Round. */
  return (hb_position_t) (scaled / upem);   /* The division in question. */
}
```

The rounding: maybe do it without a branch? Finally, I don't know what the code-size tradeoff would be, but at least when
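To make the "without a branch" idea concrete, here is a sketch that rewrites the rounding step of the em_scale () excerpt as free functions taking `upem` explicitly (the signatures and names `em_scale_branchy`, `em_scale_branchless`, `check_rounding_variants` are hypothetical, for illustration only). The branchless version builds a sign mask of 0 or -1 and uses it to conditionally negate `upem / 2`. Note that compilers may already turn the original ternary into a conditional move, in which case this gains nothing.

```cpp
#include <cassert>
#include <cstdint>

// Reference: same rounding as the em_scale () excerpt (round half away from zero).
int32_t em_scale_branchy (int16_t v, int scale, int upem)
{
  int64_t scaled = v * (int64_t) scale;
  scaled += scaled >= 0 ? upem / 2 : -upem / 2;
  return (int32_t) (scaled / upem);
}

// Branchless variant (an assumption, not HarfBuzz code).
int32_t em_scale_branchless (int16_t v, int scale, int upem)
{
  int64_t scaled = v * (int64_t) scale;
  int64_t sign = -(int64_t) (scaled < 0);  // 0 if scaled >= 0, -1 (all bits set) otherwise
  int64_t half = upem / 2;
  scaled += (half ^ sign) - sign;          // +half when sign == 0, -half when sign == -1
  return (int32_t) (scaled / upem);
}

// Exhaustive comparison over every int16_t input for one scale/upem pair.
void check_rounding_variants (int scale, int upem)
{
  for (int v = INT16_MIN; v <= INT16_MAX; v++)
    assert (em_scale_branchy ((int16_t) v, scale, upem) ==
            em_scale_branchless ((int16_t) v, scale, upem));
}
```

The trick relies on `(h ^ -1) - (-1) == ~h + 1 == -h` in two's complement, so both variants compute the same value; only the code shape differs.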
Tried it. It didn't seem to matter.
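Another possibility is to move the division outside the loop, by first transforming `em_scale ()` into:

```cpp
hb_position_t em_scale (int16_t v, int scale)
{
  unsigned upem = face->get_upem ();
  /* scale / upem as a 16.16 fixed-point multiplier; only this step divides. */
  int64_t mult = ((int64_t) scale << 16) / (int) upem;
  /* Multiply, then drop the 16 fractional bits (truncates rather than rounds). */
  return (hb_position_t) ((v * mult) >> 16);
}
```

The compiler doesn't seem to figure out that it can move the division outside the loop, so it needs to be helped out.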
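Helping the compiler out could look like the following sketch, where the hypothetical helper `scale_advances` (not a HarfBuzz API) computes the 16.16 multiplier once per call and each glyph then costs only a multiply and a shift. It assumes the usual arithmetic right shift for negative products, and it keeps the proposal's truncating behavior, which can differ from the rounded em_scale () by one unit at small sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scale `count` advances given in font units to `scale` units per em,
// dividing once per call instead of once per glyph.
void scale_advances (const unsigned *advances, int32_t *out, std::size_t count,
                     int32_t scale, unsigned upem)
{
  // scale / upem in 16.16 fixed point, hoisted out of the loop.
  // Multiplying by 65536 (rather than << 16) keeps a negative scale well-defined.
  const int64_t mult = (int64_t) scale * 65536 / (int64_t) upem;
  for (std::size_t i = 0; i < count; i++)
    // Multiply and drop the 16 fractional bits: truncates, as in the proposal.
    out[i] = (int32_t) (((int64_t) advances[i] * mult) >> 16);
}

// Small self-check with values worked out by hand.
void check_scale_advances ()
{
  const unsigned adv[3] = {500, 1000, 100};
  int32_t out[3];
  scale_advances (adv, out, 3, 1024, 2048);  // half size: exact in 16.16
  assert (out[0] == 250 && out[1] == 500 && out[2] == 50);
  scale_advances (adv, out, 3, 12, 2048);    // tiny size: truncation shows up
  assert (out[1] == 5);                       // 1000 * 12 / 2048 = 5.86, rounded em_scale gives 6
}
```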
Hmm. Actually, we can make the font cache
If the font doesn't have variations set, we can call the version of `get_advance` that doesn't take a font, instead of:

```cpp
unsigned int get_advance (hb_codepoint_t glyph,
                          hb_font_t *font) const
{
  unsigned int advance = get_advance (glyph);
  if (likely (glyph < num_metrics))
  {
    /* Variation delta is only added when the font has coordinates set. */
    advance += (font->num_coords ? var_table->get_advance_var (glyph, font->coords, font->num_coords) : 0); // TODO Optimize?!
  }
  return advance;
}
```

But then we are approaching a combinatorial explosion of the different branches.
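To illustrate that concern, here is a sketch (hypothetical `Metrics`, `get_advances_impl`, `get_advances`; not HarfBuzz types) that hoists the "has variations" check out of the per-glyph loop by specializing the loop on a template `bool`. It assumes glyph indices are in range, omitting the `glyph < num_metrics` guard of the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for hmtx advances and per-glyph variation deltas.
struct Metrics
{
  std::vector<int> advances;  // default advance per glyph
  std::vector<int> deltas;    // variation delta per glyph for the current coords
};

// The variations check is a compile-time constant inside each instantiation,
// so the compiler can drop it from the loop body entirely.
template <bool has_variations>
void get_advances_impl (const Metrics &m, const unsigned *glyphs,
                        int *out, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
  {
    int advance = m.advances[glyphs[i]];
    if (has_variations)
      advance += m.deltas[glyphs[i]];
    out[i] = advance;
  }
}

// The runtime decision is made once per call, outside the loop.
void get_advances (const Metrics &m, bool has_variations,
                   const unsigned *glyphs, int *out, std::size_t count)
{
  if (has_variations)
    get_advances_impl<true> (m, glyphs, out, count);
  else
    get_advances_impl<false> (m, glyphs, out, count);
}

void check_get_advances ()
{
  const Metrics m {{500, 600, 700}, {10, -20, 0}};
  const unsigned glyphs[2] = {2, 1};
  int out[2];
  get_advances (m, false, glyphs, out, 2);
  assert (out[0] == 700 && out[1] == 600);
  get_advances (m, true, glyphs, out, 2);
  assert (out[0] == 700 && out[1] == 580);
}
```

Each additional fast path specialized this way (for example, "scale equals upem") doubles the number of loop instantiations, which is exactly the explosion mentioned above.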
Okay! Completely removed division from scaling!
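One way to remove the division entirely, sketched with hypothetical names (`ScaledFont`, `set_scale`, `x_mult`; this is not claimed to be the change that landed), is to cache the 16.16 multiplier whenever the scale changes, so even a single scaling call is just a multiply and a shift. Unlike the truncating proposal, this sketch adds half a unit (32768) before shifting, which rounds to nearest with ties toward +infinity, assuming an arithmetic right shift.

```cpp
#include <cassert>
#include <cstdint>

struct ScaledFont
{
  int upem;
  int32_t x_scale;
  int64_t x_mult;  // x_scale / upem in 16.16 fixed point

  explicit ScaledFont (int upem_) : upem (upem_), x_scale (0), x_mult (0)
  { set_scale (upem_); }  // start at scale == upem

  void set_scale (int32_t scale)
  {
    x_scale = scale;
    x_mult = (int64_t) scale * 65536 / upem;  // the only division, once per scale change
  }

  // Division-free scaling: multiply, add one half, drop the fractional bits.
  int32_t em_scale_x (int16_t v) const
  { return (int32_t) ((v * x_mult + 32768) >> 16); }
};

void check_scaled_font ()
{
  ScaledFont f (2048);
  assert (f.em_scale_x (100) == 100);  // scale == upem
  f.set_scale (1024);
  assert (f.em_scale_x (100) == 50);
  f.set_scale (12);
  assert (f.em_scale_x (1000) == 6);   // 5.86 rounds to 6
}
```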
Part of #1801. The assumption that the compiler optimizes `upem/2` to a shift only works if upem is unsigned... Anyway, spoon-feed the compiler.
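Why the signedness matters: C++ integer division truncates toward zero, so for a negative odd `x`, `x / 2` differs from an arithmetic shift (`-3 / 2 == -1`, while `-3 >> 1` is typically `-2`). For a signed operand the compiler must therefore emit a shift plus a sign correction, whereas for an unsigned operand `/ 2` is exactly `>> 1`. The helper names below are hypothetical, and `half_upem` assumes `upem` is known to be positive.

```cpp
#include <cassert>

// Signed halving: must round toward zero, so it is not a bare shift.
int half_signed (int x) { return x / 2; }

// Unsigned halving: exactly a logical right shift.
unsigned half_unsigned (unsigned x) { return x / 2; }

// Spoon-feeding the compiler: upem is positive, so halving it as unsigned
// lets a plain shift suffice.
int half_upem (int upem) { return (int) ((unsigned) upem >> 1); }

void check_halving ()
{
  assert (half_signed (-3) == -1);  // toward zero, not down to -2
  assert (half_unsigned (2049u) == 1024u);
  assert (half_upem (2048) == 1024 && half_upem (1000) == 500);
}
```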
cc @drott
There's a lot that can be optimized, both in the main scaling and in the parent scaling in hb-font.cc. We should measure.