Optimize out divisions in shaping at upem size with hb-ot-font #1801

behdad · 2019-06-27T20:26:21Z

There's a lot that can be optimized, both in the main scaling as well as parent scaling in hb-font.cc. We should measure.

behdad · 2019-07-05T20:06:37Z

I measured this a bit.. It's easy enough to optimize hb_ot_get_glyph_h_advances() but the speedup seems to be in the 1% range at best (if I disable liga / kern).

behdad · 2019-07-05T20:08:30Z

Actually I take that back. The instruction count changes in the 1% range. But by not stalling the pipeline (removing division), instructions-per-cycle goes up from 3.00 to 3.22. That suggests ~5% speedup might be measurable.

behdad · 2019-07-05T20:17:02Z

For the record. Current function is:

static void
hb_ot_get_glyph_h_advances (hb_font_t* font, void* font_data,
                            unsigned count,
                            const hb_codepoint_t *first_glyph,
                            unsigned glyph_stride,
                            hb_position_t *first_advance,
                            unsigned advance_stride,
                            void *user_data HB_UNUSED)
{
  const hb_ot_face_t *ot_face = (const hb_ot_face_t *) font_data;
  const OT::hmtx_accelerator_t &hmtx = *ot_face->hmtx;

  for (unsigned int i = 0; i < count; i++)
  {
    *first_advance = font->em_scale_x (hmtx.get_advance (*first_glyph, font));
    first_glyph = &StructAtOffsetUnaligned<hb_codepoint_t> (first_glyph, glyph_stride);
    first_advance = &StructAtOffsetUnaligned<hb_position_t> (first_advance, advance_stride);
  }
}

where em_scale_x (v) expands to em_scale (v, font->x_scale), and em_scale is:

  hb_position_t em_scale (int16_t v, int scale) 
  { 
    int upem = face->get_upem (); 
    int64_t scaled = v * (int64_t) scale; 
    scaled += scaled >= 0 ? upem/2 : -upem/2; /* Round. */ 
    return (hb_position_t) (scaled / upem); 
  }

The get_upem() part obviously can be moved out of the loop. We annotate get_upem() with HB_PURE_FUNC. Not sure if that's enough for the compiler to move it out of the loop after inlining. Unlikely.

Rounding, maybe do without branch?

Finally, I don't know what the code size tradeoff would be, but at least when __OPTIMIZE_SIZE__ is not defined, we can have three loop bodies for:

The general case, with the multiplication and division,
Change division to right-shift if upem happens to be power-of-two.
No multiplication/division if scale happens to be equal to upem.

behdad · 2019-07-05T20:22:47Z

Rounding, maybe do without branch?

Tried it. Didn't seem to matter.

behdad · 2019-07-05T20:29:59Z

Another possibility is to move the division outside the loop, by first transforming em_scale to (ignore rounding):

  hb_position_t em_scale (int16_t v, int scale) 
  { 
    unsigned upem = face->get_upem (); 
    int64_t mult = ((int64_t) scale << 16) / (int) upem; 
 
    return (hb_position_t) ((v * mult) >> 16); 
  }

Compiler doesn't seem to figure out it can move the division outside the loop, so needs to be helped out.

behdad · 2019-07-05T20:31:49Z

Humm. Actually, we can make the font cache x_mult / y_mult. Let me try that.

behdad · 2019-07-05T20:43:25Z

    *first_advance = font->em_scale_x (hmtx.get_advance (*first_glyph, font));

If font doesn't have variations set, we can call the version of get_advance without passing font in:

    unsigned int get_advance (hb_codepoint_t  glyph,
                              hb_font_t      *font) const
    {
      unsigned int advance = get_advance (glyph);
      if (likely (glyph < num_metrics))
      {
        advance += (font->num_coords ? var_table->get_advance_var (glyph, font->coords, font->num_coords) : 0); // TODO Optimize?!
      }
      return advance;
    }

But then we are approaching combinatorial explosion on the different branches.

behdad · 2019-07-05T20:57:38Z

Okay! Completely removed division from scaling!

Part of #1801 The assumption that compiler optimizes "upem/2" to a shift only works if upem is unsigned... Anyway, spoon-feed the compiler.

Part of #1801

behdad · 2019-07-05T20:59:54Z

cc @drott

behdad self-assigned this Jun 27, 2019

behdad added a commit that referenced this issue Jul 5, 2019

[ot-font] Optimize rounding

0d425e1

Part of #1801 The assumption that compiler optimizes "upem/2" to a shift only works if upem is unsigned... Anyway, spoon-feed the compiler.

behdad added a commit that referenced this issue Jul 5, 2019

[font] Keep font-space to user-space multiplier

b847769

Part of #1801

behdad closed this as completed in f18ea1d Jul 5, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize out divisions in shaping at upem size with hb-ot-font #1801

Optimize out divisions in shaping at upem size with hb-ot-font #1801

behdad commented Jun 27, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019 •

edited

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

Optimize out divisions in shaping at upem size with hb-ot-font #1801

Optimize out divisions in shaping at upem size with hb-ot-font #1801

Comments

behdad commented Jun 27, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019 • edited

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019

behdad commented Jul 5, 2019 •

edited