md5 benchmark: 15% speedup by removing "slow LEA"
Remove a special case from the assembler that generated "slow LEA"
instructions, which can execute poorly on Haswell and Skylake CPUs.

A "slow LEA" is one whose address operand uses all three components
(base + index + offset). These instructions "have increased latency
and reduced dispatch port choices compared to other LEAs."
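As a rough illustration of this definition, the following C sketch flags an address operand as slow when a base register, an index register, and a nonzero offset are all present. The `LeaOperand` struct, `NO_REG`, and `is_slow_lea` are hypothetical names (not LuaJIT internals), and only the base+index+offset case described here is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for "no register in this slot". */
#define NO_REG (-1)

/* Hypothetical model of an LEA address operand [base + idx + ofs]. */
typedef struct {
  int base;     /* base register number, or NO_REG */
  int idx;      /* index register number, or NO_REG */
  int32_t ofs;  /* constant displacement; 0 is treated as "no offset" */
} LeaOperand;

/* True only for the three-component form: base + index + offset. */
static bool is_slow_lea(const LeaOperand *op)
{
  return op->base != NO_REG && op->idx != NO_REG && op->ofs != 0;
}
```

Under this model `[rbx+rdx+1234]` counts as slow and `[rbx+1234]` does not.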

Links:
- https://software.intel.com/en-us/node/544484
- http://stackoverflow.com/questions/21288214/what-are-fast-lea-and-slow-lea-unit-in-the-microarchitecture-of-intes-cpu

Resolves raptorjit/raptorjit#54.

Here is an example of a "slow LEA" instruction that was emitted before:

    lea eax, [rbx+rdx+1234]

The new replacement avoids the bad case:

    lea eax, [rbx+1234]
    add rax, rdx
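The rewrite only reassociates the addition, so the value left in eax is unchanged. A minimal C sketch, assuming registers are modelled as `uint32_t` (eax holds the low 32 bits of the sum, and unsigned C arithmetic wraps modulo 2^32 in the same way); `slow_lea` and `split_lea` are illustrative names only:

```c
#include <stdint.h>

/* Before: lea eax, [rbx+rdx+ofs] (one three-component LEA). */
static uint32_t slow_lea(uint32_t rbx, uint32_t rdx, int32_t ofs)
{
  return rbx + rdx + (uint32_t)ofs;
}

/* After: lea eax, [rbx+ofs] followed by an add of rdx.
 * Only the low 32 bits are modelled, since that is what eax holds. */
static uint32_t split_lea(uint32_t rbx, uint32_t rdx, int32_t ofs)
{
  uint32_t eax = rbx + (uint32_t)ofs;  /* two-component LEA */
  eax += rdx;                          /* separate ADD */
  return eax;
}
```

Both functions agree for all inputs, including ones that wrap around; the speedup comes from the instruction mix, not from a different result.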

On Haswell and Skylake CPUs this improves md5 benchmark performance by
~15%. The difference in cycles (time) correlates closely with the
difference in slow LEA uops issued, as reported by the CPU performance
monitoring unit.

Before:

     Performance counter stats for './luajit ../../luajit-test-cleanup/bench/md5.lua 20000':

         8,166,721,155      instructions              #    2.02  insn per cycle
         4,039,743,481      cycles
           633,604,974      uops_issued_slow_lea

           1.683641631 seconds time elapsed

After:

         8,463,581,471      instructions              #    2.45  insn per cycle
         3,454,061,396      cycles
           340,049,934      uops_issued_slow_lea
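The correlation can be checked with a little arithmetic on the counters above. This C sketch (function names are illustrative) computes the cycle reduction, the cycles saved per eliminated slow-LEA uop, and the extra instructions per eliminated slow-LEA uop.

```c
#include <stdint.h>

/* Counter values copied from the perf output above. */
static const int64_t insn_before = 8166721155, insn_after = 8463581471;
static const int64_t cyc_before  = 4039743481, cyc_after  = 3454061396;
static const int64_t slow_before = 633604974,  slow_after = 340049934;

/* Fraction of cycles removed by the change (about 0.145). */
static double cycle_reduction(void)
{
  return (double)(cyc_before - cyc_after) / (double)cyc_before;
}

/* Cycles saved per slow-LEA uop eliminated (about 2.0). */
static double cycles_per_slow_lea(void)
{
  return (double)(cyc_before - cyc_after) /
         (double)(slow_before - slow_after);
}

/* Extra instructions per slow-LEA uop eliminated (about 1.01). */
static double extra_insn_per_slow_lea(void)
{
  return (double)(insn_after - insn_before) /
         (double)(slow_before - slow_after);
}
```

About 585.7 million cycles were saved while slow-LEA uops fell by about 293.6 million, close to two cycles per eliminated uop. The instruction count grew by about 296.9 million, which fits each slow LEA being split into two instructions.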
lukego committed Mar 22, 2017
1 parent d54947d commit 7356708
1 changed file: src/lj_asm_x86.h (0 additions, 14 deletions)

    @@ -1710,20 +1710,6 @@ static int asm_lea(ASMState *as, IRIns *ir)
         } else {
           return 0;
         }
    -  } else if (ir->op1 != ir->op2 && irl->o == IR_ADD && mayfuse(as, ir->op1) &&
    -             (irref_isk(ir->op2) || irref_isk(irl->op2))) {
    -    Reg idx, base = ra_alloc1(as, irl->op1, allow);
    -    rset_clear(allow, base);
    -    as->mrm.base = (uint8_t)base;
    -    if (irref_isk(ir->op2)) {
    -      as->mrm.ofs = irr->i;
    -      idx = ra_alloc1(as, irl->op2, allow);
    -    } else {
    -      as->mrm.ofs = IR(irl->op2)->i;
    -      idx = ra_alloc1(as, ir->op2, allow);
    -    }
    -    rset_clear(allow, idx);
    -    as->mrm.idx = (uint8_t)idx;
       } else {
         return 0;
       }
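To see which IR shape the deleted branch handled, here is a heavily simplified C model of its matching logic. The `Node` and `Fused` types and `match_removed_case` are hypothetical stand-ins for LuaJIT's `IRIns` and `as->mrm`; register allocation, `mayfuse()`, and any conditions outside this hunk are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified IR node (not LuaJIT's IRIns). */
typedef struct Node {
  bool is_const;           /* true for a constant operand */
  int32_t k;               /* constant value when is_const */
  bool is_add;             /* true for an ADD node */
  const struct Node *op1;  /* ADD operands when is_add */
  const struct Node *op2;
} Node;

/* Folded address operand [base + idx + ofs]. */
typedef struct {
  const Node *base, *idx;
  int32_t ofs;
} Fused;

/* Mirrors the deleted branch: ADD(ADD(a, b), c) with distinct operands,
 * where either the outer or the inner right operand is a constant. */
static bool match_removed_case(const Node *ir, Fused *out)
{
  const Node *irl = ir->op1;
  if (!ir->is_add || ir->op1 == ir->op2 || !irl->is_add)
    return false;
  if (ir->op2->is_const) {          /* ADD(ADD(a, b), k) */
    out->base = irl->op1;
    out->idx = irl->op2;
    out->ofs = ir->op2->k;
  } else if (irl->op2->is_const) {  /* ADD(ADD(a, k), b) */
    out->base = irl->op1;
    out->idx = ir->op2;
    out->ofs = irl->op2->k;
  } else {
    return false;
  }
  return true;
}
```

Every match yields base, index, and offset together, which is exactly the three-component slow LEA form. With the branch deleted, such expressions reach the `else { return 0; }` in the context lines instead.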