
YJIT: Optimize cmp REG, 0 into test REG, REG #7471

Merged
merged 1 commit into from Mar 9, 2023

Conversation

k0kubun
Member

@k0kubun k0kubun commented Mar 8, 2023

This implements the idea suggested at #7242 (comment). When checking equality of a register against 0, using the test instruction generates more compact code.

This is particularly useful when you want to do something like asm.cmp(block_handler, VM_BLOCK_HANDLER_NONE.into()), where you would otherwise need to assert VM_BLOCK_HANDLER_NONE == 0 in order to use asm.test.
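The rewrite can be pictured as a tiny pattern match over backend instructions. A minimal sketch with hypothetical Insn and Opnd types (much simpler than YJIT's actual IR):

```rust
// Hypothetical, simplified model of backend instructions; YJIT's real
// Insn/Opnd types have many more variants and fields.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opnd {
    Reg(u8),  // a machine register, identified by number
    Imm(i64), // an immediate value
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Insn {
    Cmp(Opnd, Opnd),
    Test(Opnd, Opnd),
}

// Peephole rewrite: `cmp reg, 0` sets the zero flag the same way as
// `test reg, reg`, but the latter has a shorter encoding.
fn optimize_cmp_zero(insn: Insn) -> Insn {
    match insn {
        Insn::Cmp(reg @ Opnd::Reg(_), Opnd::Imm(0)) => Insn::Test(reg, reg),
        other => other,
    }
}
```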

Generated code

Before

  # block_given?
  ...
  0x5583d243b07b: cmp rax, 0
  0x5583d243b07f: mov eax, 0x14
  0x5583d243b084: mov ecx, 0
  0x5583d243b089: cmove rax, rcx

After

  # block_given?
  ...
  0x5567d51cb07b: test rax, rax
  0x5567d51cb07e: mov eax, 0x14
  0x5567d51cb083: mov ecx, 0
  0x5567d51cb088: cmove rax, rcx
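The size win is visible directly in the addresses above: cmp rax, 0 occupies 4 bytes (REX.W prefix, opcode, ModRM, imm8), while test rax, rax needs only 3 because it carries no immediate. A trivial check of the address deltas from the two listings:

```rust
// Instruction sizes recovered from the address deltas in the disassembly
// listings above: the next instruction's address minus this one's.
fn insn_size(start: u64, next: u64) -> u64 {
    next - start
}
```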

Code size stats

On railsbench, inline code size went down by 0.1%.

Before

inline_code_size:          2,160,054
outlined_code_size:        2,158,816

After

inline_code_size:          2,157,565
outlined_code_size:        2,156,784

@matzbot matzbot requested a review from a team March 8, 2023 00:23
@maximecb
Contributor

maximecb commented Mar 8, 2023

It's nice to be able to do this optimization, but if we keep adding passes, there is a risk that compilation time could become a concern.

One way that we could evaluate compilation time is to measure the time taken by 30k_ifelse on the first iteration with --yjit-call-threshold=1 and compare that against the time taken by the CRuby interpreter with YJIT disabled (compute the difference). This should be evaluated on Linux because AFAIK on macOS, the mprotect calls are very slow. The overhead may turn out to be insignificant, but it's still something we should try to quantify.

Otherwise, I would be tempted to ask, can we do this in the x86_merge pass you added the other day? Typically, peephole optimizations like this are done at the end, because the compiler may introduce inefficiencies during compilation.

@k0kubun
Member Author

k0kubun commented Mar 8, 2023

I'll benchmark the 1st itr of 30k_ifelse on Linux and report it here.

Otherwise, I would be tempted to ask, can we do this in the x86_merge pass you added the other day? Typically, peephole optimizations like this are done at the end, because the compiler may introduce inefficiencies during compilation.

What I wanted to explain in the code comments is that x86_split inserts Load instructions in between Cmp and Cmove, for example, so peeking at just one preceding instruction isn't enough to apply this optimization in x86_merge. That's why I placed x86_replace before x86_split. On the other hand, x86_split rewrites all output operands to Insn::Out, so x86_merge can't be moved before x86_split either, because an Opnd::Reg could unexpectedly be overwritten to Opnd::Out.
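The ordering constraint described above can be sketched abstractly: once splitting inserts a Load between Cmp and Cmove, a peephole that only looks one instruction back no longer sees the Cmp. A hypothetical illustration (these enum variants are placeholders, not YJIT's real IR):

```rust
// Hypothetical instruction kinds, simplified to the shape of the problem.
#[derive(Debug, PartialEq)]
enum Insn {
    Cmp,
    Load,
    Cmove,
}

// A peephole that only inspects the immediately preceding instruction.
fn prev_is_cmp(stream: &[Insn], idx: usize) -> bool {
    idx > 0 && stream[idx - 1] == Insn::Cmp
}
```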

@k0kubun
Member Author

k0kubun commented Mar 8, 2023

Looks like the 1st itr of 30k_ifelse reflects compilation speed pretty well. The slowdown from x86_replace is as follows, and x86_merge seems to add the same kind of overhead, so it seems wise to keep the number of passes as low as possible.

before: ruby 3.3.0dev (2023-03-07T22:59:37Z master 3e731cd945) +YJIT [x86_64-linux]
after: ruby 3.3.0dev (2023-03-08T00:22:49Z yjit-backend-cmp 3d9ff80b51) +YJIT [x86_64-linux]

----------  -----------  ----------  ----------  ----------  ------------  -------------
bench       before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
30k_ifelse  904.6        0.0         961.2       0.0         0.94          0.94
----------  -----------  ----------  ----------  ----------  ------------  -------------

I'll think about removing at least one pass here. At the very least, maybe we could do this as part of x86_split.

@maximecb
Contributor

maximecb commented Mar 8, 2023

If we could do it in x86_split that would be attractive 👍

Another potential alternative is to do it right when the instruction is inserted, by peeking at the previous instruction.
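The insertion-time alternative can be sketched as rewriting at push time, before any later pass can separate the pattern. The Assembler type below is hypothetical, not YJIT's actual API:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opnd {
    Reg(u8),
    Imm(i64),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Insn {
    Cmp(Opnd, Opnd),
    Test(Opnd, Opnd),
}

// Hypothetical assembler that applies the rewrite as each instruction
// is appended, so no separate pass over the stream is needed.
struct Assembler {
    insns: Vec<Insn>,
}

impl Assembler {
    fn new() -> Self {
        Assembler { insns: Vec::new() }
    }

    fn push(&mut self, insn: Insn) {
        let insn = match insn {
            // `cmp reg, 0` becomes `test reg, reg` on the way in.
            Insn::Cmp(reg @ Opnd::Reg(_), Opnd::Imm(0)) => Insn::Test(reg, reg),
            other => other,
        };
        self.insns.push(insn);
    }
}
```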

@k0kubun
Member Author

k0kubun commented Mar 9, 2023

I did that in x86_split in 5b0a26c, and it seems like the idea worked 🙂

before: ruby 3.3.0dev (2023-03-08T21:13:23Z master 309dd50a01) +YJIT [x86_64-linux]
after: ruby 3.3.0dev (2023-03-09T00:48:09Z yjit-backend-cmp 5b0a26c393) +YJIT [x86_64-linux]

----------  -----------  ----------  ----------  ----------  ------------  -------------
bench       before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
30k_ifelse  912.4        0.0         917.4       0.0         0.99          0.99
----------  -----------  ----------  ----------  ----------  ------------  -------------

@maximecb
Contributor

maximecb commented Mar 9, 2023

Looks good! Thanks for taking the time to validate the effect on compilation time :)

@maximecb maximecb merged commit 22d8e95 into ruby:master Mar 9, 2023
97 checks passed
@k0kubun k0kubun deleted the yjit-backend-cmp branch March 9, 2023 15:43