YJIT: Stack temp register allocation #7651

Merged
merged 1 commit into from Apr 4, 2023

Member

@k0kubun k0kubun commented Apr 4, 2023

This PR implements register allocation for stack temps. It currently works only on x86_64, and I'm still working on arm64 support, but I'm filing it now to reduce the risk of merge conflicts.

Design

  • Use 5 caller-saved registers for stack temps on x86_64.
    • They are used only for stack temps for now, but we could share them with Opnd::InsnOut in the future.
    • They conflict with registers used for C call arguments, so asm.ccall asserts that no register is allocated to stack temps at that point.
  • Share the 5 registers among 8 stack temps, assigned by modulo
    • reg_idx = stack_idx % num_regs. So, given 5 registers, stack[0] and stack[5] share the same register.
    • If a register is already allocated to a conflicting stack temp, simply skip allocating a register to a new stack temp.
  • Spill registers before C calls, method calls, and side exits.
    • C calls: Argument registers conflict, and we need values on the stack for GC. This also covers the case where an argument is a VALUE * pointing to the stack.
    • Method calls: YJIT doesn't perform cross-method register allocation. A caller and its callee both want to use the same register for stack temp 0, for example.
    • Side exits: Values must be on the stack before returning to the interpreter.
  • Branch stubs spill registers only in the branch stub code and preserve register mapping.
    • We push/pop registers for calling branch_stub_hit. In a branch stub, registers are spilled for jit.peek_at_stack, but they're reloaded by pop instructions and the next block can use registers again.
  • --yjit-temp-regs to control the number of assigned registers
    • Use --yjit-temp-regs=5 to enable this feature and --yjit-temp-regs=0 to disable it.
    • The default is --yjit-temp-regs=0, at least until arm64 support is implemented.
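
The allocation policy described in the list above can be sketched as a small model. This is only an illustration written in Python, not YJIT's Rust code: the class name TempRegs and its methods are hypothetical, and the sketch assumes that spilling clears the register mapping, which is what the asm.ccall assertion implies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_REG_TEMPS = 8  # only the first 8 stack temps are eligible, per the design


@dataclass
class TempRegs:
    """Toy model of the stack-temp register mapping (hypothetical, not YJIT's types)."""

    num_regs: int = 5  # mirrors --yjit-temp-regs=5; 0 disables allocation
    # Maps reg_idx -> stack_idx of the temp currently held in that register.
    owner: Dict[int, int] = field(default_factory=dict)

    def alloc(self, stack_idx: int) -> Optional[int]:
        """Return the register index given to stack_idx, or None if it stays on the stack."""
        if self.num_regs == 0 or stack_idx >= MAX_REG_TEMPS:
            return None
        reg_idx = stack_idx % self.num_regs
        if self.owner.get(reg_idx, stack_idx) != stack_idx:
            return None  # a conflicting temp holds the register: skip allocation
        self.owner[reg_idx] = stack_idx
        return reg_idx

    def spill_all(self) -> List[int]:
        """Write all register-held temps back to the stack; return their stack indexes."""
        spilled = sorted(self.owner.values())
        self.owner.clear()
        return spilled

    def ccall(self) -> None:
        """Model of the asm.ccall check: no temp may still live in a register."""
        if self.owner:
            raise RuntimeError("spill stack temps before a C call")


regs = TempRegs()
print(regs.alloc(0))     # stack[0] maps to register 0 % 5
print(regs.alloc(5))     # 5 % 5 is taken by stack[0], so allocation is skipped
print(regs.spill_all())  # required before a C call, method call, or side exit
regs.ccall()             # passes: nothing is held in a register any more
```

In this model a skipped temp simply stays on the VM stack, so a conflict costs nothing beyond the missed optimization.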

Benchmark

Here's the current performance:

Headline

This PR speeds up activerecord, hexapdf, liquid-render, and mail by 4-5%.

regs=0: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]
regs=5: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]

-------------  -----------  ----------  -----------  ----------  -------------  --------------
bench          regs=0 (ms)  stddev (%)  regs=5 (ms)  stddev (%)  regs=0/regs=5  regs=5 1st itr
activerecord   34.0         0.5         32.5         0.5         1.05           0.94
erubi_rails    11.2         15.6        11.1         11.6        1.01           0.72
hexapdf        1391.7       1.2         1332.0       1.9         1.04           0.97
liquid-c       40.8         2.5         39.8         2.6         1.03           0.85
liquid-render  74.7         2.2         71.9         2.4         1.04           0.88
mail           91.5         0.3         88.3         0.6         1.04           0.87
psych-load     1227.2       0.1         1209.7       0.2         1.01           1.01
railsbench     1240.9       1.5         1232.5       1.6         1.01           0.94
ruby-lsp       41.3         26.6        40.8         32.4        1.01           0.84
sequel         50.9         0.3         51.1         0.3         0.99           0.98
-------------  -----------  ----------  -----------  ----------  -------------  --------------

Other

regs=0: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]
regs=5: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]

-------------  -----------  ----------  -----------  ----------  -------------  --------------
bench          regs=0 (ms)  stddev (%)  regs=5 (ms)  stddev (%)  regs=0/regs=5  regs=5 1st itr
binarytrees    150.9        0.3         135.8        0.5         1.11           1.10
chunky_png     399.4        0.1         367.6        0.1         1.09           1.05
erubi          158.1        1.0         158.1        1.2         1.00           1.00
etanni         253.5        0.8         253.7        0.8         1.00           1.01
fannkuchredux  612.8        0.2         471.5        0.4         1.30           1.00
lee            556.3        1.1         527.6        1.3         1.05           1.04
nbody          46.0         0.4         43.5         0.3         1.06           1.03
optcarrot      1764.7       0.8         1545.5       0.5         1.14           1.11
ruby-json      2462.0       0.1         2459.4       0.0         1.00           1.00
rubykon        4834.7       0.3         4472.3       0.4         1.08           1.08
-------------  -----------  ----------  -----------  ----------  -------------  --------------

Micro

regs=0: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]
regs=5: ruby 3.3.0dev (2023-04-04T16:27:22Z yjit-stack-temps 0fa81a7fe6) +YJIT [x86_64-linux]

--------------  -----------  ----------  -----------  ----------  -------------  --------------
bench           regs=0 (ms)  stddev (%)  regs=5 (ms)  stddev (%)  regs=0/regs=5  regs=5 1st itr
30k_ifelse      265.8        0.1         240.3        0.3         1.11           0.77
30k_methods     535.9        0.1         535.1        0.1         1.00           0.90
cfunc_itself    23.0         1.9         23.1         2.1         1.00           1.11
fib             50.4         0.4         31.2         2.5         1.61           1.66
getivar         60.8         12.6        10.2         93.7        5.97           1.00
keyword_args    39.1         0.5         43.0         0.8         0.91           0.90
respond_to      20.2         0.6         14.1         0.9         1.43           1.29
setivar         10.5         47.3        8.9          52.7        1.19           0.90
setivar_object  36.5         16.3        34.7         17.7        1.05           1.07
setivar_young   36.8         21.0        35.1         21.8        1.05           1.08
str_concat      25.9         1.3         24.1         1.4         1.07           1.07
throw           14.7         0.2         14.6         0.3         1.01           1.00
--------------  -----------  ----------  -----------  ----------  -------------  --------------
Code size stats on railsbench and liquid-c

railsbench

before (regs=0)

inline_code_size:          2,272,690
outlined_code_size:        2,271,540
code_region_size:          4,550,656

after (regs=5)

inline_code_size:          2,276,717
outlined_code_size:        2,276,130
code_region_size:          4,554,752

liquid-c

before (regs=0)

inline_code_size:            400,811
outlined_code_size:          398,227
code_region_size:            802,816

after (regs=5)

inline_code_size:            403,570
outlined_code_size:          403,624
code_region_size:            815,104
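
To put the code size numbers above in relative terms, the following sketch computes the growth of each counter from regs=0 to regs=5. The helper name growth_percent is hypothetical; the figures are copied from the stats above.

```python
def growth_percent(before: int, after: int) -> float:
    """Relative growth from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs taken from the regs=0 and regs=5 stats above.
sizes = {
    "railsbench": {
        "inline_code_size": (2_272_690, 2_276_717),
        "outlined_code_size": (2_271_540, 2_276_130),
        "code_region_size": (4_550_656, 4_554_752),
    },
    "liquid-c": {
        "inline_code_size": (400_811, 403_570),
        "outlined_code_size": (398_227, 403_624),
        "code_region_size": (802_816, 815_104),
    },
}

for bench, stats in sizes.items():
    for name, (before, after) in stats.items():
        print(f"{bench:11} {name:19} {growth_percent(before, after):+.2f}%")
```

By this calculation every counter grows by about 0.2% or less on railsbench and by roughly 0.7% to 1.5% on liquid-c.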

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
@matzbot matzbot requested a review from a team April 4, 2023 16:33
Contributor

maximecb commented Apr 4, 2023

> In addition to it, they must be spilled for jit.peek_at_stack

As a potential improvement in a future PR, it might be possible in that case to write the values to the stack but preserve the register mapping. As in, the values need to be written to the stack for peek_at_stack, but they don't need to be unmapped from registers?

Member Author

k0kubun commented Apr 4, 2023

> As a potential improvement in a future PR, it might be possible in that case to write the values to the stack but preserve the register mapping. As in, the values need to be written to the stack for peek_at_stack, but they don't need to be unmapped from registers?

I'm sorry if the description was confusing. Registers are spilled only in the branch stub code, so the register mapping is preserved and nothing is spilled unless execution jumps to a branch stub. Even when a branch stub is hit, pop instructions reload the registers afterwards, so it probably doesn't have the problem you're worried about.
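
As a rough illustration of that sequence, here is a toy Python model of a branch stub. It is not the generated machine code: the register names, the run_branch_stub helper, and the clobber function standing in for branch_stub_hit are all hypothetical.

```python
from typing import Dict, List


def clobber(regs: Dict[str, int]) -> None:
    """Stand-in for the branch_stub_hit call, which may overwrite caller-saved registers."""
    for name in regs:
        regs[name] = -1


def run_branch_stub(regs: Dict[str, int], mapping: Dict[str, int], vm_stack: List[int]) -> None:
    """Spill inside the stub, call out, then reload; the mapping itself is never changed."""
    # 1. Spill: copy register-held temps to their VM stack slots so that
    #    code peeking at the stack sees current values.
    for reg, stack_idx in mapping.items():
        vm_stack[stack_idx] = regs[reg]
    # 2. Push the registers (modelled here as saving them in a list).
    saved = [(reg, regs[reg]) for reg in mapping]
    # 3. Call out; the callee is free to clobber the registers.
    clobber(regs)
    # 4. Pop in reverse order, which reloads the registers.
    for reg, value in reversed(saved):
        regs[reg] = value


regs = {"r0": 10, "r1": 20}     # hypothetical register names and values
mapping = {"r0": 0, "r1": 1}    # register -> stack temp index
vm_stack = [0, 0, 0]
run_branch_stub(regs, mapping, vm_stack)
print(regs, mapping, vm_stack)
```

After the stub returns, the registers hold their original values and the mapping is unchanged, so the next block can keep using registers in this model.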

@k0kubun k0kubun merged commit b7717fc into ruby:master Apr 4, 2023
100 checks passed
@k0kubun k0kubun deleted the yjit-stack-temps branch April 4, 2023 17:58
k0kubun added a commit that referenced this pull request Apr 4, 2023