YJIT: Fallback send instructions to vm_sendish #8106

Merged
merged 2 commits into from Jul 24, 2023

@k0kubun k0kubun commented Jul 21, 2023

send, opt_send_without_block, invokesuper, and invokeblock instructions are currently the top exit reasons for most benchmarks and production applications. This is because they side-exit whenever YJIT doesn't know how to optimize them.

This PR changes YJIT to call the interpreter's implementation (vm_sendish) in such cases, so that execution stays in JIT code as long as possible.
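
To make the affected instructions concrete, here is a small sketch of Ruby call shapes that plausibly hit some of the exit reasons listed in the stats (block_arg, kw_splat, iseq_zsuper, and the invokeblock proc/symbol cases). The mapping from each call to a counter name is an assumption based on the counter names, and every method and class name is hypothetical.

```ruby
# Hypothetical helpers, named only for this illustration.
def each_twice
  yield 1 # `yield` compiles to an invokeblock instruction
  yield 2
end

def tag(name, **attrs)
  [name, attrs]
end

class Base
  def greet(name)
    "hello #{name}"
  end
end

class Child < Base
  def greet(name)
    # Bare `super` (zsuper) forwards the current arguments implicitly.
    super.upcase
  end
end

collector = []
block = proc { |x| collector << x }

# Call site passing a block argument (`&block`): assumed to count as block_arg.
# Inside each_twice, the block handler is a Proc (invokeblock "proc" case).
each_twice(&block)

# Symbol block handler (`&:to_s`): assumed to be the invokeblock "symbol" case.
last_string = each_twice(&:to_s)

# Call site with a keyword splat (`**opts`): assumed to count as kw_splat.
opts = { id: "main" }
tagged = tag(:div, **opts)

# Callee that uses zsuper: assumed to relate to iseq_zsuper / invokesuper.
greeting = Child.new.greet("yjit")

puts collector.inspect, last_string, tagged.inspect, greeting
```

Before this PR, calls like these would side-exit to the interpreter; with the fallback, they are presumably dispatched through the interpreter's send routine while the surrounding code keeps running in YJIT.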

Ratio in YJIT

It improves ratio_in_yjit from 92.5% to 97.1% on railsbench.

Before

***YJIT: Printing YJIT statistics on exit***
method call exit reasons:
                          block_arg:     63,519 (20.5%)
             iseq_has_rest_and_send:     51,188 (16.5%)
                        iseq_zsuper:     45,901 (14.8%)
                   iseq_arity_error:     30,517 ( 9.9%)
                iseq_ruby2_keywords:     24,696 ( 8.0%)
                     iseq_has_no_kw:     21,095 ( 6.8%)
          args_splat_cfunc_var_args:     16,018 ( 5.2%)
            iseq_materialized_block:     15,942 ( 5.2%)
                           kw_splat:     12,611 ( 4.1%)
                    iseq_has_kwrest:      7,172 ( 2.3%)
                        not_fixnums:      4,968 ( 1.6%)
                  klass_megamorphic:      4,286 ( 1.4%)
           iseq_missing_optional_kw:      3,942 ( 1.3%)
             args_splat_cfunc_zuper:      1,972 ( 0.6%)
        iseq_has_rest_opt_and_block:      1,959 ( 0.6%)
                      iseq_has_post:      1,958 ( 0.6%)
              cfunc_ruby_array_varg:      1,118 ( 0.4%)
                       mid_mismatch:        389 ( 0.1%)
                           keywords:         90 ( 0.0%)
    args_splat_cfunc_ruby2_keywords:         33 ( 0.0%)
                        interrupted:         17 ( 0.0%)
                      zsuper_method:         15 ( 0.0%)
invokeblock exit reasons:
      proc:      1,807 (93.5%)
    symbol:        126 ( 6.5%)
invokesuper exit reasons:
    me_changed:      4,676 (69.2%)
         block:      2,081 (30.8%)
leave exit reasons:
    interp_return:  1,678,795 (100.0%)
     se_interrupt:         17 ( 0.0%)
getblockparamproxy exit reasons:
    block_param_modified:         77 (96.2%)
          not_gc_guarded:          3 ( 3.8%)
getinstancevariable exit reasons:
    (all relevant counters are zero)
setinstancevariable exit reasons:
    (all relevant counters are zero)
definedivar exit reasons:
    (all relevant counters are zero)
opt_aref exit reasons:
    (all relevant counters are zero)
expandarray exit reasons:
            splat:      9,945 (99.9%)
    rhs_too_small:          5 ( 0.1%)
opt_getinlinecache exit reasons:
    miss:          3 (100.0%)
invalidation reasons:
          method_lookup:        378 (68.6%)
    constant_state_bump:        121 (22.0%)
       constant_ic_fill:         52 ( 9.4%)
num_send:                 13,176,721
num_send_known_class:        471,932 ( 3.6%)
num_send_polymorphic:        999,997 ( 7.6%)
num_send_x86_rel32:           19,700
num_send_x86_reg:                 38
iseq_stack_too_large:              0
iseq_too_long:                     0
temp_reg_opnd:               108,435
temp_mem_opnd:                78,766
temp_spill:                   69,350
bindings_allocations:              0
bindings_set:                      0
compiled_iseq_entry:           1,214
compiled_iseq_count:           2,153
compiled_blockid_count:       16,690
compiled_block_count:         20,987
versions_per_block:            1.257
compiled_branch_count:        37,360
block_next_count:             18,924
defer_count:                   6,589
defer_empty_count:             1,355
branch_insn_count:             1,785
branch_known_count:              324 (18.2%)
freed_iseq_count:                 33
invalidation_count:              551
constant_state_bumps:              0
get_ivar_max_depth:               20
inline_code_size:          2,248,796
outlined_code_size:        1,992,972
code_region_size:          4,403,200
freed_code_size:                   0
live_context_size:           778,969
live_context_count:           26,861
live_page_count:                 269
freed_page_count:                  0
code_gc_count:                     0
num_gc_obj_refs:              17,701
object_shape_count:            2,326
side_exit_count:             349,584
total_exit_count:          2,028,379
total_insns_count:        87,371,151
vm_insns_count:            6,511,097
yjit_insns_count:         81,209,638
ratio_in_yjit:                 92.5%
avg_len_in_yjit:                39.9
Top-18 most frequent exit ops (100.0% of exits):
               invokesuper:    102,325 (29.3%)
                      send:     97,831 (28.0%)
    opt_send_without_block:     78,965 (22.6%)
               invokeblock:     30,479 ( 8.7%)
                  opt_aref:     13,971 ( 4.0%)
               expandarray:      9,950 ( 2.8%)
             setlocal_WC_0:      8,140 ( 2.3%)
                    opt_eq:      4,968 ( 1.4%)
        getblockparamproxy:      2,260 ( 0.6%)
                 opt_nil_p:        244 ( 0.1%)
      opt_getconstant_path:        211 ( 0.1%)
                checkmatch:        144 ( 0.0%)
                      once:         58 ( 0.0%)
                     leave:         17 ( 0.0%)
               objtostring:         17 ( 0.0%)
                  opt_plus:          2 ( 0.0%)
                  opt_size:          1 ( 0.0%)
                  branchif:          1 ( 0.0%)
Total time spent benchmarking: 4s

before: ruby 3.3.0dev (2023-07-22T04:07:04Z master dd04def10f) +YJIT [x86_64-linux]

----------  -----------  ----------
bench       before (ms)  stddev (%)
railsbench  1670.5       0.0
----------  -----------  ----------

After

***YJIT: Printing YJIT statistics on exit***
method call exit reasons:
                          block_arg:     63,795 (19.4%)
             iseq_has_rest_and_send:     53,159 (16.1%)
                        iseq_zsuper:     45,901 (13.9%)
                   iseq_arity_error:     32,488 ( 9.9%)
                     iseq_has_no_kw:     25,011 ( 7.6%)
                iseq_ruby2_keywords:     24,756 ( 7.5%)
          args_splat_cfunc_var_args:     16,019 ( 4.9%)
            iseq_materialized_block:     15,942 ( 4.8%)
                           kw_splat:     14,624 ( 4.4%)
                    iseq_has_kwrest:     11,118 ( 3.4%)
                        not_fixnums:      4,968 ( 1.5%)
                  klass_megamorphic:      4,291 ( 1.3%)
           iseq_missing_optional_kw:      3,943 ( 1.2%)
                 args_splat_bmethod:      3,399 ( 1.0%)
        iseq_has_rest_opt_and_block:      1,981 ( 0.6%)
             args_splat_cfunc_zuper:      1,972 ( 0.6%)
         iseq_has_rest_and_captured:      1,958 ( 0.6%)
                      iseq_has_post:      1,958 ( 0.6%)
              cfunc_ruby_array_varg:      1,760 ( 0.5%)
                       mid_mismatch:        389 ( 0.1%)
                           keywords:         90 ( 0.0%)
    args_splat_cfunc_ruby2_keywords:         34 ( 0.0%)
                        interrupted:         20 ( 0.0%)
                      zsuper_method:         15 ( 0.0%)
invokeblock exit reasons:
      proc:      1,807 (93.5%)
    symbol:        126 ( 6.5%)
invokesuper exit reasons:
    me_changed:      4,676 (69.2%)
         block:      2,081 (30.8%)
leave exit reasons:
    interp_return:  1,708,600 (100.0%)
     se_interrupt:         17 ( 0.0%)
getblockparamproxy exit reasons:
    block_param_modified:         77 (96.2%)
          not_gc_guarded:          3 ( 3.8%)
getinstancevariable exit reasons:
    (all relevant counters are zero)
setinstancevariable exit reasons:
    (all relevant counters are zero)
definedivar exit reasons:
    (all relevant counters are zero)
opt_aref exit reasons:
    (all relevant counters are zero)
expandarray exit reasons:
            splat:      9,945 (99.9%)
    rhs_too_small:          5 ( 0.1%)
opt_getinlinecache exit reasons:
    miss:          3 (100.0%)
invalidation reasons:
          method_lookup:        382 (68.0%)
    constant_state_bump:        125 (22.2%)
       constant_ic_fill:         55 ( 9.8%)
num_send:                 13,794,298
num_send_known_class:        493,928 ( 3.6%)
num_send_polymorphic:      1,008,032 ( 7.3%)
num_send_x86_rel32:           21,667
num_send_x86_reg:                 46
iseq_stack_too_large:              0
iseq_too_long:                     0
temp_reg_opnd:               116,936
temp_mem_opnd:                85,539
temp_spill:                   74,846
bindings_allocations:              0
bindings_set:                      0
compiled_iseq_entry:           1,124
compiled_iseq_count:           2,159
compiled_blockid_count:       17,999
compiled_block_count:         22,616
versions_per_block:            1.257
compiled_branch_count:        40,252
block_next_count:             20,337
defer_count:                   7,182
defer_empty_count:             1,486
branch_insn_count:             1,912
branch_known_count:              334 (17.5%)
freed_iseq_count:                 33
invalidation_count:              562
constant_state_bumps:              0
get_ivar_max_depth:               20
inline_code_size:          2,400,539
outlined_code_size:        2,172,050
code_region_size:          4,763,648
freed_code_size:                   0
live_context_size:           844,915
live_context_count:           29,135
live_page_count:                 291
freed_page_count:                  0
code_gc_count:                     0
num_gc_obj_refs:              19,076
object_shape_count:            2,326
side_exit_count:              39,258
total_exit_count:          1,747,858
total_insns_count:        88,293,673
vm_insns_count:            2,521,541
yjit_insns_count:         85,811,390
ratio_in_yjit:                 97.1%
avg_len_in_yjit:                49.1
Top-14 most frequent exit ops (100.0% of exits):
             setlocal_WC_0:     10,186 (25.9%)
               expandarray:      9,950 (25.3%)
               invokesuper:      6,758 (17.2%)
                    opt_eq:      4,968 (12.7%)
    opt_send_without_block:      4,181 (10.7%)
        getblockparamproxy:      2,260 ( 5.8%)
                      send:        255 ( 0.6%)
                 opt_nil_p:        244 ( 0.6%)
      opt_getconstant_path:        214 ( 0.5%)
                checkmatch:        144 ( 0.4%)
                      once:         58 ( 0.1%)
                     leave:         21 ( 0.1%)
               objtostring:         18 ( 0.0%)
                opt_length:          1 ( 0.0%)
Total time spent benchmarking: 4s

after: ruby 3.3.0dev (2023-07-22T04:07:50Z yjit-send-fallback 39eeab23c6) +YJIT [x86_64-linux]

----------  ----------  ----------
bench       after (ms)  stddev (%)
railsbench  1619.2      0.0
----------  ----------  ----------
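
The derived figures in the two dumps can be reproduced from the raw counters. The sketch below assumes that instructions which side-exit are counted in yjit_insns_count but finish in the interpreter, so side_exit_count is subtracted first. This derivation is inferred from the printed numbers rather than quoted from YJIT's source, and the helper name `derived_stats` is hypothetical.

```ruby
# Raw counters copied from the Before and After stats dumps.
before_counters = {
  yjit_insns_count: 81_209_638,
  vm_insns_count:    6_511_097,
  side_exit_count:     349_584,
  total_exit_count:  2_028_379,
}

after_counters = {
  yjit_insns_count: 85_811_390,
  vm_insns_count:    2_521_541,
  side_exit_count:      39_258,
  total_exit_count:  1_747_858,
}

# Assumed derivation: subtract side exits to get the instructions that
# actually retired in JIT code, then compute the ratio and average run length.
def derived_stats(counters)
  retired_in_yjit = counters[:yjit_insns_count] - counters[:side_exit_count]
  total_insns     = retired_in_yjit + counters[:vm_insns_count]
  {
    total_insns_count: total_insns,
    ratio_in_yjit:     (100.0 * retired_in_yjit / total_insns).round(1),
    avg_len_in_yjit:   (retired_in_yjit.to_f / counters[:total_exit_count]).round(1),
  }
end

before = derived_stats(before_counters)
after  = derived_stats(after_counters)
puts "before: #{before}"
puts "after:  #{after}"

# Relative drop in side exits between the two runs.
side_exit_drop = 100.0 *
  (before_counters[:side_exit_count] - after_counters[:side_exit_count]) /
  before_counters[:side_exit_count]
puts format("side exits dropped by %.1f%%", side_exit_drop)
```

Under this assumption the computed total_insns_count, ratio_in_yjit, and avg_len_in_yjit match the values printed in both dumps, and side exits fall by roughly 89% (349,584 to 39,258).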

Benchmarks

The following benchmarks get 5%+ speedup.

before: ruby 3.3.0dev (2023-07-22T04:07:04Z master dd04def10f) +YJIT [x86_64-linux]
after: ruby 3.3.0dev (2023-07-22T04:07:50Z yjit-send-fallback 39eeab23c6) +YJIT [x86_64-linux]

------------  -----------  ----------  ----------  ----------  -------------  ------------
bench         before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
activerecord  32.6         5.1         29.8        5.1         1.01           1.10
erubi-rails   11.5         25.5        10.9        32.9        1.04           1.05
hexapdf       1401.9       1.2         1341.4      1.1         1.04           1.05
railsbench    1232.5       1.2         1149.2      1.2         1.03           1.07
ruby-lsp      38.7         19.2        36.2        21.8        0.96           1.07
------------  -----------  ----------  ----------  ----------  -------------  ------------
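
As a quick sanity check on the table, the before/after column is presumably the before time divided by the after time, so values above 1.00 mean the PR is faster. The sketch below recomputes it from the rounded times shown above; because the inputs are already rounded, the result can differ from the reported column in the last digit.

```ruby
# Rounded times (ms) and the reported before/after ratio, copied from the table.
rows = {
  "activerecord" => [32.6,   29.8,   1.10],
  "erubi-rails"  => [11.5,   10.9,   1.05],
  "hexapdf"      => [1401.9, 1341.4, 1.05],
  "railsbench"   => [1232.5, 1149.2, 1.07],
  "ruby-lsp"     => [38.7,   36.2,   1.07],
}

# Recompute the speedup from the rounded table entries.
recomputed = rows.transform_values { |row| row[0] / row[1] }

rows.each do |bench, row|
  puts format("%-13s recomputed %.3f (reported %.2f)", bench, recomputed[bench], row[2])
end
```

Note the high stddev on erubi-rails and ruby-lsp in the table; their speedups are smaller than the run-to-run variation and should be read with that in mind.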

@k0kubun k0kubun marked this pull request as ready for review July 22, 2023 04:48
@matzbot matzbot requested a review from a team July 22, 2023 04:49
k0kubun commented Jul 24, 2023

I added the full Before and After YJIT stats as "Details" under "Ratio in YJIT" in the PR description. They show that supporting block_arg is still important for optimizing method calls. I think the current counters are good enough for merging this, but I plan to add another counter for counting the number of fallback calls in another PR (I can do this in this PR if you prefer it, but this PR is already complex enough for me).

@maximecb maximecb left a comment

Really nice PR. The results look great, and I'm really looking forward to seeing the impact it has on SFR!

@maximecb

> I plan to add another counter for counting the number of fallback calls in another PR (I can do this in this PR if you prefer it, but this PR is already complex enough for me).

This is a good idea. As we close down exits, it will become more important to track places where we no longer exit but could still optimize further. We'll also want to get into the habit of profiling more often.

@k0kubun k0kubun merged commit cef60e9 into ruby:master Jul 24, 2023
1 of 3 checks passed
@k0kubun k0kubun deleted the yjit-send-fallback branch July 24, 2023 20:51