@st0012 st0012 commented Oct 21, 2025

Similar to the SendWithoutBlock optimization, this should eliminate most of the remaining unoptimized cfunc sends.

This PR removes send_cfunc_variadic: 2,178,105 ( 4.6%) from Lobsters' send fallback reasons.

Closes Shopify#825

Lobsters Before

Top-16 send fallback reasons (100.0% of total 47,268,771):
                          send_without_block_polymorphic: 20,390,289 (43.1%)
                                           uncategorized:  7,199,864 (15.2%)
                          send_without_block_no_profiles:  4,912,054 (10.4%)
                          send_not_optimized_method_type:  3,507,220 ( 7.4%)
                                        send_no_profiles:  3,312,755 ( 7.0%)
                            one_or_more_complex_arg_pass:  3,087,629 ( 6.5%)
                                     send_cfunc_variadic:  2,178,105 ( 4.6%)
  send_without_block_not_optimized_method_type_optimized:    996,869 ( 2.1%)
                                        send_polymorphic:    730,950 ( 1.5%)
                          send_without_block_megamorphic:    678,490 ( 1.4%)
                                   too_many_args_for_lir:    172,280 ( 0.4%)
                 send_without_block_cfunc_array_variadic:     35,486 ( 0.1%)
                                        send_megamorphic:     28,623 ( 0.1%)
                                obj_to_string_not_string:     25,487 ( 0.1%)
            send_without_block_not_optimized_method_type:     11,049 ( 0.0%)
                          ccall_with_frame_too_many_args:      1,621 ( 0.0%)
Top-20 not inlined C methods (55.4% of total 16,729,627):
                                                 Hash#[]=: 1,518,600 ( 9.1%)
                                               Hash#fetch: 1,204,876 ( 7.2%)
                                            Regexp#match?:   807,872 ( 4.8%)
                                                Hash#key?:   731,104 ( 4.4%)
                                           Array#include?:   496,469 ( 3.0%)
                                              String#sub!:   482,035 ( 2.9%)
                                               Kernel#dup:   430,963 ( 2.6%)
                                                String#<<:   396,049 ( 2.4%)
                                       String#start_with?:   376,726 ( 2.3%)
                               ObjectSpace::WeakKeyMap#[]:   356,654 ( 2.1%)
                                              Hash#delete:   326,490 ( 2.0%)
                                               String.new:   306,747 ( 1.8%)
                                             Kernel#is_a?:   296,037 ( 1.8%)
                                             Set#include?:   260,425 ( 1.6%)
                                    Process.clock_gettime:   228,138 ( 1.4%)
                                            String#match?:   226,352 ( 1.4%)
                                          String#downcase:   215,957 ( 1.3%)
                                              Integer#<=>:   204,498 ( 1.2%)
                                            Range#member?:   203,067 ( 1.2%)
                                          String#include?:   193,214 ( 1.2%)
Top-20 calls to C functions from JIT code (80.8% of total 139,927,686):
                             rb_vm_opt_send_without_block: 30,379,267 (21.7%)
                                rb_vm_setinstancevariable: 11,384,692 ( 8.1%)
                                             rb_hash_aref: 10,515,536 ( 7.5%)
                                rb_vm_getinstancevariable: 10,268,755 ( 7.3%)
                                               rb_vm_send:  9,758,456 ( 7.0%)
                                          rb_vm_env_write:  8,257,283 ( 5.9%)
                                        rb_obj_is_kind_of:  5,833,876 ( 4.2%)
                                        rb_vm_invokesuper:  4,671,672 ( 3.3%)
                                              rb_ivar_get:  3,846,072 ( 2.7%)
                                             rb_ary_entry:  3,577,586 ( 2.6%)
                               rb_vm_opt_getconstant_path:  2,110,775 ( 1.5%)
                                        rb_vm_invokeblock:  1,672,171 ( 1.2%)
                                              rb_ary_push:  1,546,708 ( 1.1%)
                                                 Hash#[]=:  1,518,600 ( 1.1%)
                                       rb_gc_writebarrier:  1,510,300 ( 1.1%)
                                        rb_str_buf_append:  1,383,432 ( 1.0%)
                                          rb_ary_new_capa:  1,372,111 ( 1.0%)
                               rb_class_allocate_instance:  1,226,136 ( 0.9%)
                                               Hash#fetch:  1,204,876 ( 0.9%)
                                                    _bi20:    997,172 ( 0.7%)
Top-2 not optimized method types for send (100.0% of total 3,507,220):
  iseq: 3,504,398 (99.9%)
  null:     2,822 ( 0.1%)
Top-4 not optimized method types for send_without_block (100.0% of total 1,007,918):
        optimized_send: 531,555 (52.7%)
        optimized_call: 461,015 (45.7%)
                  null:  11,049 ( 1.1%)
  optimized_block_call:   4,299 ( 0.4%)
Top-4 instructions with uncategorized fallback reason (100.0% of total 7,199,864):
             invokesuper: 4,671,672 (64.9%)
             invokeblock: 1,672,171 (23.2%)
             sendforward:   787,205 (10.9%)
  opt_send_without_block:    68,816 ( 1.0%)
Top-6 invokeblock handler (100.0% of total 1,672,171):
        polymorphic: 838,375 (50.1%)
   monomorphic_iseq: 711,922 (42.6%)
  monomorphic_other:  58,125 ( 3.5%)
  monomorphic_ifunc:  55,504 ( 3.3%)
        megamorphic:   4,316 ( 0.3%)
        no_profiles:   3,929 ( 0.2%)
Top-9 popular complex argument-parameter features not optimized (100.0% of total 3,352,928):
       caller_kwarg: 827,686 (24.7%)
           param_kw: 758,802 (22.6%)
        param_block: 660,358 (19.7%)
  param_forwardable: 658,707 (19.6%)
         param_rest: 286,679 ( 8.6%)
       param_kwrest: 122,633 ( 3.7%)
       caller_splat:  36,665 ( 1.1%)
    caller_blockarg:     803 ( 0.0%)
    caller_kw_splat:     595 ( 0.0%)
Top-1 compile error reasons (100.0% of total 248,252):
  exception_handler: 248,252 (100.0%)
Top-7 unhandled YARV insns (100.0% of total 189,091):
       getblockparam: 102,212 (54.1%)
  invokesuperforward:  81,665 (43.2%)
       setblockparam:   2,837 ( 1.5%)
         getconstant:   1,594 ( 0.8%)
         expandarray:     360 ( 0.2%)
          checkmatch:     298 ( 0.2%)
                once:     125 ( 0.1%)
Top-4 unhandled HIR insns (100.0% of total 299,542):
          throw: 256,478 (85.6%)
  invokebuiltin:  35,372 (11.8%)
     fixnum_div:   4,971 ( 1.7%)
      array_max:   2,721 ( 0.9%)
Top-18 side exit reasons (100.0% of total 14,297,101):
                   guard_type_failure: 7,013,938 (49.1%)
                  guard_shape_failure: 4,329,505 (30.3%)
  block_param_proxy_not_iseq_or_ifunc: 1,228,950 ( 8.6%)
     patchpoint_stable_constant_names:   415,818 ( 2.9%)
                   unhandled_hir_insn:   299,542 ( 2.1%)
                        compile_error:   248,252 ( 1.7%)
        patchpoint_no_singleton_class:   245,187 ( 1.7%)
          patchpoint_method_redefined:   209,714 ( 1.5%)
                  unhandled_yarv_insn:   189,091 ( 1.3%)
                 fixnum_mult_overflow:    50,739 ( 0.4%)
           block_param_proxy_modified:    28,111 ( 0.2%)
         unhandled_newarray_send_pack:    14,481 ( 0.1%)
               fixnum_lshift_overflow:    10,085 ( 0.1%)
              patchpoint_no_ep_escape:     7,821 ( 0.1%)
             guard_bit_equals_failure:     4,533 ( 0.0%)
               obj_to_string_fallback:     1,177 ( 0.0%)
                            interrupt:       135 ( 0.0%)
               guard_type_not_failure:        22 ( 0.0%)
                             send_count: 151,198,032
                     dynamic_send_count:  47,268,771 (31.3%)
                   optimized_send_count: 103,929,261 (68.7%)
              iseq_optimized_send_count:  38,251,427 (25.3%)
      inline_cfunc_optimized_send_count:  42,451,304 (28.1%)
       inline_iseq_optimized_send_count:   3,790,279 ( 2.5%)
non_variadic_cfunc_optimized_send_count:  13,393,046 ( 8.9%)
    variadic_cfunc_optimized_send_count:   6,043,205 ( 4.0%)
dynamic_getivar_count:                       14,114,827
dynamic_setivar_count:                       11,439,455
compiled_iseq_count:                              5,273
failed_iseq_count:                                    0
compile_time:                                  15,638ms
profile_time:                                      67ms
gc_time:                                           58ms
invalidation_time:                                338ms
vm_write_pc_count:                          139,923,653
vm_write_sp_count:                          191,568,126
vm_write_locals_count:                      134,357,340
vm_write_stack_count:                       134,357,340
vm_write_to_parent_iseq_local_count:            548,930
vm_read_from_parent_iseq_local_count:        15,139,490
guard_type_count:                           151,921,176
guard_type_exit_ratio:                             4.6%
guard_shape_count:                           48,021,720
guard_shape_exit_ratio:                            9.0%
code_region_bytes:                           38,240,256
zjit_alloc_bytes:                            19,995,236
total_mem_bytes:                             58,235,492
side_exit_count:                             14,297,101
total_insn_count:                           930,885,233
vm_insn_count:                              155,775,123
zjit_insn_count:                            775,110,110
ratio_in_zjit:                                    83.3%

Lobsters After

Top-15 send fallback reasons (100.0% of total 45,095,904):
                          send_without_block_polymorphic: 20,390,333 (45.2%)
                                           uncategorized:  7,199,910 (16.0%)
                          send_without_block_no_profiles:  4,912,063 (10.9%)
                          send_not_optimized_method_type:  3,507,220 ( 7.8%)
                                        send_no_profiles:  3,312,754 ( 7.3%)
                            one_or_more_complex_arg_pass:  3,092,766 ( 6.9%)
  send_without_block_not_optimized_method_type_optimized:    996,869 ( 2.2%)
                                        send_polymorphic:    730,950 ( 1.6%)
                          send_without_block_megamorphic:    678,490 ( 1.5%)
                                   too_many_args_for_lir:    172,282 ( 0.4%)
                 send_without_block_cfunc_array_variadic:     35,487 ( 0.1%)
                                        send_megamorphic:     28,623 ( 0.1%)
                                obj_to_string_not_string:     25,487 ( 0.1%)
            send_without_block_not_optimized_method_type:     11,049 ( 0.0%)
                          ccall_with_frame_too_many_args:      1,621 ( 0.0%)
Top-20 not inlined C methods (57.5% of total 18,902,628):
                                               Hash#fetch: 2,651,243 (14.0%)
                                                 Hash#[]=: 1,518,604 ( 8.0%)
                                            Regexp#match?:   807,876 ( 4.3%)
                                                Hash#key?:   731,105 ( 3.9%)
                                           Array#include?:   496,469 ( 2.6%)
                                              String#sub!:   482,035 ( 2.6%)
                                               Kernel#dup:   430,963 ( 2.3%)
                                                String#<<:   396,049 ( 2.1%)
                                       String#start_with?:   376,732 ( 2.0%)
                               ObjectSpace::WeakKeyMap#[]:   356,654 ( 1.9%)
                                              Hash#delete:   326,490 ( 1.7%)
                                               Array#any?:   308,993 ( 1.6%)
                                               String.new:   306,747 ( 1.6%)
                                             Kernel#is_a?:   296,040 ( 1.6%)
                                             Set#include?:   260,424 ( 1.4%)
                                               Array#all?:   253,438 ( 1.3%)
                                    Process.clock_gettime:   228,138 ( 1.2%)
                                            String#match?:   226,352 ( 1.2%)
                                          String#downcase:   215,960 ( 1.1%)
                                              Integer#<=>:   204,498 ( 1.1%)
Top-20 calls to C functions from JIT code (79.5% of total 139,927,922):
                             rb_vm_opt_send_without_block: 30,379,325 (21.7%)
                                rb_vm_setinstancevariable: 11,384,704 ( 8.1%)
                                             rb_hash_aref: 10,515,544 ( 7.5%)
                                rb_vm_getinstancevariable: 10,268,760 ( 7.3%)
                                          rb_vm_env_write:  8,257,298 ( 5.9%)
                                               rb_vm_send:  7,585,485 ( 5.4%)
                                        rb_obj_is_kind_of:  5,833,878 ( 4.2%)
                                        rb_vm_invokesuper:  4,671,675 ( 3.3%)
                                              rb_ivar_get:  3,846,072 ( 2.7%)
                                             rb_ary_entry:  3,577,603 ( 2.6%)
                               rb_vm_opt_getconstant_path:  2,110,776 ( 1.5%)
                                        rb_vm_invokeblock:  1,672,214 ( 1.2%)
                                              rb_ary_push:  1,546,718 ( 1.1%)
                                                 Hash#[]=:  1,518,604 ( 1.1%)
                                       rb_gc_writebarrier:  1,510,301 ( 1.1%)
                                                    fetch:  1,446,363 ( 1.0%)
                                        rb_str_buf_append:  1,383,436 ( 1.0%)
                                          rb_ary_new_capa:  1,372,117 ( 1.0%)
                               rb_class_allocate_instance:  1,226,139 ( 0.9%)
                                               Hash#fetch:  1,204,880 ( 0.9%)
Top-2 not optimized method types for send (100.0% of total 3,507,220):
  iseq: 3,504,398 (99.9%)
  null:     2,822 ( 0.1%)
Top-4 not optimized method types for send_without_block (100.0% of total 1,007,918):
        optimized_send: 531,555 (52.7%)
        optimized_call: 461,015 (45.7%)
                  null:  11,049 ( 1.1%)
  optimized_block_call:   4,299 ( 0.4%)
Top-4 instructions with uncategorized fallback reason (100.0% of total 7,199,910):
             invokesuper: 4,671,675 (64.9%)
             invokeblock: 1,672,214 (23.2%)
             sendforward:   787,205 (10.9%)
  opt_send_without_block:    68,816 ( 1.0%)
Top-6 invokeblock handler (100.0% of total 1,672,214):
        polymorphic: 838,385 (50.1%)
   monomorphic_iseq: 711,954 (42.6%)
  monomorphic_other:  58,125 ( 3.5%)
  monomorphic_ifunc:  55,505 ( 3.3%)
        megamorphic:   4,316 ( 0.3%)
        no_profiles:   3,929 ( 0.2%)
Top-9 popular complex argument-parameter features not optimized (100.0% of total 3,358,065):
       caller_kwarg: 827,685 (24.6%)
           param_kw: 758,802 (22.6%)
        param_block: 660,358 (19.7%)
  param_forwardable: 658,711 (19.6%)
         param_rest: 286,678 ( 8.5%)
       param_kwrest: 122,633 ( 3.7%)
       caller_splat:  36,665 ( 1.1%)
    caller_blockarg:   5,938 ( 0.2%)
    caller_kw_splat:     595 ( 0.0%)
Top-1 compile error reasons (100.0% of total 248,252):
  exception_handler: 248,252 (100.0%)
Top-7 unhandled YARV insns (100.0% of total 189,091):
       getblockparam: 102,212 (54.1%)
  invokesuperforward:  81,665 (43.2%)
       setblockparam:   2,837 ( 1.5%)
         getconstant:   1,594 ( 0.8%)
         expandarray:     360 ( 0.2%)
          checkmatch:     298 ( 0.2%)
                once:     125 ( 0.1%)
Top-4 unhandled HIR insns (100.0% of total 299,542):
          throw: 256,478 (85.6%)
  invokebuiltin:  35,372 (11.8%)
     fixnum_div:   4,971 ( 1.7%)
      array_max:   2,721 ( 0.9%)
Top-18 side exit reasons (100.0% of total 14,297,048):
                   guard_type_failure: 7,013,941 (49.1%)
                  guard_shape_failure: 4,329,503 (30.3%)
  block_param_proxy_not_iseq_or_ifunc: 1,228,950 ( 8.6%)
     patchpoint_stable_constant_names:   415,818 ( 2.9%)
                   unhandled_hir_insn:   299,542 ( 2.1%)
                        compile_error:   248,252 ( 1.7%)
        patchpoint_no_singleton_class:   245,187 ( 1.7%)
          patchpoint_method_redefined:   209,714 ( 1.5%)
                  unhandled_yarv_insn:   189,091 ( 1.3%)
                 fixnum_mult_overflow:    50,739 ( 0.4%)
           block_param_proxy_modified:    28,111 ( 0.2%)
         unhandled_newarray_send_pack:    14,481 ( 0.1%)
               fixnum_lshift_overflow:    10,085 ( 0.1%)
              patchpoint_no_ep_escape:     7,821 ( 0.1%)
             guard_bit_equals_failure:     4,533 ( 0.0%)
               obj_to_string_fallback:     1,177 ( 0.0%)
                            interrupt:        81 ( 0.0%)
               guard_type_not_failure:        22 ( 0.0%)
                             send_count: 151,198,255
                     dynamic_send_count:  45,095,904 (29.8%)
                   optimized_send_count: 106,102,351 (70.2%)
              iseq_optimized_send_count:  38,251,459 (25.3%)
      inline_cfunc_optimized_send_count:  42,451,362 (28.1%)
       inline_iseq_optimized_send_count:   3,790,280 ( 2.5%)
non_variadic_cfunc_optimized_send_count:  13,393,061 ( 8.9%)
    variadic_cfunc_optimized_send_count:   8,216,189 ( 5.4%)
dynamic_getivar_count:                       14,114,832
dynamic_setivar_count:                       11,439,467
compiled_iseq_count:                              5,273
failed_iseq_count:                                    0
compile_time:                                  15,269ms
profile_time:                                      65ms
gc_time:                                           56ms
invalidation_time:                                378ms
vm_write_pc_count:                          139,923,859
vm_write_sp_count:                          199,784,568
vm_write_locals_count:                      134,357,529
vm_write_stack_count:                       134,357,529
vm_write_to_parent_iseq_local_count:            548,930
vm_read_from_parent_iseq_local_count:        15,139,506
guard_type_count:                           154,055,277
guard_type_exit_ratio:                             4.6%
guard_shape_count:                           48,021,695
guard_shape_exit_ratio:                            9.0%
code_region_bytes:                           38,420,480
zjit_alloc_bytes:                            19,997,861
total_mem_bytes:                             58,418,341
side_exit_count:                             14,297,048
total_insn_count:                           930,885,012
vm_insn_count:                              155,773,995
zjit_insn_count:                            775,111,017
ratio_in_zjit:                                    83.3%
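
For readers comparing the two dumps, the arithmetic behind the headline change can be sketched as follows. This is only an illustration: the constants are copied from the before/after stats above, while the helper `percent` and the constant names are made up for this sketch and are not part of ZJIT.

```rust
// Counter values copied from the "Lobsters Before" / "Lobsters After" stats.
const SEND_COUNT_BEFORE: u64 = 151_198_032;
const SEND_COUNT_AFTER: u64 = 151_198_255;
const DYNAMIC_SEND_BEFORE: u64 = 47_268_771;
const DYNAMIC_SEND_AFTER: u64 = 45_095_904;
const VARIADIC_CFUNC_BEFORE: u64 = 6_043_205;
const VARIADIC_CFUNC_AFTER: u64 = 8_216_189;

/// Share of `part` in `total` as a percentage, rounded to one decimal place
/// (hypothetical helper, written to match how the stats output is rounded).
fn percent(part: u64, total: u64) -> f64 {
    (part as f64 / total as f64 * 1000.0).round() / 10.0
}

fn main() {
    // Sends that no longer take the dynamic (fallback) path.
    let fewer_dynamic = DYNAMIC_SEND_BEFORE - DYNAMIC_SEND_AFTER;
    // Sends newly counted as optimized variadic cfunc calls.
    let more_variadic = VARIADIC_CFUNC_AFTER - VARIADIC_CFUNC_BEFORE;

    println!("dynamic sends removed:       {fewer_dynamic}");
    println!("variadic cfunc sends gained: {more_variadic}");
    println!(
        "dynamic send ratio: {}% -> {}%",
        percent(DYNAMIC_SEND_BEFORE, SEND_COUNT_BEFORE),
        percent(DYNAMIC_SEND_AFTER, SEND_COUNT_AFTER)
    );
}
```

The two deltas are close to each other but do not match exactly, and neither equals the 2,178,105 reported for send_cfunc_variadic, so the counters are best read as approximate across runs.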

zjit/src/hir.rs Outdated
@@ -12907,26 +12933,46 @@
Member Author

I decided not to use Array#index because it considers either the passed argument or the block, never both. If you pass both, it warns that the block is unused.

@st0012 st0012 force-pushed the zjit-optimize-send-variadic branch 2 times, most recently from 5691488 to 00e7889 Compare October 24, 2025 22:51
@st0012 st0012 force-pushed the zjit-optimize-send-variadic branch 2 times, most recently from 72f68f3 to 1186b4f Compare October 30, 2025 21:08
@st0012 st0012 force-pushed the zjit-optimize-send-variadic branch from 1186b4f to 555c8b2 Compare November 26, 2025 17:34
@st0012 st0012 force-pushed the zjit-optimize-send-variadic branch from 555c8b2 to 835ac10 Compare November 26, 2025 18:35
gen_stack_overflow_check(jit, asm, state, state.stack_size());

gen_prepare_non_leaf_call(jit, asm, state);
let args_with_recv_len = args.len() + 1;
Member Author

The math here differs from gen_ccall_with_frame's because CCallWithFrame doesn't have a recv field. I think this is confusing, so I'll add a recv field to it in a separate PR to make things consistent.
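
To make the off-by-one concrete, here is a minimal sketch of the two layouts. The struct and function names are hypothetical stand-ins, not the real ZJIT definitions, and it assumes that an instruction without a recv field carries the receiver as the first element of its argument list.

```rust
// Hypothetical stand-ins for two instruction layouts; NOT the real ZJIT types.

/// Layout with a dedicated receiver field: `args` holds only the call arguments.
struct SendLike {
    recv: u32,
    args: Vec<u32>,
}

/// Layout without a receiver field (assumption: the receiver travels as args[0]).
struct FrameCallLike {
    args: Vec<u32>,
}

/// With a separate `recv`, the operand list is the receiver followed by the
/// arguments, so its length is `args.len() + 1`.
fn operands_with_recv_field(insn: &SendLike) -> Vec<u32> {
    let mut all = Vec::with_capacity(insn.args.len() + 1);
    all.push(insn.recv);
    all.extend_from_slice(&insn.args);
    all
}

/// Without a separate `recv`, `args` already includes the receiver, so no
/// `+ 1` is needed.
fn operand_count_without_recv_field(insn: &FrameCallLike) -> usize {
    insn.args.len()
}

fn main() {
    let send = SendLike { recv: 10, args: vec![11, 12] };
    let frame_call = FrameCallLike { args: vec![10, 11, 12] };

    println!("with recv field:    {}", operands_with_recv_field(&send).len());
    println!("without recv field: {}", operand_count_without_recv_field(&frame_call));
}
```

Both calls describe the same receiver and two arguments; only the place where the receiver is stored changes which expression yields the total.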

@st0012 st0012 marked this pull request as ready for review November 26, 2025 21:58
@matzbot matzbot requested a review from a team November 26, 2025 21:58
let ci_flags = unsafe { vm_ci_flag(call_info) };

// When seeing &block argument, fall back to dynamic dispatch for now
// TODO: Support block forwarding
Contributor

iirc, @tekknolagi suggested before that TODOs have to be specifically targeted to ensure that they have someone to follow up on.

Like todo(aidenfoxivey): foo bar baz. I could be wrong though.

Member Author

Hmm, this is different from what I'm aware of. I can create an issue for it though.
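
For context on the fallback that the TODO refers to, the check on `ci_flags` can be sketched roughly like this. It is a simplified illustration under stated assumptions: the flag constant mirrors the name of CRuby's `VM_CALL_ARGS_BLOCKARG`, but its value here is assumed, and `Dispatch` and `choose_dispatch` are hypothetical names, not ZJIT code.

```rust
// Assumed bit value for the "&block argument present" call flag.
// Illustrative only; consult the VM headers for the real definition.
const VM_CALL_ARGS_BLOCKARG: u32 = 1 << 1;

/// Hypothetical outcome of the optimization decision for a variadic cfunc send.
#[derive(Debug, PartialEq)]
enum Dispatch {
    /// Keep the generic send path (the fallback).
    Dynamic,
    /// Emit a direct variadic C call.
    VariadicCCall,
}

/// Fall back to dynamic dispatch whenever the call site passes `&block`,
/// since block forwarding is not supported yet in this sketch.
fn choose_dispatch(ci_flags: u32) -> Dispatch {
    if ci_flags & VM_CALL_ARGS_BLOCKARG != 0 {
        Dispatch::Dynamic
    } else {
        Dispatch::VariadicCCall
    }
}

fn main() {
    println!("{:?}", choose_dispatch(0));
    println!("{:?}", choose_dispatch(VM_CALL_ARGS_BLOCKARG));
}
```

Checking the flag before doing any other work keeps the unsupported case cheap to reject and leaves a single place to relax once block forwarding is handled.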

Member

@k0kubun k0kubun left a comment

👍

@st0012 st0012 merged commit c58970b into ruby:master Dec 1, 2025
92 checks passed
@st0012 st0012 deleted the zjit-optimize-send-variadic branch December 1, 2025 17:14
tagomoris pushed a commit to tagomoris/ruby that referenced this pull request Dec 2, 2025
…#14898)

ZJIT: Optimize variadic cfunc Send calls into CCallVariadic