@nozomemein
Contributor

@nozomemein nozomemein commented Dec 27, 2025

Closes: Shopify#804

Benchmark

loops-times

  • wall clock time
    • before patch: Average of last 10, non-warmup iters: 3557 ms
    • after patch: Average of last 10, non-warmup iters: 3362 ms
  • zjit stats below
before patch
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (100.0% of total 400,106,363):
              Array#[]=: 400,099,942 (100.0%)
       Numeric#nonzero?:       1,686 ( 0.0%)
               String#+:         737 ( 0.0%)
             File.file?:         737 ( 0.0%)
             Array#any?:         593 ( 0.0%)
          Regexp#match?:         565 ( 0.0%)
              String#-@:         286 ( 0.0%)
           String#split:         230 ( 0.0%)
          String#chomp!:         230 ( 0.0%)
       File.expand_path:         218 ( 0.0%)
  String#delete_prefix!:         196 ( 0.0%)
              String#[]:         196 ( 0.0%)
     String#start_with?:         196 ( 0.0%)
       String#end_with?:         196 ( 0.0%)
            String#to_i:         119 ( 0.0%)
           String#gsub!:          90 ( 0.0%)
           String#strip:          90 ( 0.0%)
            File.exist?:          20 ( 0.0%)
             Array#join:          13 ( 0.0%)
          Array#compact:          13 ( 0.0%)
Top-20 calls to C functions from JIT code (100.0% of total 1,600,309,241):
                          rb_ary_entry: 400,100,742 (25.0%)
                             Array#[]=: 400,099,942 (25.0%)
                        rb_fix_mod_fix: 399,999,971 (25.0%)
                     rb_vm_invokeblock: 399,984,346 (25.0%)
                            rb_vm_send:     100,406 ( 0.0%)
          rb_vm_opt_send_without_block:       3,389 ( 0.0%)
  rb_zjit_writebarrier_check_immediate:       2,711 ( 0.0%)
                                _bi290:       1,746 ( 0.0%)
                      Numeric#nonzero?:       1,686 ( 0.0%)
                             String#==:       1,681 ( 0.0%)
                       rb_vm_env_write:         985 ( 0.0%)
        rb_ivar_get_at_no_ractor_check:         864 ( 0.0%)
                           rb_gvar_get:         842 ( 0.0%)
                            File.file?:         737 ( 0.0%)
                              String#+:         737 ( 0.0%)
                           io_readline:         646 ( 0.0%)
                                  any?:         593 ( 0.0%)
                         Regexp#match?:         565 ( 0.0%)
                          Array#empty?:         550 ( 0.0%)
                     rb_obj_is_kind_of:         452 ( 0.0%)
Top-1 not optimized method types for send (100.0% of total 100,167):
  iseq: 100,167 (100.0%)
Top-3 instructions with uncategorized fallback reason (100.0% of total 399,984,689):
             invokeblock: 399,984,346 (100.0%)
             invokesuper:         302 ( 0.0%)
  opt_send_without_block:          41 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 400,088,443):
                            uncategorized: 399,984,689 (100.0%)
           send_not_optimized_method_type:     100,167 ( 0.0%)
           send_without_block_polymorphic:       1,727 ( 0.0%)
             one_or_more_complex_arg_pass:         806 ( 0.0%)
           send_without_block_no_profiles:         553 ( 0.0%)
  send_without_block_cfunc_array_variadic:         261 ( 0.0%)
                 obj_to_string_not_string:         197 ( 0.0%)
                         send_no_profiles:          43 ( 0.0%)
Top-1 setivar fallback reasons (100.0% of total 41):
  not_monomorphic: 41 (100.0%)
Top-1 getivar fallback reasons (100.0% of total 219):
  not_monomorphic: 219 (100.0%)
Top-1 invokeblock handler (100.0% of total 399,984,346):
  monomorphic_iseq: 399,984,346 (100.0%)
Top-2 popular complex argument-parameter features not optimized (100.0% of total 806):
     param_kw_opt: 610 (75.7%)
  caller_blockarg: 196 (24.3%)
Top-1 unhandled YARV insns (100.0% of total 13):
  getconstant: 13 (100.0%)
Top-4 side exit reasons (100.0% of total 135):
                  guard_shape_failure: 77 (57.0%)
                   guard_type_failure: 42 (31.1%)
                  unhandled_yarv_insn: 13 ( 9.6%)
  block_param_proxy_not_iseq_or_ifunc:  3 ( 2.2%)
                             send_count: 2,800,482,956
                     dynamic_send_count:   400,088,443 (14.3%)
                   optimized_send_count: 2,400,394,513 (85.7%)
                  dynamic_setivar_count:            41 ( 0.0%)
                  dynamic_getivar_count:           219 ( 0.0%)
              dynamic_definedivar_count:             0 ( 0.0%)
              iseq_optimized_send_count:        13,645 ( 0.0%)
      inline_cfunc_optimized_send_count: 2,000,273,767 (71.4%)
       inline_iseq_optimized_send_count:           738 ( 0.0%)
non_variadic_cfunc_optimized_send_count:         3,684 ( 0.0%)
    variadic_cfunc_optimized_send_count:   400,102,679 (14.3%)
compiled_iseq_count:                                      68
failed_iseq_count:                                         0
compile_time:                                           20ms
profile_time:                                            0ms
gc_time:                                                 0ms
invalidation_time:                                       0ms
vm_write_pc_count:                               800,214,327
vm_write_sp_count:                               800,214,327
vm_write_locals_count:                           800,211,325
vm_write_stack_count:                            800,211,325
vm_write_to_parent_iseq_local_count:                       0
vm_read_from_parent_iseq_local_count:          1,200,202,112
guard_type_count:                              2,800,630,070
guard_type_exit_ratio:                                  0.0%
guard_shape_count:                                    22,778
guard_shape_exit_ratio:                                 0.3%
code_region_bytes:                                   442,368
zjit_alloc_bytes:                                    369,281
total_mem_bytes:                                     811,649
side_exit_count:                                         135
total_insn_count:                              9,204,000,761
vm_insn_count:                                     1,193,386
zjit_insn_count:                               9,202,807,375
ratio_in_zjit:                                        100.0%
after patch
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (100.0% of total 5,833):
       Numeric#nonzero?: 1,591 (27.3%)
             File.file?:   686 (11.8%)
               String#+:   686 (11.8%)
             Array#any?:   550 ( 9.4%)
          Regexp#match?:   449 ( 7.7%)
              String#-@:   262 ( 4.5%)
           String#split:   213 ( 3.7%)
          String#chomp!:   213 ( 3.7%)
       File.expand_path:   200 ( 3.4%)
  String#delete_prefix!:   182 ( 3.1%)
       String#end_with?:   182 ( 3.1%)
     String#start_with?:   182 ( 3.1%)
              String#[]:   182 ( 3.1%)
           String#gsub!:    80 ( 1.4%)
           String#strip:    80 ( 1.4%)
            String#to_i:    58 ( 1.0%)
            File.exist?:    17 ( 0.3%)
          Array#compact:     9 ( 0.2%)
             Array#join:     9 ( 0.2%)
       Module#const_set:     1 ( 0.0%)
Top-20 calls to C functions from JIT code (100.0% of total 1,200,206,966):
                          rb_ary_entry: 400,100,670 (33.3%)
                        rb_fix_mod_fix: 399,999,971 (33.3%)
                     rb_vm_invokeblock: 399,984,318 (33.3%)
                            rb_vm_send:     100,357 ( 0.0%)
          rb_vm_opt_send_without_block:       2,987 ( 0.0%)
  rb_zjit_writebarrier_check_immediate:       2,480 ( 0.0%)
                                _bi290:       1,630 ( 0.0%)
                      Numeric#nonzero?:       1,591 ( 0.0%)
                             String#==:       1,589 ( 0.0%)
                       rb_vm_env_write:         914 ( 0.0%)
                           rb_gvar_get:         786 ( 0.0%)
        rb_ivar_get_at_no_ractor_check:         784 ( 0.0%)
                            File.file?:         686 ( 0.0%)
                              String#+:         686 ( 0.0%)
                           io_readline:         604 ( 0.0%)
                                  any?:         550 ( 0.0%)
                          Array#empty?:         506 ( 0.0%)
                         Regexp#match?:         449 ( 0.0%)
                                 _bi12:         430 ( 0.0%)
                          rb_hash_aref:         371 ( 0.0%)
Top-1 not optimized method types for send (100.0% of total 100,153):
  iseq: 100,153 (100.0%)
Top-3 instructions with uncategorized fallback reason (100.0% of total 399,984,616):
             invokeblock: 399,984,318 (100.0%)
             invokesuper:         278 ( 0.0%)
  opt_send_without_block:          20 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 400,087,940):
                            uncategorized: 399,984,616 (100.0%)
           send_not_optimized_method_type:     100,153 ( 0.0%)
           send_without_block_polymorphic:       1,611 ( 0.0%)
             one_or_more_complex_arg_pass:         746 ( 0.0%)
           send_without_block_no_profiles:         373 ( 0.0%)
  send_without_block_cfunc_array_variadic:         236 ( 0.0%)
                 obj_to_string_not_string:         183 ( 0.0%)
                         send_no_profiles:          22 ( 0.0%)
Top-1 setivar fallback reasons (100.0% of total 20):
  not_monomorphic: 20 (100.0%)
Top-1 getivar fallback reasons (100.0% of total 113):
  not_monomorphic: 113 (100.0%)
Top-1 invokeblock handler (100.0% of total 399,984,318):
  monomorphic_iseq: 399,984,318 (100.0%)
Top-2 popular complex argument-parameter features not optimized (100.0% of total 746):
     param_kw_opt: 564 (75.6%)
  caller_blockarg: 182 (24.4%)
Top-1 unhandled YARV insns (100.0% of total 9):
  getconstant: 9 (100.0%)
Top-4 side exit reasons (100.0% of total 98):
                  guard_shape_failure: 50 (51.0%)
                   guard_type_failure: 37 (37.8%)
                  unhandled_yarv_insn:  9 ( 9.2%)
  block_param_proxy_not_iseq_or_ifunc:  2 ( 2.0%)
                             send_count: 2,800,480,306
                     dynamic_send_count:   400,087,940 (14.3%)
                   optimized_send_count: 2,400,392,366 (85.7%)
                  dynamic_setivar_count:            20 ( 0.0%)
                  dynamic_getivar_count:           113 ( 0.0%)
              dynamic_definedivar_count:             0 ( 0.0%)
              iseq_optimized_send_count:        12,648 ( 0.0%)
      inline_cfunc_optimized_send_count: 2,400,373,175 (85.7%)
       inline_iseq_optimized_send_count:           710 ( 0.0%)
non_variadic_cfunc_optimized_send_count:         3,434 ( 0.0%)
    variadic_cfunc_optimized_send_count:         2,399 ( 0.0%)
compiled_iseq_count:                                      68
failed_iseq_count:                                         0
compile_time:                                           22ms
profile_time:                                            0ms
gc_time:                                                 0ms
invalidation_time:                                       0ms
vm_write_pc_count:                               400,111,882
vm_write_sp_count:                               400,111,882
vm_write_locals_count:                           400,109,062
vm_write_stack_count:                            400,109,062
vm_write_to_parent_iseq_local_count:                       0
vm_read_from_parent_iseq_local_count:          1,200,201,947
guard_type_count:                              3,200,726,488
guard_type_exit_ratio:                                  0.0%
guard_shape_count:                                    21,063
guard_shape_exit_ratio:                                 0.2%
code_region_bytes:                                   442,368
zjit_alloc_bytes:                                    374,834
total_mem_bytes:                                     817,202
side_exit_count:                                          98
total_insn_count:                              9,203,984,554
vm_insn_count:                                     1,190,946
zjit_insn_count:                               9,202,793,608
ratio_in_zjit:                                        100.0%
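
To put the loops-times numbers above in relative terms, here is a minimal sketch that computes the percentage reduction in wall-clock time and in total C function calls from JIT code. The figures are copied from the stats above; the helper name `percent_reduction` is only illustrative and not part of ZJIT.

```ruby
# Figures copied from the loops-times before/after stats above.
WALL_CLOCK_MS = { before: 3557, after: 3362 }
C_CALLS_FROM_JIT = { before: 1_600_309_241, after: 1_200_206_966 }

# Relative reduction in percent, rounded to one decimal place.
# (Hypothetical helper, named here for illustration only.)
def percent_reduction(before, after)
  ((before - after) * 100.0 / before).round(1)
end

puts "wall clock: #{percent_reduction(WALL_CLOCK_MS[:before], WALL_CLOCK_MS[:after])}% less time"
puts "C calls:    #{percent_reduction(C_CALLS_FROM_JIT[:before], C_CALLS_FROM_JIT[:after])}% fewer calls"
```

This works out to roughly a 5.5% reduction in wall-clock time and a 25% reduction in C calls from JIT code, the latter matching the disappearance of the `Array#[]=` row from the after-patch table.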

optcarrot

  • wall clock time
    • before patch: Average of last 10, non-warmup iters: 7408 ms
    • after patch: Average of last 10, non-warmup iters: 7269 ms
  • zjit stats below
before patch
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (100.0% of total 152,051,106):
             Integer#[]: 73,418,673 (48.3%)
              Array#[]=: 39,375,756 (25.9%)
          Array#rotate!: 38,357,428 (25.2%)
             Integer#<=:    405,116 ( 0.3%)
              Method#[]:    397,510 ( 0.3%)
             Integer#>>:     38,868 ( 0.0%)
            Array#clear:     14,913 ( 0.0%)
            Fiber.yield:      4,971 ( 0.0%)
  Process.clock_gettime:      4,971 ( 0.0%)
                Float#/:      4,971 ( 0.0%)
            Array#shift:      4,971 ( 0.0%)
                Float#-:      4,971 ( 0.0%)
           Array#concat:      4,971 ( 0.0%)
               Float#**:      4,971 ( 0.0%)
       Numeric#nonzero?:      1,686 ( 0.0%)
               String#+:        737 ( 0.0%)
             File.file?:        737 ( 0.0%)
             Array#any?:        593 ( 0.0%)
          Regexp#match?:        565 ( 0.0%)
            Float#floor:        483 ( 0.0%)
Top-20 calls to C functions from JIT code (100.0% of total 1,628,197,525):
                          rb_ary_entry: 425,336,671 (26.1%)
          rb_vm_opt_send_without_block: 390,605,606 (24.0%)
  rb_zjit_writebarrier_check_immediate: 330,061,301 (20.3%)
             rb_vm_getinstancevariable: 201,447,225 (12.4%)
                            Integer#[]:  73,418,673 ( 4.5%)
             rb_vm_setinstancevariable:  55,673,852 ( 3.4%)
                     rb_vm_splat_array:  48,937,000 ( 3.0%)
                             Array#[]=:  39,375,756 ( 2.4%)
                         Array#rotate!:  38,357,428 ( 2.4%)
                    rb_jit_fix_div_fix:  11,105,867 ( 0.7%)
            rb_vm_opt_getconstant_path:   4,663,515 ( 0.3%)
                           rb_ary_push:   3,841,256 ( 0.2%)
                            Array#size:   3,700,369 ( 0.2%)
                            Integer#<=:     405,116 ( 0.0%)
                             Method#[]:     397,510 ( 0.0%)
                                _bi125:     215,032 ( 0.0%)
                          rb_hash_aref:     197,218 ( 0.0%)
                     rb_vm_invokesuper:     117,844 ( 0.0%)
                        rb_fix_mod_fix:     115,073 ( 0.0%)
             rb_ec_ary_new_from_values:     101,649 ( 0.0%)
Top-1 not optimized method types for send (100.0% of total 196):
  iseq: 196 (100.0%)
Top-1 not optimized method types for send_without_block (100.0% of total 50,363,393):
  optimized_send: 50,363,393 (100.0%)
Top-3 instructions with uncategorized fallback reason (100.0% of total 8,340,348):
  opt_send_without_block: 8,222,158 (98.6%)
             invokesuper:   117,844 ( 1.4%)
             invokeblock:       346 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 390,724,231):
                          send_without_block_no_profiles: 283,036,363 (72.4%)
  send_without_block_not_optimized_method_type_optimized:  50,363,393 (12.9%)
                            one_or_more_complex_arg_pass:  48,937,706 (12.5%)
                                           uncategorized:   8,340,348 ( 2.1%)
                          send_without_block_polymorphic:      45,724 ( 0.0%)
                 send_without_block_cfunc_array_variadic:         261 ( 0.0%)
                                obj_to_string_not_string:         197 ( 0.0%)
                          send_not_optimized_method_type:         196 ( 0.0%)
                                        send_no_profiles:          43 ( 0.0%)
Top-1 setivar fallback reasons (100.0% of total 55,673,852):
  not_monomorphic: 55,673,852 (100.0%)
Top-1 getivar fallback reasons (100.0% of total 201,447,225):
  not_monomorphic: 201,447,225 (100.0%)
Top-1 invokeblock handler (100.0% of total 346):
  monomorphic_iseq: 346 (100.0%)
Top-3 popular complex argument-parameter features not optimized (100.0% of total 48,937,706):
     caller_splat: 48,936,900 (100.0%)
     param_kw_opt:        610 ( 0.0%)
  caller_blockarg:        196 ( 0.0%)
Top-1 unhandled YARV insns (100.0% of total 13):
  getconstant: 13 (100.0%)
Top-1 unhandled HIR insns (100.0% of total 3):
  throw: 3 (100.0%)
Top-7 side exit reasons (100.0% of total 1,157,436,440):
                  guard_shape_failure: 1,157,329,455 (100.0%)
                   guard_type_failure:       102,122 ( 0.0%)
                 fixnum_mult_overflow:         4,361 ( 0.0%)
          unhandled_newarray_send_min:           483 ( 0.0%)
                  unhandled_yarv_insn:            13 ( 0.0%)
                   unhandled_hir_insn:             3 ( 0.0%)
  block_param_proxy_not_iseq_or_ifunc:             3 ( 0.0%)
                             send_count: 2,295,539,890
                     dynamic_send_count:   390,724,231 (17.0%)
                   optimized_send_count: 1,904,815,659 (83.0%)
                  dynamic_setivar_count:    55,673,852 ( 2.4%)
                  dynamic_getivar_count:   201,447,225 ( 8.8%)
              dynamic_definedivar_count:             0 ( 0.0%)
              iseq_optimized_send_count:   167,512,426 ( 7.3%)
      inline_cfunc_optimized_send_count: 1,585,025,115 (69.0%)
       inline_iseq_optimized_send_count:       225,879 ( 0.0%)
non_variadic_cfunc_optimized_send_count:       479,686 ( 0.0%)
    variadic_cfunc_optimized_send_count:   151,572,553 ( 6.6%)
compiled_iseq_count:                                    272
failed_iseq_count:                                        0
compile_time:                                         100ms
profile_time:                                           1ms
gc_time:                                                0ms
invalidation_time:                                      0ms
vm_write_pc_count:                              835,044,014
vm_write_sp_count:                              835,044,014
vm_write_locals_count:                          819,764,140
vm_write_stack_count:                           819,764,140
vm_write_to_parent_iseq_local_count:                      0
vm_read_from_parent_iseq_local_count:             6,173,725
guard_type_count:                             6,944,913,489
guard_type_exit_ratio:                                 0.0%
guard_shape_count:                            4,612,440,704
guard_shape_exit_ratio:                               25.1%
code_region_bytes:                                2,375,680
zjit_alloc_bytes:                                 1,669,354
total_mem_bytes:                                  4,045,034
side_exit_count:                              1,157,436,440
total_insn_count:                            33,993,871,829
vm_insn_count:                               21,346,722,601
zjit_insn_count:                             12,647,149,228
ratio_in_zjit:                                        37.2%
after patch
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (100.0% of total 151,032,190):
             Integer#[]: 73,418,673 (48.6%)
              Array#[]=: 38,357,428 (25.4%)
          Array#rotate!: 38,357,428 (25.4%)
             Integer#<=:    405,116 ( 0.3%)
              Method#[]:    397,510 ( 0.3%)
             Integer#>>:     38,868 ( 0.0%)
            Array#clear:     14,913 ( 0.0%)
                Float#/:      4,971 ( 0.0%)
  Process.clock_gettime:      4,971 ( 0.0%)
                Float#-:      4,971 ( 0.0%)
           Array#concat:      4,971 ( 0.0%)
            Fiber.yield:      4,971 ( 0.0%)
               Float#**:      4,971 ( 0.0%)
            Array#shift:      4,971 ( 0.0%)
       Numeric#nonzero?:      1,591 ( 0.0%)
               String#+:        686 ( 0.0%)
             File.file?:        686 ( 0.0%)
             Array#any?:        550 ( 0.0%)
            Float#floor:        483 ( 0.0%)
              Integer#*:        483 ( 0.0%)
Top-20 calls to C functions from JIT code (100.0% of total 1,628,015,382):
                          rb_ary_entry: 425,332,561 (26.1%)
          rb_vm_opt_send_without_block: 390,605,204 (24.0%)
  rb_zjit_writebarrier_check_immediate: 330,909,683 (20.3%)
             rb_vm_getinstancevariable: 201,447,119 (12.4%)
                            Integer#[]:  73,418,673 ( 4.5%)
             rb_vm_setinstancevariable:  55,673,831 ( 3.4%)
                     rb_vm_splat_array:  48,937,000 ( 3.0%)
                         Array#rotate!:  38,357,428 ( 2.4%)
                             Array#[]=:  38,357,428 ( 2.4%)
                    rb_jit_fix_div_fix:  11,105,867 ( 0.7%)
            rb_vm_opt_getconstant_path:   4,663,513 ( 0.3%)
                           rb_ary_push:   3,839,237 ( 0.2%)
                            Array#size:   3,700,361 ( 0.2%)
                            Integer#<=:     405,116 ( 0.0%)
                             Method#[]:     397,510 ( 0.0%)
                                _bi125:     215,032 ( 0.0%)
                          rb_hash_aref:     197,187 ( 0.0%)
                     rb_vm_invokesuper:     117,820 ( 0.0%)
                        rb_fix_mod_fix:     115,073 ( 0.0%)
             rb_ec_ary_new_from_values:      97,605 ( 0.0%)
Top-1 not optimized method types for send (100.0% of total 182):
  iseq: 182 (100.0%)
Top-1 not optimized method types for send_without_block (100.0% of total 50,363,393):
  optimized_send: 50,363,393 (100.0%)
Top-3 instructions with uncategorized fallback reason (100.0% of total 8,340,275):
  opt_send_without_block: 8,222,137 (98.6%)
             invokesuper:   117,820 ( 1.4%)
             invokeblock:       318 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 390,723,728):
                          send_without_block_no_profiles: 283,036,183 (72.4%)
  send_without_block_not_optimized_method_type_optimized:  50,363,393 (12.9%)
                            one_or_more_complex_arg_pass:  48,937,646 (12.5%)
                                           uncategorized:   8,340,275 ( 2.1%)
                          send_without_block_polymorphic:      45,608 ( 0.0%)
                 send_without_block_cfunc_array_variadic:         236 ( 0.0%)
                                obj_to_string_not_string:         183 ( 0.0%)
                          send_not_optimized_method_type:         182 ( 0.0%)
                                        send_no_profiles:          22 ( 0.0%)
Top-1 setivar fallback reasons (100.0% of total 55,673,831):
  not_monomorphic: 55,673,831 (100.0%)
Top-1 getivar fallback reasons (100.0% of total 201,447,119):
  not_monomorphic: 201,447,119 (100.0%)
Top-1 invokeblock handler (100.0% of total 318):
  monomorphic_iseq: 318 (100.0%)
Top-3 popular complex argument-parameter features not optimized (100.0% of total 48,937,646):
     caller_splat: 48,936,900 (100.0%)
     param_kw_opt:        564 ( 0.0%)
  caller_blockarg:        182 ( 0.0%)
Top-1 unhandled YARV insns (100.0% of total 9):
  getconstant: 9 (100.0%)
Top-1 unhandled HIR insns (100.0% of total 3):
  throw: 3 (100.0%)
Top-8 side exit reasons (100.0% of total 1,157,438,426):
                  guard_shape_failure: 1,157,329,428 (100.0%)
                   guard_type_failure:       102,117 ( 0.0%)
                 fixnum_mult_overflow:         4,361 ( 0.0%)
                   guard_less_failure:         2,023 ( 0.0%)
          unhandled_newarray_send_min:           483 ( 0.0%)
                  unhandled_yarv_insn:             9 ( 0.0%)
                   unhandled_hir_insn:             3 ( 0.0%)
  block_param_proxy_not_iseq_or_ifunc:             2 ( 0.0%)
                             send_count: 2,295,523,103
                     dynamic_send_count:   390,723,728 (17.0%)
                   optimized_send_count: 1,904,799,375 (83.0%)
                  dynamic_setivar_count:    55,673,831 ( 2.4%)
                  dynamic_getivar_count:   201,447,119 ( 8.8%)
              dynamic_definedivar_count:             0 ( 0.0%)
              iseq_optimized_send_count:   167,511,429 ( 7.3%)
      inline_cfunc_optimized_send_count: 1,586,028,772 (69.1%)
       inline_iseq_optimized_send_count:       225,851 ( 0.0%)
non_variadic_cfunc_optimized_send_count:       479,436 ( 0.0%)
    variadic_cfunc_optimized_send_count:   150,553,887 ( 6.6%)
compiled_iseq_count:                                    272
failed_iseq_count:                                        0
compile_time:                                          97ms
profile_time:                                           0ms
gc_time:                                                0ms
invalidation_time:                                      0ms
vm_write_pc_count:                              834,017,126
vm_write_sp_count:                              834,017,126
vm_write_locals_count:                          818,743,491
vm_write_stack_count:                           818,743,491
vm_write_to_parent_iseq_local_count:                      0
vm_read_from_parent_iseq_local_count:             6,173,560
guard_type_count:                             6,945,465,080
guard_type_exit_ratio:                                 0.0%
guard_shape_count:                            4,612,438,989
guard_shape_exit_ratio:                               25.1%
code_region_bytes:                                2,375,680
zjit_alloc_bytes:                                 1,715,901
total_mem_bytes:                                  4,091,581
side_exit_count:                              1,157,438,426
total_insn_count:                            33,993,859,664
vm_insn_count:                               21,346,776,705
zjit_insn_count:                             12,647,082,959
ratio_in_zjit:                                        37.2%
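
The optcarrot change is much smaller than the loops-times one. A short sketch of the arithmetic, using only counts copied from the "not inlined C methods" tables above (the helper names are illustrative):

```ruby
# Counts copied from the optcarrot "Top-20 not inlined C methods" tables above.
ASET_CALLS_BEFORE  = 39_375_756  # Array#[]= before the patch
ASET_CALLS_AFTER   = 38_357_428  # Array#[]= after the patch
ROTATE_CALLS_AFTER = 38_357_428  # Array#rotate! after the patch

# Number of not-inlined Array#[]= calls that went away.
def calls_removed(before, after)
  before - after
end

# The same difference as a percentage of the original count.
def percent_removed(before, after)
  (calls_removed(before, after) * 100.0 / before).round(1)
end

puts "calls removed: #{calls_removed(ASET_CALLS_BEFORE, ASET_CALLS_AFTER)}"
puts "share removed: #{percent_removed(ASET_CALLS_BEFORE, ASET_CALLS_AFTER)}%"
puts "remaining []= calls equal rotate! calls: #{ASET_CALLS_AFTER == ROTATE_CALLS_AFTER}"
```

The difference is 1,018,328 calls, about 2.6% of the original count, and it equals the `array_aset_fixnum_inline_count` reported for optcarrot later in this thread. The remaining `Array#[]=` count being identical to the `Array#rotate!` count is consistent with (though not proof of) the remaining calls coming from a method that pairs the two, such as the `load_tiles` method quoted later in the thread.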

@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch 2 times, most recently from c3280a3 to a7e915e on December 27, 2025 09:47
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from a7e915e to 5ef58f9 on December 31, 2025 06:21
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch 2 times, most recently from 64f31f5 to a7a557d on December 31, 2025 23:54
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from a7a557d to 19d5b63 on January 1, 2026 00:11
@tekknolagi
Contributor

Thank you for the PR. A couple of notes:

  • We want to limit to in-bounds (defined as nonnegative index) -- do you have stats on how often people use that vs the negative indexing? This is because:
  • If it's truly the fast & most common path, we shouldn't need to do a function call & we can instead emit an inline write instruction
  • We absolutely need a length guard and a frozen-ness check because otherwise it might raise or allocate
  • There is maybe also something about a shared check too (?) but I don't fully know how that works. I will see what YJIT does here
  • Array and Hash are different in that Hash will ~always necessarily do hash/equality checks and this is ~always running arbitrary code, whereas an array with a Fixnum index can just be a write, which means that the lack of a frame is not (as easily) observable
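
The conditions listed above can be sketched as a Ruby-level model. This is only an illustration of which checks gate the fast path: the method names `fast_path_aset?` and `model_aset` are hypothetical, the real guards would be emitted by the JIT rather than written in Ruby, and the shared-array check mentioned above is not modeled because it is not observable from Ruby code.

```ruby
# Hypothetical Ruby-level model of the fast path discussed above; the real
# checks would be emitted as machine code by the JIT, not written in Ruby.
def fast_path_aset?(recv, index)
  recv.instance_of?(Array) &&  # exact Array, so no subclass override of []=
    index.is_a?(Integer) &&    # stand-in for a Fixnum guard
    !recv.frozen? &&           # a frozen array would raise FrozenError
    index >= 0 &&              # negative indices would need adjustment
    index < recv.length        # writing past the end would grow the array
end

# Returns :fast_path when every guard holds, :side_exit otherwise.
def model_aset(recv, index, value)
  if fast_path_aset?(recv, index)
    recv[index] = value  # stands in for a single inline store
    :fast_path
  else
    :side_exit           # the general case is left to the interpreter
  end
end

ary = [0, 1, 2]
p model_aset(ary, 1, :x)            # in-bounds, nonnegative index
p model_aset(ary, -1, :y)           # negative index
p model_aset(ary, 3, :z)            # past the end
p model_aset([0, 1].freeze, 0, :w)  # frozen receiver
```

Only the first call takes the fast path in this model; the other three fall back because a negative index needs adjustment, an out-of-range write may allocate, and a frozen receiver must raise.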

@tekknolagi
Contributor

Also, it seems like the inlined/not-inlined stats did not change much before/after your change. This indicates to me something is wrong here (not being applied fully)

@nozomemein
Contributor Author

@tekknolagi
Thank you for the detailed feedback!

do you have stats on how often people use that vs the negative indexing? This is because:

Yes, I collected some stats here. It looks like the vast majority of the inline fast-path cases are in-bounds.
Since not every case is in-bounds, would it make sense to emit an inline write and side-exit in the other cases?

#15747 (comment)

Also, it seems like the inlined/not-inlined stats did not change much before/after your change. This indicates to me something is wrong here (not being applied fully)

Well, this is because Array#[]= is sometimes called in forms other than a single Fixnum index (a start/length pair or a range), which the current implementation does not cover.

ary = [0, 1, 2, 3]
ary[1, 2] = ["a", "b", "c", "d"]
p ary                        #=> [0, "a", "b", "c", "d", 3]

ary = [0, 1, 2, 3, 4, 5]
ary[0..2] = ["a", "b"]
p ary  # => ["a", "b", 3, 4, 5]

Would you prefer that I keep this PR focused on the Fixnum-index fast path, and handle slice/range assignment in a separate PR? Or should I expand this PR to cover those cases as well?
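
To make the distinction between the call forms concrete, here is a small sketch that names the form of an `Array#[]=` call from the arguments that precede the assigned value. The helper `aset_form` is hypothetical and exists only for illustration; it assumes Ruby 3.0 or later for pattern matching.

```ruby
# Hypothetical helper: name the Array#[]= form from the arguments that
# come before the assigned value.
def aset_form(*index_args)
  case index_args
  in [Integer]          then :single_index      # ary[1] = v
  in [Integer, Integer] then :start_and_length  # ary[1, 2] = v
  in [Range]            then :range             # ary[0..2] = v
  else                       :other
  end
end

p aset_form(1)     # form covered by the Fixnum-index fast path
p aset_form(1, 2)  # slice assignment with start and length
p aset_form(0..2)  # slice assignment with a range
```

Only the first form is a single-element write; the other two can change the array's length, as the examples above show.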

@tekknolagi
Contributor

tekknolagi commented Jan 5, 2026

Maybe I am being naive, but I would be shocked if range aset was more common than fixnum aset. This feels to me like there is something going wrong with this specialization, meaning it only applies in vanishingly small cases

Also, your definition of in-bounds includes negatives (which require adjustment) -- it is important to know the prevalence of positive vs negative indices here

@tekknolagi
Contributor

Oh, these are numbers for a very specific benchmark. Let me actually go look at optcarrot, sorry

@tekknolagi
Contributor

Ah, yes, I am being naive 😓

    def load_tiles
      return unless @any_show
      @bg_pixels.rotate!(8)
      @bg_pixels[@scroll_xfine, 8] = @bg_pattern_lut[@bg_pattern]
    end

is very hot and the source of our troubles here

@tekknolagi
Contributor

Ok, well, we can also look at the loops-times benchmark (which, despite being silly, does have a lot of array+fixnum aset). This PR should be able to get rid of 80MM method calls and turn them into pointer writes

@nozomemein
Contributor Author

nozomemein commented Jan 5, 2026

Also, your definition of in-bounds includes negatives (which require adjustment) -- it is important to know the prevalence of positive vs negative indices here

I have collected stats that break down positive vs. negative indices below, and the index is almost always non-negative.

in-bounds/out-bounds check

Terms:

  • array_aset_fixnum_inline_count: Number of times the inline Array#[]= fast path was taken with a Fixnum index.
  • array_aset_fixnum_inline_in_bounds_pos_count: Among those inline calls, count of non‑negative in‑bounds indices (0 <= idx < len).
  • array_aset_fixnum_inline_in_bounds_neg_count: Among those inline calls, count of negative in‑bounds indices (-len <= idx < 0).
  • array_aset_fixnum_inline_oob_pos_count: Among those inline calls, count of non‑negative out‑of‑bounds indices (idx >= len).
  • array_aset_fixnum_inline_oob_neg_count: Among those inline calls, count of negative out‑of‑bounds indices (idx < -len).
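
The four buckets defined above can be expressed as a small classifier. This is a sketch of the definitions only, not the instrumentation used to collect the stats; the method name `index_bucket` is made up for illustration.

```ruby
# Classify an index against an array length using the definitions above.
def index_bucket(idx, len)
  if idx >= 0
    idx < len ? :in_bounds_pos : :oob_pos    # 0 <= idx < len, else idx >= len
  else
    idx >= -len ? :in_bounds_neg : :oob_neg  # -len <= idx < 0, else idx < -len
  end
end

# Tally a few sample indices against an array of length 3.
p [0, 2, 3, -1, -4].map { |i| index_bucket(i, 3) }.tally
```

For the sample indices this yields two in-bounds positive hits and one each of the other three buckets.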

Results:

  • liquid renderer
              array_aset_fixnum_inline_count:          35
array_aset_fixnum_inline_in_bounds_pos_count:           0
array_aset_fixnum_inline_in_bounds_neg_count:           0
      array_aset_fixnum_inline_oob_pos_count:          35
      array_aset_fixnum_inline_oob_neg_count:           0
  • rails bench
              array_aset_fixnum_inline_count:     131,681
array_aset_fixnum_inline_in_bounds_pos_count:     131,675
array_aset_fixnum_inline_in_bounds_neg_count:           6
      array_aset_fixnum_inline_oob_pos_count:           0
      array_aset_fixnum_inline_oob_neg_count:           0
  • optcarrot
              array_aset_fixnum_inline_count:     1,018,328
array_aset_fixnum_inline_in_bounds_pos_count:     1,016,305
array_aset_fixnum_inline_in_bounds_neg_count:             0
      array_aset_fixnum_inline_oob_pos_count:         2,023
      array_aset_fixnum_inline_oob_neg_count:             0
  • loops-times
              array_aset_fixnum_inline_count:   400,099,942
array_aset_fixnum_inline_in_bounds_pos_count:   400,099,942
array_aset_fixnum_inline_in_bounds_neg_count:             0
      array_aset_fixnum_inline_oob_pos_count:             0
      array_aset_fixnum_inline_oob_neg_count:             0
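
As a rough summary of the results above, the share of inline calls that hit the non-negative in-bounds case can be computed per benchmark. The counts are copied from the results; the constant and helper names are illustrative only.

```ruby
# Counts copied from the results above:
# total = array_aset_fixnum_inline_count,
# in_bounds_pos = array_aset_fixnum_inline_in_bounds_pos_count.
RESULTS = {
  "liquid renderer" => { total: 35,          in_bounds_pos: 0 },
  "rails bench"     => { total: 131_681,     in_bounds_pos: 131_675 },
  "optcarrot"       => { total: 1_018_328,   in_bounds_pos: 1_016_305 },
  "loops-times"     => { total: 400_099_942, in_bounds_pos: 400_099_942 },
}

# Percentage of inline calls with 0 <= idx < len.
def fast_path_share(counts)
  (counts[:in_bounds_pos] * 100.0 / counts[:total]).round(3)
end

RESULTS.each do |name, counts|
  puts "#{name}: #{fast_path_share(counts)}% non-negative in-bounds"
end
```

Liquid renderer is the outlier: all 35 of its inline calls are non-negative but out of bounds. In optcarrot, the 2,023 out-of-bounds calls match the `guard_less_failure` side-exit count in the after-patch stats.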

@tekknolagi
Contributor

Super! Looking forward to seeing new guards (length+frozen). I think you can get inspiration from YJIT and other ZJIT codegen snippets for the codegen implementation---if you have questions, please ask. I am happy to help.

@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch 2 times, most recently from eb080d2 to 8e83e96 on January 6, 2026 09:34
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from 8e83e96 to 80d3d61 on January 6, 2026 15:03
@nozomemein
Contributor Author

@tekknolagi

if you have questions, please ask. I am happy to help.

Thank you !! I have pushed the changes in the commit below, and also left some comments and updated the PR description with the latest benchmark results.

If you have a moment, I’d really appreciate it if you could take another look.

Contributor

@tekknolagi tekknolagi left a comment

looking really good

@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch 4 times, most recently from 92c1436 to c0d9013 on January 7, 2026 00:51
- update test cases so that they are tested in simpler way
- rename GuardArrayNotShared -> GuardNotShared
- reuse WriteBarrier HIR after ArrayAsetFixnum call
- remove GuardInBounds HIR and reuse GuardGreaterEqual and GuardLessThan instructions
- rename ArrayAsetFixnum HIR -> ArrayAset
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from c0d9013 to 934434d on January 7, 2026 01:10
@nozomemein nozomemein marked this pull request as ready for review January 7, 2026 02:05
@matzbot matzbot requested a review from a team January 7, 2026 02:05
Contributor

@tekknolagi tekknolagi left a comment

lgtm! let me know when you are done with these edits and ready to merge

@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from 35bab4d to f5b7bbd on January 7, 2026 09:17
@nozomemein nozomemein changed the title from "ZJIT: Add ArrayAsetFixnum instruction to hir" to "ZJIT: Add ArrayAset instruction to hir" on Jan 7, 2026
@nozomemein nozomemein force-pushed the zjit-specialize-array-set branch from f5b7bbd to 35f010f on January 7, 2026 13:36
@nozomemein
Contributor Author

@tekknolagi

Thanks! Fixed in this commit (35f010f)

@tekknolagi
Contributor

Let me know when it's ready to merge

@nozomemein
Contributor Author

@tekknolagi
I think it's ready to merge 👍

@tekknolagi tekknolagi merged commit 950ffa9 into ruby:master Jan 7, 2026
91 checks passed
@tekknolagi
Contributor

Thank you and congratulations!

@nozomemein
Contributor Author

Thank you so much, Max! I really appreciate the support.
I’m excited to keep contributing 💪

YO4 pushed a commit to YO4/ruby that referenced this pull request Jan 8, 2026
Inline `Array#[]=` into `ArrayAset`.
@tekknolagi
Contributor

[Image: screenshot taken 2026-01-09 at 9:48:06 AM]

:)

Development

Successfully merging this pull request may close these issues.

ZJIT: Inline Array#[]= into HIR
