Always inline rb_to_integer to prevent a method call penalty for integer types #2001

Closed
wants to merge 1 commit into from


methodmissing
Contributor

On going through a hardware event trace of optcarrot in Intel VTune Amplifier I noticed rb_to_int showing up in the Core Bound category:

screenshot from 2018-11-03 00-16-43

screenshot from 2018-11-03 00-08-04

I looked through the source in object.c and noticed that rb_to_int calls rb_to_integer, which is invoked from only one other callsite and returns early on RB_INTEGER_TYPE_P for integer types (bignum specifically). After inlining it and re-running optcarrot, CPI (cycles per instruction) went down significantly, and I got similar values across several runs:

screenshot from 2018-11-03 00-09-19

screenshot from 2018-11-03 00-10-00

screenshot from 2018-11-03 00-15-51

Inlining in this case does not bloat code size much, since the function is small and has only 2 callsites, but it does seem to improve CPI quite a bit.
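For readers unfamiliar with the pattern, here is a minimal sketch of the idea: a small wrapper with an early return for the common case is forced inline, while the rare conversion stays out of line. It is not the actual object.c code. `value_t`, `kind_t`, `slow_convert`, `to_integer`, `sum_payloads` and `SKETCH_ALWAYS_INLINE` are hypothetical names made up for illustration, and the always_inline attribute assumes GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Ruby's VALUE or its type checks. */
typedef enum { KIND_INTEGER, KIND_OTHER } kind_t;
typedef struct { kind_t kind; long payload; } value_t;

/* Assumption: GCC/Clang attribute syntax; plain `static inline` elsewhere. */
#if defined(__GNUC__)
# define SKETCH_ALWAYS_INLINE static inline __attribute__((always_inline))
#else
# define SKETCH_ALWAYS_INLINE static inline
#endif

/* Slow path: stays an ordinary function, so inlining the wrapper adds
 * only a type check and a call at each callsite. */
static value_t slow_convert(value_t v)
{
    value_t out = { KIND_INTEGER, 0 };  /* placeholder conversion result */
    (void)v;
    return out;
}

/* Fast path: values that are already integers return immediately,
 * without paying for a function call. */
SKETCH_ALWAYS_INLINE value_t to_integer(value_t v)
{
    if (v.kind == KIND_INTEGER) return v;
    return slow_convert(v);
}

/* Example callsite: in a hot loop over integers, only the inlined
 * type check remains per element. */
static long sum_payloads(const value_t *vals, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += to_integer(vals[i]).payload;
    }
    return sum;
}
```

With only two callsites, the duplicated fast-path check costs a few instructions per site, which is the trade-off described above.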

optcarrot

lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "inline-rb-to-integer::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux" -e "trunk::~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk/. -I~/src/ruby/trunk/.ext/x86_64-linux" --repeat-count 10 benchmark.yml
Calculating -------------------------------------
                     inline-rb-to-integer       trunk 
           optcarrot               49.232      45.364 fps

Comparison:
                        optcarrot
inline-rb-to-integer:        49.2 fps 
               trunk:        45.4 fps - 1.09x  slower
lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "inline-rb-to-integer::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux" -e "trunk::~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk/. -I~/src/ruby/trunk/.ext/x86_64-linux" --repeat-count 10 benchmark.yml
Calculating -------------------------------------
                     inline-rb-to-integer       trunk 
           optcarrot               47.584      43.686 fps

Comparison:
                        optcarrot
inline-rb-to-integer:        47.6 fps 
               trunk:        43.7 fps - 1.09x  slower
lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "inline-rb-to-integer::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux" -e "trunk::~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk/. -I~/src/ruby/trunk/.ext/x86_64-linux" --repeat-count 10 benchmark.yml
Calculating -------------------------------------
                     inline-rb-to-integer       trunk 
           optcarrot               45.710      42.930 fps

Comparison:
                        optcarrot
inline-rb-to-integer:        45.7 fps 
               trunk:        42.9 fps - 1.06x  slower
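
The "x slower" figures in these comparisons are consistent with dividing the faster fps by the slower fps. That is a reading of the numbers shown here, not a statement about benchmark-driver internals. A small sketch, where `slowdown_ratio` is a hypothetical helper name:

```c
/* Ratio of the faster result to the slower one. A value of about 1.09
 * corresponds to the "1.09x slower" wording in the output above. */
static double slowdown_ratio(double faster_fps, double slower_fps)
{
    return faster_fps / slower_fps;
}
```

For the first run, `slowdown_ratio(49.232, 45.364)` is roughly 1.085, which rounds to the reported 1.09.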

benchmarks

Some noise, but I think that is expected for some of these benchmarks.

lourens@CarbonX1:~/src/ruby/ruby$ make benchmark COMPARE_RUBY="~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk"
/usr/bin/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  -r./prelude --disable-gem" \
            $(find ./benchmark -maxdepth 1 -name '**.yml' -o -name '**.rb' | sort) 
Calculating -------------------------------------
                               compare-ruby  built-ruby 
                    app_answer       45.809      47.424 i/s -       1.000 times in 0.021830s 0.021086s
                   app_aobench        0.030       0.015 i/s -       1.000 times in 33.401929s 65.442848s
                       app_erb       8.694k      8.992k i/s -     15.000k times in 1.725275s 1.668058s
                 app_factorial        1.525       1.605 i/s -       1.000 times in 0.655634s 0.623192s
                       app_fib        3.038       3.093 i/s -       1.000 times in 0.329200s 0.323330s
               app_lc_fizzbuzz        0.044       0.046 i/s -       1.000 times in 22.536389s 21.760657s
                app_mandelbrot        1.015       0.981 i/s -       1.000 times in 0.985123s 1.019837s
                 app_pentomino        0.094       0.094 i/s -       1.000 times in 10.692741s 10.587383s
                     app_raise        7.117       7.887 i/s -       1.000 times in 0.140503s 0.126787s
                 app_strconcat        2.770       2.750 i/s -       1.000 times in 0.360997s 0.363610s
                       app_tak        2.354       2.369 i/s -       1.000 times in 0.424736s 0.422176s
                     app_tarai        2.992       2.951 i/s -       1.000 times in 0.334267s 0.338848s
                       app_uri        2.659       2.675 i/s -       1.000 times in 0.376047s 0.373877s
        array_sample_100k__100       16.996      21.458 i/s -       1.000 times in 0.058838s 0.046602s
       array_sample_100k___10k        0.415       0.421 i/s -       1.000 times in 2.410055s 2.373961s
          array_sample_100k_10      150.050     127.263 i/s -       1.000 times in 0.006664s 0.007858s
          array_sample_100k_11       73.871      80.807 i/s -       1.000 times in 0.013537s 0.012375s
         array_sample_100k__1k        2.454       2.444 i/s -       1.000 times in 0.407573s 0.409243s
       array_sample_100k___50k        0.130       0.120 i/s -       1.000 times in 7.684671s 8.365602s
         array_sample_100k__6k        0.571       0.594 i/s -       1.000 times in 1.752663s 1.682788s
                   array_shift        0.567       0.551 i/s -       1.000 times in 1.762512s 1.814166s
               array_small_and       78.755      68.984 i/s -       1.000 times in 0.012698s 0.014496s
              array_small_diff       70.944      70.082 i/s -       1.000 times in 0.014096s 0.014269s
                array_small_or       53.359      51.713 i/s -       1.000 times in 0.018741s 0.019338s
              array_sort_block        0.202       0.194 i/s -       1.000 times in 4.949996s 5.163504s
              array_sort_float        0.672       0.603 i/s -       1.000 times in 1.487680s 1.659480s
           array_values_at_int      129.999     138.713 i/s -       1.000 times in 0.007692s 0.007209s
         array_values_at_range        4.602       4.557 i/s -       1.000 times in 0.217277s 0.219426s
                       bighash        0.572       0.595 i/s -       1.000 times in 1.746733s 1.680554s
                   dir_empty_p        2.987       3.216 i/s -       1.000 times in 0.334739s 0.310965s
          enum_lazy_grep_v_100        3.489       3.933 i/s -       1.000 times in 0.286580s 0.254274s
           enum_lazy_grep_v_20        6.701       6.839 i/s -       1.000 times in 0.149221s 0.146225s
           enum_lazy_grep_v_50        5.004       5.502 i/s -       1.000 times in 0.199854s 0.181759s
            enum_lazy_uniq_100        2.905       2.975 i/s -       1.000 times in 0.344248s 0.336167s
             enum_lazy_uniq_20        6.005       5.677 i/s -       1.000 times in 0.166531s 0.176153s
             enum_lazy_uniq_50        4.013       3.981 i/s -       1.000 times in 0.249200s 0.251221s
                    erb_render       1.134M      1.124M i/s -      1.500M times in 1.323141s 1.334440s
                    file_chmod        3.153       3.174 i/s -       1.000 times in 0.317199s 0.315099s
                   file_rename        0.885       0.890 i/s -       1.000 times in 1.130221s 1.123085s
           hash_aref_dsym_long        0.299       0.304 i/s -       1.000 times in 3.341323s 3.287371s
                hash_aref_dsym        3.645       3.503 i/s -       1.000 times in 0.274383s 0.285430s
                 hash_aref_fix        3.885       3.842 i/s -       1.000 times in 0.257382s 0.260273s
                 hash_aref_flo       28.924      34.869 i/s -       1.000 times in 0.034574s 0.028679s
                hash_aref_miss        2.799       2.587 i/s -       1.000 times in 0.357325s 0.386502s
                 hash_aref_str        3.012       3.038 i/s -       1.000 times in 0.332015s 0.329192s
            hash_aref_sym_long        2.522       2.317 i/s -       1.000 times in 0.396574s 0.431504s
                 hash_aref_sym        3.360       3.567 i/s -       1.000 times in 0.297597s 0.280347s
                  hash_flatten        5.986       5.841 i/s -       1.000 times in 0.167066s 0.171212s
                hash_ident_flo       31.692      28.871 i/s -       1.000 times in 0.031554s 0.034637s
                hash_ident_num        3.848       3.873 i/s -       1.000 times in 0.259888s 0.258215s
                hash_ident_obj        3.637       3.874 i/s -       1.000 times in 0.274946s 0.258124s
                hash_ident_str        3.889       3.747 i/s -       1.000 times in 0.257120s 0.266857s
                hash_ident_sym        3.646       3.359 i/s -       1.000 times in 0.274290s 0.297705s
                     hash_keys       10.735      11.148 i/s -       1.000 times in 0.093152s 0.089704s
                     hash_long        1.658       1.655 i/s -       1.000 times in 0.603284s 0.604170s
                    hash_shift      118.862     116.037 i/s -       1.000 times in 0.008413s 0.008618s
                hash_shift_u16       15.637      17.641 i/s -       1.000 times in 0.063949s 0.056687s
                hash_shift_u24       15.741      17.693 i/s -       1.000 times in 0.063529s 0.056520s
                hash_shift_u32       15.765      17.494 i/s -       1.000 times in 0.063432s 0.057163s
                   hash_small2        1.859       1.981 i/s -       1.000 times in 0.538045s 0.504818s
                   hash_small4        1.467       1.480 i/s -       1.000 times in 0.681605s 0.675663s
                   hash_small8        0.982       0.999 i/s -       1.000 times in 1.018450s 1.000523s
                  hash_to_proc      372.122     331.097 i/s -       1.000 times in 0.002687s 0.003020s
                   hash_values       10.653      12.222 i/s -       1.000 times in 0.093874s 0.081818s
                       int_quo        1.172       1.152 i/s -       1.000 times in 0.852893s 0.867791s
          io_copy_stream_write        5.732       6.194 i/s -       1.000 times in 0.174461s 0.161444s
   io_copy_stream_write_socket      571.512     382.024 i/s -       1.000 times in 0.001750s 0.002618s
                io_file_create        0.879       0.880 i/s -       1.000 times in 1.137430s 1.136432s
                  io_file_read        0.938       1.026 i/s -       1.000 times in 1.065788s 0.975030s
                 io_file_write        1.456       1.496 i/s -       1.000 times in 0.686973s 0.668469s
             io_nonblock_noex2        0.708       0.705 i/s -       1.000 times in 1.411532s 1.417809s
              io_nonblock_noex        0.615       0.610 i/s -       1.000 times in 1.626382s 1.639540s
                    io_pipe_rw        0.964       0.940 i/s -       1.000 times in 1.037290s 1.063496s
                    io_select2        0.508       0.549 i/s -       1.000 times in 1.970211s 1.822986s
                    io_select3       86.375      75.231 i/s -       1.000 times in 0.011577s 0.013292s
                     io_select        0.594       0.596 i/s -       1.000 times in 1.684734s 1.678993s
                      loop_for        0.986       0.951 i/s -       1.000 times in 1.014360s 1.051968s
                loop_generator        2.214       2.215 i/s -       1.000 times in 0.451772s 0.451390s
                    loop_times        1.067       1.065 i/s -       1.000 times in 0.937584s 0.939224s
               loop_whileloop2       11.961      12.075 i/s -       1.000 times in 0.083602s 0.082813s
                loop_whileloop        2.403       2.400 i/s -       1.000 times in 0.416113s 0.416651s
              marshal_dump_flo        4.774       4.830 i/s -       1.000 times in 0.209468s 0.207049s
       marshal_dump_load_geniv        3.311       3.330 i/s -       1.000 times in 0.302007s 0.300276s
        marshal_dump_load_time        1.515       1.556 i/s -       1.000 times in 0.660234s 0.642545s
                require_thread        0.020       0.022 i/s -       1.000 times in 48.951655s 44.677606s
                       require        1.081       1.425 i/s -       1.000 times in 0.925329s 0.701762s
                  securerandom        4.361       4.455 i/s -       1.000 times in 0.229288s 0.224489s
                  so_ackermann        1.886       1.958 i/s -       1.000 times in 0.530219s 0.510715s
                      so_array        1.076       1.079 i/s -       1.000 times in 0.929349s 0.926620s
               so_binary_trees        0.176       0.190 i/s -       1.000 times in 5.684107s 5.275014s
                so_concatenate        0.293       0.298 i/s -       1.000 times in 3.407739s 3.350192s
                so_count_words        5.840       5.873 i/s -       1.000 times in 0.171237s 0.170261s
                  so_exception        4.356       4.264 i/s -       1.000 times in 0.229543s 0.234545s
                   so_fannkuch        1.607       1.510 i/s -       1.000 times in 0.622432s 0.662223s
                      so_fasta        0.572       0.584 i/s -       1.000 times in 1.749299s 1.711525s
preparing /tmp/fasta.output.100000
               so_k_nucleotide        1.011       1.051 i/s -       1.000 times in 0.988858s 0.951214s
                      so_lists        2.523       2.375 i/s -       1.000 times in 0.396295s 0.420972s
                 so_mandelbrot        0.515       0.536 i/s -       1.000 times in 1.940491s 1.865239s
                     so_matrix        2.037       2.228 i/s -       1.000 times in 0.490852s 0.448922s
             so_meteor_contest        0.380       0.397 i/s -       1.000 times in 2.633571s 2.519249s
                      so_nbody        0.935       0.920 i/s -       1.000 times in 1.069357s 1.086940s
                so_nested_loop        1.246       1.235 i/s -       1.000 times in 0.802465s 0.809854s
                so_nsieve_bits        0.630       0.632 i/s -       1.000 times in 1.588024s 1.583345s
                     so_nsieve        0.815       0.855 i/s -       1.000 times in 1.227633s 1.170055s
                     so_object        1.917       2.059 i/s -       1.000 times in 0.521755s 0.485698s
               so_partial_sums        0.735       0.733 i/s -       1.000 times in 1.359887s 1.364217s
                   so_pidigits        1.366       1.376 i/s -       1.000 times in 0.732164s 0.726735s
                     so_random        2.481       2.422 i/s -       1.000 times in 0.403125s 0.412854s
preparing /tmp/fasta.output.2500000
         so_reverse_complement        0.930       0.906 i/s -       1.000 times in 1.075443s 1.103970s
                      so_sieve        2.781       2.753 i/s -       1.000 times in 0.359644s 0.363285s
               so_spectralnorm        0.750       0.739 i/s -       1.000 times in 1.333871s 1.354055s
                  string_index        3.279       2.944 i/s -       1.000 times in 0.304966s 0.339690s
                string_scan_re        6.190       5.883 i/s -       1.000 times in 0.161540s 0.169975s
               string_scan_str        9.493       9.042 i/s -       1.000 times in 0.105339s 0.110599s
                   time_subsec        1.047       1.022 i/s -       1.000 times in 0.955096s 0.978530s
             vm1_attr_ivar_set      42.215M     43.834M i/s -     30.000M times in 0.710645s 0.684393s
                 vm1_attr_ivar      59.357M     56.732M i/s -     30.000M times in 0.505419s 0.528803s
           vm1_blockparam_call      20.041M     19.999M i/s -     30.000M times in 1.496923s 1.500071s
           vm1_blockparam_pass      15.626M     15.740M i/s -     30.000M times in 1.919817s 1.906016s
          vm1_blockparam_yield      23.030M     22.101M i/s -     30.000M times in 1.302659s 1.357430s
                vm1_blockparam      30.343M     31.247M i/s -     30.000M times in 0.988711s 0.960085s
                     vm1_block      30.384M     32.123M i/s -     30.000M times in 0.987352s 0.933916s
                     vm1_const     157.440M    157.258M i/s -     30.000M times in 0.190549s 0.190769s
                    vm1_ensure        2.477       2.439 i/s -       1.000 times in 0.403774s 0.410039s
              vm1_float_simple      15.046M     14.547M i/s -     30.000M times in 1.993948s 2.062275s
            vm1_gc_short_lived       7.772M      7.771M i/s -     30.000M times in 3.859791s 3.860575s
vm1_gc_short_with_complex_long       1.024M      5.381M i/s -     30.000M times in 29.302920s 5.575261s
        vm1_gc_short_with_long       5.587M      5.706M i/s -     30.000M times in 5.369505s 5.258017s
      vm1_gc_short_with_symbol       6.651M      7.002M i/s -     30.000M times in 4.510498s 4.284321s
        vm1_gc_wb_ary_promoted      54.314M     56.756M i/s -     30.000M times in 0.552347s 0.528580s
                 vm1_gc_wb_ary      58.077M     56.551M i/s -     30.000M times in 0.516559s 0.530499s
        vm1_gc_wb_obj_promoted      70.633M     78.933M i/s -     30.000M times in 0.424731s 0.380069s
                 vm1_gc_wb_obj      76.639M     85.005M i/s -     30.000M times in 0.391447s 0.352920s
                  vm1_ivar_set     152.398M    150.701M i/s -     30.000M times in 0.196853s 0.199069s
                      vm1_ivar     145.601M    131.962M i/s -     30.000M times in 0.206043s 0.227338s
                    vm1_length      96.001M    103.800M i/s -     30.000M times in 0.312496s 0.289018s
                 vm1_lvar_init        0.745       0.756 i/s -       1.000 times in 1.342239s 1.322386s
                  vm1_lvar_set      17.827M     17.086M i/s -     30.000M times in 1.682847s 1.755825s
                       vm1_neq      69.489M     72.555M i/s -     30.000M times in 0.431721s 0.413480s
                       vm1_not     192.969M    192.476M i/s -     30.000M times in 0.155465s 0.155864s
                    vm1_rescue     704.467M    702.235M i/s -     30.000M times in 0.042585s 0.042721s
              vm1_simplereturn      79.282M     78.489M i/s -     30.000M times in 0.378398s 0.382221s
                      vm1_swap     152.454M    148.295M i/s -     30.000M times in 0.196781s 0.202300s
                     vm1_yield        1.127       1.109 i/s -       1.000 times in 0.887165s 0.901839s
                     vm2_array      29.059M     28.809M i/s -      6.000M times in 0.206476s 0.208265s
                  vm2_bigarray       1.887M      1.715M i/s -      6.000M times in 3.179650s 3.498251s
                   vm2_bighash     131.382k    121.989k i/s -     60.000k times in 0.456683s 0.491847s
                  vm2_case_lit        2.092       2.137 i/s -       1.000 times in 0.477961s 0.467936s
                      vm2_case      77.211M     85.770M i/s -      6.000M times in 0.077709s 0.069954s
            vm2_defined_method       2.647M      2.729M i/s -      6.000M times in 2.266994s 2.198555s
                      vm2_dstr       7.568M      7.507M i/s -      6.000M times in 0.792848s 0.799207s
                      vm2_eval     382.740k    130.762k i/s -      6.000M times in 15.676439s 45.884974s
              vm2_fiber_switch     988.682k      1.063M i/s -      6.000M times in 6.068686s 5.642898s
              vm2_freezestring       8.092M      7.980M i/s -      6.000M times in 0.741498s 0.751835s
            vm2_method_missing       2.438M      2.518M i/s -      6.000M times in 2.460998s 2.382646s
         vm2_method_with_block       6.234M      6.485M i/s -      6.000M times in 0.962435s 0.925207s
                    vm2_method       7.435M      7.665M i/s -      6.000M times in 0.807036s 0.782742s
      vm2_module_ann_const_set       1.306M      1.396M i/s -      6.000M times in 4.593104s 4.297203s
          vm2_module_const_set       1.386M      1.397M i/s -      6.000M times in 4.327854s 4.296172s
                     vm2_mutex      12.286M     10.523M i/s -      6.000M times in 0.488341s 0.570205s
                 vm2_newlambda      10.858M     11.448M i/s -      6.000M times in 0.552579s 0.524098s
            vm2_poly_method_ov        3.839       3.847 i/s -       1.000 times in 0.260510s 0.259929s
               vm2_poly_method        0.464       0.466 i/s -       1.000 times in 2.155434s 2.147868s
            vm2_poly_singleton        1.032       1.020 i/s -       1.000 times in 0.968624s 0.980597s
                      vm2_proc      35.662M     36.738M i/s -      6.000M times in 0.168244s 0.163317s
                    vm2_raise1       1.879M      1.872M i/s -      6.000M times in 3.192521s 3.205435s
                    vm2_raise2       1.140M      1.201M i/s -      6.000M times in 5.262587s 4.994649s
                    vm2_regexp       7.523M      7.526M i/s -      6.000M times in 0.797524s 0.797240s
                      vm2_send      23.996M     23.441M i/s -      6.000M times in 0.250039s 0.255966s
            vm2_string_literal      47.660M     47.091M i/s -      6.000M times in 0.125893s 0.127413s
        vm2_struct_big_aref_hi      43.924M     52.838M i/s -      6.000M times in 0.136598s 0.113555s
        vm2_struct_big_aref_lo      53.784M     47.241M i/s -      6.000M times in 0.111558s 0.127009s
           vm2_struct_big_aset        4.495       4.560 i/s -       1.000 times in 0.222490s 0.219316s
        vm2_struct_big_href_hi      29.434M     30.270M i/s -      6.000M times in 0.203849s 0.198217s
        vm2_struct_big_href_lo      29.089M     30.707M i/s -      6.000M times in 0.206263s 0.195398s
           vm2_struct_big_hset        3.203       3.086 i/s -       1.000 times in 0.312198s 0.324082s
         vm2_struct_small_aref      80.776M     71.286M i/s -      6.000M times in 0.074279s 0.084168s
         vm2_struct_small_aset        4.863       4.735 i/s -       1.000 times in 0.205629s 0.211180s
         vm2_struct_small_href      36.659M     37.172M i/s -      6.000M times in 0.163671s 0.161411s
         vm2_struct_small_hset      33.299M     32.893M i/s -      6.000M times in 0.180188s 0.182412s
                     vm2_super      20.390M     20.468M i/s -      6.000M times in 0.294265s 0.293140s
                     vm2_unif1      66.830M     64.008M i/s -      6.000M times in 0.089780s 0.093738s
                    vm2_zsuper      19.626M     19.127M i/s -      6.000M times in 0.305721s 0.313694s
                 vm3_backtrace        9.753       9.894 i/s -       1.000 times in 0.102536s 0.101075s
          vm3_clearmethodcache        4.985       5.586 i/s -       1.000 times in 0.200602s 0.179004s
               vm3_gc_old_full        0.373       0.367 i/s -       1.000 times in 2.682735s 2.723101s
          vm3_gc_old_immediate        0.560       0.496 i/s -       1.000 times in 1.786821s 2.015963s
               vm3_gc_old_lazy        0.397       0.391 i/s -       1.000 times in 2.519232s 2.556574s
                        vm3_gc        0.812       0.818 i/s -       1.000 times in 1.231078s 1.223168s
          vm_symbol_block_pass        1.260       1.283 i/s -       1.000 times in 0.793460s 0.779474s
        vm_thread_alive_check1       19.222      22.667 i/s -       1.000 times in 0.052022s 0.044117s
               vm_thread_close        1.881       1.911 i/s -       1.000 times in 0.531724s 0.523305s
            vm_thread_condvar1        1.472       1.440 i/s -       1.000 times in 0.679352s 0.694618s
            vm_thread_condvar2        1.569       1.865 i/s -       1.000 times in 0.637233s 0.536216s
         vm_thread_create_join        1.049       1.123 i/s -       1.000 times in 0.953475s 0.890818s
              vm_thread_mutex1        2.547       2.584 i/s -       1.000 times in 0.392638s 0.386976s
              vm_thread_mutex2        2.629       2.495 i/s -       1.000 times in 0.380407s 0.400727s
              vm_thread_mutex3        1.055       0.874 i/s -       1.000 times in 0.947615s 1.144158s
          vm_thread_pass_flood       19.225      20.829 i/s -       1.000 times in 0.052015s 0.048009s
                vm_thread_pass        2.994       3.211 i/s -       1.000 times in 0.333969s 0.311461s
                vm_thread_pipe        4.506       6.539 i/s -       1.000 times in 0.221919s 0.152933s
               vm_thread_queue       12.701      12.844 i/s -       1.000 times in 0.078732s 0.077859s
        vm_thread_sized_queue2        1.838       1.692 i/s -       1.000 times in 0.544110s 0.591028s
        vm_thread_sized_queue3        1.727       1.774 i/s -       1.000 times in 0.578993s 0.563790s
        vm_thread_sized_queue4        2.817       3.240 i/s -       1.000 times in 0.354992s 0.308627s
         vm_thread_sized_queue        6.463       5.946 i/s -       1.000 times in 0.154718s 0.168182s

@k0kubun
Member

k0kubun commented Nov 3, 2018

Cool!

optcarrot

What's the architecture of your machine? (I recommend always passing -v to the benchmark-driver command when you share results.) I failed to reproduce such a big difference. The compiler version would also be helpful (I'm using gcc 7.3.0).

$ benchmark-driver benchmark.yml --rbenv 'before::before --disable-gems;after::after --disable-gems' -v --repeat-count 24
before: ruby 2.6.0dev (2018-11-03 trunk 65510) [x86_64-linux]
after: ruby 2.6.0dev (2018-11-03 trunk 65510) [x86_64-linux]
last_commit=Always inline rb_to_integer to prevent a method call penalty
Calculating -------------------------------------
                             before       after
Optcarrot Lan_Master.nes     54.515      54.851 fps

Comparison:
             Optcarrot Lan_Master.nes
                   after:        54.9 fps
                  before:        54.5 fps - 1.01x  slower

@methodmissing
Contributor Author

Thanks for the feedback (and benchmark-driver :-) ) - forgot about --disable-gems as well

The CPU is a 4-core i7-8650U:

lourens@CarbonX1:~/asan$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
stepping	: 10
microcode	: 0x96
cpu MHz		: 3200.015
cache size	: 8192 KB

gcc 7.3, with CFLAGS -g -O3:

lourens@CarbonX1:~/src/ruby/trunk$ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

Results are less skewed now and in line with yours, thanks:

lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "inline-rb-to-integer::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux --disable-gems" -e "trunk::~/src/ruby/trunk/ruby -I~/src/ruby/trunk/lib -I~/src/ruby/trunk/. -I~/src/ruby/trunk/.ext/x86_64-linux --disable-gems" -v --repeat-count 24 benchmark.yml
inline-rb-to-integer: ruby 2.6.0dev (2018-11-03 inline-rb-to-i.. 65513) [x86_64-linux]
last_commit=Always inline rb_to_integer to prevent a method call penalty for integer types
trunk: ruby 2.6.0dev (2018-11-03 trunk 65513) [x86_64-linux]
Calculating -------------------------------------
                     inline-rb-to-integer       trunk 
           optcarrot               49.394      48.343 fps

Comparison:
                        optcarrot
inline-rb-to-integer:        49.4 fps 
               trunk:        48.3 fps - 1.02x  slower

@matzbot matzbot closed this in 38caab2 Nov 3, 2018
@k0kubun
Member

k0kubun commented Nov 3, 2018

Now that's what I see in my environment too, and it does improve performance. As long as you see it in the Core Bound category, I trust that this is worth merging. 38caab2

Thanks!

@hsbt hsbt added the Backport label Sep 6, 2024