24% reduction of compiled vm_exec_core function #1779

Closed
wants to merge 4 commits into
base: trunk
from


shyouhei commented Dec 22, 2017

By carefully arranging PC and SP manipulations, this changeset reduces the binary size of vm_exec_core from 37,040 bytes to 28,304 bytes on my machine.

For instance, instruction opt_plus changes like this (indentation modified):

--- trunk/vm.inc:opt_plus	2017-12-22 16:07:21.000000000 +0900
+++ ours/vm.inc:opt_plus	2017-12-22 16:07:18.000000000 +0900
@@ -1,32 +1,30 @@
 INSN_ENTRY(opt_plus){
+    MAYBE_UNUSED(CALL_CACHE) cc;
+    MAYBE_UNUSED(CALL_INFO) ci;
+    MAYBE_UNUSED(VALUE) obj, recv, val;
+
     START_OF_ORIGINAL_INSN(opt_plus);
-    {
-    VALUE val;
-    CALL_CACHE cc = (CALL_CACHE)GET_OPERAND(2);
-    CALL_INFO ci = (CALL_INFO)GET_OPERAND(1);
-    VALUE recv = TOPN(1);
-    VALUE obj = TOPN(0);
+    ci = (CALL_INFO)GET_OPERAND(1);
+    cc = (CALL_CACHE)GET_OPERAND(2);
+    recv = TOPN(1);
+    obj = TOPN(0);
     DEBUG_ENTER_INSN("opt_plus");
-    ADD_PC(1+2);
-    PREFETCH(GET_PC());
-    POPN(2);
     COLLECT_USAGE_INSN(BIN(opt_plus));
     COLLECT_USAGE_OPERAND(BIN(opt_plus), 0, ci);
     COLLECT_USAGE_OPERAND(BIN(opt_plus), 1, cc);
     {
 #line nnnn "insns.def"
     val = vm_opt_plus(recv, obj);
 
     if (val == Qundef) {
-        /* other */
-        PUSH(recv);
-        PUSH(obj);
-        CALL_SIMPLE_METHOD(recv);
+        DISPATCH_ORIGINAL_INSN(opt_send_without_block);
     }
 #line nnnn "vm.inc"
     }
+    ADD_PC(3);
+    PREFETCH(GET_PC());
+    ADJ_SP(-1);
+    TOPN(0) = val;
     CHECK_VM_STACK_OVERFLOW_FOR_INSN(VM_REG_CFP, 1);
-    PUSH(val);
     END_INSN(opt_plus);
-    }
 }

Here you can see that the ADD_PC and POPN / INC_SP macros moved from the prelude of the instruction to the finale. Because PC and SP are still untouched when the fast path fails, the operands and the stack are exactly as opt_send_without_block expects them, which makes it possible to replace CALL_SIMPLE_METHOD(recv) with DISPATCH_ORIGINAL_INSN(opt_send_without_block). These two macros differ greatly in size, which accounts for the large difference in compiled binary size.
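
The effect of deferring the PC and SP updates can be seen in a toy stack machine. The sketch below is not Ruby's VM: the opcodes, the `toy_vm` struct, `OVERFLOW_TAG`, and the `slow_add` helper are all hypothetical. It only illustrates that when the fast path commits PC and SP in its finale, a failed fast path can jump straight into the generic instruction's body, which then sees the stack it expects.

```c
/* Hypothetical toy VM; none of these names exist in Ruby. */
enum { OP_PUSH, OP_FAST_ADD, OP_GENERIC_ADD, OP_HALT };

#define OVERFLOW_TAG 1000 /* values >= this are sent to the slow path */

typedef struct {
    const int *pc;   /* points at the current opcode */
    int stack[16];
    int *sp;         /* points one past the top of the stack */
    int slow_calls;  /* counts how often the generic path ran */
} toy_vm;

/* Generic addition: stands in for a full method call. */
static int slow_add(toy_vm *vm, int a, int b)
{
    vm->slow_calls++;
    return a + b;
}

static int toy_run(toy_vm *vm, const int *code)
{
    vm->pc = code;
    vm->sp = vm->stack;
    vm->slow_calls = 0;
    for (;;) {
        switch (*vm->pc) {
          case OP_PUSH:
            *vm->sp++ = vm->pc[1];
            vm->pc += 2;
            break;
          case OP_FAST_ADD: {
            int recv = vm->sp[-2], obj = vm->sp[-1];
            if (recv >= OVERFLOW_TAG || obj >= OVERFLOW_TAG) {
                /* PC and SP are untouched, so the generic body can
                 * re-read the same stack slots without any fix-up. */
                goto generic_add;
            }
            /* Finale: only now commit the PC and SP changes. */
            vm->pc += 1;
            vm->sp -= 1;
            vm->sp[-1] = recv + obj;
            break;
          }
          case OP_GENERIC_ADD:
          generic_add: {
            int recv = vm->sp[-2], obj = vm->sp[-1];
            int val = slow_add(vm, recv, obj);
            vm->pc += 1;
            vm->sp -= 1;
            vm->sp[-1] = val;
            break;
          }
          case OP_HALT:
            return vm->sp[-1];
        }
    }
}
```

If the fast path had already advanced PC and popped its arguments, the fallback would first have to push them back, which is roughly the extra work the old prelude-style layout required in every such instruction.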

And here are the benchmark results:

```
Elapsed time: 347.98301 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name                    2.4     trunk   ours
so_ackermann            0.521   0.479   0.475
so_array                0.843   0.952   0.810
so_binary_trees         6.277   6.304   5.971
so_concatenate          4.758   4.475   3.752
so_count_words          0.171   0.171   0.172
so_exception            0.303   0.269   0.265
so_fannkuch             1.153   1.103   1.078
so_fasta                1.712   1.655   1.596
so_k_nucleotide         1.312   1.311   1.316
so_lists                0.534   0.506   0.520
so_mandelbrot           2.661   2.851   2.412
so_matrix               0.556   0.579   0.491
so_meteor_contest       3.263   3.318   3.126
so_nbody                1.470   1.537   1.561
so_nested_loop          1.310   1.370   1.151
so_nsieve               1.782   1.804   1.771
so_nsieve_bits          2.427   2.425   2.343
so_object               0.806   0.755   0.753
so_partial_sums         1.867   2.064   2.054
so_pidigits             1.192   1.179   1.175
so_random               0.420   0.409   0.388
so_reverse_complement   0.609   0.610   0.578
so_sieve                0.501   0.514   0.522
so_spectralnorm         1.876   1.984   1.812

Speedup ratio: compare with the result of `2.4' (greater is better)
name                    trunk   ours
so_ackermann            1.087   1.097
so_array                0.886   1.041
so_binary_trees         0.996   1.051
so_concatenate          1.063   1.268
so_count_words          0.999   0.992
so_exception            1.128   1.142
so_fannkuch             1.046   1.069
so_fasta                1.035   1.073
so_k_nucleotide         1.000   0.997
so_lists                1.054   1.027
so_mandelbrot           0.933   1.103
so_matrix               0.960   1.131
so_meteor_contest       0.983   1.044
so_nbody                0.957   0.942
so_nested_loop          0.957   1.138
so_nsieve               0.988   1.006
so_nsieve_bits          1.001   1.036
so_object               1.067   1.070
so_partial_sums         0.905   0.909
so_pidigits             1.011   1.015
so_random               1.029   1.083
so_reverse_complement   1.000   1.055
so_sieve                0.974   0.960
so_spectralnorm         0.945   1.035
```
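
The speedup ratios are the baseline time divided by the measured time, so a value above 1 means the build ran faster than the baseline. A minimal sketch of that calculation follows; `speedup_ratio` and `print_speedup_row` are hypothetical helpers, not part of Ruby's benchmark driver. Because the sketch starts from the rounded times shown in the table, its results can differ from the listed ratios in the last digit.

```c
#include <stdio.h>

/* Hypothetical helper: a ratio above 1.0 means `measured_sec` is
 * faster than `baseline_sec`. */
static double speedup_ratio(double baseline_sec, double measured_sec)
{
    return baseline_sec / measured_sec;
}

/* Print one row in roughly the layout of the "Speedup ratio" table. */
static void print_speedup_row(const char *name, double baseline_sec,
                              double trunk_sec, double ours_sec)
{
    printf("%-24s%.3f   %.3f\n", name,
           speedup_ratio(baseline_sec, trunk_sec),
           speedup_ratio(baseline_sec, ours_sec));
}
```

For example, `print_speedup_row("so_concatenate", 4.758, 4.475, 3.752)` computes 4.758 / 4.475 and 4.758 / 3.752 for the trunk and ours columns.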
shyouhei commented Jan 22, 2018

The first half of this request was split out into #1783 and then merged, so I had to rebase this branch onto the recent trunk.

@shyouhei shyouhei changed the title from 25% reduction of compiled vm_exec_core function to 24% reduction of compiled vm_exec_core function Jan 24, 2018

shyouhei added some commits Dec 17, 2017

extensive use of instruction attributes
Instead of using magic numbers, let us define a series of attributes
and use them from the VM core.  Proper function declarations let
most modern compilers inline these attributes.  On my machine the
exact same binary is generated with or without this changeset.
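
As a rough illustration of what replacing magic numbers with attributes can look like, the sketch below defines per-instruction attributes as `static inline` functions. All names here (`attr_width_opt_plus`, `attr_sp_inc_opt_plus`, and so on) are hypothetical, not the generator's actual output; the values mirror the opt_plus diff earlier in this pull request (one opcode word plus two operands, two values popped and one pushed).

```c
/* Hypothetical attribute functions for opt_plus; names are illustrative. */

/* Number of operands that follow the opcode word (ci and cc). */
static inline int attr_operands_opt_plus(void) { return 2; }

/* Total instruction width in words: opcode + operands. */
static inline int attr_width_opt_plus(void)
{
    return 1 + attr_operands_opt_plus();
}

/* Values popped (recv, obj) and pushed (val). */
static inline int attr_popn_opt_plus(void) { return 2; }
static inline int attr_retn_opt_plus(void) { return 1; }

/* Net stack pointer change caused by the instruction. */
static inline int attr_sp_inc_opt_plus(void)
{
    return attr_retn_opt_plus() - attr_popn_opt_plus();
}

/* Before: a magic number.  After: a named attribute. */
#define OPT_PLUS_WIDTH_MAGIC 3
#define OPT_PLUS_WIDTH_ATTR  attr_width_opt_plus()
```

Since each function returns a constant and is visible to the caller, a compiler would typically fold the calls into the same constants as the hand-written numbers, which is consistent with the identical binary reported here.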
also use sp_inc in vm core
Now that sp_inc attributes are officially provided as inline
functions, why not use them directly from the VM core, not just
from the compiler?  By doing so, it is now possible for us to
optimize stack manipulations: we can know exactly how many
words of stack space an instruction consumes before it actually
runs.  This changeset deletes some lines from insns.def because
they are no longer needed.  As a result it reduces the size of the
vm_exec_core function from 32,400 bytes to 32,352 bytes on my
machine.

It does not seem to affect performance:

-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name    before  after
loop_for         1.093  1.061
loop_generator   1.156  1.152
loop_times       0.982  0.974
loop_whileloop   0.549  0.587
loop_whileloop2  0.115  0.121

Speedup ratio: compare with the result of `before' (greater is better)
name    after
loop_for        1.030
loop_generator  1.003
loop_times      1.008
loop_whileloop  0.935
loop_whileloop2 0.949
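
Knowing an instruction's stack effect up front lets the interpreter validate and apply the whole adjustment in one step instead of popping and pushing piecemeal. The sketch below assumes hypothetical `sp_inc_for_call` and `sp_inc_for_pushn` functions and a made-up `toy_frame` type; none of them correspond to Ruby's internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame: a fixed-size value stack and its stack pointer. */
typedef struct {
    long slots[8];
    size_t sp;       /* number of slots currently in use */
} toy_frame;

/* Net stack change of a call-like instruction: pops the receiver and
 * `argc` arguments, then pushes one result. */
static inline int sp_inc_for_call(int argc)
{
    return 1 - (argc + 1);
}

/* Net stack change of an instruction that pushes `n` new values. */
static inline int sp_inc_for_pushn(int n)
{
    return n;
}

/* Check, before executing anything, that the net adjustment keeps
 * the stack pointer within the frame's bounds. */
static bool frame_has_room(const toy_frame *f, int sp_inc)
{
    size_t cap = sizeof f->slots / sizeof f->slots[0];
    if (sp_inc <= 0) return f->sp >= (size_t)(-sp_inc);
    return f->sp + (size_t)sp_inc <= cap;
}

/* Apply the whole stack adjustment in one step. */
static void frame_adjust_sp(toy_frame *f, int sp_inc)
{
    if (sp_inc < 0) f->sp -= (size_t)(-sp_inc);
    else            f->sp += (size_t)sp_inc;
}
```

A caller would compute the `sp_inc` value once, call `frame_has_room`, and then `frame_adjust_sp`, rather than interleaving several pops and pushes with the instruction body.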
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/
Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace
CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros
differ greatly in size, which accounts for the large difference in
compiled binary size. This changeset reduces the size of
vm_exec_core from 32,352 bytes to 27,008 bytes on my machine.  As
a result, it yields slightly better performance.
Closes [GH-1779].

-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name    before  after
so_ackermann     0.484  0.454
so_array         0.837  0.779
so_binary_trees  5.928  5.801
so_concatenate   3.473  3.543
so_count_words   0.201  0.222
so_exception     0.255  0.252
so_fannkuch      1.080  1.019
so_fasta         1.459  1.463
so_k_nucleotide  1.218  1.180
so_lists         0.499  0.484
so_mandelbrot    2.189  2.324
so_matrix        0.510  0.496
so_meteor_contest        3.025  2.925
so_nbody         1.319  1.273
so_nested_loop   0.941  0.932
so_nsieve        1.806  1.647
so_nsieve_bits   2.151  2.078
so_object        0.632  0.621
so_partial_sums  1.560  1.632
so_pidigits      1.190  1.183
so_random        0.333  0.353
so_reverse_complement    0.604  0.586
so_sieve         0.521  0.481
so_spectralnorm  1.774  1.722

Speedup ratio: compare with the result of `before' (greater is better)
name    after
so_ackermann    1.065
so_array        1.075
so_binary_trees 1.022
so_concatenate  0.980
so_count_words  0.903
so_exception    1.009
so_fannkuch     1.059
so_fasta        0.997
so_k_nucleotide 1.032
so_lists        1.032
so_mandelbrot   0.942
so_matrix       1.028
so_meteor_contest       1.034
so_nbody        1.036
so_nested_loop  1.009
so_nsieve       1.097
so_nsieve_bits  1.035
so_object       1.018
so_partial_sums 0.956
so_pidigits     1.006
so_random       0.943
so_reverse_complement   1.032
so_sieve        1.083
so_spectralnorm 1.030
eliminate CALL_SIMPLE_METHOD
Arrange operands of several opt_something insns so that jumps to
opt_send_without_block can be applied to them. This makes it
possible to eliminate the CALL_SIMPLE_METHOD macro entirely.  As a
result, the binary size of vm_exec_core changes from 27,008 bytes
to 26,016 bytes on my machine.

Note, however, that PC can now point to a location that is not an
instruction.

-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name    before  after
so_ackermann     0.450  0.426
so_array         0.789  0.824
so_binary_trees  5.760  5.635
so_concatenate   3.594  3.508
so_count_words   0.211  0.196
so_exception     0.256  0.244
so_fannkuch      1.049  1.044
so_fasta         1.485  1.472
so_k_nucleotide  1.195  1.216
so_lists         0.517  0.513
so_mandelbrot    2.264  2.394
so_matrix        0.501  0.468
so_meteor_contest        2.987  2.912
so_nbody         1.307  1.289
so_nested_loop   0.908  0.925
so_nsieve        1.679  1.614
so_nsieve_bits   2.131  2.092
so_object        0.620  0.625
so_partial_sums  1.623  1.675
so_pidigits      1.135  1.190
so_random        0.357  0.321
so_reverse_complement    0.619  0.583
so_sieve         0.493  0.496
so_spectralnorm  1.749  1.737

Speedup ratio: compare with the result of `before' (greater is better)
name    after
so_ackermann    1.057
so_array        0.958
so_binary_trees 1.022
so_concatenate  1.024
so_count_words  1.077
so_exception    1.049
so_fannkuch     1.004
so_fasta        1.009
so_k_nucleotide 0.983
so_lists        1.007
so_mandelbrot   0.946
so_matrix       1.072
so_meteor_contest       1.026
so_nbody        1.013
so_nested_loop  0.982
so_nsieve       1.040
so_nsieve_bits  1.018
so_object       0.992
so_partial_sums 0.969
so_pidigits     0.954
so_random       1.111
so_reverse_complement   1.062
so_sieve        0.994
so_spectralnorm 1.007
shyouhei commented Jan 29, 2018

3234245 is (was) also a part of this pull request that was cherry-picked. Rebased again.

matzbot pushed a commit that referenced this pull request Jan 29, 2018

s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

@matzbot matzbot closed this in c788fb4 Jan 29, 2018

shyouhei commented Jan 31, 2018

This is a part of #1419 (not a straight cherry-pick, though).

@shyouhei shyouhei deleted the shyouhei:per-insn-annotation branch Jan 31, 2018

@shyouhei shyouhei referenced this pull request Feb 1, 2018

Closed

Rewrite VM generator #1783
