Provide a way for methods to omit their return value (rev.2) #3271

shyouhei · 2020-06-29T07:30:23Z

Abstract

This pull request implements RubyVM.return_value_is_used? (and its CAPI counterpart rb_whether_the_return_value_is_used_p()), which third-party applications and libraries could use to optimise their code.

Revision history

- Provide a way for methods to omit their return value (rev.2) #3271: Revision 2 (current).
  - Removed the dependency on Omit executing methods/blocks that are side-effect free #1943 (which was a show-stopper). This means we gave up automatic abortion of an instruction sequence.
  - Reconsidered the names of internal flags.
  - Split commits into smaller ones.
- send-pop optimisation #2100: Initial revision.

Introduction

Ruby does not force a particular style of writing on you. In Ruby, one thing can usually be done in more than one way, and this is considered a good thing. To make that possible, the return value of a method need not be used: every method can (and does) return a value, possibly several, while its callers are free to ignore it. Even when a method does not expect its callers to use any return value, it tends to return something meaningful "just in case" that expectation turns out to be wrong.

However, these "just in case" return values rarely get used in practice. Most of the time they are silently ignored, and unless they are referenced elsewhere they become garbage immediately, which wastes both time and space. There is room for improvement here.

How often does this happen? We can measure it as follows. The interpreter implements its instructions and runs them in sequence. That sequence can be seen as a sentence in a conceptual language, so we can take its 2-grams (pairs of consecutive instructions). By collecting the 2-grams over the entire execution of a Ruby program, we can see how often the operation in question occurs relative to everything else.

The following are the top 10 2-grams over the entire execution of the mame/optcarrot benchmark:

zsh % LANG=C wc -l 2gram.txt
1143155369
zsh % LANG=C sort 2gram.txt | uniq -c | sort -nr | head -n 10
69065813 getinstancevariable -> getinstancevariable
65600442 putself -> getinstancevariable
59624140 getinstancevariable -> branchunless
59116388 branchunless -> getinstancevariable
52828407 leave -> pop
50434175 getinstancevariable -> putobject
30368815 pop -> putself
27717161 setinstancevariable -> getinstancevariable
25661090 branchunless -> putself
25165032 getinstancevariable -> branchif

Here, the leave instruction (almost) corresponds to Ruby's return statement, and the pop instruction (almost) corresponds to Ruby's ';' delimiter. So "leave -> pop" indicates that a method returned a value and that value was not used. This is the 5th most frequent 2-gram in the entire execution of the program, accounting for about 4.6% of all 2-grams (52,828,407 out of 1,143,155,369).
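The 2-gram measurement can be sketched in C as follows. This is an illustration only, not the tooling that produced the list above: count_2gram and share_percent are hypothetical helpers that work on an in-memory trace of instruction names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of the 2-gram "first -> second"
 * in a trace of instruction names.  A trace of n instructions contains
 * n - 1 2-grams. */
static size_t count_2gram(const char *const *trace, size_t n,
                          const char *first, const char *second)
{
    size_t hits = 0;

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        if (strcmp(trace[i], first) == 0 && strcmp(trace[i + 1], second) == 0) {
            hits++;
        }
    }
    return hits;
}

/* Hypothetical helper: express a 2-gram count as a percentage of all 2-grams. */
static double share_percent(unsigned long long hits, unsigned long long total)
{
    return total == 0 ? 0.0 : 100.0 * (double)hits / (double)total;
}
```

With the totals above, share_percent(52828407ULL, 1143155369ULL) comes out at roughly 4.62, which is where the 4.6% figure comes from.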

Telling methods that their return values are not used

To remedy this, we introduce a new method calling convention under which a method may return an arbitrary value when its return value is not used. We do not force methods to eliminate unused return values, because every method in the wild, especially those written in C, already returns something. In order not to break existing code, methods must still be allowed to return values even if they are discarded. New methods, however, get room for optimisation.

This is done by setting a 1-bit flag in the method's stack frame. Every time a method is called, several flags are set in the frame pushed onto the VM's stack. We add a flag called VM_FRAME_FLAG_DISCARDED, which denotes that the return value is not used.
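A minimal sketch of the idea, assuming a much simplified frame: the flag is just one more bit in the frame's flag word, set by the caller when it pushes the callee's frame, and the predicate the APIs expose reduces to a single bit test. The struct, function names, and bit values are hypothetical; CRuby's actual control frame and VM_FRAME_FLAG_* values differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; not CRuby's real VM_FRAME_FLAG_* constants. */
enum {
    SKETCH_FRAME_FLAG_OTHER     = 1u << 0, /* stands for the pre-existing flags */
    SKETCH_FRAME_FLAG_DISCARDED = 1u << 1  /* caller ignores the return value */
};

/* Hypothetical, heavily simplified stack frame. */
struct sketch_frame {
    unsigned int flags;               /* several 1-bit flags in one word */
    const struct sketch_frame *prev;  /* caller's frame */
};

/* Every call pushes a frame; the DISCARDED bit travels with the other flags. */
static void sketch_push_frame(struct sketch_frame *callee,
                              const struct sketch_frame *caller,
                              unsigned int flags)
{
    assert(callee != NULL);
    callee->flags = flags;
    callee->prev = caller;
}

/* What the proposed predicate boils down to: used unless the bit is set. */
static bool sketch_return_value_is_used(const struct sketch_frame *cfp)
{
    return (cfp->flags & SKETCH_FRAME_FLAG_DISCARDED) == 0;
}
```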

zsh % ./miniruby --dump=i -e 'discarded_method(used_method);nil'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,33)> (catch: FALSE)
0000 putself                                                          (   1)[Li]
0001 putself
0002 opt_send_without_block                 <calldata!mid:used_method, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0004 opt_send_without_block                 <calldata!mid:discarded_method, argc:1, FCALL|ARGS_SIMPLE|DISCARDED>
0006 pop
0007 putnil
0008 leave

In the example above, the return value of discarded_method is not used, so its call site is marked as "DISCARDED". On the other hand, the return value of used_method is used (it is passed as the argument of discarded_method).
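How call sites could be chosen for DISCARDED can be sketched as one pass over a simplified instruction list: a send immediately followed by pop gets the flag. The instruction encoding and SKETCH_CALL_DISCARDED are hypothetical stand-ins for CRuby's ISeq and VM_CALL_DISCARDED; per the commit list, the real patch uses a helper called is_effectively_pop, so it presumably recognises more than a literal pop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified instruction stream. */
enum sketch_opcode { OP_PUTSELF, OP_PUTNIL, OP_SEND, OP_POP, OP_LEAVE };

#define SKETCH_CALL_DISCARDED (1u << 0) /* stand-in for VM_CALL_DISCARDED */

struct sketch_insn {
    enum sketch_opcode op;
    unsigned int call_flags; /* only meaningful for OP_SEND */
};

/* Mark every send that is immediately followed by pop: its result is
 * computed only to be thrown away.  Returns the number of sites marked. */
static size_t mark_discarded_sends(struct sketch_insn *iseq, size_t n)
{
    size_t marked = 0;

    assert(iseq != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        if (iseq[i].op == OP_SEND && iseq[i + 1].op == OP_POP) {
            iseq[i].call_flags |= SKETCH_CALL_DISCARDED;
            marked++;
        }
    }
    return marked;
}
```

Fed the seven instructions of the disassembly above, the pass marks only the second send (discarded_method); the first send (used_method) is followed by another send that consumes its result.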

APIs for the flag

We provide a Ruby-level API, RubyVM.return_value_is_used?, and its CAPI counterpart, rb_whether_the_return_value_is_used_p(). Both simply look at the frame flag: they return false if VM_FRAME_FLAG_DISCARDED is set, and true otherwise. Because calling another method just to look at the current frame's flag is suboptimal, we also provide a new VM instruction, opt_RubyVM_return_value_is_used_, which transparently does the same thing without modifying the execution context at all.

A typical application of these APIs is the String#slice! method. It destructively slices its receiver and returns what was sliced. However, callers are often not interested in the return value; the method tends to be used only to mutate the receiver. By skipping the allocation of such unused return values, not only does the method run faster, but GC pressure is also reduced.
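The String#slice! case can be illustrated in plain C. slice_bang is a hypothetical function, not CRuby's implementation: the mutation always happens, but the removed part is copied into a fresh allocation only when result_used is true. In a real extension, that flag would come from something like rb_whether_the_return_value_is_used_p().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: remove len bytes starting at start from the NUL-terminated
 * string buf.  Returns a newly allocated copy of the removed bytes when
 * result_used is true, and NULL (no allocation at all) when the caller
 * discards the result. */
static char *slice_bang(char *buf, size_t start, size_t len, bool result_used)
{
    size_t total = strlen(buf);
    char *removed = NULL;

    assert(start <= total && len <= total - start);
    if (result_used) {
        removed = malloc(len + 1);
        if (removed == NULL) {
            return NULL; /* allocation failure; buf is left untouched */
        }
        memcpy(removed, buf + start, len);
        removed[len] = '\0';
    }
    /* The side effect callers actually want: shift the tail (including
     * the terminating NUL) over the removed slice. */
    memmove(buf + start, buf + start + len, total - start - len + 1);
    return removed;
}
```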

Benchmarks

This pull request only provides the APIs, so on whole-application benchmarks neither a positive nor a negative effect can be observed.

Calculating -------------------------------------
                             master        ours
Optcarrot Lan_Master.nes     25.222      25.311 fps

Comparison:
             Optcarrot Lan_Master.nes
                    ours:        25.3 fps
                  master:        25.2 fps - 1.00x  slower

Calculating -------------------------------------
                         master        ours
             sinatra    12.967k     12.938k i/s -    100.000k times in 7.711948s 7.729280s

Comparison:
                          sinatra
              master:     12966.9 i/s
                ours:     12937.8 i/s - 1.00x  slower

However, on closer inspection, String#slice! (which this pull request optimises) does improve considerably.

Warming up --------------------------------------
        regexp-short     2.190M i/s -      2.277M times in 1.039406s (456.55ns/i)
         regexp-long   159.324k i/s -    161.392k times in 1.012980s (6.28μs/i)
        string-short     4.393M i/s -      4.444M times in 1.011546s (227.62ns/i)
         string-long     1.529M i/s -      1.560M times in 1.020005s (653.81ns/i)
Calculating -------------------------------------
                         master        ours
        regexp-short     2.335M      2.892M i/s -      6.571M times in 2.814090s 2.272405s
         regexp-long   159.398k    162.607k i/s -    477.971k times in 2.998608s 2.939417s
        string-short     5.139M      6.318M i/s -     13.180M times in 2.564713s 2.086098s
         string-long     1.598M      1.656M i/s -      4.588M times in 2.871445s 2.770246s

Comparison:
                     regexp-short
                ours:   2891685.4 i/s
              master:   2335064.6 i/s - 1.24x  slower

                      regexp-long
                ours:    162607.4 i/s
              master:    159397.7 i/s - 1.02x  slower

                     string-short
                ours:   6317880.0 i/s
              master:   5138864.6 i/s - 1.23x  slower

                      string-long
                ours:   1656349.9 i/s
              master:   1597974.5 i/s - 1.04x  slower

Conclusions

We propose Ruby and C APIs that let third-party programmers know whether a return value is used. They do not speed things up by themselves; however, programmers can use them to optimise their own programs.

Future work

A well-known waste of memory occurs when a block ends with a multiple assignment. "Just in case" the value of that block is used, an array is created to hold the assigned values, like this:

% ruby --dump=i -ve '1.times {|i| x, y = self, i }'
ruby 2.8.0dev (2020-06-29T02:06:18Z master 1020e120e0) [aarch64-linux]
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,29)> (catch: FALSE)
== catch table
| catch type: break  st: 0000 ed: 0004 sp: 0000 cont: 0004
| == disasm: #<ISeq:block in <main>@-e:1 (1,8)-(1,29)> (catch: FALSE)
| == catch table
| | catch type: redo   st: 0001 ed: 0014 sp: 0000 cont: 0001
| | catch type: next   st: 0001 ed: 0014 sp: 0000 cont: 0014
| |------------------------------------------------------------------------
| local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
| [ 3] i@0<Arg>   | [ 2] x@1        | [ 1] y@2
| 0000 nop                                                              (   1)[Bc]
| 0001 putself                                [Li]
| 0002 getlocal_WC_0                          i@0
| 0004 newarray                               2
| 0006 dup
| 0007 expandarray                            2, 0
| 0010 setlocal_WC_0                          x@1
| 0012 setlocal_WC_0                          y@2
| 0014 nop
| 0015 leave                                                            (   1)[Br]
|------------------------------------------------------------------------
0000 putobject_INT2FIX_1_                                             (   1)[Li]
0001 send                                   <calldata!mid:times, argc:0>, block in <main>
0004 nop
0005 leave                                                            (   1)

Note the unnecessarily complex disassembly of block in <main>. The proposed changeset cannot eliminate this newarray-dup-expandarray combination: an assignment is never a method call, so the proposed API has no way to help in such situations.

Optimising this situation is beyond the scope of this proposal. We could think of some automatic detection of such sequences, but assignments are complex, and it would require too much restructuring of the compiler infrastructure.

This is no longer used.

I now think that a layout like "code block, looong comment, code block" is hard to read. Let's move comments around.

This macro is needed when you want to know which instruction the PC is pointing to. Because the PC holds different things depending on the dispatch method, it needs to be implemented this way. Also note that the PC tends to point to the *next* instruction, so be careful.
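The caveat about the PC can be illustrated with a toy switch-dispatched interpreter. Everything here is hypothetical: instructions are plain integers, whereas in CRuby's other dispatch modes the instruction sequence holds label addresses or function pointers (hence the cast to rb_insn_func_t and the comparison against LABEL(pop) in the vm_exec.h hunk reviewed later on this page). Once an instruction and its operands have been fetched, pc already points at the following instruction, so a send that wants to know whether it is followed by pop must look at pc[0].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH = 0, OP_SEND = 1, OP_POP = 2, OP_LEAVE = 3 };

/* Runs a program and counts how many sends had their result discarded,
 * i.e. were immediately followed by a pop. */
static int run(const int *pc, int *discarded_sends)
{
    int stack[16];
    int sp = 0;

    assert(pc != NULL && discarded_sends != NULL);
    *discarded_sends = 0;
    for (;;) {
        int insn = *pc++;              /* after this fetch, pc points past insn */
        switch (insn) {
          case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = *pc++;       /* operand; pc now points to the NEXT insn */
            break;
          case OP_SEND:
            /* pc already points to the following instruction, so
             * "is the next instruction pop?" is a test on pc[0]. */
            if (pc[0] == OP_POP) {
                (*discarded_sends)++;
            }
            assert(sp > 0);
            stack[sp - 1] += 1;        /* stand-in for a real method call */
            break;
          case OP_POP:
            assert(sp > 0);
            sp--;
            break;
          case OP_LEAVE:
            assert(sp > 0);
            return stack[sp - 1];
          default:
            assert(0 && "unknown opcode");
            return -1;
        }
    }
}
```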

This frame flag denotes that the return value of the current frame is not used. Callee methods are free to utilize this information, like to avoid allocating return value objects.

The values of class definition statements tend not to be used, so there are chances for optimisation.

This flag denotes that a call site does not use its return value. This information shall be stored in the vm_ci_flag because it is a compile-time constant. At runtime it shall be propagated to the callee via VM_FRAME_FLAG_DISCARDED.
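The split between the compile-time call-site flag and the runtime frame flag might look like the following sketch. The structures and bit values are hypothetical stand-ins for CRuby's call info (vm_ci_flag) and frame flags; the point is only that a constant bit on the call site is translated into a bit on the callee's frame when that frame is pushed. According to the commit messages, the same propagation is applied to C functions and to blocks invoked by yield.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bits: one lives in the call info, the other in the frame. */
#define SKETCH_CALL_DISCARDED       (1u << 3)  /* compile-time, per call site */
#define SKETCH_FRAME_FLAG_DISCARDED (1u << 7)  /* runtime, per frame */

struct sketch_call_info {
    unsigned int flag;        /* fixed when the instruction sequence is compiled */
};

struct sketch_frame {
    unsigned int flags;       /* set when the frame is pushed */
};

/* Translate call-site flags into the callee's frame flags at push time. */
static unsigned int frame_flags_for_call(const struct sketch_call_info *ci,
                                         unsigned int base_flags)
{
    unsigned int flags = base_flags;

    if (ci->flag & SKETCH_CALL_DISCARDED) {
        flags |= SKETCH_FRAME_FLAG_DISCARDED;
    }
    return flags;
}

static void push_callee_frame(struct sketch_frame *callee,
                              const struct sketch_call_info *ci)
{
    assert(callee != NULL && ci != NULL);
    callee->flags = frame_flags_for_call(ci, 0u);
}
```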

Scans the entire instruction sequence to find send-then-pop sequences, and sets VM_CALL_DISCARDED on such occasions. This could be made more efficient if done at the moment of ADD_SEND, because the popped variable is already available there. However, it is difficult to cover everything that way.

Was a bit too cryptic.

Pass the precompiled flags to vm_push_frame so that callee methods can infer whether their return value is used.

The information is already passed to this function, so why not propagate it to the callee?

calling->flags can be nonzero when a Ruby method yields to a Ruby block. Why not tell the block that its result is not used?

This is a CAPI to check whether the innermost block/method that the calling function represents is expected to return meaningful return value(s). If an extension library could know this, there would be room for optimisation, by avoiding the allocation of return values that are not requested.

Looking at how `make rdoc` works, I noticed that strings allocated inside StringScanner#scan (which is called from lib/rdoc/markup/parser.rb) become garbage immediately. Why not use rb_whether_the_return_value_is_used_p to detect such situations and skip allocating the string objects?

This method modifies its receiver and returns what was sliced. However, when the intention is modification, the return value is often ignored. Let's not allocate it in that case.

Enumerable#grep is interesting in two respects. First, although almost everybody would think it has nothing to do with return value optimisations, it does: a usage that discards the return value can be seen at ext/extmk.rb:368, "grep(/\A#{var}=(.*)/) {return $1}". Second, even when no block is passed and the return value is not used, it cannot be a no-op. We have to reroute [Bug ruby#5801].

RDoc::Parser::RubyTools#skip_tkspace_without_nl is one such method that is frequently called with its return value discarded. Eliminating the allocated array can reduce both time and memory consumption. The problem is that it is hard to eliminate such wasted return values automatically, even when we can tell the method that we don't need them. This is because variables _could_ escape from the scope. For instance, uget_tk() might be an alias of eval(). That is not actually the case here, but auto-detecting such evil activities is very hard, if not impossible. So to ease the situation we implement the RubyVM.return_value_is_used? method. By manually checking that property we can define a hand-crafted, faster variant of skip_tkspace_without_nl that does not allocate the return value.

MJIT-generated functions can have a struct rb_calling_info on their stack. It needs to be properly initialized.

This test failed on my machine, because `foo; foo` does not share its call cache: the second method invocation can still allocate, which is not the intention of the test. We need to loop instead, while carefully avoiding any other method calls such as i += 1, i > 0, etc.

ko1 · 2020-07-01T07:11:19Z

Could you make a redmine ticket to discuss new feature?

ko1 · 2020-07-01T07:24:17Z

vm_exec.h

@@ -64,6 +64,7 @@ error !

 #define START_OF_ORIGINAL_INSN(x) /* ignore */
 #define DISPATCH_ORIGINAL_INSN(x) return LABEL(x)(ec, reg_cfp);
+#define CURRENT_INSN_IS(pop) ((rb_insn_func_t)GET_CURRENT_INSN() == LABEL(pop))


ok to use pop? (insn?)

It's OK because currently pop is the only usage. But insn may be a more appropriate name.

shyouhei · 2020-07-01T09:25:31Z

@ko1:

Could you make a redmine ticket to discuss new feature?

Sure: https://bugs.ruby-lang.org/issues/17004

ioquatix · 2020-07-01T15:32:17Z

Nice work!

chrisseaton · 2020-07-01T15:47:47Z

Ah this is really interesting - I was investigating the same thing myself recently.

Ruby's automatic return causes values to escape compilation units in implementations of Ruby that are using a JIT. This means that objects which could otherwise be allocated as 'local variables' have to be actually allocated on the heap. This can cause heap allocation where there would otherwise be none, even for simple numeric code that accidentally returns something as trivial as an Integer.

I've been looking at using the Sorbet type system to inform my JIT to not allocate objects.

kaoru · 2020-07-01T16:12:14Z

Very cool!

You might not know that Perl has something similar - the poorly named wantarray function.

It returns something truthy (1) if the subroutine is being called in list context, falsey ("") if the subroutine is being called in scalar context, and undefined (undef) if the subroutine is being called in void context.

I say poorly named because the name mixes up "list" and "array", which are different but related concepts.

> perl -MData::Dumper -E 'sub x { print Dumper(wantarray); }; $a = x(); @a = x(); x();'      
$VAR1 = '';
$VAR1 = 1;
$VAR1 = undef;

Ruby doesn't have an equivalent of scalar/list context so RubyVM.return_value_is_used? gets to return a Boolean. And Ruby has real true and false values too 😁

eregon · 2020-07-01T16:21:39Z

Reposting some of https://bugs.ruby-lang.org/issues/17004#note-5 here:

Do you have measurements on real applications, not just micro-benchmarks?

The masgn case ('1.times {|i| x, y = self, i }') could be done transparently by the VM without exposing any predicate.
Same for core methods like String#slice! (e.g., with an internal predicate).
I'm not sure if it's a good idea to expose this to Ruby (and C ext) users, it seems very low level.
At least, I think we should take advantage of this in language/core before exposing to users.

Is there a good example for this predicate in user code?

enebo · 2020-07-01T16:39:02Z

Having only thought about this for about half an hour, I am concerned about this feature. Will this lead to people writing APIs whose users need to worry about whether or not they assign the method's return value? Whether or not the result is assigned could end up changing the semantics of the method. Whether a method's result is assigned or not, the method should still do the same thing. This feature will allow API designers to break that.

chrisseaton · 2020-07-01T17:07:00Z

A fun syntax for this could be returns { }, where the block is only run if the return value is wanted, and not if not.

shyouhei · 2020-07-02T05:31:18Z

@kaoru Yes, Perl's wantarray is the primary source of inspiration for this proposed feature.

shyouhei · 2020-07-21T00:54:18Z

Closing, as @matz says:

To disclose this kind of information to the Ruby level is just too much

shyouhei added 18 commits June 29, 2020 12:00

ON_DEBUG: delete unused macro

285b84c

move comments [ci skip]

0628303

CURRENT_INSN_IS: added

2e69047

VM_FRAME_FLAG_DISCARDED: added

f22cce0

defineclass: pass VM_FRAME_FLAG_DISCARDED

1ac07d2

VM_CALL_DISCARDED: added

6f38103

is_effectively_pop: refactor

3b25ac7

vm_call_iseq_setup_normal: pass VM_FRAME_FLAG_DISCARDED

44b1db4

vm_call_cfunc_with_frame: pass calling->flags

3e12a30

vm_invoke_iseq_block: pass calling->flags

7ab7855

optimise String#slice!

0c07d96

_mjit_compile_send: eliminate uninitialized bits

121ea62

ko1 reviewed Jul 1, 2020

shyouhei closed this Jul 21, 2020

shyouhei deleted the opt_bailout branch July 4, 2023 01:45
