Provide a way for methods to omit their return value (rev.2) #3271

Closed
shyouhei wants to merge 18 commits

shyouhei
Member

Abstract

This pull request implements RubyVM.return_value_is_used? (and its CAPI counterpart rb_whether_the_return_value_is_used_p()), which could be used by 3rd party applications/libraries to optimise their code.

Revision history

Introduction

Ruby does not force a style of writing on you. In Ruby, one thing tends to be doable in more than one way. This is considered to be a good thing. To make this possible, the return value of a method is not forced to be used: every method can (and does) return possibly multiple values, while its callers are free to ignore them. Even when a method does not expect its callers to take any return values, it tends to return something meaningful "just in case" the expectation breaks.

However, these "just in case" return values rarely get used in practice. Most of the time they are just silently ignored. They become instant garbage unless referenced elsewhere, which is of course a waste of both time and space. There is room for improvement in this area.

How often does this happen? We can observe it with the following scheme. The interpreter implements its instructions and runs them in series. This series can be seen as a sentence in a conceptual language, so we can consider its 2-grams (pairs of consecutive instructions). By taking the 2-grams of the entire execution of a Ruby program, we can see the ratio of the operation we are talking about.

The following is the top 10 list of 2-grams over the entire execution of the mame/optcarrot benchmark:
zsh % LANG=C wc -l 2gram.txt
1143155369
zsh % LANG=C sort 2gram.txt | uniq -c | sort -nr | head -n 10
69065813 getinstancevariable -> getinstancevariable
65600442 putself -> getinstancevariable
59624140 getinstancevariable -> branchunless
59116388 branchunless -> getinstancevariable
52828407 leave -> pop
50434175 getinstancevariable -> putobject
30368815 pop -> putself
27717161 setinstancevariable -> getinstancevariable
25661090 branchunless -> putself
25165032 getinstancevariable -> branchif

Here, the leave instruction (almost) corresponds to Ruby's return statement, and the pop instruction (almost) corresponds to Ruby's ';' delimiter. So the "leave -> pop" line indicates that a method returned a value and that value was not used. This turns out to be the 5th most frequent 2-gram in the entire execution of the program, about 4.6% of the whole (52828407 of 1143155369).
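As a rough illustration of the counting scheme, here is a minimal sketch that tallies 2-grams over a list of instruction names and recomputes the "leave -> pop" share from the figures in the listing. The helper name two_gram_counts and the toy trace are made up for this example; the real 2gram.txt was produced by an instrumented interpreter whose details are not shown in this pull request.

```ruby
# Hypothetical helper: count 2-grams (pairs of consecutive items) in a trace.
def two_gram_counts(trace)
  trace.each_cons(2).map { |a, b| "#{a} -> #{b}" }.tally
end

# A made-up toy trace, only to exercise the helper.
trace = %w[
  putself opt_send_without_block leave pop putself
  getinstancevariable leave pop putnil leave
]

counts = two_gram_counts(trace)
total  = counts.values.sum

# Print the most frequent 2-grams with their share of the whole.
counts.sort_by { |_, n| -n }.first(3).each do |gram, n|
  printf("%8d %s (%.1f%%)\n", n, gram, 100.0 * n / total)
end

# Share of "leave -> pop" in the optcarrot run, using the numbers quoted above.
LEAVE_POP_RATIO = 52_828_407.0 / 1_143_155_369
printf("leave -> pop: %.1f%% of all 2-grams\n", 100 * LEAVE_POP_RATIO)
```

The same sort/uniq pipeline shown in the listing does this at scale; the Ruby version is only meant to make the definition of a 2-gram concrete.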

Telling methods that their return values are not used

To remedy this, we introduce a new method calling convention that allows methods to return arbitrary values when the return value is not used. We do not force them to eliminate unused return values. This is because, to begin with, every method in the wild -- especially those written in C -- already returns something. In order not to break existing code, methods must be allowed to return values even if they are discarded. However, for new ones, let us make room for optimisations.

This is done by setting a 1-bit flag in a method stack frame. Every time a method is called, several flags are set in the VM's stack. We add a flag called VM_FRAME_FLAG_DISCARDED which denotes that the return value is not used.

zsh % ./miniruby --dump=i -e 'discarded_method(used_method);nil'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,33)> (catch: FALSE)
0000 putself                                                          (   1)[Li]
0001 putself
0002 opt_send_without_block                 <calldata!mid:used_method, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0004 opt_send_without_block                 <calldata!mid:discarded_method, argc:1, FCALL|ARGS_SIMPLE|DISCARDED>
0006 pop
0007 putnil
0008 leave

Here in the example above, the return value of discarded_method is not used; thus it is marked as "DISCARDED". OTOH that of used_method is used.

APIs for the flag

We provide the Ruby-level API RubyVM.return_value_is_used? and its CAPI counterpart rb_whether_the_return_value_is_used_p(). These two APIs simply look at the frame flag and return whether VM_FRAME_FLAG_DISCARDED is set or not. Because looking at the current frame's flag by calling another method is kind of suboptimal, we also provide a new VM instruction opt_RubyVM_return_value_is_used_ which transparently does the same thing without modifying the execution context at all.

A typical application area for this flag is the String#slice! method. This method destructively slices its receiver and returns what was sliced. However, it is often the case that its users are not interested in the return value; the method tends to be used only to mutate the receiver. By skipping allocation of those unused return values, not only does the method run faster, but GC pressure is also reduced.
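A minimal sketch of how user code might consult the flag, assuming a Ruby built from this pull request. RubyVM.return_value_is_used? does not exist in stock Ruby, so the sketch checks for it and otherwise behaves as if the value is always wanted. The method drop_prefix! and the constant EMPTY are hypothetical names used only for illustration.

```ruby
EMPTY = "".freeze

# Destructively removes the first n characters of str.
# Returns the removed prefix only when the caller appears to want it.
def drop_prefix!(str, n)
  wanted =
    if defined?(RubyVM) && RubyVM.respond_to?(:return_value_is_used?)
      RubyVM.return_value_is_used? # assumption: only on a Ruby with this patch
    else
      true                         # stock Ruby: assume the value is wanted
    end

  if wanted
    str.slice!(0, n)    # allocates and returns the removed prefix
  else
    str[0, n] = EMPTY   # same mutation without building the removed substring
    nil                 # the caller discards the value anyway
  end
end

s = +"hello world"
prefix = drop_prefix!(s, 6) # return value used
drop_prefix!(s, 1)          # return value discarded
puts prefix.inspect, s.inspect
```

Both branches leave the receiver in the same state, so a caller cannot observe a difference other than the discarded value, which is the property the proposal relies on.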

Benchmarks

We are only providing APIs in this pull request. Neither positive nor negative effects can be observed.

Calculating -------------------------------------
                             master        ours
Optcarrot Lan_Master.nes     25.222      25.311 fps

Comparison:
             Optcarrot Lan_Master.nes
                    ours:        25.3 fps
                  master:        25.2 fps - 1.00x  slower

Calculating -------------------------------------
                         master        ours
             sinatra    12.967k     12.938k i/s -    100.000k times in 7.711948s 7.729280s

Comparison:
                          sinatra
              master:     12966.9 i/s
                ours:     12937.8 i/s - 1.00x  slower

However, when you take a closer look, individual methods such as String#slice! do improve considerably.

Warming up --------------------------------------
        regexp-short     2.190M i/s -      2.277M times in 1.039406s (456.55ns/i)
         regexp-long   159.324k i/s -    161.392k times in 1.012980s (6.28μs/i)
        string-short     4.393M i/s -      4.444M times in 1.011546s (227.62ns/i)
         string-long     1.529M i/s -      1.560M times in 1.020005s (653.81ns/i)
Calculating -------------------------------------
                         master        ours
        regexp-short     2.335M      2.892M i/s -      6.571M times in 2.814090s 2.272405s
         regexp-long   159.398k    162.607k i/s -    477.971k times in 2.998608s 2.939417s
        string-short     5.139M      6.318M i/s -     13.180M times in 2.564713s 2.086098s
         string-long     1.598M      1.656M i/s -      4.588M times in 2.871445s 2.770246s

Comparison:
                     regexp-short
                ours:   2891685.4 i/s
              master:   2335064.6 i/s - 1.24x  slower

                      regexp-long
                ours:    162607.4 i/s
              master:    159397.7 i/s - 1.02x  slower

                     string-short
                ours:   6317880.0 i/s
              master:   5138864.6 i/s - 1.23x  slower

                      string-long
                ours:   1656349.9 i/s
              master:   1597974.5 i/s - 1.04x  slower

Conclusions

We propose Ruby/C APIs for 3rd party programmers to know if a return value is used. They do not speed things up by themselves. However programmers can use them to optimise their own programs.

Future work

A well-known waste of memory is when a block ends with an assignment. "Just in case" the value of that block is used, an array is created to store the assigned values, like this:
% ruby --dump=i -ve '1.times {|i| x, y = self, i }'
ruby 2.8.0dev (2020-06-29T02:06:18Z master 1020e120e0) [aarch64-linux]
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,29)> (catch: FALSE)
== catch table
| catch type: break  st: 0000 ed: 0004 sp: 0000 cont: 0004
| == disasm: #<ISeq:block in <main>@-e:1 (1,8)-(1,29)> (catch: FALSE)
| == catch table
| | catch type: redo   st: 0001 ed: 0014 sp: 0000 cont: 0001
| | catch type: next   st: 0001 ed: 0014 sp: 0000 cont: 0014
| |------------------------------------------------------------------------
| local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
| [ 3] i@0<Arg>   | [ 2] x@1        | [ 1] y@2
| 0000 nop                                                              (   1)[Bc]
| 0001 putself                                [Li]
| 0002 getlocal_WC_0                          i@0
| 0004 newarray                               2
| 0006 dup
| 0007 expandarray                            2, 0
| 0010 setlocal_WC_0                          x@1
| 0012 setlocal_WC_0                          y@2
| 0014 nop
| 0015 leave                                                            (   1)[Br]
|------------------------------------------------------------------------
0000 putobject_INT2FIX_1_                                             (   1)[Li]
0001 send                                   <calldata!mid:times, argc:0>, block in <main>
0004 nop
0005 leave                                                            (   1)

Note the unnecessarily complex output of block in <main>'s disasm. The proposed changeset is not capable of eliminating this newarray-dup-expandarray combo. Because assignments can never be function calls, there is no way for the proposed API to help in such situations.

Optimising this situation is beyond the scope of this proposal. We could think of some automatic detection of such sequences. However, assignments are complex, and the amount of compiler infrastructure that would have to be reworked is too large.

This is no longer used.

I now think that it is hard to read something like

  code block
  looong comment
  code block

Let's move comments around.

This macro is needed when you want to know which instruction PC is
pointing to.  Because PC holds different things depending on dispatch
methods, it needs to be implemented this way.

Also note that PC tends to point to the *next* instruction, so be careful.

This frame flag denotes that the return value of the current frame is
not used.  Callee methods are free to utilize this information, for
example to avoid allocating return value objects.

The value of a class definition statement tends not to be used.  There
are chances for optimisation.

This flag denotes that a call site does not use its return value.  This
information shall be stored in the vm_ci_flag because it is a
compile-time constant.  At runtime it shall be propagated to the callee
via VM_FRAME_FLAG_DISCARDED.

Scans the entire instruction sequence to find a send-then-pop sequence.
Marks such call sites with VM_CALL_DISCARDED.

This could be made more efficient when done at the moment of ADD_SEND,
because there already is a popped variable.  However it is difficult to
cover everything that way.
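
The send-then-pop scan can be approximated from Ruby itself on a stock CRuby, using the array form of a compiled instruction sequence. This is only a sketch of the idea, not the compiler pass in this changeset: the helper name discarded_call_sites is hypothetical, it inspects only the top-level sequence (not nested blocks or method bodies), and it treats any instruction whose first operand is a call-data hash with a :mid key as a call.

```ruby
# Returns the method names of top-level call sites whose result is
# immediately popped, i.e. a send-like instruction followed by pop.
def discarded_call_sites(source)
  body  = RubyVM::InstructionSequence.compile(source).to_a.last
  insns = body.grep(Array) # drop line numbers, labels and event markers

  insns.each_cons(2).filter_map do |insn, following|
    calldata = insn[1]
    next unless calldata.is_a?(Hash) && calldata.key?(:mid)
    calldata[:mid] if following.first == :pop
  end
end

sites = discarded_call_sites("discarded_method(used_method); nil")
p sites
```

For the one-liner used in the disassembly earlier, the call to discarded_method is followed by pop while the call to used_method feeds an argument, which is the distinction the new flag records.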

Was a bit too cryptic.

Pass the precompiled flags to vm_push_frame so that callee methods can
infer whether their return value is used.

The information is passed to this function, so why not propagate it to
the callee.

calling->flags can be nonzero when a ruby method yields a ruby block.
Why not tell the block that the result is not used.

This is a CAPI to check if the innermost block/method that the calling
function represents is expected to return meaningful return value(s) or
not.  If an extension library could know such information, there could
be room for optimisation, by avoiding allocation of return values when
not requested.

Looking at how `make rdoc` is working, I noticed that strings allocated
inside of StringScanner#scan (which is called from
lib/rdoc/markup/parser.rb) are becoming garbage immediately.  Why not
use rb_whether_the_return_value_is_used_p to detect such situations and
skip allocating string objects.

This method modifies its receiver and returns what was sliced.  However
it is often the case that the intention is modification and the return
value is ignored.  Let's not allocate then.
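
One way to see the cost of a discarded return value on a stock Ruby is to count allocations around a loop. The sketch below uses GC.stat(:total_allocated_objects); the helper name allocated_objects and the choice of String#slice! versus String#[]= are illustrative assumptions, not the benchmark used in this pull request.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = 1_000
with_slice = "x" * n
with_aset  = "x" * n
empty      = "".freeze

# slice! builds a new String for the removed part on every call,
# even though the value is thrown away here.
sliced = allocated_objects { n.times { with_slice.slice!(0, 1) } }

# []= performs the same mutation without returning the removed part.
assigned = allocated_objects { n.times { with_aset[0, 1] = empty } }

puts "slice!: #{sliced} objects, []=: #{assigned} objects"
```

The exact counts depend on the Ruby version, so the numbers are printed rather than stated here.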

Enumerable#grep is interesting in two ways.  First, although almost
everybody thinks it has nothing to do with return value optimisations,
it does.  A usage without return value can be seen at ext/extmk.rb:368,
"grep(/\A#{var}=(.*)/) {return $1}".  Second, even when there is no
block passed and no return value used at the same time, it cannot be a
no-op.  We have to reroute [Bug ruby#5801].

RDoc::Parser::RubyTools#skip_tkspace_without_nl is one of those methods
that are frequently called with the return value discarded.  Eliminating
the allocated array can benefit both time and memory consumption.

The problem is, it is hard to auto-eliminate such wasted return values
even when we can tell the method that we don't need them.  This is
because variables _could_ escape from the scope.  For instance, uget_tk()
might be an alias of eval().  That is not the case here, but
auto-detecting such evil activities is very hard -- if not impossible.

So to ease the situation we implement the RubyVM.return_value_is_used?
method.  By manually checking that property we can define a hand-crafted
faster variation of skip_tkspace_without_nl that does not allocate the
return value.

MJIT-generated functions can have struct rb_calling_info on the stack.
They need to be properly initialized.

This test failed on my machine, because `foo; foo` do not share their
call cache.  The second method invocation can still allocate, which is
not the intention of the test.  We need to loop instead, while
carefully avoiding any other method calls like i += 1, i > 0, etc.
@ko1
Contributor

ko1 commented Jul 1, 2020

Could you make a redmine ticket to discuss new feature?

@@ -64,6 +64,7 @@ error !

#define START_OF_ORIGINAL_INSN(x) /* ignore */
#define DISPATCH_ORIGINAL_INSN(x) return LABEL(x)(ec, reg_cfp);
#define CURRENT_INSN_IS(pop) ((rb_insn_func_t)GET_CURRENT_INSN() == LABEL(pop))
Contributor

ok to use pop? (insn?)

Member Author

It's OK because currently pop is the only usage. But insn may be a polite name.

@shyouhei
Member Author

shyouhei commented Jul 1, 2020

@ko1:

Could you make a redmine ticket to discuss new feature?

Sure: https://bugs.ruby-lang.org/issues/17004

@ioquatix
Member

ioquatix commented Jul 1, 2020

Nice work!

@chrisseaton
Contributor

Ah this is really interesting - I was investigating the same thing myself recently.

Ruby's automatic return causes values to escape compilation units in implementations of Ruby that are using a JIT. This means that objects which could otherwise be allocated as 'local variables' have to be actually allocated on the heap. This can cause heap allocation where there would otherwise be none, even for simple numeric code that accidentally returns something as trivial as an Integer.

I've been looking at using the Sorbet type system to inform my JIT to not allocate objects.

@kaoru

kaoru commented Jul 1, 2020

Very cool!

You might not know that Perl has something similar - the poorly named wantarray function.

It returns something truthy (1) if the subroutine is being called in list context, falsey ("") if the subroutine is being called in scalar context, and undefined (undef) if the subroutine is being called in void context.

I say poorly named because the name mixes up "list" and "array", which are different but related concepts.

> perl -MData::Dumper -E 'sub x { print Dumper(wantarray); }; $a = x(); @a = x(); x();'      
$VAR1 = '';
$VAR1 = 1;
$VAR1 = undef;

Ruby doesn't have an equivalent of scalar/list context so RubyVM.return_value_is_used? gets to return a Boolean. And Ruby has real true and false values too 😁

@eregon
Member

eregon commented Jul 1, 2020

Reposting some of https://bugs.ruby-lang.org/issues/17004#note-5 here:

Do you have measurements on real applications, not just micro-benchmarks?

The masgn case ('1.times {|i| x, y = self, i }') could be done transparently by the VM without exposing any predicate.
Same for core methods like String#slice! (e.g., with an internal predicate).
I'm not sure if it's a good idea to expose this to Ruby (and C ext) users, it seems very low level.
At least, I think we should take advantage of this in language/core before exposing to users.

Is there a good example for this predicate in user code?

@enebo
Contributor

enebo commented Jul 1, 2020

Having only thought about this for about half an hour I am concerned about this feature. Will this lead to people writing APIs where the users of that API need to worry about whether they are assigning the method or not? More or less assignment could end up changing the semantics of the method. Whether a method is assigned or not it should still do the same thing. This feature will allow API designers to break that.

@chrisseaton
Contributor

A fun syntax for this could be returns { }, where the block is only run if the return value is wanted, and not if not.

@shyouhei
Member Author

shyouhei commented Jul 2, 2020

@kaoru Yes perl's wantarray is the primary source of inspiration of this proposed feature.

@shyouhei
Member Author

shyouhei closed this Jul 21, 2020
shyouhei deleted the opt_bailout branch July 4, 2023 01:45
7 participants