Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize cfunc large array splat fix to fix many additional cases raising SystemStackError #7522

Merged
merged 6 commits into from Apr 25, 2023

Conversation

jeremyevans
Copy link
Contributor

Originally, when 2e7bceb fixed cfuncs to no longer use the VM stack for large array splats, it was thought to have fully fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs) back in Ruby 2.2.

After additional research, I determined that same issue affects almost all types of method calls, not just iseq and cfunc calls. There were two main types of remaining issues, important cases (where large array splat should work) and pedantic cases (where large array splat raised SystemStackError instead of ArgumentError).

Important cases:

# bmethod
define_method(:a){|*a|}
a(*1380888.times)

# send
def b(*a); end
send(:b, *1380888.times)

# symproc
:b.to_proc.call(self, *1380888.times)

# method to proc
def d; yield(*1380888.times) end
d(&method(:b))

# method_missing
def self.method_missing(*a); end
not_a_method(*1380888.times)

Pedantic cases:

# iseq with only required or optional arguments
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)

# attr reader/writer
c = Class.new do
  attr_accessor :a
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)

# Struct aref/aset
c = Struct.new(:a) do
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)

This patch fixes all usage of CALLER_SETUP_ARG with splatting a large number of arguments, and required similar fixes to use a temporary hidden array in three other cases where the VM would use the VM stack for handling a large number of arguments. However, it is possible there may be additional cases where splatting a large number of arguments still causes a SystemStackError.

This has a measurable performance impact, as it requires additional checks for a large number of arguments in many additional cases.

This change is fairly invasive, as there were many different VM functions that needed to be modified to support this. To avoid too much API change, I modified struct rb_calling_info to add a heap_argv member for storing the array, so I would not have to thread it through many functions. This struct is always stack allocated, which helps ensure sure GC doesn't collect it early.

Because of how invasive the changes are, and how rarely large arrays are actually splatted in Ruby code, the existing test/spec suites are not great at testing for correct behavior. To try to find and fix all issues, I set VM_ARGC_STACK_MAX to -1, ensuring that a temporary array is used for all array splat method calls. This was very helpful in finding breaking cases, especially ones involving flagged keyword hashes.

Fixes [Bug #4040]

Note that this initial commit uses -1 as VM_ARGC_STACK_MAX. This is for testing, to ensure that a temporary array is used for almost all argument splats. This makes things significantly slower. After we get a clean CI run, I'll push another commit that changes VM_ARGC_STACK_MAX back to 128.

@jeremyevans jeremyevans requested a review from ko1 March 14, 2023 23:35
@jeremyevans
Copy link
Contributor Author

CI is clean except for arm64 YJIT failures (which appear to definitely be related to these changes). Going to push the commit to set VM_ARGC_STACK_MAX back to 128.

@jeremyevans jeremyevans force-pushed the large-array-splat-4040 branch 2 times, most recently from a96468d to 96cdd85 Compare March 24, 2023 02:48
vm_insnhelper.c Outdated
cfp->sp[-1] = argv_ary;
calling->argc = 1;
calling->heap_argv = argv_ary;
RB_GC_GUARD(argv_ary);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it needed? (calling is on the stack)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one may not be needed (I think I added it first when debugging). I'll try removing it.

calling->argc = 1;
calling->heap_argv = argv_ary;
RB_GC_GUARD(argv_ary);
RB_GC_GUARD(ary);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it needed? (caller may mark the ary?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was definitely needed to fix a segfault under GC.stress = true when VM_ARGC_STACK_MAX was set to -1 (forget which test did that).

Comment on lines +2548 to +2549
rb_ary_cat(argv_ary, argv, argc);
rb_ary_cat(argv_ary, ptr, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean there is always a full copy of the arguments when nb-of-args > VM_ARGC_STACK_MAX? (for any kind of call)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for callers that use CALLER_SETUP_ARG with ALLOW_HEAP_ARGV or ALLOW_HEAP_ARGV_KEEP_KWSPLAT. Currently, that is these type of calls only:

  • cfunc
  • ifunc
  • non-iseq bmethod
  • send(*args) # but not send(:meth, *args)
  • :symbol.to_proc.call(*args) # but not :symbol.to_proc.call(recv, *args)

@jeremyevans
Copy link
Contributor Author

This passes CI. I'm going to squash all of the heap_argv related code into a single commit, with separate commits for the five separate optimizations. After that passes CI, I plan to merge this.

jeremyevans and others added 6 commits April 24, 2023 14:58
…aising SystemStackError

Originally, when 2e7bceb fixed cfuncs to no
longer use the VM stack for large array splats, it was thought to have fully
fixed Bug ruby#4040, since the issue was fixed for methods defined in Ruby (iseqs)
back in Ruby 2.2.

After additional research, I determined that same issue affects almost all
types of method calls, not just iseq and cfunc calls.  There were two main
types of remaining issues, important cases (where large array splat should
work) and pedantic cases (where large array splat raised SystemStackError
instead of ArgumentError).

Important cases:

```ruby
define_method(:a){|*a|}
a(*1380888.times)

def b(*a); end
send(:b, *1380888.times)

:b.to_proc.call(self, *1380888.times)

def d; yield(*1380888.times) end
d(&method(:b))

def self.method_missing(*a); end
not_a_method(*1380888.times)

```

Pedantic cases:

```ruby
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)

c = Class.new do
  attr_accessor :a
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)

c = Struct.new(:a) do
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
```

This patch fixes all usage of CALLER_SETUP_ARG with splatting a large
number of arguments, and required similar fixes to use a temporary
hidden array in three other cases where the VM would use the VM stack
for handling a large number of arguments.  However, it is possible
there may be additional cases where splatting a large number
of arguments still causes a SystemStackError.

This has a measurable performance impact, as it requires additional
checks for a large number of arguments in many additional cases.

This change is fairly invasive, as there were many different VM
functions that needed to be modified to support this. To avoid
too much API change, I modified struct rb_calling_info to add a
heap_argv member for storing the array, so I would not have to
thread it through many functions.  This struct is always stack
allocated, which helps ensure sure GC doesn't collect it early.

Because of how invasive the changes are, and how rarely large
arrays are actually splatted in Ruby code, the existing test/spec
suites are not great at testing for correct behavior.  To try to
find and fix all issues, I tested this in CI with
VM_ARGC_STACK_MAX to -1, ensuring that a temporary array is used
for all array splat method calls.  This was very helpful in
finding breaking cases, especially ones involving flagged keyword
hashes.

Fixes [Bug ruby#4040]

Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>
Currently, bmethod arguments are copied from the VM stack to the
C stack in vm_call_bmethod, then copied from the C stack to the VM
stack later in invoke_iseq_block_from_c.  This is inefficient.

This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack
copies (though there is one copy to remove the receiver from
the stack), and avoids calling vm_call_bmethod_body,
rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.

Th vm_call_iseq_bmethod argument handling is similar to the
way normal iseq methods are called, and allows for similar
performance optimizations when using splats or keywords.
However, even in the no argument case it's still significantly
faster.

A benchmark is added for bmethod calling.  In my environment,
it improves bmethod calling performance by 38-59% for simple
bmethod calls, and up to 180% for bmethod calls passing
literal keywords on both sides.

```

./miniruby-iseq-bmethod:  18159792.6 i/s
          ./miniruby-m:  13174419.1 i/s - 1.38x  slower

                   bmethod_simple_1
./miniruby-iseq-bmethod:  15890745.4 i/s
          ./miniruby-m:  10008972.7 i/s - 1.59x  slower

             bmethod_simple_0_splat
./miniruby-iseq-bmethod:  13142804.3 i/s
          ./miniruby-m:  11168595.2 i/s - 1.18x  slower

             bmethod_simple_1_splat
./miniruby-iseq-bmethod:  12375791.0 i/s
          ./miniruby-m:   8491140.1 i/s - 1.46x  slower

                   bmethod_no_splat
./miniruby-iseq-bmethod:  10151258.8 i/s
          ./miniruby-m:   8716664.1 i/s - 1.16x  slower

                    bmethod_0_splat
./miniruby-iseq-bmethod:   8138802.5 i/s
          ./miniruby-m:   7515600.2 i/s - 1.08x  slower

                    bmethod_1_splat
./miniruby-iseq-bmethod:   8028372.7 i/s
          ./miniruby-m:   5947658.6 i/s - 1.35x  slower

                   bmethod_10_splat
./miniruby-iseq-bmethod:   6953514.1 i/s
          ./miniruby-m:   4840132.9 i/s - 1.44x  slower

                  bmethod_100_splat
./miniruby-iseq-bmethod:   5287288.4 i/s
          ./miniruby-m:   2243218.4 i/s - 2.36x  slower

                         bmethod_kw
./miniruby-iseq-bmethod:   8931358.2 i/s
          ./miniruby-m:   3185818.6 i/s - 2.80x  slower

                      bmethod_no_kw
./miniruby-iseq-bmethod:  12281287.4 i/s
          ./miniruby-m:  10041727.9 i/s - 1.22x  slower

                   bmethod_kw_splat
./miniruby-iseq-bmethod:   5618956.8 i/s
          ./miniruby-m:   3657549.5 i/s - 1.54x  slower
```
This optimizes the following calls:

* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty

This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly,
but that could cause problems if the array was modified during
the call to the function. You could do that optimization for frozen
arrays, but as splatting frozen arrays is uncommon, and the speedup
is minimal (<5%), it doesn't seem worth it.

The vm_send_cfunc benchmark has been updated to test additional cfunc
call types, and the numbers above were taken from the benchmark results.
Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the method argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%

Note that empty argument splats do get slower with this approach,
by about 20%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* send(*args)
* send(*args, **kw)

This is because the you cannot shift the method argument off
without first splatting the arg.
Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the receiver argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 2-3%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* symproc.(*args)
* symproc.(*args, **kw)

This is because the you cannot shift the receiver argument off
without first splatting the arg.
CALLER_ARG_SPLAT is not necessary for method_missing.  We just need
to unshift the method name into the arguments.

This optimizes all method_missing calls:

* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 30-40%.  Other than non-empty argument splats, other
argument splats are faster, with the speedup depending on the
number of arguments.
@jeremyevans
Copy link
Contributor Author

CI is clean after the rebasing. I plan to merge this around midnight JST.

@jeremyevans jeremyevans merged commit a82a24e into ruby:master Apr 25, 2023
96 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants