Reduce implicit array allocations on caller side of method calling #8853
Conversation
Due to how the compiler works, while f(*a) does not allocate an array f(1, *a) does. This is possible to fix in the compiler, but the change is much more complex. This attempts to fix the issue in a simpler way using the peephole optimizer. Eliminating this array allocation is safe, since just as in the f(*a) case, nothing else on the caller side can modify the array.
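The safety claim can be sketched in plain Ruby (the method name `f` is illustrative): the callee's rest parameter is always a fresh array, so even if the callee mutates it, the caller's array is unaffected, which is why nothing on the caller side can observe the eliminated allocation.

```ruby
# Illustrative only: the callee's rest parameter is its own array,
# so mutating it never touches the caller's array.
def f(*args)
  args << :extra   # mutate the callee-side rest array
  args
end

a = [1, 2]
f(1, *a)  # => [1, 1, 2, :extra]
a         # => [1, 2] (caller's array unchanged)
```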
/*
 * Eliminate array allocation for f(*a, **lvar) and f(*a, **@iv)
 *
 *   splatarray true
 *   getlocal / getinstancevariable
 *   send ARGS_SPLAT|KW_SPLAT and not ARGS_BLOCKARG
 * =>
 *   splatarray false
 *   getlocal / getinstancevariable
 *   send
 */
else if (!(flag & VM_CALL_ARGS_BLOCKARG) && (flag & VM_CALL_KW_SPLAT)) {
    OPERAND_AT(iobj, 0) = Qfalse;
}
Is this safe for this code?
ary = [1,2]
(kwd = Object.new).singleton_class.define_method(:to_hash) {ary << 3; {}}
p(*ary, **kwd)
This should print the lines `1` and `2` now.
As stated in the pull request description and commit messages, this is as safe as `f(*a, &lvar)`. The same issue you describe currently exists with `f(*a, &lvar)`:
ary = [1,2]
kwd = Object.new
kwd.define_singleton_method(:to_hash) {ary << 3; {}}
kwd.define_singleton_method(:to_proc) {ary << 4; lambda{}}
p(*ary, &kwd)
puts
ary = [1,2]
p(*ary, **kwd)
puts
ary = [1,2]
p(*ary, **kwd, &kwd)
Currently:
1
2
4
1
2
1
2
After:
1
2
4
1
2
3
1
2
4
3
Note the `4` before `3` is a different bug, where Ruby calls `to_proc` before `to_hash` (even in the current case where an array is allocated).
As we currently accept the behavior for `f(*a, &lvar)`, I think we should accept the behavior for the other cases. Alternatively, we could have `f(*a, &lvar)` allocate an array. However, that makes almost all code using that style of method calling slower.
I submitted a pull request to ensure Ruby calls `to_hash` before `to_proc`: #8877
compile.c
Outdated
/*
 * Eliminate array allocation for f(*a, **lvar, &lvar) and f(*a, **@iv, &@iv)
 *
 *   splatarray true
 *   getlocal / getinstancevariable
 *   getlocal / getinstancevariable
 *   send ARGS_SPLAT|KW_SPLAT|ARGS_BLOCKARG
 * =>
 *   splatarray false
 *   getlocal / getinstancevariable
 *   getlocal / getinstancevariable
 *   send
 */
if (IS_NEXT_INSN_ID(niobj, send)) {
    niobj = niobj->next;
    unsigned int flag = vm_ci_flag((const struct rb_callinfo *)OPERAND_AT(niobj, 0));

    if ((flag & VM_CALL_ARGS_SPLAT) && (flag & VM_CALL_KW_SPLAT) && (flag & VM_CALL_ARGS_BLOCKARG)) {
        OPERAND_AT(iobj, 0) = Qfalse;
    }
}
Ditto for `to_proc`.
compile.c
Outdated
@@ -3888,6 +3888,30 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
            OPERAND_AT(iobj, 0) = Qfalse;
        }
    }
} else if (IS_NEXT_INSN_ID(niobj, getlocal) | IS_NEXT_INSN_ID(niobj, getinstancevariable)) {
Should this `|` be `||`?
Thanks, that should be `||`, though in this case `|` has the same behavior.
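For readers less familiar with the distinction, it can be sketched in Ruby (the `probe` lambda is illustrative): with boolean operands, `|` and `||` produce the same value, but `|` always evaluates both sides while `||` short-circuits — which is why the behavior happens to match here, where both operands are side-effect-free.

```ruby
calls = []
probe = ->(name, result) { calls << name; result }

probe.(:a, true) | probe.(:b, true)    # `|` evaluates both operands
evaluated_by_pipe = calls.dup

calls.clear
probe.(:a, true) || probe.(:b, true)   # `||` stops at the first truthy operand
evaluated_by_oror = calls.dup

evaluated_by_pipe  # => [:a, :b]
evaluated_by_oror  # => [:a]
```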
The first commit makes sense to me, but the others seem too ad hoc to me (limited to lvar and ivar). Maybe handling it at the node level is needed?
The reason I limited this to lvar and ivar is that those are fairly easy to handle, not risky (only pathological code fails), and very common. The other common case is splatting of method calls, but handling that is more complex, more risky (actual code might fail), and slightly less common, so I don't think we should do that. This could easily be extended to gvar and cvar, but those are very uncommon, so I don't think it is worth it.
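A small Ruby sketch of the distinction being drawn (the method names `m` and `opts` are hypothetical): the optimized shapes take a local or instance variable as the keyword-splat operand, while a method-call operand is the more complex case left alone.

```ruby
def m(*a, **kw)
  [a, kw]
end

def opts
  {k: 1}
end

a  = [1, 2]
@h = {k: 1}

m(*a, **@h)    # ivar operand: one of the easy, common shapes
m(*a, **opts)  # method-call operand: more complex and riskier to handle
```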
To check how common these types of method calls are, I added some metrics to the code, to see how many calls would be optimized by removing allocations.
Current stdlib:
minitest 5.20 (bundled gem):
Rails master:
So all optimizations seem to be used by the standard library and common gems, except maybe …
Code to enable metrics:
diff --git a/compile.c b/compile.c
index 41008f3cfd..88b281eedd 100644
--- a/compile.c
+++ b/compile.c
@@ -3848,6 +3848,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
niobj = niobj->next;
unsigned int flag = vm_ci_flag((const struct rb_callinfo *)OPERAND_AT(niobj, 0));
if ((flag & VM_CALL_ARGS_SPLAT) && !(flag & (VM_CALL_KW_SPLAT|VM_CALL_ARGS_BLOCKARG))) {
+write(1, "1", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
} else if (IS_NEXT_INSN_ID(niobj, getlocal) || IS_NEXT_INSN_ID(niobj, getinstancevariable)) {
@@ -3870,6 +3871,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
* send
*/
if ((flag & VM_CALL_ARGS_BLOCKARG) && !(flag & VM_CALL_KW_SPLAT)) {
+write(1, "2", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
@@ -3885,6 +3887,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
* send
*/
else if (!(flag & VM_CALL_ARGS_BLOCKARG) && (flag & VM_CALL_KW_SPLAT)) {
+write(1, "3", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
}
@@ -3909,6 +3912,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
unsigned int flag = vm_ci_flag((const struct rb_callinfo *)OPERAND_AT(niobj, 0));
if ((flag & VM_CALL_ARGS_SPLAT) && (flag & VM_CALL_KW_SPLAT) && (flag & VM_CALL_ARGS_BLOCKARG)) {
+write(1, "4", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
}
@@ -3933,6 +3937,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
if ((flag & VM_CALL_ARGS_SPLAT) && (flag & VM_CALL_KW_SPLAT) &&
(flag & VM_CALL_KW_SPLAT_MUT) && !(flag & VM_CALL_ARGS_BLOCKARG)) {
+write(1, "5", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
}
@@ -3958,6 +3963,7 @@ iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcal
if ((flag & VM_CALL_ARGS_SPLAT) && (flag & VM_CALL_KW_SPLAT) &&
(flag & VM_CALL_KW_SPLAT_MUT) && (flag & VM_CALL_ARGS_BLOCKARG)) {
+write(1, "6", 1);
OPERAND_AT(iobj, 0) = Qfalse;
}
 }
Program to load files and collect metrics (pipe the output of this into the next program):
# coding: UTF-8
b = binding
ARGV.flat_map{|x| File.directory?(x) ? Dir["#{x}/**/*.rb"] : x}.each do |file|
code = "# coding: UTF-8\nBEGIN{throw :valid, true}\n" + File.binread(file)
catch(:valid){eval(code, b, file)}
rescue SyntaxError
end
puts
Program to parse metrics to get the above output:
res = $stdin.read
types = (<<END).split("\n")
f(1, *a)
f(1, *a, &lvar) | f(1, *a, &@ivar)
f(*a, **lvar) | f(*a, **@ivar)
f(*a, **lvar, &lvar) | f(*a, **@ivar, &@ivar)
f(*a, kw: 1)
f(*a, kw:1, &lvar) | f(*a, kw:1, &@ivar)
END
%w[1 2 3 4 5 6].each do |i|
count = res.count(i)
next unless count > 0
print count.to_s.rjust(3), ' : ', types[i.to_i-1]
puts
end
Have you tried this patch on …?
Here are the results. I ran with …
Note that other than the compiler change to change …
Hum, not sure why …
To share my line of thought: if this change shows significant gains on at least one headline benchmark, then I'm in favor of taking the risk of merging such a change soonish so it makes it into 3.3. Otherwise, I'd advocate just waiting until January so we don't take the risk of introducing a subtle bug this close to release. Either way, if we want to merge this for 3.3, I strongly believe it should land before the first RC (coming very soon, I heard).
Due to how the compiler works, f(*a, &lvar) and f(*a, &@iv) do not allocate an array, but f(1, *a, &lvar) and f(1, *a, &@iv) do. It's probably possible to fix this in the compiler, but it seems easiest to fix it in the peephole optimizer. Eliminating this array allocation is as safe as the current elimination of the array allocation for f(*a, &lvar) and f(*a, &@iv).
The compiler already eliminates the array allocation for f(*a, &lvar) and f(*a, &@iv), and eliminating the array allocation for keyword splat is as safe as eliminating it for block passes.
In cases where the compiler can detect the hash is static, it would use duphash for the hash part. As the hash is static, there is no need to allocate an array.
For the following:

```
def f(*a); a end
p f(*a, kw: 3)
```

`setup_parameters_complex` pushes `{kw: 3}` onto `a`. This worked fine back when `concatarray true` was used and `a` was already a copy. It does not work correctly with the optimization to switch to `concatarray false`. This duplicates the array on the callee side in such a case. This affects cases when passing a regular splat and a keyword splat (or literal keywords) in a method call, where the method does not accept keywords. This allocation could probably be avoided, but doing so would make `setup_parameters_complex` more complicated.
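A runnable version of the case above (with `ary` and the method name `g` added so the snippet is self-contained) shows the behavior the callee-side copy must preserve:

```ruby
# A method without keyword parameters collects the keyword hash into its
# rest array; the caller's array must not gain that hash as a side effect.
def g(*a)
  a
end

ary = [1, 2]
g(*ary, kw: 3)  # => [1, 2, {kw: 3}]
ary             # => [1, 2]
```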
… f(*a, kw: 1, &arg)
These are similar to the f(1, *a, &lvar), f(*a, **kw, &lvar), and f(*a, kw: 1, &lvar) optimizations, but they use the getblockparamproxy instruction instead of getlocal. This also fixes the else style to be more similar to the surrounding code.
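The block-parameter shape this commit covers can be sketched as follows (the method names `target` and `wrapper` are illustrative): forwarding the wrapper's own block parameter compiles with `getblockparamproxy` rather than `getlocal`.

```ruby
def target(*a, **kw, &b)
  [a, kw, b.call]
end

def wrapper(*a, &arg)
  # `arg` is the method's own block parameter, read via getblockparamproxy
  target(*a, kw: 1, &arg)
end

wrapper(1, 2) { :done }  # => [[1, 2], {kw: 1}, :done]
```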
7e82b16 to f1c43a7
This eliminates array allocations for the following common cases:
This also eliminates array allocations for these less common cases that I think are still worth optimizing:
This is handled via the peephole optimizer.
In terms of safety: currently, `f(*a, &lvar)` and `f(*a, &@ivar)` both avoid array allocations, and all of the above cases are just as safe as those.
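To summarize, the call shapes in question behave like this (the method `f` here is an illustrative stand-in, not code from the patch):

```ruby
def f(*args, **kw, &b)
  [args, kw, b.nil?]
end

a   = [1, 2]
h   = {k: 3}
blk = proc {}

f(1, *a)          # => [[1, 1, 2], {}, true]
f(*a, **h)        # => [[1, 2], {k: 3}, true]
f(*a, **h, &blk)  # => [[1, 2], {k: 3}, false]
```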