Enumerable#each_slice and #each_cons should use fewer Arrays #596
Currently #each_slice creates a new Array each time it yields, to hold the elements (each_slice_i). However, if the block given to #each_slice immediately separates the elements into multiple block arguments, then that new Array is immediately discarded, when it could have been reused for every yield from #each_slice. Take these two examples:

In the first example, the yielded Array is a, so we can't reuse the allocated Array for the next yield (we must create a new one). In the second example, however, Ruby immediately separates the elements of the yielded Array into a, b, and c, discarding the Array instance. This can be made faster. (Everything above is true for #each_cons as well.)

The patch
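To make the distinction concrete, here is a small Ruby sketch of the two block shapes and the values Proc#arity reports for them. The range, slice size, and variable names are assumptions chosen for illustration; they are not taken from the patch itself.

```ruby
# Case 1: the block keeps the yielded Array (a single parameter).
# Each yield must hand over a distinct Array, because the block may retain it.
kept = []
(1..6).each_slice(3) { |a| kept << a }

# Case 2: the block destructures the yielded Array into three parameters.
# The Array itself is never visible inside the block body.
sums = []
(1..6).each_slice(3) { |a, b, c| sums << a + b + c }

# Arity values for a few block shapes (illustrative labels).
arities = {
  "|a|"              => proc { |a| }.arity,
  "|a, b, c|"        => proc { |a, b, c| }.arity,
  "|*a|"             => proc { |*a| }.arity,
  "lambda |a = nil|" => lambda { |a = nil| }.arity
}

p kept     # two separate Arrays
p sums     # one sum per slice
p arities
```

Only the shapes reporting 1 or -1 can observe the yielded Array as a whole, which is why those are the cases where a fresh Array is needed per yield.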
This patch uses rb_block_arity() to test how many arguments the given block can take, and only creates new Arrays if the arity is 1 or -1. This should mean that the block takes exactly 1 argument (no separation), takes only a splat argument (no separation), or is a lambda with 0 required arguments (unknown separation); see the #arity docs. This performance trick is already used in Hash#each_pair:

Speedup
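A rough way to reproduce this kind of measurement is sketched below. It is an approximation, not the script behind the reported numbers: the user_time helper, the variant labels, and the reduced iteration count are assumptions, and it relies only on the core Process.times call.

```ruby
# Return the user CPU time (in seconds) spent while running the given block.
def user_time
  before = Process.times.utime
  yield
  Process.times.utime - before
end

# Assumed iteration count, much smaller than the 200_000 quoted in the text.
iterations = 2_000

one_arg_proc   = proc { |a| a[0] + a[1] + a[2] }
three_arg_proc = proc { |a, b, c| a + b + c }
one_arg_lambda = lambda { |a| a[0] + a[1] + a[2] }

timings = {
  "block, 1 arg"  => user_time { iterations.times { (1..100).each_cons(3) { |a| a[0] + a[1] + a[2] } } },
  "block, 3 args" => user_time { iterations.times { (1..100).each_cons(3) { |a, b, c| a + b + c } } },
  "proc, 1 arg"   => user_time { iterations.times { (1..100).each_cons(3, &one_arg_proc) } },
  "proc, 3 args"  => user_time { iterations.times { (1..100).each_cons(3, &three_arg_proc) } },
  "lambda, 1 arg" => user_time { iterations.times { (1..100).each_cons(3, &one_arg_lambda) } }
}

# Print one line per variant; smaller is better.
timings.each { |label, seconds| puts format("%-14s %.3fs", label, seconds) }
```

Absolute values depend on the machine and Ruby build, so only the relative difference between the 1-arg and 3-arg variants is meaningful.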
In a micro-benchmark, the speedup is decent: approximately 30% for arg-separating blocks. Here are some numbers for #each_cons. Scores are seconds of user time (smaller is better) running variants of 200_000.times { (1..100).each_cons(3) { |a,b,c| sum = a+b+c } }. Some variants accept 1 arg, and some accept 3 args. Some used a block, some a proc, and one used a lambda: