Add `pluck_each` and `pluck_in_batches` batching methods #47894
Example:
Plucking in batches is a popular feature that I have seen many projects reimplement themselves to gain performance.
I saw this in two of my previous projects, in OSS projects (I was able to find it in Mastodon), and in a few popular gems.
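The hand-rolled versions that projects tend to write usually amount to keyset pagination over the primary key while fetching only the needed column. Below is a minimal sketch in plain Ruby, with an in-memory array standing in for the table. `ROWS`, `fetch_page`, and `pluck_in_batches_sketch` are hypothetical names used for illustration; this is not the implementation from this PR.

```ruby
# In-memory stand-in for a database table (hypothetical data).
ROWS = (1..10).map { |i| { id: i, email: "user#{i}@example.com" } }.freeze

# Stand-in for a single SQL query of the form:
#   SELECT id, <column> FROM rows WHERE id > ? ORDER BY id LIMIT ?
# Returns [id, value] pairs, never full records.
def fetch_page(column, after_id, limit)
  ROWS.select { |row| after_id.nil? || row[:id] > after_id }
      .sort_by { |row| row[:id] }
      .first(limit)
      .map { |row| [row[:id], row[column]] }
end

# Yields one array of column values per batch (illustration only).
def pluck_in_batches_sketch(column, batch_size:)
  last_id = nil
  loop do
    page = fetch_page(column, last_id, batch_size)
    break if page.empty?

    yield page.map(&:last)

    # A short page means the table is exhausted; skip the extra query.
    break if page.size < batch_size

    # Remember the last seen primary key for the next page.
    last_id = page.last.first
  end
end

batches = []
pluck_in_batches_sketch(:email, batch_size: 4) { |emails| batches << emails }
```

Each iteration issues exactly one "query" and never instantiates full records, which is where the hand-rolled versions get their speedup.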
Benchmarks
Tested on a table with 50M records.
Compared to the recently introduced optimization for range batching.
Whole table batching
Using ranges:
Elapsed: 209.20533800008707s
Plucking in batches:
Elapsed: 113.7704949999461s 🔥
Batching with conditions
Using ranges:
Elapsed: 28.136486999923363s
No ranges:
Elapsed: 39.96518399997149s
Plucking in batches:
Elapsed: 16.415813000057824s 🔥
These numbers are for a database on my local machine. The improvement will be much larger in production, thanks to simpler queries and half as many SQL queries.
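To make the halving concrete: when values are plucked through `in_batches`, each batch costs one query to establish the batch's ids or boundaries and a second one to pluck the column from the yielded relation, whereas plucking in batches needs a single query per batch. The rough count below is a sketch under those assumptions; it ignores any final probe for an empty batch, and the helper names are hypothetical. The batch size of 1000 is the Rails default, since the batch size used in the benchmark is not stated.

```ruby
# Number of batches needed to cover the table (integer ceiling division).
def batch_count(total_rows, batch_size)
  (total_rows + batch_size - 1) / batch_size
end

# in_batches + pluck: one query to delimit the batch, one to pluck from it.
def queries_via_in_batches(total_rows, batch_size)
  batch_count(total_rows, batch_size) * 2
end

# Plucking in batches: one query per batch.
def queries_via_pluck_in_batches(total_rows, batch_size)
  batch_count(total_rows, batch_size)
end

total_rows = 50_000_000 # table size from the benchmark
batch_size = 1000       # assumed: the Rails default batch size

two_step = queries_via_in_batches(total_rows, batch_size)
one_step = queries_via_pluck_in_batches(total_rows, batch_size)
```

Under these assumptions the two-step approach issues 100,000 queries against 50,000 for plucking in batches.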
Also, implementing this feature would make #47466 unneeded.
The logic in `pluck_in_batches` looks similar to `in_batches`, but trying to DRY it (extracting similar logic into helper methods or trying to reuse `pluck_in_batches` inside `in_batches`) will make the code more complex and less understandable.

cc @nvasilevski (as we discussed it in https://discuss.rubyonrails.org/t/yield-record-ids-to-in-batches-block/81102)