Added ActiveRecord::Base.each and ActiveRecord::Base.find_in_batches for batch processing [DHH/Jamis Buck]
Showing 5 changed files with 123 additions and 1 deletion.
@@ -0,0 +1,70 @@

```ruby
module ActiveRecord
  module Batches # :nodoc:
    def self.included(base)
      base.extend(ClassMethods)
    end

    # When processing large numbers of records, it's often a good idea to do so in batches to prevent memory ballooning.
    module ClassMethods
      # Yields each record that was found by the find +options+. The find is performed by find_in_batches
      # with a batch size of 1000 (or as specified by the +limit+ option).
      #
      # Example:
      #
      #   Person.each(:conditions => "age > 21") do |person|
      #     person.party_all_night!
      #   end
      #
      # Note: This method is only intended to use for batch processing of large amounts of records that wouldn't fit in
      # memory all at once. If you just need to loop over less than 1000 records, it's probably better just to use the
      # regular find methods.
      def each(options = {})
        find_in_batches(options) do |records|
          records.each { |record| yield record }
        end

        self
      end

      # Yields each batch of records that was found by the find +options+ as an array. The size of each batch is
      # set by the +limit+ option; the default is 1000.
      #
      # You can control the starting point for the batch processing by supplying the +start+ option. This is especially
      # useful if you want multiple workers dealing with the same processing queue. You can make worker 1 handle all the
      # records between id 0 and 10,000 and worker 2 handle from 10,000 and beyond (by setting the +start+ option on that
      # worker).
      #
      # It's not possible to set the order. That is automatically set to ascending on the primary key ("id ASC")
      # to make the batch ordering work. This also mean that this method only works with integer-based primary keys.
      # You can't set the limit either, that's used to control the the batch sizes.
      #
      # Example:
      #
      #   Person.find_in_batches(:conditions => "age > 21") do |group|
      #     sleep(50) # Make sure it doesn't get too crowded in there!
      #     group.each { |person| person.party_all_night! }
      #   end
      def find_in_batches(options = {})
        raise "You can't specify an order, it's forced to be #{batch_order}" if options[:order]
        raise "You can't specify a limit, it's forced to be the batch_size" if options[:limit]

        start = options.delete(:start).to_i

        with_scope(:find => options.merge(:order => batch_order, :limit => options.delete(:batch_size) || 1000)) do
          records = find(:all, :conditions => [ "#{table_name}.#{primary_key} >= ?", start ])

          while records.any?
            yield records
            records = find(:all, :conditions => [ "#{table_name}.#{primary_key} > ?", records.last.id ])
          end
        end
      end


      private
        def batch_order
          "#{table_name}.#{primary_key} ASC"
        end
    end
  end
end
```
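The loop in find_in_batches pages by primary key rather than by offset: the first query takes rows with id >= start, and each later query takes rows with an id greater than the last one yielded. A minimal sketch of that control flow in plain Ruby, without ActiveRecord; Row, fetch, and in_batches are hypothetical names for illustration, and an in-memory array stands in for the table:

```ruby
# Hypothetical stand-in for a table row; not part of the commit.
Row = Struct.new(:id, :age)

# Mimics find(:all) under the scoped order and limit: rows at or above
# (inclusive) or strictly above a bound, sorted by id, capped at batch_size.
def fetch(rows, bound, batch_size, inclusive: false)
  matching = rows.select { |r| inclusive ? r.id >= bound : r.id > bound }
  matching.sort_by(&:id).first(batch_size)
end

# Same control flow as find_in_batches: the first page starts at `start`,
# every later page starts just after the last id already seen.
def in_batches(rows, batch_size: 1000, start: 0)
  records = fetch(rows, start, batch_size, inclusive: true)
  while records.any?
    yield records
    records = fetch(rows, records.last.id, batch_size)
  end
end

rows = [7, 2, 9, 4, 1].map { |id| Row.new(id, 30) }
batches = []
in_batches(rows, batch_size: 2) { |batch| batches << batch.map(&:id) }
p batches # => [[1, 2], [4, 7], [9]]
```

Passing start: 4 to the same call would yield [4, 7] and then [9]. Gaps in the id sequence do not matter, since each page is bounded by the last id seen rather than by a row offset.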
@@ -0,0 +1,49 @@

```ruby
require 'cases/helper'
require 'models/post'

class EachTest < ActiveRecord::TestCase
  fixtures :posts

  def setup
    @posts = Post.all(:order => "id asc")
    @total = Post.count
  end

  def test_each_should_excecute_one_query_per_batch
    assert_queries(Post.count + 1) do
      Post.each(:batch_size => 1) do |post|
        assert_kind_of Post, post
      end
    end
  end

  def test_each_should_raise_if_the_order_is_set
    assert_raise(RuntimeError) do
      Post.each(:order => "title") { |post| post }
    end
  end

  def test_each_should_raise_if_the_limit_is_set
    assert_raise(RuntimeError) do
      Post.each(:limit => 1) { |post| post }
    end
  end

  def test_find_in_batches_should_return_batches
    assert_queries(Post.count + 1) do
      Post.find_in_batches(:batch_size => 1) do |batch|
        assert_kind_of Array, batch
        assert_kind_of Post, batch.first
      end
    end
  end

  def test_find_in_batches_should_start_from_the_start_option
    assert_queries(Post.count) do
      Post.find_in_batches(:batch_size => 1, :start => 2) do |batch|
        assert_kind_of Array, batch
        assert_kind_of Post, batch.first
      end
    end
  end
end
```
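The query counts asserted in these tests follow from the loop shape: one query per non-empty batch, plus a final query that comes back empty and ends the loop. A rough sketch of that arithmetic in plain Ruby; query_count is a hypothetical helper, and the seven contiguous ids starting at 1 are an assumption about the fixtures, not something the diff shows:

```ruby
# Counts the queries the find_in_batches loop would issue for a table
# holding the given ids. Illustrative only; no database is involved.
def query_count(ids, batch_size:, start: 0)
  sorted = ids.sort
  queries = 1 # the initial "id >= start" query
  page = sorted.select { |id| id >= start }.first(batch_size)
  until page.empty?
    queries += 1 # the follow-up "id > last id" query
    page = sorted.select { |id| id > page.last }.first(batch_size)
  end
  queries
end

ids = (1..7).to_a # assumed fixture ids
p query_count(ids, batch_size: 1)           # 8, i.e. count + 1
p query_count(ids, batch_size: 1, start: 2) # 7, i.e. count
p query_count(ids, batch_size: 1000)        # 2
```

Under the same assumption, starting at id 2 skips exactly one row, which is why the last test expects Post.count queries instead of Post.count + 1.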
Cool !!!!
There’s a mistake in the doc of find_in_batches: the size of each batch can be set by the batch_size option, and not by limit.
That was fixed in https://github.com/rails/rails/commit/45787bdd0e9ec20b111e570a20b5f66a949b400c
There was a plugin/patch with order and limit support, which also adds support for map and collect: http://github.com/yyyc514/active_record_each
Sweet! This is going to help me a lot :)
Gorgeous !
See http://weblog.jamisbuck.org/2007/4/6/faking-cursors-in-activerecord for more info.
dig it
Excellent!
Interesting.
Great, this is very useful. Thanks.
Thank you very much, DHH! You won’t believe how useful this is going to be, at least for me :-)
Wouldn’t this be better off as a parameter to find to activate batch mode ?
Useful addition, but I don’t like the way it’s implemented. Since AR::Batches only contains a ClassMethods module, it should be refactored in such a way that it can be enabled with extend instead of include. See Evil Hook Methods? by Ola Bini.
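The difference between the two styles raised in the comment above can be sketched in plain Ruby. HookedBatches, PlainBatches, HookedModel, and PlainModel are hypothetical names, and batch_order is simplified here to a public method returning a fixed string; this is an illustration of the suggestion, not code from the commit or from Rails:

```ruby
# Style used in the commit: including the module fires the `included`
# hook, which in turn extends the base class with ClassMethods.
module HookedBatches
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def batch_order
      "id ASC"
    end
  end
end

# Style the comment proposes: the module holds the class-level methods
# directly, and the base class opts in with `extend`, with no hook involved.
module PlainBatches
  def batch_order
    "id ASC"
  end
end

class HookedModel
  include HookedBatches
end

class PlainModel
  extend PlainBatches
end

p HookedModel.batch_order
p PlainModel.batch_order
```

Both classes end up with the same class method. The second form states at the call site that class-level behavior is being added, which appears to be the point of the objection.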
Weird that each is backported (?) in 2.3.x as find_each.