Added ActiveRecord::Base.each and ActiveRecord::Base.find_in_batches for batch processing [DHH/Jamis Buck]
dhh committed Feb 23, 2009
1 parent 441e4e2 commit d13623c
Showing 5 changed files with 123 additions and 1 deletion.
2 changes: 2 additions & 0 deletions activerecord/CHANGELOG
@@ -1,5 +1,7 @@
*Edge*

* Added ActiveRecord::Base.each and ActiveRecord::Base.find_in_batches for batch processing [DHH/Jamis Buck]

* Added that ActiveRecord::Base.exists? can be called with no arguments #1817 [Scott Taylor]


1 change: 1 addition & 0 deletions activerecord/lib/active_record.rb
@@ -48,6 +48,7 @@ def self.load_all!
autoload :AttributeMethods, 'active_record/attribute_methods'
autoload :AutosaveAssociation, 'active_record/autosave_association'
autoload :Base, 'active_record/base'
autoload :Batches, 'active_record/batches'
autoload :Calculations, 'active_record/calculations'
autoload :Callbacks, 'active_record/callbacks'
autoload :Dirty, 'active_record/dirty'
2 changes: 1 addition & 1 deletion activerecord/lib/active_record/base.rb
@@ -3147,7 +3147,7 @@ def clone_attribute_value(reader_method, attribute_name)
# #save_with_autosave_associations to be wrapped inside a transaction.
include AutosaveAssociation, NestedAttributes

- include Aggregations, Transactions, Reflection, Calculations, Serialization
+ include Aggregations, Transactions, Reflection, Batches, Calculations, Serialization
end
end

70 changes: 70 additions & 0 deletions activerecord/lib/active_record/batches.rb
@@ -0,0 +1,70 @@
module ActiveRecord
  module Batches # :nodoc:
    def self.included(base)
      base.extend(ClassMethods)
    end

    # When processing large numbers of records, it's often a good idea to do so in batches to prevent memory ballooning.
    module ClassMethods
      # Yields each record that was found by the find +options+. The find is performed by find_in_batches
      # with a batch size of 1000 (or as specified by the +limit+ option).
      #
      # Example:
      #
      #   Person.each(:conditions => "age > 21") do |person|
      #     person.party_all_night!
      #   end
      #
      # Note: This method is only intended for batch processing of large amounts of records that wouldn't fit in
      # memory all at once. If you just need to loop over less than 1000 records, it's probably better just to use the
      # regular find methods.
      def each(options = {})
        find_in_batches(options) do |records|
          records.each { |record| yield record }
        end

        self
      end

      # Yields each batch of records that was found by the find +options+ as an array. The size of each batch is
      # set by the +limit+ option; the default is 1000.
      #
      # You can control the starting point for the batch processing by supplying the +start+ option. This is especially
      # useful if you want multiple workers dealing with the same processing queue. You can make worker 1 handle all the
      # records between id 0 and 10,000 and worker 2 handle from 10,000 and beyond (by setting the +start+ option on that
      # worker).
      #
      # It's not possible to set the order. That is automatically set to ascending on the primary key ("id ASC")
      # to make the batch ordering work. This also means that this method only works with integer-based primary keys.
      # You can't set the limit either, that's used to control the batch sizes.
      #
      # Example:
      #
      #   Person.find_in_batches(:conditions => "age > 21") do |group|
      #     sleep(50) # Make sure it doesn't get too crowded in there!
      #     group.each { |person| person.party_all_night! }
      #   end
      def find_in_batches(options = {})
        raise "You can't specify an order, it's forced to be #{batch_order}" if options[:order]
        raise "You can't specify a limit, it's forced to be the batch_size" if options[:limit]

        start = options.delete(:start).to_i

        with_scope(:find => options.merge(:order => batch_order, :limit => options.delete(:batch_size) || 1000)) do
          records = find(:all, :conditions => [ "#{table_name}.#{primary_key} >= ?", start ])

          while records.any?
            yield records
            records = find(:all, :conditions => [ "#{table_name}.#{primary_key} > ?", records.last.id ])
          end
        end
      end


      private
        def batch_order
          "#{table_name}.#{primary_key} ASC"
        end
    end
  end
end
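
The loop in find_in_batches is a form of keyset pagination: each query asks for rows whose primary key lies beyond the last one seen, ordered ascending and capped at the batch size. Below is a minimal plain-Ruby sketch of that idea, with an in-memory array standing in for the table. The `ROWS` constant and the `fetch_batch` and `in_batches` helpers are hypothetical names used only for illustration; they are not part of ActiveRecord.

```ruby
# Hypothetical in-memory "table": rows with integer ids, including gaps.
ROWS = [1, 2, 3, 5, 8, 13, 21].map { |id| { :id => id } }

# Stand-in for one SQL query of the form
#   SELECT * FROM rows WHERE id >= ? (or id > ?) ORDER BY id ASC LIMIT ?
def fetch_batch(min_id, inclusive, limit)
  ROWS.select { |row| inclusive ? row[:id] >= min_id : row[:id] > min_id }
      .sort_by { |row| row[:id] }
      .first(limit)
end

# Yields arrays of at most batch_size rows, walking the key in ascending order.
def in_batches(start = 0, batch_size = 1000)
  batch = fetch_batch(start, true, batch_size)
  while batch.any?
    yield batch
    # Resume strictly after the last id seen, so no row is yielded twice.
    batch = fetch_batch(batch.last[:id], false, batch_size)
  end
end

BATCHES = []
in_batches(2, 3) { |batch| BATCHES << batch.map { |row| row[:id] } }
# BATCHES is now [[2, 3, 5], [8, 13, 21]]
```

Because each step resumes from the last id rather than from a row offset, gaps in the id sequence do not cause rows to be skipped or repeated. This is also why the ordering has to be fixed to the primary key.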
49 changes: 49 additions & 0 deletions activerecord/test/cases/batches_test.rb
@@ -0,0 +1,49 @@
require 'cases/helper'
require 'models/post'

class EachTest < ActiveRecord::TestCase
  fixtures :posts

  def setup
    @posts = Post.all(:order => "id asc")
    @total = Post.count
  end

  def test_each_should_excecute_one_query_per_batch
    assert_queries(Post.count + 1) do
      Post.each(:batch_size => 1) do |post|
        assert_kind_of Post, post
      end
    end
  end

  def test_each_should_raise_if_the_order_is_set
    assert_raise(RuntimeError) do
      Post.each(:order => "title") { |post| post }
    end
  end

  def test_each_should_raise_if_the_limit_is_set
    assert_raise(RuntimeError) do
      Post.each(:limit => 1) { |post| post }
    end
  end

  def test_find_in_batches_should_return_batches
    assert_queries(Post.count + 1) do
      Post.find_in_batches(:batch_size => 1) do |batch|
        assert_kind_of Array, batch
        assert_kind_of Post, batch.first
      end
    end
  end

  def test_find_in_batches_should_start_from_the_start_option
    assert_queries(Post.count) do
      Post.find_in_batches(:batch_size => 1, :start => 2) do |batch|
        assert_kind_of Array, batch
        assert_kind_of Post, batch.first
      end
    end
  end
end
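
The query counts these tests expect follow from the shape of the loop: with a batch size of 1, every record costs one query, and one final query returns an empty result that ends the loop. Starting at id 2 skips the first record and so saves one query. Below is a plain-Ruby sketch of that arithmetic. It assumes the ids run from 1 to N without gaps, which is an assumption about the fixtures rather than something this commit shows. The `count_queries` helper is hypothetical; it only mimics the loop and does not touch a database.

```ruby
# Counts how many "queries" the batching loop would issue for the given ids.
# Hypothetical helper: it mirrors the loop in find_in_batches without a database.
def count_queries(ids, start, batch_size)
  queries = 1 # the initial "id >= start" query
  batch = ids.select { |id| id >= start }.sort.first(batch_size)
  while batch.any?
    queries += 1 # one follow-up "id > last id" query per yielded batch
    last_id = batch.last
    batch = ids.select { |id| id > last_id }.sort.first(batch_size)
  end
  queries
end

IDS = (1..7).to_a # assumed fixture: 7 posts with ids 1..7

count_queries(IDS, 0, 1)    # => 8, i.e. IDS.size + 1
count_queries(IDS, 2, 1)    # => 7, i.e. IDS.size
count_queries(IDS, 0, 1000) # => 2, one full batch plus the empty final query
```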

15 comments on commit d13623c

@bansalakhil

Cool !!!!

@dubek
Contributor

@dubek dubek commented on d13623c Feb 23, 2009

There’s a mistake in the doc of find_in_batches: the size of each batch can be set by the :batch_size option, and not by :limit

@adrianpacala
Contributor

That was fixed in https://github.com/rails/rails/commit/45787bdd0e9ec20b111e570a20b5f66a949b400c

@guillermo
Contributor

There was a plugin/patch with order and limit support:
http://github.com/yyyc514/active_record_each

It also adds support for map and collect.

@ciaranlee

Sweet! This is going to help me a lot :)

@dmathieu
Contributor

Gorgeous !

@smtlaissezfaire
Contributor

See http://weblog.jamisbuck.org/2007/4/6/faking-cursors-in-activerecord for more info.

@javan
Contributor

@javan javan commented on d13623c Feb 23, 2009

dig it

@evansenter

Excellent!

@amerine
Contributor

Interesting.

@GavinJoyce

Great, this is very useful. Thanks.

@libo
@libo libo commented on d13623c Feb 24, 2009

Thank you very much, DHH!
You won’t believe how useful this is going to be... at least for me :-)

@defsdoor

Wouldn’t this be better off as a parameter to find to activate batch mode ?

@Roman2K

Useful addition, but I don’t like the way it’s implemented. Since AR::Batches only contains a ClassMethods module, it should be refactored in such a way that it can be enabled with extend instead of include. See Evil Hook Methods? by Ola Bini.
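
The two styles this comment contrasts can be sketched side by side. The module and class names below are hypothetical, chosen only for illustration. The first mirrors the `self.included` hook used in this commit; the second is the plain `extend` alternative the comment suggests.

```ruby
# Style used in the commit: `include` triggers a hook that extends the class.
module HookedBatches
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def batch_hello
      "hello from #{name}"
    end
  end
end

# Alternative: the module holds the class-level methods directly,
# and the caller opts in explicitly with `extend`.
module PlainBatches
  def batch_hello
    "hello from #{name}"
  end
end

class HookedModel
  include HookedBatches
end

class PlainModel
  extend PlainBatches
end

HookedModel.batch_hello # => "hello from HookedModel"
PlainModel.batch_hello  # => "hello from PlainModel"
```

Both classes end up with the same class-level method. The difference is that `extend` states the intent at the call site, whereas the hook makes `include` do something other than add instance methods.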

@dolzenko
Contributor

Weird that each is backported (?) in 2.3.x as find_each.
