
Commit

Merge pull request rails#23099 from vipulnsward/change_start_at_end_at
Changed options for find_each and variants to have options start/finish
kaspth committed Jan 18, 2016
2 parents: b505d45 + da26934 · commit 426b312
Showing 5 changed files with 56 additions and 92 deletions.
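
In short: the batch-processing keyword arguments `begin_at`/`end_at` become `start`/`finish` across `find_each`, `find_in_batches`, and `in_batches`, and the deprecation shim that previously redirected a `start` keyword to `begin_at` is removed. A condensed before/after sketch (editorial illustration, not part of the diff):

```ruby
# Before this commit: begin_at / end_at keywords
Person.find_each(begin_at: 2000, end_at: 10_000, batch_size: 5000) do |person|
  person.party_all_night!
end

# After this commit: start / finish keywords
Person.find_each(start: 2000, finish: 10_000, batch_size: 5000) do |person|
  person.party_all_night!
end
```
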
4 changes: 2 additions & 2 deletions activerecord/CHANGELOG.md
@@ -634,7 +634,7 @@
* Add `ActiveRecord::Relation#in_batches` to work with records and relations
in batches.

Available options are `of` (batch size), `load`, `begin_at`, and `end_at`.
Available options are `of` (batch size), `load`, `start`, and `finish`.

Examples:
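
The entry's own example block falls outside this hunk; an illustrative sketch of the documented options (`of`, `load`, `start`, `finish`), not the CHANGELOG's text:

```ruby
# Work record-by-record through the BatchEnumerator returned without a block.
Person.in_batches.each_record(&:party_all_night!)

# Or work with each sub-relation, bounded by the renamed start/finish options.
Person.in_batches(of: 2000, start: 2000, finish: 10_000, load: true).each do |relation|
  relation.update_all(awesome: true)
end
```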

@@ -1282,7 +1282,7 @@

*Yves Senn*

* `find_in_batches` now accepts an `:end_at` parameter that complements the `:start`
* `find_in_batches` now accepts an `:finish` parameter that complements the `:start`
parameter to specify where to stop batch processing.

*Vipul A M*
67 changes: 26 additions & 41 deletions activerecord/lib/active_record/relation/batches.rb
@@ -29,15 +29,15 @@ module Batches
#
# ==== Options
# * <tt>:batch_size</tt> - Specifies the size of the batch. Default to 1000.
# * <tt>:begin_at</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:end_at</tt> - Specifies the primary key value to end at, inclusive of the value.
# * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
# This is especially useful if you want multiple workers dealing with
# the same processing queue. You can make worker 1 handle all the records
# between id 0 and 10,000 and worker 2 handle from 10,000 and beyond
# (by setting the +:begin_at+ and +:end_at+ option on each worker).
# (by setting the +:start+ and +:finish+ option on each worker).
#
# # Let's process for a batch of 2000 records, skipping the first 2000 rows
# Person.find_each(begin_at: 2000, batch_size: 2000) do |person|
# Person.find_each(start: 2000, batch_size: 2000) do |person|
# person.party_all_night!
# end
#
@@ -48,22 +48,15 @@ module Batches
#
# NOTE: You can't set the limit either, that's used to control
# the batch sizes.
def find_each(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
if start
begin_at = start
ActiveSupport::Deprecation.warn(<<-MSG.squish)
Passing `start` value to find_each is deprecated, and will be removed in Rails 5.1.
Please pass `begin_at` instead.
MSG
end
def find_each(start: nil, finish: nil, batch_size: 1000)
if block_given?
find_in_batches(begin_at: begin_at, end_at: end_at, batch_size: batch_size) do |records|
find_in_batches(start: start, finish: finish, batch_size: batch_size) do |records|
records.each { |record| yield record }
end
else
enum_for(:find_each, begin_at: begin_at, end_at: end_at, batch_size: batch_size) do
enum_for(:find_each, start: start, finish: finish, batch_size: batch_size) do
relation = self
apply_limits(relation, begin_at, end_at).size
apply_limits(relation, start, finish).size
end
end
end
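
A small usage sketch of the new signature above (editorial, not part of the diff); without a block, `find_each` returns a sized `Enumerator`:

```ruby
# Block form: yields records one at a time, batching queries under the hood.
Person.find_each(start: 2000, finish: 10_000, batch_size: 1000) do |person|
  person.party_all_night!
end

# Blockless form: a sized Enumerator over the same records; its size comes
# from apply_limits(relation, start, finish).size, as in the enum_for block above.
enum = Person.find_each(start: 2000, finish: 10_000, batch_size: 1000)
enum.each { |person| person.party_all_night! }
```
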
@@ -88,15 +81,15 @@ def find_each(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
#
# ==== Options
# * <tt>:batch_size</tt> - Specifies the size of the batch. Default to 1000.
# * <tt>:begin_at</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:end_at</tt> - Specifies the primary key value to end at, inclusive of the value.
# * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
# This is especially useful if you want multiple workers dealing with
# the same processing queue. You can make worker 1 handle all the records
# between id 0 and 10,000 and worker 2 handle from 10,000 and beyond
# (by setting the +:begin_at+ and +:end_at+ option on each worker).
# (by setting the +:start+ and +:finish+ option on each worker).
#
# # Let's process the next 2000 records
# Person.find_in_batches(begin_at: 2000, batch_size: 2000) do |group|
# Person.find_in_batches(start: 2000, batch_size: 2000) do |group|
# group.each { |person| person.party_all_night! }
# end
#
@@ -107,24 +100,16 @@ def find_each(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
#
# NOTE: You can't set the limit either, that's used to control
# the batch sizes.
def find_in_batches(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
if start
begin_at = start
ActiveSupport::Deprecation.warn(<<-MSG.squish)
Passing `start` value to find_in_batches is deprecated, and will be removed in Rails 5.1.
Please pass `begin_at` instead.
MSG
end

def find_in_batches(start: nil, finish: nil, batch_size: 1000)
relation = self
unless block_given?
return to_enum(:find_in_batches, begin_at: begin_at, end_at: end_at, batch_size: batch_size) do
total = apply_limits(relation, begin_at, end_at).size
return to_enum(:find_in_batches, start: start, finish: finish, batch_size: batch_size) do
total = apply_limits(relation, start, finish).size
(total - 1).div(batch_size) + 1
end
end

in_batches(of: batch_size, begin_at: begin_at, end_at: end_at, load: true) do |batch|
in_batches(of: batch_size, start: start, finish: finish, load: true) do |batch|
yield batch.to_a
end
end
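
The blockless enumerator's size above is a ceiling division of the bounded row count by the batch size; a worked example with illustrative numbers (cf. the sized-enumerator assertions in the test file below):

```ruby
total      = 11  # rows within the start/finish bounds
batch_size = 2
(total - 1).div(batch_size) + 1 # => 6 batches
```
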
@@ -153,18 +138,18 @@ def find_in_batches(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
# ==== Options
# * <tt>:of</tt> - Specifies the size of the batch. Default to 1000.
# * <tt>:load</tt> - Specifies if the relation should be loaded. Default to false.
# * <tt>:begin_at</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:end_at</tt> - Specifies the primary key value to end at, inclusive of the value.
# * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
# * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
#
# This is especially useful if you want to work with the
# ActiveRecord::Relation object instead of the array of records, or if
# you want multiple workers dealing with the same processing queue. You can
# make worker 1 handle all the records between id 0 and 10,000 and worker 2
# handle from 10,000 and beyond (by setting the +:begin_at+ and +:end_at+
# handle from 10,000 and beyond (by setting the +:start+ and +:finish+
# option on each worker).
#
# # Let's process the next 2000 records
# Person.in_batches(of: 2000, begin_at: 2000).update_all(awesome: true)
# Person.in_batches(of: 2000, start: 2000).update_all(awesome: true)
#
# An example of calling where query method on the relation:
#
@@ -186,18 +171,18 @@ def find_in_batches(begin_at: nil, end_at: nil, batch_size: 1000, start: nil)
#
# NOTE: You can't set the limit either, that's used to control the batch
# sizes.
def in_batches(of: 1000, begin_at: nil, end_at: nil, load: false)
def in_batches(of: 1000, start: nil, finish: nil, load: false)
relation = self
unless block_given?
return BatchEnumerator.new(of: of, begin_at: begin_at, end_at: end_at, relation: self)
return BatchEnumerator.new(of: of, start: start, finish: finish, relation: self)
end

if logger && (arel.orders.present? || arel.taken.present?)
logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
end

relation = relation.reorder(batch_order).limit(of)
relation = apply_limits(relation, begin_at, end_at)
relation = apply_limits(relation, start, finish)
batch_relation = relation

loop do
@@ -225,9 +210,9 @@ def in_batches(of: 1000, begin_at: nil, end_at: nil, load: false)

private

def apply_limits(relation, begin_at, end_at)
relation = relation.where(table[primary_key].gteq(begin_at)) if begin_at
relation = relation.where(table[primary_key].lteq(end_at)) if end_at
def apply_limits(relation, start, finish)
relation = relation.where(table[primary_key].gteq(start)) if start
relation = relation.where(table[primary_key].lteq(finish)) if finish
relation
end
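
`apply_limits` is where `start`/`finish` turn into primary-key bounds via Arel's `gteq`/`lteq`; a rough relation-level equivalent (sketch, assuming an integer `id` primary key):

```ruby
# apply_limits(Person.all, 2000, 10_000) constrains the relation roughly like:
Person.where("id >= ?", 2000).where("id <= ?", 10_000)
```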

12 changes: 6 additions & 6 deletions activerecord/lib/active_record/relation/batches/batch_enumerator.rb
@@ -3,11 +3,11 @@ module Batches
class BatchEnumerator
include Enumerable

def initialize(of: 1000, begin_at: nil, end_at: nil, relation:) #:nodoc:
def initialize(of: 1000, start: nil, finish: nil, relation:) #:nodoc:
@of = of
@relation = relation
@begin_at = begin_at
@end_at = end_at
@start = start
@finish = finish
end

# Looping through a collection of records from the database (using the
@@ -34,7 +34,7 @@ def initialize(of: 1000, begin_at: nil, end_at: nil, relation:) #:nodoc:
def each_record
return to_enum(:each_record) unless block_given?

@relation.to_enum(:in_batches, of: @of, begin_at: @begin_at, end_at: @end_at, load: true).each do |relation|
@relation.to_enum(:in_batches, of: @of, start: @start, finish: @finish, load: true).each do |relation|
relation.to_a.each { |record| yield record }
end
end
@@ -46,7 +46,7 @@ def each_record
# People.in_batches.update_all('age = age + 1')
[:delete_all, :update_all, :destroy_all].each do |method|
define_method(method) do |*args, &block|
@relation.to_enum(:in_batches, of: @of, begin_at: @begin_at, end_at: @end_at, load: false).each do |relation|
@relation.to_enum(:in_batches, of: @of, start: @start, finish: @finish, load: false).each do |relation|
relation.send(method, *args, &block)
end
end
@@ -58,7 +58,7 @@ def each_record
# relation.update_all(awesome: true)
# end
def each
enum = @relation.to_enum(:in_batches, of: @of, begin_at: @begin_at, end_at: @end_at, load: false)
enum = @relation.to_enum(:in_batches, of: @of, start: @start, finish: @finish, load: false)
return enum.each { |relation| yield relation } if block_given?
enum
end
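
Taken together, the enumerator carries the same `of`/`start`/`finish` settings into every delegated call; an illustrative sketch (not part of the diff):

```ruby
batches = Person.in_batches(of: 1000, start: 2000, finish: 10_000)

batches.update_all(awesome: true)                        # per batch, relations not loaded
batches.each_record { |person| person.party_all_night! } # per batch, records loaded
batches.each { |relation| relation.delete_all }          # yields each sub-relation
```
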
45 changes: 12 additions & 33 deletions activerecord/test/cases/batches_test.rb
@@ -38,7 +38,7 @@ def test_each_should_return_an_enumerator_if_no_block_is_present
if Enumerator.method_defined? :size
def test_each_should_return_a_sized_enumerator
assert_equal 11, Post.find_each(batch_size: 1).size
assert_equal 5, Post.find_each(batch_size: 2, begin_at: 7).size
assert_equal 5, Post.find_each(batch_size: 2, start: 7).size
assert_equal 11, Post.find_each(batch_size: 10_000).size
end
end
@@ -101,16 +101,16 @@ def test_find_in_batches_should_return_batches

def test_find_in_batches_should_start_from_the_start_option
assert_queries(@total) do
Post.find_in_batches(batch_size: 1, begin_at: 2) do |batch|
Post.find_in_batches(batch_size: 1, start: 2) do |batch|
assert_kind_of Array, batch
assert_kind_of Post, batch.first
end
end
end

def test_find_in_batches_should_end_at_the_end_option
def test_find_in_batches_should_finish_the_end_option
assert_queries(6) do
Post.find_in_batches(batch_size: 1, end_at: 5) do |batch|
Post.find_in_batches(batch_size: 1, finish: 5) do |batch|
assert_kind_of Array, batch
assert_kind_of Post, batch.first
end
@@ -175,7 +175,7 @@ def test_find_in_batches_should_not_ignore_the_default_scope_if_it_is_other_then

def test_find_in_batches_should_not_modify_passed_options
assert_nothing_raised do
Post.find_in_batches({ batch_size: 42, begin_at: 1 }.freeze){}
Post.find_in_batches({ batch_size: 42, start: 1 }.freeze){}
end
end

@@ -184,7 +184,7 @@ def test_find_in_batches_should_use_any_column_as_primary_key
start_nick = nick_order_subscribers.second.nick

subscribers = []
Subscriber.find_in_batches(batch_size: 1, begin_at: start_nick) do |batch|
Subscriber.find_in_batches(batch_size: 1, start: start_nick) do |batch|
subscribers.concat(batch)
end

@@ -311,15 +311,15 @@ def test_in_batches_should_return_relations
def test_in_batches_should_start_from_the_start_option
post = Post.order('id ASC').where('id >= ?', 2).first
assert_queries(2) do
relation = Post.in_batches(of: 1, begin_at: 2).first
relation = Post.in_batches(of: 1, start: 2).first
assert_equal post, relation.first
end
end

def test_in_batches_should_end_at_the_end_option
def test_in_batches_should_finish_the_end_option
post = Post.order('id DESC').where('id <= ?', 5).first
assert_queries(7) do
relation = Post.in_batches(of: 1, end_at: 5, load: true).reverse_each.first
relation = Post.in_batches(of: 1, finish: 5, load: true).reverse_each.first
assert_equal post, relation.last
end
end
@@ -371,7 +371,7 @@ def test_in_batches_should_not_ignore_default_scope_without_order_statements

def test_in_batches_should_not_modify_passed_options
assert_nothing_raised do
Post.in_batches({ of: 42, begin_at: 1 }.freeze){}
Post.in_batches({ of: 42, start: 1 }.freeze){}
end
end

@@ -380,7 +380,7 @@ def test_in_batches_should_use_any_column_as_primary_key
start_nick = nick_order_subscribers.second.nick

subscribers = []
Subscriber.in_batches(of: 1, begin_at: start_nick) do |relation|
Subscriber.in_batches(of: 1, start: start_nick) do |relation|
subscribers.concat(relation)
end

@@ -441,32 +441,11 @@ def test_in_batches_relations_update_all_should_not_affect_matching_records_in_o
assert_equal 2, person.reload.author_id # incremented only once
end

def test_find_in_batches_start_deprecated
assert_deprecated do
assert_queries(@total) do
Post.find_in_batches(batch_size: 1, start: 2) do |batch|
assert_kind_of Array, batch
assert_kind_of Post, batch.first
end
end
end
end

def test_find_each_start_deprecated
assert_deprecated do
assert_queries(@total) do
Post.find_each(batch_size: 1, start: 2) do |post|
assert_kind_of Post, post
end
end
end
end

if Enumerator.method_defined? :size
def test_find_in_batches_should_return_a_sized_enumerator
assert_equal 11, Post.find_in_batches(:batch_size => 1).size
assert_equal 6, Post.find_in_batches(:batch_size => 2).size
assert_equal 4, Post.find_in_batches(batch_size: 2, begin_at: 4).size
assert_equal 4, Post.find_in_batches(batch_size: 2, start: 4).size
assert_equal 4, Post.find_in_batches(:batch_size => 3).size
assert_equal 1, Post.find_in_batches(:batch_size => 10_000).size
end
20 changes: 10 additions & 10 deletions guides/source/active_record_querying.md
@@ -348,7 +348,7 @@ end

The `find_each` method accepts most of the options allowed by the regular `find` method, except for `:order` and `:limit`, which are reserved for internal use by `find_each`.

Three additional options, `:batch_size`, `:begin_at` and `:end_at`, are available as well.
Three additional options, `:batch_size`, `:start` and `:finish`, are available as well.

**`:batch_size`**

@@ -360,34 +360,34 @@ User.find_each(batch_size: 5000) do |user|
end
```

**`:begin_at`**
**`:start`**

By default, records are fetched in ascending order of the primary key, which must be an integer. The `:begin_at` option allows you to configure the first ID of the sequence whenever the lowest ID is not the one you need. This would be useful, for example, if you wanted to resume an interrupted batch process, provided you saved the last processed ID as a checkpoint.
By default, records are fetched in ascending order of the primary key, which must be an integer. The `:start` option allows you to configure the first ID of the sequence whenever the lowest ID is not the one you need. This would be useful, for example, if you wanted to resume an interrupted batch process, provided you saved the last processed ID as a checkpoint.

For example, to send newsletters only to users with the primary key starting from 2000, and to retrieve them in batches of 5000:

```ruby
User.find_each(begin_at: 2000, batch_size: 5000) do |user|
User.find_each(start: 2000, batch_size: 5000) do |user|
NewsMailer.weekly(user).deliver_now
end
```

**`:end_at`**
**`:finish`**

Similar to the `:begin_at` option, `:end_at` allows you to configure the last ID of the sequence whenever the highest ID is not the one you need.
This would be useful, for example, if you wanted to run a batch process, using a subset of records based on `:begin_at` and `:end_at`
Similar to the `:start` option, `:finish` allows you to configure the last ID of the sequence whenever the highest ID is not the one you need.
This would be useful, for example, if you wanted to run a batch process, using a subset of records based on `:start` and `:finish`

For example, to send newsletters only to users with the primary key starting from 2000 up to 10000 and to retrieve them in batches of 5000:

```ruby
User.find_each(begin_at: 2000, end_at: 10000, batch_size: 5000) do |user|
User.find_each(start: 2000, finish: 10000, batch_size: 5000) do |user|
NewsMailer.weekly(user).deliver_now
end
```

Another example would be if you wanted multiple workers handling the same
processing queue. You could have each worker handle 10000 records by setting the
appropriate `:begin_at` and `:end_at` options on each worker.
appropriate `:start` and `:finish` options on each worker.
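
For instance (an illustrative sketch, not part of the guide's diff; the boundaries are arbitrary):

```ruby
# Worker 1
User.find_each(start: 1, finish: 10_000) do |user|
  NewsMailer.weekly(user).deliver_now
end

# Worker 2
User.find_each(start: 10_001, finish: 20_000) do |user|
  NewsMailer.weekly(user).deliver_now
end
```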

#### `find_in_batches`

Expand All @@ -402,7 +402,7 @@ end

##### Options for `find_in_batches`

The `find_in_batches` method accepts the same `:batch_size`, `:begin_at` and `:end_at` options as `find_each`.
The `find_in_batches` method accepts the same `:batch_size`, `:start` and `:finish` options as `find_each`.
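
An illustrative call combining the three (sketch, not part of the guide's diff):

```ruby
User.find_in_batches(start: 2000, finish: 10_000, batch_size: 5000) do |users|
  users.each { |user| NewsMailer.weekly(user).deliver_now }
end
```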

Conditions
----------
