Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ActiveRecord batching over composite primary keys #48440

Merged
merged 1 commit into from Jun 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 3 additions & 1 deletion activerecord/lib/active_record/relation/batches.rb
Expand Up @@ -343,7 +343,9 @@ def batch_condition(relation, columns, values, operators)
where_clause = predicate_builder[first_clause_column, first_clause_value, operator]

cursor_positions.reverse_each do |column_name, value, operator|
where_clause = predicate_builder[column_name, value, operator].and(where_clause)
where_clause = predicate_builder[column_name, value, operator == :lteq ? :lt : :gt].or(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check it a little bit later but it looks like we are unnecessarily flip-flopping between gteq/lteq and gt/lt

For example in_batches first builds an array that consists of gteq/lteq values

order == :desc ? :lteq : :gteq

then immediately passes it to batch_condition where we flip it back.
Any chance that we could change in_batches on line 310 to build correct values from the beginning? Similar thing seems to be happening in apply_start_limit and apply_finish_limit - we build an array with gteq/lteq just to flip it back later in batch_condition

Though I may be missing some corner cases and I have to admit it's a little hard to follow the clause building logic just by reading it. I'll try to have a closer look later. But at the end of the day it's a tiny performance concern rather than a bug

predicate_builder[column_name, value, :eq].and(where_clause)
)
end

relation.where(where_clause)
Expand Down
44 changes: 30 additions & 14 deletions activerecord/test/cases/batches_test.rb
Expand Up @@ -773,9 +773,13 @@ def test_find_in_batches_should_return_a_sized_enumerator

test ".find_each iterates over composite primary key" do
orders = Cpk::Order.order(*Cpk::Order.primary_key).to_a
Cpk::Order.find_each(batch_size: 1).with_index do |order, index|

index = 0
Cpk::Order.find_each(batch_size: 1) do |order|
assert_equal orders[index], order
index += 1
end
assert_equal orders.size, index
end

test ".in_batches should start from the start option when using composite primary key" do
Expand All @@ -798,37 +802,49 @@ def test_find_in_batches_should_return_a_sized_enumerator
end

test ".find_each with multiple column ordering and using composite primary key" do
Cpk::Book.create(author_id: 1, number: 1)
Cpk::Book.create(author_id: 1, number: 2)
Cpk::Book.create(author_id: 1, number: 3)
Cpk::Book.insert_all!([
{ author_id: 1, number: 1 },
{ author_id: 2, number: 1 },
{ author_id: 2, number: 2 }
])
books = Cpk::Book.order(author_id: :asc, number: :desc).to_a
Cpk::Book.find_each(batch_size: 1, order: [:asc, :desc]).with_index do |book, index|

index = 0
Cpk::Book.find_each(batch_size: 1, order: [:asc, :desc]) do |book|
assert_equal books[index], book
index += 1
end
assert_equal books.size, index
end

test ".in_batches should start from the start option when using composite primary key with multiple column ordering" do
Cpk::Book.create(author_id: 1, number: 1)
Cpk::Book.create(author_id: 1, number: 2)
Cpk::Book.create(author_id: 1, number: 3)
Cpk::Book.insert_all!([
{ author_id: 1, number: 1 },
{ author_id: 1, number: 2 },
{ author_id: 1, number: 3 }
])
second_book = Cpk::Book.order(author_id: :asc, number: :desc).second
relation = Cpk::Book.in_batches(of: 1, start: second_book.id, order: [:asc, :desc]).first
assert_equal second_book, relation.first
end

test ".in_batches should end at the finish option when using composite primary key with multiple column ordering" do
Cpk::Book.create(author_id: 1, number: 1)
Cpk::Book.create(author_id: 1, number: 2)
Cpk::Book.create(author_id: 1, number: 3)
Cpk::Book.insert_all!([
{ author_id: 1, number: 1 },
{ author_id: 1, number: 2 },
{ author_id: 1, number: 3 }
])
second_book = Cpk::Book.order(author_id: :asc, number: :desc).second
relation = Cpk::Book.in_batches(of: 1, finish: second_book.id, order: [:asc, :desc]).to_a.last
assert_equal second_book, relation.first
end

test ".in_batches with scope and multiple column ordering and using composite primary key" do
Cpk::Book.create(author_id: 1, number: 1)
Cpk::Book.create(author_id: 1, number: 2)
Cpk::Book.create(author_id: 1, number: 3)
Cpk::Book.insert_all!([
{ author_id: 1, number: 1 },
{ author_id: 1, number: 2 },
{ author_id: 1, number: 3 }
])
book1, book2 = Cpk::Book.order(author_id: :asc, number: :desc).first(2)
author_id, number = book1.id
relation = Cpk::Book.where("author_id >= ? AND number < ?", author_id, number).in_batches(of: 1, order: [:asc, :desc]).first
Expand Down