Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor inversing logic of CollectionAssociation #45399

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jonathanhefner
Copy link
Member

Follow-up to #40379, #42524, and #43445.

The logic to associate records with a CollectionAssociation when using has_many inversing has become increasingly complex. Prior to this commit, we defined an "add record" method (add_to_target) that can accept a replace: true option, and a "replace record" method (replace_on_target) that can accept a replace: false option. And regardless of the value of the :replace option, the record may be added if it cannot be replaced (i.e. does not exist), or replaced if a complex set of conditions dictate it should not be simply added.

This commit aims to simplify this logic by:

  • Removing the :replace option from add_to_target.
  • Renaming replace_on_target to a low-level add_record method, supporting a :replace option. The name add_record was chosen for parity with the similarly low-level remove_records method.
  • Replacing the complex conditions of replace_on_target with an "implicit target records" concept. Implicit target records are those that have been associated via inversing but not explicitly added. Implicit target records exist in the @target array, and are also tracked via an @implicit_target_records array. Whenever an implicit target record is explicitly added, it is instead removed from the @implicit_target_records array, to prevent a double occurrence in the @target array. (Though if the record is explicitly added a 2nd time, a 2nd occurrence will be added to the @target array. We have tests that expect this.)

Additionally, this commit provides a very small speed increase, according to the following benchmark:

Benchmark
require "benchmark"

ActiveRecord::Base.logger = nil

ActiveRecord::Schema.define do
  create_table :authors, force: true do |t|
  end
  create_table :posts, force: true do |t|
    t.references :author
  end
end

class Author < ActiveRecord::Base
  has_many :posts
end

class Post < ActiveRecord::Base
  belongs_to :author
end

POST_COUNTS = [1, 10, 100, 1000]
ITERATIONS = 100
CALLS_PER_ITERATION = 100

eval <<~RUBY
def ms_per_build(posts)
  Benchmark.ms { #{"posts.build;" * CALLS_PER_ITERATION} }.to_f / CALLS_PER_ITERATION
end

def ms_per_shovel(posts, post)
  Benchmark.ms { #{"posts << post;" * CALLS_PER_ITERATION} }.to_f / CALLS_PER_ITERATION
end
RUBY

def report(label, &block)
  [false, true].each do |has_many_inversing|
    ActiveRecord::Base.has_many_inversing = has_many_inversing

    POST_COUNTS.each do |post_count|
      Post.delete_all
      Post.insert_all(post_count.times.map { { author_id: 1 } })

      block.call # warmup
      ms = ITERATIONS.times.sum(&block).to_f / ITERATIONS

      puts "has_many_inversing == #{has_many_inversing}, posts.count == #{post_count}".ljust(50) +
        "=> `#{label}` @ #{"%.3f" % ms}ms"
    end
  end
end

author = Author.create!(id: 1)

report("posts.build") do
  posts = author.reload.posts.tap(&:load_target)
  ms_per_build(posts)
end

report("posts << post") do
  posts = author.reload.posts.tap(&:load_target)
  post = posts[posts.length / 2]
  ms_per_shovel(posts, post)
end

Results with main branch:

has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.154ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.142ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.141ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.144ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.143ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.142ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.142ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.147ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.192ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.192ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.202ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.196ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.209ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.204ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.221ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.195ms

Results with this branch:

has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.144ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.132ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.133ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.135ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.132ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.133ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.132ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.140ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.200ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.187ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.188ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.191ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.187ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.188ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.197ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.189ms

/cc @ghiculescu since you authored #42524, and @jstncarvalho since you authored #43445. If you have time, would mind testing out this branch and verifying that it does not cause a regression (performance or otherwise) for you?

@jonathanhefner jonathanhefner force-pushed the collection_association-refactor-inversing-logic branch from 27a2224 to da7e496 Compare June 19, 2022 02:22
@jonathanhefner jonathanhefner force-pushed the collection_association-refactor-inversing-logic branch from da7e496 to 83f8208 Compare September 16, 2022 16:55
@jonathanhefner jonathanhefner marked this pull request as ready for review September 16, 2022 17:11
Follow-up to rails#40379, rails#42524, and rails#43445.

The logic to associate records with a `CollectionAssociation` when using
`has_many` inversing has become increasingly complex.  Prior to this
commit, we defined an "add record" method (`add_to_target`) that can
accept a `replace: true` option, and a "replace record" method
(`replace_on_target`) that can accept a `replace: false` option. And
regardless of the value of the `:replace` option, the record may be
added if it cannot be replaced (i.e. does not exist), or replaced if a
complex set of conditions dictate it should not be simply added.

This commit aims to simplify this logic by:

* Removing the `:replace` option from `add_to_target`.
* Renaming `replace_on_target` to a low-level `add_record` method,
  supporting a `:replace` option.  The name `add_record` was chosen for
  parity with the similarly low-level `remove_records` method.
* Replacing the complex conditions of `replace_on_target` with an
  "implicit target records" concept.  Implicit target records are those
  that have been associated via inversing but not explicitly added.
  Implicit target records exist in the `@target` array, and are also
  tracked via an `@implicit_target_records` array.  Whenever an implicit
  target record is explicitly added, it is always replaced in the
  `@target` array to prevent a double occurrence, and then removed from
  the `@implicit_target_records` array.  (Thus if the record is
  explicitly added a 2nd time, a 2nd occurrence *will* be added to the
  `@target` array.  We have tests that expect this.)

Additionally, this commit provides a very small speed increase,
according to the following benchmark:

```ruby
require "benchmark"

ActiveRecord::Base.logger = nil

ActiveRecord::Schema.define do
  create_table :authors, force: true do |t|
  end
  create_table :posts, force: true do |t|
    t.references :author
  end
end

class Author < ActiveRecord::Base
  has_many :posts
end

class Post < ActiveRecord::Base
  belongs_to :author
end

POST_COUNTS = [1, 10, 100, 1000]
ITERATIONS = 100
CALLS_PER_ITERATION = 100

eval <<~RUBY
def ms_per_build(posts)
  Benchmark.ms { #{"posts.build;" * CALLS_PER_ITERATION} }.to_f / CALLS_PER_ITERATION
end

def ms_per_shovel(posts, post)
  Benchmark.ms { #{"posts << post;" * CALLS_PER_ITERATION} }.to_f / CALLS_PER_ITERATION
end
RUBY

def report(label, &block)
  [false, true].each do |has_many_inversing|
    ActiveRecord::Base.has_many_inversing = has_many_inversing

    POST_COUNTS.each do |post_count|
      Post.delete_all
      Post.insert_all(post_count.times.map { { author_id: 1 } })

      block.call # warmup
      ms = ITERATIONS.times.sum(&block).to_f / ITERATIONS

      puts "has_many_inversing == #{has_many_inversing}, posts.count == #{post_count}".ljust(50) +
        "=> `#{label}` @ #{"%.3f" % ms}ms"
    end
  end
end

author = Author.create!(id: 1)

report("posts.build") do
  posts = author.reload.posts.tap(&:load_target)
  ms_per_build(posts)
end

report("posts << post") do
  posts = author.reload.posts.tap(&:load_target)
  post = posts[posts.length / 2]
  ms_per_shovel(posts, post)
end
```

Results with `main` branch:

```
has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.154ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.142ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.141ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.144ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.143ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.142ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.142ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.147ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.192ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.192ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.202ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.196ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.209ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.204ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.221ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.195ms
```

Results with this branch:

```
has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.144ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.132ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.133ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.135ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.132ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.133ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.132ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.140ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.200ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.187ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.188ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.191ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.187ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.188ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.197ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.189ms
```
@rafaelfranca rafaelfranca force-pushed the collection_association-refactor-inversing-logic branch from 77889e8 to ed79453 Compare April 22, 2024 18:29
def add_to_target(record, skip_callbacks: false, replace: false, &block)
replace_on_target(record, skip_callbacks, replace: replace || association_scope.distinct_value, &block)
def add_to_target(record, skip_callbacks: false, &block)
add_record(record, replace: association_scope.distinct_value, skip_callbacks: skip_callbacks, &block)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this changes the behavior on build when replace: was being passed as true? Now it will use association_scope.distinct_value to determine if it should replace or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, replace: true was added in 50e4aaa because the following line ends up calling replace_on_target twice:

add_to_target(build_record(attributes, &block), replace: true)

The 1st time via target= inside the build_record call, and the 2nd time via add_to_target.

For the 2nd call, replace: true causes the following conditional to compute index:

def replace_on_target(record, skip_callbacks, replace:, inversing: false)
if replace && (!record.new_record? || @replaced_or_added_targets.include?(record))
index = @target.index(record)
end

However, a "fallback" was also added in 6f6c441 to compute index anyway:

if !index && @replaced_or_added_targets.include?(record)
index = @target.index(record)
end

In other words, I believe replace: true inside build is no longer necessary even on main.

That said, this PR changes how targets are tracked. With this PR, CollectionAssociation#inversed_from will override Association#inversed_from, so build_record inside build will call add_record with implicit: true instead of target=.

Thus the 2nd call to add_to_target / add_record will find the record in @implicit_target_records and replace it at index:

was_implicit = @implicit_target_records&.delete(record)
(@implicit_target_records ||= []) << record if implicit
if (replace || was_implicit) && index = target.index(record)
target[index] = record

It has been a long time since I looked at this code though, so I could be missing something. A while ago I asked Eileen to test this branch against the Shopify codebase, and she said there were errors but the cause wasn't clear. Unfortunately, I was not able to dig further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I just tried this PR again and there were some errors. Some of them is because we were calling add_to_target directly with replace: true. The others were is not clear. But points to me that some behavior is changing in this refactoring, so I was trying to see visually what would be the low hanging fruits. I can dig deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants