Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory consumption when calling first & last on dirty association #39627

Conversation

alipman88
Copy link
Contributor

Avoid loading association when calling first, last or other finder methods (e.g. take, second, third) on a dirty association. Instead of loading all records in an association from the database, load no more than the requested limit. If the number of available database records falls short of the requested limit, pad as needed with unsaved records.

Fixes #39455.

@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch 4 times, most recently from 3ced014 to 98b4980 Compare June 15, 2020 02:40
@alipman88 alipman88 marked this pull request as draft June 15, 2020 02:46
@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch 4 times, most recently from 8bad7ad to 811e47e Compare June 16, 2020 01:10
@alipman88 alipman88 marked this pull request as ready for review June 16, 2020 01:11
@alipman88
Copy link
Contributor Author

Benchmarking memory usage, mostly for fun -

Memory usage before (running the following script without the patched files) and after (with the patch):

before:
Total allocated: 1.22 MB (12316 objects)
Total retained:  1.09 MB (10031 objects)

after:
Total allocated: 13.63 kB (219 objects)
Total retained:  2.59 kB (30 objects)
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails"
  gem "sqlite3"
  gem "memory_profiler"
end

require "active_record"
require "minitest/autorun"
require "logger"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
    t.string :content
  end
end

# patched files - refer to PR for details
require './collection_association'
require './collection_proxy'
require './finder_methods'

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

post = Post.create
1000.times { post.comments.create }
post.reload
post.comments.build

report = MemoryProfiler.report do
  post.comments.first
end

report.pretty_print(detailed_report: false, scale_bytes: true)

@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch from 811e47e to 303de7a Compare June 16, 2020 05:20
@alipman88
Copy link
Contributor Author

@kamipo, would you care to review this? I believe I've worked around the challenges you noted in #39455 in a way that negligibly affects behavior.

@kamipo
Copy link
Member

kamipo commented Jun 22, 2020

I'm still don't get about the use case that .first after .association.build (expects .association.exists?? or expects unsaved record returned?).

I'm not excited to change any behavior for such like unsure use case, at least this PR will possibly change returned record (I've added a regression test for that 60b43ab).

@alipman88
Copy link
Contributor Author

alipman88 commented Jun 24, 2020

Thank you for the feedback, @kamipo.

I don't fully understand the use case for calling .first after .association.build either (perhaps @rovr can chime in), but I'm worried that by loading an entire association, we're leaving a hazard in the codebase for developers to stumble over.

I recognize your concern about changing behavior, but I want to point out the existing behavior isn't consistent - .first and .last can already return different records depending on whether an association is loaded:

# This test currently fails
def test_calling_first_on_association_returns_same_record_whether_loaded_or_not_loaded
  topics = Topic.where(author_name: "Carl")
  assert_equal topics.first, topics.reload.first
end

This conflicts with the documentation, which states that first will order by primary key by default: https://guides.rubyonrails.org/active_record_querying.html#first

I'm starting to wonder if a cleaner solution is to always use ordered_relation when calling first and last, and ignore loaded relations entirely (perhaps unless strict_loading is on). This would change behavior, but also removes some inconsistency and meets the specification in the Rails guide. I defer to you and the Rails team, but the existing code feels somewhat off/inelegant to me.

I would like to think about this some more, and maybe submit another revision for folks to look at in a few days, or open an issue on first & last not ordering by primary key for loaded relations.

@rovr
Copy link

rovr commented Jun 24, 2020

@alipman88 Thank you for looking into this issue!

In our case we were checking that the new submission is not a duplicate, so after building a new record we were pulling the previous saved record and comparing some sets of values for similarity. To our suprise that check despite asking for .last was loading all user records.

A possibly more common use case I could foresee might be a form to create a new record + displaying some previous records on the same page. E.g. form_for user.comments.new and somewhere nearby render of user.comments.last(5) that would actually be loading every single user comment, not the last(5) as is explicitly requested.

It was a five second fix for us with a manual .limit(1) but it was indeed quite a stumble and I though it might be worth to bring this to you guys's attention.

@kamipo
Copy link
Member

kamipo commented Jun 24, 2020

I know the inconsistency (relation.first and relation.to_a.first are not always same result) enough, that is a reason why I was able to write a regression test right away, and I also know that people had sometimes confused about the inconsistency for a long time.

Related: #30800 (comment), #31006, #24131 (comment), #33492.

The inconsistency was introduced at #5153, personally I think it is hard to restore perfectly consistency unless reverting/opt-out #5153.

@alipman88
Copy link
Contributor Author

personally I think it is hard to restore perfectly consistency unless reverting/opt-out #5153.

I think you're right - we might consider using .min_by in find_nth_with_limit/find_nth_from_last to extract ordered records from an unordered loaded collection without the expense of a full sort:

if loaded?
  if order_values.present?
    return records[index, limit] || []
  else
    return records
      .min_by(index + limit) { |r| [r.id ? 0 : 1, r.attributes[implicit_order_column], r.id] }
      .last(limit) || []
  end
end

But I'm not convinced that extra code is worth it. In practice, I don't think the current difference in ordering between loaded & unloaded relations causes that much confusion. (I can only find two issues, #39061 and #13743).

I still would like to reduce the memory usage of calling first & last on dirty associations, perhaps in a minor release where some behavior change can be justified.

@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch 3 times, most recently from d808a54 to 28877ef Compare July 1, 2020 03:40
@alipman88
Copy link
Contributor Author

alipman88 commented Jul 1, 2020

@kamipo - I've created an alternate branch that satisfies your regression test (by not applying default ordering when calling first or last on a dirty association, thus mirroring the ordering of loaded associations): alipman88@da8109a

But I'm not sure if preserving the existing behavior is worth the extra code complexity - it may be wise to hold off on this for now, and reconsider it for an upcoming minor release when we might excuse a behavior change.

I intend to take a break from this issue. I respect your judgement either way. 🙂

@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch from 28877ef to ef126d6 Compare July 1, 2020 04:20
Avoid loading association when calling first, last or other finder
methods (e.g. take, second, third) on a dirty association. Instead of
loading all records in an association from the database, load no more
than the requested limit. If the number of available database records
falls short of the requested limit, pad as needed with unsaved records.
@alipman88 alipman88 force-pushed the reduce_association_finder_methods_memory_consumption branch from ef126d6 to c165caa Compare July 24, 2020 01:01
@rails-bot
Copy link

rails-bot bot commented Oct 22, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Oct 22, 2020
@rails-bot rails-bot bot closed this Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calling associations.new results in an unexpected change in behavior of .first, .last
3 participants