Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batched_method for custom batch preloading in ActiveRecord #42010

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

seejohnrun
Copy link
Member

@seejohnrun seejohnrun commented Apr 18, 2021

Summary

Often with Rails models, we end up iterating over a collection or
relation and calling the same method repeatedly. For these cases,
includes comes in really handy, but it has the limitation that it
only works on associations. More complex logic needs separate one-off
code to load in a way that will still avoid N+1s.

We write a lot of code like this in our applications, and it became
clear that a lot of these shared the same type of interface. There would
be a class method that took in an Array of objects to load, and
returned a Hash where the keys of the hash are the elements in the
Array and the values are the results, which would be loaded all at once.
What would it look like to build this pattern into Rails?

This commit proposes an interface called batched_method which is a DSL
allowing for the definition of methods in models which play nice
with includes & preload, but can have any definition. The idea is
that you'd define a batch method like:

class Post < ActiveRecord::Base
  # An implementation which takes in an Array[Post], and returns
  # a Hash[Post => Array[Comment]]
  batch_method(:featured_comments) do |posts|
    comments_by_post_id = Comment.featured.where(post: posts).group_by(&:post_id)
    posts.index_with { |post| comments_by_post_id[post.id] }
  end
end

By writing the above implementation, you get the ability to call
featured_comments on a Post object like a normal method, but you can
also use includes(:featured_comments) and it loads them all the first
time that a single one is accessed (!).

I've been playing with this idea over in https://github.com/seejohnrun/prelude
(see README for more detail on motivations) for a bit and really think that
this would be a nice addition to Rails to help application authors think more
in terms of batching when iterating over model objects. I've taken
inspiration from over there, but rewritten the implementation to be clearer.

There are a bunch of ways that this can get more robust in the future,
but for this commit I've included a few neat additional features:

  • The ability to define a batch_size on individual batch method
    definitions so that they load elements in batches only up to a certain
    size.

  • The ability to define batched methods that take arguments. This is
    useful for common patterns like authentication checks where you want
    to batch things like post.editable_by?(current_user).

  • Compatibility with batch methods that happen to return a Hash with #default_proc.

Thanks for reading and I appreciate any feedback!

cc / @eileencodes & @jhawthorn who I've chatted with about this idea previously

@seejohnrun seejohnrun force-pushed the batched_method branch 3 times, most recently from 7f85235 to 3bd6d00 Compare April 18, 2021 23:34
Often with Rails models, we end up iterating over a collection or
relation and calling the same method repeatedly. For these cases,
`includes` comes in really handy, but it has the limitation that it
only works on associations. More complex logic needs separate one-off
code to load in a way that will still avoid N+1s.

We write a lot of code like this in our applications, and it became
clear that a lot of these shared the same type of interface. There would
be a class method that took in an `Array` of objects to load, and
returned a `Hash` where the keys of the hash are the elements in the
`Array` and the values are the results, which would be loaded all at once.
What would it look like to build this pattern into Rails?

This commit proposes an interface called `batched_method` which is a DSL
allowing for the definition of methods in models which play nice
with `includes` & `preload`, but can have any definition. The idea is
that you'd define a batch method like:

``` ruby
class Post < ActiveRecord::Base
  # An implementation which takes in an Array[Post], and returns
  # a Hash[Post]=>Array[Comment]
  batch_method(:featured_comments) do |posts|
    comments_by_post_id = Comment.featured.where(post: posts).group_by(&:post_id)
    posts.index_with { |post| comments_by_post_id[post.id] }
  end
end
```

By writing the above implementation, you get the ability to call
`featured_comments` on a `Post` object like a normal method, but you can
also use `includes(:featured_comments)` and it loads them all the first
time that a single one is accessed (!).

I've been playing with this idea over in https://github.com/seejohnrun/prelude
(see README for more detail on motivations) for a bit and really think that
this would be a nice addition to Rails to help application authors think more
in terms of batching when iterating over model objects. I've taken
inspiration from over there, but rewritten the implementation to be clearer.

There are a bunch of ways that this can get more robust in the future,
but for this commit I've included a few neat additional features:

* The ability to define a `batch_size` on individual batch method
  definitions so that they load elements in batches only up to a certain
  size.

* The ability to define batched methods that take arguments. This is
  useful for common patterns like authentication checks where you want
  to batch things like `post.editable_by?(current_user)`.

* Compatibility with batch methods that happen to return `Hash#default_proc`.

Thanks for reading and I appreciate any feedback!
@kaspth
Copy link
Contributor

kaspth commented Apr 19, 2021

Interesting! There's definitely something about the singular records access and the many-records access/preload.

Do we have more examples to look at? As it stands to me featured_comments could be rewritten as has_many :featured_comments, -> { featured }, class_name: "Comment" and we'd get all the stated benefits, unless I'm mistaken?

I don't quite get what other cases we'd see, or how the API would be used in more advanced cases. I worry it's a little too invention-heavy versus leaning on extraction. That's essentially what played out, when I tried to scope out some other possible expressions for this API. Lots of places to go, but what do we want?

I'll share these in case the naming is useful:

Comment.associated_with(posts: @posts).preload_into :featured_comments, with: -> { featured }

# `many` extracts a Relation-like from a Relation or Array.
# Auto-composes the alias name of `featured_comments` onto Post.
Post.many(@posts).aliased_preload :comments, :featured
Post.many(@posts).aliased_preload :comments, :featured_comments, -> { featured.reverse_chronologically }

# Allow more user-API reflection on associations for ad-hoc tweaks.
# Mostly extracted from the lone `featured` example, so may not hold for more advanced examples.
Post.comments_association(@posts).preload :featured
# posts.each { |post| post.comments.featured.each { |comment| } }

Post.preload_on @posts, :comments, :featured, only: ->(comment) { Current.user.has_access_to? comment }
Post.preload_on @posts, :featured_comments, -> { Comment.featured.reverse_chronologically }

@seejohnrun
Copy link
Member Author

seejohnrun commented Apr 19, 2021

@kaspth Thanks! :) I am happy to provide a few more examples of behavior we've seen here, sorry if a few of these are a bit more indirect I'm trying not to show any real code. I'll also update the description above to include more of the same as far as reasoning.


Example 1

We have a collection of items and want to check for each if the current user is able to edit them. The authentication check happens in a separate system not controlled by the Rails app so we want a way to look up multiple at once and unwrap them per-model. With this API:

batch_method :can_edit? do |items, current_user|
  authentication_entries = AuthenticationService.can_many(object_type: items.first.class, object_ids: items.map(&:id), actor_type: 'User', actor_id: current_user.id)
  authentication_entries_by_object_id = authentication_entries.index_by(&:object_id)
  items.index_with { |item| authentication_entries_by_object_id[item.id].allowed? }
end

Example 2

Sometimes the logic used inside of a method like featured_comments isn't just related to the database. For example, a modified featured_comments adds a Ruby predicate as well:

batch_method(:featured_comments) do |posts|
  comments_by_post_id = Comment.featured.where(post: posts).group_by(&:post_id)
  posts.index_with { |post| comments_by_post_id[post.id].select(&:not_flagged?) }
end

Of course we could include the additional checks in the view, but that's not probably where they belong since in the case of a single-model object we'd certainly want them inside of the model.

Example 3

Sometimes it's more performant to pass multiple blocks that need rendering to a library at once rather than pass them individually. In this example we want to be able to batch calls to a syntax highlighter:

batch_method(:rendered_syntax_html) do |entries|
  entry_raw_syntaxes = entries.map(&:syntax)
  rendered_syntaxes = SyntaxHighlight.highlight_batch(entry_raw_syntaxes)
  entries.index_with.with_index { |*, i| rendered_syntaxes[i] }
end

Example 4

Sometimes I have a more complex query involving potentially custom SQL or interactions with multiple tables and it's just more convenient to define it myself.

batch_method(:some_value) do |entries|
  results = release_orders.select('user_id, sum(total) as lifetime_total').where(entry_id: entries).order('lifetime_total desc').limit(5)
  results_by_entry_id = results.index_by(&:entry_id)
  entries.index_with { |entry| results_by_entry_id[entry.id] }
end

@seejohnrun
Copy link
Member Author

seejohnrun commented Apr 19, 2021

@kaspth Another thing to note is that while this API isn't something we're actively using internally, the pattern of a method that returns a Hash of record to result is something we do all over our codebase for a variety of reasons like those described above. I'm just trying to work through what it would be like to standardize on a single approach since a lot of times we're relying on memoization inside of service objects to accomplish the above. Something like:

def rendered_syntax_html(entry)
  rendered_syntax_htmls_by_entry[entry]
end

private

def rendered_syntax_htmls_by_entry
  return @rendered_syntax_htmls_by_entry if @rendered_syntax_htmls_by_entry

  entry_raw_syntaxes = entries.map(&:syntax)
  rendered_syntaxes = SyntaxHighlight.highlight_batch(entry_raw_syntaxes)
  @rendered_syntax_htmls_by_entry = entries.index_with.with_index { |*, i| rendered_syntaxes[i] }
end

Or worse, building the call to the batch method into our views and destructuring it as we iterate.

seejohnrun added a commit to seejohnrun/prelude that referenced this pull request Apr 27, 2021
In order to bring this library more in line with
rails/rails#42010 we're removing the patch for
now so that all behavior on iteration is opt-in for Relation just like
for Array.

Co-authored-by: John Hawthorn <john@hawthorn.email>
@rails-bot
Copy link

rails-bot bot commented Jul 18, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Jul 18, 2021
@rails-bot rails-bot bot closed this Jul 25, 2021
@eileencodes eileencodes reopened this Jul 26, 2021
@rails-bot rails-bot bot removed the stale label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants