Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Add `group_by` to `ActiveRecord::FinderMethods`, take two #10797

Closed
wants to merge 1 commit into from

4 participants

@afeld

Replaces #10447, since Github Pull Requests don't understand rebasing.

  • Incorporated feedback from @egilburg.
  • Updated benchmarks, and included pre-patch times here.

Enables collecting of records into sets, grouped by distinct values for the specified field. Leverages ActiveRecord::Relation to be far more efficient than Enumerable#group_by when selecting based on a column name.

Example:

User.group_by(:role)
# => {
#   "normal" => #<ActiveRecord::Relation [...]>,
#   "admin" => #<ActiveRecord::Relation [...]>
# }

I work with a lot of people who are new to Rails, and I've had multiple people ask if there was a way to do this with ActiveRecord. Figured it was time to finally support it.

One question: should documentation be added suggesting an index on any column that group_by is being called on regularly? I haven't seen those kinds of tips anywhere else in the documentation, but think it might be beneficial.

@senny
Owner

@afeld why did you open a new PR? You can rebase and force-push to an open PR just fine...

@afeld

I did that... Wasn't showing the right thing in the Commits panel. Almost seemed like a caching issue, actually. Or maybe I confused myself.

@senny
Owner

@afeld it takes a couple of seconds (~20) to update... We can continue with this PR but try not to reopen in the future, it scatters the discussions and is hard to follow.

@afeld

Bump!

activerecord/CHANGELOG.md
@@ -1,3 +1,12 @@
+* Add `ActiveRecord::FinderMethods#group_by` method, which collects records into sets, grouped by distinct values for the specified `field`. This patch leverages `ActiveRecord::Relation` to be far more efficient than `Enumerable#group_by` when selecting based on a column name.
@robin850 Collaborator
robin850 added a note

Could you wrap your additions around 80 chars please ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@robin850
Collaborator

Thanks for your patch, this looks like a great feature to me and the required code is pretty simple! :heart:

I take the liberty to report a little problem I've noticed however (but it may be up to the user to resolve this problem). Let's say I have a Task model with a complete field without any default value. If I run:

Task.select(:complete).distinct

I get nil, true and false as possibilities. I don't know if I'm not using the boolean fields correctly but I think all these three values are valid. However, I would expect group_by to return two sets of data : one with the complete tasks and another with the incomplete ones.

Just my two cents here, I don't know if it really matters, the user could gather the two arrays on the controller side anyway.

@robin850
Collaborator

Also, I don't know if the patch will be accepted but if it is, you will need to rebase when a core member reviewed this and I think we could update the "Active Record Query Interface" guide accordingly.

@afeld afeld Add `group_by` to `ActiveRecord::FinderMethods`
Enables collecting of records into sets, grouped by distinct values for the
specified `field`. Leverages `ActiveRecord::Relation` to be far more
efficient than `Enumerable#group_by` when selecting based on a column
name.

    Example:

        User.group_by('role')
        # => {
        #   "normal" => #<ActiveRecord::Relation [...]>,
        #   "admin" => #<ActiveRecord::Relation [...]>
        # }
5f0e96b
@afeld

Thanks for the feedback! Rebased, hard-wrapped the CHANGELOG entry, and added an entry to the guide.

I would expect group_by to return two sets of data: one with the complete tasks and another with the incomplete ones

I agree it's confusing (especially for newbies) that boolean columns can be nil... I'd probably recommend that the migration generator set default: false when adding boolean columns to avoid the problem entirely. The same issue would come up with queries like Task.where(complete: false). Despite being an anti-pattern, there's nothing stopping someone from treating nil differently than false, so I don't think it's worth trying to be clever in this one method.

@robin850 robin850 commented on the diff
guides/source/active_record_querying.md
@@ -659,6 +661,25 @@ FROM orders
GROUP BY date(created_at)
```
+### `group_by`
+
+Similar to [Ruby's `Enumerable#group_by`](http://ruby-doc.org/core/Enumerable.html#method-i-group_by), Active Record's `group_by` method provides a way to retrieve all records that correspond to each distinct value.
@robin850 Collaborator
robin850 added a note

Could you wrap this line around 80 chars as well please ? :smile:

@afeld
afeld added a note

None of the rest of the guide is wrapped, right?

@robin850 Collaborator
robin850 added a note

Yes but we are slowly wrapping the new additions around 80 chars. ;-) But let's wait some more feedback before updating anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@robin850 robin850 commented on the diff
guides/source/active_record_querying.md
@@ -659,6 +661,25 @@ FROM orders
GROUP BY date(created_at)
```
+### `group_by`
+
+Similar to [Ruby's `Enumerable#group_by`](http://ruby-doc.org/core/Enumerable.html#method-i-group_by), Active Record's `group_by` method provides a way to retrieve all records that correspond to each distinct value.
+
+For example, to get the list of users with each role:
@robin850 Collaborator
robin850 added a note

This is not consistent with your example ; you should either update the example or this sentence. Actually, I would be a bit more descriptive, saying something like "Given you have a X model with a Y column, you can group ...", what do you think ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@robin850 robin850 commented on the diff
guides/source/active_record_querying.md
@@ -659,6 +661,25 @@ FROM orders
GROUP BY date(created_at)
```
+### `group_by`
+
+Similar to [Ruby's `Enumerable#group_by`](http://ruby-doc.org/core/Enumerable.html#method-i-group_by), Active Record's `group_by` method provides a way to retrieve all records that correspond to each distinct value.
+
+For example, to get the list of users with each role:
+
+```ruby
+Address.group_by(:state)
+# => {"AL" => #<ActiveRecord::Relation [...]>, "AK" => #<ActiveRecord::Relation [...]>, ...}
+```
+
+Under the hood, it will execute the SQL to retrieve the distinct values of the column:
@robin850 Collaborator
robin850 added a note

"it will execute the following SQL query", what do you think ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@robin850 robin850 commented on the diff
...verecord/lib/active_record/relation/finder_methods.rb
@@ -217,6 +217,27 @@ def exists?(conditions = :none)
connection.select_value(relation, "#{name} Exists", relation.bind_values) ? true : false
end
+ # Collect records into sets, grouped by distinct values for the specified +field+.
+ #
+ # User.select([:id, :name])
+ # => [#<User id: 1, name: "Oscar">, #<User id: 2, name: "Oscar">, #<User id: 3, name: "Foo">
+ #
+ # User.group_by(:name)
+ # => {"Foo" => #<ActiveRecord::Relation [...]>, "Oscar" => #<ActiveRecord::Relation [...]>}
+ def group_by(field = nil, &block)
+ if field.nil? || block_given?
+ super(&block)
+ else
+ result = {}
@robin850 Collaborator
robin850 added a note

What do you think about using an HashWithIndifferentAccess here ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@robin850
Collaborator

I would advise you not to update this too quickly ; I don't know if this will be accepted (but I hope).

I'd probably recommend that the migration generator set default: false when adding boolean columns to avoid the problem entirely. The same issue would come up with queries like Task.where(complete: false). Despite being an anti-pattern, there's nothing stopping someone from treating nil differently than false, so I don't think it's worth trying to be clever in this one method.

It makes sense (in my opinion) ; actually, it looks like the null values on boolean fields are not really handled:

>> Task.where.not(complete: true).length
=> 1
>> Task.where(complete: false).length
=> 1
>> Task.where(complete: nil).length
=> 7
@rafaelfranca
Owner

This method is very inefficient. For each distinct value of the field it will run a query, so we have a N+1 case here where N is the number of distinct values of the field. Also the method name is very easy to confound with the group method.

I'm :-1: for these reasons.

cc @jeremy @carlosantoniodasilva

@afeld

I appreciate all the feedback. Anyone else (from core, especially) have a general :+1: or :-1: before I go in and give it finishing touches?

@senny
Owner

I'm :-1: on this one. If you need something like that, it's better to have the code in the application. It's as simple as:

result = Hash.new {|h, k| h[k] = []}
Model.pluck("DISTINCT name").map do |name|
  result[name] << Model.where(name: name)]
end
result

I don't think adding this snippet to Rails is a good idea because:

  • Method name conflicts with group. One might think group_by is to perform SQL GROUP BY operations.
  • The construct does not abstract a basic SQL operation but a higher level construct. It's better to keep that code in the applications.
  • The implementation hides an N+1 query problem.
  • Enumerable#group_by was mentioned. The Relation API is closer to SQL terminology than Enumerable. See #9683 for example. This makes the method name even more confusing.
@afeld

Thanks for all the feedback, everyone. Throwing in the towel.

@afeld afeld closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 3, 2014
  1. @afeld

    Add `group_by` to `ActiveRecord::FinderMethods`

    afeld authored
    Enables collecting of records into sets, grouped by distinct values for the
    specified `field`. Leverages `ActiveRecord::Relation` to be far more
    efficient than `Enumerable#group_by` when selecting based on a column
    name.
    
        Example:
    
            User.group_by('role')
            # => {
            #   "normal" => #<ActiveRecord::Relation [...]>,
            #   "admin" => #<ActiveRecord::Relation [...]>
            # }
This page is out of date. Refresh to see the latest.
View
12 activerecord/CHANGELOG.md
@@ -1,3 +1,15 @@
+* Add `ActiveRecord::FinderMethods#group_by` method, which collects records
+ into sets, grouped by distinct values for the specified `field`. This patch
+ leverages `ActiveRecord::Relation` to be far more efficient than
+ `Enumerable#group_by` when selecting based on a column name.
+
+ Example:
+
+ User.group_by(:role)
+ # => {"normal" => #<ActiveRecord::Relation [...]>, "admin" => #<ActiveRecord::Relation [...]>}
+
+ *Aidan Feldman*
+
* Since the `test_help.rb` in Railties now automatically maintains
your test schema, the `rake db:test:*` tasks are deprecated. This
doesn't stop you manually running other tasks on your test database
View
21 activerecord/lib/active_record/relation/finder_methods.rb
@@ -217,6 +217,27 @@ def exists?(conditions = :none)
connection.select_value(relation, "#{name} Exists", relation.bind_values) ? true : false
end
+ # Collect records into sets, grouped by distinct values for the specified +field+.
+ #
+ # User.select([:id, :name])
+ # => [#<User id: 1, name: "Oscar">, #<User id: 2, name: "Oscar">, #<User id: 3, name: "Foo">
+ #
+ # User.group_by(:name)
+ # => {"Foo" => #<ActiveRecord::Relation [...]>, "Oscar" => #<ActiveRecord::Relation [...]>}
+ def group_by(field = nil, &block)
+ if field.nil? || block_given?
+ super(&block)
+ else
+ result = {}
@robin850 Collaborator
robin850 added a note

What do you think about using an HashWithIndifferentAccess here ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ select(field).distinct.each do |item|
+ value = item[field]
+ result[value] = where(field => value)
+ end
+
+ result
+ end
+ end
+
# This method is called whenever no records are found with either a single
# id or multiple ids and raises a +ActiveRecord::RecordNotFound+ exception.
#
View
29 activerecord/test/cases/finder_test.rb
@@ -883,6 +883,35 @@ def test_finder_with_offset_string
assert_nothing_raised(ActiveRecord::StatementInvalid) { Topic.offset("3").to_a }
end
+ def test_group_by_with_no_args
+ assert_kind_of Enumerator, Company.all.group_by
+ end
+
+ def test_group_by_with_symbol
+ groups = Company.all.group_by(:type)
+ assert_kind_of Hash, groups
+
+ keys = groups.keys
+ assert_equal 5, keys.size
+ assert_equal [nil, 'Client', 'DependentFirm', 'ExclusivelyDependentFirm', 'Firm'].to_set, keys.to_set
+
+ assert_equal [2, 3, 5, 10], groups['Client'].map(&:id).sort
+ assert_equal [1, 4], groups['Firm'].map(&:id).sort
+ assert_equal nil, groups['Nonexistent']
+ end
+
+ def test_group_by_with_block
+ groups = Company.all.group_by { |c| c.firm_id }
+
+ group = groups[4]
+ assert_kind_of Array, group
+ assert_equal [5, 10], group.map(&:id).sort
+ end
+
+ def test_group_by_with_no_args
+ assert_kind_of Enumerable, Company.all.group_by
+ end
+
protected
def bind(statement, *vars)
if vars.first.is_a?(Hash)
View
25 guides/source/active_record_querying.md
@@ -638,8 +638,10 @@ will return instead a maximum of 5 clients beginning with the 31st. The SQL look
SELECT * FROM clients LIMIT 5 OFFSET 30
```
-Group
------
+Grouping
+--------
+
+### `group`
To apply a `GROUP BY` clause to the SQL fired by the finder, you can specify the `group` method on the find.
@@ -659,6 +661,25 @@ FROM orders
GROUP BY date(created_at)
```
+### `group_by`
+
+Similar to [Ruby's `Enumerable#group_by`](http://ruby-doc.org/core/Enumerable.html#method-i-group_by), Active Record's `group_by` method provides a way to retrieve all records that correspond to each distinct value.
@robin850 Collaborator
robin850 added a note

Could you wrap this line around 80 chars as well please ? :smile:

@afeld
afeld added a note

None of the rest of the guide is wrapped, right?

@robin850 Collaborator
robin850 added a note

Yes but we are slowly wrapping the new additions around 80 chars. ;-) But let's wait some more feedback before updating anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+
+For example, to get the list of users with each role:
@robin850 Collaborator
robin850 added a note

This is not consistent with your example ; you should either update the example or this sentence. Actually, I would be a bit more descriptive, saying something like "Given you have a X model with a Y column, you can group ...", what do you think ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+
+```ruby
+Address.group_by(:state)
+# => {"AL" => #<ActiveRecord::Relation [...]>, "AK" => #<ActiveRecord::Relation [...]>, ...}
+```
+
+Under the hood, it will execute the SQL to retrieve the distinct values of the column:
@robin850 Collaborator
robin850 added a note

"it will execute the following SQL query", what do you think ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+
+```sql
+# SELECT DISTINCT state FROM addresses
+```
+
+Without loading the full records from the database.
+
Having
------
Something went wrong with that request. Please try again.