Add `group_by` to `ActiveRecord::FinderMethods` #10447

Closed

@afeld
afeld commented May 3, 2013

Enables collecting of records into sets, grouped by distinct values for the
specified field. Leverages ActiveRecord::Relation to be far more
efficient than Enumerable#group_by when selecting based on a column
name.

Example:

User.group_by('role')
# => {
#   "normal" => #<ActiveRecord::Relation [...]>,
#   "admin" => #<ActiveRecord::Relation [...]>
# }
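The idea described above (one query for the distinct values, then one lazy scope per value) can be sketched without ActiveRecord. This is only an illustration under stated assumptions: `Row`, `ROWS`, `LazyScope`, `distinct_values` and `load_count` are hypothetical names invented for the sketch, and the in-memory array stands in for a database table. It is not the code from this patch.

```ruby
# Hypothetical in-memory stand-in for a table; this is NOT ActiveRecord.
Row = Struct.new(:id, :role)

ROWS = [
  Row.new(1, "normal"),
  Row.new(2, "admin"),
  Row.new(3, "normal")
].freeze

class LazyScope
  include Enumerable

  # Counts how often this scope was actually iterated ("loaded").
  attr_reader :load_count

  def initialize(rows, conditions = {})
    @rows = rows
    @conditions = conditions
    @load_count = 0
  end

  # Returns a new, still-unloaded scope narrowed by the extra conditions.
  def where(extra)
    LazyScope.new(@rows, @conditions.merge(extra))
  end

  # Stand-in for a "SELECT DISTINCT field" query: values only, no records kept.
  def distinct_values(field)
    @rows.map { |row| row[field] }.uniq
  end

  # Records are filtered only when something iterates over the scope.
  def each(&block)
    @load_count += 1
    @rows.select { |row| @conditions.all? { |key, value| row[key] == value } }.each(&block)
  end

  # With a field: one lazy scope per distinct value.
  # With a block (or no field): fall back to Enumerable#group_by.
  def group_by(field = nil, &block)
    return super(&block) if field.nil? || block

    distinct_values(field).each_with_object({}) do |value, groups|
      groups[value] = where(field => value)
    end
  end
end
```

With this stand-in, `LazyScope.new(ROWS).group_by(:role)` returns a hash whose values have not been iterated yet; each group is only filtered when it is enumerated, which mirrors the laziness claimed for relations.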

I work with a lot of people who are new to Rails, and I've had multiple people ask if there was a way to do this with ActiveRecord. Figured it was time to finally support it.

One question: should documentation be added suggesting an index on any column that group_by is being called on regularly? I haven't seen those kinds of tips anywhere else in the documentation, but think it might be beneficial.

@afeld afeld Add `group_by` to `ActiveRecord::FinderMethods`
Enables collecting of records into sets, grouped by distinct values for the
specified `field`. Leverages `ActiveRecord::Relation` to be far more
efficient than `Enumerable#group_by` when selecting based on a column
name.

    Example:

        User.group_by('role')
        # => {
        #   "normal" => #<ActiveRecord::Relation [...]>,
        #   "admin" => #<ActiveRecord::Relation [...]>
        # }
eacb28e
@egilburg
Contributor
egilburg commented May 3, 2013

Could you provide some benchmarks comparing the performance of your group_by implementation with the one previously available via Enumerable? Try both lazy (e.g. relations stay relations) and non-lazy (all relations eventually become arrays).

@egilburg egilburg commented on the diff May 3, 2013
activerecord/test/cases/finder_test.rb
+
+ keys = groups.keys
+ assert_equal 5, keys.size
+ assert_equal [nil, 'Client', 'DependentFirm', 'ExclusivelyDependentFirm', 'Firm'].to_set, keys.to_set
+
+ assert_equal [2, 3, 5, 10], groups['Client'].map(&:id).sort
+ assert_equal [1, 4], groups['Firm'].map(&:id).sort
+ assert_equal nil, groups['Nonexistent']
+ end
+
+ def test_group_by_with_block
+ groups = Company.all.group_by { |c| c.firm_id }
+
+ group = groups[4]
+ assert_kind_of Array, group
+ assert_equal 2, group.size
@egilburg
egilburg May 3, 2013 Contributor

Perhaps have a more detailed test here, to ensure both values are what you expect, e.g. by comparing ids or some other attributes.
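A minimal sketch of what such a stricter check might look like, written against plain Ruby objects rather than the Rails fixtures. `SampleCompany`, `SAMPLE_COMPANIES` and `ids_for_firm` are hypothetical, and the ids are made up for illustration; they are not the values in the `companies` fixture.

```ruby
# Made-up records for illustration only; not the Rails fixture data.
SampleCompany = Struct.new(:id, :firm_id)

SAMPLE_COMPANIES = [
  SampleCompany.new(1, nil),
  SampleCompany.new(2, 4),
  SampleCompany.new(3, 1),
  SampleCompany.new(5, 4)
].freeze

# Groups with the block form and returns the sorted ids of one group.
def ids_for_firm(companies, firm_id)
  groups = companies.group_by { |c| c.firm_id }
  groups.fetch(firm_id, []).map(&:id).sort
end

# A size check alone would pass even if the wrong two records were grouped;
# comparing ids pins down exactly which records ended up in the group.
ids = ids_for_firm(SAMPLE_COMPANIES, 4)
raise "unexpected ids: #{ids.inspect}" unless ids == [2, 5]
```

In the real test the same idea would presumably be expressed with `assert_equal` on `group.map(&:id).sort`, as the neighbouring assertions in the diff already do.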

@afeld
afeld commented May 4, 2013

Here you go: https://gist.github.com/afeld/5516129

The new group_by is also more memory-efficient, in that it doesn't need to hold all records in memory after they're grouped.

@egilburg
Contributor
egilburg commented May 4, 2013

To have a basis for comparison, I was expecting to see your group_by vs Enumerable's group_by (i.e. how much performance gain we get after this patch as compared to before this patch). It seems your gist only gives benchmarks for the after-patch case, unless I'm missing something. Could you run the same benchmarks before the patch was applied, and compare them to the after-patch results?

@egilburg egilburg commented on the diff May 4, 2013
activerecord/CHANGELOG.md
@@ -1,3 +1,12 @@
+* Add `ActiveRecord::FinderMethods#group_by` method, which collects records into sets, grouped by distinct values for the specified `field`. This patch leverages `ActiveRecord::Relation` to be far more efficient than `Enumerable#group_by` when selecting based on a column name.
+
+ Example:
+
+ User.group_by('role')
@egilburg
egilburg May 4, 2013 Contributor

Should probably use a symbol rather than a string in this example.

@egilburg egilburg commented on the diff May 4, 2013
...verecord/lib/active_record/relation/finder_methods.rb
@@ -176,6 +176,28 @@ def exists?(conditions = :none)
false
end
+ # Collect records into sets, grouped by distinct values for the specified +field+.
+ #
+ # User.select([:id, :name])
+ # => [#<User id: 1, name: "Oscar">, #<User id: 2, name: "Oscar">, #<User id: 3, name: "Foo">
+ #
+ # User.group_by(:name)
+ # => {"Foo" => #<ActiveRecord::Relation [...]>, "Oscar" => #<ActiveRecord::Relation [...]>}
+ def group_by(*args, &block)
@egilburg
egilburg May 4, 2013 Contributor

If you only expect one name, why do you accept varargs? super (Enumerable's group_by) doesn't accept any arguments (other than a block), so there is currently no need for the method signature to accept more than a single name.

Granted, it might be interesting for multi-arg group_by to work, returning nested structure such as:

array.group_by(:first_attr, :second_attr)

# =>
#  
#   { 
#    :first_value_1 => { :second_value_1 => subrelation_1, :second_value_2 => subrelation_2 }, 
#    :first_value_2 =>  { :second_value_1 => subrelation_3, :second_value_2 => subrelation_4 }
#  } 

But this topic can be explored in a different PR. For this one, I think it's fine to only accept one name, defaulting to nil, and call super(&block) if it's nil.
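For reference, the nested structure floated above can be sketched on plain arrays. `nested_group_by` and `PEOPLE` are hypothetical names used only for this illustration; the sketch returns arrays at the leaves rather than relations and is not part of this PR.

```ruby
# Hypothetical sample data: plain hashes instead of ActiveRecord objects.
PEOPLE = [
  { id: 1, role: "admin",  team: "ops" },
  { id: 2, role: "normal", team: "ops" },
  { id: 3, role: "admin",  team: "web" },
  { id: 4, role: "admin",  team: "ops" }
].freeze

# Hypothetical helper: groups by the first field, then recursively groups
# each resulting subset by the remaining fields.
def nested_group_by(records, *fields)
  return records if fields.empty?

  first, *rest = fields
  records
    .group_by { |record| record[first] }
    .transform_values { |subset| nested_group_by(subset, *rest) }
end

NESTED = nested_group_by(PEOPLE, :role, :team)
# NESTED["admin"] is itself a hash keyed by team; the leaves are arrays.
```

Each extra field adds one level of nesting, so a relation-based version would need one distinct-values query per level, which is part of why it seems reasonable to leave this for a separate change.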

@afeld
afeld May 30, 2013

I thought I tried

def group_by(field = nil, &block)
  # ...
  super(&block)
  # ...
end

and had ArgumentErrors... must be going nuts.
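One possible explanation, which the thread does not confirm: in Ruby, a bare `super` (no parentheses) re-sends the current method's parameters, including an optional `field`, while `super(&block)` sends only the block. Since `Enumerable#group_by` takes no positional arguments, the bare form raises `ArgumentError`. The sketch below demonstrates the difference in plain Ruby; `Items`, `BareSuper` and `ExplicitSuper` are hypothetical classes made up for the demonstration.

```ruby
# Hypothetical minimal Enumerable container used by both variants.
class Items
  include Enumerable

  def initialize(items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
  end
end

class BareSuper < Items
  # Bare `super` forwards `field` (even when it is nil) to
  # Enumerable#group_by, which accepts no positional arguments.
  def group_by(field = nil, &block)
    super
  end
end

class ExplicitSuper < Items
  # `super(&block)` forwards only the block, so the arity matches.
  def group_by(field = nil, &block)
    return super(&block) if field.nil?

    # Field given: group by the value of the named method on each item.
    super() { |item| item.public_send(field) }
  end
end

# Returns true when calling the bare-super variant raises ArgumentError.
def bare_super_raises?
  BareSuper.new([1, 2, 3]).group_by(&:odd?)
  false
rescue ArgumentError
  true
end
```

So a signature like `group_by(field = nil, &block)` should work as long as the call to `super` spells out its arguments explicitly.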

@egilburg egilburg commented on the diff May 4, 2013
...verecord/lib/active_record/relation/finder_methods.rb
@@ -176,6 +176,28 @@ def exists?(conditions = :none)
false
end
+ # Collect records into sets, grouped by distinct values for the specified +field+.
+ #
+ # User.select([:id, :name])
+ # => [#<User id: 1, name: "Oscar">, #<User id: 2, name: "Oscar">, #<User id: 3, name: "Foo">
+ #
+ # User.group_by(:name)
+ # => {"Foo" => #<ActiveRecord::Relation [...]>, "Oscar" => #<ActiveRecord::Relation [...]>}
+ def group_by(*args, &block)
+ field = args[0]
+ if field.nil? || block_given?
+ super(&block)
+ else
+ result = {}
+ self.select(field).distinct.each do |item|
@egilburg
egilburg May 4, 2013 Contributor

You don't need the self. prefix in this method for select or where.

@afeld afeld closed this May 30, 2013