Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

#count doesn't honor distinct attributes in select clause #5554

Closed
mtalcott opened this Issue · 15 comments

7 participants

@mtalcott

Calling count on an ActiveRecord::Relation with more than 1 distinct attribute in the select clause doesn't honor the select conditions. For example, this works:

User.select('distinct username').all    #=> SELECT distinct username FROM "users"
User.select('distinct username').count  #=> SELECT COUNT(distinct username) FROM "users"

However, the distinct clause disappears from the count SQL when another distinct attribute is specified:

User.select('distinct username, email').all     #=> SELECT distinct username, email FROM "users"
User.select('distinct username, email').count   #=> SELECT count(*) FROM "users"

Same thing with distinct users.*:

User.select('distinct users.*').all     #=> SELECT distinct users.* FROM "users"
User.select('distinct users.*').count   #=> SELECT count(*) FROM "users"

That means for a relation with multiple distinct columns, relation.count is not necessarily the same as relation.all.count.

For distinct users.*, count(:distinct => true) works (it generates SELECT COUNT(DISTINCT "users"."id") FROM "users"), but that won't work for distinct username, email.

I've verified this behavior on Postgres and SQLite.

@Empact

FYI, as a workaround you can use the :select option to replace the "*" with the columns, &c. of your choosing.

User.count(select: 'distinct users.*') #=> SELECT count(distinct users.*) FROM "users"
@mtalcott

I'm seeing the same behavior with that workaround as well:

User.count(select: 'distinct users.*') #=> SELECT COUNT(*) FROM "users"

@Empact

Ah you're right. Unfortunately I don't have a record of what I originally ran that set me astray.

@jasonhutchens

It seems Kaminari is working around this bug, with massive performance consequences for us. I've written a post describing the issue here: http://devblog.agworld.com.au/post/22194500063/distinctly-uncountable

I've replicated the issue here: https://github.com/agworld/distinctly_uncountable

It seems that #sum is also broken by DISTINCT, as are probably the #min, #max, #avg and other calculations. This could have dire consequences for anyone generating mission-critical reports and wondering why the numbers don't come out right. Heck, it'd be nicer if AR just raised a "too hard" exception, rather than give incorrect results.

BTW, this may be related to #1003.

@mtalcott

Nice writeup of the issue and performance in different cases. It seems that for now, if you need an accurate count/sum/etc. and there is a distinct clause on the query, you have to instantiate the AR object array. Not very performant, but that's better than getting inaccurate results.

1003 is definitely related.

@frodsan

Hi, i think this is expected as you see here.

If you remove that validation, you will get:

>> User.select('distinct username, email').count
   (0.2ms)  SELECT COUNT(distinct username, email) FROM "users" 
SQLite3::SQLException: wrong number of arguments to function COUNT(): SELECT COUNT(distinct username, email) FROM "users" 

>> User.select('distinct users.*').count
   (0.2ms)  SELECT COUNT(distinct users.*) FROM "users" 
SQLite3::SQLException: near "*": syntax error: SELECT COUNT(distinct users.*) FROM "users" 

/cc @carlosantoniodasilva @rafaelfranca I'm not sure about this, but i'd prefer to raise an error to avoid this kind of issues.

@frodsan

Something like:

def select_for_count
  if select_values.present?
    select = select_values.join(", ")
    raise IDontFigureOutAnErrorForThisYet if select = ~/[,*]/
    select
  end
end

What do you think?

@jasonhutchens

Yes, raising an exception is much more preferable to returning an incorrect result.

@frodsan frodsan referenced this issue from a commit
Francesco Rodriguez fix calculation for a relation with multiple selected columns
Before this patch, `count` returns an incorrect result when is
executed with multiple columns:

    User.select('distinct username, email').count
    # => SELECT count(*) FROM "users"

    User.select('distinct users.*').count
    # => SELECT count(*) FROM "users"

It proposes to raise an ActiveRecordError instead.

    >> User.select('distinct username, email').count
    # => ActiveRecord::ActiveRecordError: calculation of multiple columns is not
    #    supported

    >> User.select('distinct users.*').count
    # => ActiveRecord::ActiveRecordError: calculation of multiple columns is not
    #    supported

Fixes #5554.
9d0b444
@rafaelfranca
Owner

Related with #6865

@anthonyalberto anthonyalberto referenced this issue in mislav/will_paginate
Closed

has_many association with uniq #251

@senny
Owner

Is this still happening? I might have fixed this by working on a different issue. Let me know if it's still a problem on master and I'll take a look.

@jasonhutchens

I've updated https://github.com/agworld/distinctly_uncountable to work with rails master, and the tests still fail, so would be great if you can fix this @senny.

You should also note that this bug affects everything in ActiveRecord::Calculations (so sum, average, minimum, maximum and so on, as well as count).

@senny
Owner

I'll take a look

@senny
Owner

not every database supports counting on multiple distinct columns. The fallback is to use subqueries.

The code that silently falls back to count(*) feels hacky at best. I think we should let the user know what's happening and if a count can't be performed.

@dangrahn

@mtalcott, would you mind to show an example of an AR object array that would work around this issue?

Following is generating correct count for me:
User.find(:all, :select => "DISTINCT username, email").count

However, since I need this in a model scope it wont work since this returns an array instead of an AR relation.

@senny senny referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@senny
Owner

I submitted a PR to solve this issue: #10710

@jonleighton jonleighton closed this in #10710
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.