Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix different count calculation when using size with DISTINCT select #35361

Merged

Conversation

jvillarejo
Copy link
Contributor

Summary

When using select with 'DISTINCT( ... )' if you use method size on a non loaded relation it overrides the column selected by passing :all so it returns different value than count.

Fixes #35214

@jvillarejo jvillarejo changed the title fixed different count calculation when using size with DISTINCT select Fix different count calculation when using size with DISTINCT select Feb 21, 2019
@jvillarejo jvillarejo changed the title Fix different count calculation when using size with DISTINCT select Fix different count calculation when using size with DISTINCT select Feb 21, 2019
@jvillarejo jvillarejo force-pushed the fix_wrong_size_query_with_distinct_select branch from c4020d9 to 957ea79 Compare February 21, 2019 21:49
@kamipo
Copy link
Member

kamipo commented Feb 21, 2019

This change also affects to count(:all).
Can you add test case for count(:all) as well (e.g. ebc09ed)?

@jvillarejo
Copy link
Contributor Author

Thanks for your reply! Totally! I will add more tests cases.

@jvillarejo
Copy link
Contributor Author

jvillarejo commented Feb 22, 2019

Hey @kamipo I added more test cases, as some of them didn't gave the correct original count value when reassigning the column_name.

So in order to fix the issue and keep the functionality instead of changing the column_name, distinct is assigned with the select_for_count values.

I think this fixes the issue with the minor impact.

However in terms of code I don't think it's very understandable. Assigning distinct to select_for_count but then 4 lines below assign distinct to nil when column_name isn't :all is very confusing.

What do you think would be better?

Also I think the test cases about size shouldn't be on test/cases/calculations_test.rb file but inside test/cases/relation folder. Do you think that creating a file test/cases/relation/size_test.rb there would be a good approach?

Thanks!

end

def test_size_with_manual_select_with_group_with
c = Account.select("DISTINCT accounts.firm_id").group("accounts.firm_id").size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return 5, since it should be the same result with Account.select("DISTINCT accounts.firm_id").group("accounts.firm_id").load.size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow executing:

Account.select("DISTINCT accounts.firm_id").group("accounts.firm_id").size
# { nil => 1, 1 => 1, 2 => 1, 6 => 2, 9 => 1 }

Now returns the hash. But I agree that it should return the same of:

Account.select("DISTINCT accounts.firm_id").group("accounts.firm_id").load.size
# 5

Should I investigate this for this PR ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to create another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an user, I expect .group.size (same as .group.count) to return an hash. And doing .load.size / to_a.size would return a collection of the instances of the model and then how many of them there are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, relation.size behaves fragile, that is a breaking change though.

relation = Post.left_joins(:comments).group(:id)
relation.size # => return a hash
relation.each { |record| puts record.id }
relation.size # => return an integer

relation = Post.left_joins(:comments).group(:id)
relation.count # => return a hash
relation.each { |record| puts record.id }
relation.count # => return a hash

relation = Post.left_joins(:comments).group(:id)
relation.length # => return an integer
relation.each { |record| puts record.id }
relation.length # => return an integer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in my opinion size and load.size should have the same return value the difference should be that when using just size if the collection is loaded it will not execute a query but if it isn't it executes a query without loading the collection instances.

That they return different values when the collection is loaded and when it isn't is confusing.

def test_size_with_manual_distinct_select_with_loaded_size
size = Account.select("DISTINCT accounts.firm_id").size
loaded_size = Account.select("DISTINCT accounts.firm_id").load.size
assert_equal size, loaded_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the test cases about size shouldn't be on test/cases/calculations_test.rb file

Yes, just put them in test/cases/relations_test.rb for now.

def test_size_with_distinct
posts = Post.distinct.select(:author_id, :comments_count)
assert_queries(1) { assert_equal 8, posts.size }
assert_queries(1) { assert_equal 8, posts.load.size }
end
def test_size_with_eager_loading_and_custom_order
posts = Post.includes(:comments).order("comments.id")
assert_queries(1) { assert_equal 11, posts.size }
assert_queries(1) { assert_equal 11, posts.load.size }
end
def test_size_with_eager_loading_and_custom_order_and_distinct
posts = Post.includes(:comments).order("comments.id").distinct
assert_queries(1) { assert_equal 11, posts.size }
assert_queries(1) { assert_equal 11, posts.load.size }
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a weird situation added a test with includes:

def test_size_with_eager_loading_and_manual_distinct_select_and_includes
  accounts = Account.select("DISTINCT accounts.firm_id").includes(:firm).order("firms.id")

  assert_queries(1) { assert_equal 5, accounts.size }
  assert_queries(1) { assert_equal 5, accounts.load.size }
end

It raises:
# ActiveRecord::StatementInvalid: SQLite3::SQLException: near "DISTINCT": syntax error
I can investigate further if you want.

But this also gets raised when using:

Account.distinct(:credit_limit).select("DISTINCT accounts.firm_id").count(:all)
# ActiveRecord::StatementInvalid: SQLite3::SQLException: near "DISTINCT": syntax error

What do you think about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue for count with eager loading.
I'd prefer to address it in another PR, since it is a little complex than what this PR fixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll work it in another PR!

Copy link
Member

@kamipo kamipo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Can you add changelog entry and squash your commits into one?

@jvillarejo jvillarejo force-pushed the fix_wrong_size_query_with_distinct_select branch from 1b0a059 to fa2c61f Compare February 26, 2019 15:21
… with DISTINCT

When using `select` with `'DISTINCT( ... )'` if you use method `size` on a non loaded relation it overrides the column selected by passing `:all` so it returns different value than count.

This fixes rails#35214
@jvillarejo
Copy link
Contributor Author

jvillarejo commented Feb 26, 2019

Done! Thank you so much for your help! This is my first Rails PR 😄. Is it correct?

@kamipo kamipo merged commit fa2c61f into rails:master Feb 26, 2019
kamipo added a commit that referenced this pull request Feb 26, 2019
…istinct_select

Fix different `count` calculation when using `size` with DISTINCT `select`
@kamipo
Copy link
Member

kamipo commented Feb 26, 2019

Thanks! And congrats on your first Rails contribution 🎉

@jvillarejo jvillarejo deleted the fix_wrong_size_query_with_distinct_select branch February 26, 2019 16:41
kamipo added a commit that referenced this pull request Feb 28, 2019
…istinct_select

Fix different `count` calculation when using `size` with DISTINCT `select`
prathamesh-sonpatki added a commit to codetriage/CodeTriage that referenced this pull request Apr 2, 2019
prathamesh-sonpatki added a commit to codetriage/CodeTriage that referenced this pull request Apr 2, 2019
* Revert "Rollback to rails v5.2.2.1"

This reverts commit 9a74b83.

* Bump will_paginate to latest

- This will fix the issue related to
  https://sentry.io/organizations/app5385926herokucom/issues/963491060/events/51363174270/
- Change in Rails - rails/rails#35361
- Issue in will_paginate - mislav/will_paginate#589
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants