Don't pass select option to count SQL #348

Closed
wants to merge 3 commits into
from

Projects

None yet

4 participants

@ffmike
ffmike commented Dec 14, 2013

Somewhere in recent Rails master, a change was made that makes count blow up if a select option is supplied. End result is that

@companies = Company.select("companies.id, companies.type").paginate(:page => 1, :per_page => 25)

is blowing up with invalid SQL. Probably this should be fixed in Rails, but I don't think there's ever a point in passing specific fields to count SQL from will_paginate, so this seems like a good change to me anyhow.

@ffmike
ffmike commented Dec 15, 2013

There's been a bunch of discussion of this (IMO wrong) behavior in Rails. rails/rails#11824 is an open bug that covers.

Another potential fix for will_paginate would be to change from using count to using count(:all). That's possibly better than my suggestion of dropping :select.

@mislav
Owner
mislav commented Dec 16, 2013

Thanks for reporting. I'm not sure which approach is better. I'd definitely like a failing test for this as well as the fix, as I'm reluctant to merge bugfixes that don't demonstrate the bug. It's fine if the test only fails on Rails master.

@dmz006
dmz006 commented Dec 16, 2013

I'm not sure if it is tied to it, but I am using .having along with some joins and was experiencing a similar issue with the count generating errors.

What I ended up doing was editing lib/will_paginate/active_record.rb line 83:

  •      excluded = [:order, :limit, :offset, :reorder]
    
  •     excluded = [:order, :limit, :offset, :reorder, :having]
    

I do not know if i am using any selects that are paginated but will check those and see.

@ffmike
ffmike commented Dec 16, 2013

HAVING alone doesn't seem to pose an issue with count:

Company.group("type").having("count(type)>5").count
(0.7ms) SELECT COUNT(*) AS count_all, type AS type FROM companies GROUP BY type HAVING count(type)>5
=> {"Restaurant"=>275, "Wholesaler"=>6, "Winery"=>29}

However, given the inherent problems with blacklisting particular methods out of a potentially-expandable list, I'm more inclined to go with the count(:all) approach. I'll see if I can break loose time for a patch that takes that approach with tests.

@ffmike
ffmike commented Dec 16, 2013

Um. Duh. The fact that taking count after having doesn't error doesn't mean it's the RIGHT answer. Unfortunately count(:all) doesn't do any better:

Company.group("type").having("count(type)>5").count(:all)
(0.7ms) SELECT COUNT(*) AS count_all, type AS type FROM companies GROUP BY type HAVING count(type)>5

=> {"Restaurant"=>275, "Wholesaler"=>6, "Winery"=>29}

We get back a hash but we expect the number 3.

Unfortunately dropping the having doesn't give the right answer either:

Company.group("type").count
(0.8ms) SELECT COUNT(*) AS count_all, type AS type FROM companies GROUP BY type
=> {"Restaurant"=>275, "ServiceProvider"=>1, "Wholesaler"=>6, "Winery"=>29}

And if we drop both the group and having clauses, we'd get the wrong answer.

@dmz006
dmz006 commented Dec 16, 2013

Here is the type of select I have:

@catalog = Item.joins("LEFT JOIN item_images ON item_images.item_id = items.id").select("items.*, item_images.id as photo_id, item_images.photo_file_name, item_images.photo_content_type, item_images.photo_file_size, count(item_images.id) as image_count").where("items.quantity > 0 AND items.active").having("image_count > 0").order("items.price_cents").group("items.id").paginate(:page => 1, :per_page => 15)

The model is a simple item from a catalog using paperclip item_images for the pictures.

If I don't exclude the having I get the following:
Item Load (0.1ms) SELECT items., item_images.id as photo_id, item_images.photo_file_name, item_images.photo_content_type, item_images.photo_file_size, count(item_images.id) as image_count FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id HAVING image_count > 0 ORDER BY items.price_cents LIMIT 15 OFFSET 0
(0.3ms) SELECT COUNT(
) AS count_all, items.id AS items_id FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id HAVING image_count > 0
Mysql2::Error: Unknown column 'image_count' in 'having clause': SELECT COUNT() AS count_all, items.id AS items_id FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id HAVING image_count > 0
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'image_count' in 'having clause': SELECT COUNT(
) AS count_all, items.id AS items_id FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id HAVING image_count > 0

And with the having excluded:
Item Load (0.2ms) SELECT items., item_images.id as photo_id, item_images.photo_file_name, item_images.photo_content_type, item_images.photo_file_size, count(item_images.id) as image_count FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id HAVING image_count > 0 ORDER BY items.price_cents LIMIT 15 OFFSET 0
(0.4ms) SELECT COUNT(
) AS count_all, items.id AS items_id FROM items LEFT JOIN item_images ON item_images.item_id = items.id WHERE (items.quantity > 0 AND items.active) GROUP BY items.id

ffmike added some commits Dec 20, 2013
@ffmike ffmike Revert "Don't pass select option to count SQL"
This reverts commit 1d70835.

There are better ways to handle this issue than with an ever-expanding blacklist.
12f8ec4
@ffmike ffmike Use count(:all) except when counting distinct records a57b3d5
@ffmike
ffmike commented Dec 20, 2013

Well, I took another pass on this. It's getting kind of nasty.

The problem is we want count(:all) UNLESS there's a DISTINCT in the count. So unless someone has a better idea, we need to actually look at the select SQL for the relation.

This change keeps all the existing specs passing (which just removing :select from the list does NOT do) on Rails edge. It also makes one new spec pass:

it "should count with select" do
  lambda {
    topics = Topic.select("topics.title, topics.subtitle").paginate :page => 2, :per_page => 3
    topics.total_entries.should == 4
  }.should run_queries(1)
end

BUT: it fails on another new spec:

 it "should not ignore :select parameter when it says DISTINCT with multiple attributes" do
  users = User.select('DISTINCT salary, name').paginate :page => 2
  users.total_entries.should == 6
end

Though you might think counting on two attributes, one distinct, is silly, in fact it's valid and works in Rails edge. Test from my production database:

[2] pry(main)> Company.select("DISTINCT state,name")
Company Load (2.3ms) SELECT DISTINCT state,name FROM companies
Company Load (2.8ms) SELECT DISTINCT state,name FROM companies
[3] pry(main)> Company.select("DISTINCT state,name").count
(1.6ms) SELECT COUNT(DISTINCT state,name) FROM companies
(2.6ms) SELECT COUNT(DISTINCT state,name) FROM companies
=> 321
[4] pry(main)> Company.select("DISTINCT state,name").count(:all)
(0.9ms) SELECT COUNT() FROM companies
(1.5ms) SELECT COUNT(
) FROM companies
=> 325

Note that in this case we actually WANT to use count, rather than count(:all) (just as with a single-attribute distinct).

@ffmike
ffmike commented Dec 20, 2013

Oh, even worse. SELECT COUNT(DISTINCT state,name) FROM companies is valid in MySQL but not in SQLite. Might want to just wait for that particular case to get fixed upstream somewhere?

@dmz006
dmz006 commented Dec 22, 2013

I'll be the first to admit i'm not the best with ruby and I'm sure what I've done is totally wrong but it works for my complex and simple uses of will_paginate :)

I was noticing that whenever count was called it was only counting with
SELECT COUNT() AS count_all, items.id AS items_id FROM ...

When will_paginate/active_record.rb function count was processing the count it would send it up to the ActiveRecord count which was generating the above count statement. If I calculated my count externally and passed it in with :total_entries then the count wasn't called & my pagination worked great!

So since I didn't have enough energy to fully figure out how ActiveRecord count generated the sql to do the counting I decided to simply replace the super call with a self.all.count which appears to work.

  def count
    if limit_value
      excluded = [:order, :limit, :offset, :reorder]
      excluded << :includes unless eager_loading?
      rel = self.except(*excluded)
      # TODO: hack. decide whether to keep
      rel = rel.apply_finder_options(@wp_count_options) if defined? @wp_count_options
      rel.count
    else
      self.all.count
    end
  end

I had tried the above patches using super(:all) but any call I did to super always generated the same sql queries.

@senny
senny commented Jan 21, 2014

Hey guys, just checking back what the state of this issue is. The new default when of Rails when count is called, is to pass the select as-is to the database. This was changed because the behavior of the backends differ on multiple columns and when DISTINCT is involved. Rails now just passes through what the relation contains.

There are still ways to get around this though. See (full gist):

    post = Post.create! name: "first"
    post = Post.create! name: "second"
    post = Post.create! name: "third"

    assert_equal 3, Post.select("posts.name").count

    assert_raises(ActiveRecord::StatementInvalid) do
      Post.select("posts.name, posts.created_at").count
    end

    assert_equal 3, Post.select("posts.name, posts.created_at").count(:all)
    assert_equal 3, Post.select("posts.name, posts.created_at").unscope(:select).count

Let me know if there is something we need to consider for rails/rails.

EDIT: After reading the issue more in depth I noticed it's about the combination of a custom select and limit. This looks like an issue we need to address in rails/rails. Sorry for the disturbance.

@mislav
Owner
mislav commented Jan 21, 2014

Totally lost track of what's going on here 😛

So is the original issue by @ffmike caused by a bug in AR? Can we work around it? Is the PR still valid?

If supporting DISTINCT is a PITA then I would suggest that we don't do it and let the users worry about it. I would like to support the case when the user just passes a simple :select value because that's going to be done by a lot of users

@senny
senny commented Jan 21, 2014

@mislav not exactly, the code Company.select("companies.id, companies.type").count will probably raise an error as the select_values are used in the count statement. This will produce: SELECT COUNT('companies.id', 'companies.type') FROM companies, which is not valid. The currently open rails issue is about a specific select:

Company.select("companies.*").limit(10).count

This should produce a valid COUNT operation. Combined with limit AR will issue a subquery and there is an alias problem that will raise an exception. I'm not sure how we are going to address that issue.

DISTINCT is not per-se problematic but there are differences between the DB verndors. Some do support a DISTINCT call on multiple columns, others don't.

There are multiple ways to deal with the situation at hand:

  • Always specify the column to count on. (eg count(:all) or count(:name)). This will overwrite previously defined select values.
  • Do not use the relations select values when performing the count. (eg. unscope(:select).count) This is basically a count(:all).
  • Only use select values that are valid in the COUNT() operation. ( eg. a single column: select(:name).count)
  • Use a different relation for the count and the select:
base_relation = Model.where(XX)
# call will_paginate using `base_relation` and chain a `select` call to fetch the actual data
`base_relation.select("name, title")`

I'm not very familiar with will_paginate and found the issue through the link on the Rails tracker. So I can't say what solution fits best.

@dmz006
dmz006 commented Apr 28, 2014

After upgrading to 4.1 I started running into problems with stack limit reached until I remembered I had changed the super to self.all.count. So I changed it back and it solved that problem but brought up the original problem I had with my big joins / having / group issues. In looking at the COUNT() generated with the rel.count I realized I needed to actually change that to rel.to_a.count. This would cause my funky joins to actually be loaded & converted to_a first before counting.

My new active_record count function now looks like this:

  def count
    if limit_value
      excluded = [:order, :limit, :offset, :reorder]
      excluded << :includes unless eager_loading?
      rel = self.except(*excluded)
      # TODO: hack. decide whether to keep
      rel = rel.apply_finder_options(@wp_count_options) if defined? @wp_count_options
      rel.to_a.count
    else
      super
    end
  end

Yes I know it would be better to have sql get the count for me and save the array memory space; however I can't seem to find a way to make the count work otherwise.

@nathany nathany referenced this pull request May 30, 2014
Closed

Test against Rails 4.1 #380

@mislav
Owner
mislav commented Jun 18, 2014

I've got Rails 4.1 and Rails 4.2 alpha passing on CI in master. Closing this as a Rails issue, or something that users have to figure out on their own.

Sorry I couldn't be more helpful, but I barely understand what's going on in here. I guess our primary problem is that AR can generate SQL that is valid for some RDBMS but not others. I don't think the solution belongs to will_paginate, though, unless it could be very simple.

@mislav mislav closed this Jun 18, 2014
@ffmike
ffmike commented Jun 30, 2014

Belated note - with current will_paginate HEAD, plus Rails 4.1.2 HEAD, my code is working now. Don't know when or where this was fixed, but I'm happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment