Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upgroup trashes select values #16291
Comments
rafaelfranca
added
the
activerecord
label
Jul 25, 2014
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hmaurer
Jul 27, 2014
Contributor
Could you provide a bug report in the form of an executable test case, using this script ? Thanks
|
Could you provide a bug report in the form of an executable test case, using this script ? Thanks |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 1, 2014
I'm so far unable to reproduce the problem when I try to boil it down to a script, but I am still looking into it.
davmac314
commented
Aug 1, 2014
|
I'm so far unable to reproduce the problem when I try to boil it down to a script, but I am still looking into it. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 1, 2014
Ok, got it. The problem occurs when calling empty? on a relation clause.
unless File.exist?('Gemfile')
File.write('Gemfile', <<-GEMFILE)
source 'https://rubygems.org'
gem 'rails', github: 'rails/rails'
gem 'arel', github: 'rails/arel'
gem 'rack', github: 'rack/rack'
gem 'i18n', github: 'svenfuchs/i18n'
gem 'sqlite3'
GEMFILE
system 'bundle'
end
require 'bundler'
Bundler.setup(:default)
require 'active_record'
require 'minitest/autorun'
require 'logger'
# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: ':memory:')
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Schema.define do
create_table :posts do |t|
t.integer :groupid
end
end
class Post < ActiveRecord::Base
end
class BugTest < Minitest::Test
def test_association_stuff
post = Post.create!(:groupid => 1)
posts = Post.select('posts.id', 'SUM(posts.groupid) AS xxx')
.group('posts.id')
.order('xxx')
assert_equal false, posts.empty?
end
end
davmac314
commented
Aug 1, 2014
|
Ok, got it. The problem occurs when calling
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
rafaelfranca
Aug 1, 2014
Member
empty? removed the select values because it needs to call count(:all) or it will fail in some conditions. If we fix your problem we will break other cases. Since we can't fix both I recommend to use count instead of empty?
|
|
rafaelfranca
closed this
Aug 1, 2014
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 2, 2014
You cannot be serious.
This is clearly a bug. You can't close it just because you can't immediately see how to fix it. My suggested change (which I made only tentatively) might break other cases, but that doesn't mean that there is no possible fix.
If, as you say, empty? needs to remove the select values, then it should at least throw an exception when select values have been specified, with a string to the effect of "'empty?' doesn't work when 'select' has been specified", rather than forcing users to spend time debugging an issue that isn't even a fault in their own code.
However, I simply cannot believe that there is no possible fix. For instance, it should be possible for empty? to just set limit 1 and run the query, returning true if there was 0 rows in the result. Using count(*) is just inefficient; it forces the rows the be counted even though we don't care about the number.
In any case, please re-open this bug. There is presently no satisfactory resolution.
davmac314
commented
Aug 2, 2014
|
You cannot be serious. This is clearly a bug. You can't close it just because you can't immediately see how to fix it. My suggested change (which I made only tentatively) might break other cases, but that doesn't mean that there is no possible fix. If, as you say, empty? needs to remove the select values, then it should at least throw an exception when select values have been specified, with a string to the effect of "'empty?' doesn't work when 'select' has been specified", rather than forcing users to spend time debugging an issue that isn't even a fault in their own code. However, I simply cannot believe that there is no possible fix. For instance, it should be possible for empty? to just set limit 1 and run the query, returning true if there was 0 rows in the result. Using count(*) is just inefficient; it forces the rows the be counted even though we don't care about the number. In any case, please re-open this bug. There is presently no satisfactory resolution. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
rafaelfranca
Aug 2, 2014
Member
However, I simply cannot believe that there is no possible fix. For instance, it should be possible for empty? to just set limit 1 and run the query, returning true if there was 0 rows in the result.
This is exactly how exists? works. As I said, you are just using the wrong method.
I'm not saying that there is no way to fix it. I'm saying that fixing it will break other people application that expect this exactly behaviour.
We could raise an exception when there are select values specified but again this will break some valid use cases that use select values that will not give any problem.
This is exactly how I'm not saying that there is no way to fix it. I'm saying that fixing it will break other people application that expect this exactly behaviour. We could raise an exception when there are select values specified but again this will break some valid use cases that use select values that will not give any problem. |
rafaelfranca
reopened this
Aug 2, 2014
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 3, 2014
Thank you for re-opening.
It is my understanding that a query object should behave where possible like an Array or similar colleciton; that's why it has an 'empty?' method in the first place. The documentation here:
http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-empty-3F
Says that empty? "returns true if there are no records", which it clearly fails to do in the test case I have provided. If other people's applications depend on the current behavior, then surely they are depending on undocumented behavior. Could you give me an example where the current behavior is desired? I just don't understand why you'd want this method to be broken.
davmac314
commented
Aug 3, 2014
|
Thank you for re-opening. It is my understanding that a query object should behave where possible like an Array or similar colleciton; that's why it has an 'empty?' method in the first place. The documentation here: http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-empty-3F Says that |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 3, 2014
I mean, just to clarify, you're saying that it shouldn't be fixed because some other people are relying on the broken behavior, right? Or am I misunderstanding?
davmac314
commented
Aug 3, 2014
|
I mean, just to clarify, you're saying that it shouldn't be fixed because some other people are relying on the broken behavior, right? Or am I misunderstanding? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
matthewd
Aug 3, 2014
Member
a query object should behave where possible like an Array or similar collection
group makes this vastly less true: see the different behaviour of the aggregate methods (including count), for example.
However, I see no reason that count(:all) (or any other implementation of empty?) shouldn't trash the select list: it is, by definition, not going to use it. Perhaps you're actually reporting that if we're going to trash the select list, we should also trash the equally-useless order list, because 1) it's useless, and 2) it may be depending on something in the select.
However, I see no reason that |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 3, 2014
However, I see no reason that count(:all) (or any other implementation of empty?) shouldn't trash the select list
See my testcase. If you trash the select list, you break anything which relies on the existence of an alias in that list. Now, in this case, you could certainly have empty? remove both the select and order lists, and that would make the test case above pass. I'm not sure that it would be valid in all cases, though; a 'where' clause can also reference aliases from the select list.
Why can't empty? (and count) just do a count from a nested subquery? I.e. if the original query is "XXX" and you call empty? it should do something like:
SELECT COUNT(*) FROM (XXX LIMIT 1) xxx_table;
davmac314
commented
Aug 3, 2014
See my testcase. If you trash the select list, you break anything which relies on the existence of an alias in that list. Now, in this case, you could certainly have Why can't empty? (and count) just do a count from a nested subquery? I.e. if the original query is "XXX" and you call
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
matthewd
Aug 3, 2014
Member
a 'where' clause can also reference aliases from the select list
Hmm... I don't know of a DB where that's true -- and AFAIK, it's disallowed for semantic reasons, not because it's difficult.
Why can't empty? (and count) just do a count from a nested subquery?
They could... the most obvious trade-off that comes to mind for me is that you're now (probably -- depending on how clever your DBMS is) evaluating all the entries in the SELECT clause, only to then discard them. And that can be very expensive.
.. LIMIT 1 ..
empty? deliberately does a plain count (instead of a LIMIT 1, or an EXISTS, say) as a performance optimisation (read: trade-off): we bet that if there are enough rows for the difference to matter, we'll probably have need of the actual count as well. Thus we trade a potentially-slower query against the probability of needing to run two.
Hmm... I don't know of a DB where that's true -- and AFAIK, it's disallowed for semantic reasons, not because it's difficult.
They could... the most obvious trade-off that comes to mind for me is that you're now (probably -- depending on how clever your DBMS is) evaluating all the entries in the SELECT clause, only to then discard them. And that can be very expensive.
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
davmac314
Aug 4, 2014
Hmm... I don't know of a DB where that's true
Ok, that may be true, although it surprises me. I just tested with Mysql and it does not allow using the column aliases in the where clause. In any case aliases (in Mysql) can be referenced in both 'group by' and 'order by' clauses, and 'group by' clauses cannot be thrown away for a count operation without changing the semantics.
They could... the most obvious trade-off that comes to mind for me is that you're now (probably -- depending on how clever your DBMS is) evaluating all the entries in the SELECT clause, only to then discard them. And that can be very expensive.
I would think any DBMS worth it's salt would be able to optimise this, but ok, let's suppose it's better to keep the current behavior (no subquery) in cases where it actually does the right thing.
empty? deliberately does a plain count (instead of a LIMIT 1, or an EXISTS, say) as a performance optimisation (read: trade-off): we bet that if there are enough rows for the difference to matter, we'll probably have need of the actual count as well. Thus we trade a potentially-slower query against the probability of needing to run two.
Ok, that's fine; but the current trade-off seems to be incorrect behavior (and a failing query) rather than just slower performance.
In general, I don't think you can add a 'count' clause when a 'group by' clause is present and expect it to return a count of result rows. You will instead get a set of rows which are grouped as per the 'group by' clause and which contains a count of the items in each group. (That is to say, losing the 'select' list isn't the only issue demonstrated with the test case.) It's necessary to use a subquery in this case, I think. Perhaps a subquery could be used if-and-only-if there is a 'group by' clause.
davmac314
commented
Aug 4, 2014
Ok, that may be true, although it surprises me. I just tested with Mysql and it does not allow using the column aliases in the where clause. In any case aliases (in Mysql) can be referenced in both 'group by' and 'order by' clauses, and 'group by' clauses cannot be thrown away for a count operation without changing the semantics.
I would think any DBMS worth it's salt would be able to optimise this, but ok, let's suppose it's better to keep the current behavior (no subquery) in cases where it actually does the right thing.
Ok, that's fine; but the current trade-off seems to be incorrect behavior (and a failing query) rather than just slower performance. In general, I don't think you can add a 'count' clause when a 'group by' clause is present and expect it to return a count of result rows. You will instead get a set of rows which are grouped as per the 'group by' clause and which contains a count of the items in each group. (That is to say, losing the 'select' list isn't the only issue demonstrated with the test case.) It's necessary to use a subquery in this case, I think. Perhaps a subquery could be used if-and-only-if there is a 'group by' clause. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
rails-bot
Nov 19, 2014
This issue has been automatically marked as stale because it has not been commented on for at least
three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.
rails-bot
commented
Nov 19, 2014
|
This issue has been automatically marked as stale because it has not been commented on for at least The resources of the Rails team are limited, and so we are asking for your help. If you can still reproduce this error on the Thank you for all your contributions. |
rails-bot
added
the
stale
label
Nov 19, 2014
rails-bot
closed this
Jan 2, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
rails-bot
Jan 2, 2015
This issue has been automatically closed because of inactivity.
If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.
rails-bot
commented
Jan 2, 2015
|
This issue has been automatically closed because of inactivity. If you can still reproduce this error on the Thank you for all your contributions. |
davmac314 commentedJul 25, 2014
When I do a query with both select and group, the group trashes the values I specified with select, eg:
... will not include a sum_b column in the result. This appears to be caused by activerecord/lib/active_record/relation/calculation.rb, line ~299 (in both 4.1.4 and master):
Essentially the last line trashes the previously set relation.select_values. One possible fix appears to be to change the last line to:
... but I am not particularly certain that this is correct in all cases, and anyway I'm not sure if the select_values should really be added to if they have already been specified.