-
Notifications
You must be signed in to change notification settings - Fork 558
-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing explicit order-by clause for windowed results #161
Comments
I am looking at this now, but have to come up with some guards as the |
Great, thanks! If you need need any more information or if I can help, just let me know. |
OK, here is where I am at. This is a diff of my current work (https://gist.github.com/1759844) and I would like to proceed as follows. First, this issue must be fixed on master which is for ActiveRecord 3.2.x. Once it is fixed for that version, I will back port a fix for the 3-1-stable branch too. Second, I would like to get a failing test written using the [order_row_number] table I created with the matching SqlServerOrderRowNumber class in our "offset_and_limit_test_sqlserver.rb" file. This is pulled from the article you mentioned and it would be nice if we could write a simple test that would fail if the one line addition to the sqlserver arel visitor was removed. As a reminder, you can run one test file by doing:
That current work diff has a new method called # In active_record/relation/calculations.rb
def build_count_subquery(relation, column_name, distinct)
column_alias = Arel.sql('count_column')
subquery_alias = Arel.sql('subquery_for_count')
aliased_column = aggregate_column(column_name == :all ? 1 : column_name).as(column_alias)
relation.select_values = [aliased_column]
subquery = relation.arel.as(subquery_alias)
sm = Arel::SelectManager.new relation.engine
select_value = operation_over_aggregate_column(column_alias, 'count', distinct)
sm.project(select_value).from(subquery)
end So this still leaves two failing tests introduced by the fix. See this gist for details (https://gist.github.com/1759870). I would like to come up with a way to solve these two in a way that also removes the need for the count guard above. One thought I had after reading the error messages is to inject a top/limit when needed. But that has it's own concerns, but may not be so bad. For instance, we have a def visit_Arel_Nodes_UpdateStatement(o)
if o.orders.any? && o.limit.nil?
o.limit = Nodes::Limit.new(2147483647)
end
super
end Might it not be so bad if we injected a limit like that in the Working around our coerced test in "calculations_test_sqlserver.rb", with the test "test_coerced_offset_is_kept" which can be run in isolation by using ``, the following error occurs.
|
What I can say so far is that your proposed approach of injecting top/limit within See this gist for my current work: https://gist.github.com/1761619. Regarding the approach using a guard: I couldn't find a way (within It took me quite some time to get the environment all up and running so I didn't have time to write the test you proposed. But I can have a look at this tomorrow. |
Nice, I can QA this tomorrow and that other test would be awesome too, so we do not have to worry about a regression. All in all, are you happy with that TOP approach? I have not done any research on negative effects, but can not imagine any either. |
FYI, I have pushed updated 3.1 and 3.2 gems. Still would like to see that test, but not a high priority. |
Thanks a lot for this and also, for your great work in general on this adapter! I will write that test as soon as I have time and let you know. Regarding the TOP approach: I think from a pragmatic point of view I'm happy with it as I don't assume someone would select more than 2147483647 rows at once (if this is possible at all). I'm not sure if there are any performance drawbacks when using the order-clause in a subquery, but I rather don't think so. Probably, the guard approach would be cleaner, but only if we had a clean way to determine if the select statement is within a subquery or not, regardless of what the including outer query is. But as I mentioned above, I couldn't find a way to get this information within |
Yea, your are absolutely right about the guard not being viable. There is no way to get info on the parent query in an AST like that. And the one I wrote was very specific. I like the TOP better. FYI, I changed both of them to 9223372036854775807 which is the max value for bigint since SQL Server docs claim if that expression is not a float, it will cast it to bigint anyway. |
Hi Ken! Sorry that it took me so long, but i finally wrote the test case which fails when the one line addition to the sqlserver arel visitor is removed. Here is the gist for it: https://gist.github.com/2585835. |
Cool, I added the test! |
Starting from this conversation with Ken I investigated this issue a little more.
My conclusion is that there should be an explicit order-by clause at the end of the outer query for windowed results.
Let me explain what I found while investigating the execution plans of my queries.
In most cases (i.e. "not too complex queries") the ordering seems to be valid as the specified order in "OVER (ORDER BY …" takes effect. The problem is, this doesn't guarantee a valid sort order for all cases as the sql server query optimizer may add some additional sorts due to performance optimization and proper use of indices, which are above (with respect to the execution plan) the one we explicitly specified, thus, leading to an invalid order for the overall results of the query. If we had the order-by clause at the end of the outer query, we could guarantee that this - explicitly specified - sort order is always at the top of the execution plan, thus, leading to valid order of the overall results.
As a reference, I also found this blog post http://blogs.msdn.com/b/craigfr/archive/2008/03/19/ranking-functions-row-number.aspx, which states "Note that the ORDER BY clause associated with each ranking function determines the behavior of that ranking function only. These ORDER BY clauses do not determine the order that rows will be returned from the query. To guarantee absolutely a certain output order from the query, you must add an explicit query-level ORDER BY clause" and further "In this case, the extra ORDER BY clause does not affect the plan as the optimizer recognizes that the data is already sorted by column B following the computation of the ROW_NUMBER function".
The text was updated successfully, but these errors were encountered: