Starting from this conversation with Ken I investigated this issue a little more.
My conclusion is that there should be an explicit order-by clause at the end of the outer query for windowed results.
Let me explain what I found while investigating the execution plans of my queries.
In most cases (i.e. "not too complex queries") the ordering seems to be valid as the specified order in "OVER (ORDER BY …" takes effect. The problem is, this doesn't guarantee a valid sort order for all cases as the sql server query optimizer may add some additional sorts due to performance optimization and proper use of indices, which are above (with respect to the execution plan) the one we explicitly specified, thus, leading to an invalid order for the overall results of the query. If we had the order-by clause at the end of the outer query, we could guarantee that this - explicitly specified - sort order is always at the top of the execution plan, thus, leading to valid order of the overall results.
As a reference, I also found this blog post http://blogs.msdn.com/b/craigfr/archive/2008/03/19/ranking-functions-row-number.aspx, which states "Note that the ORDER BY clause associated with each ranking function determines the behavior of that ranking function only. These ORDER BY clauses do not determine the order that rows will be returned from the query. To guarantee absolutely a certain output order from the query, you must add an explicit query-level ORDER BY clause" and further "In this case, the extra ORDER BY clause does not affect the plan as the optimizer recognizes that the data is already sorted by column B following the computation of the ROW_NUMBER function".
I am looking at this now, but have to come up with some guards as the ORDER BY is not allowed in sub queries and inline functions which are generated by some association counts.
Great, thanks! If you need need any more information or if I can help, just let me know.
OK, here is where I am at. This is a diff of my current work (https://gist.github.com/1759844) and I would like to proceed as follows. First, this issue must be fixed on master which is for ActiveRecord 3.2.x. Once it is fixed for that version, I will back port a fix for the 3-1-stable branch too. Second, I would like to get a failing test written using the [order_row_number] table I created with the matching SqlServerOrderRowNumber class in our "offset_and_limit_test_sqlserver.rb" file. This is pulled from the article you mentioned and it would be nice if we could write a simple test that would fail if the one line addition to the sqlserver arel visitor was removed.
As a reminder, you can run one test file by doing:
rake test TEST_FILES="test/cases/offset_and_limit_test_sqlserver.rb"
That current work diff has a new method called active_record_count_subquery? which specifically addresses how ActiveRecord construct's a sub query for aggregate functions. In this case, a count. Here is the ActiveRecord code I was hard coding that guard to.
# In active_record/relation/calculations.rb
def build_count_subquery(relation, column_name, distinct)
column_alias = Arel.sql('count_column')
subquery_alias = Arel.sql('subquery_for_count')
aliased_column = aggregate_column(column_name == :all ? 1 : column_name).as(column_alias)
relation.select_values = [aliased_column]
subquery = relation.arel.as(subquery_alias)
sm = Arel::SelectManager.new relation.engine
select_value = operation_over_aggregate_column(column_alias, 'count', distinct)
So this still leaves two failing tests introduced by the fix. See this gist for details (https://gist.github.com/1759870). I would like to come up with a way to solve these two in a way that also removes the need for the count guard above. One thought I had after reading the error messages is to inject a top/limit when needed. But that has it's own concerns, but may not be so bad. For instance, we have a visit_Arel_Nodes_UpdateStatement limit which does this:
if o.orders.any? && o.limit.nil?
o.limit = Nodes::Limit.new(2147483647)
Might it not be so bad if we injected a limit like that in the visit_Arel_Nodes_SelectStatementWithOffset method if none is there and just take the active_record_count_subquery? guard out? Perhaps. Thoughts? Can you do some work on this and run some tests too? Again, focus on master first.
Working around our coerced test in "calculations_test_sqlserver.rb", with the test "test_coerced_offset_is_kept" which can be run in isolation by using ``, the following error occurs.
ActiveRecord::StatementInvalid: TinyTds::Error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.:
EXEC sp_executesql N'SELECT COUNT(count_column) FROM (SELECT [__rnt].* FROM ( SELECT ROW_NUMBER() OVER (ORDER BY [accounts].[id] ASC) AS [__rn], 1 AS count_column FROM [accounts] ) AS [__rnt] WHERE [__rnt].[__rn] > (1) ORDER BY [__rnt].[__rn] ASC) subquery_for_count'
What I can say so far is that your proposed approach of injecting top/limit within #visit_Arel_Nodes_SelectStatementWithOffset doesn't break any test, so this could be a good solution. What are the concerns to this?
See this gist for my current work: https://gist.github.com/1761619.
Regarding the approach using a guard: I couldn't find a way (within #visit_Arel_Nodes_SelectStatementWithOffset) to determine whether or not the select statement would be used as a subquery within an update statement (leading to these two errors: https://gist.github.com/1759870).
It took me quite some time to get the environment all up and running so I didn't have time to write the test you proposed. But I can have a look at this tomorrow.
Nice, I can QA this tomorrow and that other test would be awesome too, so we do not have to worry about a regression. All in all, are you happy with that TOP approach? I have not done any research on negative effects, but can not imagine any either.
Add explicit order-by clause for windowed results. Fixes #161.
FYI, I have pushed updated 3.1 and 3.2 gems. Still would like to see that test, but not a high priority.
Thanks a lot for this and also, for your great work in general on this adapter!
I will write that test as soon as I have time and let you know.
Regarding the TOP approach: I think from a pragmatic point of view I'm happy with it as I don't assume someone would select more than 2147483647 rows at once (if this is possible at all). I'm not sure if there are any performance drawbacks when using the order-clause in a subquery, but I rather don't think so.
Probably, the guard approach would be cleaner, but only if we had a clean way to determine if the select statement is within a subquery or not, regardless of what the including outer query is. But as I mentioned above, I couldn't find a way to get this information within #visit_Arel_Nodes_SelectStatementWithOffset.
Yea, your are absolutely right about the guard not being viable. There is no way to get info on the parent query in an AST like that. And the one I wrote was very specific. I like the TOP better. FYI, I changed both of them to 9223372036854775807 which is the max value for bigint since SQL Server docs claim if that expression is not a float, it will cast it to bigint anyway.
Hi Ken! Sorry that it took me so long, but i finally wrote the test case which fails when the one line addition to the sqlserver arel visitor is removed. Here is the gist for it: https://gist.github.com/2585835.
Add test for issue #161
Cool, I added the test!