Skip to content

Commit 3b65e38

Browse files
committed
Fixed selecting DISTINCT rows from a table when ordering by a column that is not
part of the SELECT list where the join creates at least one extra row. While previously the test suite combined with Rails 4.0.0 looked like it worked, the constructed SQL would return incorrect values for FinderTest#test_find_with_order_on_included_associations_with_construct_finder_sql_for_association_limiting_and_is_distinct. By using a GROUP BY and MAX()/MIN() in the ORDER BY with no DISTINCT call, the order of the returned rows was not necessarily correct. With Rails 4.0.1, a related bug was fixed that introduces a new method, columns_for_distinct. Unfortunately by adding the columns from the ORDER BY clause to the SELECT list, the DISTINCT nature of the queries was altered. This combined with a test limiting the results to only 2 records, exposed the bug. From testing various SQL constructs to solve the issue, I ended up determining that to properly return a single value from the related table used in the ORDER BY, a query with a subquery utilizing ROW_NUMBER() and DENSE_RANK() was needed. The ROW_NUMBER() was partitioned over the originally requested columns, and then the outer query would filter the results to just the first row of each partition. DENSE_RANK() was used to generate a proper ordering of the partitions in relation to each other. In the process I also determined that extra code, plus an extra test, was necessary to ensure that when such a construct was generated using both a limit and offset, that the results would be correct also. See #306 for further discussion, plus example queries and results.
1 parent f117e2c commit 3b65e38

File tree

3 files changed

+165
-28
lines changed

3 files changed

+165
-28
lines changed

lib/active_record/connection_adapters/sqlserver/schema_statements.rb

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -44,17 +44,12 @@ def columns(table_name, _name = nil)
4444
end
4545

4646
# like postgres, sqlserver requires the ORDER BY columns in the select list for distinct queries, and
47-
# requires that the ORDER BY include the distinct column.
48-
# this method is idental to the postgres method
47+
# requires that the ORDER BY include the distinct column. Unfortunately, sqlserver does not support
48+
# DISTINCT ON () like Posgres, or FIRST_VALUE() like Oracle (at least before SQL Server 2012). Because
49+
# of these facts, we don't actually add any extra columns for distinct, but instead have to create
50+
# a subquery with ROW_NUMBER() and DENSE_RANK() in our monkey-patches to Arel.
4951
def columns_for_distinct(columns, orders) #:nodoc:
50-
order_columns = orders.map do |s|
51-
# Convert Arel node to string
52-
s = s.to_sql unless s.is_a?(String)
53-
# Remove any ASC/DESC modifiers
54-
s.gsub(/\s+(ASC|DESC)\s*(NULLS\s+(FIRST|LAST)\s*)?/i, '')
55-
end.reject(&:blank?).map.with_index { |column, i| "#{column} AS alias_#{i}" }
56-
57-
[super, *order_columns].join(', ')
52+
columns
5853
end
5954

6055
def rename_table(table_name, new_name)

lib/arel/visitors/sqlserver.rb

Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,8 @@ class SQLServer < Arel::Visitors::ToSql
77
def visit_Arel_Nodes_SelectStatement(o, a)
88
if complex_count_sql?(o)
99
visit_Arel_Nodes_SelectStatementForComplexCount(o, a)
10+
elsif distinct_non_present_orders?(o, a)
11+
visit_Arel_Nodes_SelectStatementDistinctNonPresentOrders(o, a)
1012
elsif o.offset
1113
visit_Arel_Nodes_SelectStatementWithOffset(o, a)
1214
else
@@ -47,6 +49,76 @@ def visit_Arel_Nodes_Bin(o, a)
4749

4850
# SQLServer ToSql/Visitor (Additions)
4951

52+
# This constructs a query using DENSE_RANK() and ROW_NUMBER() to allow
53+
# ordering a DISTINCT set of data by columns in another table that are
54+
# not part of what we actually want to be DISTINCT. Without this, it is
55+
# possible for the DISTINCT qualifier combined with TOP to return fewer
56+
# rows than were requested.
57+
def visit_Arel_Nodes_SelectStatementDistinctNonPresentOrders(o, a)
58+
core = o.cores.first
59+
projections = core.projections
60+
groups = core.groups
61+
orders = o.orders.uniq
62+
63+
select_frags = projections.map do |x|
64+
frag = projection_to_sql_remove_distinct(x, core, a)
65+
# Remove the table specifier
66+
frag.gsub!(/^[^\.]*\./, '')
67+
# If there is an alias, remove everything but
68+
frag.gsub(/^.*\sAS\s+/i, '')
69+
end
70+
71+
if o.offset
72+
select_frags << 'ROW_NUMBER() OVER (ORDER BY __order) AS __offset'
73+
else
74+
select_frags << '__order'
75+
end
76+
77+
projection_list = projections.map { |x| projection_to_sql_remove_distinct(x, core, a) }.join(', ')
78+
79+
sql = [
80+
('SELECT'),
81+
(visit(core.set_quantifier, a) if core.set_quantifier and !o.offset),
82+
(visit(o.limit, a) if o.limit and !o.offset),
83+
(select_frags.join(', ')),
84+
('FROM ('),
85+
('SELECT'),
86+
(
87+
[
88+
(projection_list),
89+
(', DENSE_RANK() OVER ('),
90+
("ORDER BY #{orders.map { |x| visit(x, a) }.join(', ')}" if !orders.empty?),
91+
(') AS __order'),
92+
(', ROW_NUMBER() OVER ('),
93+
("PARTITION BY #{projection_list}" if !orders.empty?),
94+
(" ORDER BY #{orders.map { |x| visit(x, a) }.join(', ')}" if !orders.empty?),
95+
(') AS __joined_row_num')
96+
].join('')
97+
),
98+
(source_with_lock_for_select_statement(o, a)),
99+
("WHERE #{core.wheres.map { |x| visit(x, a) }.join ' AND ' }" unless core.wheres.empty?),
100+
("GROUP BY #{groups.map { |x| visit(x, a) }.join ', ' }" unless groups.empty?),
101+
(visit(core.having, a) if core.having),
102+
(') AS __sq'),
103+
('WHERE __joined_row_num = 1'),
104+
('ORDER BY __order' unless o.offset)
105+
].compact.join(' ')
106+
107+
if o.offset
108+
sql = [
109+
('SELECT'),
110+
(visit(core.set_quantifier, a) if core.set_quantifier),
111+
(visit(o.limit, a) if o.limit),
112+
('*'),
113+
('FROM (' + sql + ') AS __osq'),
114+
("WHERE __offset > #{visit(o.offset.expr, a)}"),
115+
('ORDER BY __offset')
116+
].join(' ')
117+
end
118+
119+
sql
120+
end
121+
50122
def visit_Arel_Nodes_SelectStatementWithOutOffset(o, a, windowed = false)
51123
find_and_fix_uncorrelated_joins_in_select_statement(o)
52124
core = o.cores.first
@@ -123,6 +195,18 @@ def visit_Arel_Nodes_SelectStatementForComplexCount(o, a)
123195

124196
# SQLServer Helpers
125197

198+
def projection_to_sql_remove_distinct(x, core, a)
199+
frag = Arel.sql(visit(x, a))
200+
# In Rails 4.0.0, DISTINCT was in a projection, whereas with 4.0.1
201+
# it is now stored in the set_quantifier. This moves it to the correct
202+
# place so the code works on both 4.0.0 and 4.0.1.
203+
if frag =~ /^\s*DISTINCT\s+/i
204+
core.set_quantifier = Arel::Nodes::Distinct.new()
205+
frag.gsub!(/\s*DISTINCT\s+/, '')
206+
end
207+
frag
208+
end
209+
126210
def source_with_lock_for_select_statement(o, a)
127211
core = o.cores.first
128212
source = "FROM #{visit(core.source, a).strip}" if core.source
@@ -166,6 +250,68 @@ def single_distinct_select_statement?(o)
166250
p1.respond_to?(:include?) && p1.include?('DISTINCT'))
167251
end
168252

253+
# Determine if the SELECT statement is asking for DISTINCT results,
254+
# but is using columns not part of the SELECT list in the ORDER BY.
255+
# This is necessary because SQL Server requires all ORDER BY entries
256+
# be in the SELECT list with DISTINCT. However, these ordering columns
257+
# can cause duplicate rows, which affect when using a limit.
258+
def distinct_non_present_orders?(o, a)
259+
projections = o.cores.first.projections
260+
261+
sq = o.cores.first.set_quantifier
262+
p1 = projections.first
263+
264+
found_distinct = sq and sq.class.to_s =~ /Distinct/
265+
if (p1.respond_to?(:distinct) && p1.distinct) || (p1.respond_to?(:include?) && p1.include?('DISTINCT'))
266+
found_distinct = true
267+
end
268+
269+
return false if !found_distinct or o.orders.uniq.empty?
270+
271+
tables_all_columns = []
272+
expressions = projections.map do |p|
273+
visit(p, a).split(',').map do |x|
274+
x.strip!
275+
# Rails 4.0.0 included DISTINCT in the first projection
276+
x.gsub!(/\s*DISTINCT\s+/, '')
277+
# Aliased column names
278+
x.gsub!(/\s+AS\s+\w+/i, '')
279+
# Identifier quoting
280+
x.gsub!(/\[|\]/, '')
281+
star_match = /^(\w+)\.\*$/.match(x)
282+
if star_match
283+
tables_all_columns << star_match[1]
284+
end
285+
x.strip.downcase
286+
end.join(', ')
287+
end
288+
289+
# Make sure each order by is in the select list, otherwise there needs
290+
# to be a subquery with row_numbe()
291+
o.orders.uniq.each do |order|
292+
order = visit(order, a)
293+
order.strip!
294+
295+
order.gsub!(/\s+(asc|desc)/i, '')
296+
# Identifier quoting
297+
order.gsub!(/\[|\]/, '')
298+
299+
order.strip!
300+
order.downcase!
301+
302+
# If we selected all columns from a table, the order is ok
303+
table_match = /^(\w+)\.\w+$/.match(order)
304+
next if table_match and tables_all_columns.include?(table_match[1])
305+
306+
next if expressions.include?(order)
307+
308+
return true
309+
end
310+
311+
# We didn't find anything in the order by no being selected
312+
false
313+
end
314+
169315
def windowed_single_distinct_select_statement?(o)
170316
o.limit &&
171317
o.offset &&

test/cases/finder_test_sqlserver.rb

Lines changed: 14 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -10,34 +10,30 @@ class FinderTestSqlserver < ActiveRecord::TestCase
1010
end
1111

1212
class FinderTest < ActiveRecord::TestCase
13-
fixtures :authors, :author_addresses, :posts
13+
fixtures :authors, :author_addresses, :posts, :categorizations
1414
COERCED_TESTS = [
1515
:test_exists_does_not_select_columns_without_alias,
1616
:test_string_sanitation,
17-
:test_take_and_first_and_last_with_integer_should_use_sql_limit,
18-
:test_find_with_order_on_included_associations_with_construct_finder_sql_for_association_limiting_and_is_distinct
19-
17+
:test_take_and_first_and_last_with_integer_should_use_sql_limit
2018
]
2119

2220
include SqlserverCoercedTest
2321

2422

25-
# TODO This test passes in rails 4.0.0 but not 4.0.1-2
26-
def test_coerced_find_with_order_on_included_associations_with_construct_finder_sql_for_association_limiting_and_is_distinct
27-
p = Post.all.merge!(includes: { authors: :author_address },
28-
order: 'author_addresses.id DESC ',
29-
limit: 2)
30-
# ar_version = Gem.loaded_specs['activerecord'].version.version
31-
# arel_to_png(p, "#{ar_version}")
32-
count = p.to_a.size
33-
#puts "*****#{ActiveRecord::SQLCounter.log_all.join("\n\n")}"
34-
assert_equal 2, count
35-
36-
assert_equal 3, Post.all.merge!(includes: { author: :author_address, authors: :author_address},
37-
order: 'author_addresses_authors.id DESC ', limit: 3).to_a.size
23+
def test_find_with_order_on_included_associations_with_construct_finder_sql_for_association_limiting_and_is_distinct_and_offset
24+
# Based on test_find_with_order_on_included_associations_with_construct_finder_sql_for_association_limiting_and_is_distinct
25+
# and issue https://github.com/rails-sqlserver/activerecord-sqlserver-adapter/issues/306
26+
assert_equal(
27+
posts(:welcome, :thinking),
28+
Post.all.merge!(
29+
:includes => { authors: :author_address },
30+
:order => ['authors.name DESC'],
31+
:limit => 2,
32+
:offset => 1
33+
).to_a
34+
)
3835
end
3936

40-
4137
def test_coerced_exists_does_not_select_columns_without_alias
4238
assert_sql(/SELECT TOP \(1\) 1 AS one FROM \[topics\]/i) do
4339
Topic.exists?

0 commit comments

Comments
 (0)