Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Made `first` finder consistent among database engines by adding a default order #5153

Merged
merged 6 commits into from
@mhfs

Hi,

When you do a call to first the generated SQL is this:

> User.first
  User Load (35.2ms)  SELECT "users".* FROM "users" LIMIT 1

However, PostgreSQL changes the 'default order' of the records when you perform an update as you can observe in this example:

# CREATE TABLE users (id serial, name varchar(40));
# INSERT INTO users (name) VALUES ('John');
# INSERT INTO users (name) VALUES ('Michael');
# SELECT * FROM users;
 id |     name      
----+---------------
  1 | John
  2 | Michael

# SELECT "users".* FROM "users" LIMIT 1;
 id | name 
----+------
  1 | John

# UPDATE users SET name = 'Modified John' WHERE id = 1;
# SELECT * FROM users;
 id |     name      
----+---------------
  2 | Michael
  1 | Modified John

# SELECT "users".* FROM "users" LIMIT 1;
 id |  name   
----+---------
  2 | Michael

So I made a change so that all first calls gains a default order clause:

> User.first
  User Load (3.1ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT 1

If an order clause is already chained, nothing is added:

> User.order(:name).first
  User Load (75.1ms)  SELECT "users".* FROM "users" ORDER BY name LIMIT 1

I'm not particularly confortable changing the global behavior for a PostgreSQL specific fix but I think an adapter check wouldn't make sense in this area of the code. What do you think?

If accepted I'd be happy to backport the fix to previous versions.

Cheers!

@mhfs

Forgot to mention issue #5103 is related to this fix.

@panthomakos

It seems to me that having an implicit order clause in first is more confusing than just letting the backend return whatever it's going to return. There is no ordering to an SQL table by default, so adding one via the Rails framework seems artificial. This could also cause performance pains, especially for heavily sharded databases. I think it would be better to allow users to customize this behavior on their own, for instance by using a scope:

scope :first_by_id, order('id ASC').first
@mhfs

@panthomakos thanks for jumping in. I'm very sympathetic about your concerns however I believe the key issue in this matter is the interpretation of what first should do. Right now the behavior of first for a PostgreSQL app seems asymmetric to last. For instance, in a table with multiple records, first and last can happen to be the same record.

IMO that's wrong and first's responsability is beyond giving me the first record according to the database implementation point of view. In the way I interpret first, it should return the first inserted record.

Another point that I consider important is the fact that (I believe) Rails apps should be consistent running against different databases. I found this issue because a bug came up while migrating a MySQL app to PostgreSQL and at least for me it's very undesirable.

And last but not least, not sure if performance is an issue here since adding an order by primary key is exactly what last does by default. The performance cost wouldn't go beyond that. If this order hits performance, limiting by 1 without using first is always an option, but I think the default behavior should be the one from the patch.

@mhfs

@tenderlove @jonleighton would you guys take a look at this? :heart: :heart:

@mhfs

@pragdave just wrote an article about this behavior. much better explained than I did above. =D

http://pragdave.blogs.pragprog.com/pragdave/2012/03/a-subtle-potential-bug-in-most-rails-applications.html

@Systho

I most often use the first method when I know there s only one result to the query and I wouldn't like Rail to add and order clause silently to my requests.

If we were going from scratch I would agree with you that first and last should imply a idea of order and there is a missing '#single' method which should do the current behaviour of first. Unfortunately what you are proposing is to add an order clause to almost all queries which do not need it

@pragdave

@sysho—last() already has an order...

And, frankly, I don't care about first(). It doesn't mean much. But I do think that Rails should at least warn on any case of offset() being used without an order() clause.

@Systho

@pragdave I agree, offset without order doesn't mean much.

I know that last()already has an order, what I meant is that most Rails code uses first() as take(1)not as the opposite of last()

@tenderlove
Owner

I'm against adding an order to first. We might be able to overcome these issues, but I feel like I need to list them off before we make any changes.

1. Backwards compatibility and deprecation

How do we deprecate this behavior and remain backwards compatible? Presumably PG users that use this feature may depend on it's current behavior.

2. What should we order by?

Primary key doesn't necessarily indicate which record was the first to be added. Some replicating / sharding databases chunk up their primary keys (and in fact we have one at work). We could sort by created_at columns, but those might not be present.

To be frank, a database table seems like an unordered list to me. It doesn't surprise me that asking for first without specifying how you want the list to be sorted will return an unpredictable answer.

Maybe the best solution would be:

  1. Add a warning if first is called without an order
  2. Add a take method for people that use first as take(1)

Thoughts?

@pragdave

How do we deprecate this behavior and remain backwards compatible?

I'm not sure this has ever been a Rails priority :),

Primary key doesn't necessarily indicate which record was the first to be added.

I agree, but the last() method already assumes that ordering is based on id(). If first and last are related methods (and I'd think that based on the names alone 99.9% of used would say they were), the a change to first() would seem tidy.

However, I don't think the issue is first(). Maybe, in an ideal world, you'd rename first to be "one" or "row" and take a year to deprecate first, but I'm not convinced.

However, I do think it would be very worthwhile to add a warning to an ARel chain that uses offset() but not order(), because the use of offset() really does imply that you want deterministic order.

@panthomakos

I agree with @tenderlove on this issue. In fact I think the issue has more to do with the behavior of last than it does with first. I think a possible solution could be:

  1. Add a warning if first or last are called without an order.
  2. Remove the ordering from last.

That way the behavior is consistent without having to deal with primary key ordering. This also removes the assumption that first == "the first record that was inserted" which is not something that adding an ordering will solve (as @tenderlove points out PK is not an indicator of this).

@tenderlove
Owner

How do we deprecate this behavior and remain backwards compatible?

I'm not sure this has ever been a Rails priority :),

Obviously I'm damned if I do, and damned if I don't (break backwards compat). :-P

Would you like to volunteer for fielding complaints when someone reports a ticket about this changing? If so, let's do it. But we must add a take method to give people who rely on the behavior an upgrade path.

I agree, but the last() method already assumes that ordering is based on id().

Yes, and I'm saying that if the table is an unordered collection, you shouldn't rely on last ordering by id. Maybe I should file a bug that last doesn't work correctly on my sharded tables?

However, I do think it would be very worthwhile to add a warning to an ARel chain that uses offset() but not order(), because the use of offset() really does imply that you want deterministic order.

Agreed.

@Systho

I agree with making a warning when either first or last is called without an order and make a new method (any, take, single, one ...) which does the job of limiting the query when no order is given.

I actually think than take(1) should be an alias for limit(1) and therefore return a collection and not a single object.

I would go for single()or one() personally

@pragdave

Aaron

I agree 100% (and I was just being cynical when talking about compatibility). FWIW, I think the answer is actually to slowly deprecate both first and last, because the names are really not appropriate to a relational wrapper. Replace both with the method row(), or one_row(), or whatever. The deprecation warning for first() could be "replace with .one_row.order()" and for last() you just add in a "desc"

@panthomakos

If we assume:

  1. Database tables are unordered lists.
  2. first and last are not behaving consistently.

Then it seems reasonable to me to make this conclusion:

last should not order by PK.

If that's true then why should we create a bunch of new methods? I think the issue is with last not first. Both of these methods make sense when you have an ordered list.

Using something like single or one indicates nothing about the position of the element. And limit already exists, adding another method named take does not seem to resolve the issue.

@pragdave

@panthomakos I don't disagree, but if we treat tables as sets, not ordered lists, why not just alias first() to last()?

@panthomakos

@pragdave I think the reason not to do this is that first and last make sense when an order clause is present. For instance (assuming score is unique in this case):

order('score DESC').first == order('score ASC').last
order('score DESC').last == order('score ASC').first

Having this kind of behavior is useful and aliasing first to last would not work.

@tenderlove
Owner

@pragdave

I agree 100% (and I was just being cynical when talking about compatibility).

I know. I was giving you a hard time, with the possible outcome of having a scape goat if we decide to break things. ;-)

@panthomakos I think we're talking about introducing one method: take. The name has no implication of order, and Enumerable implements it too:

irb(main):001:0> [1,2,3,4].take 2
=> [1, 2]
irb(main):002:0>

Though I think we should extend it to be similar to first, in that first can take 0 arguments, but we would have take return any particular value from the collection.

irb(main):002:0> [1,2,3,4].first
=> 1
irb(main):003:0> [1,2,3,4].first 2
=> [1, 2]
irb(main):004:0>

I think it's safe to assume that database tables are unordered lists. Given that fact, no developer can make assumptions about the return value of either first or last. If no developer can make assumptions about the return value, maybe it's OK to order by id with first, but emit a warning if no order was provided.

@Systho

@tenderlove If we want to be compliant with the Enumerable module (which is good), then 1,2,3,4].take(1)should return [1]

I think everybody agrees that database tables are sets and therefore first(), last() and offset() should issue warnings and optionally add an order silently (but if so, the default order should be present and the same for the 3 methods)

There is also an obvious need for the get_a_single_record_and_i_dont_care_which_one(), which shouldn't be by take() because take() should return a collection.

I like the idea of aliasing take() to first() to be closer to the Enumerable module

@mhfs

Hi all,

It might be just me but I see a lot of relation and order implied in the names first and last. And not the unordered kind of order. =D

Maybe the lack of symmetry between them annoys me because of the influence of Array's first/last.

@tenderlove I don't know if I see a point in adding a default order and warning at the same time. Looks like not having a default to me... "I work but don't use me". =D

IMO we should pick one way or the other.

1) Assume a large number of people do use non chunked sequential primary keys and expect first to respect the primary key order like last do and don't warn at all (I'd guess this is most of the people - the minority would have to adjust).

2) Assume the developer should have no expectation on first's order, always warning if no order was chained and documenting properly on both first and last.

I prefer 1 by far.

Regardless of the destiny of first and last I agree with @pragdave that we should warn about offset being used without a chained order.

I think take (or whatever name we pick - I like take and would go with it) should behave the same as AR's first and return a record and not a collection for no params or 1.

Once consensus is reached I'll be happy to update the pull request.

Cheers!

@Systho

I think the destiny of offset(), first() and last() should be bound, and I agree with you that ordering by id when no order is given is a very convenient approach but I wonder if it won't be a dramatic loss in performance (especially for first() which is virtually the only other way than find() to get a single record ). If the performance loss is real, then a warning should be issued IMHO.

about the take(), please do not make the return type based on the parameter value : it will be the hell to use. It's probably better not to do anything than to name it like the Enumerable#take and make it behave differently.

There is still a need for a method which would extract a single record from the relation without ordering it : I like both single() and one() but since this will be a method used very often, I would go for the shorter one()

@tenderlove
Owner

@Systho yes, if you provide take with an argument, it should always return a list. I'm saying that if you provide take with no arguments it should return just one. This slightly deviates from Enumerable#take in that Enumerable#take requires a parameter:

irb(main):001:0> [1,2,3,4].take
ArgumentError: wrong number of arguments (0 for 1)
    from (irb):1:in `take'
    from (irb):1
    from /Users/aaron/.local/bin/irb:12:in `<main>'
irb(main):002:0>
@Systho

@tenderlove OK, I didn't understand what you meant

@carlosantoniodasilva

I'm just going to give a naming idea related on take vs first vs one, that seems reasonable enough for a method that will return either 1 record or a list of records if an argument is given, without relying on ordering:

>> [1,2,3,4].sample 
=> 3
>> [1,2,3,4].sample 2
=> [4, 2]
@Systho

I think that sample() implies some statistical properties of randomness. I prefer the take() verb wich IMHO is closer to the meaning "Do not make any assumption on ordering"

@mhfs

@tenderlove would you give final words on which direction we'll take?

1) Will we add default order to first?
2) Will we warn when first's default order is used?
3) Will we add default order to offset?
4) Will we warn when offset's default order is used?
5) Will we add take? Will it have this name?

Just want to be sure before writing code...

Thanks!

@tenderlove
Owner

1) Will we add default order to first?

Yes, in master.

2) Will we warn when first's default order is used?

Yes

3) Will we add default order to offset?
4) Will we warn when offset's default order is used?

I think we should, but let's open a new ticket and discuss there. I like to keep a "paper" trail. ;-)

5) Will we add take? Will it have this name?

Yes.

@mhfs

@tenderlove just found out that the "first without order" warning would be shown 4389 times during AR's test suite (mysql only). what should we do to these guys? change them to take or add an explicit order?

wasn't expecting such a large number...

@tenderlove
Owner

@mhfs maybe we shouldn't add the warning.....

I wonder if adding more documentation on the first method would be an acceptable solution.

@mhfs

I incline toward adding a default order without the warning but the final call is yours. =D

@tenderlove
Owner

@mhfs ya, let's not add the warning, but please make sure we note this in the documentation.

@mhfs

@tenderlove I just gave it a try. Could you review it please? Thanks!

...verecord/lib/active_record/relation/finder_methods.rb
@@ -312,11 +345,24 @@ def find_some(ids)
end
end
+ def find_take
+ if loaded?
+ @records.take(1)[0]
+ else
+ @take ||= limit(1).to_a[0]
@tenderlove Owner

Can you use to_a.first in these cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
...verecord/lib/active_record/relation/finder_methods.rb
@@ -67,7 +89,15 @@ def find_by!(*args)
# Person.where(["user_name = :u", { :u => user_name }]).first
# Person.order("created_on DESC").offset(5).first
def first(limit = nil)
- limit ? limit(limit).to_a : find_first
+ if limit
+ if order_values.empty? && primary_key
+ order("#{quoted_table_name}.#{quoted_primary_key} ASC").limit(limit).to_a
@tenderlove Owner

Can you check to see if there is a way to do this with ARel nodes? I want to reduce the amount of raw SQL string concatenation we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mhfs

@tenderlove just pushed the changes you asked. I had to make a minor unexpected change to PostgreSQLAdapter#distinct since it expects only strings as order clauses. Using arel nodes as you suggested broke some tests.

Would you please check if there's a better way of doing that?

@rafaelfranca

hey @mhfs could you add a "Fixes #5103" in the message of the first commit of this pull request? This will make my work easier.

Thanks.

@mhfs

@rafaelfranca done.

@tenderlove it's ready for you.

@tenderlove tenderlove merged commit 2601042 into from
@spastorino spastorino merged commit 878db1f into from
@vijaydev
Collaborator

@mhfs Can you document this in the AR query guide please?

@mhfs

@vijaydev sure! will do it in the weekend and ping you back.

@mhfs

@tenderlove FYI I just noticed (see #6147) some finders (e.g. find_one) use first internally and consequently are getting undesirable order bys in its queries. I'll submit a new PR to make those finders use take instead.

@mtparet

I'm wondering how many rails app has been impacted by the performance problem of using 'order_by' (before replacing .first by .take)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.
View
6 activerecord/CHANGELOG.md
@@ -1,5 +1,11 @@
## Rails 4.0.0 (unreleased) ##
+* Added default order to `first` to assure consistent results among
+ diferent database engines. Introduced `take` as a replacement to
+ the old behavior of `first`.
+
+ *Marcelo Silveira*
+
* Added an :index option to automatically create indexes for references
and belongs_to statements in migrations.
View
5 activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb
@@ -1233,7 +1233,10 @@ def distinct(columns, orders) #:nodoc:
# Construct a clean list of column names from the ORDER BY clause, removing
# any ASC/DESC modifiers
- order_columns = orders.collect { |s| s.gsub(/\s+(ASC|DESC)\s*(NULLS\s+(FIRST|LAST)\s*)?/i, '') }
+ order_columns = orders.collect do |s|
+ s = s.to_sql unless s.is_a?(String)
+ s.gsub(/\s+(ASC|DESC)\s*(NULLS\s+(FIRST|LAST)\s*)?/i, '')
+ end
order_columns.delete_if { |c| c.blank? }
order_columns = order_columns.zip((0...order_columns.size).to_a).map { |s,i| "#{s} AS alias_#{i}" }
View
2  activerecord/lib/active_record/querying.rb
@@ -3,7 +3,7 @@
module ActiveRecord
module Querying
- delegate :find, :first, :first!, :last, :last!, :all, :exists?, :any?, :many?, :to => :scoped
+ delegate :find, :take, :take!, :first, :first!, :last, :last!, :all, :exists?, :any?, :many?, :to => :scoped
delegate :first_or_create, :first_or_create!, :first_or_initialize, :to => :scoped
delegate :find_by, :find_by!, :to => :scoped
delegate :destroy, :destroy_all, :delete, :delete_all, :update, :update_all, :to => :scoped
View
56 activerecord/lib/active_record/relation/finder_methods.rb
@@ -60,6 +60,28 @@ def find_by!(*args)
where(*args).first!
end
+ # Gives a record (or N records if a parameter is supplied) without any implied
+ # order. The order will depend on the database implementation.
+ # If an order is supplied it will be respected.
+ #
+ # Examples:
+ #
+ # Person.take # returns an object fetched by SELECT * FROM people
+ # Person.take(5) # returns 5 objects fetched by SELECT * FROM people LIMIT 5
+ # Person.where(["name LIKE '%?'", name]).take
+ def take(limit = nil)
+ limit ? limit(limit).to_a : find_take
+ end
+
+ # Same as +take+ but raises <tt>ActiveRecord::RecordNotFound</tt> if no record
+ # is found. Note that <tt>take!</tt> accepts no arguments.
+ def take!
+ take or raise RecordNotFound
+ end
+
+ # Find the first record (or first N records if a parameter is supplied).
+ # If no order is defined it will order by primary key.
+ #
# Examples:
#
# Person.first # returns the first object fetched by SELECT * FROM people
@@ -67,7 +89,15 @@ def find_by!(*args)
# Person.where(["user_name = :u", { :u => user_name }]).first
# Person.order("created_on DESC").offset(5).first
def first(limit = nil)
- limit ? limit(limit).to_a : find_first
+ if limit
+ if order_values.empty? && primary_key
+ order(arel_table[primary_key].asc).limit(limit).to_a
+ else
+ limit(limit).to_a
+ end
+ else
+ find_first
+ end
end
# Same as +first+ but raises <tt>ActiveRecord::RecordNotFound</tt> if no record
@@ -76,6 +106,9 @@ def first!
first or raise RecordNotFound
end
+ # Find the last record (or last N records if a parameter is supplied).
+ # If no order is defined it will order by primary key.
+ #
# Examples:
#
# Person.last # returns the last object fetched by SELECT * FROM people
@@ -83,8 +116,8 @@ def first!
# Person.order("created_on DESC").offset(5).last
def last(limit = nil)
if limit
- if order_values.empty?
- order("#{primary_key} DESC").limit(limit).reverse
+ if order_values.empty? && primary_key
+ order(arel_table[primary_key].desc).limit(limit).reverse
else
to_a.last(limit)
end
@@ -315,11 +348,24 @@ def find_some(ids)
end
end
+ def find_take
+ if loaded?
+ @records.take(1).first
+ else
+ @take ||= limit(1).to_a.first
+ end
+ end
+
def find_first
if loaded?
@records.first
else
- @first ||= limit(1).to_a[0]
+ @first ||=
+ if order_values.empty? && primary_key
+ order(arel_table[primary_key].asc).limit(1).to_a.first
+ else
+ limit(1).to_a.first
+ end
end
end
@@ -331,7 +377,7 @@ def find_last
if offset_value || limit_value
to_a.last
else
- reverse_order.limit(1).to_a[0]
+ reverse_order.limit(1).to_a.first
end
end
end
View
32 activerecord/test/cases/finder_test.rb
@@ -143,6 +143,26 @@ def test_find_by_sql_with_sti_on_joined_table
assert_equal [Account], accounts.collect(&:class).uniq
end
+ def test_take
+ assert_equal topics(:first), Topic.take
+ end
+
+ def test_take_failing
+ assert_nil Topic.where("title = 'This title does not exist'").take
+ end
+
+ def test_take_bang_present
+ assert_nothing_raised do
+ assert_equal topics(:second), Topic.where("title = 'The Second Topic of the day'").take!
+ end
+ end
+
+ def test_take_bang_missing
+ assert_raises ActiveRecord::RecordNotFound do
+ Topic.where("title = 'This title does not exist'").take!
+ end
+ end
+
def test_first
assert_equal topics(:second).title, Topic.where("title = 'The Second Topic of the day'").first.title
end
@@ -163,6 +183,12 @@ def test_first_bang_missing
end
end
+ def test_first_have_primary_key_order_by_default
+ expected = topics(:first)
+ expected.touch # PostgreSQL changes the default order if no order clause is used
+ assert_equal expected, Topic.first
+ end
+
def test_model_class_responds_to_first_bang
assert Topic.first!
Topic.delete_all
@@ -191,7 +217,8 @@ def test_model_class_responds_to_last_bang
end
end
- def test_first_and_last_with_integer_should_use_sql_limit
+ def test_take_and_first_and_last_with_integer_should_use_sql_limit
+ assert_sql(/LIMIT 3|ROWNUM <= 3/) { Topic.take(3).entries }
assert_sql(/LIMIT 2|ROWNUM <= 2/) { Topic.first(2).entries }
assert_sql(/LIMIT 5|ROWNUM <= 5/) { Topic.last(5).entries }
end
@@ -212,7 +239,8 @@ def test_last_with_integer_and_reorder_should_not_use_sql_limit
assert_no_match(/LIMIT/, query.first)
end
- def test_first_and_last_with_integer_should_return_an_array
+ def test_take_and_first_and_last_with_integer_should_return_an_array
+ assert_kind_of Array, Topic.take(5)
assert_kind_of Array, Topic.first(5)
assert_kind_of Array, Topic.last(5)
end
Something went wrong with that request. Please try again.