Skip to content

Made `first` finder consistent among database engines by adding a default order #5153

Merged
merged 6 commits into from May 3, 2012
@mhfs
mhfs commented Feb 24, 2012

Hi,

When you do a call to first the generated SQL is this:

> User.first
  User Load (35.2ms)  SELECT "users".* FROM "users" LIMIT 1

However, PostgreSQL changes the 'default order' of the records when you perform an update as you can observe in this example:

# CREATE TABLE users (id serial, name varchar(40));
# INSERT INTO users (name) VALUES ('John');
# INSERT INTO users (name) VALUES ('Michael');
# SELECT * FROM users;
 id |     name      
----+---------------
  1 | John
  2 | Michael

# SELECT "users".* FROM "users" LIMIT 1;
 id | name 
----+------
  1 | John

# UPDATE users SET name = 'Modified John' WHERE id = 1;
# SELECT * FROM users;
 id |     name      
----+---------------
  2 | Michael
  1 | Modified John

# SELECT "users".* FROM "users" LIMIT 1;
 id |  name   
----+---------
  2 | Michael

So I made a change so that all first calls gains a default order clause:

> User.first
  User Load (3.1ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT 1

If an order clause is already chained, nothing is added:

> User.order(:name).first
  User Load (75.1ms)  SELECT "users".* FROM "users" ORDER BY name LIMIT 1

I'm not particularly confortable changing the global behavior for a PostgreSQL specific fix but I think an adapter check wouldn't make sense in this area of the code. What do you think?

If accepted I'd be happy to backport the fix to previous versions.

Cheers!

@mhfs
mhfs commented Feb 24, 2012

Forgot to mention issue #5103 is related to this fix.

@panthomakos

It seems to me that having an implicit order clause in first is more confusing than just letting the backend return whatever it's going to return. There is no ordering to an SQL table by default, so adding one via the Rails framework seems artificial. This could also cause performance pains, especially for heavily sharded databases. I think it would be better to allow users to customize this behavior on their own, for instance by using a scope:

scope :first_by_id, order('id ASC').first
@mhfs
mhfs commented Mar 1, 2012

@panthomakos thanks for jumping in. I'm very sympathetic about your concerns however I believe the key issue in this matter is the interpretation of what first should do. Right now the behavior of first for a PostgreSQL app seems asymmetric to last. For instance, in a table with multiple records, first and last can happen to be the same record.

IMO that's wrong and first's responsability is beyond giving me the first record according to the database implementation point of view. In the way I interpret first, it should return the first inserted record.

Another point that I consider important is the fact that (I believe) Rails apps should be consistent running against different databases. I found this issue because a bug came up while migrating a MySQL app to PostgreSQL and at least for me it's very undesirable.

And last but not least, not sure if performance is an issue here since adding an order by primary key is exactly what last does by default. The performance cost wouldn't go beyond that. If this order hits performance, limiting by 1 without using first is always an option, but I think the default behavior should be the one from the patch.

@mhfs
mhfs commented Mar 14, 2012

@tenderlove @jonleighton would you guys take a look at this? ❤️ ❤️

@mhfs
mhfs commented Mar 22, 2012

@pragdave just wrote an article about this behavior. much better explained than I did above. =D

http://pragdave.blogs.pragprog.com/pragdave/2012/03/a-subtle-potential-bug-in-most-rails-applications.html

@Systho
Systho commented Mar 22, 2012

I most often use the first method when I know there s only one result to the query and I wouldn't like Rail to add and order clause silently to my requests.

If we were going from scratch I would agree with you that first and last should imply a idea of order and there is a missing '#single' method which should do the current behaviour of first. Unfortunately what you are proposing is to add an order clause to almost all queries which do not need it

@pragdave

@sysholast() already has an order...

And, frankly, I don't care about first(). It doesn't mean much. But I do think that Rails should at least warn on any case of offset() being used without an order() clause.

@Systho
Systho commented Mar 22, 2012

@pragdave I agree, offset without order doesn't mean much.

I know that last()already has an order, what I meant is that most Rails code uses first() as take(1)not as the opposite of last()

@tenderlove
Ruby on Rails member

I'm against adding an order to first. We might be able to overcome these issues, but I feel like I need to list them off before we make any changes.

1. Backwards compatibility and deprecation

How do we deprecate this behavior and remain backwards compatible? Presumably PG users that use this feature may depend on it's current behavior.

2. What should we order by?

Primary key doesn't necessarily indicate which record was the first to be added. Some replicating / sharding databases chunk up their primary keys (and in fact we have one at work). We could sort by created_at columns, but those might not be present.

To be frank, a database table seems like an unordered list to me. It doesn't surprise me that asking for first without specifying how you want the list to be sorted will return an unpredictable answer.

Maybe the best solution would be:

  1. Add a warning if first is called without an order
  2. Add a take method for people that use first as take(1)

Thoughts?

@pragdave

How do we deprecate this behavior and remain backwards compatible?

I'm not sure this has ever been a Rails priority :),

Primary key doesn't necessarily indicate which record was the first to be added.

I agree, but the last() method already assumes that ordering is based on id(). If first and last are related methods (and I'd think that based on the names alone 99.9% of used would say they were), the a change to first() would seem tidy.

However, I don't think the issue is first(). Maybe, in an ideal world, you'd rename first to be "one" or "row" and take a year to deprecate first, but I'm not convinced.

However, I do think it would be very worthwhile to add a warning to an ARel chain that uses offset() but not order(), because the use of offset() really does imply that you want deterministic order.

@panthomakos

I agree with @tenderlove on this issue. In fact I think the issue has more to do with the behavior of last than it does with first. I think a possible solution could be:

  1. Add a warning if first or last are called without an order.
  2. Remove the ordering from last.

That way the behavior is consistent without having to deal with primary key ordering. This also removes the assumption that first == "the first record that was inserted" which is not something that adding an ordering will solve (as @tenderlove points out PK is not an indicator of this).

@tenderlove
Ruby on Rails member

How do we deprecate this behavior and remain backwards compatible?

I'm not sure this has ever been a Rails priority :),

Obviously I'm damned if I do, and damned if I don't (break backwards compat). :-P

Would you like to volunteer for fielding complaints when someone reports a ticket about this changing? If so, let's do it. But we must add a take method to give people who rely on the behavior an upgrade path.

I agree, but the last() method already assumes that ordering is based on id().

Yes, and I'm saying that if the table is an unordered collection, you shouldn't rely on last ordering by id. Maybe I should file a bug that last doesn't work correctly on my sharded tables?

However, I do think it would be very worthwhile to add a warning to an ARel chain that uses offset() but not order(), because the use of offset() really does imply that you want deterministic order.

Agreed.

@Systho
Systho commented Mar 22, 2012

I agree with making a warning when either first or last is called without an order and make a new method (any, take, single, one ...) which does the job of limiting the query when no order is given.

I actually think than take(1) should be an alias for limit(1) and therefore return a collection and not a single object.

I would go for single()or one() personally

@pragdave

Aaron

I agree 100% (and I was just being cynical when talking about compatibility). FWIW, I think the answer is actually to slowly deprecate both first and last, because the names are really not appropriate to a relational wrapper. Replace both with the method row(), or one_row(), or whatever. The deprecation warning for first() could be "replace with .one_row.order()" and for last() you just add in a "desc"

@panthomakos

If we assume:

  1. Database tables are unordered lists.
  2. first and last are not behaving consistently.

Then it seems reasonable to me to make this conclusion:

last should not order by PK.

If that's true then why should we create a bunch of new methods? I think the issue is with last not first. Both of these methods make sense when you have an ordered list.

Using something like single or one indicates nothing about the position of the element. And limit already exists, adding another method named take does not seem to resolve the issue.

@pragdave

@panthomakos I don't disagree, but if we treat tables as sets, not ordered lists, why not just alias first() to last()?

@panthomakos

@pragdave I think the reason not to do this is that first and last make sense when an order clause is present. For instance (assuming score is unique in this case):

order('score DESC').first == order('score ASC').last
order('score DESC').last == order('score ASC').first

Having this kind of behavior is useful and aliasing first to last would not work.

@tenderlove
Ruby on Rails member

@pragdave

I agree 100% (and I was just being cynical when talking about compatibility).

I know. I was giving you a hard time, with the possible outcome of having a scape goat if we decide to break things. ;-)

@panthomakos I think we're talking about introducing one method: take. The name has no implication of order, and Enumerable implements it too:

irb(main):001:0> [1,2,3,4].take 2
=> [1, 2]
irb(main):002:0>

Though I think we should extend it to be similar to first, in that first can take 0 arguments, but we would have take return any particular value from the collection.

irb(main):002:0> [1,2,3,4].first
=> 1
irb(main):003:0> [1,2,3,4].first 2
=> [1, 2]
irb(main):004:0>

I think it's safe to assume that database tables are unordered lists. Given that fact, no developer can make assumptions about the return value of either first or last. If no developer can make assumptions about the return value, maybe it's OK to order by id with first, but emit a warning if no order was provided.

@Systho
Systho commented Mar 22, 2012

@tenderlove If we want to be compliant with the Enumerable module (which is good), then 1,2,3,4].take(1)should return [1]

I think everybody agrees that database tables are sets and therefore first(), last() and offset() should issue warnings and optionally add an order silently (but if so, the default order should be present and the same for the 3 methods)

There is also an obvious need for the get_a_single_record_and_i_dont_care_which_one(), which shouldn't be by take() because take() should return a collection.

I like the idea of aliasing take() to first() to be closer to the Enumerable module

@mhfs
mhfs commented Mar 22, 2012

Hi all,

It might be just me but I see a lot of relation and order implied in the names first and last. And not the unordered kind of order. =D

Maybe the lack of symmetry between them annoys me because of the influence of Array's first/last.

@tenderlove I don't know if I see a point in adding a default order and warning at the same time. Looks like not having a default to me... "I work but don't use me". =D

IMO we should pick one way or the other.

1) Assume a large number of people do use non chunked sequential primary keys and expect first to respect the primary key order like last do and don't warn at all (I'd guess this is most of the people - the minority would have to adjust).

2) Assume the developer should have no expectation on first's order, always warning if no order was chained and documenting properly on both first and last.

I prefer 1 by far.

Regardless of the destiny of first and last I agree with @pragdave that we should warn about offset being used without a chained order.

I think take (or whatever name we pick - I like take and would go with it) should behave the same as AR's first and return a record and not a collection for no params or 1.

Once consensus is reached I'll be happy to update the pull request.

Cheers!

@Systho
Systho commented Mar 22, 2012

I think the destiny of offset(), first() and last() should be bound, and I agree with you that ordering by id when no order is given is a very convenient approach but I wonder if it won't be a dramatic loss in performance (especially for first() which is virtually the only other way than find() to get a single record ). If the performance loss is real, then a warning should be issued IMHO.

about the take(), please do not make the return type based on the parameter value : it will be the hell to use. It's probably better not to do anything than to name it like the Enumerable#take and make it behave differently.

There is still a need for a method which would extract a single record from the relation without ordering it : I like both single() and one() but since this will be a method used very often, I would go for the shorter one()

@tenderlove
Ruby on Rails member

@Systho yes, if you provide take with an argument, it should always return a list. I'm saying that if you provide take with no arguments it should return just one. This slightly deviates from Enumerable#take in that Enumerable#take requires a parameter:

irb(main):001:0> [1,2,3,4].take
ArgumentError: wrong number of arguments (0 for 1)
    from (irb):1:in `take'
    from (irb):1
    from /Users/aaron/.local/bin/irb:12:in `<main>'
irb(main):002:0>
@Systho
Systho commented Mar 23, 2012

@tenderlove OK, I didn't understand what you meant

@carlosantoniodasilva
Ruby on Rails member

I'm just going to give a naming idea related on take vs first vs one, that seems reasonable enough for a method that will return either 1 record or a list of records if an argument is given, without relying on ordering:

>> [1,2,3,4].sample 
=> 3
>> [1,2,3,4].sample 2
=> [4, 2]
@Systho
Systho commented Mar 23, 2012

I think that sample() implies some statistical properties of randomness. I prefer the take() verb wich IMHO is closer to the meaning "Do not make any assumption on ordering"

@mhfs
mhfs commented Mar 23, 2012

@tenderlove would you give final words on which direction we'll take?

1) Will we add default order to first?
2) Will we warn when first's default order is used?
3) Will we add default order to offset?
4) Will we warn when offset's default order is used?
5) Will we add take? Will it have this name?

Just want to be sure before writing code...

Thanks!

@tenderlove
Ruby on Rails member

1) Will we add default order to first?

Yes, in master.

2) Will we warn when first's default order is used?

Yes

3) Will we add default order to offset?
4) Will we warn when offset's default order is used?

I think we should, but let's open a new ticket and discuss there. I like to keep a "paper" trail. ;-)

5) Will we add take? Will it have this name?

Yes.

@mhfs
mhfs commented Apr 3, 2012

@tenderlove just found out that the "first without order" warning would be shown 4389 times during AR's test suite (mysql only). what should we do to these guys? change them to take or add an explicit order?

wasn't expecting such a large number...

@tenderlove
Ruby on Rails member

@mhfs maybe we shouldn't add the warning.....

I wonder if adding more documentation on the first method would be an acceptable solution.

@mhfs
mhfs commented Apr 6, 2012

I incline toward adding a default order without the warning but the final call is yours. =D

@tenderlove
Ruby on Rails member

@mhfs ya, let's not add the warning, but please make sure we note this in the documentation.

@mhfs
mhfs commented Apr 27, 2012

@tenderlove I just gave it a try. Could you review it please? Thanks!

@tenderlove tenderlove commented on an outdated diff Apr 27, 2012
...verecord/lib/active_record/relation/finder_methods.rb
@@ -312,11 +345,24 @@ def find_some(ids)
end
end
+ def find_take
+ if loaded?
+ @records.take(1)[0]
+ else
+ @take ||= limit(1).to_a[0]
@tenderlove
Ruby on Rails member
tenderlove added a note Apr 27, 2012

Can you use to_a.first in these cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@tenderlove tenderlove commented on an outdated diff Apr 27, 2012
...verecord/lib/active_record/relation/finder_methods.rb
@@ -67,7 +89,15 @@ def find_by!(*args)
# Person.where(["user_name = :u", { :u => user_name }]).first
# Person.order("created_on DESC").offset(5).first
def first(limit = nil)
- limit ? limit(limit).to_a : find_first
+ if limit
+ if order_values.empty? && primary_key
+ order("#{quoted_table_name}.#{quoted_primary_key} ASC").limit(limit).to_a
@tenderlove
Ruby on Rails member
tenderlove added a note Apr 27, 2012

Can you check to see if there is a way to do this with ARel nodes? I want to reduce the amount of raw SQL string concatenation we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mhfs
mhfs commented Apr 27, 2012

@tenderlove just pushed the changes you asked. I had to make a minor unexpected change to PostgreSQLAdapter#distinct since it expects only strings as order clauses. Using arel nodes as you suggested broke some tests.

Would you please check if there's a better way of doing that?

@rafaelfranca
Ruby on Rails member

hey @mhfs could you add a "Fixes #5103" in the message of the first commit of this pull request? This will make my work easier.

Thanks.

@mhfs
mhfs commented May 3, 2012

@rafaelfranca done.

@tenderlove it's ready for you.

@tenderlove tenderlove merged commit 2601042 into rails:master May 3, 2012
@spastorino spastorino merged commit 878db1f into rails:master May 3, 2012
@vijaydev
Ruby on Rails member
vijaydev commented May 4, 2012

@mhfs Can you document this in the AR query guide please?

@mhfs
mhfs commented May 4, 2012

@vijaydev sure! will do it in the weekend and ping you back.

@mhfs
mhfs commented May 4, 2012

@tenderlove FYI I just noticed (see #6147) some finders (e.g. find_one) use first internally and consequently are getting undesirable order bys in its queries. I'll submit a new PR to make those finders use take instead.

@mtparet

I'm wondering how many rails app has been impacted by the performance problem of using 'order_by' (before replacing .first by .take)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.