Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

validates_uniqueness_of is horribly inefficient in mysql #613

Closed
lighthouse-import opened this Issue · 28 comments

1 participant

Lighthouse Import
Lighthouse Import

Imported from Lighthouse. Original ticket at: http://rails.lighthouseapp.com/projects/8994/tickets/2503
Created by Dave Grijalva - 2010-11-08 23:12:04 UTC

mysql is case insensitive by default. for case insensitive comparisions, the following is efficient.

WHERE users.username = 'myUsername'

validates_uniqueness_of currently generates

WHERE LOWER(users.username) = BINARY 'myusername'

this is redundant and completely clobbers performance. If you have a unique index on the column (and you should, because validates_uniqueness_of cannot be trusted), you don't get any of the benefit of the index because the system has to transform the column.

Lighthouse Import

Imported from Lighthouse.
Comment by CancelProfileIsBroken - 2009-05-17 12:54:15 UTC

Please submit a patch if you want to propose a change here. You'd want to look in activerecord/lib/active_record/validations.rb. What's happening is that the code first requests a case-sensitive equality operator, then later notices that the database is case-insensitive but doesn't change equality operators. So you'd need to fix the logic and add a method to the adapter to return a case-insensitive equality operator as well.

Lighthouse Import

Imported from Lighthouse.
Comment by Phil Ross - 2009-05-31 14:55:45 UTC

I'm using the attached patch to avoid the full table scan. This adds a :case_sensitive => :db option to validates_uniqueness_of, which causes the normal equality operator to be used. The validation will therefore use the database's case-sensitivity mode.

Lighthouse Import

Imported from Lighthouse.
Comment by Scott Storck - 2009-06-15 23:45:37 UTC

I am using this patch because I had a lot of problems with the SQL generated by validate_uniqueness_of not matching an existing row which would cause a duplicate row error.

Removing the BINARY keyword from the sql results in the match working correctly.
The data being tested for uniqueness which fails to match has trailing spaces.
Mysql doesn't seem to interested in the trailing space when checking for uniqueness.
The BINARY keyword however causes mysql to do a byte for byte check.

I would think it usually would make sense to keep the uniqueness test as close to the test from the underlying database that is used as possible.

I also have never used a database that doesn't use an index to enforce a unique field.
I suspect that other databases might not be able to use the index if a modifying clause like BINARY is used.
It would be interesting to hear from users of the other databases about this.

I have applied the patch manually as it doesn't apply to my ActiveRecord version (2.3.2).

Lighthouse Import

Imported from Lighthouse.
Comment by David Stevenson - 2009-06-20 17:49:33 UTC

I've had this problem before too. It's a tricky situation because it depends on the setup of MySQL (and its case sensitivity, which is different by default depending on your platform). I'm completely in favor of a :db option that puts that responsibility on the person who sets up the DB, rather than on rails itself.

Lighthouse Import

Imported from Lighthouse.
Comment by chad.ingram (at me) - 2009-06-29 22:05:57 UTC

This now has a patch. Assigning so maybe it can move forward since it's no longer 'incomplete' but I can't change that...

Lighthouse Import

Imported from Lighthouse.
Comment by Renaud Kern - 2009-08-08 21:07:52 UTC

The patch doesn't pass. The file activerecord/test/cases/validations/uniqueness_validation_test.rb doesn't exit anymore. Test are now in activerecord/test/cases/validations_test.rb

Lighthouse Import

Imported from Lighthouse.
Comment by Phil Ross - 2009-08-08 21:47:01 UTC

@Renaud Kern I still see activerecord/test/cases/validations/uniqueness_validation_test.rb in the master branch and my patch still applies cleanly.

There is an activerecord/test/cases/validations_test.rb file, but this doesn't contain any tests for validates_uniqueness_of.

Lighthouse Import

Imported from Lighthouse.
Comment by Elad Meidar - 2009-08-08 22:06:29 UTC

+1 on idea, -1 on patch.
Patch does not apply on 2-3-stable.

Lighthouse Import

Imported from Lighthouse.
Comment by Hugo Peixoto - 2009-08-08 22:34:28 UTC

+1 on the idea. But the patch does not apply to 2-3-stable.
I've attached a patch that ports Phil's code to this branch.

Lighthouse Import

Imported from Lighthouse.
Comment by Jeremy Kemper - 2009-08-09 08:16:09 UTC

:case_sensitive => :db is a strange option to add to the public API.

Ideas for clarifying, or perhaps sidestepping, this issue?

Lighthouse Import

Imported from Lighthouse.
Comment by Hugo Peixoto - 2009-08-09 13:16:03 UTC

I agree, Jeremy. But sidestepping doesn't seem like the right thing to do.

How about ":case_sensitive => :use_database_default" ?

Lighthouse Import

Imported from Lighthouse.
Comment by Dan Croak - 2009-08-10 02:01:47 UTC

-1 on changing case_insensitive to non-boolean.

My understanding is that the problem here is the BINARY conversion for MySQL, correct? When running a 2.3.3 app, I do not see the reported LOWER().

So, the fix needs to simply be:

make MySQL act like SQLite3, for instance, and simply not add the BINARY conversion?

Lighthouse Import

Imported from Lighthouse.
Comment by Rizwan Reza - 2009-08-10 02:10:55 UTC

-1 on changing case_sensitive. Agreed with Dan.

Hugo: The patch doesn't work anymore.

Lighthouse Import

Imported from Lighthouse.
Comment by Elad Meidar - 2009-08-13 21:13:36 UTC

+1 on feature, although not exactly fits into the current boolean convention, it's a well required addition, i would not scope it only for mysql, since FTS are a cross RDBMS issue and there is a need to help developers deal with them before it getting to be a problem.

since the patch for 2-3-stable did work as Rizwan said, i took the liberty to fix it.

Lighthouse Import

Imported from Lighthouse.
Comment by Blue Box Jesse - 2009-09-27 00:38:39 UTC

We've seen this issue bite a number of customers, so this is a very valuable commit.

BugMash: +1

I agree that the best naming convention would be :non-boolean but using the same description field.

I've confirmed the patch applies on 2-3-stable.

Lighthouse Import

Imported from Lighthouse.
Comment by Jordan Brough - 2009-11-12 22:33:11 UTC

I think this is worth fixing either way, but for anyone interested here's a ticket I've added for a way to sidestep the issue -- #3486 (update rails to handle db unique-constraints and omit the validates_uniqueness_of)

Lighthouse Import

Imported from Lighthouse.
Comment by Ryan Angilly - 2009-11-17 14:52:46 UTC

Just wanted to throw my $0.02 to keep the discussion going. I do think it's a necessary fix, although I'm not crazy about the :case_sensitive => :db option. I feel like the model should reflect whether or not the intent is for case sensitivity or not. Using :case_sensitive => :db leaves it ambiguous, and would force someone to go check the database engine to find out if case sensitivity was actually enabled. I'm thinking maybe an api like:

validates_uniqueness_of :email, :db_case_sensitive => :false

And that option would turn off the LOWER and the BINARY, and let the db do its magic.

Lighthouse Import

Imported from Lighthouse.
Comment by Ken Miller - 2009-12-30 22:14:55 UTC

I don't at present have a patch, but doesn't this seem like the sort of thing the connection adapter should be able to report on? Either by introspecting the database, or explicitly declared in the connection spec. That would require editing all the connection adapters, but that seems preferable to having to declare it in each model.

Lighthouse Import

Imported from Lighthouse.
Comment by Rizwan Reza - 2010-02-12 12:46:14 UTC

[bulk edit]

Lighthouse Import

Imported from Lighthouse.
Comment by Neeraj Singh - 2010-04-20 16:04:01 UTC

In my test index is not used when I use Lower(user.email). Otherwise index is being used. That means BINARY is not the issue as Dan Croak suggested. Please correct me if I am wrong.

I will say that current code is okay. However it should be documented that if users want better performance then they need to do two things.

  • set case_sensitive to true
  • add before_validation callback to lowercase the model attribute
Lighthouse Import

Imported from Lighthouse.
Comment by Jeremy Kemper - 2010-05-04 17:48:47 UTC

[bulk edit]

Lighthouse Import

Imported from Lighthouse.
Comment by metavida - 2010-08-12 20:19:30 UTC

I wanted to agree with Neeraj that BINARY isn't the problem. On my MySQL 5.0.89 install I get the following results.

# Using the SQL generated by Rails the index is NOT used.
EXPLAIN SELECT * FROM `users` WHERE LOWER(login) = BINARY 'test';
+-------------+------+------+------+------+-------------+
| select_type | type | key  | ref  | rows | Extra       |
+-------------+------+------+------+------+-------------+
| SIMPLE      | ALL  | NULL | NULL |   94 | Using where | 
+-------------+------+------+------+------+-------------+

# Using neither LOWER nor BINARY the index IS used.
EXPLAIN SELECT * FROM `users` WHERE login = 'test';
+-------------+------+-------------------+-------+------+-------------+
| select_type | type | key               | ref   | rows | Extra       |
+-------------+------+-------------------+-------+------+-------------+
| SIMPLE      | ref  | users_login_index | const |    2 | Using where | 
+-------------+------+-------------------+-------+------+-------------+

# Using only BINARY the index IS used, and the comparison IS case sensitive.
EXPLAIN SELECT * FROM `users` WHERE login = BINARY 'test';
+-------------+-------+-------------------+------+------+-------------+
| select_type | type  | key               | ref  | rows | Extra       |
+-------------+-------+-------------------+------+------+-------------+
| SIMPLE      | range | users_login_index | NULL |    2 | Using where | 
+-------------+-------+-------------------+------+------+-------------+

Seems like a bug in the default behavior to me. With :case_sensitive=>true we should be adding BINARY to the conditions to make the assertion work as it claims it does.

Lighthouse Import

Imported from Lighthouse.
Comment by Rohit Arondekar - 2010-10-15 09:58:51 UTC

Any updates here?

Lighthouse Import

Imported from Lighthouse.
Comment by Jeff Kreeftmeijer - 2010-11-01 17:07:07 UTC

Automatic cleanup of spam.

Lighthouse Import

Imported from Lighthouse.
Comment by viktor tron (strawberry) - 2011-01-12 13:10:16 UTC

Any updates on this issue?

Lighthouse Import

Imported from Lighthouse.
Comment by Grant Hollingworth - 2011-01-19 16:36:08 UTC

Here's a patch I've been using, rebased against the current HEAD. It adds a case_insensitive_equality_operator method to ActiveRecord::ConnectionAdapters::DatabaseStatements, following the existing case_sensitive_equality_operator. MysqlAdapter overrides it.

There are no tests because it changes performance but not behaviour.

Lighthouse Import

Imported from Lighthouse.
Comment by Raphael Sofaer - 2011-02-17 05:11:27 UTC

That patch (Grant's patch) looks like it will break case-insensitive comparison in databases with a collation like utf8_bin. I'm definitely having this problem too, though. We spend a huge amount of time in those queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.