case_sensitive_equality_operator including BINARY is slow on MySQL #1399

Closed
kenn opened this Issue May 29, 2011 · 3 comments

Projects

None yet

2 participants

@kenn
Contributor
kenn commented May 29, 2011

It's about time we should fix this - right now ActiveRecord::ConnectionAdapters::MysqlAdapter has this code block:

def case_sensitive_equality_operator
  "= BINARY"
end

which renders validates_uniqueness_of prohibitively slow. With 7M records, it takes 3+ seconds in our case. It really should be:

def case_sensitive_equality_operator
  "="
end

That way, we benefit from MySQL's collation for case sensitivity in the index (like utf8_unicode_ci or utf8_bin), which is FAST. Or we could add :case_sensitive => nil as a new option particularly for that case.

It's a known problem for a long time actually, and patches have been around:

@sikachu
Member
sikachu commented Jun 1, 2011

I think I was the one who working on this patch. The blog posts are quite unrelated because the ability to validates uniqueness with case sensitive for MySQL was lost between 2.3.x and 3.0.x transition.

I bought back the = BINARY() because there is no way you can do case sensitive testing in MySQL without having to alter your database. It was broken in 3.0.x and nobody noticed it.

If you're going to do update your schema to something that will make MySQL always do case sensitive, I think you can already do :case_sensitive => false on the validate_uniqueness_of. I don't think there's a good fix for it, since telling people to migrate to utf8_unicode_ci is also impossible.

@tenderlove, what is your thought on this ticket?

@kenn
Contributor
kenn commented Jun 1, 2011

Oh, I use both utf8_unicode_ci and utf8_bin in the same app and there's absolutely no way to ask people to change column/index collation, that's my point too.

I use utf8_unicode_ci for email and username, because I want to match KennEjima with kennejima, and I use utf8_bin for secret tokens, because I don't want to let AyBV6hja1 accidentally match with AYBV6HJA1 (token is BASE62 representation of SecureRandom) - both collations exist in the same users table.

In the very early days after releasing the app, we removed every line of validate_uniqueness_of from our code and made a custom validator to do just that.

class BetterUniquenessValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    if record.send("#{attribute}_changed?")
      target = options[:with_type] ? record.class : record.class.base_class
      if target.where(attribute => record.read_attribute(attribute)).exists?
        record.errors[attribute] << "already in use"
      end
    end
  end
end

# In User.rb
validates :username, :better_uniqueness => true
validates :email, :better_uniqueness => true

IIRC, I found the noticeable performance degradation when we only have 10K users, so the problem should be pretty common, and people must be doing either:

  1. Don't use validate_uniqueness_of and instead use a custom validator just like mine, from the beginning
  2. Don't use validate_uniqueness_of at all and let database raise exception when duplicate (this is actually the fastest if duplicate case is very rare (e.g. UUID), we use this strategy in some places)
  3. Monkey patch case_sensitive_equality_operator
  4. Don't care performance

And the current implementation only benefits people like 4 (and Postgres, maybe? but I guess :case_sensitive => false could kill Postgres too), so I think the reason nobody noticed the regression in 3.0.x is because nobody is using it. :P

Anyway, my suggestion for change is either:

  1. In case :case_sensitive => nil is explicitly given (options.has_key?(:case_sensitive) && options[:case_sensitive].nil?), the equality operator should be just "IS" or "=". Period. (Backward compatible)
  2. Or better yet, do not set :case_sensitive => true by default, but let people explicitly set it for "= BINARY". (Desired)
@sikachu sikachu was assigned Jun 1, 2011
@jake3030 jake3030 pushed a commit to jake3030/rails that referenced this issue Jun 28, 2011
@lukemelia @josh lukemelia + josh Fix rendering html partial via inline render when with :js format [#1399
 state:resolved]

Signed-off-by: Joshua Peek <josh@joshpeek.com>
60d6f25
@sikachu
Member
sikachu commented Apr 25, 2012

It's already '=' in current master. Closing this.

@sikachu sikachu closed this Apr 25, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment