Avoid validating a unique field if it has not changed and is backed by a unique index #45149

Merged
merged 1 commit into from
May 25, 2022


fatkodima
Member

Previously, when saving a record, Active Record performed an extra query to check the uniqueness of each attribute that has a uniqueness validation, even if that attribute hadn't changed.

If the database has the corresponding unique index, then this validation can never fail for a persisted record whose attribute is unchanged, so we can safely skip it.

create_table :users do |t|
  t.string :name
  t.string :email
  t.index :email, unique: true
end

class User < ApplicationRecord
  validates :email, uniqueness: true
end

u = User.create!(email: "user@example.com")

Before

u.update(name: "User")

TRANSACTION (0.2ms)  BEGIN
User Exists? (0.4ms)  SELECT 1 AS one FROM "users" WHERE "users"."email" = $1 AND "users"."id" != $2 LIMIT $3  [["email", "user@example.com"], ["id", 7], ["LIMIT", 1]]
User Update (0.4ms)  UPDATE "users" SET "name" = $1, "updated_at" = $2 WHERE "users"."id" = $3  [["name", "User"], ["updated_at", "2022-05-21 23:54:37.306592"], ["id", 7]]
TRANSACTION (1.3ms)  COMMIT

After

u.update(name: "User")

TRANSACTION (0.2ms)  BEGIN
User Update (0.4ms)  UPDATE "users" SET "name" = $1, "updated_at" = $2 WHERE "users"."id" = $3  [["name", "User"], ["updated_at", "2022-05-21 23:54:37.306592"], ["id", 7]]
TRANSACTION (1.3ms)  COMMIT

If I haven't missed any edge case that invalidates this approach, this should substantially reduce the number of queries issued for frequently updated records that have uniqueness validations.
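The decision described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code from this PR: `FakeRecord` and `uniqueness_query_needed?` are hypothetical names, and the `covered_by_unique_index:` flag stands in for the real schema lookup. The `nil` check mirrors a line quoted later in the review, presumably because unique indexes commonly permit multiple NULL values.

```ruby
# Hypothetical stand-in for a model instance; not an Active Record class.
FakeRecord = Struct.new(:attributes, :changed, :persisted, keyword_init: true) do
  def persisted?
    persisted
  end

  def attribute_changed?(name)
    changed.include?(name.to_s)
  end

  def read_attribute(name)
    attributes[name.to_s]
  end
end

# Returns true when the uniqueness query still has to run.
def uniqueness_query_needed?(record, names, covered_by_unique_index:)
  # New records have never been checked by the index on these values.
  return true unless record.persisted?
  # A changed or nil value cannot rely on the existing row having passed the index.
  return true if names.any? { |n| record.attribute_changed?(n) || record.read_attribute(n).nil? }

  # Unchanged values are only safe to skip when a unique index backs them.
  !covered_by_unique_index
end

user = FakeRecord.new(
  attributes: { "email" => "user@example.com", "name" => "User" },
  changed: ["name"],
  persisted: true
)
uniqueness_query_needed?(user, [:email], covered_by_unique_index: true) # => false
```

In this sketch only the name changed, so the email check is skipped, which matches the "After" log above.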

@fatkodima fatkodima force-pushed the uniqueness-validation-skip-query branch 2 times, most recently from 9dcff01 to 71595ba Compare May 22, 2022 10:35
Member

@byroot byroot left a comment

Excellent idea, I love it. The implementation needs to be refined a bit though.

Comment on lines 69 to 80
def unchanged_and_covered_by_unique_index?(klass, record, attribute)
  return false if options[:conditions] || options.key?(:case_sensitive)

  scope = Array(options[:scope]).map(&:to_s)
  return false unless attributes_scope?(record, scope)

  attributes = scope + [attribute.to_s]
  return false if attributes.any? { |attribute| record.attribute_changed?(attribute) }

  klass.connection.schema_cache.indexes(klass.table_name).any? do |index|
    index.unique &&
      index.where.nil? &&
      index.columns.size <= attributes.size &&
      attributes[0, index.columns.size] == index.columns
  end
end
Member

A lot of things in here could be computed just once, as it only depends on the schema and the validator options.

The problem is we can't access the schema when the model is first defined, so we'll have to memoize on the first call...

Member Author

Do you mean memoization in the validator's instance variable? Is this thread-safe?

Member

Well. Thread-safe might mean a lot of different things.

It is indeed subject to a race condition, but since we're computing a value based on static data, if two threads enter the code path, all it means is that we're a little bit wasteful because we compute the thing twice; in the end both end up with the same result.

That's for MRI, "thanks" to the GVL. I'm not certain about JRuby / Truffle: I know they sometimes raise if you access some data structures concurrently, but I don't remember whether that applies to instance variables. https://github.com/jruby/jruby/wiki/Concurrency-in-jruby suggests it's fine.
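A minimal sketch of the benign race described here, assuming the memoized value is derived from static data. `MemoizedCheck` is a hypothetical class for illustration, not part of the validator.

```ruby
# Hypothetical illustration of unsynchronized memoization of a deterministic value.
class MemoizedCheck
  def initialize(&compute)
    @compute = compute
  end

  # Not synchronized: two threads may both find @result unset and both run
  # the computation. Since the computation is deterministic, they assign
  # equal values, so the only cost is duplicated work.
  def result
    @result ||= @compute.call
  end
end

check = MemoizedCheck.new { %w[email].freeze }
results = Array.new(4) { Thread.new { check.result } }.map(&:value)
# Every thread observes an equal value, whichever thread computed it.
```

Note that `||=` recomputes whenever the stored value is `nil` or `false`, so this form only memoizes truthy results such as a collection.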

index.unique &&
index.where.nil? &&
index.columns.size <= attributes.size &&
attributes[0, index.columns.size] == index.columns
Member

The index doesn't need to be in the same order as the attributes, e.g. index [:foo, :bar, :baz] works if attributes == [:bar, :baz, :foo].

I think something like:

columns = index.columns.first(attributes.size)
columns.sort == attributes.sort

It may also make sense to push part of this logic in schema_cache.
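To illustrate the ordering point, here is a minimal plain-Ruby sketch of an order-insensitive check. It is not the code from this PR: `Index` is a hypothetical Struct standing in for the schema cache's index objects, and treating an index as covering when all of its columns appear among the validated attributes is an assumption of this sketch.

```ruby
# Hypothetical stand-in for an index definition; only the fields used here.
Index = Struct.new(:columns, :unique, :where, keyword_init: true)

# True when some full (non-partial) unique index uses only columns that are
# among the validated attributes, regardless of column order.
def covered_by_unique_index?(indexes, attributes)
  attributes = attributes.map(&:to_s)

  indexes.any? do |index|
    index.unique &&
      index.where.nil? &&
      (Array(index.columns).map(&:to_s) - attributes).empty?
  end
end

indexes = [Index.new(columns: %w[account_id email], unique: true, where: nil)]
covered_by_unique_index?(indexes, [:email, :account_id]) # => true
covered_by_unique_index?(indexes, [:email])              # => false
```

Under this assumption, an index on fewer columns than the validation (for example a unique index on email alone, with a validation scoped to account_id) also counts as covering, because it is the stricter constraint.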

@fatkodima fatkodima force-pushed the uniqueness-validation-skip-query branch from 71595ba to aaf24d5 Compare May 22, 2022 21:56
@fatkodima
Member Author

Updated.

Comment on lines 75 to 77
return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) }
return true if attributes.any? { |attribute| record.attribute_changed?(attribute) ||
record.read_attribute(attribute).nil? }
Member

Should this be done in a single iteration?

Suggested change
- return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) }
- return true if attributes.any? { |attribute| record.attribute_changed?(attribute) ||
-                                              record.read_attribute(attribute).nil? }
+ return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) ||
+                                              record.attribute_changed?(attribute) ||
+                                              record.read_attribute(attribute).nil? }

Member Author

It can, but it looks clearer to me when kept separate: one iteration checks for changes and the other checks for an association reflection. Wdyt?

Member

It probably won't matter for performance, but two iterations would have me asking "Why are there two iterations? Is there some special case I'm missing?"
A separate method that covers both cases, like attribute_validation_needed?, might be overkill. So 🤷

I do think we should use a different name for the block variable, as attribute shadows the attribute passed into validation_needed?

scope = Array(options[:scope]).map(&:to_s)
attributes = scope + [attribute.to_s]

return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) }
Member

I think rather than fallback on doing a query, we could get the column name from the association, no?

Member Author

This is probably unusual, but an association can also be a collection association like has_many.
Should we handle belongs_to specially, or is it better to leave it as is?

Member

an association can also be a collection association like has_many.

Does that even work? I'd assume any association here would be a belongs_to?

Member Author

Yes, it works, just tested out in the console.

def scope_relation(record, relation)
  Array(options[:scope]).each do |scope_item|
    scope_value = if record.class._reflect_on_association(scope_item)
      record.association(scope_item).reader
    else
      record.read_attribute(scope_item)
    end
    relation = relation.where(scope_item => scope_value)
  end
  relation
end

Member

Oh wow...

Ok, let's handle belongs_to specifically, and fallback for the rest.
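One way to picture "handle belongs_to specifically, and fall back for the rest" is the plain-Ruby sketch below. It is an assumption-laden illustration, not the merged code: `Reflection` is a hypothetical Struct, `resolve_columns` is a made-up name, and polymorphic associations are deliberately ignored.

```ruby
# Hypothetical stand-in for an association reflection; only two fields.
Reflection = Struct.new(:macro, :foreign_key, keyword_init: true)

# Maps validated names to column names. Returns nil when a name refers to an
# association other than belongs_to, signalling "fall back to the query".
def resolve_columns(reflections, names)
  names.map do |name|
    reflection = reflections[name.to_s]
    next name.to_s if reflection.nil?                 # plain attribute
    return nil unless reflection.macro == :belongs_to # e.g. has_many in :scope

    reflection.foreign_key.to_s
  end
end

reflections = {
  "account"  => Reflection.new(macro: :belongs_to, foreign_key: "account_id"),
  "comments" => Reflection.new(macro: :has_many, foreign_key: "post_id")
}
resolve_columns(reflections, [:account, :email])  # => ["account_id", "email"]
resolve_columns(reflections, [:comments, :email]) # => nil
```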

Member Author

So, has_many relations work only for :scope, but something like validates_uniqueness_of :comments does not work.
Handling belongs_to specifically leads to some hairy code. I'm thinking, do we even need to care about has_manys in the :scope? While it works accidentally, I can't imagine a use case for this.

Member

I can't imagine a use case for this.

Yeah, me neither.

Comment on lines 83 to 85
return @covered_by_unique_index[attribute] if defined?(@covered_by_unique_index)

@covered_by_unique_index = self.attributes.index_with do |attribute|
Member

Suggested change
- return @covered_by_unique_index[attribute] if defined?(@covered_by_unique_index)
- @covered_by_unique_index = self.attributes.index_with do |attribute|
+ @covered_by_unique_index ||= self.attributes.index_with do |attribute|

Member

Also, it probably doesn't matter that much, but instead of index_with, you could do:

@covered ||= attributes.select { |attr| ... }.to_set
@covered.include?(attribute)

end
end

@covered_by_unique_index[attribute]
Member

Suggested change
- @covered_by_unique_index[attribute]
+ @covered_by_unique_index[attribute.to_s]

If you call to_s in the generation code, you need to call it here as well.
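A short sketch of why the key types have to match, using the Set-based variant suggested earlier in the thread. The `select` predicate is a placeholder assumption standing in for the real index check.

```ruby
require "set"

validated = [:email, :name]

# Keys are stored as strings; the block stands in for the real coverage check.
covered = validated.map(&:to_s).select { |attr| attr == "email" }.to_set

covered.include?(:email)      # => false, a Symbol never equals a String
covered.include?(:email.to_s) # => true
```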

@fatkodima fatkodima force-pushed the uniqueness-validation-skip-query branch from aaf24d5 to 910aebe Compare May 23, 2022 21:15
@fatkodima
Member Author

Updated with getting column names from belongs_to.
Please take another look.

Member

@byroot byroot left a comment

Some minor things. Remember to squash your commits as well please.

Comment on lines 10 to 13
* Do not return invalid indexes in PostgreSQL.

*fatkodima*

Member

Suggested change
- * Do not return invalid indexes in PostgreSQL.
- *fatkodima*

🔥

Member Author

Oops 💩

end

def resolve_attributes(record, attributes)
attributes.map do |attribute|
Member

You can use flat_map here.

Member Author

@fatkodima fatkodima May 25, 2022

👍 That should be caught by https://www.rubydoc.info/gems/rubocop/0.40.0/RuboCop/Cop/Performance/FlatMap, which is enabled.
I will investigate why it was not flagged.

Upd: flat_map is equivalent to map followed by flatten(1), but this code calls flatten, which flattens all levels. That is why the cop correctly ignored it.
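The difference is easy to see on a small nested array (the data here is made up for illustration):

```ruby
nested = [[1, [2, 3]], [4]]

one_level  = nested.flat_map { |x| x }       # => [1, [2, 3], 4]
same_thing = nested.map { |x| x }.flatten(1) # => [1, [2, 3], 4]
all_levels = nested.map { |x| x }.flatten    # => [1, 2, 3, 4]
```

So replacing map plus flatten with flat_map is only behavior-preserving when the mapped values are nested at most one level deep.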


scope = Array(options[:scope])
attributes = scope + [attribute]
attributes = resolve_attributes(record, attributes)
Member

Note to self: It annoys me a bit that we have to go through all this on every validation. I kinda wish validators had a proper step to precompute things like this based on the model.

@fatkodima fatkodima force-pushed the uniqueness-validation-skip-query branch from 910aebe to c2bdc6b Compare May 25, 2022 14:44
@fatkodima
Member Author

Some minor things. Remember to squash your commits as well please.

Sure, always do this. I was pushing as separate commits to probably help with review of new changes.

@byroot
Member

byroot commented May 25, 2022

I was pushing as separate commits to probably help with review of new changes.

I guess it's up to anyone's preference, but on smaller PRs like this I re-review the entire thing every time. Thanks for your consideration though.

@byroot byroot merged commit 1554cf1 into rails:main May 25, 2022
@sasharevzin

Is there a way to force the uniqueness check in case the record was modified outside of the current context? 🤔

@berniechiu
Contributor

This is great. I've always found myself removing unique validation after adding DB uniqueness to avoid extra queries : )

@@ -20,6 +20,8 @@ def validate_each(record, attribute, value)
finder_class = find_finder_class_for(record)
value = map_enum_attribute(finder_class, attribute, value)

return if record.persisted? && !validation_needed?(finder_class, record, attribute)
Contributor

@bf4 bf4 Jan 12, 2023

This idea of validation_needed?, which checks

return true if attributes.any? { |attr| record.attribute_changed?(attr) || record.read_attribute(attr).nil? }

could be a signal that there's a useful general validation option missing, something like only_dirty: true, such that

validates :thing, presence: true, only_dirty: true

would short-circuit the presence validator on thing unless record.attribute_changed?("thing"), and prevent it from checking the if/unless conditionals and continuing to validate_each itself, just like allow_blank does.

If that seems interesting, I'm up for making the PR.

I think it would be useful to have a first class idea of 'this validation applies even at rest' vs. 'this validation applies only on change'


6 participants