-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid validating a unique field if it has not changed and is backed by a unique index #45149
Conversation
9dcff01
to
71595ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent idea, I love it. The implementation need to be refined a bit though.
def unchanged_and_covered_by_unique_index?(klass, record, attribute) | ||
return false if options[:conditions] || options.key?(:case_sensitive) | ||
|
||
scope = Array(options[:scope]).map(&:to_s) | ||
return false unless attributes_scope?(record, scope) | ||
|
||
attributes = scope + [attribute.to_s] | ||
return false if attributes.any? { |attribute| record.attribute_changed?(attribute) } | ||
|
||
klass.connection.schema_cache.indexes(klass.table_name).any? do |index| | ||
index.unique && | ||
index.where.nil? && | ||
index.columns.size <= attributes.size && | ||
attributes[0, index.columns.size] == index.columns | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of things in here could be computed just once, as it only depends on the schema and the validator options.
The problem is we can't access the schema when the model is first defined, so we'll have to memoize on the first call...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean memoization in validator's instance variable? Is this thread-safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well. Thread-safe might mean a lot of different things.
It is indeed subject to a race-condition, but since we're computing a value based on static data, if two threads enter the code path, all it means is that we're a little bit wasteful because we compute the thing twice, but in the end both end up with the same result.
That's for MRI "thanks" to the GVL. Now I'm not certain about JRuby / Truffle, I know they sometimes raise if you access some datastructures concurrently, but I don't remember if it applies to instance variables. https://github.com/jruby/jruby/wiki/Concurrency-in-jruby suggest it's fine.
index.unique && | ||
index.where.nil? && | ||
index.columns.size <= attributes.size && | ||
attributes[0, index.columns.size] == index.columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index doesn't need to be in the same order than attributes
. e.g. index [:foo, :bar, :baz]
works if attributes == [:bar, :baz, :foo]
.
I think something like:
columns = index.columns.first(attributes.size)
columns.sort == attributes.sort
It may also make sense to push part of this logic in schema_cache
.
71595ba
to
aaf24d5
Compare
Updated. |
return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) } | ||
return true if attributes.any? { |attribute| record.attribute_changed?(attribute) || | ||
record.read_attribute(attribute).nil? } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be done in a single iteration?
return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) } | |
return true if attributes.any? { |attribute| record.attribute_changed?(attribute) || | |
record.read_attribute(attribute).nil? } | |
return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) || | |
record.attribute_changed?(attribute) || | |
record.read_attribute(attribute).nil? } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, but it looks more clear to me separately - one iteration checks for the changes and another if a reflection. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably won't matter for performance, but to me with two iterations it would have me asking "Why are there two iterations? Is there some special case I'm missing?"
A seperate method that covers both cases like attribute_validation_needed?
might be overkill. So 🤷
I do think we should use a different name for the block variable as attribute
mirrors the attribute
passed into validation_needed?
scope = Array(options[:scope]).map(&:to_s) | ||
attributes = scope + [attribute.to_s] | ||
|
||
return true if attributes.any? { |attribute| record.class._reflect_on_association(attribute) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than fallback on doing a query, we could get the column name from the association, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably unusual, but an association can also be a collection association like has_many
.
Should we handle belongs_to
specially or better left as is now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an association can also be a collection association like has_many.
Does that even work? I'd assume any association here would be a belongs_to
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works, just tested out in the console.
rails/activerecord/lib/active_record/validations/uniqueness.rb
Lines 85 to 96 in fb721b5
def scope_relation(record, relation) | |
Array(options[:scope]).each do |scope_item| | |
scope_value = if record.class._reflect_on_association(scope_item) | |
record.association(scope_item).reader | |
else | |
record.read_attribute(scope_item) | |
end | |
relation = relation.where(scope_item => scope_value) | |
end | |
relation | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow...
Ok, let's handle belongs_to
specifically, and fallback for the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, has_many
relations work only for :scope
, but something like validates_uniqueness_of :comments
does not work.
Handling belongs_to
specifically leads to some hairy code. I'm thinking, do we even need to care about has_many
s in the :scope
? While it works accidentally, I can't imagine a use case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine a use case for this.
Yeah, me neither.
return @covered_by_unique_index[attribute] if defined?(@covered_by_unique_index) | ||
|
||
@covered_by_unique_index = self.attributes.index_with do |attribute| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return @covered_by_unique_index[attribute] if defined?(@covered_by_unique_index) | |
@covered_by_unique_index = self.attributes.index_with do |attribute| | |
@covered_by_unique_index ||= self.attributes.index_with do |attribute| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's probably doesn't matter that much, but instead of index_with
, you could do:
@covered ||= attributes.select { |attr| ... }.to_set
@covered.include?(attribute)
end | ||
end | ||
|
||
@covered_by_unique_index[attribute] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@covered_by_unique_index[attribute] | |
@covered_by_unique_index[attribute.to_s] |
If you call to_s
in the generation code, you need to call it here as well.
aaf24d5
to
910aebe
Compare
Updated with getting column names from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things. Remember to squash your commits as well please.
activerecord/CHANGELOG.md
Outdated
* Do not return invalid indexes in PostgreSQL. | ||
|
||
*fatkodima* | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Do not return invalid indexes in PostgreSQL. | |
*fatkodima* |
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops 💩
end | ||
|
||
def resolve_attributes(record, attributes) | ||
attributes.map do |attribute| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use flat_map
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 That should be caught by https://www.rubydoc.info/gems/rubocop/0.40.0/RuboCop/Cop/Performance/FlatMap, which is enabled.
I will investigate why that was not done.
Upd: flat_map
is equivalent to flatten(1)
, but flatten
flattens all the levels. So this is why it was correctly ignored.
|
||
scope = Array(options[:scope]) | ||
attributes = scope + [attribute] | ||
attributes = resolve_attributes(record, attributes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: It annoys me a bit that we have to go through all this on every validation. I kinda wish validators had a proper step to precompute things like this based on the model.
910aebe
to
c2bdc6b
Compare
Sure, always do this. I was pushing as separate commits to probably help with review of new changes. |
I guess it's up to anyone's preference, but on smaller PRs like this I re-review the entire thing every time. Thanks for your consideration though. |
Is there a way force checking uniqueness in case the record was modified outside of the current context? 🤔 |
This is great. I've always found myself removing unique validation after adding DB uniqueness to avoid extra queries : ) |
@@ -20,6 +20,8 @@ def validate_each(record, attribute, value) | |||
finder_class = find_finder_class_for(record) | |||
value = map_enum_attribute(finder_class, attribute, value) | |||
|
|||
return if record.persisted? && !validation_needed?(finder_class, record, attribute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this idea of validation_needed?
which checks
return true if attributes.any? { |attr| record.attribute_changed?(attr) || record.read_attribute(attr).nil? }
could be a signal that there's a useful general validation option missing, something like only_dirty: true
such that.
validate :thing, presence: true, only_dirty: true
would short circuit the presence validator on thing unless record.attribute_changed?("thing") and prevent it from checking the if/unless conditionals and continuing to the validate_each itself etc just like allow_blank does
If that seems interesting, I'm up to make the PR
I think it would be useful to have a first class idea of 'this validation applies even at rest' vs. 'this validation applies only on change'
Previously, when saving a record, ActiveRecord will perform an extra query to check for the uniqueness of each attribute having a
uniqueness
validation, even if that attribute hasn't changed.If the database has the corresponding unique index, then this validation can never fail for persisted records, and we could safely skip it.
Before
After
If I haven't missed any edge case that can invalidate this approach, then this will greatly reduce the number of queries from frequently updated records with uniqueness validations.