Skip to content

Avoid validating belongs_to association if it has not changed #46522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 24, 2022

Conversation

fatkodima
Copy link
Member

@fatkodima fatkodima commented Nov 18, 2022

class User < ApplicationRecord
  has_many :posts
end

class Post < ApplicationRecord
  belongs_to :user
end

belongs_to associations are required by default, so they are checked for presence when the child record is saved. This is done every time the record is updated, even when the parent record hasn't changed.

For example,

post.update!(title: "Rails is the best")
TRANSACTION (0.1ms)  BEGIN
User Load (0.5ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
Post Update (0.3ms)  UPDATE "posts" SET "title" = $1 WHERE "posts"."id" = $2  [["title", "Rails is the best"], ["id", 3]]
TRANSACTION (39.5ms)  COMMIT

If we have a relevant foreign key, there is no need to check for the user (and thus performing an extra SQL query).

After the changes from this PR:

post.update!(title: "Rails is the best")
TRANSACTION (0.1ms)  BEGIN
UPDATE "posts" SET "title" = $1 WHERE "posts"."id" = $2  [["title", "Rails is the best"], ["id", 4]]
TRANSACTION (0.4ms)  COMMIT

Since, from my observation, most records do not change its belongs_to association values during their lifetime, this will greatly reduce the number of database queries for updates. And this will be even more notably for models with multiple belongs_to's.

cc @byroot (since we implemented a similar optimization previously - #45149)

@fatkodima fatkodima force-pushed the belongs_to-avoid-queries branch from 305e63a to 1ff70f8 Compare November 18, 2022 21:49
@rails-bot rails-bot bot added the railties label Nov 18, 2022
@fatkodima fatkodima force-pushed the belongs_to-avoid-queries branch from 1ff70f8 to 6627110 Compare November 18, 2022 22:01
@lazaronixon
Copy link
Contributor

lazaronixon commented Nov 19, 2022

Hey excellent work in both PRs. Would not be better just check for not null ? I'm from the team who doesn't rely on foreign keys, I would like to have this performance gain too.
https://guides.rubyonrails.org/active_record_migrations.html#active-record-and-referential-integrity

end

def marshal_load(array)
@version, @columns, _columns_hash, @primary_keys, @data_sources, @indexes, @database_version = array
@version, @columns, _columns_hash, @primary_keys, @data_sources, @indexes, @foreign_keys, @database_version = array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: I need to dig into what the backward and forward compatibility expectations are for marshaled schema cache.

I remember we want to be able to load a schema cache generated by the N-1 version to make it easy to upgrade, but I don't remember if that only applies to YAML serialization or if it applies to Marshal too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the eager computing changes, I am thinking if we really need this schema cache changes anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should apply for both. We want the cache to be read during the deploy of a new version, so that means that both the current version and the new one will try to read from the same file.

record.public_send("#{name}_changed?") ||
record.read_attribute(foreign_key).nil? ||
reflection.polymorphic? ||
model.connection.schema_cache.foreign_keys(model.table_name).none? { |fk| fk.column.to_s == foreign_key }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could cache (or even eagerly compute) this last expression.

It needs access to the schema cache, so to eagerly compute it we might need to wait for the "schema_load" event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this to_s necessary? I might be wrong by I think ForeignKeyDefinition#column is already a String.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperisfine Thank you for the comments! Updated the PR with eager computation.

@fatkodima fatkodima force-pushed the belongs_to-avoid-queries branch from 6627110 to 3043052 Compare November 21, 2022 13:17
@rafaelfranca
Copy link
Member

I'm curious why only when we have foreign key we can safely skip this validation. Is it because the associated record could be gone from the database? I don't think the reason why we validate this association is to get rid of orphan records, so I'd not even validate it if we already have a value in the id and it didn't change.

@fatkodima
Copy link
Member Author

Is it because the associated record could be gone from the database?

Basically this.

From my understanding, when we validate an association id field (e.g. user_id), we ensure it's value just present. But when we validate an association (e.g. user), we ensure that the associated record is present in the db. This is how it currently works. For safety, this of course should be backed by a foreign key from the db side (like unique index for uniqueness validation).

If we start skipping validation when the association id hasn't changed, we could possibly get orphaned records.
Correct me, if I'm wrong.

@casperisfine
Copy link
Contributor

If we start skipping validation when the association id hasn't changed, we could possibly get orphaned records.

I think this is wasteful. I mean sure you can create this situation easily, but I'd argue it's a bug in your code, and even if it happens, I'm not sure having a validation error on that orphaned record really helps you that much.

This validation is causing a lot of extra queries.

On paper skipping this if a foreign key exist makes sense and is correct, but in practice on large deployments foreign key checks tend to be disabled, this is especially true for MySQL since foreign keys tend not to play well with online-schema change solutions such as LHM, gh-ost, etc.

Now I could get behind to be safe by default, but I'd also see value in disabling this globally.

@fatkodima
Copy link
Member Author

Now I could get behind to be safe by default, but I'd also see value in disabling this globally.

So should I just change this PR to basically ignore the validation if the column hasn't changed? Or need to also introduce some app-wide config option that forces to fully check the association presence?

@casperisfine
Copy link
Contributor

should I just change this PR to basically ignore the validation if the column hasn't changed?

That's what I'd do yes. But maybe this would deemed too backward incompatible?

Hence the app wide config suggestion. But I'd like to hear opinions from others here. @rafaelfranca what do you think?

@rafaelfranca
Copy link
Member

That is what I'd do, yes. We have a patch in one application that does exactly that and we were planning to push to Rails.

I don't think this validation is to avoid orphan records., every time you update an associated record but not change the primary key. If people want to avoid this they should be using a foreign key, or dependent: :destroy. We should validate if the record you are trying to associate exists when you are changing the association.

To easy upgrade we can have a temporary application wide config to keep the old behavior of checking when the association key didn't change.

@fatkodima fatkodima force-pushed the belongs_to-avoid-queries branch from 3043052 to f9cd554 Compare November 24, 2022 02:38
@rails-bot rails-bot bot added the docs label Nov 24, 2022
@fatkodima
Copy link
Member Author

Updated with the suggestions.

@fatkodima fatkodima changed the title Avoid validating belongs_to association if it has not changed and is backed by a foreign key Avoid validating belongs_to association if it has not changed Nov 24, 2022
Copy link
Contributor

@casperisfine casperisfine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the meaning of that config need to be flipped around.

record.read_attribute(foreign_key).nil? ||
record.attribute_changed?(foreign_key) ||
(reflection.polymorphic? && (record.read_attribute(foreign_type).nil? || record.attribute_changed?(foreign_type))) ||
!ActiveRecord.belongs_to_required_validates_foreign_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can assume this one is set during boot and won't change. So if it's set, we can just not generate a condition at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the naming suggest that it should be used the other way around:

  • if belongs_to_required_validates_foreign_key == true then we just put no condition, we always validates the presence.
  • if belongs_to_required_validates_foreign_key == false then we only validate presence when the foreign key changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Under belongs_to_required_validates_foreign_key I meant something like "validates foreign key attribute for presence, not the association". But yeah, your interpretation is correct. So flipped it.

config.active_record.belongs_to_required_validates_foreign_key = true
```

and will be enabled by default with `load_defaults 7.1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we were saying it would be disabled by default in 7.1, no?

@fatkodima fatkodima force-pushed the belongs_to-avoid-queries branch from f9cd554 to e5d1514 Compare November 24, 2022 11:06
foreign_key = reflection.foreign_key
foreign_type = reflection.foreign_type

record.read_attribute(foreign_key).nil? ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use _attribute here? I think we can assume that foreign_key and foreign_type are not aliases and are already strings. Right?

Copy link
Contributor

@casperisfine casperisfine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, LGTM now. I don't think the _attribute is a big deal anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants