Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup define_attribute_methods initializer #48743

Merged
merged 1 commit into from
Jul 18, 2023

Conversation

casperisfine
Copy link
Contributor

Followup: #48716

model.connection_pool.schema_reflection is never falsy, so that if was pointless.

Instead we more properly check if the schema cache contains that table.

I also added some more comments to explain why the initializer tries so hard not to touch the database.

@matthewd

@casperisfine casperisfine force-pushed the ar-define-attr-methods-init branch 2 times, most recently from 01a079f to 98180d7 Compare July 17, 2023 11:52
@casperisfine
Copy link
Contributor Author

I just noticed that initializer "active_record.check_schema_cache_dump" is basically useless now. I'll see about removing it in a followup.

@casperisfine casperisfine force-pushed the ar-define-attr-methods-init branch 2 times, most recently from 486b1df to 6b98791 Compare July 17, 2023 12:11
@rails-bot rails-bot bot added the railties label Jul 17, 2023
@casperisfine
Copy link
Contributor Author

urk, I ran into a weird problem causing a couple tests to fail.

Now in test environment if CI env is set, we enter the config.eager_load = true path and model attributes are eagerloaded, which in itself is actually probably good.

The problem is that when running tests, the test:prepare task is ran, and one thing is does is to load the schema, so if you had pending migrations, the attribute methods were defined before the schema was loaded...

I'm not sure if anyone rely on the implicit load_schema on CI, but I'd prefer not to break it.

I see two ways out of this:

  • Reset column informations after test:load_schema.
  • Don't eagerly define attributes methods in test environment.

Not sure which is best.

Followup: rails#48716

`model.connection_pool.schema_reflection` is never falsy, so that `if`
was pointless.

Instead we more properly check if the schema cache contains that table.

I also added some more comments to explain why the initializer tries
so hard not to touch the database.
if app.config.eager_load
# In development and test we shouldn't eagerly define attribute methods because
# db:test:prepare will trigger later and might change the schema.
if app.config.eager_load && !Rails.env.local?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy with this !Rails.env.local? check, but I couldn't find anything cleaner...

@byroot byroot merged commit 835eb8a into rails:main Jul 18, 2023
9 checks passed
# version can be deployed again.
#
# This is why this initializer tries hard not to query the database, and if it
# does, it makes sure to any possible database error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo here @casperisfine

Is it supposed to be?

Suggested change
# does, it makes sure to any possible database error.
# does, it makes sure to rescue any possible database error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in 47d0f4e

# database anyway to load the schema cache dump, so we might as well do it during boot to
# save memory in pre-forking setups and avoid slowness during the first requests post deploy.
schema_reflection = model.connection_pool.schema_reflection
if check_schema_cache_dump_version || schema_reflection.cached?(model.table_name) || model.connected?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperisfine, I'm seeing this trigger DB connections on boot even when db/schema_cache.yml doesn't exist. As check_schema_cache_dump_version is true by default, it seems a DB connection will be triggered nearly all the time here. That seems to contradict the emphasis in the comments of avoiding an excess connection.

It's quite possible I'm missing something, but since the newly added SchemaReflection#cached? also verifies check_schema_cache_dump_version, can this be simplified to if schema_reflection.cached?(model.table_name) || model.connected? ?

Or alternatively, would adding if schema_reflection.possible_cache_available? ... be appropriate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to contradict the emphasis in the comments of avoiding an excess connection.

So I'm not a great writer, but I tries to explain that in the last part of the comment.

There are competing concerns here:

  • For resiliency, apps MUST be able to boot even if the DB is down or unresponsive.
    • Ideally this means not trying to access the database at all.
    • At worst we can rescue errors, and hope the timeout are strict enough in case the DB is unresponsive.
  • For performance, attribute methods SHOULD be defined during boot, so that they are shared in forking setups.

The only way to achieve all the above is to have schema cache dumps (which few users do) and to turn off check_schema_cache_dump_version (which few users do).

So for users that don't have a schema cache persisted we're kinda stuck between a rock and a hard place here. We can either not trigger queries on boot and degrade memory performance and first requests performance, or trigger queries on boot and degrade resiliency.

I chose the later.

Previously we'd only do it if somehow we were already connected, but after reflection I think it's a bad behavior, because if you fix your app to no longer connect on boot, you will suddenly experience a performance degradation.

So yeah, all this is tricky, and as long as schema cache dumps remain hard to integrate, I don't think we can do much better here.

Copy link
Contributor

@zarqman zarqman Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byroot, thanks for your thoughtful reply!

If I'm understanding correctly, this PR intentionally changes the default Rails behavior to always attempt connecting to the DB at boot, except when a schema cache dump is both present and version checks are disabled. And, if that DB connect fails, it always displays a warning.

Again assuming that's correct, then it seems that the check_schema_cache_dump_version variable is being overloaded as it's now the only way to disable the above behavior. That is, it's behaving more like check_schema_cache_dump_version_or_preload_schema_from_database_anyway.

For background, I discovered this because we have a test in our apps that runs RAILS_ENV=production rails middleware and validates the returned list to help detect unexpected changes. Since it runs as a subprocess of the test suite, it doesn't have access to the production database and has now started displaying the warning from this initializer.

We don't use db/schema_cache.yml at present and can set check_schema_cache_dump_version = false with no problem, but it feels like an unintuitive toggle to be switching relative to the behavior it now modifies.

Would it be worth replacing check_schema_cache_dump_version (just for this initializer) with a new option, named something like preload_schema_cache_during_boot? I would understand the desire to avoid adding yet another option flag to Rails though.

If this new behavior is indeed kept, then it would seem like it warrants a changelog entry?

BTW, while I'm here, thanks so much for your work on many of Rails' gnarlier internals. Often not visible, but still highly appreciated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to always attempt connecting to the DB at boot, except when a schema cache dump is both present and version checks are disabled. And, if that DB connect fails, it always displays a warning.

That is correct.

it doesn't have access to the production database and has now started displaying the warning from this initializer.

I understand it might be a bit annoying in your use case, but I don't think it's a huge deal and is probably valuable for users in general.

If this new behavior is indeed kept

So we've discussed this further with @matthewd. He'll do a more thorough summary of what was said and concluded, but it's likely we'll partially revert to the older behavior here, as in prioritize not connecting to the database (resiliency) over eager loading attributes (performance).

In the end I'm not fully satisfied with either solution. I really wish we didn't have to make such a silly tradeoff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, @byroot and I discussed this, and while we both feel that the ideal choice is to define attributes based on the schema cache during boot without touching the database, if we have to choose between connecting early to verify the schema version or deferring attribute definition, retaining the historical sacrifice of performance is probably the safer way to go.

In particular, in my view, that allows check_schema_cache_dump_version to be "do the higher-performance thing, even though that means unchecked trust that my deploy process will provide a valid dump", which feels a bit more natural than always being in performance mode, and having that flag switch on "resilience at the expense of a safety check"1.

Ultimately, only eagerly defining attributes when a schema dump is present and the version check is disabled is Not Great: defining attributes is expensive, and it saves both memory and initial-request runtime to do it at the same time we're eager loading everything else. check_schema_cache_dump_version disabling a valuable part of eager_load is likely surprising... especially for people who've gone to the effort of providing a schema dump.

In the slightly longer term, we should look into other ways to make this work better:

  • store the schema dump inside the database itself
    • connection required, but no chance of a version mismatch, so you can't suddenly lose schema-cache performance during a deploy/rollback
  • populate the schema cache more eagerly when introspecting the database
    • fewer inspection queries, avoid N+1 on models
  • more communicative versioning of schema dumps
    • at minimum, it would be nice to know the schema dump is unchanged after some migrations
  • eagerly define attributes based on the dumped schema, then clear+redefine if we later realise the cache was out of date
    • same [bad] performance during a mismatch, but you get the benefits of full eager loading when the cache is accurate
  • deprecate/remove the version check
    • a well-deployed application shouldn't see bogus schema dumps, and when correctly coordinated, it can even be helpful to adopt a schema change [e.g. removal of a column] before it has actually happened
  • default schema dumping to true
    • as we don't own the deploy process, ensuring this is well-coordinated seems hard; at best it harks back to the complexity of getting asset precompile right

Footnotes

  1. there are serious shortcomings in the schema dump version-checking mechanism, so "safety" might be over-selling it, but that is its intent

casperisfine pushed a commit to Shopify/rails that referenced this pull request Jul 24, 2023
Followup: rails#48743

After careful consideration, unless users have a schema cache dump loaded
and `check_schema_cache_dump_version = false`, we have no choice but
to arbitrate between resiliency and performance.

If we define attribute methods during boot, we allow them to be shared
between forked workers, and prevent the first requests to be slower.

However, by doing so we'd trigger a connection to the datase, which
if it's unresponsive could lead to workers not being able to restart
triggering a cascading failure.

Previously Rails used to be in some sort of middle-ground where
it would define attribute methods during boot if for some reason
it was already connected to the database at this point.

But this is now deemed undesirable, as an application initializer
inadvertantly establishing the database connection woudl lead to
a readically different behavior.
paulreece pushed a commit to paulreece/rails-paulreece that referenced this pull request Aug 26, 2023
Followup: rails#48743

After careful consideration, unless users have a schema cache dump loaded
and `check_schema_cache_dump_version = false`, we have no choice but
to arbitrate between resiliency and performance.

If we define attribute methods during boot, we allow them to be shared
between forked workers, and prevent the first requests to be slower.

However, by doing so we'd trigger a connection to the datase, which
if it's unresponsive could lead to workers not being able to restart
triggering a cascading failure.

Previously Rails used to be in some sort of middle-ground where
it would define attribute methods during boot if for some reason
it was already connected to the database at this point.

But this is now deemed undesirable, as an application initializer
inadvertantly establishing the database connection woudl lead to
a readically different behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants