Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup define_attribute_methods initializer #48743

Merged
merged 1 commit into from
Jul 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 20 additions & 4 deletions activerecord/lib/active_record/connection_adapters/schema_cache.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,6 @@ def initialize(cache_path, cache = nil)
@cache_path = cache_path
end

def bind(connection)
BoundSchemaReflection.new(self, connection)
end

def set_schema_cache(cache)
@cache = cache
end
Expand Down Expand Up @@ -88,6 +84,18 @@ def clear_data_source_cache!(connection, name)
cache(connection).clear_data_source_cache!(connection, name)
end

def cached?(table_name)
if @cache.nil?
# If `check_schema_cache_dump_version` is enabled we can't load
# the schema cache dump without connecting to the database.
unless self.class.check_schema_cache_dump_version
@cache = load_cache(nil)
end
end

@cache&.cached?(table_name)
end

def dump_to(connection, filename)
fresh_cache = empty_cache
fresh_cache.add_all(connection)
Expand Down Expand Up @@ -152,6 +160,10 @@ def load!
@schema_reflection.load!(@connection)
end

def cached?(table_name)
@schema_reflection.cached?(table_name)
end

def primary_keys(table_name)
@schema_reflection.primary_keys(@connection, table_name)
end
Expand Down Expand Up @@ -288,6 +300,10 @@ def init_with(coder)
end
end

def cached?(table_name)
@columns.key?(table_name)
end

def primary_keys(connection, table_name)
@primary_keys.fetch(table_name) do
if data_source_exists?(connection, table_name)
Expand Down
38 changes: 25 additions & 13 deletions activerecord/lib/active_record/railtie.rb
Original file line number Diff line number Diff line change
Expand Up @@ -171,23 +171,35 @@ class Railtie < Rails::Railtie # :nodoc:
end

initializer "active_record.define_attribute_methods" do |app|
# For resiliency, it is critical that a Rails application should be
# able to boot without depending on the database (or any other service)
# being responsive.
#
# Otherwise a bad deploy adding a lot of load on the database may require to
# entirely shutdown the application so the database can recover before a fixed
# version can be deployed again.
#
# This is why this initializer tries hard not to query the database, and if it
# does, it makes sure to any possible database error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo here @casperisfine

Is it supposed to be?

Suggested change
# does, it makes sure to any possible database error.
# does, it makes sure to rescue any possible database error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in 47d0f4e

check_schema_cache_dump_version = config.active_record.check_schema_cache_dump_version
config.after_initialize do
ActiveSupport.on_load(:active_record) do
if app.config.eager_load
# In development and test we shouldn't eagerly define attribute methods because
# db:test:prepare will trigger later and might change the schema.
if app.config.eager_load && !Rails.env.local?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy with this !Rails.env.local? check, but I couldn't find anything cleaner...

begin
descendants.each do |model|
# If the schema cache was loaded from a dump, we can use it without connecting
if schema_reflection = model.connection_pool.schema_reflection
# TODO: this is dirty, can we find a better way?
schema_reflection = schema_reflection.bind(nil)
elsif model.connected?
# If there's no connection yet, we avoid connecting.
schema_reflection = model.connection.schema_reflection
end

# If the schema cache doesn't have the columns
# hash for the model cached, `define_attribute_methods` would trigger a query.
if schema_reflection && schema_reflection.columns_hash?(model.table_name)
# If the schema cache doesn't have the columns for this model,
# we avoid calling `define_attribute_methods` as it would trigger a query.
#
# However if we're already connected to the database, it's too late so we might
# as well eagerly define the attributes and hope the database timeout is strict enough.
#
# Additionally if `check_schema_cache_dump_version` is enabled, we have to connect to the
# database anyway to load the schema cache dump, so we might as well do it during boot to
# save memory in pre-forking setups and avoid slowness during the first requests post deploy.
schema_reflection = model.connection_pool.schema_reflection
if check_schema_cache_dump_version || schema_reflection.cached?(model.table_name) || model.connected?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperisfine, I'm seeing this trigger DB connections on boot even when db/schema_cache.yml doesn't exist. As check_schema_cache_dump_version is true by default, it seems a DB connection will be triggered nearly all the time here. That seems to contradict the emphasis in the comments of avoiding an excess connection.

It's quite possible I'm missing something, but since the newly added SchemaReflection#cached? also verifies check_schema_cache_dump_version, can this be simplified to if schema_reflection.cached?(model.table_name) || model.connected? ?

Or alternatively, would adding if schema_reflection.possible_cache_available? ... be appropriate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to contradict the emphasis in the comments of avoiding an excess connection.

So I'm not a great writer, but I tries to explain that in the last part of the comment.

There are competing concerns here:

  • For resiliency, apps MUST be able to boot even if the DB is down or unresponsive.
    • Ideally this means not trying to access the database at all.
    • At worst we can rescue errors, and hope the timeout are strict enough in case the DB is unresponsive.
  • For performance, attribute methods SHOULD be defined during boot, so that they are shared in forking setups.

The only way to achieve all the above is to have schema cache dumps (which few users do) and to turn off check_schema_cache_dump_version (which few users do).

So for users that don't have a schema cache persisted we're kinda stuck between a rock and a hard place here. We can either not trigger queries on boot and degrade memory performance and first requests performance, or trigger queries on boot and degrade resiliency.

I chose the later.

Previously we'd only do it if somehow we were already connected, but after reflection I think it's a bad behavior, because if you fix your app to no longer connect on boot, you will suddenly experience a performance degradation.

So yeah, all this is tricky, and as long as schema cache dumps remain hard to integrate, I don't think we can do much better here.

Copy link
Contributor

@zarqman zarqman Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byroot, thanks for your thoughtful reply!

If I'm understanding correctly, this PR intentionally changes the default Rails behavior to always attempt connecting to the DB at boot, except when a schema cache dump is both present and version checks are disabled. And, if that DB connect fails, it always displays a warning.

Again assuming that's correct, then it seems that the check_schema_cache_dump_version variable is being overloaded as it's now the only way to disable the above behavior. That is, it's behaving more like check_schema_cache_dump_version_or_preload_schema_from_database_anyway.

For background, I discovered this because we have a test in our apps that runs RAILS_ENV=production rails middleware and validates the returned list to help detect unexpected changes. Since it runs as a subprocess of the test suite, it doesn't have access to the production database and has now started displaying the warning from this initializer.

We don't use db/schema_cache.yml at present and can set check_schema_cache_dump_version = false with no problem, but it feels like an unintuitive toggle to be switching relative to the behavior it now modifies.

Would it be worth replacing check_schema_cache_dump_version (just for this initializer) with a new option, named something like preload_schema_cache_during_boot? I would understand the desire to avoid adding yet another option flag to Rails though.

If this new behavior is indeed kept, then it would seem like it warrants a changelog entry?

BTW, while I'm here, thanks so much for your work on many of Rails' gnarlier internals. Often not visible, but still highly appreciated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to always attempt connecting to the DB at boot, except when a schema cache dump is both present and version checks are disabled. And, if that DB connect fails, it always displays a warning.

That is correct.

it doesn't have access to the production database and has now started displaying the warning from this initializer.

I understand it might be a bit annoying in your use case, but I don't think it's a huge deal and is probably valuable for users in general.

If this new behavior is indeed kept

So we've discussed this further with @matthewd. He'll do a more thorough summary of what was said and concluded, but it's likely we'll partially revert to the older behavior here, as in prioritize not connecting to the database (resiliency) over eager loading attributes (performance).

In the end I'm not fully satisfied with either solution. I really wish we didn't have to make such a silly tradeoff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, @byroot and I discussed this, and while we both feel that the ideal choice is to define attributes based on the schema cache during boot without touching the database, if we have to choose between connecting early to verify the schema version or deferring attribute definition, retaining the historical sacrifice of performance is probably the safer way to go.

In particular, in my view, that allows check_schema_cache_dump_version to be "do the higher-performance thing, even though that means unchecked trust that my deploy process will provide a valid dump", which feels a bit more natural than always being in performance mode, and having that flag switch on "resilience at the expense of a safety check"1.

Ultimately, only eagerly defining attributes when a schema dump is present and the version check is disabled is Not Great: defining attributes is expensive, and it saves both memory and initial-request runtime to do it at the same time we're eager loading everything else. check_schema_cache_dump_version disabling a valuable part of eager_load is likely surprising... especially for people who've gone to the effort of providing a schema dump.

In the slightly longer term, we should look into other ways to make this work better:

  • store the schema dump inside the database itself
    • connection required, but no chance of a version mismatch, so you can't suddenly lose schema-cache performance during a deploy/rollback
  • populate the schema cache more eagerly when introspecting the database
    • fewer inspection queries, avoid N+1 on models
  • more communicative versioning of schema dumps
    • at minimum, it would be nice to know the schema dump is unchanged after some migrations
  • eagerly define attributes based on the dumped schema, then clear+redefine if we later realise the cache was out of date
    • same [bad] performance during a mismatch, but you get the benefits of full eager loading when the cache is accurate
  • deprecate/remove the version check
    • a well-deployed application shouldn't see bogus schema dumps, and when correctly coordinated, it can even be helpful to adopt a schema change [e.g. removal of a column] before it has actually happened
  • default schema dumping to true
    • as we don't own the deploy process, ensuring this is well-coordinated seems hard; at best it harks back to the complexity of getting asset precompile right

Footnotes

  1. there are serious shortcomings in the schema dump version-checking mechanism, so "safety" might be over-selling it, but that is its intent

model.define_attribute_methods
end
end
Expand Down
29 changes: 29 additions & 0 deletions activerecord/test/cases/connection_adapters/schema_cache_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,11 @@ def setup
@connection = ARUnit2Model.connection
@cache = new_bound_reflection
@database_version = @connection.get_database_version
@check_schema_cache_dump_version_was = SchemaReflection.check_schema_cache_dump_version
end

def teardown
SchemaReflection.check_schema_cache_dump_version = @check_schema_cache_dump_version_was
end

def new_bound_reflection(connection = @connection)
Expand All @@ -21,6 +26,30 @@ def load_bound_reflection(filename, connection = @connection)
end
end

def test_cached?
cache = new_bound_reflection
assert_not cache.cached?("courses")

cache.columns("courses").size
assert cache.cached?("courses")

tempfile = Tempfile.new(["schema_cache-", ".yml"])
cache.dump_to(tempfile.path)

reflection = SchemaReflection.new(tempfile.path)

# `check_schema_cache_dump_version` forces us to have an active connection
# to load the cache.
assert_not reflection.cached?("courses")

# If we disable it we can load the cache
SchemaReflection.check_schema_cache_dump_version = false
assert reflection.cached?("courses")

cache = BoundSchemaReflection.new(reflection, :__unused_connection__)
assert cache.cached?("courses")
end

def test_yaml_dump_and_load
# Create an empty cache.
cache = new_bound_reflection
Expand Down
15 changes: 3 additions & 12 deletions railties/test/application/configuration_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -406,30 +406,21 @@ class Post < ActiveRecord::Base
assert_not_includes Post.instance_methods, :title
end

test "does not eager load attribute methods in production when the schema cache is empty" do
test "does not eager load attribute methods in production when the schema cache is empty and the database not connected" do
app_file "app/models/post.rb", <<-RUBY
class Post < ActiveRecord::Base
end
RUBY

app_file "config/initializers/active_record.rb", <<-RUBY
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Migration.verbose = false
ActiveRecord::Schema.define(version: 1) do
create_table :posts do |t|
t.string :title
end
end
RUBY

add_to_config <<-RUBY
config.enable_reloading = false
config.eager_load = true
config.active_record.check_schema_cache_dump_version = false
RUBY

app "production"

assert_not_includes Post.instance_methods, :title
assert_not_includes (Post.instance_methods - ActiveRecord::Base.instance_methods), :title
end

test "eager loads attribute methods in production when the schema cache is populated" do
Expand Down
Loading