Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply schema cache dump when creating connections #17632

Merged

Conversation

eugeneius
Copy link
Member

The db:schema:cache:dump rake task dumps the database schema structure to db/schema_cache.dump. If this file is present, the schema details are loaded into the currently checked out connection by a railtie while Rails is booting, to avoid having to query the database for its schema.

The schema cache dump is only applied to the initial connection used to boot the application though; other connections from the same pool are created with an empty schema cache, and still have to load the structure of each table directly from the database.

With this change, a copy of the schema cache is associated with the connection pool and applied to connections as they are created.

@rafaelfranca
Copy link
Member

Does this improve the performance of creating a new connection? If so could you share the benchmark?

@dannyfallon
Copy link

Simple benchmark: Pool size set to 20, postgreSQL & mySQL database on localhost, different apps

Benchmark.measure { 19.times { ActiveRecord::Base.connection_pool.checkout } }

screen shot 2014-11-18 at 01 23 11

screen shot 2014-11-18 at 01 47 10

The first result in both tests is without any schema caching. The second run is with schema cache but no patch. The third is with the patch. This is primarily an optimisation/bugfix of the schema cache functionality that exists with some non-obvious benefits.

For example, in mySQL the description of a table is obtained by a SHOW FULL FIELDS query which causes the creation of a temporary table on-disk. In a pool of mySQL connections, say 25, currently 24 of them don't have the schema cache you've generated - each time the app uses a connection to access a table that ActiveRecord hasn't seen before on that connection it will perform that query; that's one SHOW FULL FIELDS per table per connection (Worst case: 24xN queries where N is the table count).

In a heavily-trafficked rails application, with a reasonable pool size, you're pretty much guaranteed to run most (if not all) of those queries at the same time for your core tables, seriously impacting database performance as it locks the tables to answer those description requests. This is where the most benefit from this PR is felt.

@thedarkone
Copy link
Contributor

Do we need to worry about making SchemaCache thread-safe or is schema cache loaded from filename = File.join(app.config.paths["db"].first, "schema_cache.dump") complete and immutable? Note that a simple schema_cache.dup is not enough, because the newly dup-ed schema will still be using all the same internal Hashes (@columns, @columns_hash etc).

@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch 2 times, most recently from 086f935 to 114d6a8 Compare November 21, 2014 08:50
@eugeneius
Copy link
Member Author

@thedarkone the schema cache is populated with every table in the database before it's dumped, so in theory it should be complete and never need to be updated. However my intention was to make a copy anyway, since relying on that assumption seems like a bad idea; I just misunderstood how #dup works.

I've added an implementation of #initialize_dup to SchemaCache which will create copies of the internal hashes. It doesn't deep copy them, so the same Column objects will be shared between multiple schema caches, but as far as I can tell those are immutable.

@rafaelfranca this patch doesn't speed up creating a connection, but it does speed up the first query to each table from a new connection. I'll put together a benchmark to demonstrate this in the next few days.

@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch from 114d6a8 to 7b20f53 Compare November 28, 2014 20:28
@eugeneius
Copy link
Member Author

Here's a benchmark: https://gist.github.com/eugeneius/f2ccf1694d8759df7884#file-benchmark

When you boot a multithreaded server and start sending it traffic, many threads will all need to check out new connections and query tables at the same time. That's what this benchmark is simulating, by clearing the models' column caches and removing all connections from the pool at the start of each iteration.

Without the schema cache dump, this patch has no effect:

$ BUNDLE_GEMFILE=Gemfile_master ./benchmark
Calculating -------------------------------------
                         5.000  i/100ms
-------------------------------------------------
                         54.037  (±20.4%) i/s -    515.000 
$ BUNDLE_GEMFILE=Gemfile_schema_cache_dump_connection_pool ./benchmark
Calculating -------------------------------------
                         5.000  i/100ms
-------------------------------------------------
                         53.825  (±20.4%) i/s -    515.000 

With the schema cache dump in place, the benchmark runs significantly faster:

$ BUNDLE_GEMFILE=Gemfile_master bundle exec rake db:schema:cache:dump
$ BUNDLE_GEMFILE=Gemfile_master ./benchmark
Calculating -------------------------------------
                         5.000  i/100ms
-------------------------------------------------
                         53.681  (±20.5%) i/s -    515.000 
$ BUNDLE_GEMFILE=Gemfile_schema_cache_dump_connection_pool ./benchmark
Calculating -------------------------------------
                         7.000  i/100ms
-------------------------------------------------
                         77.578  (±23.2%) i/s -    735.000 

@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch from 7b20f53 to 7ddc517 Compare November 29, 2014 16:25
@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch from 7ddc517 to b06a83e Compare December 17, 2014 01:01
@eugeneius
Copy link
Member Author

@tenderlove you reviewed this feature originally, what do you think of this change?

@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch from b06a83e to e814913 Compare February 21, 2015 17:01
@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch 3 times, most recently from e0a8631 to 97c8c9e Compare April 29, 2015 09:20
The `db:schema:cache:dump` rake task dumps the database schema structure
to `db/schema_cache.dump`. If this file is present, the schema details
are loaded into the currently checked out connection by a railtie while
Rails is booting, to avoid having to query the database for its schema.

The schema cache dump is only applied to the initial connection used to
boot the application though; other connections from the same pool are
created with an empty schema cache, and still have to load the structure
of each table directly from the database.

With this change, a copy of the schema cache is associated with the
connection pool and applied to connections as they are created.
@eugeneius eugeneius force-pushed the schema_cache_dump_connection_pool branch 3 times, most recently from 9ed3bbc to 33fe7cc Compare April 29, 2015 18:20
@eugeneius
Copy link
Member Author

I've rebased this branch and removed the CHANGELOG entry, since it was the only thing that conflicted. For posterity, here's what I had written:

*   Database schema details dumped using the `db:schema:cache:dump` rake task
    are now applied to all connections, not just the initial connection.

    *Eugene Kenny*

Does anyone have feedback on whether this patch is likely to be accepted, or what changes I need to make for that to happen? I've been running this as a monkey patch in production for over a year now; deploying our app without it causes CPU usage on the database to spike as every new process fetches the schema at once. We can't deploy without it, but I'd like to not have to maintain the monkey patch.

@rafaelfranca
Copy link
Member

This looks good to me but I want to get @tenderlove or @matthewd feedback before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants