New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TypeMap (PG): Store Typemap in Schema Cache and use another cache class to keep it in memory #46409
base: main
Are you sure you want to change the base?
Conversation
d7b9f1e
to
e38cfa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this again and for working on it.
I left some comments about how the code should be implemented. It looks like a lot of this code came from #41288 so they should get credit as well as co-author if we do merge this.
In general the main issue with this implementation is the conditional checks on whether we have a postgres adapter. The adapter design is implemented such that conditional checks aren't needed.
While I understand the naming came from other implementations I think we can come up with better method names that are clearer. 🙂
@@ -24,6 +24,8 @@ def lazily_set_schema_cache | |||
return unless ActiveRecord.lazily_load_schema_cache | |||
|
|||
cache = SchemaCache.load_from(db_config.lazy_schema_cache_path) | |||
PostgreSQL::TypeMapCache.init(cache) if connection.adapter_name == "PostgreSQL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't do this check in the main connection adapter, it should be adapter agnostic. So your prior PR was closer to correct. Looks that the way the methods in the abstract adapter map to methods in the db adapters. Abstract should be the default schema_cache
or init_schema_cache
whatever we call it, and Postgres should implement the custom one. I'm not really a fan of init_schema_cache
as a method name either.
@@ -143,9 +143,18 @@ class Railtie < Rails::Railtie # :nodoc: | |||
schema_cache_path: db_config.schema_cache_path | |||
) | |||
|
|||
cache = ActiveRecord::ConnectionAdapters::SchemaCache.load_from(filename) | |||
cache = if connection_pool.db_config.configuration_hash[:adapter] == "postgresql" | |||
ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaCache.load_from(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a previous comment about how the schema cache should be done in the adapters. When implemented that way then we don't need to handle conditionals here and can simply call the same ActiveRecord::ConnectionAdapters.load_schema_cache(filename)
. which will handle loading up the Postgresl specific cache with inheritance.
Note for the future, adapter
is available on the db_config
directly, we should rarely need to reach into the configuration_hash
to access Rails required information.
module ActiveRecord | ||
module ConnectionAdapters | ||
module PostgreSQL | ||
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methods and classes that shouldn't be consumed by applications should be marked as private with a nodoc
.
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache | |
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache # :nodoc: |
super(conn) | ||
|
||
@additional_type_records = PostgreSQL::TypeMapCache.instance.additional_type_records || [] | ||
@known_coder_type_records = PostgreSQL::TypeMapCache.instance.known_coder_type_records || [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love these names, they're long and vague. How does additional differ from known? I think we can find better names for these.
activerecord/lib/active_record/connection_adapters/postgresql/type_map_cache.rb
Show resolved
Hide resolved
activerecord/lib/active_record/connection_adapters/postgresql/type_map_cache.rb
Show resolved
Hide resolved
execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |records| | ||
initializer.run(records) | ||
# Will not work when dumping, a dump file should be recreated on each | ||
# schema_cache:dump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this means. If users need to explicitly dump that should be documented somewhere outside the code. This won't show up in the API docs or guides so it needs to be in a public place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @eileencodes, do you know where should we modify the documention once it would be approve ?
coders = execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |result| | ||
PostgreSQL::TypeMapCache.instance.known_coder_type_records |= result.to_a | ||
|
||
result.filter_map { |row| construct_coder(row, coders_by_name[row["typname"]]) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this into its own method that is called from here. It's becoming a bit unwieldy to read.
Just a note about your patch. It's using entirely private APIs which we can change or remove without deprecation or replacement. You should get the pool from the public connection handler retrieve_connection_pool rather than reaching into the pool manager and then store the type information there. While ideally in the future the schema cache will be stored on the pool manager level rather than the pool level that's not how it works today so storing the type info on pool manager is guaranteed to keep breaking as I work on moving things around to work better. |
Hello @eileencodes, thank you very much for the comments, we are on them. |
@mochnatiy Curious about the status of this? Our team has flagged something similar and I would love for this PR to be merged! |
We are still working on this, but we have technicals issues and some personnals things that were in the way ! Sorry about that 😅 |
e38cfa7
to
90b1f80
Compare
Thornolf I would be happy to hop in and help out if you would like. Are the latest changes pushed up here? |
…ss to keep it in memory
90b1f80
to
b37a453
Compare
👀 |
We have the same issue in our services :( would be awesome to include this fix as part of rails |
Happy new year and this issue still need to fix |
Motivation / Background
This pull request is a rework of the other one after some feedbacks : #44478
In our project, we are using PG and Resque.
Right now, for every Resque job processed and new connection spawned, we are executing this method in Rails that run a costly query :
To fix this issue we have done a monkey patch, basically, the
ConnectionPool
stores a cache of the type_map, and a new connection will inherit from this cache, instead of querying it.It also makes every Rails upgrade as complicated on our side.
In order not to have monkey patches we want to solve it on a Rails side which could be useful for other developers too.
First of all, we have dug in relevant PRs done in Rails and took some ideas.
The solutions to deal with issues maybe:
We have selected the first one and would like to have feedback from Rails maintainers and complete this topic.
If there is something we have missed we are also ready to fix it shortly.
Base PRs for this PR:
#35311
https://github.com/rails/rails/pull/39821/files
https://github.com/rails/rails/pull/39077/files
https://github.com/rails/rails/pull/41288/files
I would like to tag also @piecehealth and @ted-hanson
Detail
Currently in this PR we have TypeMap as a singleton (after trying to implement it as an instance variable we had issues with concurrency and now we have it as a single example in memory).
The general approach is to keep the TypeMap in the schema cache, and load from the
schema_cache.yml
when create a new connection. If TypeMap is not present in schema cache, it will be retrieved from the database.The idea is still the same. During deploy, we run migrations and create a dump of schema cache. When initializing a Rails app, we load schema cache into a memory and initialize our
TypeMapCache
instance. Then we don't fire any query and retrieve Typemap from this cache when establish a new connection.Additional information
Unfortunately, it does not work with
lazily_load_schema_cache
option enabled since we load schema cache lazily after creating a connection.Checklist
[Fix #issue-number]