Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeMap (PG): Store Typemap in Schema Cache and use another cache class to keep it in memory #46409

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mochnatiy
Copy link

Motivation / Background

This pull request is a rework of the other one after some feedbacks : #44478

In our project, we are using PG and Resque.
Right now, for every Resque job processed and new connection spawned, we are executing this method in Rails that run a costly query :

def load_additional_types(oids = nil)
  initializer = OID::TypeMapInitializer.new(type_map)

  query = <<~SQL
    SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype
    FROM pg_type as t
    LEFT JOIN pg_range as r ON oid = rngtypid
  SQL
     
  if oids
    query += "WHERE t.oid IN (%s)" % oids.join(", ")
  else
    query += initializer.query_conditions_for_initial_load
  end
      
  execute_and_clear(query, "SCHEMA", []) do |records|
    initializer.run(records)
  end
end

To fix this issue we have done a monkey patch, basically, the ConnectionPool stores a cache of the type_map, and a new connection will inherit from this cache, instead of querying it.

module ActiveRecord
  module ConnectionAdapters
    class PoolManager
      attr_accessor :type_map
    end
  end

  class ConnectionAdapters::ConnectionPool
    def adopt_connection(conn)
      ....
      pool_manager = ConnectionAdapters::ConnectionPool.get_pool_manager

      pool_manager.type_map ||= conn.type_map
      ....
    end
  end

  class ConnectionAdapters::PostgreSQLAdapter
    attr_reader :type_map

    def initialize(connection, logger, connection_parameters, config)
      ....

      pool_manager = ConnectionAdapters::ConnectionPool.get_pool_manager
      type_map = pool_manager.type_map

      if type_map
        @type_map = type_map
      else
        @type_map = Type::HashLookupTypeMap.new
        initialize_type_map
      end
 
      ....
    end
  end
end

It also makes every Rails upgrade as complicated on our side.

In order not to have monkey patches we want to solve it on a Rails side which could be useful for other developers too.

First of all, we have dug in relevant PRs done in Rails and took some ideas.
The solutions to deal with issues maybe:

- Store TypeMap into SchemaCache (relevant for PG only)
- Switch off TypeMap if there is no usage of PG custom types
- Introduce the monkey patch as the solution for Rails

We have selected the first one and would like to have feedback from Rails maintainers and complete this topic.
If there is something we have missed we are also ready to fix it shortly.

Base PRs for this PR:

#35311
https://github.com/rails/rails/pull/39821/files
https://github.com/rails/rails/pull/39077/files
https://github.com/rails/rails/pull/41288/files

I would like to tag also @piecehealth and @ted-hanson

Detail

Currently in this PR we have TypeMap as a singleton (after trying to implement it as an instance variable we had issues with concurrency and now we have it as a single example in memory).

The general approach is to keep the TypeMap in the schema cache, and load from the schema_cache.yml when create a new connection. If TypeMap is not present in schema cache, it will be retrieved from the database.

The idea is still the same. During deploy, we run migrations and create a dump of schema cache. When initializing a Rails app, we load schema cache into a memory and initialize our TypeMapCache instance. Then we don't fire any query and retrieve Typemap from this cache when establish a new connection.

Additional information

Unfortunately, it does not work with lazily_load_schema_cache option enabled since we load schema cache lazily after creating a connection.

Checklist

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.
  • CI is passing.

@mochnatiy mochnatiy force-pushed the pg-typemap-as-singleton-prod branch 4 times, most recently from d7b9f1e to e38cfa7 Compare November 3, 2022 10:07
Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this again and for working on it.

I left some comments about how the code should be implemented. It looks like a lot of this code came from #41288 so they should get credit as well as co-author if we do merge this.

In general the main issue with this implementation is the conditional checks on whether we have a postgres adapter. The adapter design is implemented such that conditional checks aren't needed.

While I understand the naming came from other implementations I think we can come up with better method names that are clearer. 🙂

@@ -24,6 +24,8 @@ def lazily_set_schema_cache
return unless ActiveRecord.lazily_load_schema_cache

cache = SchemaCache.load_from(db_config.lazy_schema_cache_path)
PostgreSQL::TypeMapCache.init(cache) if connection.adapter_name == "PostgreSQL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this check in the main connection adapter, it should be adapter agnostic. So your prior PR was closer to correct. Looks that the way the methods in the abstract adapter map to methods in the db adapters. Abstract should be the default schema_cache or init_schema_cache whatever we call it, and Postgres should implement the custom one. I'm not really a fan of init_schema_cache as a method name either.

@@ -143,9 +143,18 @@ class Railtie < Rails::Railtie # :nodoc:
schema_cache_path: db_config.schema_cache_path
)

cache = ActiveRecord::ConnectionAdapters::SchemaCache.load_from(filename)
cache = if connection_pool.db_config.configuration_hash[:adapter] == "postgresql"
ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaCache.load_from(filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a previous comment about how the schema cache should be done in the adapters. When implemented that way then we don't need to handle conditionals here and can simply call the same ActiveRecord::ConnectionAdapters.load_schema_cache(filename). which will handle loading up the Postgresl specific cache with inheritance.

Note for the future, adapter is available on the db_config directly, we should rarely need to reach into the configuration_hash to access Rails required information.

module ActiveRecord
module ConnectionAdapters
module PostgreSQL
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods and classes that shouldn't be consumed by applications should be marked as private with a nodoc.

Suggested change
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache
class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache # :nodoc:

super(conn)

@additional_type_records = PostgreSQL::TypeMapCache.instance.additional_type_records || []
@known_coder_type_records = PostgreSQL::TypeMapCache.instance.known_coder_type_records || []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love these names, they're long and vague. How does additional differ from known? I think we can find better names for these.

execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |records|
initializer.run(records)
# Will not work when dumping, a dump file should be recreated on each
# schema_cache:dump
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this means. If users need to explicitly dump that should be documented somewhere outside the code. This won't show up in the API docs or guides so it needs to be in a public place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @eileencodes, do you know where should we modify the documention once it would be approve ?

coders = execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |result|
PostgreSQL::TypeMapCache.instance.known_coder_type_records |= result.to_a

result.filter_map { |row| construct_coder(row, coders_by_name[row["typname"]]) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into its own method that is called from here. It's becoming a bit unwieldy to read.

@eileencodes
Copy link
Member

It also makes every Rails upgrade as complicated on our side.

Just a note about your patch. It's using entirely private APIs which we can change or remove without deprecation or replacement. You should get the pool from the public connection handler retrieve_connection_pool rather than reaching into the pool manager and then store the type information there.

While ideally in the future the schema cache will be stored on the pool manager level rather than the pool level that's not how it works today so storing the type info on pool manager is guaranteed to keep breaking as I work on moving things around to work better.

@mochnatiy
Copy link
Author

Hello @eileencodes, thank you very much for the comments, we are on them.

@danielvdao
Copy link

@mochnatiy Curious about the status of this? Our team has flagged something similar and I would love for this PR to be merged!

@Thornolf
Copy link
Contributor

Thornolf commented Feb 7, 2023

@mochnatiy Curious about the status of this? Our team has flagged something similar and I would love for this PR to be merged!

We are still working on this, but we have technicals issues and some personnals things that were in the way ! Sorry about that 😅

@natemontgomery
Copy link

Thornolf I would be happy to hop in and help out if you would like. Are the latest changes pushed up here?

@mochnatiy mochnatiy force-pushed the pg-typemap-as-singleton-prod branch from 90b1f80 to b37a453 Compare March 14, 2023 15:40
@berniechiu
Copy link
Contributor

👀

@beetlegius
Copy link

We have the same issue in our services :( would be awesome to include this fix as part of rails

@lusinh
Copy link

lusinh commented Jan 15, 2024

Happy new year and this issue still need to fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants