TypeMap (PG): Store Typemap in Schema Cache and use another cache class to keep it in memory #46409

mochnatiy · 2022-11-02T16:37:35Z

Motivation / Background

This pull request is a rework of #44478, following the feedback on that PR.

In our project, we use PG and Resque.
Right now, for every Resque job processed and every new connection spawned, Rails executes this method, which runs a costly query:

def load_additional_types(oids = nil)
  initializer = OID::TypeMapInitializer.new(type_map)

  query = <<~SQL
    SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype
    FROM pg_type as t
    LEFT JOIN pg_range as r ON oid = rngtypid
  SQL
     
  if oids
    query += "WHERE t.oid IN (%s)" % oids.join(", ")
  else
    query += initializer.query_conditions_for_initial_load
  end
      
  execute_and_clear(query, "SCHEMA", []) do |records|
    initializer.run(records)
  end
end

To work around this issue we wrote a monkey patch. The PoolManager stores a cached copy of the type_map, and each new connection reuses that copy instead of querying the database for it.

module ActiveRecord
  module ConnectionAdapters
    class PoolManager
      attr_accessor :type_map
    end
  end

  class ConnectionAdapters::ConnectionPool
    def adopt_connection(conn)
      ....
      pool_manager = ConnectionAdapters::ConnectionPool.get_pool_manager

      pool_manager.type_map ||= conn.type_map
      ....
    end
  end

  class ConnectionAdapters::PostgreSQLAdapter
    attr_reader :type_map

    def initialize(connection, logger, connection_parameters, config)
      ....

      pool_manager = ConnectionAdapters::ConnectionPool.get_pool_manager
      type_map = pool_manager.type_map

      if type_map
        @type_map = type_map
      else
        @type_map = Type::HashLookupTypeMap.new
        initialize_type_map
      end
 
      ....
    end
  end
end

It also makes every Rails upgrade more complicated on our side.

To avoid maintaining a monkey patch, we want to solve this in Rails itself, which could be useful for other developers too.

First of all, we dug into the relevant Rails PRs and took some ideas from them.
The possible solutions are:

- Store TypeMap into SchemaCache (relevant for PG only)
- Switch off TypeMap if there is no usage of PG custom types
- Introduce the monkey patch as the solution for Rails

We have selected the first one and would like to have feedback from Rails maintainers and complete this topic.
If there is something we have missed we are also ready to fix it shortly.

Base PRs for this PR:

#35311
https://github.com/rails/rails/pull/39821/files
https://github.com/rails/rails/pull/39077/files
https://github.com/rails/rails/pull/41288/files

I would also like to tag @piecehealth and @ted-hanson

Detail

Currently in this PR the TypeMap is a singleton. We first tried to implement it as an instance variable, but ran into concurrency issues, so now there is a single instance in memory.

The general approach is to keep the TypeMap in the schema cache and load it from schema_cache.yml when creating a new connection. If the TypeMap is not present in the schema cache, it is retrieved from the database.

The idea is still the same. During deploy, we run migrations and create a dump of the schema cache. When initializing a Rails app, we load the schema cache into memory and initialize our TypeMapCache instance. After that, establishing a new connection does not fire any query: the TypeMap is retrieved from this cache.
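To make the flow concrete, here is a minimal sketch of a process-wide cache that is filled once at boot and falls back to a query only when it is empty. It is not the code in this PR: `TypeMapCacheSketch`, `load`, `fetch`, `clear` and the `"type_records"` key are hypothetical names, and the YAML string only stands in for schema_cache.yml.

```ruby
require "singleton"
require "yaml"

# Hypothetical stand-in for the PR's TypeMapCache; all names are illustrative.
class TypeMapCacheSketch
  include Singleton

  def initialize
    @lock = Mutex.new
    @records = nil
  end

  # Called once at boot with the rows read from the schema cache dump.
  def load(records)
    @lock.synchronize { @records = records.dup.freeze }
  end

  # Returns the cached rows. If nothing is cached yet, runs the block
  # (the expensive query) once and keeps its result.
  def fetch
    @lock.synchronize do
      @records ||= yield.freeze
    end
  end

  # Drops the cached rows, e.g. after the dump has been regenerated.
  def clear
    @lock.synchronize { @records = nil }
  end
end

# Simulated boot: this string stands in for schema_cache.yml.
dump = YAML.dump("type_records" => [{ "oid" => 16, "typname" => "bool" }])
TypeMapCacheSketch.instance.load(YAML.safe_load(dump)["type_records"])

# Simulated new connection: the block is the pg_type query and is skipped
# because the cache is already populated.
queries = 0
rows = TypeMapCacheSketch.instance.fetch { queries += 1; [] }
```

The mutex is one possible way to handle the concurrency problem mentioned above; whether the real implementation needs it depends on when the cache is written.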

Additional information

Unfortunately, it does not work with the lazily_load_schema_cache option enabled, since in that mode the schema cache is loaded after the connection has been created.
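The ordering problem can be modelled in a few lines. This is only a toy model of boot order, assuming a connection decides at creation time whether it must query; `SketchConnection` and its `queried` flag are hypothetical and do not exist in Rails.

```ruby
# Hypothetical model of boot ordering; not Rails internals.
class SketchConnection
  attr_reader :queried

  def initialize(cached_type_rows)
    # A connection can only reuse rows that already exist when it is created;
    # otherwise it has to run the pg_type query itself.
    @queried = cached_type_rows.nil?
  end
end

# Eager loading: the schema cache is in memory before the first connection.
eager_cache = [{ "oid" => 16, "typname" => "bool" }]
eager = SketchConnection.new(eager_cache)

# Lazy loading: the connection is created first, the cache arrives afterwards,
# so this connection has already paid for the query.
lazy_cache = nil
lazy = SketchConnection.new(lazy_cache)
lazy_cache = [{ "oid" => 16, "typname" => "bool" }]
```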

Checklist

This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
Tests are added or updated if you fix a bug or add a feature.
CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.
CI is passing.

eileencodes

Thanks for opening this again and for working on it.

I left some comments about how the code should be implemented. It looks like a lot of this code came from #41288 so they should get credit as well as co-author if we do merge this.

In general the main issue with this implementation is the conditional checks on whether we have a postgres adapter. The adapter design is implemented such that conditional checks aren't needed.

While I understand the naming came from other implementations I think we can come up with better method names that are clearer. 🙂

eileencodes · 2022-11-02T16:55:00Z

activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb

@@ -24,6 +24,8 @@ def lazily_set_schema_cache
        return unless ActiveRecord.lazily_load_schema_cache

        cache = SchemaCache.load_from(db_config.lazy_schema_cache_path)
+        PostgreSQL::TypeMapCache.init(cache) if connection.adapter_name == "PostgreSQL"


We shouldn't do this check in the main connection adapter; it should be adapter agnostic. So your prior PR was closer to correct. Look at the way the methods in the abstract adapter map to methods in the db adapters. Abstract should provide the default schema_cache or init_schema_cache, whatever we call it, and Postgres should implement the custom one. I'm not really a fan of init_schema_cache as a method name either.
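A rough sketch of that shape, using stand-in classes rather than the real Rails adapters: the abstract adapter names a default schema cache class, the Postgres adapter overrides it, and the caller never compares adapter names. Every class and method name here (`schema_cache_class`, `load_schema_cache`, the `Sketch*` classes) is an assumption made for illustration.

```ruby
# Illustrative class names only; these are not the real Rails classes.
class SketchSchemaCache
  attr_reader :source

  def initialize(source)
    @source = source
  end

  # `new` resolves against the receiver, so subclasses build their own type.
  def self.load_from(filename)
    new(filename)
  end
end

class SketchPostgresSchemaCache < SketchSchemaCache
  # Postgres-only state lives in the subclass.
  def type_records
    @type_records ||= []
  end
end

class SketchAbstractAdapter
  # Default used by every adapter that does not override it.
  def self.schema_cache_class
    SketchSchemaCache
  end

  def self.load_schema_cache(filename)
    schema_cache_class.load_from(filename)
  end
end

class SketchPostgresAdapter < SketchAbstractAdapter
  def self.schema_cache_class
    SketchPostgresSchemaCache
  end
end

class SketchMysqlAdapter < SketchAbstractAdapter
end
```

With this layout the pool or railtie can call `load_schema_cache` on whichever adapter class is configured, and the `adapter_name == "PostgreSQL"` branch disappears.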

eileencodes · 2022-11-02T16:58:00Z

activerecord/lib/active_record/railtie.rb

@@ -143,9 +143,18 @@ class Railtie < Rails::Railtie # :nodoc:
              schema_cache_path: db_config.schema_cache_path
            )

-            cache = ActiveRecord::ConnectionAdapters::SchemaCache.load_from(filename)
+            cache = if connection_pool.db_config.configuration_hash[:adapter] == "postgresql"
+              ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaCache.load_from(filename)


I left a previous comment about how the schema cache should be done in the adapters. When implemented that way, we don't need to handle conditionals here and can simply call the same ActiveRecord::ConnectionAdapters.load_schema_cache(filename), which will handle loading the PostgreSQL-specific cache through inheritance.

Note for the future, adapter is available on the db_config directly, we should rarely need to reach into the configuration_hash to access Rails required information.

eileencodes · 2022-11-03T12:49:30Z

activerecord/lib/active_record/connection_adapters/postgresql/schema_cache.rb

+module ActiveRecord
+  module ConnectionAdapters
+    module PostgreSQL
+      class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache


Methods and classes that shouldn't be consumed by applications should be marked as private with a nodoc.

Suggested change

class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache

class SchemaCache < ActiveRecord::ConnectionAdapters::SchemaCache # :nodoc:

eileencodes · 2022-11-03T12:50:04Z

activerecord/lib/active_record/connection_adapters/postgresql/schema_cache.rb

+          super(conn)
+
+          @additional_type_records = PostgreSQL::TypeMapCache.instance.additional_type_records || []
+          @known_coder_type_records = PostgreSQL::TypeMapCache.instance.known_coder_type_records || []


I don't love these names, they're long and vague. How does additional differ from known? I think we can find better names for these.

activerecord/lib/active_record/connection_adapters/postgresql/type_map_cache.rb

eileencodes · 2022-11-03T12:52:11Z

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

-            execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |records|
-              initializer.run(records)
+          # Will not work when dumping, a dump file should be recreated on each
+          # schema_cache:dump


I'm not sure what this means. If users need to explicitly dump that should be documented somewhere outside the code. This won't show up in the API docs or guides so it needs to be in a public place.

Hello @eileencodes, do you know where we should update the documentation once this is approved?

eileencodes · 2022-11-03T12:53:02Z

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

+            coders = execute_and_clear(query, "SCHEMA", [], allow_retry: true, uses_transaction: false) do |result|
+              PostgreSQL::TypeMapCache.instance.known_coder_type_records |= result.to_a
+
+              result.filter_map { |row| construct_coder(row, coders_by_name[row["typname"]]) }


Let's move this into its own method that is called from here. It's becoming a bit unwieldy to read.
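One way the extraction could look, sketched outside of Rails: the block body is split into a method that records the rows and a method that builds the coders. `CoderLoaderSketch`, `remember_rows`, `build_coders` and `SketchCoder` are hypothetical names, and `construct_coder` here is a simplified stand-in, not the adapter's real method.

```ruby
# Hypothetical helper names; `rows` stands in for the query result.
class CoderLoaderSketch
  attr_reader :cached_rows

  def initialize(coders_by_name)
    @coders_by_name = coders_by_name
    @cached_rows = []
  end

  # What the block in the adapter would shrink to: two named steps.
  def load_coders(rows)
    remember_rows(rows)
    build_coders(rows)
  end

  private

  # Keeps each distinct row once so it can be written to the dump later.
  def remember_rows(rows)
    @cached_rows |= rows.to_a
  end

  # Builds a coder for every row whose type name has a known coder class.
  def build_coders(rows)
    rows.filter_map { |row| construct_coder(row, @coders_by_name[row["typname"]]) }
  end

  # Simplified stand-in: returns nil for types without a coder class.
  def construct_coder(row, coder_class)
    return unless coder_class

    coder_class.new(row["oid"].to_i, row["typname"])
  end
end

# Minimal coder type used only to exercise the sketch.
SketchCoder = Struct.new(:oid, :name)
```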

eileencodes · 2022-11-03T15:11:39Z

It also makes every Rails upgrade more complicated on our side.

Just a note about your patch. It's using entirely private APIs, which we can change or remove without deprecation or replacement. You should get the pool through the public connection handler method retrieve_connection_pool rather than reaching into the pool manager, and then store the type information on the pool.

While ideally in the future the schema cache will be stored on the pool manager level rather than the pool level that's not how it works today so storing the type info on pool manager is guaranteed to keep breaking as I work on moving things around to work better.

mochnatiy · 2022-11-18T09:40:34Z

Hello @eileencodes, thank you very much for the comments, we are on them.

danielvdao · 2023-01-03T18:54:21Z

@mochnatiy Curious about the status of this? Our team has flagged something similar and I would love for this PR to be merged!

Thornolf · 2023-02-07T14:08:58Z

@mochnatiy Curious about the status of this? Our team has flagged something similar and I would love for this PR to be merged!

We are still working on this, but technical issues and some personal matters got in the way. Sorry about that 😅

natemontgomery · 2023-02-10T18:25:13Z

Thornolf I would be happy to hop in and help out if you would like. Are the latest changes pushed up here?


berniechiu · 2023-07-07T02:12:17Z

👀

beetlegius · 2023-07-21T10:18:37Z

We have the same issue in our services :( It would be awesome to include this fix as part of Rails.

lusinh · 2024-01-15T03:32:56Z

Happy New Year. This issue still needs to be fixed.

rails-bot bot added the activerecord label Nov 2, 2022

mochnatiy force-pushed the pg-typemap-as-singleton-prod branch 4 times, most recently from d7b9f1e to e38cfa7 Compare November 3, 2022 10:07

eileencodes requested changes Nov 3, 2022


mochnatiy force-pushed the pg-typemap-as-singleton-prod branch from e38cfa7 to 90b1f80 Compare February 7, 2023 14:26

TypeMap (PG): Store Typemap in Schema Cache and use another cache cla…

b37a453

…ss to keep it in memory

mochnatiy force-pushed the pg-typemap-as-singleton-prod branch from 90b1f80 to b37a453 Compare March 14, 2023 15:40
