Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce optional global TypeInfoCache #3200

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Randgalt
Copy link

In the current implementation, TypeInfoCache is a per-connection cache. This means, over time, that the same type information is retrieved from the database copious times. For some DB systems, this has a negative cost/performance impact.

Introduce an optional global TypeInfoCache. This is useful for cases where a single DB system is used in Production for all connections and there's no need for each connection to re-retrieve type info.

  • Extract the storage portion of TypeInfoCache into a new class, TypeInfoCacheStorage
  • Add new property, GLOBAL_TYPE_INFO_CACHE to enable a global cache
  • When enabled, the constant TypeInfoCacheStorage GLOBAL_INSTANCE is always used. Over the course of running a Production instance, this global cache should not need to query the database for type info.

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Looks like there are a few PRs dealing with TypeInfoCache but this is different than those AFAICT.

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does ./gradlew styleCheck pass ?
  3. Have you added your new test classes to an existing test suite in alphabetical order?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
    No

  • Have you added an explanation of what your changes do and why you'd like us to include them?
    Yes

  • Have you written new tests for your core changes, as applicable?
    Yes (though more thorough testing might be needed if you want)

  • Have you successfully run tests with your changes locally?
    Yes

In the current implementation, `TypeInfoCache` is a per-connection
cache. This means, over time, that the same type information is
retrieved from the database copious times. For some DB systems, this
has a negative cost/performance impact.

Introduce an optional global `TypeInfoCache`. This is useful for
cases where a single DB system is used in Production for all
connections and there's no need for each connection to re-retrieve
type info.

- Extract the storage portion of `TypeInfoCache` into a new class,
`TypeInfoCacheStorage`
- Add new property, `GLOBAL_TYPE_INFO_CACHE` to enable a global cache
- When enabled, the constant `TypeInfoCacheStorage GLOBAL_INSTANCE` is
always used. Over the course of running a Production instance, this global
cache should not need to query the database for type info.
@Randgalt Randgalt force-pushed the jordanz/support-global-type-info-cache branch from 829e5c8 to e4a9de8 Compare April 10, 2024 18:39
@vlsi
Copy link
Member

vlsi commented Apr 11, 2024

Thank you for the PR, however, I am leaning towards a different implementation:

  1. Refactor TypeCache code (see WIP: rework TypeInfoCache #3062)
  2. Factor common types to a read-only part. In other words, we know common types for the base PostgreSQL, so we could have a single cache with base types plus per-connection types for connection-specific data. I believe it would greatly reduce the memory consumption without having to resort to non-default connection options.

WDYT?

@Randgalt
Copy link
Author

@vlsi we'd be happy with any solution that solves the issue for us. We're forking internally for now as it's become a big problem (note we use CRDB). Let me know if I can help in any way.

@electrum
Copy link

electrum commented Apr 12, 2024

@vlsi Note that memory consumption is not a concern for us here. The issue is the cost of repeatedly retrieving the type information. Would your solution avoid fetching the same information multiple times across connections?

@vlsi
Copy link
Member

vlsi commented Apr 12, 2024

Ultimately, I would like to hard-code well-known pg catalog entries for the current DB versions, so query catalog only in case we face an unknown type. WDYT?

@davecramer
Copy link
Member

Ultimately, I would like to hard-code well-known pg catalog entries for the current DB versions, so query catalog only in case we face an unknown type. WDYT?

I don't see another way to do this. So yes.

@vlsi
Copy link
Member

vlsi commented Apr 12, 2024

I don't see another way to do this. So yes

It might be useful to have something like "global backend unique ID" returned on connect.
If the backend could return its ID, then

  • we could associate the global caches with the corresponding ID across the connections
  • we could reuse type information. For instance, if a connection queries type info, the other connections could reuse the results

Currently, as a workaround we could assume host:port would always talk to the same database (==having the same oids), however, it is not always the case as there might be balancers and so on.

@davecramer
Copy link
Member

I don't see another way to do this. So yes

It might be useful to have something like "global backend unique ID" returned on connect. If the backend could return its ID, then

Interesting. Something else we can ask for in the protocol changes

  • we could associate the global caches with the corresponding ID across the connections
  • we could reuse type information. For instance, if a connection queries type info, the other connections could reuse the results

Currently, as a workaround we could assume host:port would always talk to the same database (==having the same oids), however, it is not always the case as there might be balancers and so on.
Good point. Hopefully someone doesn't overwrite a type in a different schema as well

@bokken
Copy link
Member

bokken commented Apr 13, 2024

2. Factor common types to a read-only part. In other words, we know common types for the base PostgreSQL, so we could have a single cache with base types plus per-connection types for connection-specific data.

That is effectively what I proposed in #1980. Though that pr also addressed how the shared timer is used/managed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants