Introduce optional global TypeInfoCache #3200

Randgalt · 2024-04-10T18:36:40Z

In the current implementation, TypeInfoCache is a per-connection cache. This means, over time, that the same type information is retrieved from the database copious times. For some DB systems, this has a negative cost/performance impact.

Introduce an optional global TypeInfoCache. This is useful for cases where a single DB system is used in Production for all connections and there's no need for each connection to re-retrieve type info.

Extract the storage portion of TypeInfoCache into a new class, TypeInfoCacheStorage
Add new property, GLOBAL_TYPE_INFO_CACHE to enable a global cache
When enabled, the constant TypeInfoCacheStorage GLOBAL_INSTANCE is always used. Over the course of running a Production instance, this global cache should not need to query the database for type info.

All Submissions:

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Looks like there are a few PRs dealing with TypeInfoCache but this is different than those AFAICT.

New Feature Submissions:

Does your submission pass tests?
Does ./gradlew styleCheck pass ?
Have you added your new test classes to an existing test suite in alphabetical order?

Changes to Existing Features:

Does this break existing behaviour? If so please explain.
No
Have you added an explanation of what your changes do and why you'd like us to include them?
Yes
Have you written new tests for your core changes, as applicable?
Yes (though more thorough testing might be needed if you want)
Have you successfully run tests with your changes locally?
Yes

In the current implementation, `TypeInfoCache` is a per-connection cache. This means, over time, that the same type information is retrieved from the database copious times. For some DB systems, this has a negative cost/performance impact. Introduce an optional global `TypeInfoCache`. This is useful for cases where a single DB system is used in Production for all connections and there's no need for each connection to re-retrieve type info. - Extract the storage portion of `TypeInfoCache` into a new class, `TypeInfoCacheStorage` - Add new property, `GLOBAL_TYPE_INFO_CACHE` to enable a global cache - When enabled, the constant `TypeInfoCacheStorage GLOBAL_INSTANCE` is always used. Over the course of running a Production instance, this global cache should not need to query the database for type info.

vlsi · 2024-04-11T07:35:35Z

Thank you for the PR, however, I am leaning towards a different implementation:

Refactor TypeCache code (see WIP: rework TypeInfoCache #3062)
Factor common types to a read-only part. In other words, we know common types for the base PostgreSQL, so we could have a single cache with base types plus per-connection types for connection-specific data. I believe it would greatly reduce the memory consumption without having to resort to non-default connection options.

WDYT?

Randgalt · 2024-04-11T07:41:34Z

@vlsi we'd be happy with any solution that solves the issue for us. We're forking internally for now as it's become a big problem (note we use CRDB). Let me know if I can help in any way.

electrum · 2024-04-12T15:42:11Z

@vlsi Note that memory consumption is not a concern for us here. The issue is the cost of repeatedly retrieving the type information. Would your solution avoid fetching the same information multiple times across connections?

vlsi · 2024-04-12T16:29:47Z

Ultimately, I would like to hard-code well-known pg catalog entries for the current DB versions, so query catalog only in case we face an unknown type. WDYT?

davecramer · 2024-04-12T16:43:45Z

Ultimately, I would like to hard-code well-known pg catalog entries for the current DB versions, so query catalog only in case we face an unknown type. WDYT?

I don't see another way to do this. So yes.

vlsi · 2024-04-12T17:49:16Z

I don't see another way to do this. So yes

It might be useful to have something like "global backend unique ID" returned on connect.
If the backend could return its ID, then

we could associate the global caches with the corresponding ID across the connections
we could reuse type information. For instance, if a connection queries type info, the other connections could reuse the results

Currently, as a workaround we could assume host:port would always talk to the same database (==having the same oids), however, it is not always the case as there might be balancers and so on.

davecramer · 2024-04-12T23:31:53Z

I don't see another way to do this. So yes

It might be useful to have something like "global backend unique ID" returned on connect. If the backend could return its ID, then

Interesting. Something else we can ask for in the protocol changes

we could associate the global caches with the corresponding ID across the connections

we could reuse type information. For instance, if a connection queries type info, the other connections could reuse the results

Currently, as a workaround we could assume host:port would always talk to the same database (==having the same oids), however, it is not always the case as there might be balancers and so on.
Good point. Hopefully someone doesn't overwrite a type in a different schema as well

bokken · 2024-04-13T13:41:15Z

2. Factor common types to a read-only part. In other words, we know common types for the base PostgreSQL, so we could have a single cache with base types plus per-connection types for connection-specific data.

That is effectively what I proposed in #1980. Though that pr also addressed how the shared timer is used/managed.

Randgalt force-pushed the jordanz/support-global-type-info-cache branch from 829e5c8 to e4a9de8 Compare April 10, 2024 18:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce optional global TypeInfoCache #3200

Introduce optional global TypeInfoCache #3200

Randgalt commented Apr 10, 2024

vlsi commented Apr 11, 2024 •

edited

Randgalt commented Apr 11, 2024

electrum commented Apr 12, 2024 •

edited

vlsi commented Apr 12, 2024

davecramer commented Apr 12, 2024

vlsi commented Apr 12, 2024

davecramer commented Apr 12, 2024

bokken commented Apr 13, 2024

Introduce optional global TypeInfoCache #3200

Are you sure you want to change the base?

Introduce optional global TypeInfoCache #3200

Conversation

Randgalt commented Apr 10, 2024

All Submissions:

New Feature Submissions:

Changes to Existing Features:

vlsi commented Apr 11, 2024 • edited

Randgalt commented Apr 11, 2024

electrum commented Apr 12, 2024 • edited

vlsi commented Apr 12, 2024

davecramer commented Apr 12, 2024

vlsi commented Apr 12, 2024

davecramer commented Apr 12, 2024

bokken commented Apr 13, 2024

vlsi commented Apr 11, 2024 •

edited

electrum commented Apr 12, 2024 •

edited