Drastically increase the performance of DatabaseMetaData.getTypeInfo() #5

Closed · wants to merge 2 commits
Conversation

@kdubb (Contributor) commented Sep 11, 2012

When lazy loading type information in TypeInfoCache, load all information for all types in the database instead of just the requested type. This decreased the runtime of DatabaseMetaData.getTypeInfo() from ~27s to less than 1s.
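The shape of the change, as a minimal illustrative sketch (not the actual patch; the class and helper names below are made up, only the pg_catalog.pg_type columns are real):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: instead of issuing one pg_type lookup per requested
    // type, a single pass over pg_type populates the whole cache up front.
    class AllTypesLoader {
        final Map<String, Integer> oidByName = new HashMap<String, Integer>();
        final Map<Integer, String> nameByOid = new HashMap<Integer, String>();

        void loadAll(Connection conn) throws SQLException {
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(
                    "SELECT oid, typname FROM pg_catalog.pg_type");
                while (rs.next()) {
                    int oid = rs.getInt(1);
                    String name = rs.getString(2);
                    oidByName.put(name, oid);
                    nameByOid.put(oid, name);
                }
                rs.close();
            } finally {
                stmt.close();
            }
        }
    }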

@ringerc (Member) commented Sep 20, 2012

This patch appears to break some basic tests. Tested with:

git remote add kdubb git://github.com/kdubb/pgjdbc.git
git fetch kdubb
git checkout kdubb/DatabaseMetaData_Performance
git rebase master
ant clean test

'master' here is a clean upstream master, not any local working branch. The failure persists if I don't rebase against current master.

Tested on JDK 7, Fedora 17, Pg 9.1, ant 1.8.3.

Tests failing:

    [junit] Testcase: testUnknownArrayType(org.postgresql.test.jdbc2.ArrayTest):        Caused an ERROR
    [junit] No results were returned by the query.
    [junit] org.postgresql.util.PSQLException: No results were returned by the query.
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:412)
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:409)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getBaseTypeName(AbstractJdbc2Array.java:759)
    [junit]     at org.postgresql.test.jdbc2.ArrayTest.testUnknownArrayType(ArrayTest.java:252)
    [junit] 
    [junit] 
    [junit] Testcase: testNonStandardDelimiter(org.postgresql.test.jdbc2.ArrayTest):    Caused an ERROR
    [junit] No results were returned by the query.
    [junit] org.postgresql.util.PSQLException: No results were returned by the query.
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:412)
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:409)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getResultSetImpl(AbstractJdbc2Array.java:818)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getResultSet(AbstractJdbc2Array.java:765)
    [junit]     at org.postgresql.test.jdbc2.ArrayTest.testNonStandardDelimiter(ArrayTest.java:398)

@ringerc (Member) commented Sep 20, 2012

@kdubb This isn't ready to merge. Can you test and see if you can reproduce those failures?

@kdubb (Contributor, Author) commented Sep 20, 2012

I just managed to reproduce them. I will fix them and update soon. My apologies for sending it prematurely.

@ringerc (Member) commented Sep 20, 2012

No worries. PgJDBC seems to be surprisingly tricky to work on.

Accidentally inverted the while logic, which caused no information to be loaded instead of the intended behavior of loading all information
@kdubb (Contributor, Author) commented Sep 20, 2012

@ringerc I managed a single-character typo while implementing one of the improvements; my favorite kind! It now passes all the tests when compiling with my default JDKs (6 & 7) on OS X.

I believe it also works with the Java source version set to 1.5 against master.
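For context, the inversion mentioned in the commit message amounts to something like this (illustrative only; the method and counter below are stand-ins, not the actual driver code):

    import java.sql.ResultSet;
    import java.sql.SQLException;

    class WhileInversionExample {
        // Broken: the stray '!' means the body never runs when rows exist,
        // so nothing at all gets cached.
        static int loadBroken(ResultSet rs) throws SQLException {
            int cached = 0;
            while (!rs.next()) {
                cached++; // stand-in for caching the row
            }
            return cached;
        }

        // Intended: iterate over every row returned for pg_type.
        static int loadFixed(ResultSet rs) throws SQLException {
            int cached = 0;
            while (rs.next()) {
                cached++;
            }
            return cached;
        }
    }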

@ringerc (Member) commented Sep 21, 2012

Thanks for that. I confirm it builds on JDK 1.5. I'm currently unable to merge it because other breakage in the driver is causing it to fail to run against Pg 8.3 (possibly also other untested versions between that and tested-ok 9.1), so I have to fix that before I can test your patch against all the supported Pg versions.

Why the heck did I volunteer for this again? ;-)

@francisdb commented

Any chance this will be fixed in the near future? This issue makes developing in Play Framework with JPA/PostgreSQL really slow: the entity manager factory gets recreated every time the code changes, which in turn triggers a call to getTypeInfo() that takes 8+ seconds on my machine.

@lordnelson (Contributor) commented

Hi @kdubb, thanks for your pull request. Would you be able to create some unit tests for your changes? It might help get them into the main tree faster.
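Something along these lines might be a starting point (a rough sketch only; the connection URL and credentials are placeholders rather than the project's actual test harness):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import junit.framework.TestCase;

    public class GetTypeInfoTest extends TestCase {

        public void testGetTypeInfoReturnsRows() throws Exception {
            // Placeholder connection details; the real test suite has its own setup.
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "test", "test");
            try {
                DatabaseMetaData meta = conn.getMetaData();
                long start = System.currentTimeMillis();
                ResultSet rs = meta.getTypeInfo();
                int count = 0;
                while (rs.next()) {
                    // Every row must at least carry a type name and a JDBC type code.
                    assertNotNull(rs.getString("TYPE_NAME"));
                    rs.getInt("DATA_TYPE");
                    count++;
                }
                rs.close();
                long elapsed = System.currentTimeMillis() - start;
                assertTrue("expected at least the built-in types", count > 0);
                // Loose sanity bound; the point of the patch is that this used to take ~27s.
                assertTrue("getTypeInfo took " + elapsed + " ms", elapsed < 10000);
            } finally {
                conn.close();
            }
        }
    }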

@valgog (Contributor) commented May 13, 2013

I was also making some changes to the same code in #52. Those address the problem of search_path not being taken into account. Maybe we could merge our two changes and finally get them into master.

@davecramer (Member) commented

This patch requires some work as it doesn't merge anymore. Any chance you could look at it?

@valgog (Contributor) commented Jun 11, 2013

The problem I have with the approach of loading all the types on connection start is that fetching all the types can take quite a long time. For example, on my staging database:

staging_db=# EXPLAIN ANALYZE 
SELECT typinput='array_in'::regproc, typtype, typname FROM pg_type;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_type  (cost=0.00..1189.94 rows=23115 width=69) (actual time=0.008..6.660 rows=23106 loops=1)
 Total runtime: 7.504 ms
(2 rows)

staging_db=# \timing
Timing is on.
staging_db=# \copy ( SELECT typinput='array_in'::regproc, typtype, typname FROM pg_type ) to /dev/null 
Time: 715.766 ms

But not many databases have 23K type definitions in the catalog, so I can imagine that in most cases this approach will bring performance benefits, especially taking into account that most production deployments use connection pools and will preload the caches only once, when a new connection is established.

Maybe it makes sense to introduce a parameter that controls whether to preload the types?

My suggestion would be something like:

preloadTypes=[none,all,search_path]

Here none would keep the current behavior, all would preload everything as suggested by @kdubb, and search_path would preload only the types defined in the schemas included in the search_path (a rough sketch follows below).

In any case, there will be some corner cases if somebody changes the search_path for the connection... but we have that problem with the current implementation anyway.
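A rough sketch of how such a parameter might be consumed (the property name and the surrounding plumbing are hypothetical; pg_type_is_visible() is a real catalog function that respects the connection's current search_path):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Properties;

    // Hypothetical plumbing for a preloadTypes=[none,all,search_path] connection property.
    class TypePreloader {

        static void preload(Connection conn, Properties props) throws SQLException {
            String mode = props.getProperty("preloadTypes", "none");
            if ("none".equals(mode)) {
                return; // current behavior: types are looked up one by one on demand
            }

            String sql = "SELECT oid, typname, typtype, typdelim FROM pg_catalog.pg_type";
            if ("search_path".equals(mode)) {
                // pg_type_is_visible() honors the connection's current search_path.
                sql += " WHERE pg_catalog.pg_type_is_visible(oid)";
            }

            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(sql);
                while (rs.next()) {
                    cacheType(rs.getInt(1), rs.getString(2),
                              rs.getString(3), rs.getString(4));
                }
                rs.close();
            } finally {
                stmt.close();
            }
        }

        // Stand-in for whatever the cache actually stores.
        static void cacheType(int oid, String name, String typtype, String delim) {
            // ...
        }
    }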

@davecramer (Member) commented

I would think it would make sense to lazy load them? Certainly I can't see loading them on connection start unless we put in some kind of controls as you suggest.

Dave Cramer

@valgog (Contributor) commented Jun 11, 2013

Yes, I should correct myself, of course: this is lazy loading, but if you load all the types the first time you access any one of them, as this commit suggests, then on a busy system it is effectively connection start.

@davecramer (Member) commented

Well, clearly this requires some kind of ability to turn on and off.

I think your original idea is fine, with none being the default behaviour.

Dave Cramer
