Drastically increase the performance of DatabaseMetaData.getTypeInfo() #5

Closed · wants to merge 2 commits
Conversation

@kdubb (Contributor) commented Sep 11, 2012

When lazy loading type information in TypeInfoCache, load all information for all types in the database instead of just the requested type. This decreased the runtime of DatabaseMetaData.getTypeInfo() from ~27s to less than 1s.
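The shape of the change, as a minimal illustrative sketch (not the actual patch; the class and helper names below are made up, only the pg_catalog.pg_type columns are real):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: instead of issuing one pg_type lookup per requested
    // type, a single pass over pg_type populates the whole cache up front.
    class AllTypesLoader {
        final Map<String, Integer> oidByName = new HashMap<String, Integer>();
        final Map<Integer, String> nameByOid = new HashMap<Integer, String>();

        void loadAll(Connection conn) throws SQLException {
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(
                    "SELECT oid, typname FROM pg_catalog.pg_type");
                while (rs.next()) {
                    int oid = rs.getInt(1);
                    String name = rs.getString(2);
                    oidByName.put(name, oid);
                    nameByOid.put(oid, name);
                }
                rs.close();
            } finally {
                stmt.close();
            }
        }
    }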

@ringerc (Member) commented Sep 20, 2012

This patch appears to break some basic tests. Tested with:

git remote add kdubb git://github.com/kdubb/pgjdbc.git
git fetch kdubb
git checkout kdubb/DatabaseMetaData_Performance
git rebase master
ant clean test

'master' here is a clean upstream master, not any local working branch. The failure persists if I don't rebase against current master.

Tested on JDK 7, Fedora 17, Pg 9.1, ant 1.8.3.

Tests failing:

    [junit] Testcase: testUnknownArrayType(org.postgresql.test.jdbc2.ArrayTest):        Caused an ERROR
    [junit] No results were returned by the query.
    [junit] org.postgresql.util.PSQLException: No results were returned by the query.
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:412)
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:409)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getBaseTypeName(AbstractJdbc2Array.java:759)
    [junit]     at org.postgresql.test.jdbc2.ArrayTest.testUnknownArrayType(ArrayTest.java:252)
    [junit] 
    [junit] 
    [junit] Testcase: testNonStandardDelimiter(org.postgresql.test.jdbc2.ArrayTest):    Caused an ERROR
    [junit] No results were returned by the query.
    [junit] org.postgresql.util.PSQLException: No results were returned by the query.
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:412)
    [junit]     at org.postgresql.jdbc2.TypeInfoCache.getPGArrayElement(TypeInfoCache.java:409)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getResultSetImpl(AbstractJdbc2Array.java:818)
    [junit]     at org.postgresql.jdbc2.AbstractJdbc2Array.getResultSet(AbstractJdbc2Array.java:765)
    [junit]     at org.postgresql.test.jdbc2.ArrayTest.testNonStandardDelimiter(ArrayTest.java:398)

@ringerc (Member) commented Sep 20, 2012

@kdubb This isn't ready to merge. Can you test and see if you can reproduce those failures?

@kdubb (Contributor, Author) commented Sep 20, 2012

I just managed to reproduce them. I will fix them and update soon. My apologies for sending it prematurely.

@ringerc (Member) commented Sep 20, 2012

No worries. PgJDBC seems to be surprisingly tricky to work on.

Accidentally inverted the while logic, which caused no information to be loaded instead of the intended behavior of loading all information
@kdubb (Contributor, Author) commented Sep 20, 2012

@ringerc I managed a single-character typo while implementing one of the improvements; my favorite kind! It now passes all the tests when compiling with my default JDKs (6 & 7) on OS X.

I believe it also works with the Java source version set to 1.5 against master.
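For context, the inversion mentioned in the commit message amounts to something like this (illustrative only; the method and counter below are stand-ins, not the actual driver code):

    import java.sql.ResultSet;
    import java.sql.SQLException;

    class WhileInversionExample {
        // Broken: the stray '!' means the body never runs when rows exist,
        // so nothing at all gets cached.
        static int loadBroken(ResultSet rs) throws SQLException {
            int cached = 0;
            while (!rs.next()) {
                cached++; // stand-in for caching the row
            }
            return cached;
        }

        // Intended: iterate over every row returned for pg_type.
        static int loadFixed(ResultSet rs) throws SQLException {
            int cached = 0;
            while (rs.next()) {
                cached++;
            }
            return cached;
        }
    }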

@ringerc (Member) commented Sep 21, 2012

Thanks for that. I confirm it builds on JDK 1.5. I'm currently unable to merge it because other breakage in the driver is causing it to fail to run against Pg 8.3 (possibly also other untested versions between that and tested-ok 9.1), so I have to fix that before I can test your patch against all the supported Pg versions.

Why the heck did I volunteer for this again? ;-)

@francisdb commented

Any chance this will be fixed in the near future? This issue makes developing in Play Framework with JPA/PostgreSQL really slow: the entity manager factory gets recreated every time the code changes, which in turn triggers a call to getTypeInfo() that takes 8+ seconds on my machine.

@lordnelson (Contributor) commented

Hi @kdubb, thanks for your pull request. Would you be able to create some unit tests for your changes? It might help get them into the main tree faster.
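Something along these lines might be a starting point (a rough sketch only; the connection URL and credentials are placeholders rather than the project's actual test harness):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import junit.framework.TestCase;

    public class GetTypeInfoTest extends TestCase {

        public void testGetTypeInfoReturnsRows() throws Exception {
            // Placeholder connection details; the real test suite has its own setup.
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "test", "test");
            try {
                DatabaseMetaData meta = conn.getMetaData();
                long start = System.currentTimeMillis();
                ResultSet rs = meta.getTypeInfo();
                int count = 0;
                while (rs.next()) {
                    // Every row must at least carry a type name and a JDBC type code.
                    assertNotNull(rs.getString("TYPE_NAME"));
                    rs.getInt("DATA_TYPE");
                    count++;
                }
                rs.close();
                long elapsed = System.currentTimeMillis() - start;
                assertTrue("expected at least the built-in types", count > 0);
                // Loose sanity bound; the point of the patch is that this used to take ~27s.
                assertTrue("getTypeInfo took " + elapsed + " ms", elapsed < 10000);
            } finally {
                conn.close();
            }
        }
    }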

@valgog (Contributor) commented May 13, 2013

I was also making some changes to the same code in #52. Those address the problem of search_path not being taken into account. Maybe we could merge our two changes and finally get them into master.

@davecramer (Member) commented

This patch requires some work as it doesn't merge anymore. Any chance you could look at it?

@valgog (Contributor) commented Jun 11, 2013

The problem I have with the approach of loading all the types on connection start is that fetching all the types can take quite a long time. For example, on my staging database:

staging_db=# EXPLAIN ANALYZE 
SELECT typinput='array_in'::regproc, typtype, typname FROM pg_type;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_type  (cost=0.00..1189.94 rows=23115 width=69) (actual time=0.008..6.660 rows=23106 loops=1)
 Total runtime: 7.504 ms
(2 rows)

staging_db=# \timing
Timing is on.
staging_db=# \copy ( SELECT typinput='array_in'::regproc, typtype, typname FROM pg_type ) to /dev/null 
Time: 715.766 ms

But not many databases have 23K type definitions in the catalog, so I can imagine that in most cases this approach will bring performance benefits, especially taking into account that most production deployments use connection pools and will preload the caches only once, when a new connection is established.

Maybe it makes sense to introduce a parameter that controls whether to preload the types?

My suggestion would be something like:

preloadTypes=[none,all,search_path]

Here none would keep the current behavior, all would preload everything as suggested by @kdubb, and search_path would preload only the types defined in the schemas included in the search_path (a rough sketch follows below).

In any case, there will be some corner cases if somebody changes the search_path for the connection... but we have that problem with the current implementation anyway.
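A rough sketch of how such a parameter might be consumed (the property name and the surrounding plumbing are hypothetical; pg_type_is_visible() is a real catalog function that respects the connection's current search_path):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Properties;

    // Hypothetical plumbing for a preloadTypes=[none,all,search_path] connection property.
    class TypePreloader {

        static void preload(Connection conn, Properties props) throws SQLException {
            String mode = props.getProperty("preloadTypes", "none");
            if ("none".equals(mode)) {
                return; // current behavior: types are looked up one by one on demand
            }

            String sql = "SELECT oid, typname, typtype, typdelim FROM pg_catalog.pg_type";
            if ("search_path".equals(mode)) {
                // pg_type_is_visible() honors the connection's current search_path.
                sql += " WHERE pg_catalog.pg_type_is_visible(oid)";
            }

            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(sql);
                while (rs.next()) {
                    cacheType(rs.getInt(1), rs.getString(2),
                              rs.getString(3), rs.getString(4));
                }
                rs.close();
            } finally {
                stmt.close();
            }
        }

        // Stand-in for whatever the cache actually stores.
        static void cacheType(int oid, String name, String typtype, String delim) {
            // ...
        }
    }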

@davecramer (Member) commented

I would think it would make sense to lazy load them? Certainly I can't see loading them on connection start unless we put in some kind of controls as you suggest.

Dave Cramer

@valgog (Contributor) commented Jun 11, 2013

Yes, I should correct myself, of course: this is lazy loading, but if you load all the types the first time you access any one of them, as this commit suggests, then on a busy system it is effectively connection start.

@davecramer (Member) commented

Well, clearly this requires some kind of ability to turn on and off.

I think your original idea is fine, with none being the default behaviour.

Dave Cramer
