New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: cache result set column mapping for prepared statements #614

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@Chrriis
Contributor

Chrriis commented Aug 3, 2016

No description provided.

@codecov-io

This comment has been minimized.

codecov-io commented Aug 3, 2016

Current coverage is 58.17% (diff: 90.00%)

Merging #614 into master will increase coverage by 0.04%

@@             master       #614   diff @@
==========================================
  Files           148        148          
  Lines         15616      15630    +14   
  Methods           0          0          
  Messages          0          0          
  Branches       3091       3094     +3   
==========================================
+ Hits           9078       9093    +15   
  Misses         5307       5307          
+ Partials       1231       1230     -1   

Powered by Codecov. Last update 22c72b4...2bba94f

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

This PR tries to fix #607.

@vlsi I have a few comments based on your remarks.

PgStatement now contains the logic to create the result set column map. The subclass PgPreparedStatement adds cache around it because columns do not change.
I was wondering: PgCallableStatement inherits PgPreparedStatement. Could that cause a problem?

The cache itself is in the CachedQuery. I was not sure about putting it in org.postgresql.core.v3.SimpleQuery because then I would have to either cast preparedQuery.query to a SimpleQuery (is it always v3?) or expose getter/setter in the super interface and pollute all the implementations with setters that do not make any sense (either no-op or throwing an exception). Moreover, I am not sure the life cycle of this cache follows the life cycle of other fields of that class (cf. unprepare())

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

expose getter/setter in the super interface

Only a caching getter is required. Setter is not required.

is it always v3

I'm not sure. However, caching resultset fields should be done in org.postgresql.core.v3.SimpleQuery.

In particular org.postgresql.core.v3.SimpleQuery#setFields is where database-provided fields are stored.
So setFields and org.postgresql.core.v3.SimpleQuery#unprepare would be good places to reset the cached resultset fields.

I was not sure about putting it in org.postgresql.core.v3.SimpleQuery because then I would have to either cast preparedQuery.query to a SimpleQuery (is it always v3?)

Technically speaking, org.postgresql.jdbc.PgResultSet#originalQuery refers to some query.
I think it would be fine to add (if instanceof SimpleQuery, then grab column positions from there)

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

I think it would be fine to add (if instanceof SimpleQuery

So, the code that creates and initializes the map would be duplicated (a few lines, using a connection property and the list of fields): in ResultSet when it is not a SimpleQuery and in the SimpleQuery cache getter. Do you have a helper class where to place such factory method?
An alternate way is for the query to return its empty map (not yet initialized) and the result set populates it, but I am not sure I like the idea of using emptyness as an indicator that the cache should be filled.

Originally, the cache was supposed to be for PgPreparedStatement only. But SimpleQuery is probably used for plain statements too. I guess it won't cause any problems to cache in that case too, because unprepare() is going to clean the mapping for next execution, right?

I also wonder about PgCallableStatement: is it fine to apply the same logic than for PgPreparedStatement?

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

Do you have a helper class where to place such factory method?

What if a connection would be such a class?
Or just PgResultSet

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

because unprepare() is going to clean the mapping for next execution, right?

I think so

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

I also wonder about PgCallableStatement: is it fine to apply the same logic than for PgPreparedStatement?

I think so.

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

PgResultSet cannot access SimpleQuery (it is package private). I don't think it is wise to make it public. I can add the method to get the map to Query; other classes would return a new map every time.

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

... or rather return null, meaning cache is not supported and the result set has to create a proper map.

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

. or rather return null, meaning cache is not supported and the result set has to create a proper map.

This sounds reasonable.

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

If setFields(Fields[]) and unprepare() clear the cache, then the cache has no effect until the statement gets prepared. Indeed, these 2 methods are called in QueryExecutorImpl.sendParse(xx).
Is that acceptable or should we manage to retain the cache for prepared statements from first call somehow?

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

Is that acceptable or should we manage to retain the cache for prepared statements from first call somehow?

It might deserve some benchmarking.
If we try to keep the map cached, then we would need to check if column names are intact (that is cache old Field[] array, and compare with new one on setFields or array of String[] columnNames).
I'm not sure how much that array equals would cost.

Any ideas?

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

I think that the optimization is worth for statements that are executed many times, in which case it is going to be named. So as a first implementation, it is probably not worth trying to retain from first call. At least, this is what I am currently doing. We can change later on for more complicated logic if we feel it is necessary.

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 3, 2016

@> in which case it is going to be named.

The sad thing is lots of users apply "never use server-prepared" mode due to various limitations.
For instance: pgbouncer (it does not support server-prepared yet), various application issues (e.g. someone does pg_dump right in the middle of jdbc transation), etc.

So, it would be nice to have that String[] columnNames cache tested (just to check if there is a room for improvement).

So as a first implementation,

Covering just "server-prepared" in the first implementation would be fine.

@Chrriis Chrriis closed this Aug 3, 2016

@Chrriis Chrriis force-pushed the Chrriis:CachedResultSetColumnMapping branch from 03ba513 to ba14509 Aug 3, 2016

@Chrriis

This comment has been minimized.

Contributor

Chrriis commented Aug 3, 2016

I am not sure what I did. Re-opening PR. Sorry about that.

@Chrriis Chrriis reopened this Aug 3, 2016

*
* @return null if the query implementation does not support this method.
*/
HashMap<String, Integer> getResultSetColumnNameIndexMap();

This comment has been minimized.

@vlsi

vlsi Aug 3, 2016

Member

Please use interfaces, not specific classes

assertNotNull(columnNameIndexMap);
rs.close();
stmt.close();
// Prepared statement

This comment has been minimized.

@vlsi

vlsi Aug 3, 2016

Member

"prepared statement" deserves its own test method.
It is typically better have "27 of 67 failures" than "1 of 1 failure".
With more fine granular tests it might be easier to track the progress & debug.

However, I do not ask to always split each assertion to its own test method.

columnNameIndexMap = getResultSetColumnNameIndexMap(rs);
assertNotNull(columnNameIndexMap);
rs.close();
// make sure the statement is named

This comment has been minimized.

@vlsi

vlsi Aug 3, 2016

Member

This one deserves its own test method as well

@vlsi

This comment has been minimized.

Member

vlsi commented Aug 5, 2016

Here are benchmark results:
number of columns: 50
number of rows: 1
all columns are of type INT
.getInt is used

  1. getInt(String) vs getInt(int) does not impact performance. @ahachete
  2. getInt(String) did allocate extra heap for resultset (4264 vs 2088).
  3. After the fix, it allocates 24 bytes more == 2112-2088 (I have no idea, however I do not think it is worth investigating)

Thus I would suggest caching the Map for server-prepared statements only (e.g. the ones that are known to be reused).

Before:

Benchmark                                             (columnIndexType)  Mode  Cnt     Score     Error   Units
ProcessResultSet.bindExecuteFetch                                  NAME  avgt    5    67,727 ±   2,460   us/op
ProcessResultSet.bindExecuteFetch:·gc.alloc.rate.norm              NAME  avgt    5  4264,040 ±   0,026    B/op
ProcessResultSet.bindExecuteFetch                                 INDEX  avgt    5    64,269 ±   1,497   us/op
ProcessResultSet.bindExecuteFetch:·gc.alloc.rate.norm             INDEX  avgt    5  2088,038 ±   0,024    B/op

After:

Benchmark                                             (columnIndexType)  Mode  Cnt     Score     Error   Units
ProcessResultSet.bindExecuteFetch                                  NAME  avgt    5    66,213 ±   2,943   us/op
ProcessResultSet.bindExecuteFetch:·gc.alloc.rate.norm              NAME  avgt    5  2112,039 ±   0,026    B/op
ProcessResultSet.bindExecuteFetch                                 INDEX  avgt    5    64,711 ±   3,332   us/op
ProcessResultSet.bindExecuteFetch:·gc.alloc.rate.norm             INDEX  avgt    5  2112,038 ±   0,024    B/op

@vlsi vlsi closed this in 88fbbc5 Aug 5, 2016

zemian pushed a commit to zemian/pgjdbc that referenced this pull request Oct 6, 2016

perf: cache result set column mapping for prepared statements
For server-prepared statements, Map<ColumnName, ColumnPosition> is reused, thus
resultSet.getXXX(String) is faster.

Benchmarks show reduced heap allocation, however the response time seems to be not affected.

closes pgjdbc#614
closes pgjdbc#607
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment