feat: Adaptive fetch size based on record size in bytes #675
Conversation
This PR is not ready yet. I want to add some benchmarks for this feature.
I'm also not sure about the property names.
I'm not sure why this needs to be so complicated. The optimal size is around 1000; it doesn't get much better above that, and 1000 should be large enough for small tables. How small a table are you thinking of where a fetch size of 1000 would negatively impact performance?
I believe that a truly "adaptive" fetch size should be auto-tuning; introducing so many properties can lead to confusion and is potentially error prone. What is the best "smoothingFactor"? What "fetchSizeInBytes" should I use? I'm not sure if it's related or if it helps, but Microsoft has "Adaptive Buffering", which is not based on server cursors: the buffering is done in the driver. That should improve the memory requirements, since it is not bound to autocommit and related settings, and best of all, it does not need to be tuned.
That depends on the "row size".
Given that the optimal size is dependent on row size, I'm not sure how we can write an adaptive optimizer without taking fetch time into account. Ideas?
I added some benchmarks and got the following results on my home machine with PostgreSQL 9.5 (results tables for heavyweight_rows and lightweight_rows, and the parameter values used, omitted here).
Some tests fail with OOM, for example with the current fetch size, or with a fetch size equal to 1k, so they may be absent from the results table.
I agree, the first implementation had too many parameters; that's why in patch ed3704b I simplified it to one property, defaultRowFetchSizeInBytes.
Frankly speaking, I find nothing wrong with having knobs that allow fine-tuning the mechanics.
Codecov Report
@@             Coverage Diff              @@
##              master      #675    +/-   ##
============================================
+ Coverage      68.79%    68.83%   +0.03%
- Complexity      3856      3879      +23
============================================
  Files            174       178       +4
  Lines          16029     16099      +70
  Branches        2612      2621       +9
============================================
+ Hits           11027     11081      +54
- Misses          3772      3784      +12
- Partials        1230      1234       +4
It is a pity that the simple query protocol does not support a fetch size parameter. This feature will not work if
@davecramer @jorsol @vlsi are there any objections to this PR?
I'm sorry, I've not yet looked into the code.
Is pgjdbc/www#40 up to date with the current changes?
Dave Cramer
@davecramer yes, I wrote the docs after completing the changes in this PR.
So as I understand it, adaptive fetch size is either on or off? Is it possible to get a mode where we set it and it stays at whatever value we want, much like we have now?
Adaptive fetch size works only if the user does not specify a fetch size manually (Statement#setFetchSize, ResultSet#setFetchSize), so the previous behavior is preserved.
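The precedence rule described here could be sketched as follows. This is an illustrative helper, not actual pgjdbc API: a non-zero user-set fetch size wins, and the adaptive estimate applies only when none was set (0 meaning "unset", following JDBC conventions).

```java
// Hypothetical helper illustrating the precedence rule from this comment:
// a fetch size set explicitly by the user (Statement#setFetchSize or
// ResultSet#setFetchSize) disables adaptation for that statement.
class FetchSizePolicy {
    /** Returns the fetch size the driver would actually use. */
    static int effectiveFetchSize(int userFetchSize, int adaptiveEstimate) {
        // 0 is the JDBC "no hint" value, so any non-zero setting wins.
        return userFetchSize != 0 ? userFetchSize : adaptiveEstimate;
    }
}
```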
Yes, but I want to be able to set the fetch size and have it remain static.
Do you want something like this?

Connection connection = ds.getConnection();
connection.setDefaultFetchSizeInBytes(0); //turn off adaptive fetch size
connection.setDefaultFetchSize(WELL_CALCULATED_NUMBER); //use a predefined static fetch size for all statements
//...

I think when the DS is configured with defaultFetchSizeInBytes/defaultFetchSize we should override it only at the Statement/ResultSet level, not on the Connection (because after close the connection would return to the pool in a dirty state, for example with adaptive fetch size turned off), and at that level it is possible to set a static value that does not change by accident.
dave> Your feature should be turned on by someone on purpose, not by accident.
I would say we should have autotuning=on by default. I don't think developers often care, or know better, which fetch size to choose.
New files should have the year the file was created.
@@ -1,3 +1,8 @@
/*
 * Copyright (c) 2003, PostgreSQL Global Development Group
О_о
It is not obvious, and I think not only to me. Maybe add a note about it to the contributing guide, with info on how to add the correct licence header automatically in a particular IDE.
Is it possible to include this PR in the 9.4.1213 milestone?
Just in case, have you checked how mssql jdbc implements "adaptive buffering"?
@vlsi now, yes. Postgres sends the whole row in one message, which is why we can't reuse this approach without changing the protocol. We could also read messages from the socket one at a time (only when the user calls next), but I'm not sure that's a good idea. [1] https://msdn.microsoft.com/en-us/library/dd305039.aspx
@Gordiychuk, I wonder if
@vlsi I thought about it, and decided that the safer way is not to cache the fetch size, because the results of a query can change: for example, a query that the first time returns only records with a small size can the next time return records with a huge size, and we fail with OOM.
If a client executes the same query with drastically different outputs, there is very little we can do about it. I think it is fine to cache adaptive statistics on a per-query basis.
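Caching adaptive statistics per query, as suggested here, could look roughly like this. All names are hypothetical (this is not the patch's code): the driver would remember the smoothed average row size keyed by SQL text, so a repeated query starts its first round trip with a sensible fetch size instead of the default.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-query statistics cache for adaptive fetch sizing.
// Names and structure are illustrative, not actual pgjdbc internals.
class PerQueryRowSizeCache {
    private final Map<String, Double> avgRowBytesBySql = new HashMap<>();

    /** Remember the smoothed average row size observed for this SQL text. */
    void update(String sql, double avgRowBytes) {
        avgRowBytesBySql.put(sql, avgRowBytes);
    }

    /** Initial fetch size for a query: derived from the cached statistic if present, else the default. */
    int initialFetchSize(String sql, long fetchSizeInBytes, int defaultRowFetchSize) {
        Double avg = avgRowBytesBySql.get(sql);
        if (avg == null || avg <= 0) {
            return defaultRowFetchSize; // no history yet: fall back to defaultRowFetchSize
        }
        return (int) Math.max(1, fetchSizeInBytes / avg);
    }
}
```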
IMO, it's a matter of probability: what is the probability of running the same query and getting different output? If it's a common scenario, then don't cache, but if it's remote, then the benefits of caching can outweigh the disadvantage. BTW, is there any benchmark that can truly show the advantage of cache vs no cache?
@vlsi, sorry I didn't pick this up earlier. I think using statistics makes much more sense.
Dave Cramer
Where are we on this? I'd like to merge this sooner rather than later.
I would like to cache the fetch size at the query cache level. Not sure if that is implemented.
@Gordiychuk what needs to be done on this? Are you still opposed to caching the fetch size as @vlsi suggested?
Is this ready for primetime (for 42.2)? Can we have it without caching, test how it performs, and make the decision about using a cache later?
@jorsol I think these changes are not ready yet and can't be merged.
This feature allows avoiding OOM when fetching a lot of data from a huge table, and also, unlike defaultFetchSize, does not degrade the performance of small tables, because the fetch size is estimated from records that have already been fetched. The current version of the PostgreSQL protocol (v3) does not support specifying a fetchSizeInBytes, which is why this feature is implemented on the driver side. To estimate the fetch size for the next round trip to the database, it calculates the average size of previous rows and also applies exponential smoothing. To configure adaptive fetch size, the following properties were introduced:

fetchSizeMode=adaptive
defaultRowFetchSize=100
fetchSizeMode.adaptive.fetchSizeInBytes=1000000
fetchSizeMode.adaptive.average.smoothingFactor=0.5

The PR should solve the problems described in pgjdbc#292.
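The estimation step described in the commit message, an exponentially smoothed average row size converted into a row count from a byte budget, could be sketched like this. Class and method names are illustrative, not the PR's actual code; the constructor parameters mirror the fetchSizeInBytes and smoothingFactor properties above.

```java
// Hypothetical sketch of the adaptive fetch-size estimator described in
// this PR: track an exponentially smoothed average row size, then divide
// the byte budget by it to get the row count for the next round trip.
class AdaptiveFetchEstimator {
    private final long fetchSizeInBytes;   // byte budget per round trip
    private final double smoothingFactor;  // alpha in (0, 1]
    private double avgRowBytes = -1;       // unknown until the first row is seen

    AdaptiveFetchEstimator(long fetchSizeInBytes, double smoothingFactor) {
        this.fetchSizeInBytes = fetchSizeInBytes;
        this.smoothingFactor = smoothingFactor;
    }

    /** Feed the wire size of each row as it is read. */
    void recordRow(long rowBytes) {
        if (avgRowBytes < 0) {
            avgRowBytes = rowBytes; // seed with the first observation
        } else {
            // Exponential smoothing: recent rows weigh in by alpha.
            avgRowBytes = smoothingFactor * rowBytes + (1 - smoothingFactor) * avgRowBytes;
        }
    }

    /** Rows to request on the next round trip; use the default before any data is seen. */
    int nextFetchSize(int defaultRowFetchSize) {
        if (avgRowBytes <= 0) {
            return defaultRowFetchSize;
        }
        return (int) Math.max(1, fetchSizeInBytes / avgRowBytes);
    }
}
```

With the defaults above (fetchSizeInBytes=1000000, smoothingFactor=0.5), rows of ~1 KB would yield fetch sizes around 1000, while much wider rows would shrink the batch and keep memory bounded.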
I don't think the changes are coming, and we now have #1707.