feat: Adaptive fetch size based on record size in bytes #675
Conversation
This PR is not ready yet. I want to add some benchmarks for this feature.
I'm also not sure about the property names.
I'm not sure why this needs to be so complicated. The optimal size is around 1000; it doesn't get much better above that, and 1000 should be large enough for small tables. How small a table are you thinking of where a fetch size of 1000 would negatively impact performance?
I believe that a truly "adaptive" fetch size should be auto-tuning; introducing so many properties can lead to confusion and is potentially error prone. What is the best "smoothingFactor"? What "fetchSizeInBytes" should I use? I'm not sure if it's related or if it helps, but Microsoft has "Adaptive Buffering", which is not based on server cursors: the buffering is done in the driver. That should improve the memory requirements, since it is not bound to autocommit and related settings, and best of all, it does not need to be tuned.
That depends on the "row size".
Given that the optimal size is dependent on row size, I'm not sure how we can write an adaptive optimizer without taking fetch time into account. Ideas?
I added some benchmarks and got the following results on my home machine with PostgreSQL 9.5 (results tables for heavyweight_rows and lightweight_rows, and the parameter values used, omitted here).
Some tests fail with OOM, for example with the current fetch size, or with a fetch size equal to 1k, so they may be absent from the results table.
I agree, the first implementation had too many parameters; that's why in patch ed3704b I simplified it to one property, defaultRowFetchSizeInBytes.
Frankly speaking, I find nothing wrong with having knobs that allow fine-tuning the mechanics.
Codecov Report
@@             Coverage Diff              @@
##              master      #675    +/-   ##
============================================
+ Coverage      68.79%    68.83%   +0.03%
- Complexity      3856      3879      +23
============================================
  Files            174       178       +4
  Lines          16029     16099      +70
  Branches        2612      2621       +9
============================================
+ Hits           11027     11081      +54
- Misses          3772      3784      +12
- Partials        1230      1234       +4
It is a pity that the simple query protocol does not support a fetch size parameter. This feature will not work if
@davecramer @jorsol @vlsi are there any objections to this PR?
I'm sorry, I've not yet looked into the code.
Is pgjdbc/www#40 up to date with the current changes?
Dave Cramer
@davecramer yes, I wrote the docs after completing the changes in this PR.
So as I understand it, adaptive fetch size is either on or off? Is it possible to get a mode where we set it and it stays at whatever value we want, much like we have now?
Adaptive fetch size works only if the user does not specify a fetch size manually (Statement#setFetchSize, ResultSet#setFetchSize), so the previous behavior is preserved.
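The precedence rule described here could be sketched as follows. This is an illustrative helper, not actual pgjdbc API: a non-zero user-set fetch size wins, and the adaptive estimate applies only when none was set (0 meaning "unset", following JDBC conventions).

```java
// Hypothetical helper illustrating the precedence rule from this comment:
// a fetch size set explicitly by the user (Statement#setFetchSize or
// ResultSet#setFetchSize) disables adaptation for that statement.
class FetchSizePolicy {
    /** Returns the fetch size the driver would actually use. */
    static int effectiveFetchSize(int userFetchSize, int adaptiveEstimate) {
        // 0 is the JDBC "no hint" value, so any non-zero setting wins.
        return userFetchSize != 0 ? userFetchSize : adaptiveEstimate;
    }
}
```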
Yes, but I want to be able to set the fetch size and have it remain static.
Do you want something like this?

Connection connection = ds.getConnection();
connection.setDefaultFetchSizeInBytes(0); //turn off adaptive fetch size
connection.setDefaultFetchSize(WELL_CALCULATED_NUMBER); //use a predefined static fetch size for all statements
//...

I think when the DS is configured with defaultFetchSizeInBytes/defaultFetchSize we should override it only at the Statement/ResultSet level, not on the Connection (because after close the connection would return to the pool in a dirty state, for example with adaptive fetch size turned off), and at that level it is possible to set a static value that does not change by accident.
dave> Your feature should be turned on by someone on purpose, not by accident.
I would say we should have autotuning=on by default. I don't think developers often care, or know better, which fetch size to choose.
New files should have the year the file was created.
@@ -1,3 +1,8 @@
/*
 * Copyright (c) 2003, PostgreSQL Global Development Group
О_о
It is not obvious, and I think not only to me. Maybe add a note about it to the contributing guide, with info on how to add the correct licence header automatically in a particular IDE.
Is it possible to include this PR in the 9.4.1213 milestone?
Just in case, have you checked how mssql jdbc implements "adaptive buffering"?
@vlsi now, yes. Postgres sends the whole row in one message, which is why we can't reuse this approach without changing the protocol. We could also read messages from the socket one at a time (only when the user calls next), but I'm not sure that's a good idea. [1] https://msdn.microsoft.com/en-us/library/dd305039.aspx
@Gordiychuk, I wonder if
@vlsi I thought about it, and decided that the safer way is not to cache the fetch size, because the results of a query can change: for example, a query that the first time returns only records with a small size can the next time return records with a huge size, and we fail with OOM.
If a client executes the same query with drastically different outputs, there is very little we can do about it. I think it is fine to cache adaptive statistics on a per-query basis.
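Caching adaptive statistics per query, as suggested here, could look roughly like this. All names are hypothetical (this is not the patch's code): the driver would remember the smoothed average row size keyed by SQL text, so a repeated query starts its first round trip with a sensible fetch size instead of the default.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-query statistics cache for adaptive fetch sizing.
// Names and structure are illustrative, not actual pgjdbc internals.
class PerQueryRowSizeCache {
    private final Map<String, Double> avgRowBytesBySql = new HashMap<>();

    /** Remember the smoothed average row size observed for this SQL text. */
    void update(String sql, double avgRowBytes) {
        avgRowBytesBySql.put(sql, avgRowBytes);
    }

    /** Initial fetch size for a query: derived from the cached statistic if present, else the default. */
    int initialFetchSize(String sql, long fetchSizeInBytes, int defaultRowFetchSize) {
        Double avg = avgRowBytesBySql.get(sql);
        if (avg == null || avg <= 0) {
            return defaultRowFetchSize; // no history yet: fall back to defaultRowFetchSize
        }
        return (int) Math.max(1, fetchSizeInBytes / avg);
    }
}
```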
IMO, it's a matter of probability: what is the probability of running the same query and getting different output? If it's a common scenario, then don't cache, but if it's remote, then the benefits of caching can outweigh the disadvantage. BTW, is there any benchmark that can truly show the advantage of cache vs no cache?
@vlsi, sorry I didn't pick this up earlier. I think using statistics makes much more sense.
Dave Cramer
Where are we on this? I'd like to merge this sooner rather than later.
I would like to cache the fetch size at the query cache level. Not sure if that is implemented.
@Gordiychuk what needs to be done on this? Are you still opposed to caching the fetch size as @vlsi suggested?
Is this ready for primetime (for 42.2)? Can we have it without caching, test how it performs, and make the decision about using a cache later?
@jorsol I think these changes are not ready yet and can't be merged.
This feature allows avoiding OOM when fetching a lot of data from a huge table, and also, unlike defaultFetchSize, does not degrade the performance of small tables, because the fetch size is estimated from records that have already been fetched. The current version of the PostgreSQL protocol (v3) does not support specifying a fetchSizeInBytes, which is why this feature is implemented on the driver side. To estimate the fetch size for the next round trip to the database, it calculates the average size of previous rows and also applies exponential smoothing. To configure adaptive fetch size, the following properties were introduced:

fetchSizeMode=adaptive
defaultRowFetchSize=100
fetchSizeMode.adaptive.fetchSizeInBytes=1000000
fetchSizeMode.adaptive.average.smoothingFactor=0.5

The PR should solve the problems described in pgjdbc#292.
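The estimation step described in the commit message, an exponentially smoothed average row size converted into a row count from a byte budget, could be sketched like this. Class and method names are illustrative, not the PR's actual code; the constructor parameters mirror the fetchSizeInBytes and smoothingFactor properties above.

```java
// Hypothetical sketch of the adaptive fetch-size estimator described in
// this PR: track an exponentially smoothed average row size, then divide
// the byte budget by it to get the row count for the next round trip.
class AdaptiveFetchEstimator {
    private final long fetchSizeInBytes;   // byte budget per round trip
    private final double smoothingFactor;  // alpha in (0, 1]
    private double avgRowBytes = -1;       // unknown until the first row is seen

    AdaptiveFetchEstimator(long fetchSizeInBytes, double smoothingFactor) {
        this.fetchSizeInBytes = fetchSizeInBytes;
        this.smoothingFactor = smoothingFactor;
    }

    /** Feed the wire size of each row as it is read. */
    void recordRow(long rowBytes) {
        if (avgRowBytes < 0) {
            avgRowBytes = rowBytes; // seed with the first observation
        } else {
            // Exponential smoothing: recent rows weigh in by alpha.
            avgRowBytes = smoothingFactor * rowBytes + (1 - smoothingFactor) * avgRowBytes;
        }
    }

    /** Rows to request on the next round trip; use the default before any data is seen. */
    int nextFetchSize(int defaultRowFetchSize) {
        if (avgRowBytes <= 0) {
            return defaultRowFetchSize;
        }
        return (int) Math.max(1, fetchSizeInBytes / avgRowBytes);
    }
}
```

With the defaults above (fetchSizeInBytes=1000000, smoothingFactor=0.5), rows of ~1 KB would yield fetch sizes around 1000, while much wider rows would shrink the batch and keep memory bounded.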
I don't think the changes are coming, and we now have #1707.