
feat: implementation of adaptive fetching #1707

Merged

merged 8 commits into pgjdbc:master on Feb 24, 2020

Conversation

@adrklos (Contributor) commented Feb 14, 2020

Implementation of adaptive fetching logic during the process of fetching a result set.
The change adds three new properties used with adaptive fetching:

  • adaptiveFetch - enables/disables adaptive fetching
  • adaptiveFetchMinimum - defines the minimum fetch size computed during adaptive fetching
  • adaptiveFetchMaximum - defines the maximum fetch size computed during adaptive fetching

Adaptive fetching computes a fetch size that fully uses the memory defined by maxResultBuffer.
The size is computed by dividing the maxResultBuffer size by the largest result row size observed
so far (a small sketch of this computation follows the description).
Each query has a separate adaptive fetch size computed, but identical queries share it.
If adaptive fetch is turned on, the first fetch is made with defaultRowFetchSize and the
following fetches of the ResultSet are made with the computed adaptive fetch size.
If adaptive fetch is turned on during fetching, then the first fetch made by the ResultSet
is made with defaultRowFetchSize and the following ones use the computed adaptive fetch size.
Additionally, ResultSet has a new method - getLastUsedFetchSize - to get the fetch size used during
the last fetch. If adaptive fetching is turned on, getFetchSize will return lastUsedFetchSize.
The adaptiveFetch property needs the defaultRowFetchSize and maxResultBuffer properties to work.

Commit made for Heimdall Data's request to implement adaptive fetching.
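
A minimal sketch of the computation described above, in case it helps review; the helper and parameter names here are illustrative, not the actual code in the PR:

    // Hypothetical helper: divide maxResultBuffer by the largest row size seen so far,
    // then clamp the result to adaptiveFetchMinimum / adaptiveFetchMaximum.
    static int computeAdaptiveFetchSize(long maxResultBufferBytes, int maxRowSizeBytes,
        int adaptiveFetchMinimum, int adaptiveFetchMaximum) {
      if (maxRowSizeBytes <= 0) {
        // No row has been observed yet, so there is nothing to divide by.
        return adaptiveFetchMinimum;
      }
      long size = maxResultBufferBytes / maxRowSizeBytes;
      if (size < adaptiveFetchMinimum) {
        size = adaptiveFetchMinimum;
      }
      if (adaptiveFetchMaximum > 0 && size > adaptiveFetchMaximum) {
        size = adaptiveFetchMaximum;
      }
      return (int) size;
    }

With a 200-byte maxResultBuffer and 10-byte rows (as in the test scenario below), this yields 200 / 10 = 20.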

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change? (I don't think there is a pull request for this change.)

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does mvn checkstyle:check pass?

Tests for adaptive fetching aren't added, because I wasn't sure where in the test suite I could implement a new class with tests for this (if required I can add tests, but I would be glad for information on which test package I should put them in).

The motivation behind this feature is to make fetching more effective by using the full memory capacity defined by maxResultBuffer. Adaptive fetching computes a size that maximizes the number of rows in the next fetch while also predicting it to be safe and not exceed the maxResultBuffer size (the prediction is made by finding the largest result row size observed so far and using it to compute the adaptive fetch size for the query).

If somebody would want to test how it works, I can suggest the scenario below (a runnable sketch of it follows the lists):

Connection settings:
  • maxResultBuffer set really low, e.g. 200 bytes
  • defaultRowFetchSize set to 4
  • adaptiveFetch set to true

Database setting:
  • create a table with one column containing strings of 10 characters and add some rows

What should happen during fetching:
  • the first fetch is done with size 4
  • the next fetches of data are done with size 20 (because 200/10 = 20)
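
For reference, such a check might look roughly like this; the URL, credentials, and table name are placeholders, not anything defined in this PR:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class AdaptiveFetchScenario {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "test");              // placeholder credentials
        props.setProperty("password", "test");
        props.setProperty("maxResultBuffer", "200");    // really low: 200 bytes
        props.setProperty("defaultRowFetchSize", "4");  // size of the first fetch
        props.setProperty("adaptiveFetch", "true");     // enable adaptive fetching

        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/test", props)) {
          con.setAutoCommit(false); // fetch size is only honored with autocommit off
          try (Statement st = con.createStatement();
               ResultSet rs = st.executeQuery("SELECT txt FROM adaptive_demo")) {
            while (rs.next()) {
              // the first batch arrives with fetch size 4; once a 10-character row has
              // been seen, later batches should use roughly 200 / 10 = 20 rows
            }
          }
        }
      }
    }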

I would be glad for any comments about possibilities to improve my code, because I know that I changed a lot and I may have missed more effective or smarter coding solutions.

adrklos added 2 commits Feb 14, 2020

  • Implementation of adaptive fetching logic during the process of fetching a result set (commit message identical to the description above).
  • Fix of type start in HashMap in constructor of AdaptiveFetchQueryMonitoring.

@davecramer (Member) commented Feb 15, 2020

https://github.com/pgjdbc/pgjdbc/blob/master/docs/documentation/head/connect.md will also have to reflect the new properties

Documentation update in file connect.md

Commit made for Heimdall Data's request to implement adaptive fetching.
@adrklos (Contributor, Author) commented Feb 17, 2020

@davecramer I had already committed some documentation about the new properties in https://github.com/adrklos/pgjdbc/blob/heim_adaptive_fetch/docs/documentation/head/connect.md, and now I added extra information about what benefit using adaptive fetching gives. At this point I have no idea what more I could add, so if the current information is not enough, please specify what should be added for the properties.

@davecramer (Member) commented Feb 17, 2020

@adrklos apologies. I must have missed it the first time around.

@davecramer (Member) commented Feb 17, 2020

We already have a query cache and a way to calculate the key. Have you looked at the code in
public final Object createQueryKey(String sql, boolean escapeProcessing, … ?

I'd also like to explore re-using the existing query cache if possible.

@adrklos (Contributor, Author) commented Feb 17, 2020

No, I haven't checked that code. And I want to explain one thing, just to be clear: there is no caching of the computed size for a query after all of its data rows have been read. The values kept in AdaptiveFetchQueryMonitoring should in theory exist only until the moment when the last query (meaning a query which at that moment is not identical to any other currently running query) reads its Command Completion message. After that, the computed value should be removed, to avoid the problem of the largest data rows being removed from the result set before the next read with fetching.

@davecramer (Member) commented Feb 17, 2020

Even so you may want to look at how the CacheQueryKey is generated.

More explanation for setMaxRowSize method in PGStream class.

Commit made for Heimdall Data's request to implement adaptive fetching.
@adrklos (Contributor, Author) commented Feb 17, 2020

Ok, I looked at the key generation and at how the cache works, and I'm not really sure what to do with this. For the needs of adaptive fetch I think the SQL string is enough as a key. It could be used with the existing cache: the information from AdaptiveFetchQueryInfo could be put into the CachedQuery class, and the methods of AdaptiveFetchQueryMonitoring could be moved to QueryExecutorBase. The only things that may be problematic are removing objects from the cache after use and duplicating information for one query (because of the different keys: one plain SQL string key, and a second generated by createQueryKey). On the other hand, if the adaptive fetch objects used the SQL string as a key, that would solve the removal problem, because it is used only for SQL without returned columns and parameters, which I think aren't used for fetching (but I'm not sure).
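
For illustration only, the keying scheme described above might look roughly like this (class and method names are made up for the sketch, not the actual PR code):

    import java.util.HashMap;
    import java.util.Map;

    class AdaptiveFetchBySqlSketch {
      // Largest observed row size per SQL string, plus how many identical queries are active.
      private final Map<String, Integer> maxRowSizeBySql = new HashMap<>();
      private final Map<String, Integer> activeBySql = new HashMap<>();

      void register(String sql) {
        activeBySql.merge(sql, 1, Integer::sum);
      }

      void observeRowSize(String sql, int rowSizeBytes) {
        maxRowSizeBySql.merge(sql, rowSizeBytes, Math::max);
      }

      // Called when a query reads its Command Completion message; once the last identical
      // query finishes, the computed value is dropped so a stale maximum cannot leak into
      // later executions of the same SQL.
      void complete(String sql) {
        Integer remaining = activeBySql.computeIfPresent(sql, (k, v) -> v - 1);
        if (remaining == null || remaining <= 0) {
          activeBySql.remove(sql);
          maxRowSizeBySql.remove(sql);
        }
      }
    }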

 * @param rowSize new value to be set as maxRowSize
 */
public void setMaxRowSize(int rowSize) {
  if (maxRowSize == -1) {

@davecramer (Member) commented Feb 18, 2020

If rowSize is always positive, the check for -1 is not necessary.


@adrklos (Contributor, Author) commented Feb 18, 2020

That's true, this check is not necessary; I just wanted to show that if there is no maxRowSize yet, the given rowSize should be set as maxRowSize without further checking.

@davecramer (Member) commented Feb 18, 2020

I'm concerned that there is a lot of code here and no tests. I'd like to see some unit tests as well as an integration test where we attempt to read a large result set and see that adaptive fetching actually does what we think it does.

@@ -197,7 +203,7 @@ private void waitOnLock() throws PSQLException {
     } catch (InterruptedException ie) {
       Thread.currentThread().interrupt();
       throw new PSQLException(
-          GT.tr("Interrupted while waiting to obtain lock on database connection"),
+          GT.tr("Interrupted while waiting to obtain lock on database connection"),

@sehrope (Member) commented Feb 18, 2020

Remove extra whitespace.

if (maxRowSize == -1) {
  maxRowSize = rowSize;
} else {
  if (maxRowSize < rowSize) {

@sehrope (Member) commented Feb 18, 2020

This is easier to understand as if (rowSize > maxRowSize) { ... }.
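
Putting both review suggestions together, the method could shrink to something like this (same field and method as in the snippet above; surrounding class omitted):

    // With maxRowSize initialized to -1 and rowSize always positive, one comparison
    // covers both the "not set yet" case and the "larger row seen" case.
    public void setMaxRowSize(int rowSize) {
      if (rowSize > maxRowSize) {
        maxRowSize = rowSize;
      }
    }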

@sehrope (Member) commented Feb 18, 2020

> I'm concerned that there is a lot of code here and no tests. I'd like to see some unit tests as well as an integration test where we attempt to read a large result set and see that adaptive fetching actually does what we think it does.

+1. I took a peek at the PR. Not an actual review, just skimming through to get an idea of what it's trying to do, and I agree it's a lot of code change.

@adrklos (Contributor, Author) commented Feb 18, 2020

@davecramer Ok, I understand your concerns about this situation. I will prepare unit tests for AdaptiveFetchQueryMonitoring class (as I think it's a class which needs unit tests the most) and integration test for adaptive fetching process.
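
As a rough sketch of what such an integration test could check, reusing the scenario from the description (the URL, credentials, table name, and assertion are assumptions, not the tests actually added in the PR):

    import static org.junit.Assert.assertTrue;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;
    import org.junit.Test;

    public class AdaptiveFetchSizeSketchTest {
      @Test
      public void fetchSizeGrowsAfterFirstBatch() throws Exception {
        Properties props = new Properties();
        props.setProperty("defaultRowFetchSize", "4");
        props.setProperty("maxResultBuffer", "200");
        props.setProperty("adaptiveFetch", "true");

        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/test", props)) {
          con.setAutoCommit(false); // batched fetching needs autocommit off
          try (Statement st = con.createStatement();
               ResultSet rs = st.executeQuery("SELECT txt FROM adaptive_demo")) {
            while (rs.next()) {
              // drain the result set so several batches are fetched
            }
            // Per the description, getFetchSize() reports the last used fetch size when
            // adaptive fetch is enabled, so after adapting it should exceed the initial 4.
            assertTrue(rs.getFetchSize() > 4);
          }
        }
      }
    }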

Commit to add:
- unit tests for AdaptiveFetchQueryMonitoring class;
- some style fixes (removing extra spaces, simpler code in setMaxRowSize method in PGStream class).

Commit made for Heimdall Data's request to implement adaptive fetching.
@codecov-io commented Feb 19, 2020

Codecov Report

Merging #1707 into master will increase coverage by 3.75%.
The diff coverage is 75.39%.

@@             Coverage Diff              @@
##             master    #1707      +/-   ##
============================================
+ Coverage     65.47%   69.23%   +3.75%     
- Complexity     3936     4239     +303     
============================================
  Files           186      189       +3     
  Lines         17115    17443     +328     
  Branches       2810     2887      +77     
============================================
+ Hits          11206    12076     +870     
+ Misses         4642     4068     -574     
- Partials       1267     1299      +32

adrklos added 2 commits Feb 20, 2020
Commit to add:
- integration tests for adaptive fetching process;
- change casting from primitive types to Wrapper classes in AdaptiveFetchQueryMonitoringTest.

Commit made for Heimdall Data's request to implement adaptive fetching.
Change in AdaptiveFetchSizeTest:
- now uses PreparedStatement during tests;
- added an assume to check that preferQueryMode is not simple;
- new method to open the connection and create the table to remove code repetition.

Commit made for Heimdall Data's request to implement adaptive fetching.
@adrklos (Contributor, Author) commented Feb 21, 2020

@davecramer I added some unit tests (all of them for AdaptiveFetchQueryMonitoring class, as I mentioned before) and some integration tests for adaptive fetching process. I would be glad for any information about possibilities to improve my code or ideas for new tests, which I might have forgotten to add.

@davecramer (Member) commented Feb 21, 2020

@adrklos thanks, I saw this. The only thing I'd like to see, and I'm not exactly sure where to put it, is an explanation like the one in the commit log put in the code, so that people reading this in the future won't have to look at the commit log to see how it works. Perhaps in pgjdbc/src/main/java/org/postgresql/core/v3/adaptivefetch/AdaptiveFetchQueryMonitoring.java?

@adrklos (Contributor, Author) commented Feb 21, 2020

@davecramer Yes, adding it as an explanation for the class pgjdbc/src/main/java/org/postgresql/core/v3/adaptivefetch/AdaptiveFetchQueryMonitoring.java should be best. The adaptive fetch implementation required changes in several places, but analysing the code flow leads to the AdaptiveFetchQueryMonitoring class, so I think putting the explanation there is most appropriate.

Commit to add explanation about adaptive fetching process in AdaptiveFetchQueryMonitoring class.

Commit made for Heimdall Data's request to implement adaptive fetching.
@adrklos (Contributor, Author) commented Feb 21, 2020

@davecramer I added the explanation, the same as in the commit message (I hope it will be enough to understand the idea behind the adaptive fetching process). I see that the testDuringSendBigTransactionConnectionCloseSlotStatusNotActive test failed; as I remember, this test is flaky, so I hope it doesn't need my reaction to solve the problem.

@davecramer davecramer added this to the 42.3.0 milestone Feb 21, 2020
@davecramer (Member) commented Feb 21, 2020

Excellent. Thanks. I restarted that job.

@davecramer davecramer merged commit 5bb5f40 into pgjdbc:master Feb 24, 2020
2 checks passed
@sehrope (Member) commented Feb 24, 2020

I'd have liked some more time to review this prior to it being merged. Skimming through, a bunch of things stand out:

  • Boolean.TRUE vs just true
  • localAdaptiveFetch == true vs just localAdaptiveFetch
  • Swallowing all exceptions in calls to getAdaptiveFetchSize(...), addQueryToAdaptiveFetchMonitoring(...), removeQueryFromAdaptiveFetchMonitoring(...)
  • Why create the cache when it's not enabled?
  • Why can you enable / disable at runtime per connection, but not change the min / max values?

Bunch of style / naming items too:

  • Inconsistent naming: "min" / "max" in some places, "minimum" / "maximum" in others right next to each other.
  • Add units to the variable and possibly property names. Ex: is "maxRowSize" a count of rows, or their size in bytes?
  • "AdaptiveFetchQueryMonitoring" sounds like it's a background daemon. Should be "AdaptiveFetchCache" or similar, no?
  • "AdaptiveFetchQueryInfo" is not query information. It should be something like "AdaptiveFetchCacheEntry".
  • Counter interaction for AdaptiveFetchQueryInfo is managed via info.setCounter(info.getCounter() + 1). This should be managed by the class itself via something like increment() or decrement() methods (see the sketch after this list).
  • Member variable comments should use Javadoc style. Ex: see AdaptiveFetchQueryInfo.
  • Non-member comments should be capitalized and have a space at the front: "//this is a comment" vs "// This is a comment...".
  • Method comments should not start with "Method to ...". Just describe the action directly.
  • Test comments should not reference "first", "second", ... etc.
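
For what it's worth, a sketch of the entry-owns-its-counter idea, following the renaming suggestions above (all names hypothetical):

    // Hypothetical cache entry that manages its own counter rather than callers doing
    // info.setCounter(info.getCounter() + 1); units are spelled out in the names.
    public class AdaptiveFetchCacheEntry {
      private int maximumRowSizeBytes = -1; // largest row size observed, in bytes
      private int counter;                  // result sets currently using this entry

      public void incrementCounter() {
        counter++;
      }

      public int decrementCounter() {
        return --counter;
      }

      public void updateMaximumRowSizeBytes(int rowSizeBytes) {
        if (rowSizeBytes > maximumRowSizeBytes) {
          maximumRowSizeBytes = rowSizeBytes;
        }
      }

      public int getMaximumRowSizeBytes() {
        return maximumRowSizeBytes;
      }
    }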

I'm not suggesting holding off merging indefinitely but something of this size, especially with public API additions, needs more than a few days to give people a chance to review. Or at least a comment polling for final feedback prior to merging.

@davecramer (Member) commented Feb 24, 2020

Fair enough, I can revert. Is reverting the best solution to this?

davecramer added a commit that referenced this issue Feb 24, 2020
davecramer added a commit that referenced this issue Feb 24, 2020
davecramer pushed a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021
* feat: implementation of adaptive fetching (squashed commit message; its full text repeats the description and commit messages above)
davecramer added a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021