Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset MultiChannelGroupByHash.currentPageBuilder to exact position count #12512

Merged
merged 1 commit into from Aug 31, 2022

Conversation

lukasz-stec
Copy link
Member

@lukasz-stec lukasz-stec commented May 23, 2022

Description

Since MultiChannelGroupByHash knows exactly how many
positions currentPageBuilder can eventually contain,
there is no need to increase the size with
calculateBlockResetSize, wasting memory in the process.

This fixes #12484
It's based on #12336. Only last commit matters here.

tpch/tpcds benchmark

There is a 1-2 % improvement in peak memory consumption.

image

reset-exeact-orc-part-sf1000.pdf

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine, spi BlockBuilder

How would you describe this change to a non-technical end user or system administrator?

Lower hash aggregation memory consumption

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@lukasz-stec
Copy link
Member Author

io.trino.spi.TestSpiBackwardCompatibility.testSpiSingleVersionBackwardCompatibility fails becasue BlockBuilder newBlockBuilderLike(BlockBuilderStatus blockBuilderStatus) was implemented in the BlockBuilder interface as a default method and this test does not support that.
IF we go with this PR, we will need to rework TestSpiBackwardCompatibility probably

@lukasz-stec lukasz-stec marked this pull request as ready for review June 28, 2022 13:16
@lukasz-stec lukasz-stec requested a review from sopel39 June 28, 2022 13:16
@lukasz-stec
Copy link
Member Author

if we decide to go ahead with this change, io.trino.spi.TestSpiBackwardCompatibility.testSpiSingleVersionBackwardCompatibility needs to be fixed

@@ -98,6 +98,19 @@ public void reset()
}
}

public void resetExact()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add reset(int expectedEntries) method instead.

Exact is ambiguous. Exact what? positions? size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90%* of method names are ambiguous (see no further than newBlockBuilderLike).
I think keeping the method without expectedEntries param is better as we don't need the extended functionality of resetting PageBuilder to any position count.

  • made up stat

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Łukasz. The name is not that bad, and it will only be invoked with the same argument every time. Some simple javadoc might be added though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely needs a javadoc
Alternatively, how about a PageBuilder newPageBuilderLike(int expectedEntries) method instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid the Javadoc but also avoid exposing unnecessary functionality I added private
private void reset(int expectedEntries) and public resetExact with an extremely straightforward implementation that does not need documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset(int expectedEntries) may correspond to the PageBuilder constructor but it's not needed, why add it to the public API?

How about instead of inlining resetExact we change the name to be less ambiguous? The approach of inlining method calls because we don't like the method name is not good long term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you can inline it TBH due to page builder status

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39 see https://github.com/trinodb/trino/pull/12512/files#r950167250
to sum up. we don't need reset(int expectedEntries) public API, we need resetExact. If the name is misleading please propose a better name but I stand by the fact that resetExact captures the semantics we need and reset(int expectedEntries) adds complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to move forward with the PR I dropped the resetExact and made void reset(int expectedEntries) public.

Copy link
Member Author

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased on the master + renamed newBlockBuilderLike param

@@ -98,6 +98,19 @@ public void reset()
}
}

public void resetExact()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90%* of method names are ambiguous (see no further than newBlockBuilderLike).
I think keeping the method without expectedEntries param is better as we don't need the extended functionality of resetting PageBuilder to any position count.

  • made up stat

@lukasz-stec
Copy link
Member Author

this change adds new method to BlockBuilder in SPI so it's not backward compatible (and TestSpiBackwardCompatibility tests fail for that reason https://github.com/trinodb/trino/runs/7087416672?check_suite_focus=true).
@sopel39 @kokosing do you know how we deal with this kind of change?

Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Since the pages considered her are temporary (not the output ones), consider making them fixed size from the beginning (let's say of size 1024). That way the value lookup consists of a bit shift and a single memory lookup instead of two. That is just me thinking out loud, you can consider this out of scope.

@@ -98,6 +98,19 @@ public void reset()
}
}

public void resetExact()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Łukasz. The name is not that bad, and it will only be invoked with the same argument every time. Some simple javadoc might be added though

Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment.
Also, PTAL if GeneratedPageProjection#project can benefit from similar exact sizing

@@ -98,6 +98,19 @@ public void reset()
}
}

public void resetExact()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely needs a javadoc
Alternatively, how about a PageBuilder newPageBuilderLike(int expectedEntries) method instead ?

@lukasz-stec
Copy link
Member Author

PTAL if GeneratedPageProjection#project can benefit from similar exact sizing

Not an expert in projection but it looks like this could indeed benefit from the exact sizing. I can do this as I follow up on this PR.

@lukasz-stec
Copy link
Member Author

Since the pages considered her are temporary (not the output ones), consider making them fixed size from the beginning (let's say of size 1024).

@skrzypo987 I think this could have negative consequences if the number of groups is small

@raunaqmorarka
Copy link
Member

@lukasz-stec PTAL at test failures

@lukasz-stec
Copy link
Member Author

PTAL at test failures

@raunaqmorarka TestSpiBackwardCompatibility is failing. I'm not sure how to proceed.
See #12512 (comment).
This of course needs to be fixed before we can merge this.

@raunaqmorarka
Copy link
Member

PTAL at test failures

@raunaqmorarka TestSpiBackwardCompatibility is failing. I'm not sure how to proceed. See #12512 (comment). This of course needs to be fixed before we can merge this.

I think we need to keep the existing API around as well to keep it backward compatible. Maybe mark it Deprecated.

@lukasz-stec
Copy link
Member Author

I think we need to keep the existing API around as well to keep it backward compatible. Maybe mark it Deprecated.

The existing API is still there but now it's implemented as a default interface method.
TestSpiBackwardCompatibility does not handle changes like that

@raunaqmorarka
Copy link
Member

I think we need to keep the existing API around as well to keep it backward compatible. Maybe mark it Deprecated.

The existing API is still there but now it's implemented as a default interface method. TestSpiBackwardCompatibility does not handle changes like that

If it's not going to require changes to any existing usage (i.e. it's not really backward incompatible change), then we can probably just add the changes to the exceptions list with a comment to get past this.

@lukasz-stec lukasz-stec force-pushed the ls/017-mcgbh-reset-precise branch 2 times, most recently from be20719 to 9e984ab Compare August 29, 2022 11:23
@lukasz-stec
Copy link
Member Author

rebased on the master + added TestSpiBackwardCompatibility exceptions for the newBlockBuilderLike method

@lukasz-stec lukasz-stec force-pushed the ls/017-mcgbh-reset-precise branch 2 times, most recently from 55566f2 to 18e1c53 Compare August 30, 2022 12:19
@sopel39
Copy link
Member

sopel39 commented Aug 31, 2022

@electrum landed API verifier, please rebase and make sure tests pass

@lukasz-stec
Copy link
Member Author

@electrum landed API verifier, please rebase and make sure tests pass

rebased and added an exception to the revapi config.

Since MultiChannelGroupByHash knows exactly how many
positions currentPageBuilder can eventually contain,
there is no need to increase the size with
calculateBlockResetSize, wasting memory in the process.
@sopel39 sopel39 merged commit aee2179 into trinodb:master Aug 31, 2022
@sopel39 sopel39 mentioned this pull request Aug 31, 2022
@github-actions github-actions bot added this to the 395 milestone Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Presize new page to exeact expect size in MultiChannelGroupByHash#startNewPage
4 participants