Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable more aggregation pushdown into scan #1188

Closed

Conversation

mbasmanova
Copy link
Contributor

@mbasmanova mbasmanova commented Mar 11, 2022

Add runtime statistic "loadedToValueHook" to track number of values processed
via aggregation pushdown into scan. This value contains total number of values,
not rows, e.g. if we have 2 aggregates using pushdown for 10 rows each,
loadedToValueHook will be 10 + 10 = 29. This statistics is reported by the
LazyVector::load() method when it is called with non-null ValueHook.

Use the new statistic to verify pushdown in TableScanTest.aggregationPushdown.
These checks showed that aggregation pushdown is not enabled when it should be.
Hence, a couple more fixes.

Fix check for identity projections in FilterProject operator to identify these
more reliably. This allows pushdown for queries like SELECT f(a), sum(b) FROM t
GROUP BY 1.

Fix can-pushdown check in GroupingSet to use the correct set of aggregation
inputs. This check was consistently producing 'false' for non-first aggregates.
This allows pushdown for all aggregates in queries like SELECT a, sum(b), sum
(c), min(d) FROM t GROUP BY 1.

Remaining TODOs for follow-up PRs:

  • Enable pushdown for global aggregations.
  • Fix runtime statistics reporting in LazyVector. As of now, aggregation
    pushdown statistics reported by LazyVector are added to the first operator
    upstream of aggregation. In some queries it is TableScan, while in others it
    is FilterProject.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 11, 2022
@mbasmanova mbasmanova changed the title Expose stats about aggregation pushdown Enable more aggregation pushdown into scan Mar 11, 2022
@mbasmanova mbasmanova marked this pull request as ready for review March 11, 2022 12:25
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
Add runtime statistic "loadedToValueHook" to track number of values processed
via aggregation pushdown into scan. This value contains total number of values,
not rows, e.g. if we have 2 aggregates using pushdown for 10 rows each,
loadedToValueHook will be 10 + 10 = 20. This statistics is reported by the
LazyVector::load() method when it is called with non-null ValueHook.

Use the new statistic to verify pushdown in TableScanTest.aggregationPushdown.
These checks showed that aggregation pushdown is not enabled when it should be.
Hence, a couple more fixes.

Fix check for identity projections in FilterProject operator to identify these
more reliably. This allows pushdown for queries like SELECT f(a), sum(b) FROM t
GROUP BY 1.

Fix can-pushdown check in GroupingSet to use the correct set of aggregation
inputs. This check was consistently producing 'false' for non-first aggregates.
This allows pushdown for all aggregates in queries like SELECT a, sum(b), sum
(c), min(d) FROM t GROUP BY 1.

Remaining TODOs for follow-up PRs:

- Enable pushdown for global aggregations.
- Fix runtime statistics reporting in LazyVector. As of now, aggregation
  pushdown statistics reported by LazyVector are added to the first operator
  upstream of aggregation. In some queries it is TableScan, while in others it
  is FilterProject.

Pull Request resolved: facebookincubator#1188

Reviewed By: pedroerp

Differential Revision: D34818362

Pulled By: mbasmanova

fbshipit-source-id: bc200f1985f59eecd6119c69d8a3d4ac7bbbc1b9
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D34818362

shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this pull request Aug 18, 2022
Summary:
Add runtime statistic "loadedToValueHook" to track number of values processed
via aggregation pushdown into scan. This value contains total number of values,
not rows, e.g. if we have 2 aggregates using pushdown for 10 rows each,
loadedToValueHook will be 10 + 10 = 20. This statistics is reported by the
LazyVector::load() method when it is called with non-null ValueHook.

Use the new statistic to verify pushdown in TableScanTest.aggregationPushdown.
These checks showed that aggregation pushdown is not enabled when it should be.
Hence, a couple more fixes.

Fix check for identity projections in FilterProject operator to identify these
more reliably. This allows pushdown for queries like SELECT f(a), sum(b) FROM t
GROUP BY 1.

Fix can-pushdown check in GroupingSet to use the correct set of aggregation
inputs. This check was consistently producing 'false' for non-first aggregates.
This allows pushdown for all aggregates in queries like SELECT a, sum(b), sum
(c), min(d) FROM t GROUP BY 1.

Remaining TODOs for follow-up PRs:

- Enable pushdown for global aggregations.
- Fix runtime statistics reporting in LazyVector. As of now, aggregation
  pushdown statistics reported by LazyVector are added to the first operator
  upstream of aggregation. In some queries it is TableScan, while in others it
  is FilterProject.

Pull Request resolved: facebookincubator#1188

Reviewed By: pedroerp

Differential Revision: D34818362

Pulled By: mbasmanova

fbshipit-source-id: 0fd2090e2ff48466921a09a4105fad2277ef5270
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants