Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pushdown approx_distinct(x, e) into pinot #15231

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Sep 27, 2020

Current Pinot aggregation pushdown only supports approx_distinct(x) but not approx_distinct(x, e).

Since pinot supports configurable approximate distinct with query function syntax distinctCountHll(x, log2m) (apache/pinot#5564), we can push down approx_distinct(x, e) by converting e to log2m then call pinot function: distinctCountHll

Some ref to log2m:
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLog.java#L98

Test plan

  • Adding unit tests to check the correct expression to pushdown.
  • Tested locally with a presto server and pinot up and run the query with real data set.
== RELEASE NOTES ==

Pinot Changes
* Support pushing down aggregation function `approx_distinct(x, e)` to Pinot connector.

@xiangfu0
Copy link
Contributor Author

@agrawaldevesh

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supported in both Pql and sql code paths ?

@xiangfu0 xiangfu0 force-pushed the fixing_approximate_distinct_with_error_rate branch from e18011e to 8a7c588 Compare September 28, 2020 18:42
@xiangfu0
Copy link
Contributor Author

Is this supported in both Pql and sql code paths ?

Yes, both places are supported. Pinot directly pass the arguments into function and let functions to handle at.

@xiangfu0 xiangfu0 force-pushed the fixing_approximate_distinct_with_error_rate branch from 8a7c588 to 331deb8 Compare September 28, 2020 19:41
@highker highker self-requested a review October 4, 2020 17:24
@highker highker self-assigned this Oct 4, 2020
@xiangfu0 xiangfu0 force-pushed the fixing_approximate_distinct_with_error_rate branch from 331deb8 to c9ff18f Compare October 4, 2020 19:43
@xiangfu0 xiangfu0 force-pushed the fixing_approximate_distinct_with_error_rate branch from c9ff18f to 3393e40 Compare October 4, 2020 19:44
@highker highker merged commit 9f4203e into prestodb:master Oct 4, 2020
@xiangfu0 xiangfu0 deleted the fixing_approximate_distinct_with_error_rate branch October 5, 2020 22:57
@caithagoras caithagoras mentioned this pull request Oct 19, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants