Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding array functions pushdown to pinot #15260

Merged
merged 1 commit into from
Oct 4, 2020

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Oct 2, 2020

Support pushing down Array functions to Pinot.

  • Array functions(array_sum, array_min, array_max, array_average) inside aggregation functions or group by clause to Pinot.
  • Boolean functions in Predicate: contains

E.g. For Presto query: SELECT sum(array_max(col)) FROM myTable WHERE contains(col, 1) , the pushdown pinot query is: SELECT sum(arrayMax(col)) FROM myTable WHERE col = 1

Test plan:
Unit tests and tested with local setup for array functions(array_sum, array_min, array_max, array_average, contains) pushdown to Pinot:

== RELEASE NOTES ==

Pinot Changes
* Support pushing down array functions `array_sum`, `array_min`, `array_max`, `array_average`, and `contains` to Pinot connector.

@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Oct 2, 2020

@agrawaldevesh @dharakk

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having second thoughts about ARRAY_AGGREGATION_ALLOWED_PUSHDOWN_MAP: I am wondering if we should ditch the map and just have a case statement to handle the 4 or 5 of those keys separately ?

They are all somewhat custom anyway ... sometimes we special case on things like array_min/array_max and sometimes we look up the map.

I was wondering if the code would be simpler by having a couple of helper methods with case expressions instead of a map, to account for the customness.

@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch from 492b7b6 to 0301b8e Compare October 3, 2020 07:49
@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Oct 3, 2020

I am having second thoughts about ARRAY_AGGREGATION_ALLOWED_PUSHDOWN_MAP: I am wondering if we should ditch the map and just have a case statement to handle the 4 or 5 of those keys separately ?

They are all somewhat custom anyway ... sometimes we special case on things like array_min/array_max and sometimes we look up the map.

I was wondering if the code would be simpler by having a couple of helper methods with case expressions instead of a map, to account for the customness.

I overall feel, it would be simpler to have both map and switch case so we can have some common code logics to handle the function name conventions.

This is the first PR of pushing down array functions and I only implemented 4 functions. There would be more array functions to be pushed down.
Ref: https://prestodb.io/docs/current/functions/array.html and apache/pinot#6083

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I am fine with leaving the map there.

@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch from 0301b8e to 10b39e8 Compare October 3, 2020 21:25
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Please make sure to update the PR description with a test plan that enumerates some examples of what all presto array expressions can now be pushed into Pinot.

@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch 2 times, most recently from 2fc2ef1 to 83f91ee Compare October 4, 2020 00:02
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering how type checking happens for these new array functions ? How do we check that the arguments to array_sum and contains for example are indeed arrays ?

What happens if I invoke those functions on scalar columns ?

Should we add some negative tests ?

@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch 2 times, most recently from 462536f to 81ca2d2 Compare October 4, 2020 06:42
@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch from 81ca2d2 to 442e0bc Compare October 4, 2020 17:26
@highker highker self-requested a review October 4, 2020 18:26
@highker highker self-assigned this Oct 4, 2020
@xiangfu0 xiangfu0 force-pushed the pinot-pushdown-array-functions branch from 442e0bc to 52f19fb Compare October 4, 2020 19:40
@highker highker merged commit 5c29aa2 into prestodb:master Oct 4, 2020
@xiangfu0 xiangfu0 deleted the pinot-pushdown-array-functions branch October 5, 2020 22:58
@caithagoras caithagoras mentioned this pull request Oct 19, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants