Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shall we implement higher order function map (Presto Transform)? #11123

Closed
lmatz opened this issue Jul 21, 2023 · 6 comments
Closed

Shall we implement higher order function map (Presto Transform)? #11123

lmatz opened this issue Jul 21, 2023 · 6 comments
Labels
component/func-expr Support a SQL function or operator needs-discussion

Comments

@lmatz
Copy link
Contributor

lmatz commented Jul 21, 2023

This refers to the transform function in PrestoDB:https://prestodb.io/docs/current/functions/array.html#transform

The way Risingwave currently write is:

dev=> CREATE TABLE my_table (my_array integer[]);
CREATE_TABLE
dev=> insert into my_table values (ARRAY[1,2,3,4,5]);
INSERT 0 1
dev=> select unnest(my_array) * 2 as transformed_array from my_table;
 transformed_array 
-------------------
                 2
                 4
                 6
                 8
                10
(5 rows)

If I do:

select array_agg(unnest(my_array) * 2)  as transformed_array from my_table;
ERROR:  QueryError: Feature is not yet implemented: table function inside aggregation calls

And in PG:

martin=# select array_agg(unnest(my_array) * 2) as transformed_array from my_table;
ERROR:  aggregate function calls cannot contain set-returning function calls
LINE 1: select array_agg(unnest(my_array) * 2) as transformed_array ...
                         ^
HINT:  You might be able to move the set-returning function into a LATERAL FROM item.

Current workaround is to use subquery / CTE / view:

select array_agg(transformed_array) from (
  select unnest(my_array) * 2 as transformed_array from my_table
) as unnested;

But writing Transform(my_array, x -> x * 2) is much more concise.

Also note the unnest + array_agg trick above only works for 1d array, because unnest in PostgreSQL works recursively. We would need a non-recursive unnest_list to pair with array_agg precisely if such usage become more common.

This case is inspired by users' query of Presto, but the user's usage can be worked around as it essentially is a xxx in (A, B, C)

Link risingwavelabs/rfcs#69

@github-actions github-actions bot added this to the release-1.1 milestone Jul 21, 2023
@xiangjinwu xiangjinwu changed the title Shall we implement Transform? Shall we implement higher order function map (Presto Transform)? Jul 21, 2023
@xiangjinwu xiangjinwu added the component/func-expr Support a SQL function or operator label Jul 21, 2023
@lmatz lmatz removed this from the release-1.1 milestone Jul 21, 2023
@lmatz
Copy link
Contributor Author

lmatz commented Jul 26, 2023

We would need a non-recursive unnest_list to pair with array_agg precisely if such usage become more common.

What if we don't know the dimension in advance? I suppose we cannot determine how many times unnest_list and array_agg we should apply. In this case transform seems to be the only way out

@xiangjinwu
Copy link
Contributor

What if we don't know the dimension in advance?

It is part of the static type. int[] requires one pair while int[][][] requires three pairs. We do not allow {2,3} and {{3,4},{5,6}} to appear in the same column.

@st1page
Copy link
Contributor

st1page commented Aug 2, 2023

I missed this issue and write a RFC today for the higher order function risingwavelabs/rfcs#69

@lmatz
Copy link
Contributor Author

lmatz commented Aug 2, 2023

I missed this issue and write a RFC today for the higher order function risingwavelabs/rfcs#69

It's fine, I was thinking the same:

For the lambda expression, it is like a anonymous SQL UDF. We can implement the SQL UDF first and design lambda expression later.

I remember Xiangjin told me that the lambda expression here is more restrictive, i.e. it must be a const function

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Aug 2, 2023

Let me clarify this, supporting lambda expression and higher order function are two different things, although related.

It is possible to use higher order function (map, filter, reduce) without lambda expression. We just need a builtin or UDF:

  • transform(array[1, 4, 9, 16], sqrt) -> {1, 2, 3, 4}

Lambda expression just makes the use of HOF more convenient, so that we do not need to create function beforehand and come up with names.

The core part of HOP support is how to pass functions (incl lambda) as arguments. Right now Expr::eval only consumes chunks with typed arrays, but we may not want to put functions in arrays/chunks - that is, there are no array/column of function type.

By saying the argument (possible a function) must be const, I mean it cannot be a column (InputRef) as follows:

select transform(arr, fn) from
  (values (array['alice', 'bob']), (array['san francisco']), (array['utf8', 'ascii'])) as t1 (arr)
cross join
  (values (lower), (upper), (initcap)) as t2 (fn);

(To be honest, the SomeAllExpression today is similar to HOF. v < ALL(arr) and v ~ ALL(arr) is handled by the same logic parameterized by the binary predicate operator < or ~, with hack of course.)

@xxchan
Copy link
Member

xxchan commented Oct 9, 2023

We've already implemented lambda expression and higher order function (array_transform) in #11937

Remaining tracked in #11837

@xxchan xxchan closed this as completed Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/func-expr Support a SQL function or operator needs-discussion
Projects
None yet
Development

No branches or pull requests

4 participants