New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(expr): implement greatest and least function #12838
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -672,6 +672,20 @@ impl Binder { | |||
) | |||
} | |||
|
|||
fn reduce(r#type: ExprType) -> Handle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is variadic function support: #12178
The expressions must all be convertible to a common data type, which will be the type of the result (see Section 10.5 for details).
In the frontend this can be achieved with align_types
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked the function currently using the variadic input. The input type of the function implementations are all f(row: impl Row)
like array
#[function("array(...) -> list")]
fn array(row: impl Row) -> ListValue {
ListValue::new(row.iter().map(|d| d.to_owned_datum()).collect())
}
To implement the functions in this PR, if the input types are also impl Row
, we will have to cast the type to the inner type manually, which is verbose and dirty. Is is possible to do things like the following with the proc macro?
fn generic_greatest<T>(input: impl IntoIterator<Item=Option<T>>) -> Option<T> {}
If not I think the current implementation is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will have to cast the type to the inner type manually, which is verbose and dirty
You meant T::try_from
? Anyways it would be a backward compatible change from a binary tree to a flat one later. cc @wangrunji0408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is possible to do things like the following with the proc macro?
fn generic_greatest<T>(input: impl IntoIterator<Item=Option<T>>) -> Option<T> {}
Yes, it's possible. But it requires some effort to support this in the macro. If this does help, I may add this support in the near future.
we will have to cast the type to the inner type manually, which is verbose and dirty
Can we compare the Datum
directly without downcast? I think it should be correct if all inputs have been casted to the common type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, it seems that implicit cast to the common type is needed in the frontend. Otherwise, the backend function doesn't know how to compare values between string and int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we compare the Datum directly without downcast?
No because there are null first
and null last
. For the 2 functions here we actually pick neither but ignore nulls.
risingwave/src/common/src/types/mod.rs
Lines 547 to 551 in d2ec83c
// We MUST NOT implement `Ord` for `ScalarImpl` because that will make `Datum` derive an incorrect | |
// default `Ord`. To get a default-ordered `ScalarImpl`/`ScalarRefImpl`/`Datum`/`DatumRef`, you can | |
// use `DefaultOrdered<T>`. If non-default order is needed, please refer to `sort_util`. | |
impl !PartialOrd for ScalarImpl {} | |
impl !PartialOrd for ScalarRefImpl<'_> {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the variadic version with f(row: impl Row)
and T::try_from
. The implementation looks much better now. Thanks for the suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fyi I found a case where align_type
(pg) is required and a binary expr tree would fail:
dev=> select greatest('12', '2');
greatest
----------
2
(1 row)
dev=> select greatest('12', '2', 0);
greatest
----------
12
(1 row)
The former fallback to varchar
/text
but the latter infers all 3 to be int
. It is not equivalent to greatest(greatest('12', '2'), 0)
. Not practical, just some PostgreSQL subtleties.
@@ -672,6 +672,20 @@ impl Binder { | |||
) | |||
} | |||
|
|||
fn reduce(r#type: ExprType) -> Handle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, it seems that implicit cast to the common type is needed in the frontend. Otherwise, the backend function doesn't know how to compare values between string and int.
c2916a9
to
651d4e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest lgtm.
Feel free to update the unit test expectation on ambiguous signatures to make it pass. It is a false alarm that greatest()
requires special return-type inference logic:
greatest()
is allowed by the variadic macro, but we have disallowed it in frontend withensure_arity
already.greatest(..)
does have its special return-type inference logic (align_types
).
src/expr/impl/src/scalar/cmp.rs
Outdated
.flatten() | ||
.map( | ||
|scalar| match <<T as Scalar>::ScalarRefType<'_>>::try_from(scalar) { | ||
Ok(v) => v.to_owned_scalar(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we can max
on ScalarRef
and only to_owned_scalar
on the max found at the end.
651d4e3
to
eaca70a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Need to update the release notes as well.
Codecov Report
@@ Coverage Diff @@
## main #12838 +/- ##
==========================================
+ Coverage 69.18% 69.20% +0.01%
==========================================
Files 1490 1490
Lines 246047 246110 +63
==========================================
+ Hits 170235 170318 +83
+ Misses 75812 75792 -20
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 7 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
You missed e2e tests for the streaming mode. |
These 2 are not aggregate or set returning functions. For pure scalar functions, we do not repeat the test in different configurations. |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Complete #12761
The code of cmp unit test is refactored accordingly.
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
Support
greatest
andleast
pg function.The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions. The expressions must all be convertible to a common data type, which will be the type of the result.
NULL values in the argument list are ignored. The result will be NULL only if all the expressions evaluate to NULL. (This is a deviation from the SQL standard. According to the standard, the return value is NULL if any argument is NULL. Some other databases behave this way.)
Refer to https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-GREATEST-LEAST