-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict input argument as orderable in array_sort, array_sort_desc #6928
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good % some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for iterating. A couple remaining comments.
dc7be2f
to
3239206
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
CI is red. Looks like we need to update AggregationFuzzer to respect "orderable" constraint.
|
@mbasmanova We could add this constraint in |
@mbasmanova This is more appropriate and cleaner (I should think more before coding :) ). PS: |
There are so-called companion functions. See #4566 These are not currently used though. I do see lots of issues when testing companion functions in the fuzzer. I was thinking to modify the registration APIs to allow to exclude all companion functions. CC: @kagamiori |
@mbasmanova Hi Masha, it seems we could not use JSON type as DuckDB does not support it. I am trying other transform functions, if not work would fall back to the specific functions way. terminating with uncaught exception of type std::runtime_error: unsupported type for duckdb -> velox conversion: JSON |
@mbasmanova Hi Masha, I've updated this PR with the following changes, could you please help to take a look? Thanks.
|
You may need to put $internal$array_sort in double quotes: |
I wonder if maybe a better name for this function would be $internal$canonicalize, because this function transforms an array into canonical form that allows for comparison. |
@@ -464,4 +489,12 @@ VELOX_DECLARE_STATEFUL_VECTOR_FUNCTION( | |||
signatures(false), | |||
createDesc); | |||
|
|||
// Add an internal version of array_sort for used by AggregationFuzzerTest to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos:
An internal function to canonicalize an array to allow for comparisons. Used in AggregationFuzzerTest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -122,6 +122,8 @@ void registerArrayFunctions(const std::string& prefix) { | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_sort, prefix + "array_sort"); | |||
VELOX_REGISTER_VECTOR_FUNCTION( | |||
udf_array_sort_desc, prefix + "array_sort_desc"); | |||
VELOX_REGISTER_VECTOR_FUNCTION( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, move this registration into AggregationFuzzerTest. This function should not be available in production setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good % a few remaining comments.
velox/functions/prestosql/registration/ArrayFunctionsRegistration.cpp
Outdated
Show resolved
Hide resolved
velox/functions/prestosql/registration/RegistrationFunctions.cpp
Outdated
Show resolved
Hide resolved
{"map_agg", "array_sort(map_keys({}))"}, | ||
{"map_union", "array_sort(map_keys({}))"}, | ||
{"map_union_sum", "array_sort(map_keys({}))"}, | ||
{"array_agg", "\"$internal$canonicalize\"({})"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this change reduces coverage in the AggregationFuzzer. Before this change, the Fuzzer was able to verify results against DuckDB (since it supports array_sort function), but now it cannot. We are going to face same problem with using Presto as the reference query runner. A solution could be to change the Fuzzer to apply post-aggregation projections using Velox only (once to Velox results, once to results from reference DB), then compare. This is a follow-up though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova merged this pull request in 5afb621. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: In Presto, `array_sort` applied to arrays of complex types with nested nulls may or may not fail depending on whether the sorting logic needs to compare the nulls to decide the order. ```SQL presto> SELECT array_sort(col0) FROM ( VALUES (array [array [1, 2, 3, 4], array [2, null, 3]])) AS t(col0); _col0 ------------------------------ [[1, 2, 3, 4], [2, null, 3]] presto> SELECT array_sort(col0) FROM ( VALUES (array [array [1, 2, 3, 4], array [1, null, 3]])) AS t(col0); Query 20230925_113531_00074_6r5h7 failed: Array contains elements not supported for comparison ``` This PR checks for contains-nulls only during complex type comparison (Presto's implementation). Add a throwOnNestedNullCompare flag to distinguish the normal `array_sort[_desc]` function and the internal canonicalize function `$internal$canonicalize` introduced in #6928, which would use `NoStop` null-handling mode to compare and not throw. Resolve #6713 Pull Request resolved: #7229 Reviewed By: xiaoxmeng, kgpai Differential Revision: D50644882 Pulled By: mbasmanova fbshipit-source-id: 80883ed17da1cae1ec8fbb3fc66b899c422a4dd1
In Presto, input to
array_sort
, andarray_sort_desc
are restricted toorderable types. For example, MAP type is not orderable.
This PR uses the
orderableTypeVariable
to apply the restrictionto the input argument of array_sort. This restriction applies to the input
argument type of normal array_sort, and the return type of the lambda
of array_sort with a transform lambda, not the array_sort function
with a custom comparator.
For more details please check the blog https://velox-lib.io/blog/array-sort/
Part of #6718, resolve #6712