You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have the udf dualarrayexplode which gives us the Cartesian product of two arrays which we sometimes use for comparison levels. Currently we hand-roll the SQL, but it would be useful to have a shortcut for this.
Describe the solution you'd like
Would be nice to have a couple of ways to construct this - using string templates (or something similar), or from comparison levels.
For instance, pairwise levenshtein comparison level (minimum levenshtein distance between arrays less than 2)
# for each x in name_arr_l, y in name_arr_r, condition is min(levenshtein(x, y)) <= 2cll.pairwise_array_level("name_arr", sql_snippet="levenshtein({col_l}, {col_r})", aggregate_fun="min", threshold=2)
or the same thing from a comparison level:
# for each x in name_arr_l, y in name_arr_r, condition is min(levenshtein(x, y)) <= 2cll.pairwise_array_from_level(cll.levenshtein_level("name_arr", 2), aggregate_fun="min")
or for equality (if for instance we need to count instances so array_length_intersection is inappropriate):
Could have specialised versions - e.g. one for Levenshtein, one for equality, etc, but a suitably flexible option probably covers more use-cases without providing an overwhelming number of new levels.
Additional context
Might need some thought about range of use-cases + the best interface for this.
The text was updated successfully, but these errors were encountered:
@RossKen Since you mention DuckDB, have you considered something like my suggestion to use in iterated list_reduce in DuckDB in issue #1994? Is that approach viable?
Is your proposal related to a problem?
We have the udf
dualarrayexplode
which gives us the Cartesian product of two arrays which we sometimes use for comparison levels. Currently we hand-roll the SQL, but it would be useful to have a shortcut for this.Describe the solution you'd like
Would be nice to have a couple of ways to construct this - using string templates (or something similar), or from comparison levels.
For instance, pairwise levenshtein comparison level (minimum levenshtein distance between arrays less than 2)
or the same thing from a comparison level:
or for equality (if for instance we need to count instances so
array_length_intersection
is inappropriate):Describe alternatives you've considered
Could have specialised versions - e.g. one for Levenshtein, one for equality, etc, but a suitably flexible option probably covers more use-cases without providing an overwhelming number of new levels.
Additional context
Might need some thought about range of use-cases + the best interface for this.
The text was updated successfully, but these errors were encountered: