-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transform_keys and transform_values Presto functions #2245
Add transform_keys and transform_values Presto functions #2245
Conversation
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…ubator#2245) Summary: Pull Request resolved: facebookincubator#2245 Differential Revision: D38565343 Pulled By: mbasmanova fbshipit-source-id: a6a670105d921cf6fd8d88f4d52e112f1a7c00a4
71bc9a4
to
12ba118
Compare
This pull request was exported from Phabricator. Differential Revision: D38565343 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Masha, looks good. Some minor questions.
keyRows, wrapCapture, context, lambdaArgs, &transformedKeys); | ||
} | ||
|
||
// TODO Check for duplicates in transformedKeys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is checked below right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Removed.
auto wrapCapture = toWrapCapture<MapVector>( | ||
numKeys, entry.callable, *entry.rows, flatMap); | ||
|
||
entry.callable->apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: One question: across multiple lambdas , how do we ensure the type of keys remains same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the job of the query planner. We assume the types are correct similar to different "then" and "else" branches of a switch statement.
// In most cases there will be only one function and the loop will run once. | ||
auto it = args[1]->asUnchecked<FunctionVector>()->iterator(&rows); | ||
while (auto entry = it.next()) { | ||
auto keyRows = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do keyRows change over lambdas ? I would presume typically entry.rows would be similar .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a general case, there is only one lambda function and entry.rows == rows. However, in case when there are multiple lambdas, each lambda applies to a unique subset of rows, hence, keyRows are non-overlapping between lambdas.
// In most cases there will be only one function and the loop will run once. | ||
auto it = args[1]->asUnchecked<FunctionVector>()->iterator(&rows); | ||
while (auto entry = it.next()) { | ||
auto keyRows = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: valueRows ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Renamed.
|
||
VectorPtr transformedValues; | ||
|
||
// Loop over lambda functions and apply these to keys of the map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: apply these to values of a map ?
namespace { | ||
|
||
// See documentation at https://prestodb.io/docs/current/functions/map.html | ||
class TransformValuesFunction : public exec::VectorFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It seems values/key transforms can be templatized to one function, however this makes easier reading personally to me at expense of code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that and decided not to use templates for readability.
TEST_F(TransformKeysTest, conditional) { | ||
vector_size_t size = 1'000; | ||
|
||
// Make 2 columns: the map to transform and a boolean that decided which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also have a test with 2 lambdas with different types returned and confirm it fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also have a test with 2 lambdas with different types returned and confirm it fails.
I don't think this testing should apply to each lambda function. Instead, I suggest to add tests to make sure FunctionVector cannot be created to store lambda of different types. I'll do that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2255 adds sanity checks to switch expression
…ubator#2245) Summary: Pull Request resolved: facebookincubator#2245 Differential Revision: D38565343 Pulled By: mbasmanova fbshipit-source-id: 5eecb069a1a23d772d15893feb12f2ec88b4b781
This pull request was exported from Phabricator. Differential Revision: D38565343 |
12ba118
to
0e7be68
Compare
@kgpai Krishna, thank you for review. I replied to comments and update the PR to address these. Would you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you Masha for adding these super useful functions!
auto size = sizes[row]; | ||
for (auto i = 1; i < size; i++) { | ||
if (mapKeys->equalValueAt(mapKeys.get(), offset + i, offset + i - 1)) { | ||
VELOX_USER_FAIL("{}", kDuplicateKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe also print out the value of the duplicate key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. Let me add this in a follow-up PR as it should be added to the map() function as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori #2260 adds duplicate key to the message.
assertEqualVectors(expectedResult, result); | ||
} | ||
|
||
TEST_F(TransformKeysTest, dictionaryWithDuplicates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what does this case test? (e.g., what's the difference between this and dictionaryWithUniqueValues for transform_keys?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking this function doesn't distinguish between different dictionaries. I was adding these tests for completeness. Ideally, we would enhance Fuzzer to support lambda functions to make sure we get coverage of inputs with all sorts of encodings.
No description provided.