-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unnest array of row to multiple columns #10883
Conversation
425e667
to
df7d46a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private int position; | ||
private int positionCount; | ||
|
||
public ArrayRowUnnester(Type elementType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just make constructor to take List<Type> fieldTypes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that. But setBlock expects a RowBlock, so it would be really strange for the constructor to take an arbitrary list of types. There are pros and cons for both. I think keeping the constructor consistent with other unnesters and make it obvious that this should be only used for RowType is better.
implements Unnester | ||
{ | ||
private final List<Type> fieldTypes; | ||
private ColumnarRow block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just call it columnarRow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, both makes sense. I don't really care.
@@ -112,7 +113,12 @@ public UnnestOperator(OperatorContext operatorContext, List<Integer> replicateCh | |||
this.unnesters = new ArrayList<>(unnestTypes.size()); | |||
for (Type type : unnestTypes) { | |||
if (type instanceof ArrayType) { | |||
unnesters.add(new ArrayUnnester(((ArrayType) type).getElementType())); | |||
if (((ArrayType) type).getElementType() instanceof RowType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might consider put a local variable elementType = ((ArrayType) type).getElementType()
and being reused ?
@@ -575,6 +575,11 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery("SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", "SELECT * FROM VALUES (1, 2, 'a', true), (3, 4, 'b', false)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add comment explaining these three tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other test cases have comments. These test cases should be at least as self-explanatory for what they are testing as the others in the test. Do we have any criteria on when to add comment vs not?
@@ -650,7 +661,17 @@ protected RelationPlan visitUnnest(Unnest node, Void context) | |||
Symbol inputSymbol = symbolAllocator.newSymbol(rewritten, type); | |||
argumentSymbols.add(inputSymbol); | |||
if (type instanceof ArrayType) { | |||
unnestSymbols.put(inputSymbol, ImmutableList.of(unnestedSymbolsIterator.next())); | |||
Type elementType = ((ArrayType) type).getElementType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like how unnest is done through out the code base. the
if (type instanceof ArrayType)
...
else if (type instanceof MapType)
...
is everywhere. If I were to do a refactoring, I'd prefer something doesn't require that. But I didn't see an easy way to do so. Would be more than happy to take suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract the outer if as a static method.
I'm not sure what part of the following snippet you don't like
if (type instanceof ArrayType)
...
else if (type instanceof MapType)
...
You could have done switch(type.getTypeSignature().getBase()
. But that makes little difference.
@@ -0,0 +1,63 @@ | |||
package com.facebook.presto.operator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copyright
import static com.google.common.base.Preconditions.checkState; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class ArrayRowUnnester |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayOfRowsUnnester?
@@ -575,6 +575,11 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery("SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", "SELECT * FROM VALUES (1, 2, 'a', true), (3, 4, 'b', false)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please format this as:
assertQuery(
actualQuery,
expectedQuery);
```?
@@ -575,6 +575,11 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery("SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", "SELECT * FROM VALUES (1, 2, 'a', true), (3, 4, 'b', false)"); | |||
assertQuery("SELECT x, y FROM (VALUES (ARRAY[ROW(1, 2), ROW(3, 4)])) AS t(a) CROSS JOIN UNNEST(a) t(x, y)", "SELECT * FROM VALUES (1, 2), (3, 4)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the output is not:
SELECT * FROM VALUES (1, 2, 3, 4)
There is only one array so I was expecting only one output row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one array of two rows. Each array element will become one row in the final result right? So the first element ROW(1, 2) becomes (1, 2), and second element ROW(3, 4) becomes (3, 4).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be produced for:
SELECT x, y FROM (VALUES (ARRAY[ROW(1, 2), ROW('varchar', false])) AS t(a) CROSS JOIN UNNEST(a) t(x, y)
?
Also why the test
assertQuery("SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", "SELECT * FROM VALUES (1, 2, 'a', true), (3, 4, 'b', false)");
does not produce 4 rows (one per each row in arrays)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the semantic correctly, when you unnest an array, each element in the array will become 1 row. If you unnest two arrays, first element of both arrays will become 1 row, second element of both arrays will become the second row, etc. Things would be a little more obvious with "WITH ORDINALITY". Previously, we'd always unnest an array into one column, so if it's array, we'd unnest that to one column of type ROW. This PR would change it so it unnests the array to multiple columns defined by the fields in the row.
In your example, ARRAY[ROW(1, 2), ROW('varchar', false])
is not a valid array.
@@ -575,6 +575,11 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery("SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", "SELECT * FROM VALUES (1, 2, 'a', true), (3, 4, 'b', false)"); | |||
assertQuery("SELECT x, y FROM (VALUES (ARRAY[ROW(1, 2), ROW(3, 4)])) AS t(a) CROSS JOIN UNNEST(a) t(x, y)", "SELECT * FROM VALUES (1, 2), (3, 4)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also below queries:
SELECT x, y FROM (VALUES (ARRAY[ROW(1, 2), ROW('varchar', false)])) AS t(a) CROSS JOIN UNNEST(a) t(x, y)
SELECT x, y FROM (VALUES (ARRAY[ROW(1)], ARRAY[ROW(2, 3)]) AS t(a) CROSS JOIN UNNEST(a) t(x, y)
SELECT x, y FROM (VALUES (ARRAY[ROW(1)], ARRAY[ROW('varchar')]) AS t(a) CROSS JOIN UNNEST(a) t(x, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These queries not valid, so they won't test unnest array of rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but they should fail in some reasonable way. IMO this should be tested, but maybe not here.
926d499
to
0f5db0f
Compare
@@ -575,6 +575,15 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery( | |||
"SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case of
SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)])
what are the column names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They shouldn’t have one, just like every other place in the engine that produces anonymous rows or columns.
0f5db0f
to
42ad041
Compare
outputFields.add(Field.newUnqualified(Optional.empty(), ((ArrayType) expressionType).getElementType())); | ||
Type elementType = ((ArrayType) expressionType).getElementType(); | ||
if (elementType instanceof RowType) { | ||
outputFields.addAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracting a variable will make this more readable:
ImmutableList<Field> fields = elementType.getTypeParameters().stream()
.map(type -> Field.newUnqualified(Optional.empty(), type))
.collect(toImmutableList());
outputFields.addAll(fields);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. But I thought we prefer inlining variables if they are only used once? Happy to go either way. But can we have a consistent standard? (I can take "when it affects readability, use a variable, otherwise inline" as a standard, though that would leave some room for debates about what's "readable". So it's probably better to provide a clearer guideline.
42ad041
to
0e1314b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comments below.
In addition, please add tests to TestUnnestOperator
.
@@ -575,6 +575,15 @@ public void testUnnest() | |||
assertQuery("SELECT b FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5]) t(a, b)", "SELECT * FROM VALUES 4, 5, NULL"); | |||
assertQuery("SELECT count(*) FROM UNNEST(ARRAY[1, 2, 3], ARRAY[4, 5])", "SELECT 3"); | |||
assertQuery("SELECT a FROM UNNEST(ARRAY['kittens', 'puppies']) t(a)", "SELECT * FROM VALUES ('kittens'), ('puppies')"); | |||
assertQuery( | |||
"SELECT * FROM UNNEST(ARRAY[ROW(1, 2), ROW(3, 4)], ARRAY[ROW('a', true), ROW('b', false)])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please cover double and structural type in this test case
@@ -650,7 +661,17 @@ protected RelationPlan visitUnnest(Unnest node, Void context) | |||
Symbol inputSymbol = symbolAllocator.newSymbol(rewritten, type); | |||
argumentSymbols.add(inputSymbol); | |||
if (type instanceof ArrayType) { | |||
unnestSymbols.put(inputSymbol, ImmutableList.of(unnestedSymbolsIterator.next())); | |||
Type elementType = ((ArrayType) type).getElementType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract the outer if as a static method.
I'm not sure what part of the following snippet you don't like
if (type instanceof ArrayType)
...
else if (type instanceof MapType)
...
You could have done switch(type.getTypeSignature().getBase()
. But that makes little difference.
outputFields.addAll( | ||
elementType.getTypeParameters().stream() | ||
.map(type -> Field.newUnqualified(Optional.empty(), type)) | ||
.collect(toImmutableList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer using a for-loop here. If you prefer stream
, you could do .forEach(outputFields::add)
. This way, you wouldn't need outputFields.addAll
any more. You can change your code either way.
{ | ||
this.columnarRow = ColumnarRow.toColumnarRow(block); | ||
this.position = 0; | ||
this.positionCount = block == null ? 0 : block.getPositionCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPE at line 72. ColumnarRow.toColumnarRow
throws for null argument
0e1314b
to
851765c
Compare
851765c
to
b93a352
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Please look at test failures, too.
true), | ||
booleanSessionProperty( | ||
LEGACY_UNNEST, | ||
"Using legacy unnest semantic, where unnest(array<row>) will create one column of type row", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array(row)
is the canonical form to refer to a parametric type.
@@ -115,6 +115,7 @@ | |||
private DataSize filterAndProjectMinOutputPageSize = new DataSize(500, KILOBYTE); | |||
private int filterAndProjectMinOutputPageRowCount = 256; | |||
private int maxGroupingSets = 2048; | |||
private boolean legacyUnnestArrayRows = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this false
by default. The old (broken) behavior should be available but not how it works out of the box.
@Test(singleThreaded = true) | ||
public class TestLegacyUnnestArrayRows | ||
{ | ||
private H2QueryRunner h2QueryRunner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for H2 here. Take a look at how com.facebook.presto.sql.query.TestXXX work.
} | ||
|
||
@Test | ||
public void testUnnestArrayRowsLegacyBehavorDisabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this test to a TestUnnest class in com.facebook.presto.sql.query, similar to TestGrouping, TestJoinUsing, etc. and leave TestLegacyUnnest in a separate class (like TestLegacyJoinUsing). It's easier to follow the code, keep concerns separated and when it's time to remove the feature, it's just a matter of deleting the class.
440382e
to
97179e0
Compare
@martint I addressed the comments. Mind take another look? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, but otherwise looks good.
@Test | ||
public void testLegacyUnnestArrayRows() | ||
{ | ||
QueryAssertions assertions = new QueryAssertions(testSessionBuilder().build(), new FeaturesConfig().setLegacyUnnestArrayRows(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be closed:
try (QueryAssertions queryAssertions = new QueryAssertions(...)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you could just do this instead of adding FeaturesConfig to the QueryAssertions:
testSessionBuilder()
.setSystemProperty(LEGACY_UNNEST, "true")
.build()
Although, admittedly, that doesn't test the whole config->session->engine path.
@Test | ||
public void testUnnestArrayRowsLegacyBehavorDisabled() | ||
{ | ||
QueryAssertions assertions = new QueryAssertions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
public class TestUnnestArrayRows | ||
{ | ||
@Test | ||
public void testUnnestArrayRowsLegacyBehavorDisabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just call this testUnnestArrayOfRows
|
||
import org.testng.annotations.Test; | ||
|
||
public class TestUnnestArrayRows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this TestUnnest (this is where all tests for unnest should eventually go)
97179e0
to
f3f7aa3
Compare
This partially resolves #8151 (for array type).