-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ContainerRowSerde::compareWithNulls API #6419
Conversation
✅ Deploy Preview for meta-velox canceled.
|
1d608db
to
21b8a91
Compare
@@ -144,5 +145,91 @@ TEST_F(ContainerRowSerdeTest, nested) { | |||
testRoundTrip(nestedArray); | |||
} | |||
|
|||
TEST_F(ContainerRowSerdeTest, compareNullsInRowVector) { | |||
std::optional<std::optional<int>> x = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int -> int32_t
Is this variables used? If so, let's come up with a more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
std::optional<std::optional<int>> x = | ||
std::make_optional<std::optional<int>>(std::nullopt); | ||
auto data = makeRowVector({ | ||
makeNullableFlatVector<int32_t>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makeFlatVector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
velox/exec/ContainerRowSerde.cpp
Outdated
auto result = compareStringAsc(left, right, index, flags.equalsOnly); | ||
return flags.ascending ? result : result * -1; | ||
} | ||
|
||
std::optional<std::optional<int32_t>> | ||
compareNulls(bool leftNull, bool rightNull, CompareFlags flags) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a copy of BaseVector::compareNulls? If so, maybe change BaseVector::compareNulls to be a static public method and re-use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could reuse BaseVector::compareNulls
. Updated.
velox/exec/ContainerRowSerde.cpp
Outdated
auto result = compareStringAsc(left, right, index, flags.equalsOnly); | ||
return flags.ascending ? result : result * -1; | ||
} | ||
|
||
std::optional<std::optional<int32_t>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double optional is a bit hard to reason about. Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reuse BaseVector::compareNulls
now.
DecodedVector decodedVector; | ||
decodedVector.decode(*rowVector, rows); | ||
|
||
static const CompareFlags kCompareFlags{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing that we serializes a value of type ROW(INTEGER, VARCHAR), but compare it to a value of type ROW(INTEGER).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
ByteStream out; | ||
HashStringAllocator::prepareRead(position.header, out); | ||
|
||
auto arrayVector = makeNullableArrayVector<int64_t>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you also add tests for maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laithsakka Laith, would you help review this PR?
e12a786
to
4564074
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @duanmeng, Thank you for adding this support to fix min() and max() functions! I left two questions.
velox/exec/ContainerRowSerde.h
Outdated
@@ -44,6 +44,15 @@ class ContainerRowSerde { | |||
const Type* type, | |||
CompareFlags flags); | |||
|
|||
/// Return std::nullopt if encountered in NullHandlingMode::StopAtNull mode. | |||
/// Support NullHandlingMode::NoStop mode as well and return | |||
/// std::optional<int32_t>(0) if null encountered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed this, but where do we return std::nullopt in the leaf cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your question about ** leaf cases**.
It will return std::nullopt
, say when an element of an ArrayVector is null during in StopAtAtNull mode. BaseVector::compareNulls
returns std::nullopt
in StopAtAtNull mode when a null is encountered.
std::optional<int32_t> compareArrays(
...
if (leftNull || rightNull) {
auto result = BaseVector::compareNulls(leftNull, rightNull, flags);
if (result && *result == 0) {
continue;
}
return result;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Return std::nullopt if encountered in NullHandlingMode::StopAtNull mode.
/// Support NullHandlingMode::NoStop mode as well and return
/// std::optional<int32_t>(0) if null encountered.
This is not clear and may be wrong. I will update it as follows,
"Return std::nullopt if encountered in NullHandlingMode::StopAtNull mode. And return std::optional<int32_t>(0) if both sides are nulls in NullHandlingMode::NoStop mode."
velox/exec/ContainerRowSerde.cpp
Outdated
auto map = right.wrappedVector()->asUnchecked<MapVector>(); | ||
VELOX_CHECK_EQ(map->encoding(), VectorEncoding::Simple::MAP); | ||
auto wrappedIndex = right.wrappedIndex(index); | ||
auto size = map->sizeAt(wrappedIndex); | ||
std::vector<vector_size_t> indices(size); | ||
auto rightIndices = map->sortedKeyIndices(wrappedIndex); | ||
auto result = compareArrayIndices(left, *map->mapKeys(), rightIndices, flags); | ||
if (result) { | ||
if (result.has_value() && result.value() != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if (result && *result != 0)
for consistency with other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your remind. I've fixed it using the following code, and improve the ut of MapType.
if (result && *result == 0) {
return compareArrayIndices(left, *map->mapValues(), rightIndices, flags);
}
return result;
@mbasmanova Hi Masha, I've fixed all the comments, refactored the UT codes, and improved the coverage according to your guidance. Could you help to take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duanmeng Thank you for iterating on this PR. This is shaping up nicely. Some comments below.
velox/exec/ContainerRowSerde.cpp
Outdated
|
||
if (leftNull || rightNull) { | ||
auto result = BaseVector::compareNulls(leftNull, rightNull, flags); | ||
if (result && *result == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit hard to read. Consider being more explicit:
if (result.has_value() && result.value() == 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
velox/exec/ContainerRowSerde.cpp
Outdated
auto result = compareSwitch(left, *child, wrappedIndex, flags); | ||
if (result) { | ||
return result; | ||
if (result && *result == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
velox/exec/ContainerRowSerde.h
Outdated
@@ -44,6 +44,15 @@ class ContainerRowSerde { | |||
const Type* type, | |||
CompareFlags flags); | |||
|
|||
/// Return std::nullopt if encountered in NullHandlingMode::StopAtNull mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this comment a bit cryptic. Maybe, use the comment from BaseVector::compare as a base.
/// Returns < 0 if 'left' is less than 'right' at 'index', 0 if
/// equal and > 0 otherwise. If flags.nullHandlingMode is StopAtNull,
/// returns std::nullopt if either 'left' or 'right' value is null or contains a null.
Also, let's document existing 'compare' methods to clarify that these support only NoStop for flags.nullHandlingMode, consider NULL equal to NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -35,6 +35,39 @@ class ContainerRowSerdeTest : public testing::Test, | |||
return position; | |||
} | |||
|
|||
std::vector<std::shared_ptr<ByteStream>> serialize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can be simplified to just serialize the data and return a list of positions. No need to create and return ByteStreams. These can be created on the fly from the positions in testCompareWithNulls. I think this would also remove the need for rewindByteStreams method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion, It is much cleaner now.
void testCompareWithNulls( | ||
const DecodedVector& decodedVector, | ||
const std::vector<HashStringAllocator::Position>& positions, | ||
const std::vector<std::shared_ptr<ByteStream>>& byteStreams, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per comment above, let's remove byteStreams parameter and create ByteStreeams from positions on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
expected.at(i), | ||
ContainerRowSerde::compareWithNulls( | ||
*byteStreams.at(i), decodedVector, i, compareFlags) | ||
.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that compareWithNulls cannot return std::nullopt? Is this a valid assumption? Don't we want to allow testing cases where compareWithNulls returns std::nullopt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
|
||
auto res = ContainerRowSerde::compareWithNulls( | ||
*byteStreams.at(3), decodedVector, 3, compareFlags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coding index '3' here makes this method hard to use. Let's change expected to std::vector<std::optional<int32_t>> and remove expectedOptionVal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
byteStreams, | ||
{0, 1, -1}, | ||
std::nullopt, | ||
compareFlags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only test different values of equalsOnly and nullHandlingMode, the test may be more readable if we change testCompareWithNulls to take these 2 values instead of CompareFlags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duanmeng Look great % a few remaining nits.
velox/exec/ContainerRowSerde.h
Outdated
/// Returns < 0 if 'left' is less than 'right' at 'index', 0 if | ||
/// equal and > 0 otherwise. If flags.nullHandlingMode is StopAtNull, | ||
/// returns std::nullopt if either 'left' or 'right' value is null or contains | ||
/// a null. If flags.nullHandlingMode is NoStop, returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If flags.nullHandlingMode is NoStop, returns
std::optional<int32_t>(0) if both sides are nulls
Let's remove this sentence. It is missing in comments for other versions of this method.
Alternatively, let's say that "If flags.nullHandlingMode is NoStop then NULL is considered equal to NULL."
ContainerRowSerde::serialize(*data, i, out); | ||
allocator_.finishWrite(out, 0); | ||
|
||
auto cur = std::make_shared<ByteStream>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic using 'cur' is not needed. Let's remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
mode}; | ||
|
||
for (auto i = 0; i < expected.size(); ++i) { | ||
ByteStream buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf -> stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
{1, 2, 3, 4}, | ||
}); | ||
|
||
std::vector<HashStringAllocator::Position> positions = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
serializeWithPositions(data); | ||
|
||
auto someNulls = makeNullableFlatVector<int32_t>({ | ||
{1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not need for {} around numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
auto rowVector = makeRowVector({someNulls}); | ||
SelectivityVector rows(rowVector->size()); | ||
DecodedVector decodedVector; | ||
decodedVector.decode(*rowVector, rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove 'rows' parameter and then remove 'rows' variable.
DecodedVector decodedVector(*decodedVector);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
{1, 2, std::nullopt, 4}, | ||
{1, 5}, | ||
}); | ||
DecodedVector decodedVector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can combine these 2 lines into one:
DecodedVector decodedVector(*arrayVector);
Ditto other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova merged this pull request in 34dd00c. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: In Presto, min and max check for null array elements, map values, and struct fields and throw, but in Velox, they don't. For example, the following SQL should throw an exception, "**ARRAY comparison not supported for arrays with null elements**". ```SQL SELECT min(col0) FROM (VALUES (array [1, 2, 3, 4]), (array [null, 1])) AS t(col0) ``` This PR adds a comparison method in `SingleValueAccumulator` using the method added in #6419 to make `min/max` aggregate functions that compare values of complex types able to detect cases when complex type values contain nulls, making the following guarantees, - ARRAY comparison is not supported for arrays with null elements - ROW comparison is not supported for fields with null elements - MAP comparison is not supported for null value elements Part of #6314 Pull Request resolved: #6570 Reviewed By: xiaoxmeng Differential Revision: D49315892 Pulled By: mbasmanova fbshipit-source-id: dffb780ac6252c5e85fb3700db452cf2d7ace8df
Summary: Presto aggregate functions that compare values of complex types need to detect cases when complex type values contain nulls. The new ContainerRowSerde::compareWithNulls API allows to signal the case when complex type value contains a null. Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6419 Reviewed By: xiaoxmeng Differential Revision: D49187047 Pulled By: mbasmanova fbshipit-source-id: 326c6b5e47cd1050ec456b41e7ff56ab3c75c3a6
…okincubator#6570) Summary: In Presto, min and max check for null array elements, map values, and struct fields and throw, but in Velox, they don't. For example, the following SQL should throw an exception, "**ARRAY comparison not supported for arrays with null elements**". ```SQL SELECT min(col0) FROM (VALUES (array [1, 2, 3, 4]), (array [null, 1])) AS t(col0) ``` This PR adds a comparison method in `SingleValueAccumulator` using the method added in facebookincubator#6419 to make `min/max` aggregate functions that compare values of complex types able to detect cases when complex type values contain nulls, making the following guarantees, - ARRAY comparison is not supported for arrays with null elements - ROW comparison is not supported for fields with null elements - MAP comparison is not supported for null value elements Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6570 Reviewed By: xiaoxmeng Differential Revision: D49315892 Pulled By: mbasmanova fbshipit-source-id: dffb780ac6252c5e85fb3700db452cf2d7ace8df
Summary: Presto aggregate functions that compare values of complex types need to detect cases when complex type values contain nulls. The new ContainerRowSerde::compareWithNulls API allows to signal the case when complex type value contains a null. Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6419 Reviewed By: xiaoxmeng Differential Revision: D49187047 Pulled By: mbasmanova fbshipit-source-id: 326c6b5e47cd1050ec456b41e7ff56ab3c75c3a6
…okincubator#6570) Summary: In Presto, min and max check for null array elements, map values, and struct fields and throw, but in Velox, they don't. For example, the following SQL should throw an exception, "**ARRAY comparison not supported for arrays with null elements**". ```SQL SELECT min(col0) FROM (VALUES (array [1, 2, 3, 4]), (array [null, 1])) AS t(col0) ``` This PR adds a comparison method in `SingleValueAccumulator` using the method added in facebookincubator#6419 to make `min/max` aggregate functions that compare values of complex types able to detect cases when complex type values contain nulls, making the following guarantees, - ARRAY comparison is not supported for arrays with null elements - ROW comparison is not supported for fields with null elements - MAP comparison is not supported for null value elements Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6570 Reviewed By: xiaoxmeng Differential Revision: D49315892 Pulled By: mbasmanova fbshipit-source-id: dffb780ac6252c5e85fb3700db452cf2d7ace8df
Summary: Presto aggregate functions that compare values of complex types need to detect cases when complex type values contain nulls. The new ContainerRowSerde::compareWithNulls API allows to signal the case when complex type value contains a null. Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6419 Reviewed By: xiaoxmeng Differential Revision: D49187047 Pulled By: mbasmanova fbshipit-source-id: 326c6b5e47cd1050ec456b41e7ff56ab3c75c3a6
…okincubator#6570) Summary: In Presto, min and max check for null array elements, map values, and struct fields and throw, but in Velox, they don't. For example, the following SQL should throw an exception, "**ARRAY comparison not supported for arrays with null elements**". ```SQL SELECT min(col0) FROM (VALUES (array [1, 2, 3, 4]), (array [null, 1])) AS t(col0) ``` This PR adds a comparison method in `SingleValueAccumulator` using the method added in facebookincubator#6419 to make `min/max` aggregate functions that compare values of complex types able to detect cases when complex type values contain nulls, making the following guarantees, - ARRAY comparison is not supported for arrays with null elements - ROW comparison is not supported for fields with null elements - MAP comparison is not supported for null value elements Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6570 Reviewed By: xiaoxmeng Differential Revision: D49315892 Pulled By: mbasmanova fbshipit-source-id: dffb780ac6252c5e85fb3700db452cf2d7ace8df
Summary: Presto aggregate functions that compare values of complex types need to detect cases when complex type values contain nulls. The new ContainerRowSerde::compareWithNulls API allows to signal the case when complex type value contains a null. Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6419 Reviewed By: xiaoxmeng Differential Revision: D49187047 Pulled By: mbasmanova fbshipit-source-id: 326c6b5e47cd1050ec456b41e7ff56ab3c75c3a6
…okincubator#6570) Summary: In Presto, min and max check for null array elements, map values, and struct fields and throw, but in Velox, they don't. For example, the following SQL should throw an exception, "**ARRAY comparison not supported for arrays with null elements**". ```SQL SELECT min(col0) FROM (VALUES (array [1, 2, 3, 4]), (array [null, 1])) AS t(col0) ``` This PR adds a comparison method in `SingleValueAccumulator` using the method added in facebookincubator#6419 to make `min/max` aggregate functions that compare values of complex types able to detect cases when complex type values contain nulls, making the following guarantees, - ARRAY comparison is not supported for arrays with null elements - ROW comparison is not supported for fields with null elements - MAP comparison is not supported for null value elements Part of facebookincubator#6314 Pull Request resolved: facebookincubator#6570 Reviewed By: xiaoxmeng Differential Revision: D49315892 Pulled By: mbasmanova fbshipit-source-id: dffb780ac6252c5e85fb3700db452cf2d7ace8df
…7138) Summary: Presto aggregate functions that compare values of complex types must detect cases when complex values contain nulls. The new ContainerRowSerde::compareWithNulls API allows signaling the case when a complex type value contains a null when comparing ByteStream with ByteStream. Similar to #6419. Pull Request resolved: #7138 Reviewed By: Yuhta Differential Revision: D50593886 Pulled By: mbasmanova fbshipit-source-id: b0946ab7b1539b1656c831680be0df314f9243af
Presto aggregate functions that compare values of complex types need to detect
cases when complex type values contain nulls. The new
ContainerRowSerde::compareWithNulls API allows to signal the case when complex
type value contains a null.
Part of #6314