-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement mergewith for bytesRange and bytesValue #297
Conversation
bda4971
to
3621f6d
Compare
Per this PR, seems to me that I will need to address |
cc @mbasmanova to confirm? |
898655f
to
ce968d5
Compare
569d41c
to
2a55e6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atanu1991 Overall looks good. Some comments below.
Consider adding BytesRange::toString() method to help debugging test failures:
std::string toString() const final {
return fmt::format(
"BytesRange: {}{}, {}{} {}",
(lowerUnbounded_ || lowerExclusive_) ? "(" : "[",
lowerUnbounded_ ? "..." : lower_,
upperUnbounded_ ? "..." : upper_,
(upperUnbounded_ || upperExclusive_) ? ")" : "]",
nullAllowed_ ? "with nulls" : "no nulls");
}
velox/type/Filter.cpp
Outdated
std::string upper = "", lower = ""; | ||
|
||
// TODO: | ||
// Handle the case of same filter being upperUnbounded and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a TODO? What makes it difficult to support this case?
velox/type/Filter.cpp
Outdated
} | ||
} | ||
|
||
std::unique_ptr<Filter> MultiRange::mergeWith(const Filter* other) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a move? Are there any changes? It would be easier to review if this method stayed in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function was in line 475. It has not only just been moved but also functionality was added. Line 1060 to 1116 was added.
I can move the whole thing back to 475 line, but I feel its right to have the Multirange::mergewith at the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to review the changes if the function was moved back. Otherwise, I can't tell what has changed.
There are some typos in the PR title and description. Title:
Implement BytesRange::mergeWith and BytesValues::mergeWith Description:
Any particular reason not to include this functionality in this PR? Would this require significant refactoring of the current implementation? |
Thanks a lot for the detailed review. @mbasmanova The case where the same range is both lower and upper unbounded might result in a MultiRange when merged with another range. This makes it a bit different from the existing cases now, and needs to be handled in a special way. Hence I thought handling it in a new PR would be better. Let me know your thoughts. |
Can you give an example? How are you planning to handle this case? |
Is this possible? That's just kAlwaysTrue. We can add a check to the constructor to make sure if such range is never created. |
I see that comments in Filter.h are not explaining that MultiRange filter is expected to be a combination of non-overlapping filters and in general each filter must be restrictive, e.g. cannot pass all non-null values unless it is kAlwaysTrue or kIsNotNull. We need to clarify that. |
Example a range like (<=c, >=q) ("q", true, false "c", true, false) merged with [a-z] I havent quite thought of exactly how to handle this right now. |
If we can have a such a check and this case is not required then there would be no follow ups |
That would be a MultiRange, right? Not kBytesRange. The TODO is currently in the code that merges two BytesRange's. Is it in the right place? |
I see. So basically is it impossible to create a BytesRange object with ("q", true, false "c", true, false) And yes if this is a MultiRange then there is no TODO remaining for this PR |
BytesRange can only represent a single range. It cannot represent <= a OR >= b. Let's add a check to the constructor in a separate PR. |
velox/type/Filter.cpp
Outdated
bool bothNanAllowed = nanAllowed_ && multiRangeOther->nanAllowed_; | ||
bool bothNanAllowed = nanAllowed_; | ||
|
||
std::vector<const Filter*> otherFilters; | ||
std::vector<std::unique_ptr<Filter>> merged; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's move these variables closer to where they are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you for the contribution.
@atanu1991 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
) Summary: Implement BytesRange::mergeWith and BytesValues::mergeWith This commit is a follow up of facebookincubator#119. Pull Request resolved: facebookincubator#297 Reviewed By: mbasmanova Differential Revision: D31444325 Pulled By: atanu1991 fbshipit-source-id: 3e8d8bb2645455fcaf24ebd18d6c122613221958
This pull request was exported from Phabricator. Differential Revision: D31444325 |
…adapt upstream Arrow (facebookincubator#297) * separate arrow version for gazelle and velox backend * separate velox and gazelle module to support different arrow version
Implement BytesRange::mergeWith and BytesValues::mergeWith
This commit is a follow up of #119.