-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][Draft] Implement mergewith for bytesRange and bytesValue #233
Conversation
99fa504
to
22aa839
Compare
22aa839
to
a7db1ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atanu1991 Overall looks good. Use make format-fix to fix formatting issues.
Consider editing PR title for clarity: Implement Filter::mergeWith for string filters
return filters_; | ||
} | ||
|
||
const bool nanAllowed() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the return type can be just "bool"; is there any particular advantage of returning "const bool"?
case FilterKind::kIsNotNull: | ||
return std::make_unique<BytesValues>(*this, false); | ||
case FilterKind::kBytesValues: { | ||
auto otherBytesValues = dynamic_cast<const BytesValues*>(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use static_cast when you know the type; it is faster than dynamic_cast
std::vector<std::string> newValues; | ||
newValues.reserve(smallerFilter->values().size()); | ||
|
||
for (const auto& value: smallerFilter->values()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider comparing the [lower, upper] ranges for the two filters before checking individual values; if there is no overlap, the loop over values can be skipped
return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed); | ||
} | ||
case FilterKind::kBytesRange: { | ||
auto otherBytesRange = dynamic_cast<const BytesRange*>(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_cast
return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed); | ||
} | ||
case FilterKind::kMultiRange: { | ||
return mergeByteMultiRange(this, dynamic_cast<const MultiRange*>(other)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_cast
assert(filter->kind() == FilterKind::kBytesRange || | ||
filter->kind() == FilterKind::kBytesValues); | ||
auto merged = current->mergeWith(filter.get()); | ||
newValues.emplace_back(merged.release()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to consider the type of "merged". If it is AllFalse, we can simply skip it. If at the end of the loop we have no filters, we can return AllFalse. If we have just one filter, we can return it as is without wrapping into MultiRange. If we have all filters kBytesValues, then we want to make a new kBytesValues filter with combined set of values. If we have a mix of kBytesValues and kBytesRange, we'd want to combine all kBytesValues into one, then make MultiRange from a single kBytesValues and multiple kBytesRange. This will ensure we'll get the most efficient to evaluate filter as the result of the merge. MultiRange is the least efficient as it needs to loop over individual filters and evaluate each.
"h", "i", "j", "k", "l", "m", "n", | ||
"o", "p", "q", "r", "s", "t", "u", "v", | ||
"w", "x", "y", "z", | ||
"abca", "abcb", "abcc", "abcd"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would you add some longer strings here (> 12 characters) as well as strings with upper case letters, numbers and other special characters?
filters.push_back(between("p", "t")); | ||
filters.push_back(between("p", "t", true)); | ||
|
||
filters.push_back(lessThanOrEqual("k")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you also add > and < filters?
std::vector<std::unique_ptr<Filter>> filters; | ||
std::vector<std::unique_ptr<Filter>> filtersMultiRange; | ||
|
||
// addUntypedFilters(filters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why commented out?
filters.push_back(in({"e", "f", "g", "h"}, true)); | ||
|
||
filtersMultiRange.push_back(orFilter( | ||
between("b", "f"), greaterThanOrEqual("e"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultiRange filter is expected to contain non-overlapping filters only.
Created a new PR as velox is open sourced |
* remove TimeStampWithTimeZone function register * use PModeFunction instead of PModIntFunction * remove TimestampWithTimeZone test * code format
No description provided.