[WIP][Draft] Implement mergewith for bytesRange and bytesValue #233

Closed
atanu1991 wants to merge 1 commit

atanu1991
Contributor

No description provided.

@facebook-github-bot added the CLA Signed label Sep 16, 2021
@atanu1991 force-pushed the filterbytes branch 5 times, most recently from 99fa504 to 22aa839, September 21, 2021 06:43
@mbasmanova (Contributor) left a comment

@atanu1991 Overall looks good. Use make format-fix to fix formatting issues.

Consider editing PR title for clarity: Implement Filter::mergeWith for string filters

return filters_;
}

const bool nanAllowed() const {

nit: the return type can be just "bool"; is there any particular advantage of returning "const bool"?
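
The point about the return type can be checked at compile time. The following is a minimal sketch, assuming a hypothetical stand-in class `NanFlagSketch` (not the class from this PR): for a scalar returned by value, the top-level `const` is dropped from the type of the call expression, so callers see plain `bool` either way.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the filter class under review; only the
// accessor's return type matters here.
class NanFlagSketch {
 public:
  explicit NanFlagSketch(bool nanAllowed) : nanAllowed_(nanAllowed) {}

  // As written in the PR: top-level const on a by-value scalar return.
  const bool nanAllowedConst() const {
    return nanAllowed_;
  }

  // As suggested in the review.
  bool nanAllowed() const {
    return nanAllowed_;
  }

 private:
  bool nanAllowed_;
};

// A call returning a scalar by value is a prvalue of non-class type, so the
// expression's type is plain bool for both accessors.
static_assert(
    std::is_same<
        decltype(std::declval<const NanFlagSketch&>().nanAllowedConst()),
        bool>::value,
    "const on a by-value bool return does not reach the caller");
static_assert(
    std::is_same<
        decltype(std::declval<const NanFlagSketch&>().nanAllowed()),
        bool>::value,
    "plain bool return");
```

Some compilers can also flag the `const` form with a warning about ignored qualifiers, which is one more reason to prefer plain `bool`.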

case FilterKind::kIsNotNull:
return std::make_unique<BytesValues>(*this, false);
case FilterKind::kBytesValues: {
auto otherBytesValues = dynamic_cast<const BytesValues*>(other);

use static_cast when you know the type; it is faster than dynamic_cast
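
To illustrate the suggestion, here is a minimal sketch with simplified, hypothetical stand-ins for `Filter` and `BytesValues` (the real classes have more members). Once `kind()` has identified the dynamic type inside the `switch`, the downcast needs no run-time type check, so `static_cast` is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the filter hierarchy.
enum class FilterKind { kBytesValues, kBytesRange };

class Filter {
 public:
  explicit Filter(FilterKind kind) : kind_(kind) {}
  virtual ~Filter() = default;

  FilterKind kind() const {
    return kind_;
  }

 private:
  const FilterKind kind_;
};

class BytesValues : public Filter {
 public:
  explicit BytesValues(std::vector<std::string> values)
      : Filter(FilterKind::kBytesValues), values_(std::move(values)) {}

  const std::vector<std::string>& values() const {
    return values_;
  }

 private:
  std::vector<std::string> values_;
};

// Returns the number of values if 'other' is a BytesValues filter, else 0.
std::size_t valueCount(const Filter* other) {
  assert(other != nullptr);
  switch (other->kind()) {
    case FilterKind::kBytesValues: {
      // kind() already tells us the dynamic type, so no RTTI lookup is
      // needed for the downcast.
      auto* otherValues = static_cast<const BytesValues*>(other);
      return otherValues->values().size();
    }
    default:
      return 0;
  }
}
```

The cast is only safe because every subclass reports its own kind; the sketch relies on that invariant.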

std::vector<std::string> newValues;
newValues.reserve(smallerFilter->values().size());

for (const auto& value: smallerFilter->values()) {

consider comparing the [lower, upper] ranges for the two filters before checking individual values; if there is no overlap, the loop over values can be skipped
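
A sketch of that pre-check, assuming each filter's values are available as a sorted vector so that the first and last elements serve as the [lower, upper] bounds. The helper name `intersectValues` and the vector representation are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: intersects two sorted lists of values.
std::vector<std::string> intersectValues(
    const std::vector<std::string>& a,
    const std::vector<std::string>& b) {
  std::vector<std::string> result;
  if (a.empty() || b.empty()) {
    return result;
  }
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));

  // With sorted input, front() is the lower bound and back() is the upper
  // bound. If the two ranges do not overlap, no value can match, and the
  // loop over individual values is skipped.
  if (a.back() < b.front() || b.back() < a.front()) {
    return result;
  }

  // Iterate over the smaller list and probe the larger one.
  const auto& smaller = a.size() <= b.size() ? a : b;
  const auto& larger = a.size() <= b.size() ? b : a;
  result.reserve(smaller.size());
  for (const auto& value : smaller) {
    if (std::binary_search(larger.begin(), larger.end(), value)) {
      result.push_back(value);
    }
  }
  return result;
}
```

The bounds comparison costs two string comparisons, which is cheap next to probing every value of the smaller filter.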

return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed);
}
case FilterKind::kBytesRange: {
auto otherBytesRange = dynamic_cast<const BytesRange*>(other);

Use static_cast here as well.

return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed);
}
case FilterKind::kMultiRange: {
return mergeByteMultiRange(this, dynamic_cast<const MultiRange*>(other));

Use static_cast here as well.

assert(filter->kind() == FilterKind::kBytesRange ||
filter->kind() == FilterKind::kBytesValues);
auto merged = current->mergeWith(filter.get());
newValues.emplace_back(merged.release());

We need to consider the type of "merged". If it is AllFalse, we can simply skip it. If we have no filters at the end of the loop, we can return AllFalse. If we have just one filter, we can return it as is, without wrapping it in a MultiRange. If all filters are kBytesValues, we want to make a new kBytesValues filter with the combined set of values. If we have a mix of kBytesValues and kBytesRange, we'd want to combine all kBytesValues into one, then make a MultiRange from that single kBytesValues and the multiple kBytesRange filters. This ensures that the merge produces the filter that is most efficient to evaluate. MultiRange is the least efficient because it needs to loop over the individual filters and evaluate each one.
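
The rules in this comment could be sketched roughly as follows. Everything here is hypothetical and heavily simplified: `SketchFilter`, `combineMerged`, and the `make*` helpers are illustrative names, not the PR's code, and null handling and bound exclusivity are left out. The sketch also assumes the value sets being combined are disjoint, so it does not deduplicate.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { kAllFalse, kBytesValues, kBytesRange, kMultiRange };

// Hypothetical, simplified filter; only the fields for its kind are used.
struct SketchFilter {
  Kind kind = Kind::kAllFalse;
  std::vector<std::string> values; // kBytesValues
  std::string lower, upper; // kBytesRange
  std::vector<std::unique_ptr<SketchFilter>> children; // kMultiRange
};
using FilterPtr = std::unique_ptr<SketchFilter>;

FilterPtr makeAllFalse() {
  return std::make_unique<SketchFilter>();
}

FilterPtr makeValues(std::vector<std::string> values) {
  auto filter = std::make_unique<SketchFilter>();
  filter->kind = Kind::kBytesValues;
  filter->values = std::move(values);
  return filter;
}

FilterPtr makeRange(std::string lower, std::string upper) {
  auto filter = std::make_unique<SketchFilter>();
  filter->kind = Kind::kBytesRange;
  filter->lower = std::move(lower);
  filter->upper = std::move(upper);
  return filter;
}

// Combines per-child merge results into the cheapest equivalent filter.
FilterPtr combineMerged(std::vector<FilterPtr> merged) {
  std::vector<std::string> allValues;
  std::vector<FilterPtr> ranges;
  for (auto& filter : merged) {
    switch (filter->kind) {
      case Kind::kAllFalse:
        break; // Matches nothing, so it contributes nothing.
      case Kind::kBytesValues:
        for (auto& value : filter->values) {
          allValues.push_back(std::move(value));
        }
        break;
      default:
        ranges.push_back(std::move(filter));
    }
  }
  if (allValues.empty() && ranges.empty()) {
    return makeAllFalse();
  }
  if (ranges.empty()) {
    return makeValues(std::move(allValues)); // One combined values filter.
  }
  if (allValues.empty() && ranges.size() == 1) {
    return std::move(ranges.front()); // Single filter: no MultiRange wrapper.
  }
  // Mixed case: one combined values filter plus the remaining ranges.
  auto multi = std::make_unique<SketchFilter>();
  multi->kind = Kind::kMultiRange;
  if (!allValues.empty()) {
    multi->children.push_back(makeValues(std::move(allValues)));
  }
  for (auto& range : ranges) {
    multi->children.push_back(std::move(range));
  }
  return multi;
}
```

In this sketch a MultiRange is built only when at least two filters survive, which keeps the wrapper out of the cases where a single values or range filter would do.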

"h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v",
"w", "x", "y", "z",
"abca", "abcb", "abcc", "abcd"};

nit: Would you add some longer strings here (> 12 characters) as well as strings with upper case letters, numbers and other special characters?

filters.push_back(between("p", "t"));
filters.push_back(between("p", "t", true));

filters.push_back(lessThanOrEqual("k"));

Would you also add > and < filters?

std::vector<std::unique_ptr<Filter>> filters;
std::vector<std::unique_ptr<Filter>> filtersMultiRange;

// addUntypedFilters(filters);

Why commented out?

filters.push_back(in({"e", "f", "g", "h"}, true));

filtersMultiRange.push_back(orFilter(
between("b", "f"), greaterThanOrEqual("e")));

MultiRange filter is expected to contain non-overlapping filters only.

@atanu1991
Contributor Author

Created a new PR now that Velox is open-sourced:
https://github.com/facebookincubator/velox/pull/297/commits

@atanu1991 atanu1991 closed this Sep 24, 2021
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Apr 28, 2023
* remove TimeStampWithTimeZone function register
* use PModeFunction instead of PModIntFunction
* remove TimestampWithTimeZone test
* code format