Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Filter::mergeWith for MultiRange #299

Closed
wants to merge 2 commits into from

Conversation

amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Sep 24, 2021

  1. Implement Filter::mergeWith for MultiRange.
  2. Filter clone() now allows overriding nullAllowed.

@facebook-github-bot
Copy link
Contributor

Hi @amaliujia!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@amaliujia
Copy link
Contributor Author

cc @mbasmanova

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 24, 2021
@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice. I made some comments. Adding a test would help flesh our corner cases.

velox/type/Filter.cpp Outdated Show resolved Hide resolved
// null, then this filter should be treated as isNULL.
if (bothNullAllowed && !merged[0]->testNull()) {
auto replaceFilter =
merged[0]->mergeWith(std::make_unique<IsNull>().get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite right. We want OR semantics here, not AND. We may need to modify the clone() method to take an optional new value for the nullAllowed flag. Also, we don't want to wrap a single filter in a MultiRange, rather return it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think I have to return IsNull directly.

return std::unique_ptr<Filter>(
new MultiRange(std::move(merged), bothNullAllowed, nanAllowed_));
} else {
return this->clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite right as we still need to merge with the other filter. See existing implementations of mergeWith.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to add more cases. PLAT

@amaliujia
Copy link
Contributor Author

@mbasmanova

I tried to address your comments and added some unit tests. Can you take a look?

meanwhile, can you share some examples of testing nulless? For multirange nullness tests, it's a bit complicated:

there are difference between top-level and inner-level nullness, there are also left/right multi-range that have not the same nullAllowed.

Until this moment, I am actually not sure the rule of dealing with null in Velox? If one filter allows NULL and another filter does not allow it, does their AND allow or not allow NULL? how about OR?

@amaliujia
Copy link
Contributor Author

cc @atanu1991 @kgpai

@amaliujia amaliujia changed the title [WIP] Implement Filter::mergeWith for MultiRange Implement Filter::mergeWith for MultiRange Sep 28, 2021
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments below.

MultiRange represents an OR of 2 or more filters. The nullAllowed flag in MultiRange is used to determine whether a null value passes the filter or not. The nullAllowed flags of the contained filters are not considered.

@@ -423,6 +423,18 @@ bool MultiRange::testFloat(float value) const {
return false;
}

bool MultiRange::testInt64(int64_t value) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiRange filters are not used for integral types. For these types, we only use BigintMultiRange. Hence, MultiRange::testInt64 should just throw.

auto innerMerged = filter->mergeWith(filterOther.get());
switch (innerMerged->kind()) {
case FilterKind::kAlwaysFalse:
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "continue;"? E.g. (a or b) AND (c or d) = (a and c) or (a and d) or (b and c) or (b and d). If one of these is false, we just drop it but keep the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This break is to break the switch, not the nested loop? Thus this is doing what you are saying If one of these is false, we just drop it but keep the rest..

case FilterKind::kIsNull:
// the merged filter is valid when it matches with top-level
// nullAllowed.
if (innerMerged->testNull() && bothNullAllowed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kIsNotNull is not possible. kIsNull can be treated same as kAlwaysFalse, since, bothNullAllowed alone will determine whether null value passes the combined filter.

if (merged.empty()) {
return nullOrFalse(bothNullAllowed);
} else if (merged.size() == 1) {
// If both sides allow null while the merged inner filter does not allow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not consider nullAllowed flags of the inner filters, only top-level nullAllowed flag should be used. Hence, we need to make a clone of merged[0] with nullAllowed = bothNullAllowed

case FilterKind::kIsNull:
return other->mergeWith(this);
case FilterKind::kIsNotNull: {
std::unique_ptr<Filter> clonedFilter = this->clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do this only if this filter's nullAllowed is false?

Copy link
Contributor Author

@amaliujia amaliujia Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So null AND not null should be null still. Only not null AND not null will be not null.

Sure but what to return when it is null AND not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking it should just return this->clone() when null ADN not null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how I'm thinking about this. There are 2 possibilities for a MultiRange. Either it allows a null value or not.

  • MultiRange(nullAllowed = true) AND IS NULL => IS NULL
  • MultiRange(nullAllowed = true) AND IS NOT NULL => MultiRange(nullAllowed = false)
  • MultiRange(nullAllowed = false) AND IS NULL => ALWAYS FALSE
  • MultiRange(nullAllowed = false) AND IS NOT NULL => MultiRange(nullAllowed = false)

Does this make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellence summary. I even added this summary as comments in code.

return std::make_unique<MultiRange>(
std::move(filters), /*nullAllowed=*/false, this->nanAllowed_);
}
case FilterKind::kBoolValue:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, not necessary to list all these values

velox/type/tests/FilterTest.cpp Outdated Show resolved Hide resolved
velox/type/tests/FilterTest.cpp Outdated Show resolved Hide resolved
velox/type/Filter.cpp Outdated Show resolved Hide resolved
velox/type/Filter.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just a few comments.

velox/type/Filter.cpp Outdated Show resolved Hide resolved
velox/type/Filter.cpp Outdated Show resolved Hide resolved
velox/type/Filter.h Outdated Show resolved Hide resolved
return other->mergeWith(this);
case FilterKind::kIsNotNull: {
if (this->nullAllowed_) {
return this->clone(/*nullAllowed=*/false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova I dont understand why
MultiRange(nullAllowed=true) AND IS NOT NULL => MultiRange(nullAllowed=false)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A value passes F1 AND F2 only if it passes both F1 and F2. When F2 = IS NOT NULL, any value other than null passes, e.g. null does not pass. However, null passes MultiRange(nullAllowed=true). To exclude null we simply change to MultiRange(nullAllowed=false)

@@ -1083,8 +1154,26 @@ class BytesValues final : public Filter {
upper_ = *std::max_element(values_.begin(), values_.end());
}

std::unique_ptr<Filter> clone() const final {
return std::make_unique<BytesValues>(*this);
BytesValues(const BytesValues& other, bool nullAllowed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of repeating code (which will need changes twice if we change the logic), we can do something like re-using the above constructor/assigning the value directly. Its also more efficient imho (I guess)?

a12933b#diff-280c77e7a8b1d7cfaeaaed70f8ac11c67e881b4f1682ee099bc326a1b53e9349

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried this style, for example:
BytesValues(const BytesValues& other, bool nullAllowed) {
// call the existing constructor.
}

I am from Java world and this is how usually Java works.

However the compile complains and say bt this way the default base class is not initialized. Do you know how to fix this issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the compiler complaining. When I did the same as pointed in the diff, it seemed to work.

@@ -1021,8 +1072,28 @@ class BytesRange final : public AbstractRange {
!lowerExclusive_ && !upperExclusive_ && !lowerUnbounded_ &&
!upperUnbounded_ && lower_ == upper_) {}

std::unique_ptr<Filter> clone() const final {
return std::make_unique<BytesRange>(*this);
BytesRange(const BytesRange& other, bool nullAllowed)
Copy link
Contributor

@atanu1991 atanu1991 Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

TEST(FilterTest, cloneTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why all of them are expected to return true for nullallowed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: cloneTest -> clone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those filters are constructed by setting nullAllowed as false. Then I explicitly call clone(true) to reset the nullAllowed to true, and then expect testIsNull returns true.

if (merged.empty()) {
return nullOrFalse(bothNullAllowed);
} else if (merged.size() == 1) {
return merged[0]->clone(bothNullAllowed);
Copy link
Contributor

@atanu1991 atanu1991 Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return std::move(merged.front())

Are we cloning to account for bothNullAllowed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is just ensure that the only filter matches the top level nullAllowed. For MultiRange merge, we will ignore inner filters' nullness.

new MultiRange(std::move(merged), bothNullAllowed, bothNanAllowed));
}
}
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just a couple of comments remaining. Please, update the PR description.

case FilterKind::kIsNull:
return other->mergeWith(this);
case FilterKind::kIsNotNull: {
if (this->nullAllowed_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, shorten to return this->clone(false);

}
}

TEST(FilterTest, cloneTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: cloneTest -> clone

@amaliujia
Copy link
Contributor Author

Have updated PR description.

@amaliujia amaliujia closed this Oct 1, 2021
@amaliujia amaliujia reopened this Oct 1, 2021
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I have one final ask and this will be ready to merge.

case FilterKind::kIsNotNull:
return this->clone(/*nullAllowed=*/false);
default:
VELOX_UNREACHABLE();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the remaining things to add here would be:

kDoubleRange
kFloatRange
kBytesRange
kBytesValues

These can come in a follow-up PR though.

One ask here is to change the order of the 'case' statements to match all other implementations to allow for easier side-by-side comparison, e.g.

case FilterKind::kAlwaysTrue:
case FilterKind::kAlwaysFalse:
case FilterKind::kIsNull:
  return other->mergeWith(this);
case FilterKind::kIsNotNull:
  return this->clone(/*nullAllowed=*/false);
case FilterKind::kDoubleRange:
case FilterKind::kFloatRange:
case FilterKind::kBytesRange:
case FilterKind::kBytesValues:
  // TODO Implement
  VELOX_UNREACHABLE();
case FilterKind::kMultiRange:
  ...
default:
  VELOX_UNREACHABLE();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. Done.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@amaliujia Rui, thank you for the contribution.

rui-mo added a commit to rui-mo/velox that referenced this pull request Mar 17, 2023
rui-mo added a commit to rui-mo/velox that referenced this pull request Jun 8, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Jun 9, 2023
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Jun 27, 2023
zhli1142015 pushed a commit to zhli1142015/velox that referenced this pull request Jul 3, 2023
Yohahaha pushed a commit to Yohahaha/velox that referenced this pull request Jul 4, 2023
chenxu14 pushed a commit to chenxu14/velox that referenced this pull request Jul 5, 2023
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Jul 17, 2023
rui-mo added a commit to rui-mo/velox that referenced this pull request Jul 21, 2023
rui-mo added a commit to rui-mo/velox that referenced this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants