Implement mergewith for bytesRange and bytesValue #297

Closed

atanu1991
Contributor

@atanu1991 atanu1991 commented Sep 24, 2021

Implement BytesRange::mergeWith and BytesValues::mergeWith
This commit is a follow up of #119.
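
For readers unfamiliar with the idea, merging two range filters can be pictured as intersecting their bounds. The following is a minimal sketch under stated assumptions, not the code in this PR: `SimpleBytesRange`, `intersect`, and `exampleIntersect` are hypothetical names, and the sketch assumes that merging means both filters must pass (AND semantics) and that bytes compare as `std::string` does.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for a bytes range; NOT the Velox class.
// std::nullopt on a side means "unbounded" on that side.
struct SimpleBytesRange {
  std::optional<std::string> lower;
  bool lowerExclusive = false;
  std::optional<std::string> upper;
  bool upperExclusive = false;
  bool nullAllowed = false;
};

// Returns the range of non-null values accepted by both inputs, or
// std::nullopt when no non-null value passes both. A real filter would
// still have to decide what to return for nulls in that case.
inline std::optional<SimpleBytesRange> intersect(
    const SimpleBytesRange& a,
    const SimpleBytesRange& b) {
  SimpleBytesRange out = a;
  out.nullAllowed = a.nullAllowed && b.nullAllowed;

  // Lower bound: keep the larger one; on a tie, exclusive wins.
  if (b.lower) {
    if (!out.lower || *b.lower > *out.lower) {
      out.lower = b.lower;
      out.lowerExclusive = b.lowerExclusive;
    } else if (*b.lower == *out.lower) {
      out.lowerExclusive = out.lowerExclusive || b.lowerExclusive;
    }
  }

  // Upper bound: keep the smaller one; on a tie, exclusive wins.
  if (b.upper) {
    if (!out.upper || *b.upper < *out.upper) {
      out.upper = b.upper;
      out.upperExclusive = b.upperExclusive;
    } else if (*b.upper == *out.upper) {
      out.upperExclusive = out.upperExclusive || b.upperExclusive;
    }
  }

  // Detect an empty result when both sides are bounded.
  if (out.lower && out.upper) {
    const bool open = out.lowerExclusive || out.upperExclusive;
    if (*out.lower > *out.upper || (*out.lower == *out.upper && open)) {
      return std::nullopt;
    }
  }
  return out;
}

// Usage: ["b", "m"] AND ["f", "z") yields ["f", "m"].
inline void exampleIntersect() {
  SimpleBytesRange first;
  first.lower = "b";
  first.upper = "m";
  SimpleBytesRange second;
  second.lower = "f";
  second.upper = "z";
  second.upperExclusive = true;
  const auto merged = intersect(first, second);
  assert(merged && *merged->lower == "f" && *merged->upper == "m");
}
```

Using `std::optional` for the bounds is a choice made for this sketch only; the `toString()` suggestion later in the thread shows that the actual class tracks unboundedness with separate flags.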

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 24, 2021
@amaliujia
Contributor

Per this PR, it seems that I will need to address merging ranges/byte values in MultiRange::mergeWith in #299?

cc @mbasmanova to confirm?

@atanu1991 atanu1991 force-pushed the filterbytes branch 2 times, most recently from 898655f to ce968d5 Compare October 1, 2021 00:15
@atanu1991 atanu1991 changed the title [WIP][Draft] Implement mergewith for bytesRange and bytesValue Implement mergewith for bytesRange and bytesValue Oct 1, 2021
@atanu1991
Contributor Author

Per this PR, it seems that I will need to address merging ranges/byte values in MultiRange::mergeWith in #299?

I have addressed that here. Some changes will probably be required to merge bytes ranges and values based on your implementation in #299.

@atanu1991 atanu1991 force-pushed the filterbytes branch 6 times, most recently from 569d41c to 2a55e6c Compare October 5, 2021 04:06
Contributor

@mbasmanova mbasmanova left a comment

@atanu1991 Overall looks good. Some comments below.

Consider adding BytesRange::toString() method to help debugging test failures:

  std::string toString() const final {
    return fmt::format(
        "BytesRange: {}{}, {}{} {}",
        (lowerUnbounded_ || lowerExclusive_) ? "(" : "[",
        lowerUnbounded_ ? "..." : lower_,
        upperUnbounded_ ? "..." : upper_,
        (upperUnbounded_ || upperExclusive_) ? ")" : "]",
        nullAllowed_ ? "with nulls" : "no nulls");
  }

std::string upper = "", lower = "";

// TODO:
// Handle the case of same filter being upperUnbounded and
Contributor

Why is this a TODO? What makes it difficult to support this case?

}
}

std::unique_ptr<Filter> MultiRange::mergeWith(const Filter* other) const {
Contributor

Is this just a move? Are there any changes? It would be easier to review if this method stayed in place.

Contributor Author

@atanu1991 atanu1991 Oct 6, 2021

The function was at line 475. It has not just been moved; functionality was also added (lines 1060 to 1116 are new).

I can move the whole thing back to line 475, but I feel it's right to have MultiRange::mergeWith at the end of the file.

Contributor

It would be easier to review the changes if the function was moved back. Otherwise, I can't tell what has changed.

@mbasmanova
Contributor

There are some typos in the PR title and description.

Title:

Current: Implement mergewith for bytesRange and bytesValue

Suggested: Implement BytesRange::mergeWith and BytesValues::mergeWith

Description:

  • "Implement mergewith for bytesRange and bytesValue" is redundant as it repeats the title. Let's remove.
  • "and introduces mergeWith implmentation for BytesRange and BytesValues." - this is also redundant. It is just a different way to say what the title already says.

(New PR) Handling case of merging multiple byte ranges which results in MultiRange as the output

Any particular reason not to include this functionality in this PR? Would this require significant refactoring of the current implementation?

@atanu1991
Contributor Author

Thanks a lot for the detailed review.

@mbasmanova The case where the same range is both lower and upper unbounded might result in a MultiRange when merged with another range. This makes it a bit different from the existing cases now, and needs to be handled in a special way. Hence I thought handling it in a new PR would be better. Let me know your thoughts.

@mbasmanova
Contributor

The case where the same range is both lower and upper unbounded might result in a MultiRange when merged with another range.

Can you give an example? How are you planning to handle this case?

@mbasmanova
Contributor

the same range is both lower and upper unbounded

Is this possible? That's just kAlwaysTrue. We can add a check to the constructor to make sure such a range is never created.

@mbasmanova
Contributor

I see that the comments in Filter.h do not explain that a MultiRange filter is expected to be a combination of non-overlapping filters and that, in general, each filter must be restrictive, e.g. it cannot pass all non-null values unless it is kAlwaysTrue or kIsNotNull. We need to clarify that.

@atanu1991
Contributor Author

The case where the same range is both lower and upper unbounded might result in a MultiRange when merged with another range.

Can you give an example? How are you planning to handle this case?

For example, a range like (<=c, >=q) ("q", true, false, "c", true, false) merged with [a-z].
The output of this would be [a-c] and [q-z], right?

I haven't quite thought through exactly how to handle this yet.
I was planning to add a special block, "if (upperunbounded && lowerunbounded)", and handle it there.
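
The arithmetic in this example can be checked with a small sketch that treats the first filter as an OR of two ranges and intersects each one with [a-z]. This is only an illustration under assumptions: `Range`, `intersectRanges`, `intersectEach`, and `exampleFromThread` are hypothetical names, bounds are inclusive, and nulls are ignored.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical inclusive range; std::nullopt means unbounded on that side.
struct Range {
  std::optional<std::string> lower;
  std::optional<std::string> upper;
};

// Intersects two inclusive ranges; std::nullopt if no value passes both.
inline std::optional<Range> intersectRanges(const Range& a, const Range& b) {
  Range out = a;
  if (b.lower && (!out.lower || *b.lower > *out.lower)) {
    out.lower = b.lower;
  }
  if (b.upper && (!out.upper || *b.upper < *out.upper)) {
    out.upper = b.upper;
  }
  if (out.lower && out.upper && *out.lower > *out.upper) {
    return std::nullopt;
  }
  return out;
}

// AND of an OR-of-ranges with a single range: intersect every disjunct
// with the other range and drop the ones that become empty.
inline std::vector<Range> intersectEach(
    const std::vector<Range>& disjuncts,
    const Range& other) {
  std::vector<Range> result;
  for (const auto& range : disjuncts) {
    if (auto merged = intersectRanges(range, other)) {
      result.push_back(*merged);
    }
  }
  return result;
}

// (<= "c") OR (>= "q"), merged with ["a", "z"].
inline void exampleFromThread() {
  const std::vector<Range> disjuncts = {
      Range{std::nullopt, std::string("c")},
      Range{std::string("q"), std::nullopt}};
  const auto result =
      intersectEach(disjuncts, Range{std::string("a"), std::string("z")});
  assert(result.size() == 2); // ["a", "c"] and ["q", "z"]
}
```

Because the input disjuncts do not overlap, the surviving pieces do not overlap either, so no extra pass to combine them is needed in this sketch.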

@atanu1991
Contributor Author

the same range is both lower and upper unbounded

Is this possible? That's just kAlwaysTrue. We can add a check to the constructor to make sure such a range is never created.

If we can have such a check and this case does not need to be handled, then there would be no follow-ups.

@mbasmanova
Contributor

For example, a range like (<=c, >=q) ("q", true, false, "c", true, false)

That would be a MultiRange, right? Not kBytesRange. The TODO is currently in the code that merges two BytesRange's. Is it in the right place?

@atanu1991
Contributor Author

For example, a range like (<=c, >=q) ("q", true, false, "c", true, false)

That would be a MultiRange, right? Not kBytesRange. The TODO is currently in the code that merges two BytesRange's. Is it in the right place?

I see. So basically it is impossible to create a BytesRange object with ("q", true, false, "c", true, false)?
If so, should we have that assert in the BytesRange constructor?

And yes, if this is a MultiRange, then there is no TODO remaining for this PR.

@mbasmanova
Contributor

BytesRange can only represent a single range. It cannot represent <= a OR >= b. Let's add a check to the constructor in a separate PR.
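
A constructor check of the kind proposed here might look like the sketch below. It is an assumption-laden illustration, not the change that was eventually made: `CheckedBytesRange` and `exampleChecks` are hypothetical names, the argument order follows the ("q", true, false, "c", true, false) example from this thread, and standard exceptions stand in for whatever check macros the project uses.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class for illustration only; NOT the Velox BytesRange.
class CheckedBytesRange {
 public:
  CheckedBytesRange(
      std::string lower,
      bool lowerUnbounded,
      bool lowerExclusive,
      std::string upper,
      bool upperUnbounded,
      bool upperExclusive)
      : lower_(std::move(lower)),
        lowerUnbounded_(lowerUnbounded),
        lowerExclusive_(lowerExclusive),
        upper_(std::move(upper)),
        upperUnbounded_(upperUnbounded),
        upperExclusive_(upperExclusive) {
    // A range unbounded on both sides passes every non-null value, so it
    // is not a restrictive single range.
    if (lowerUnbounded_ && upperUnbounded_) {
      throw std::invalid_argument("range must be bounded on at least one side");
    }
    // When both sides are bounded, the bounds must be ordered.
    if (!lowerUnbounded_ && !upperUnbounded_ && lower_ > upper_) {
      throw std::invalid_argument("lower bound must not exceed upper bound");
    }
  }

  const std::string& lower() const { return lower_; }
  const std::string& upper() const { return upper_; }
  bool isLowerExclusive() const { return lowerExclusive_; }
  bool isUpperExclusive() const { return upperExclusive_; }

 private:
  std::string lower_;
  bool lowerUnbounded_;
  bool lowerExclusive_;
  std::string upper_;
  bool upperUnbounded_;
  bool upperExclusive_;
};

// The arguments from the thread's example are rejected by the first check.
inline void exampleChecks() {
  bool rejected = false;
  try {
    CheckedBytesRange bad("q", true, false, "c", true, false);
  } catch (const std::invalid_argument&) {
    rejected = true;
  }
  assert(rejected);
}
```

The sketch does not reject equal bounds combined with an exclusive side; whether that case should also fail is a design decision left open here.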

bool bothNanAllowed = nanAllowed_ && multiRangeOther->nanAllowed_;
bool bothNanAllowed = nanAllowed_;

std::vector<const Filter*> otherFilters;
std::vector<std::unique_ptr<Filter>> merged;
Contributor

nit: Let's move these variables closer to where they are used.

Contributor

@mbasmanova mbasmanova left a comment

Looks good to me. Thank you for the contribution.

@facebook-github-bot
Contributor

@atanu1991 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

)

Summary:
Implement BytesRange::mergeWith and BytesValues::mergeWith
This commit is a follow up of facebookincubator#119.

Pull Request resolved: facebookincubator#297

Reviewed By: mbasmanova

Differential Revision: D31444325

Pulled By: atanu1991

fbshipit-source-id: 3e8d8bb2645455fcaf24ebd18d6c122613221958
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D31444325
