Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix window operation with kRange frames with column boundary #9257

Closed
wants to merge 1 commit into from

Conversation

kagamiori
Copy link
Contributor

Summary:
When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes #9229.

Differential Revision: D55351848

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

Copy link

netlify bot commented Mar 26, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f2e70fb
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/660b0b2e897c8a000856cfb5

kagamiori added a commit to kagamiori/velox that referenced this pull request Mar 26, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to 
place partition keys and sorting keys first before adding input rows to 
RowContainer. However, for window operations that use kRange frame with 
column boundary, e.g., range between c1 preceding and current row, the original 
column index of the frame column before reordering is passed to 
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes 
this bug by passing the actual index of the frame column after the reordering to 
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

@kagamiori
Copy link
Contributor Author

Hi @aditi-pandit, could you help review this fix? Thanks!

@aditi-pandit
Copy link
Collaborator

@kagamiori : So your original idea to pass windowBuild_.inputType_ to createWindowFrame is actually a more correct fix imo.

The reason you had other issues is because the wrong type is used to figure indices in multiple places...in sending the function parameter indices to the WindowFunction, in mapping output columns to input columns for getOutput().. There might be other places as well.

We likely didn't encounter these issues because none of our testing flipped order of columns in this way :( So we should incorporate this in the fuzzer as well.

Let me try your test with some fix logic. I'll send you a sample PR in mind in a few hours.

@aditi-pandit
Copy link
Collaborator

aditi-pandit commented Mar 26, 2024

@kagamiori : I debugged this a bit more. Your fix is in the right direction.

The WindowBuild code had constructed WindowPartitions (used for function and output indices) providing a column mapping using the input type order. So the function and column indexes were accessed correctly.

The range frame code in WindowPartition needed to use that mapping as well. Since the Window operator code uses the input type indices everywhere its consistent to send the original index from Window->WindowPartition.

See 355e13f for changes over some of the code in this fix.

velox/exec/WindowBuild.cpp Show resolved Hide resolved
@@ -76,6 +76,10 @@ class WindowBuild {
return data_->estimateRowSize();
}

column_index_t inputChannelOf(column_index_t originalIndex) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function is needed after the changes in #9266

@@ -462,5 +462,44 @@ TEST_F(SumTest, distinct) {
.assertResults("SELECT c0, sum(distinct c1) FROM tmp GROUP BY 1");
}

TEST_F(SumTest, aaa) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the test name.

kagamiori added a commit to kagamiori/velox that referenced this pull request Mar 30, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

@kagamiori
Copy link
Contributor Author

Hi @aditi-pandit, thanks a lot for the suggestions in #9266! I've updated the PR and addressed your comments. I've also added addition comments about the expected order of columns in WindowBuild::data_ and WindowBuild::inputColumns_ for clarity. I've combined the computation of inversedInputChannels_ with the computation of inputChannels_ and inputType_ together. Please let me know if you would like a different refactoring or have further suggestions. Thanks!

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kagamiori. This is looking very good. One minor comment.

std::unique_ptr<RowContainer> data_;

// The decodedInputVectors_ are reused across addInput() calls to decode
// the partition and sort keys for the above RowContainer.
std::vector<DecodedVector> decodedInputVectors_;

// RowColumns for window build used to construct WindowPartition.
// RowColumns for window build used to construct WindowPartition. Columns are
// in the same order as the original input before column reordering.
std::vector<exec::RowColumn> inputColumns_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this variable needed anymore ? We can remove it if so.

@@ -462,5 +462,66 @@ TEST_F(SumTest, distinct) {
.assertResults("SELECT c0, sum(distinct c1) FROM tmp GROUP BY 1");
}

TEST_F(SumTest, kRangeAndKRowsWithUnorderedKeyColumns) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this test be moved to AggregateWindowTest.cpp ?

kagamiori added a commit to kagamiori/velox that referenced this pull request Apr 1, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

kagamiori added a commit to kagamiori/velox that referenced this pull request Apr 1, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

kagamiori added a commit to kagamiori/velox that referenced this pull request Apr 1, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

kagamiori added a commit to kagamiori/velox that referenced this pull request Apr 1, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

@kagamiori
Copy link
Contributor Author

Hi @aditi-pandit, I've updated the PR to address your comments. Please take a look again. Thanks!

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kagamiori

kagamiori added a commit to kagamiori/velox that referenced this pull request Apr 1, 2024
…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

…kincubator#9257)

Summary:

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Differential Revision: D55351848
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55351848

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 0fdf608.

Copy link

Conbench analyzed the 1 benchmark run on commit 0fdf6082.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Apr 4, 2024
…kincubator#9257)

Summary:
Pull Request resolved: facebookincubator#9257

When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().

This diff fixes facebookincubator#9229.

Reviewed By: kgpai

Differential Revision: D55351848

fbshipit-source-id: c5512700327becf44aee8b29e798455ed184ac87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect result of window operation with k range frame
3 participants