-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix window operation with kRange frames with column boundary #9257
Conversation
This pull request was exported from Phabricator. Differential Revision: D55351848 |
✅ Deploy Preview for meta-velox canceled.
|
17e2754
to
9c8a27f
Compare
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
This pull request was exported from Phabricator. Differential Revision: D55351848 |
Hi @aditi-pandit, could you help review this fix? Thanks! |
@kagamiori : So your original idea to pass windowBuild_.inputType_ to createWindowFrame is actually a more correct fix imo. The reason you had other issues is because the wrong type is used to figure indices in multiple places...in sending the function parameter indices to the WindowFunction, in mapping output columns to input columns for getOutput().. There might be other places as well. We likely didn't encounter these issues because none of our testing flipped order of columns in this way :( So we should incorporate this in the fuzzer as well. Let me try your test with some fix logic. I'll send you a sample PR in mind in a few hours. |
@kagamiori : I debugged this a bit more. Your fix is in the right direction. The WindowBuild code had constructed WindowPartitions (used for function and output indices) providing a column mapping using the input type order. So the function and column indexes were accessed correctly. The range frame code in WindowPartition needed to use that mapping as well. Since the Window operator code uses the input type indices everywhere its consistent to send the original index from Window->WindowPartition. See 355e13f for changes over some of the code in this fix. |
velox/exec/WindowBuild.h
Outdated
@@ -76,6 +76,10 @@ class WindowBuild { | |||
return data_->estimateRowSize(); | |||
} | |||
|
|||
column_index_t inputChannelOf(column_index_t originalIndex) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function is needed after the changes in #9266
@@ -462,5 +462,44 @@ TEST_F(SumTest, distinct) { | |||
.assertResults("SELECT c0, sum(distinct c1) FROM tmp GROUP BY 1"); | |||
} | |||
|
|||
TEST_F(SumTest, aaa) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the test name.
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
9c8a27f
to
a44be03
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
Hi @aditi-pandit, thanks a lot for the suggestions in #9266! I've updated the PR and addressed your comments. I've also added addition comments about the expected order of columns in WindowBuild::data_ and WindowBuild::inputColumns_ for clarity. I've combined the computation of inversedInputChannels_ with the computation of inputChannels_ and inputType_ together. Please let me know if you would like a different refactoring or have further suggestions. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kagamiori. This is looking very good. One minor comment.
velox/exec/WindowBuild.h
Outdated
std::unique_ptr<RowContainer> data_; | ||
|
||
// The decodedInputVectors_ are reused across addInput() calls to decode | ||
// the partition and sort keys for the above RowContainer. | ||
std::vector<DecodedVector> decodedInputVectors_; | ||
|
||
// RowColumns for window build used to construct WindowPartition. | ||
// RowColumns for window build used to construct WindowPartition. Columns are | ||
// in the same order as the original input before column reordering. | ||
std::vector<exec::RowColumn> inputColumns_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this variable needed anymore ? We can remove it if so.
@@ -462,5 +462,66 @@ TEST_F(SumTest, distinct) { | |||
.assertResults("SELECT c0, sum(distinct c1) FROM tmp GROUP BY 1"); | |||
} | |||
|
|||
TEST_F(SumTest, kRangeAndKRowsWithUnorderedKeyColumns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this test be moved to AggregateWindowTest.cpp ?
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
a44be03
to
4348a91
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
4348a91
to
b9228df
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
b9228df
to
24a9fe0
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
24a9fe0
to
180d9d9
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
Hi @aditi-pandit, I've updated the PR to address your comments. Please take a look again. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kagamiori
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
180d9d9
to
5a279a4
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
…kincubator#9257) Summary: When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Differential Revision: D55351848
5a279a4
to
f2e70fb
Compare
This pull request was exported from Phabricator. Differential Revision: D55351848 |
This pull request has been merged in 0fdf608. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…kincubator#9257) Summary: Pull Request resolved: facebookincubator#9257 When Window operator is initialized, WindowBuild reorders the input columns to place partition keys and sorting keys first before adding input rows to RowContainer. However, for window operations that use kRange frame with column boundary, e.g., range between c1 preceding and current row, the original column index of the frame column before reordering is passed to computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes this bug by passing the actual index of the frame column after the reordering to computeKRangeFrameBounds(). This diff fixes facebookincubator#9229. Reviewed By: kgpai Differential Revision: D55351848 fbshipit-source-id: c5512700327becf44aee8b29e798455ed184ac87
Summary:
When Window operator is initialized, WindowBuild reorders the input columns to
place partition keys and sorting keys first before adding input rows to
RowContainer. However, for window operations that use kRange frame with
column boundary, e.g., range between c1 preceding and current row, the original
column index of the frame column before reordering is passed to
computeKRangeFrameBounds(), which leads to incorrect result. This diff fixes
this bug by passing the actual index of the frame column after the reordering to
computeKRangeFrameBounds().
This diff fixes #9229.
Differential Revision: D55351848