Optimize WindowOperator for pre-sorted input #5437
@JkSelf I assume you have a use case where inputs to Window operator are already partitioned and sorted. In this case, you'd like to optimize Window operator to skip partitioning and sorting. To achieve that, I suggest to add a flag to Window operator that indicates that input is already partitioned and sorted and provide an optimized (streaming) implementation that doesn't accumulate all of the input in memory. See StreamingAggregation for an example.
1aebf93 to c171c0d
Sorry for the delayed response. I agree with implementing StreamingWindow to skip the sorting, which also reduces the memory footprint. I have pushed the code. Could you help review it? Thanks.
@aditi-pandit Aditi, would you help take a first pass?
@aditi-pandit Thanks for your review. I have resolved all the comments. Could you help review again? Thanks.
@rui-mo Could you help review? Thanks.
@rui-mo Thanks for your review. I have made the modifications. Could you please help with another review?
Just several tiny issues.
@aditi-pandit @rui-mo Thanks for your comments. I have updated the PR. Could you help review again?
@JkSelf Thank you for working on this optimization. Some initial comments.
Thank you for iterating. Looks good % a few nits.
@JkSelf Looks great. Just a couple nits remain.
@mbasmanova Thanks for your review. I have resolved all your comments. Could you help review again? Thanks.
@mbasmanova It seems that the failed unit tests are not related to this pull request. Could you please help verify this? Thank you.
@JkSelf Looks good % some typos and a question.
d182907 to e545fc5
@mbasmanova CI has passed. Do you have any further comments? Thanks.
@mbasmanova Do you have any further comments? Thanks.
Thanks.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@mbasmanova Could you help look at the failed tests? Thanks.
@mbasmanova merged this pull request in 5a34994.
Conbench analyzed the 1 benchmark run on the commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: Spark planner inserts a Sort operator before a Window operator to sort the data. Hence, there is no need to sort the data again in the window operator. Add an option to WindowOperator to skip sorting if inputs are already sorted. This allows WindowOperator to process data in a streaming manner without accumulating all input in memory.
Pull Request resolved: facebookincubator#5437
Reviewed By: laithsakka
Differential Revision: D50198545
Pulled By: mbasmanova
fbshipit-source-id: 05ed9ea26691c00310730ceea0153405291023ab
@JkSelf @aditi-pandit Would you help clean up this code in PlanNode.h:
@mbasmanova: Prestissimo changes are needed before this can be removed. I can follow up.
Prestissimo code was already updated. Sent PR #7223.
Spark planner inserts a Sort operator before a Window operator to sort the data.
Hence, there is no need to sort the data again in the window operator.
Add an option to WindowOperator to skip sorting if inputs are already sorted.
This allows WindowOperator to process data in a streaming manner without
accumulating all input in memory.
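To illustrate the idea of streaming over pre-sorted input, here is a minimal standalone sketch. It is not the Velox implementation and does not use any Velox API: `Row`, `OutputRow`, and `StreamingPartitionSum` are hypothetical names invented for this example, and a per-partition sum stands in for an arbitrary window function. It assumes the input arrives already clustered by partition key, so only the rows of the current partition need to be buffered at any time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input row: a partition key and a value to aggregate.
struct Row {
  int32_t partitionKey;
  int64_t value;
};

// Hypothetical output row: the input columns plus the window result.
struct OutputRow {
  int32_t partitionKey;
  int64_t value;
  int64_t partitionSum;
};

// Illustrative only. Assumes rows with equal partitionKey are adjacent
// in the input (i.e. an upstream operator already sorted the data).
class StreamingPartitionSum {
 public:
  // Consumes one batch and returns output for every partition that is
  // known to be complete, i.e. a row with a different key has been seen.
  std::vector<OutputRow> addInput(const std::vector<Row>& batch) {
    std::vector<OutputRow> out;
    for (const Row& row : batch) {
      if (!buffered_.empty() &&
          buffered_.back().partitionKey != row.partitionKey) {
        flush(out);
      }
      buffered_.push_back(row);
    }
    return out;
  }

  // Call once after the last batch to emit the final partition.
  std::vector<OutputRow> finish() {
    std::vector<OutputRow> out;
    flush(out);
    return out;
  }

  // Number of rows currently held in memory (at most one partition).
  std::size_t numBufferedRows() const {
    return buffered_.size();
  }

 private:
  // Computes the sum over the buffered partition, appends one output row
  // per buffered input row, then releases the buffer.
  void flush(std::vector<OutputRow>& out) {
    int64_t sum = 0;
    for (const Row& r : buffered_) {
      sum += r.value;
    }
    for (const Row& r : buffered_) {
      out.push_back({r.partitionKey, r.value, sum});
    }
    buffered_.clear();
  }

  std::vector<Row> buffered_;
};
```

In this sketch, memory use is bounded by the largest single partition rather than by the whole input, which is the property the PR description refers to. If the input were not clustered by partition key, the same key would be flushed more than once and the results would be wrong, which is why such a path would need to be enabled explicitly by the planner.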