Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize WindowOperator for pre-sorted input #5437

Closed
wants to merge 1 commit into from

Conversation

JkSelf
Copy link
Contributor

@JkSelf JkSelf commented Jun 28, 2023

Spark planner inserts a Sort operator before a Window operator to sort the data.
Hence, there is no need to sort the data again in the window operator.

Add an option to WindowOperator to skip sorting if inputs are already sorted.
This allow WindowOperator to process data in streaming manner without
accumulating all input in memory.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2023
@netlify
Copy link

netlify bot commented Jun 28, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit e545fc5
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/652487827004b80008b7796b

velox/exec/Window.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf I assume you have a use case where inputs to Window operator are already partitioned and sorted. In this case, you'd like to optimize Window operator to skip partitioning and sorting. To achieve that, I suggest to add a flag to Window operator that indicates that input is already partitioned and sorted and provide an optimized (streaming) implementation that doesn't accumulate all of the input in memory. See StreamingAggregation for an example.

@JkSelf JkSelf force-pushed the window-op1 branch 2 times, most recently from 1aebf93 to c171c0d Compare July 24, 2023 02:03
@JkSelf
Copy link
Contributor Author

JkSelf commented Jul 24, 2023

@JkSelf I assume you have a use case where inputs to Window operator are already partitioned and sorted. In this case, you'd like to optimize Window operator to skip partitioning and sorting. To achieve that, I suggest to add a flag to Window operator that indicates that input is already partitioned and sorted and provide an optimized (streaming) implementation that doesn't accumulate all of the input in memory. See StreamingAggregation for an example.

Sorry for the delay response. Agree with implementing StreamingWindow to skip the sorting, and also can reduce the memory footprint. I pushed the code and can you help to review? Thanks.

@JkSelf JkSelf changed the title Add flag to control whether need to sort in window op Implement Streaming Window to align with Spark Window operator Jul 24, 2023
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Aditi, would you help take a first pass?

velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/functions/lib/window/tests/WindowTestBase.cpp Outdated Show resolved Hide resolved
velox/functions/lib/window/tests/WindowTestBase.cpp Outdated Show resolved Hide resolved
velox/functions/lib/window/tests/WindowTestBase.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
@JkSelf
Copy link
Contributor Author

JkSelf commented Jul 25, 2023

@aditi-pandit Thanks for your review. I have resolved all the comments. Can you help to review again? Thanks.

@JkSelf
Copy link
Contributor Author

JkSelf commented Jul 25, 2023

@rui-mo Can you help to review? Thanks.

velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
@JkSelf
Copy link
Contributor Author

JkSelf commented Jul 26, 2023

@rui-mo Thanks for your review. I have made the modifications. Could you please help with another review?

Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just several tiny issues.

velox/exec/Window.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
@JkSelf
Copy link
Contributor Author

JkSelf commented Jul 27, 2023

@aditi-pandit @rui-mo Thanks for your comments. I have updated and can you help to review again?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf Thank you for working on this optimization. Some initial comments.

velox/docs/develop/operators.rst Outdated Show resolved Hide resolved
velox/core/PlanNode.h Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/exec/tests/utils/PlanBuilder.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.h Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
velox/exec/StreamingWindow.cpp Outdated Show resolved Hide resolved
@JkSelf JkSelf changed the title Implement Streaming Window to align with Spark Window operator GLUTEN: Implement Streaming Window to align with Spark Window operator Aug 3, 2023
@JkSelf JkSelf changed the title GLUTEN: Implement Streaming Window to align with Spark Window operator [GLUTEN] Implement Streaming Window to align with Spark Window operator Aug 8, 2023
@JkSelf JkSelf changed the title [GLUTEN] Implement Streaming Window to align with Spark Window operator Implement Streaming Window to align with Spark Window operator Aug 24, 2023
velox/exec/Window.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for iterating. Looks good % a few nits.

velox/exec/tests/utils/PlanBuilder.h Outdated Show resolved Hide resolved
velox/exec/tests/utils/PlanBuilder.cpp Outdated Show resolved Hide resolved
velox/functions/lib/window/tests/WindowTestBase.cpp Outdated Show resolved Hide resolved
velox/functions/lib/window/tests/WindowTestBase.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf Looks great. Just a couple nits remain.

@JkSelf
Copy link
Contributor Author

JkSelf commented Sep 30, 2023

@JkSelf Looks great. Just a couple nits remain.

@mbasmanova Thanks for your review. I have resolved all your comment. Can you help to review again? Thanks.

@JkSelf
Copy link
Contributor Author

JkSelf commented Sep 30, 2023

@mbasmanova It seems that the failed unit tests are not related to this pull request. Could you please help to verify this? Thank you.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf Looks good % some typos and a question.

velox/docs/develop/operators.rst Outdated Show resolved Hide resolved
@JkSelf JkSelf force-pushed the window-op1 branch 2 times, most recently from d182907 to e545fc5 Compare October 9, 2023 23:06
@JkSelf
Copy link
Contributor Author

JkSelf commented Oct 10, 2023

@mbasmanova The CI is passed. Do you have any further comment? Thanks.

@JkSelf
Copy link
Contributor Author

JkSelf commented Oct 11, 2023

@mbasmanova Do you have any further comment? Thanks.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@mbasmanova mbasmanova changed the title Add StreamingWindowBuild support Optimize WindowOperator for pre-sorted input Oct 11, 2023
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@JkSelf
Copy link
Contributor Author

JkSelf commented Oct 11, 2023

@mbasmanova Can you help to look the failed testing? Thanks.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 5a34994.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit 5a34994d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Spark planner inserts a Sort operator before a Window operator to sort the data.
Hence, there is no need to sort the data again in the window operator.

Add an option to WindowOperator to skip sorting if inputs are already sorted.
This allow WindowOperator to process data in streaming manner without
accumulating all input in memory.

Pull Request resolved: facebookincubator#5437

Reviewed By: laithsakka

Differential Revision: D50198545

Pulled By: mbasmanova

fbshipit-source-id: 05ed9ea26691c00310730ceea0153405291023ab
@mbasmanova
Copy link
Contributor

@JkSelf @aditi-pandit Would you help cleanup this code in PlanNode.h:

#ifdef VELOX_ENABLE_BACKWARD_COMPATIBILITY
  WindowNode(
      PlanNodeId id,

@aditi-pandit
Copy link
Collaborator

@JkSelf @aditi-pandit Would you help cleanup this code in PlanNode.h:

#ifdef VELOX_ENABLE_BACKWARD_COMPATIBILITY
  WindowNode(
      PlanNodeId id,

@mbasmanova : This needs Prestissimo changes to be removed. I can follow up.

@aditi-pandit
Copy link
Collaborator

aditi-pandit commented Oct 24, 2023

Prestissimo code was already updated. Sent PR #7223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants