Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer copying of non-key columns at TopN::addInput(). #7172

Closed
wants to merge 4 commits into from

Conversation

usurai
Copy link
Contributor

@usurai usurai commented Oct 21, 2023

Current implementation copies all the columns for the row that's added to the
priority queue to RowContainer. If a row is added to priority queue and then is
replaced by another row from the same batch of input, copying non-key columns
to RowContainer is wasteful.

This commit adds the following optimization. First, loop over the input rows,
identify rows that need to be added to the priority queue, copy key columns for
these to RowContainer and add these to the priority queue. For the input rows
that remain in the priority queue, copy non-key columns to the RowContainer.
This logic avoids copying non-key columns for rows that do not stay in the
priority queue.

Also, add a check for duplicate sorting keys to TopNNode's constructor.

@netlify
Copy link

netlify bot commented Oct 21, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 40cf586
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/654a0ccac2c0dc00087f9a82

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 21, 2023
@usurai
Copy link
Contributor Author

usurai commented Oct 21, 2023

The failed CI is unrelated: #7154

@xumingming
Copy link
Contributor

@usurai Do you have performance numbers to share?

@usurai
Copy link
Contributor Author

usurai commented Oct 28, 2023

Converting to draft due to #7298

…Input.

At first, only decodes the key columns. While processing the input, only copies the key columns for the row that's added to priority queue to RowContainer(to be used for later comparison). After the whole batch of input is all compared, decodes and copies the passed input rows to RowContainer.
@usurai usurai marked this pull request as ready for review November 3, 2023 16:36
@usurai
Copy link
Contributor Author

usurai commented Nov 3, 2023

@xumingming No concrete number at my hand since there is no benchmark for this operator. I guess the diff would mostly depend on the workload, e.g. if non-key columns are complex type, this change will make some difference.
But yeah, if needed, I can add a benchmark for this operator.

@usurai
Copy link
Contributor Author

usurai commented Nov 3, 2023

@mbasmanova Any chance you could help to review this? Thanks!

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

velox/exec/TopN.cpp Outdated Show resolved Hide resolved
velox/exec/TopN.cpp Outdated Show resolved Hide resolved
decodedVectors_[col].decode(*input->childAt(col));
}

// Maps passed rows of 'data_' to the corresponding input row number. These
// input rows of non-key columns are later decoded and stored into data_.
std::unordered_map<char*, size_t> passedRows;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use F14FastMap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F14FastMap with char* as key type caused some issue so I use void* instead.

velox/exec/TopN.cpp Outdated Show resolved Hide resolved
velox/exec/TopN.cpp Outdated Show resolved Hide resolved
velox/exec/TopN.cpp Outdated Show resolved Hide resolved
- Skip at some path if no non-key columns and decode all the rows.
- Use F14FastMap.
- Enforce non-duplicated sorting key at TopNNode.
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % a few minor comments.

velox/core/PlanNode.h Outdated Show resolved Hide resolved
- Update TopN ctor: move to cpp and check the result of insert.
- Add sanity check.
- Update the documentation.
@usurai usurai changed the title Defer decoding and copying of non-key columns at TopN::addInput(). Defer copying of non-key columns at TopN::addInput(). Nov 7, 2023
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % some nits.

velox/core/PlanNode.h Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/docs/develop/operators.rst Outdated Show resolved Hide resolved
velox/exec/TopN.cpp Outdated Show resolved Hide resolved
velox/exec/tests/TopNTest.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Move #include to PlanNode.cpp and fix naming issues.
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 26e4c06.

Copy link

Conbench analyzed the 1 benchmark run on commit 26e4c062.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@usurai usurai deleted the topn branch November 8, 2023 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants