Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite parallel join probe loop #7925

Closed
wants to merge 1 commit into from

Conversation

xiaoxmeng
Copy link
Contributor

@xiaoxmeng xiaoxmeng commented Dec 8, 2023

There is infinite loop problem found in parallel table build in Meta internal test.
If bucketOffset_ (int64_t) in ProbeState exceeds int32_t max limit, then
HashTable::nextBucketOffset(int32_t) might cast it to a smaller value which cause
fullProbe to restart the probe from a lower bucket offset and repeat this process
forever.
This PR fixes the problem by (1) change nextBucketOffset to take int64_t to int32_t
to prevent offset integer overflow, and (2) fall back to the overflow insert path if the
next bucket offset is below the starting bucket offset; (3) to prevent the similar bug
to happen, we add to count the number of probed buckets and throws if the count
exceeds the total number of buckets in the table. This fails a query but will not hang
the driver thread.
Also refactor the code by putting all the parallel join build related parameter into a
TableInsertPartitionInfo data structure to simplify the implementation.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 8, 2023
Copy link

netlify bot commented Dec 8, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit e5d8208
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65734e168d4e640008337096

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xiaoxmeng xiaoxmeng force-pushed the parallel branch 2 times, most recently from 39c2aad to 05479d7 Compare December 8, 2023 04:09
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -864,8 +896,8 @@ class HashTable : public BaseHashTable {
// many threads can set this.
std::atomic<bool> hasDuplicates_{false};

// Offset of next row link for join build side, 0 if none. Copied
// from 'rows_'.
// Offset of next row link for join build side, 0 if none. Copied from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "Copied from 'rows_'" means here? Does this need to be int64_t?

Copy link
Contributor Author

@xiaoxmeng xiaoxmeng Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change to "set from 'rows_'". This is (relative) offset from a pointer so int32_t should be sufficient.

std::vector<char*>* overflow) {
// The insertable rows are in the table, all get put in the hash
// table or array.
TableInsertPartitionInfo* partitionInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice. Thanks.

velox/exec/tests/HashTableTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/HashTableTest.cpp Outdated Show resolved Hide resolved
There is infinite loop problem found in parallel table build in Meta internal test.
If bucketOffset_ (int64_t) in ProbeState exceeds int32_t max limit, then
HashTable::nextBucketOffset(int32_t) might cast it to a smaller value which cause
fullProbe to restart the probe from a lower bucket offset and repeat this process
forever.
This PR fixes the problem by (1) change nextBucketOffset to take int64_t to int32_t
to prevent offset integer overflow, and (2) fall back to the overflow insert path if the
next bucket offset is below the starting bucket offset; (3) to prevent the similar bug
to happen, we add to count the number of probed buckets and throws if the count
exceeds the total number of buckets in the table. This fails a query but will not hang
the driver thread.
Also refactor the code by putting all the parallel join build related parameter into a
TableInsertPartitionInfo data structure to simplify the implementation.
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in bcd6652.

Copy link

Conbench analyzed the 1 benchmark run on commit bcd6652d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@xiaoxmeng xiaoxmeng deleted the parallel branch December 9, 2023 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants