Fix infinite parallel join probe loop #7925
Conversation
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 39c2aad to 05479d7.
Force-pushed from ef95744 to 7c76b4f.
Thanks.
velox/exec/HashTable.h
Outdated
@@ -864,8 +896,8 @@ class HashTable : public BaseHashTable {
// many threads can set this.
std::atomic<bool> hasDuplicates_{false};

// Offset of next row link for join build side, 0 if none. Copied
// from 'rows_'.
// Offset of next row link for join build side, 0 if none. Copied from
What does "Copied from 'rows_'" mean here? Does this need to be int64_t?
I changed it to "set from 'rows_'". This is a (relative) offset from a pointer, so int32_t should be sufficient.
std::vector<char*>* overflow) {
// The insertable rows are in the table, all get put in the hash
// table or array.
TableInsertPartitionInfo* partitionInfo) {
This is nice. Thanks.
An infinite loop was found in the parallel table build by a Meta internal test. If bucketOffset_ (int64_t) in ProbeState exceeds the int32_t max limit, then HashTable::nextBucketOffset(int32_t) might truncate it to a smaller value, which causes fullProbe to restart the probe from a lower bucket offset and repeat this process forever. This PR fixes the problem by (1) changing nextBucketOffset to take int64_t instead of int32_t to prevent offset integer overflow, and (2) falling back to the overflow insert path if the next bucket offset is below the starting bucket offset. (3) To prevent a similar bug from happening again, we count the number of probed buckets and throw if the count exceeds the total number of buckets in the table; this fails the query but does not hang the driver thread. Also refactor the code by putting all the parallel-join-build related parameters into a TableInsertPartitionInfo data structure to simplify the implementation.
@xiaoxmeng merged this pull request in bcd6652.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.