planner, executor: implement the null-aware antiSemiJoin and null-aware antiLeftOuterSemiJoin (hash join with inner build) #37512

Merged
merged 24 commits into from Sep 19, 2022

Conversation

AilinKid
Copy link
Contributor

@AilinKid AilinKid commented Aug 31, 2022

Signed-off-by: AilinKid 314806019@qq.com

What problem does this PR solve?

Issue Number: close #37525

Problem Summary:

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

implement the null-aware antiSemiJoin and null-aware antiLeftOuterSemiJoin

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 31, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • fixdb
  • windtalker

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 31, 2022
@AilinKid AilinKid changed the title planner, executor: implement the null-aware antiSemiJoin and null-aware antiLeftOuterSemiJoin planner, executor: implement the null-aware antiSemiJoin and null-aware antiLeftOuterSemiJoin (hash join with inner build) Aug 31, 2022
@fixdb
Copy link
Contributor

fixdb commented Aug 31, 2022

Here is the issue: #37525

@AilinKid
Copy link
Contributor Author

AilinKid commented Sep 1, 2022

/run-mysql-test tidb-test=pr/1953

4 similar comments

@@ -142,6 +142,7 @@ type LogicalJoin struct {
preferJoinOrder bool

EqualConditions []*expression.ScalarFunction
NAEQConditions []*expression.ScalarFunction
Member

Would it be better to use a bool value to indicate whether the join is null-aware?

Contributor Author

// Step2: when step1 is finished, then we can determine whether we need to extract NA-EQ from OtherCondition to NAEQConditions.
// when there are still no EqualConditions, let's try to be a NAAJ.

As the code comment above says, we only fill NAEQConditions when there are no ordinary EqualConditions.
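The two-step split described in this reply can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `cond` type, its fields, and `splitConds` are hypothetical stand-ins for `*expression.ScalarFunction` and the planner's extraction logic.

```go
package main

import "fmt"

// cond is a hypothetical stand-in for *expression.ScalarFunction.
type cond struct {
	Expr      string
	IsEQ      bool // an equality between a left column and a right column
	NullAware bool // the equality came from NOT IN and needs null-aware handling
}

// splitConds sketches the two-step split (assumed shape, not TiDB's code).
func splitConds(conds []cond) (eq, naEQ, other []cond) {
	var naCandidates []cond
	// Step 1: collect ordinary equalities and set null-aware ones aside.
	for _, c := range conds {
		switch {
		case c.IsEQ && !c.NullAware:
			eq = append(eq, c)
		case c.IsEQ && c.NullAware:
			naCandidates = append(naCandidates, c)
		default:
			other = append(other, c)
		}
	}
	// Step 2: only when no ordinary equality exists do the null-aware
	// equalities become NA-EQ join keys; otherwise they stay in other.
	if len(eq) == 0 {
		naEQ = naCandidates
	} else {
		other = append(other, naCandidates...)
	}
	return eq, naEQ, other
}

func main() {
	uncorrelated := []cond{
		{Expr: "t1.a = t2.a", IsEQ: true, NullAware: true},
		{Expr: "t1.b = t2.b", IsEQ: true, NullAware: true},
	}
	eq, naEQ, other := splitConds(uncorrelated)
	fmt.Println(len(eq), len(naEQ), len(other)) // prints "0 2 0"
}
```

With this shape the two kinds of join key are mutually exclusive by construction, which is one reason a separate bool flag would carry no extra information.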

@@ -575,12 +646,84 @@ func (j *leftOuterSemiJoiner) Clone() joiner {
return &leftOuterSemiJoiner{baseJoiner: j.baseJoiner.Clone()}
}

type nullAwareAntiLeftOuterSemiJoiner struct {
baseJoiner
Member

Can we put the NAAJType as a member variable of the struct?

Contributor Author

@AilinKid AilinKid Sep 8, 2022

NAAJType is not a fixed joiner parameter; it is a runtime status determined dynamically by the characteristics of the left and right rows.

Contributor Author

For now it is only used by AntiLeftOuterSemiJoiner to fill in the appropriate result according to the runtime NAAJType.

Contributor Author

@AilinKid AilinKid Sep 8, 2022

Only in OnMisMatch() will we fill it with 1. Unlike before, handling null in onMatch is a way of speeding up execution.

Contributor

It looks like null-aware join is so different from the normal join that this PR introduces too much `if isNA` related code; even the top-level interfaces tryMatchInners/onMatch are polluted. I wonder whether, compared to the current implementation, it would be clearer to add a new NAHashJoinExec instead. @XuHuaiyu what do you think?

Member

Yeah, I'm also thinking about whether we could split it into two join executors, extracting the common part as an interface and a baseHashJoinExec.

Contributor Author

That seems reasonable. The code size would expand dramatically in this PR (some copying is unavoidable even if we extract an interface for baseHashJoinExec). Can we do the cleanup in the next PR?

planner/core/exhaust_physical_plans.go Show resolved Hide resolved
@AilinKid
Copy link
Contributor Author

AilinKid commented Sep 8, 2022

/run-mysql-test tidb-test=pr/1953

4 similar comments

// IsNullEQ is used for cases like Except statement where null key should be matched with null key.
// <1,null> is exactly matched with <1,null>, where the null value should not be filtered and
// the null is exactly matched with null only. (while in NAAJ null value should also be matched
// with other non-null item as well)
Contributor

I am confused, null value matched with non-null values?

Contributor Author

@AilinKid AilinKid Sep 14, 2022

Yes. For `null in (y set)`, the lhs doesn't care what y actually is.

Additionally, when matching a <1, null> probe row against the hash table, what are you going to fetch from the hash table? The pattern should be <1, whatever>.
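The "<1, whatever>" pattern follows from SQL's three-valued logic for `NOT IN`. A minimal sketch of that semantics, assuming nullable integer columns, is shown below; the `tri`, `val`, `eqRow`, and `notIn` names are invented for illustration and do not appear in the PR.

```go
package main

import "fmt"

// tri models SQL three-valued logic.
type tri int

const (
	False tri = iota
	True
	Null
)

// val is a nullable integer; Valid == false represents SQL NULL.
type val struct {
	V     int
	Valid bool
}

// eqRow evaluates (p1, p2, ...) = (b1, b2, ...) as an AND of column
// comparisons: False if any pair differs, else Null if any side is NULL.
func eqRow(probe, build []val) tri {
	res := True
	for i := range probe {
		if !probe[i].Valid || !build[i].Valid {
			res = Null // unknown so far; a later column may still make it False
			continue
		}
		if probe[i].V != build[i].V {
			return False
		}
	}
	return res
}

// notIn evaluates `probe NOT IN (set)`.
func notIn(probe []val, set [][]val) tri {
	sawNull := false
	for _, row := range set {
		switch eqRow(probe, row) {
		case True:
			return False
		case Null:
			sawNull = true
		}
	}
	if sawNull {
		return Null
	}
	return True
}

func main() {
	null := val{}
	one, two := val{1, true}, val{2, true}
	// <1, NULL> against <1, 2>: the NULL column matches "whatever", so the result is NULL.
	fmt.Println(notIn([]val{one, null}, [][]val{{one, two}}) == Null) // prints "true"
	// <1, NULL> against <2, 2>: the first column already differs, so NOT IN is TRUE.
	fmt.Println(notIn([]val{one, null}, [][]val{{two, two}}) == True) // prints "true"
}
```

An anti semi join keeps a probe row only when the result is TRUE, so a probe row such as <1, NULL> is filtered out as soon as any build row agrees with it on the non-null columns.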

planner/core/explain.go Show resolved Hide resolved
delete from naaj_B;
insert into naaj_B values(2, 2, 2);
select (a, b) not in (select a, b from naaj_B) from naaj_A;
select * from naaj_A where (a, b) not in (select a, b from naaj_B);
Contributor

Can we have cases where naaj_A has (null, null, null) tuple?

1 NULL 1
explain format = 'brief' select (a, b) not in (select a, b from naaj_B where naaj_A.c = naaj_B.c) from naaj_A;
id estRows task access object operator info
HashJoin 10000.00 root anti left outer semi join, equal:[eq(test.naaj_a.c, test.naaj_b.c)], other cond:eq(test.naaj_a.a, test.naaj_b.a), eq(test.naaj_a.b, test.naaj_b.b)
Contributor

Is the other condition eq(test.naaj_a.a, test.naaj_b.a), eq(test.naaj_a.b, test.naaj_b.b) a normal eq condition, or does it need to be null-aware?

Contributor Author

Yes, it does.

This is the old path: it already has the correlated EQ condition "naaj_A.c = naaj_B.c", which can be used as the hash join key, so the NA-EQ conditions are put in the other conditions as before. (Mixing EQ conditions and NA-EQ conditions in the hash join key is a little tricky.)

Contributor

This can be future improvements. We can open another issue to track this optimization (EQ and NAEQ as the hash keys).

Contributor Author

make sense

@@ -55,7 +55,7 @@ insert into exam values(1, 'math', 100);
set names utf8 collate utf8_general_ci;
explain format = 'brief' select * from stu where stu.name not in (select 'guo' from exam where exam.stu_id = stu.id);
id estRows task access object operator info
Apply 10000.00 root CARTESIAN anti semi join, other cond:eq(test.stu.name, Column#8)
Apply 10000.00 root Null-aware anti semi join, equal:[eq(test.stu.name, Column#8)]
Contributor

Just curious: why does this SQL need to use Apply?

Contributor Author

@AilinKid AilinKid Sep 17, 2022

Interesting: the collation blocks this, see issue #37032.

Contributor Author

In Apply mode, even if there is an NA-EQ condition here, it will be appended to the other conditions at runtime.

@@ -1972,7 +1993,10 @@ func (b *executorBuilder) buildApply(v *plannercore.PhysicalApply) Executor {
if b.err != nil {
return nil
}
otherConditions := append(expression.ScalarFuncs2Exprs(v.EqualConditions), v.OtherConditions...)
// test is in the explain/naaj.test#part5.
// although we prepared the NAEqualConditions, but for Apply mode, we still need move it to other conditions like eq condition did here.
Contributor

So Apply does not need to care about null/empty things?

Contributor Author

@AilinKid AilinKid Sep 17, 2022

Actually, it does. It detects the null and empty cases from the other conditions (the NA-EQ condition is marked) in the joiner logic. This is unlike hash join, where you can do this in the matching phase before the joiner's detailed work.

@@ -22,7 +22,7 @@
{
"SQL": "select * from t1 where (t1.a, t1.b) not in (select a, b from t2)",
"Plan": [
"HashJoin 3.20 root CARTESIAN anti semi join, other cond:eq(test.t1.a, test.t2.a), eq(test.t1.b, test.t2.b)",
"HashJoin 3.20 root Null-aware anti semi join, equal:[eq(test.t1.b, test.t2.b) eq(test.t1.a, test.t2.a)]",
Contributor

Why is the condition order changed?

Contributor Author

@AilinKid AilinKid Sep 17, 2022

Yes, there was some misordering before (>, <), but it doesn't affect correctness for now.

leftKeyNDV := getColsNDV(h.leftJoinKeys, h.leftSchema, h.leftProfile)
rightKeyNDV := getColsNDV(h.rightJoinKeys, h.rightSchema, h.rightProfile)
var leftKeyNDV, rightKeyNDV float64
if len(h.leftJoinKeys) > 0 {
Contributor

If both leftJoinKeys and leftNAJoinKeys are non-empty, the NDV will be based only on leftJoinKeys. Is this by design?

Contributor Author

@AilinKid AilinKid Sep 17, 2022

What I actually want is len(h.leftJoinKeys) > 0 || len(h.rightJoinKeys) > 0. I took it for granted that the left keys are always the same size as the right keys. (Join keys and NA join keys are mutually exclusive.)

fixed

planner/core/logical_plans.go Outdated Show resolved Hide resolved
@@ -1942,6 +1950,8 @@ func (p *LogicalJoin) tryToGetMppHashJoin(prop *property.PhysicalProperty, useBC
return nil
}
lkeys, rkeys, _, _ := p.GetJoinKeys()
lNAkeys, rNAKeys := p.GetNAJoinKeys()
Contributor

Since MPP does not support NA keys, it looks like we should return nil once the NA keys are non-empty?

Contributor Author

@AilinKid AilinKid Sep 17, 2022

ok, fixed

@@ -1809,6 +1809,11 @@ func (p *LogicalJoin) shouldUseMPPBCJ() bool {
return checkChildFitBC(p.children[0]) || checkChildFitBC(p.children[1])
}

// canPushToCop checks if it can be pushed to some stores.
func (p *LogicalJoin) canPushToCop(storeTp kv.StoreType) bool {
return len(p.NAEQConditions) == 0 && p.baseLogicalPlan.canPushToCop(storeTp)
Contributor

There will be a compatibility issue here:
Before this PR, for SQL like select * from t where nullable_col not in (select xx from t1), if the subquery meets the broadcast threshold, the join can be pushed down to TiFlash. After this PR, is there no way to push such a join down to TiFlash? I think that until TiFlash's NAAJ support is ready, it's better to add a switch that lets the user choose between NAAJ and cross join.

Contributor Author

great

Member

Since it will be controlled by the switch until TiFlash supports the execution logic, is it okay to just forbid pushdown for now?

Contributor Author

@AilinKid AilinKid Sep 18, 2022

Yes. With the switch in place, len(p.NAEQConditions) should be exactly 0 here when NAAJ is turned off.
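How such a switch could interact with the pushdown check can be sketched as below. Everything here is hypothetical: the `enableNAAJ` flag, the `logicalJoin` struct, and `buildJoin` are invented for illustration, and `basePushable` merely stands in for the result of `p.baseLogicalPlan.canPushToCop(storeTp)`.

```go
package main

import "fmt"

// logicalJoin is a hypothetical, trimmed-down stand-in for LogicalJoin.
type logicalJoin struct {
	naEQConditions []string
	basePushable   bool // stands in for baseLogicalPlan.canPushToCop(storeTp)
}

// buildJoin places the NOT IN equalities depending on the assumed switch:
// with NAAJ enabled they become NA-EQ keys, otherwise they remain ordinary
// "other conditions" as on the old cartesian anti semi join path.
func buildJoin(notInEqualities []string, enableNAAJ, basePushable bool) (j logicalJoin, otherConds []string) {
	j.basePushable = basePushable
	if enableNAAJ {
		j.naEQConditions = notInEqualities
	} else {
		otherConds = notInEqualities
	}
	return j, otherConds
}

// canPushToCop mirrors the check in the diff: any NA-EQ key forbids pushdown.
func (j logicalJoin) canPushToCop() bool {
	return len(j.naEQConditions) == 0 && j.basePushable
}

func main() {
	conds := []string{"eq(t.nullable_col, t1.xx)"}

	on, _ := buildJoin(conds, true, true)
	fmt.Println(on.canPushToCop()) // prints "false"

	off, other := buildJoin(conds, false, true)
	fmt.Println(off.canPushToCop(), len(other)) // prints "true 1"
}
```

Under these assumptions, turning the switch off restores the previous behavior, so the join remains eligible for pushdown to TiFlash.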

Contributor

@fixdb fixdb left a comment

+1

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 16, 2022
@AilinKid AilinKid requested a review from a team as a code owner September 17, 2022 05:31
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2022
Signed-off-by: AilinKid <314806019@qq.com>
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2022
needCheckBuildRowPos = append(needCheckBuildRowPos, c.hCtx.naKeyColIdx[i])
needCheckProbeRowPos = append(needCheckProbeRowPos, probeHCtx.naKeyColIdx[i])
}
// check the idxs-th value of the join columns.
Member

What does idxs here mean?

Contributor Author

@AilinKid AilinKid Sep 18, 2022

Take the join columns (a,b,c) and (e,f,g).

Find the non-null bits, for example (b,c) - (e,f), then use the column indexes of b,c and of e,f to fetch the values from the chunk and check whether they are the same.
The idxs are the column indexes of b,c and the column indexes of e,f.
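A rough sketch of the idea follows: first collect the column positions of the join keys whose probe value is non-null, then compare only those positions. The function names and the plain `[]int` rows are assumptions made for illustration; the real code works on chunk rows and a null bitmap, and NULLs on the build side are ignored here.

```go
package main

import "fmt"

// nonNullKeyPositions returns, for every join key whose probe value is not
// NULL, the key's column index in the build row and in the probe row.
// It depends only on the probe row's null bits, so it can be computed once
// per probe row rather than once per candidate build row.
func nonNullKeyPositions(probeKeyIsNull []bool, buildKeyColIdx, probeKeyColIdx []int) (buildPos, probePos []int) {
	for i, isNull := range probeKeyIsNull {
		if isNull {
			continue
		}
		buildPos = append(buildPos, buildKeyColIdx[i])
		probePos = append(probePos, probeKeyColIdx[i])
	}
	return buildPos, probePos
}

// matchOnPositions compares the two rows only at the collected positions.
func matchOnPositions(buildRow, probeRow, buildPos, probePos []int) bool {
	for k := range buildPos {
		if buildRow[buildPos[k]] != probeRow[probePos[k]] {
			return false
		}
	}
	return true
}

func main() {
	// Three join keys; the probe row's first key is NULL.
	probeKeyIsNull := []bool{true, false, false}
	buildKeyColIdx := []int{0, 1, 2} // key columns inside a build-side row
	probeKeyColIdx := []int{3, 4, 5} // key columns inside a probe-side row

	buildPos, probePos := nonNullKeyPositions(probeKeyIsNull, buildKeyColIdx, probeKeyColIdx)
	fmt.Println(buildPos, probePos) // prints "[1 2] [4 5]"

	probeRow := []int{0, 0, 0, 0 /* NULL placeholder */, 7, 8}
	for _, buildRow := range [][]int{{1, 7, 8}, {1, 7, 9}} {
		fmt.Println(matchOnPositions(buildRow, probeRow, buildPos, probePos))
	}
}
```

In this run the first build row agrees on both non-null keys and the second differs on the last one, so the loop prints true and then false.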

Contributor

It looks like needCheckProbeRowPos and needCheckBuildRowPos can be generated outside the loop, since they only depend on probeKeyNullBits?

executor/hash_table.go Outdated Show resolved Hide resolved

AilinKid and others added 2 commits September 18, 2022 09:26
Co-authored-by: Yiding Cui <winoros@gmail.com>

hashVals []hash.Hash64
hasNull []bool
naHasNull []bool
naColNullBitMap []*bitmap.ConcurrentBitmap
Contributor

Why do we need a ConcurrentBitmap here?

Contributor Author

Yes, it's unnecessary here.

Contributor

If it's unnecessary, I think we should not use ConcurrentBitmap here; please use a normal bitmap instead.

hCtx.initHash(probeSideChk.NumRows())
numRows := probeSideChk.NumRows()
hCtx.initHash(numRows)
// By now, path 1 and 2 won't be conducted at the same time.
Contributor

Then why not use `if isNAAJ` to make sure path 1 and path 2 cannot be conducted at the same time?

Contributor Author

Because we already guarantee that the EQ conditions and the NA-EQ conditions cannot be filled at the same time.

@AilinKid
Copy link
Contributor Author

AilinKid commented Sep 19, 2022

Looks like needCheckProbeRowPos and needCheckBuildRowPos can be generated outside the loop since it only related to probeKeyNullBits?

Yes, it could. We had put it here to match the code style of GetNullBucketRows. Fixed.

Signed-off-by: AilinKid <314806019@qq.com>
Signed-off-by: AilinKid <314806019@qq.com>
Contributor

@windtalker windtalker left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 19, 2022
@AilinKid
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 3f3f9b7

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 19, 2022
@AilinKid
Copy link
Contributor Author

/merge

@AilinKid
Copy link
Contributor Author

/merge

@ti-chi-bot ti-chi-bot merged commit 0823fdb into pingcap:master Sep 19, 2022
@AilinKid
Copy link
Contributor Author

/run-cherry-picker

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 20, 2022
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-6.2 in PR #37980

@sre-bot
Copy link
Contributor

sre-bot commented Sep 20, 2022

TiDB MergeCI notify

🔴 Bad News! New failing [1] after this pr merged.
These new failed integration tests seem to be caused by the current PR, please try to fix these new failed integration tests, thanks!

| CI Name | Result | Duration | Compare with Parent commit |
| --- | --- | --- | --- |
| idc-jenkins-ci-tidb/tics-test | 🟥 failed 1, success 0, total 1 | 4 min 18 sec | New failing |
| idc-jenkins-ci-tidb/common-test | 🔴 failed 1, success 10, total 11 | 52 min | Existing failure |
| idc-jenkins-ci-tidb/integration-ddl-test | 🔴 failed 1, success 5, total 6 | 25 min | Existing failure |
| idc-jenkins-ci-tidb/integration-common-test | 🔴 failed 2, success 15, total 17 | 12 min | Existing failure |
| idc-jenkins-ci/integration-cdc-test | 🟢 all 37 tests passed | 27 min | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-2 | 🟢 all 28 tests passed | 5 min 17 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-1 | 🟢 all 26 tests passed | 4 min 58 sec | Existing passed |
| idc-jenkins-ci-tidb/mybatis-test | 🟢 all 1 tests passed | 3 min 17 sec | Existing passed |
| idc-jenkins-ci-tidb/integration-compatibility-test | 🟢 all 1 tests passed | 2 min 38 sec | Existing passed |
| idc-jenkins-ci-tidb/plugin-test | 🟢 build success, plugin test success | 4 min | Existing passed |

Labels
release-note size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Null-aware Anti Join
8 participants