planner: add an upper bound for estimated row count of inner side of index join #41996

time-and-fate · 2023-03-07T14:36:45Z

What problem does this PR solve?

Issue Number: ref #31316

Problem Summary:

Because of how index join is currently implemented, the estimated row count of its inner side can be severely overestimated.

What is changed and how it works?

Add a reasonable upper bound for the inner children of index join to prevent very severe estimation errors.

Specifically, the average number of rows the inner-side IndexScan returns for each outer-side row should be no larger than (total row count / NDV of the join key columns).
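The bound can be illustrated with a minimal Go sketch. `capInnerRowCount` is a hypothetical helper written only for this illustration, not TiDB's planner API. It assumes that a non-positive NDV means "NDV unknown", in which case no cap is applied, and the numbers in `main` are illustrative rather than taken from the PR's tests.

```go
package main

import "fmt"

// capInnerRowCount is a hypothetical helper (not TiDB's API) illustrating the
// bound: for each outer row, the inner IndexScan should return on average no
// more than totalRows / ndv rows, where ndv is the NDV of the join key columns.
// A non-positive ndv is treated as "unknown", so the estimate is left unchanged.
func capInnerRowCount(estimated, totalRows, ndv float64) float64 {
	if ndv <= 0 {
		return estimated
	}
	upperBound := totalRows / ndv
	if estimated > upperBound {
		return upperBound
	}
	return estimated
}

func main() {
	// Illustrative: 1,000,000 inner rows with 100 distinct join-key values.
	fmt.Println(capInnerRowCount(500000, 1000000, 100)) // 10000: capped
	fmt.Println(capInnerRowCount(5, 1000000, 100))      // 5: already below the bound
	fmt.Println(capInnerRowCount(500000, 1000000, -1))  // 500000: NDV unknown, no cap
}
```

The cap only ever lowers an estimate, so when the existing estimate is already below total / NDV it is kept as-is.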

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot · 2023-03-07T14:36:47Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

AilinKid
winoros

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

time-and-fate · 2023-03-07T15:31:25Z

/retest

time-and-fate · 2023-03-07T16:11:20Z

/test unit-test

time-and-fate · 2023-03-08T12:37:34Z

statistics/integration_test.go

+			"  │ └─IndexRangeScan 1000000.00 cop[tikv] table:t2, index:idx(b) range: decided by [eq(test.t.b, test.t.a)], keep order:false, stats:pseudo",
+			"  └─Selection(Probe) 1000000.00 cop[tikv]  eq(test.t.a, 0)",
+			"    └─TableRowIDScan 1000000.00 cop[tikv] table:t2 keep order:false, stats:pseudo",
+		))


For comparison, on the current master branch the execution plan is:

IndexJoin 1000000.00 root inner join, inner:IndexLookUp, outer key:test.t.a, inner key:test.t.b, equal cond:eq(test.t.a, test.t.b)
├─TableReader(Build) 1000.00 root data:Selection
│ └─Selection 1000.00 cop[tikv] lt(test.t.a, 1), not(isnull(test.t.a))
│   └─TableFullScan 500000.00 cop[tikv] table:t keep order:false, stats:pseudo
└─IndexLookUp(Probe) 1000000.00 root
  ├─Selection(Build) 500000000.00 cop[tikv] not(isnull(test.t.b))
  │ └─IndexRangeScan 500000000.00 cop[tikv] table:t2, index:idx(b) range: decided by [eq(test.t.b, test.t.a)], keep order:false, stats:pseudo
  └─Selection(Probe) 1000000.00 cop[tikv] eq(test.t.a, 0)
    └─TableRowIDScan 500000000.00 cop[tikv] table:t2 keep order:false, stats:pseudo

winoros · 2023-03-08T20:44:03Z

planner/core/exhaust_physical_plans.go

+			return idxStats.NDV
+		}
+	}
+	return -1


return max{single col ndv} instead?

I think it should be min.
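To make the min/max question concrete, here is a hedged Go sketch of an NDV fallback. `estimateJoinKeyNDV` and its parameters are hypothetical names for illustration only, not the code under review. The reasoning it encodes: the NDV of any single join-key column is a lower bound for the NDV of the column combination, so both the minimum and the maximum of the single-column NDVs keep total / NDV a valid upper bound on rows per outer row. The maximum gives the tighter bound; the minimum gives the looser, more conservative one, which is what this sketch uses.

```go
package main

import "fmt"

// estimateJoinKeyNDV is a hypothetical sketch (not TiDB code). If an index
// exactly matching the join keys has stats, its NDV is used directly.
// Otherwise it falls back to the minimal positive single-column NDV, which is
// a conservative lower bound for the NDV of the column combination.
// It returns -1 when nothing is known, meaning "apply no upper bound".
func estimateJoinKeyNDV(exactIndexNDV int64, hasExactIndex bool, colNDVs []int64) int64 {
	if hasExactIndex {
		return exactIndexNDV
	}
	ndv := int64(-1)
	for _, v := range colNDVs {
		if v > 0 && (ndv < 0 || v < ndv) {
			ndv = v
		}
	}
	return ndv
}

func main() {
	// Illustrative: no matching index; single-column NDVs are 50 and 20.
	fmt.Println(estimateJoinKeyNDV(0, false, []int64{50, 20})) // 20
	fmt.Println(estimateJoinKeyNDV(0, false, nil))             // -1: no stats
}
```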

AilinKid

Rest LGTM

AilinKid · 2023-03-12T16:13:55Z

planner/core/exhaust_physical_plans.go

+		}
+	}
+
+	// 3. If we still haven't got an NDV, we use the minimal NDV in the column stats as a lower bound.


I'd prefer adding a case 2.5: once the index's prefix columns cover the join keys, we can use the index's NDV divided by the NDV of the index columns that are absent from the join keys. (This treats them as completely independent distributions, so the result of the division is usually smaller than the real value.) That leads to a higher row count estimate, though not as high as dividing directly by the lowest single-column NDV, which is what we want here.

That's a clever one!
It's not obvious, but it is indeed a correct lower bound for the NDV. And I think they don't need to be prefix columns; it's enough that the index contains the columns we want.

But if we add this, we need a strategy for choosing among multiple candidate indexes and must make sure the result is stable. We also need to handle edge cases such as prefix indexes and generated columns, and construct corresponding test cases. That's too much for this sprint.
Besides, this PR will be cherry-picked for a customer, so safety matters more and fewer changes are preferred.
So I'd rather add a TODO comment here and leave it for the future.

That's a clever one! It's not obvious, but it is indeed a correct lower bound for the NDV. And I think they don't need to be prefix columns; it's enough that the index contains the columns we want.

Makes sense; a subset of the index columns is enough.
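The "case 2.5" idea, which the thread leaves as a TODO, rests on one inequality: if an index covers the join-key columns plus some extra columns, then NDV(index columns) ≤ NDV(join keys) × ∏ NDV(extra columns), so dividing the index NDV by the product of the extra columns' NDVs gives a lower bound for NDV(join keys). The Go sketch below illustrates only that arithmetic; `ndvLowerBoundFromIndex` is a hypothetical name, and index selection, prefix indexes, and generated columns (the concerns raised above) are deliberately ignored.

```go
package main

import "fmt"

// ndvLowerBoundFromIndex is a hypothetical helper (not TiDB code). Given the
// NDV of an index that contains all join-key columns, and the NDVs of the
// index's extra (non-join-key) columns, it returns
// indexNDV / product(extraColNDVs), a lower bound for the join keys' NDV.
// It returns -1 if any input is non-positive (unusable stats).
func ndvLowerBoundFromIndex(indexNDV float64, extraColNDVs []float64) float64 {
	if indexNDV <= 0 {
		return -1
	}
	denom := 1.0
	for _, v := range extraColNDVs {
		if v <= 0 {
			return -1
		}
		denom *= v
	}
	return indexNDV / denom
}

func main() {
	// Illustrative: index on (a, b, c) with NDV 5000; join keys (a, b); NDV(c) = 10.
	fmt.Println(ndvLowerBoundFromIndex(5000, []float64{10})) // 500
}
```

With no extra columns the bound degenerates to the index NDV itself, which matches the case where the index exactly covers the join keys.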

time-and-fate · 2023-03-13T12:59:46Z

/retest

time-and-fate · 2023-03-14T08:37:37Z

/merge

ti-chi-bot · 2023-03-14T08:37:41Z

This pull request has been accepted and is ready to merge.

Commit hash: 23f34df

time-and-fate · 2023-03-14T11:34:33Z

/retest

time-and-fate added 3 commits March 7, 2023 22:33

add

ff50795

Merge remote-tracking branch 'upstream/master' into s14-issue31316

cb17c50

fmt

42cfcfb

ti-chi-bot added release-note-none size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 7, 2023

time-and-fate added 4 commits March 8, 2023 16:13

update test result

88f933d

add test

76a14c9

update test

54799f2

Merge remote-tracking branch 'upstream/master' into s14-issue31316

d099896

ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 8, 2023

time-and-fate commented Mar 8, 2023

time-and-fate added 3 commits March 8, 2023 20:42

reduce changes

4b0cd06

add comments

4995403

Merge remote-tracking branch 'upstream/master' into s14-issue31316

c122370

winoros reviewed Mar 8, 2023

time-and-fate added 3 commits March 9, 2023 15:17

add

6a2ef90

Merge remote-tracking branch 'upstream/master' into s14-issue31316

571920f

Merge remote-tracking branch 'upstream/master' into s14-issue31316

dfeabe1

AilinKid reviewed Mar 12, 2023

time-and-fate added 3 commits March 13, 2023 20:11

add

bf1213e

Merge remote-tracking branch 'upstream/master' into s14-issue31316

020b838

add

23f34df

AilinKid approved these changes Mar 14, 2023

ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 14, 2023

winoros approved these changes Mar 14, 2023

ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 14, 2023

ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 14, 2023

ti-chi-bot merged commit e264615 into pingcap:master Mar 14, 2023

time-and-fate added a commit to time-and-fate/tidb that referenced this pull request Mar 15, 2023

planner: add an upper bound for estimated row count of inner side of …

236b3f8

…index join (pingcap#41996) ref pingcap#31316

This was referenced Mar 15, 2023

planner: cherry pick #41996 to v5.4.2 #42259

Draft

planner: fix appropriate index stats may be missed when estimating NDV for inner side of index join #42261

Merged

ti-chi-bot pushed a commit that referenced this pull request Mar 16, 2023

planner: fix appropriate index stats may be missed when estimating ND…

94903a2

…V for inner side of index join (#42261) ref #31316, ref #41996

time-and-fate mentioned this pull request Mar 17, 2023

planner: revert #41996 due to an execution plan regression in JOB workload #42362

Merged

ti-chi-bot pushed a commit that referenced this pull request Mar 17, 2023

planner: revert #41996 due to an execution plan regression in JOB wor…

56412f5

…kload (#42362) close #42351

This was referenced Jun 21, 2023

Provide a fix control switch to fix overestimation of index scan of inner side of index join #44855

Closed

planner, sessionctx: reintroduce #41996 through optimizer fix control #44865

Merged

ti-chi-bot bot pushed a commit that referenced this pull request Jun 26, 2023

planner, sessionctx: reintroduce #41996 through optimizer fix control (…

bc80cf9

…#44865) close #44855

ti-chi-bot mentioned this pull request Jun 26, 2023

planner, sessionctx: reintroduce #41996 through optimizer fix control (#44865) #44964

Merged

ti-chi-bot bot pushed a commit that referenced this pull request Jul 10, 2023

planner, sessionctx: reintroduce #41996 through optimizer fix control (…

eb5074d

…#44865) (#44964) close #44855

ti-chi-bot mentioned this pull request Aug 18, 2023

planner, sessionctx: reintroduce #41996 through optimizer fix control (#44865) #46231

Merged

ti-chi-bot bot pushed a commit that referenced this pull request Aug 18, 2023

planner, sessionctx: reintroduce #41996 through optimizer fix control (…

7de3979

…#44865) (#46231) close #44855
