plan, stats: fix inconsistent row count estimation #7233

alivxxx · 2018-08-01T12:10:27Z

What have you changed? (mandatory)

Sometimes the estimated row count for a smaller set of expressions could be smaller than the estimate for a superset of those expressions, which is inconsistent, since adding more filter conditions should never increase the row count. This PR fixes this by preferring the superset's estimate, because it can use more stats info.
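
For readers who want a concrete picture, here is a minimal sketch of one way such a reconciliation could look. The actual change in logical_plans.go is not quoted in this conversation, so the helper name `reconcileSubsetCount`, its parameters, and the 0.8 factor used in `main` are all assumptions made for illustration. The idea is that when the subset's estimate is lower than the superset's, the subset's estimate is rederived from the superset's, assuming the remaining filters keep a fixed fraction of rows, and capped at the table's row count.

```go
package main

import (
	"fmt"
	"math"
)

// reconcileSubsetCount is a hypothetical helper, not the PR's actual code.
// subsetCount is the estimate for fewer conditions, supersetCount the
// estimate for those conditions plus extra filters. selectionFactor is the
// assumed fraction of rows that the extra filters keep.
func reconcileSubsetCount(subsetCount, supersetCount, selectionFactor, tableCount float64) float64 {
	if subsetCount >= supersetCount {
		// Already consistent: fewer conditions match at least as many rows.
		return subsetCount
	}
	// Inconsistent: trust the superset estimate and scale it back up,
	// never exceeding the total number of rows in the table.
	return math.Min(supersetCount/selectionFactor, tableCount)
}

func main() {
	// The subset estimate (10) is below the superset estimate (50), so it is adjusted.
	fmt.Println(reconcileSubsetCount(10, 50, 0.8, 1000))
	// A consistent pair is left unchanged.
	fmt.Println(reconcileSubsetCount(100, 50, 0.8, 1000))
}
```

Whatever the exact formula, the invariant being restored is that the adjusted subset estimate is never below the superset estimate.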

What is the type of the changes? (mandatory)

Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Unit test.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

No.

Does this PR affect tidb-ansible update? (mandatory)

No.

Does this PR need to be added to the release notes? (mandatory)

No.

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

PTAL @coocood @zz-jason @winoros

alivxxx · 2018-08-01T12:14:33Z

statistics/selectivity.go

+			// (1): The stats type, always prefer the primary key or index.
+			// (2): The number of expression that it covers, the more the better.
+			// (3): The number of columns that it contains, the less the better.
+			if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) {


This change is required by this PR: after the change in logical_plans.go, TestIndexRead would otherwise fail, because the planner would sometimes choose index (b,c).
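
To make the three criteria easier to follow, here is a self-contained sketch built around the condition quoted in the diff. Only the `if` condition comes from the diff; the constants, the reduced `exprSet` struct, the `pickBest` function, and the variable `covered` (named `bits` in the diff) are assumptions made for illustration, and the surrounding loop in selectivity.go may differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Assumed stats types. colType is placed last so that `tp < colType`
// means "primary key or index".
const (
	indexType = iota
	pkType
	colType
)

// exprSet is a reduced, hypothetical version of the struct in the diff.
type exprSet struct {
	tp      int
	mask    int64 // bit i is set if this set covers expression i
	numCols int
}

// pickBest returns the position of the preferred set, or -1 if no set
// covers any of the remaining expressions.
func pickBest(sets []exprSet, remaining int64) int {
	bestID, bestCount, bestTp, bestNumCols := -1, 0, colType, 0
	for i, set := range sets {
		covered := bits.OnesCount64(uint64(set.mask & remaining))
		if covered == 0 {
			continue
		}
		// Same condition as in the diff, with `bits` renamed to `covered`.
		if (bestTp == colType && set.tp < colType) || bestCount < covered || (bestCount == covered && bestNumCols > set.numCols) {
			bestID, bestCount, bestTp, bestNumCols = i, covered, set.tp, set.numCols
		}
	}
	return bestID
}

func main() {
	sets := []exprSet{
		{tp: colType, mask: 0b001, numCols: 1},
		{tp: indexType, mask: 0b011, numCols: 2},
		{tp: indexType, mask: 0b011, numCols: 3},
	}
	// The two-column index wins: it is an index, it covers two
	// expressions, and it has fewer columns than the three-column index.
	fmt.Println(pickBest(sets, 0b111))
}
```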

ngaut · 2018-08-01T12:17:07Z

executor/analyze.go

@@ -41,11 +41,13 @@ type AnalyzeExec struct {
 	tasks []*analyzeTask
 }

+// MaxBucketSize is the maximum number of bucket that a histogram could contain.
+var MaxBucketSize = int64(256)


This is not a good design. Do not export a variable.

winoros · 2018-08-02T07:59:50Z

executor/analyze.go

@@ -357,3 +358,13 @@ func (e *AnalyzeColumnsExec) buildStats() (hists []*statistics.Histogram, cms []
 	}
 	return hists, cms, nil
 }
+
+// SetMaxBucketSize sets the `maxBucketSize`.
+func SetMaxBucketSize(size int64) {


Should this only be used in tests?
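
For context, a minimal sketch of the pattern under discussion: an unexported package-level variable with an exported setter, instead of an exported variable. The body of `SetMaxBucketSize` is not shown in the diff, so the atomic store below is an assumption, and `GetMaxBucketSize` is a hypothetical accessor added only to make the sketch complete.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maxBucketSize is the maximum number of buckets a histogram may contain.
// It is unexported so that other packages cannot assign to it directly.
var maxBucketSize = int64(256)

// SetMaxBucketSize sets the `maxBucketSize`.
// Assumed body: an atomic store, so concurrent readers see a whole value.
func SetMaxBucketSize(size int64) {
	atomic.StoreInt64(&maxBucketSize, size)
}

// GetMaxBucketSize is a hypothetical accessor that is not part of the diff.
func GetMaxBucketSize() int64 {
	return atomic.LoadInt64(&maxBucketSize)
}

func main() {
	fmt.Println(GetMaxBucketSize())
	// A test could shrink the limit to exercise small histograms.
	SetMaxBucketSize(1)
	fmt.Println(GetMaxBucketSize())
}
```

Routing writes through a function keeps every update in one place, which is the usual argument against exporting a mutable variable.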

winoros

lgtm

alivxxx · 2018-08-06T06:38:58Z

PTAL @coocood @zz-jason

coocood · 2018-08-06T07:13:48Z

statistics/selectivity.go

+			// (1): The stats type, always prefer the primary key or index.
+			// (2): The number of expression that it covers, the more the better.
+			// (3): The number of columns that it contains, the less the better.
+			if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) {


I think set.tp != colType is better, because the type is not a scalar.
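
A small sketch of the point being made, assuming the stats types are declared as consecutive constants with `colType` last (an assumption, since the declaration is not shown here). Under that assumption both forms accept the same values, but `!=` states the intent, "not a plain column", without relying on the order of the constants. The function names are hypothetical.

```go
package main

import "fmt"

// Assumed declaration order; only colType appears in the quoted diff.
const (
	indexType = iota
	pkType
	colType
)

// preferByLess mirrors the first clause of the condition in the diff.
func preferByLess(bestTp, tp int) bool {
	return bestTp == colType && tp < colType
}

// preferByNotEqual mirrors the form suggested in the review.
func preferByNotEqual(bestTp, tp int) bool {
	return bestTp == colType && tp != colType
}

func main() {
	for _, tp := range []int{indexType, pkType, colType} {
		fmt.Println(tp, preferByLess(colType, tp), preferByNotEqual(colType, tp))
	}
}
```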

zhexuany · 2018-08-06T07:55:24Z

statistics/selectivity.go

@@ -35,6 +35,8 @@ type exprSet struct {
 	mask int64
 	// ranges contains all the ranges we got.
 	ranges []*ranger.Range
+	// numCols is the number of columns contained in the index or column(which is always 1).


If it is always 1, why not use a const?

It is always 1 for a column, but it can be greater than 1 for an index.
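
A minimal sketch of that distinction, using hypothetical stand-in types (`indexInfo` and the two helper functions are not from the PR): a column-based set always has `numCols` equal to 1, while an index-based set takes the number of columns in the index.

```go
package main

import "fmt"

// indexInfo is a hypothetical stand-in for index metadata.
type indexInfo struct {
	columns []string
}

// numColsForColumn: a single column always contributes exactly one column.
func numColsForColumn() int {
	return 1
}

// numColsForIndex: an index contributes as many columns as it contains.
func numColsForIndex(idx indexInfo) int {
	return len(idx.columns)
}

func main() {
	fmt.Println(numColsForColumn())
	fmt.Println(numColsForIndex(indexInfo{columns: []string{"b", "c"}}))
}
```

Because the value varies per index, it is stored as a field on each set and cannot be a constant.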

zhexuany

LGTM

alivxxx · 2018-08-06T09:19:40Z

/run-all-tests

zz-jason

LGTM

plan, stats: fix inconsistent estimation

66e5570

alivxxx changed the title from ~~plan, stats: fix inconsistent estimation~~ to plan, stats: fix inconsistent row count estimation Aug 1, 2018

alivxxx added component/statistics type/enhancement labels Aug 1, 2018

alivxxx commented Aug 1, 2018


ngaut reviewed Aug 1, 2018


address comment

9b59798

winoros reviewed Aug 2, 2018


address comment

962bb19

winoros previously approved these changes Aug 2, 2018


winoros added the status/LGT1 label Aug 3, 2018

Merge branch 'master' of github.com:pingcap/tidb into stats

8ba5843

alivxxx dismissed winoros’s stale review via 8ba5843 August 6, 2018 06:37

coocood reviewed Aug 6, 2018


address comment

2569283

winoros approved these changes Aug 6, 2018


zhexuany reviewed Aug 6, 2018


zhexuany approved these changes Aug 6, 2018


alivxxx added status/LGT2 and removed status/LGT1 labels Aug 6, 2018

zz-jason approved these changes Aug 6, 2018


zz-jason merged commit 44e6c3c into pingcap:master Aug 6, 2018

alivxxx deleted the stats branch August 6, 2018 09:41

alivxxx added the status/all tests passed label Aug 6, 2018

eurekaka mentioned this pull request Jan 3, 2019

Incorrect row count estimation in TiDB 2.1.1 #8921

Closed
