plan, stats: fix inconsistent row count estimation #7233

Merged
merged 5 commits into from Aug 6, 2018

@lamxTyler
Member

commented Aug 1, 2018

What have you changed? (mandatory)

Sometimes the estimated row count for a smaller set of expressions is smaller than the estimate for a superset of those expressions. That is inconsistent, because adding conditions can never increase the number of matching rows. This PR fixes it by preferring the superset's estimate, because it can use more statistics.
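
To make the invariant concrete, here is a minimal Go sketch. The function names and the numbers are hypothetical, and the actual adjustment made in this PR is not shown on this page; the sketch only illustrates "prefer the superset's estimate" in its simplest form.

```go
package main

import "fmt"

// isConsistent reports whether two estimates respect the invariant that
// filtering by more expressions cannot produce more rows.
func isConsistent(subsetRows, supersetRows float64) bool {
	return supersetRows <= subsetRows
}

// preferSuperset is a hypothetical reconciliation: when the estimates
// disagree, trust the superset's estimate and raise the subset's to match.
func preferSuperset(subsetRows, supersetRows float64) float64 {
	if isConsistent(subsetRows, supersetRows) {
		return subsetRows
	}
	return supersetRows
}

func main() {
	// Made-up numbers: `a > 1` estimated at 10 rows, `a > 1 AND b = 2` at 25.
	fmt.Println(isConsistent(10, 25))   // false
	fmt.Println(preferSuperset(10, 25)) // 25
}
```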

What is the type of the changes? (mandatory)

  • Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Unit test.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

No.

Does this PR affect tidb-ansible update? (mandatory)

No.

Does this PR need to be added to the release notes? (mandatory)

No.

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

PTAL @coocood @zz-jason @winoros

@lamxTyler changed the title from "plan, stats: fix inconsistent estimation" to "plan, stats: fix inconsistent row count estimation" on Aug 1, 2018
// (1): The stats type, always prefer the primary key or index.
// (2): The number of expression that it covers, the more the better.
// (3): The number of columns that it contains, the less the better.
if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) {

@lamxTyler

lamxTyler Aug 1, 2018

Author Member

This change is required by this PR: after the change in logical_plans.go, TestIndexRead would fail because index (b,c) would sometimes be chosen.
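
The selection rule under review can be sketched as a standalone Go program. This is an illustration, not the PR's code: the `candidate` type, the `pickBest` helper, and the ordering of the kind constants (with `colType` last, so that `tp < colType` means "index or primary key") are assumptions. Only the `if` condition is taken from the diff, with `bits` renamed to `covered`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Hypothetical stats kinds; the ordering is an assumption of this sketch.
const (
	indexType = iota
	pkType
	colType
)

// candidate is a simplified stand-in for the exprSet in the diff.
type candidate struct {
	name    string
	tp      int
	mask    int64 // bit i set means expression i is covered
	numCols int
}

// pickBest applies the quoted condition to choose one candidate.
// remaining marks the expressions that are not yet covered.
func pickBest(cands []candidate, remaining int64) (best candidate, found bool) {
	bestCount, bestTp, bestNumCols := 0, colType, 0
	for _, set := range cands {
		covered := bits.OnesCount64(uint64(set.mask & remaining))
		if covered == 0 {
			continue // contributes nothing new
		}
		if (bestTp == colType && set.tp < colType) || bestCount < covered ||
			(bestCount == covered && bestNumCols > set.numCols) {
			best, found = set, true
			bestCount, bestTp, bestNumCols = covered, set.tp, set.numCols
		}
	}
	return best, found
}

func main() {
	// Two expressions: bit 0 is on column b, bit 1 is on column c.
	cands := []candidate{
		{name: "idx(b,c)", tp: indexType, mask: 3, numCols: 2},
		{name: "idx(b)", tp: indexType, mask: 1, numCols: 1},
		{name: "col(c)", tp: colType, mask: 2, numCols: 1},
	}
	best, _ := pickBest(cands, 3)
	fmt.Println(best.name) // idx(b,c): it covers both expressions
}
```

When two candidates cover the same number of expressions, the third clause picks the one with fewer columns, which is the tie-break this thread is about.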

@@ -41,11 +41,13 @@ type AnalyzeExec struct {
tasks []*analyzeTask
}

// MaxBucketSize is the maximum number of bucket that a histogram could contain.
var MaxBucketSize = int64(256)

@ngaut

ngaut Aug 1, 2018

Member

This is not a good design. Do not export a variable.

@@ -357,3 +358,13 @@ func (e *AnalyzeColumnsExec) buildStats() (hists []*statistics.Histogram, cms []
}
return hists, cms, nil
}

// SetMaxBucketSize sets the `maxBucketSize`.
func SetMaxBucketSize(size int64) {

@winoros

winoros Aug 2, 2018

Member

Should this only be used in tests?
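
A minimal sketch of the pattern discussed in the two threads above: keep the variable unexported and expose a setter that tests can call. The restore function returned here is an addition of this sketch; the `SetMaxBucketSize` in the diff returns nothing.

```go
package main

import "fmt"

// maxBucketSize stays unexported so other packages cannot assign to it directly.
var maxBucketSize = int64(256)

// SetMaxBucketSize overrides the limit and returns a function that restores
// the previous value, so a test can undo its change with defer.
func SetMaxBucketSize(size int64) (restore func()) {
	old := maxBucketSize
	maxBucketSize = size
	return func() { maxBucketSize = old }
}

func main() {
	restore := SetMaxBucketSize(4)
	fmt.Println(maxBucketSize) // 4
	restore()
	fmt.Println(maxBucketSize) // 256
}
```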

Member

left a comment

lgtm

@winoros winoros added the status/LGT1 label Aug 3, 2018
@lamxTyler

Member Author

commented Aug 6, 2018

// (1): The stats type, always prefer the primary key or index.
// (2): The number of expression that it covers, the more the better.
// (3): The number of columns that it contains, the less the better.
if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) {

@coocood

coocood Aug 6, 2018

Member

I think set.tp != colType is better, because the type is not a scalar.

@winoros
winoros approved these changes Aug 6, 2018
@@ -35,6 +35,8 @@ type exprSet struct {
mask int64
// ranges contains all the ranges we got.
ranges []*ranger.Range
// numCols is the number of columns contained in the index or column(which is always 1).

@zhexuany

zhexuany Aug 6, 2018

Member

If it is always 1, why not use a const?

@lamxTyler

lamxTyler Aug 6, 2018

Author Member

It is always 1 for a column, but it can be greater than 1 for an index.

@zhexuany

zhexuany Aug 6, 2018

Member

gotcha.
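
The point settled in this thread can be shown with a short Go sketch. The constructors and the trimmed-down `exprSet` are hypothetical; the real struct carries more fields (`mask`, `ranges`, and so on).

```go
package main

import "fmt"

// exprSet here keeps only the field under discussion.
type exprSet struct {
	name    string
	numCols int
}

// newColumnSet builds a set backed by single-column stats: numCols is always 1.
func newColumnSet(col string) exprSet {
	return exprSet{name: col, numCols: 1}
}

// newIndexSet builds a set backed by an index: numCols is the index width.
func newIndexSet(name string, cols []string) exprSet {
	return exprSet{name: name, numCols: len(cols)}
}

func main() {
	fmt.Println(newColumnSet("c").numCols)                          // 1
	fmt.Println(newIndexSet("idx_bc", []string{"b", "c"}).numCols) // 2
}
```

Because the value varies with the index definition, it has to be a field rather than a constant.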

Member

left a comment

LGTM

@lamxTyler

Member Author

commented Aug 6, 2018

/run-all-tests

@lamxTyler lamxTyler added status/LGT2 and removed status/LGT1 labels Aug 6, 2018
Member

left a comment

LGTM

@zz-jason zz-jason merged commit 44e6c3c into pingcap:master Aug 6, 2018
11 checks passed

  • ci/circleci: Your tests passed on CircleCI!
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • jenkins-ci-tidb/build: Jenkins job succeeded.
  • jenkins-ci-tidb/common-test: Jenkins job succeeded.
  • jenkins-ci-tidb/integration-common-test: Jenkins job succeeded.
  • jenkins-ci-tidb/integration-compatibility-test: Jenkins job succeeded.
  • jenkins-ci-tidb/integration-ddl-test: Jenkins job succeeded.
  • jenkins-ci-tidb/mybatis-test: Jenkins job succeeded.
  • jenkins-ci-tidb/sqllogic-test: Jenkins job succeeded.
  • jenkins-ci-tidb/unit-test: Jenkins job succeeded.
  • license/cla: Contributor License Agreement is signed.

@lamxTyler lamxTyler deleted the lamxTyler:stats branch Aug 6, 2018