Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: use max correlation in heuristic row count estimation #10537

Merged
merged 9 commits into from Jun 4, 2019

Conversation

Projects
None yet
4 participants
@eurekaka
Copy link
Contributor

commented May 20, 2019

What problem does this PR solve?

For queries like

select id from tbl where asc_25 between '2020-05-16' and date_add('2020-05-17', interval 100 day) and asc_90 between '2020-05-16' and date_add('2020-05-17', interval 100 day) order by id limit 1;

column asc_90 is more decisive for the row count estimation, so we should use max correlation if conditions contain multiple columns, instead of the previous compound correlation, otherwise, tidb_opt_correlation_exp_factor may have little effect to adjust the preference of index scan.

What is changed and how it works?

  • Remove duplicate extracted columns;
  • Compute max correlation instead of compound correlation;
  • Enable heuristic estimation by default;
  • Fix bug of setting session variable for tidb_opt_correlation_exp_factor;

Check List

Tests

  • Unit test: existing tests cover part of the change
  • Integration test: would be tested in plan stability test framework

Code changes

N/A

Side effects

N/A

Related changes

N/A

eurekaka added some commits May 4, 2019

@eurekaka

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2019

/run-all-tests

1 similar comment
@eurekaka

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

/run-all-tests

@codecov

This comment has been minimized.

Copy link

commented May 23, 2019

Codecov Report

Merging #10537 into master will decrease coverage by 0.0256%.
The diff coverage is 77.7777%.

@@               Coverage Diff               @@
##            master     #10537        +/-   ##
===============================================
- Coverage   77.328%   77.3023%   -0.0257%     
===============================================
  Files          413        412         -1     
  Lines        87262      86573       -689     
===============================================
- Hits         67478      66923       -555     
+ Misses       14600      14506        -94     
+ Partials      5184       5144        -40
1 similar comment
@codecov

This comment has been minimized.

Copy link

commented May 23, 2019

Codecov Report

Merging #10537 into master will decrease coverage by 0.0256%.
The diff coverage is 77.7777%.

@@               Coverage Diff               @@
##            master     #10537        +/-   ##
===============================================
- Coverage   77.328%   77.3023%   -0.0257%     
===============================================
  Files          413        412         -1     
  Lines        87262      86573       -689     
===============================================
- Hits         67478      66923       -555     
+ Misses       14600      14506        -94     
+ Partials      5184       5144        -40
@codecov

This comment has been minimized.

Copy link

commented May 23, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@ce91d25). Click here to learn what that means.
The diff coverage is 77.7777%.

@@             Coverage Diff             @@
##             master     #10537   +/-   ##
===========================================
  Coverage          ?   78.3631%           
===========================================
  Files             ?        414           
  Lines             ?      87716           
  Branches          ?          0           
===========================================
  Hits              ?      68737           
  Misses            ?      13836           
  Partials          ?       5143

@eurekaka eurekaka marked this pull request as ready for review May 23, 2019

@eurekaka

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

/rebuild

@eurekaka eurekaka requested review from lamxTyler and zz-jason May 23, 2019

Show resolved Hide resolved sessionctx/variable/session.go
Show resolved Hide resolved planner/core/find_best_task.go Outdated

eurekaka added some commits May 27, 2019

@eurekaka eurekaka requested a review from lamxTyler May 27, 2019

@@ -804,6 +807,13 @@ func (ds *DataSource) convertToTableScan(prop *property.PhysicalProperty, candid
if prop.ExpectedCnt < ds.stats.RowCount {
count, ok, corr := ds.crossEstimateRowCount(path, prop.ExpectedCnt, candidate.isMatchProp && prop.Items[0].Desc)
if ok {
// TODO: actually, before using this count as the estimated row count of table scan, we need additionally
// check if count < row_count(first_region | last_region), and use the larger one since we build one copTask

This comment has been minimized.

Copy link
@winoros

winoros May 27, 2019

Member

Do we really need this todo? Coprocessor also breaks when it scans the first needed row?

This comment has been minimized.

Copy link
@eurekaka

eurekaka May 29, 2019

Author Contributor

Coprocessor also breaks when it scans the first needed row?

Hmm I don't understand this comment, could you please elaborate more on this? I am not saying that we modify coprocessor to return just needed rows, I mean that we should set rowCount as:

rowCount = math.Max(count, regionRowCount)
// regionRowCount is the row count of first region or last region based on ASC/DESC
@lamxTyler
Copy link
Member

left a comment

LGTM

@zz-jason
Copy link
Member

left a comment

LGTM

@zz-jason zz-jason added the status/LGT2 label Jun 4, 2019

@zz-jason zz-jason merged commit fc1da0d into pingcap:master Jun 4, 2019

8 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 77.7777% of diff hit (target 0%)
Details
codecov/project No report found to compare against
Details
idc-jenkins-ci-tidb/build Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/build_check_race Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev_2 Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details

@eurekaka eurekaka deleted the eurekaka:max_corr branch Jun 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.