planner: add aggregation hints `TIDB_HASHAGG` and `TIDB_STREAMAGG` #11364

Merged
merged 33 commits into from Aug 7, 2019

@foreyes
Contributor

commented Jul 22, 2019

What problem does this PR solve?

Add the optimizer hints TIDB_HASHAGG and TIDB_STREAMAGG.

What is changed and how it works?

Handle the hints produced by the parser, and force the planner to choose the hinted aggregation type.
Related parser PR: pingcap/parser#394

mysql> explain select count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| StreamAgg_13             | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_26        | 12500.00 | root | inner join, inner:TableReader_20, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_22       | 10000.00 | root | data:TableScan_21                                                  |
|   │ └─TableScan_21       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|     └─TableScan_19       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_HASHAGG() */ count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| HashAgg_11               | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_15        | 12500.00 | root | inner join, inner:TableReader_18, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_18       | 10000.00 | root | data:TableScan_17                                                  |
|     └─TableScan_17       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.01 sec)

mysql> explain select count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+--------------------------+----------+------+---------------------------------------------------------------------------+
| id                       | count    | task | operator info                                                             |
+--------------------------+----------+------+---------------------------------------------------------------------------+
| HashAgg_13               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|   └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|     └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|       └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+--------------------------+----------+------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_STREAMAGG() */ count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+----------------------------+----------+------+---------------------------------------------------------------------------+
| id                         | count    | task | operator info                                                             |
+----------------------------+----------+------+---------------------------------------------------------------------------+
| StreamAgg_15               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─Sort_24                  | 12500.00 | root | test.t1.a:asc                                                             |
|   └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|     ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|     │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|     └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|       └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|         └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+----------------------------+----------+------+---------------------------------------------------------------------------+
8 rows in set (0.00 sec)
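
The Sort_24 operator in the hinted plan above is there because stream aggregation needs its input ordered by the group-by column, while hash aggregation does not. The following is a minimal sketch of the two strategies for a `count` grouped by one integer key. It is not TiDB's executor code; the `group` type and both function names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// group is one output row of a "count ... group by key" query.
// Hypothetical type, used only in this sketch.
type group struct {
	key   int
	count int
}

// hashAggCount groups unsorted keys with a hash table.
// The final sort only makes the printed output deterministic.
func hashAggCount(keys []int) []group {
	counts := make(map[int]int)
	for _, k := range keys {
		counts[k]++
	}
	out := make([]group, 0, len(counts))
	for k, c := range counts {
		out = append(out, group{key: k, count: c})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

// streamAggCount assumes sortedKeys is ordered by the group-by column,
// so equal keys are adjacent and a single running group is enough.
func streamAggCount(sortedKeys []int) []group {
	var out []group
	for i, k := range sortedKeys {
		if i == 0 || k != sortedKeys[i-1] {
			out = append(out, group{key: k})
		}
		out[len(out)-1].count++
	}
	return out
}

func main() {
	keys := []int{3, 1, 3, 2, 1, 3}
	fmt.Println(hashAggCount(keys))

	sorted := append([]int(nil), keys...)
	sort.Ints(sorted) // plays the role of the Sort operator under StreamAgg
	fmt.Println(streamAggCount(sorted))
}
```

Both calls produce the same groups; the difference is that the streaming version holds one group at a time but pays for the sort when the input is not already ordered.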

Check List

Tests

  • Unit test

Code changes

  • Change the plan builder to handle aggregation hints.
  • Change the physical plan exhaustion step to apply aggregation hints.
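
As a rough illustration of the two code changes listed above, the sketch below records the hints as bit flags and then uses them to restrict which aggregation algorithms are enumerated. All names here (`aggHint`, `parseAggHints`, `candidateAggs`) are hypothetical and do not come from this PR, and the handling of conflicting hints (ignore both and warn) is an assumption of the sketch rather than confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// aggHint is a hypothetical bit set of the aggregation hints seen in a
// query block. It is not the type used in TiDB.
type aggHint uint8

const (
	preferHashAgg aggHint = 1 << iota
	preferStreamAgg
)

// parseAggHints maps hint names to flags, ignoring case.
// Names other than the two aggregation hints are skipped.
func parseAggHints(names []string) aggHint {
	var h aggHint
	for _, n := range names {
		switch strings.ToUpper(n) {
		case "TIDB_HASHAGG":
			h |= preferHashAgg
		case "TIDB_STREAMAGG":
			h |= preferStreamAgg
		}
	}
	return h
}

// candidateAggs returns the aggregation algorithms left to enumerate.
// Assumption: conflicting hints behave like no hint, plus a warning.
func candidateAggs(h aggHint) (algos []string, warning string) {
	switch h {
	case preferHashAgg:
		return []string{"HashAgg"}, ""
	case preferStreamAgg:
		return []string{"StreamAgg"}, ""
	case preferHashAgg | preferStreamAgg:
		return []string{"HashAgg", "StreamAgg"}, "conflicting aggregation hints ignored"
	default:
		return []string{"HashAgg", "StreamAgg"}, ""
	}
}

func main() {
	algos, warn := candidateAggs(parseAggHints([]string{"TIDB_STREAMAGG"}))
	fmt.Println(algos, warn)
}
```

With a single hint only one algorithm remains, so the cost model has nothing to choose between and the hinted operator appears in the plan.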

Side effects

  • Change optimizer behaviors.

Related changes

  • Add a new rule in the parser.

@codecov

commented Jul 22, 2019

Codecov Report

Merging #11364 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11364   +/-   ##
===========================================
  Coverage   81.6243%   81.6243%           
===========================================
  Files           426        426           
  Lines         93640      93640           
===========================================
  Hits          76433      76433           
  Misses        11807      11807           
  Partials       5400       5400

@foreyes foreyes force-pushed the foreyes:dev/add_agg_hints branch 2 times, most recently from ecd3e00 to ff1f239 Jul 23, 2019

@foreyes foreyes removed the status/WIP label Jul 24, 2019

@foreyes

Contributor Author

commented Jul 24, 2019

/run-all-tests

@foreyes foreyes requested review from lamxTyler and zz-jason Jul 24, 2019

@foreyes

Contributor Author

commented Jul 24, 2019

@foreyes foreyes changed the title [WIP] planner: add aggregation hints `TIDB_HASHAGG` and `TIDB_STREAMAGG` planner: add aggregation hints `TIDB_HASHAGG` and `TIDB_STREAMAGG` Jul 24, 2019

Resolved review thread on planner/core/exhaust_physical_plans.go (outdated)
Resolved review thread on planner/core/logical_plans.go (outdated)

@foreyes foreyes force-pushed the foreyes:dev/add_agg_hints branch from 37eff4e to 5accbd8 Jul 24, 2019

@foreyes

Contributor Author

commented Jul 24, 2019

/run-all-tests

Resolved review thread on planner/core/logical_plans.go (outdated)
Resolved review thread on planner/core/logical_plan_builder.go (outdated)
Resolved review thread on planner/core/exhaust_physical_plans.go (outdated)

@zz-jason zz-jason removed their request for review Jul 24, 2019

@foreyes foreyes force-pushed the foreyes:dev/add_agg_hints branch from fa37100 to 2c46284 Jul 24, 2019

@foreyes

Contributor Author

commented Jul 24, 2019

Code improved, PTAL. @zz-jason @XuHuaiyu

@XuHuaiyu
Contributor

left a comment

LGTM

@XuHuaiyu

Contributor

commented Jul 25, 2019

We should update the version of parser in go.mod before merging this commit.

@foreyes foreyes force-pushed the foreyes:dev/add_agg_hints branch 3 times, most recently from 80af97d to 74c1bf3 Jul 25, 2019

@foreyes

Contributor Author

commented Aug 5, 2019

from

This is a good case: table t is not matched because there is no join. But we can improve the warning message.

Resolved review thread on executor/join_test.go (outdated)

@foreyes

Contributor Author

commented Aug 6, 2019

@foreyes foreyes requested review from eurekaka and XuHuaiyu Aug 6, 2019

all, desc := prop.AllSameOrder()
if len(la.possibleProperties) == 0 || !all {
if !all {

@XuHuaiyu

XuHuaiyu Aug 6, 2019

Contributor

This is not resolved?

@@ -1570,3 +1570,71 @@ func (s *testPlanSuite) TestIndexJoinHint(c *C) {
}
}
}

func (s *testPlanSuite) TestAggregationHints(c *C) {

@XuHuaiyu

XuHuaiyu Aug 6, 2019

Contributor

Add a test case which contains subquery?

@foreyes

foreyes Aug 6, 2019

Author Contributor

For the first one: possibleChildProperties are only possible... We'd better not rely on them too much. I handle this in lines 1272-1275; you can take a look.

For the test cases, I will add them soon.

@foreyes

foreyes Aug 6, 2019

Author Contributor

While adding the test cases, I found another bug. I will fix it soon...

foreyes added some commits Aug 6, 2019

@foreyes

Contributor Author

commented Aug 6, 2019

Added tests and fixed a Merge Join bug, PTAL. @eurekaka @XuHuaiyu

@eurekaka
Contributor

left a comment

LGTM

@eurekaka eurekaka added the status/LGT1 label Aug 6, 2019

@XuHuaiyu
Contributor

left a comment

LGTM

@foreyes

Contributor Author

commented Aug 7, 2019

/run-all-tests

@XuHuaiyu

Contributor

commented Aug 7, 2019

I'm still curious: is this expected?
Or will it be fixed in another PR?

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a` (`a`)
);
tidb> desc select /*+ TIDB_HJ(t)  */ a, count(b) from t group by a order by a;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| Projection_22            | 8000.00  | root | test.t.a, 2_col_0                                                  |
| └─StreamAgg_24           | 8000.00  | root | group by:test.t.a, funcs:count(test.t.b), firstrow(test.t.a)       |
|   └─Projection_21        | 10000.00 | root | test.t.a, test.t.b                                                 |
|     └─IndexLookUp_20     | 10000.00 | root |                                                                    |
|       ├─IndexScan_18     | 10000.00 | cop  | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|       └─TableScan_19     | 10000.00 | cop  | table:t, keep order:false, stats:pseudo                            |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

tidb> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                               |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (t) in optimizer hint /*+ TIDB_HJ(t) */. Maybe you can use the table alias name |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@XuHuaiyu XuHuaiyu added status/LGT2 and removed status/LGT1 labels Aug 7, 2019

@foreyes foreyes merged commit a530f87 into pingcap:master Aug 7, 2019

14 checks passed

ci/circleci Your tests passed on CircleCI!
idc-jenkins-ci-tidb/build Jenkins job succeeded.
idc-jenkins-ci-tidb/build_check_race Jenkins job succeeded.
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
idc-jenkins-ci-tidb/check_dev_2 Jenkins job succeeded.
idc-jenkins-ci-tidb/common-test job succeeded
idc-jenkins-ci-tidb/integration-common-test Jenkins job succeeded.
idc-jenkins-ci-tidb/integration-compatibility-test Jenkins job succeeded.
idc-jenkins-ci-tidb/integration-ddl-test Jenkins job succeeded.
idc-jenkins-ci-tidb/mybatis-test job succeeded
idc-jenkins-ci-tidb/sqllogic-test-1 Jenkins job succeeded.
idc-jenkins-ci-tidb/sqllogic-test-2 Jenkins job succeeded.
idc-jenkins-ci-tidb/unit-test Jenkins job succeeded.
license/cla Contributor License Agreement is signed.

@foreyes foreyes deleted the foreyes:dev/add_agg_hints branch Aug 7, 2019

@foreyes

Contributor Author

commented Aug 7, 2019

I'm still curious: is this expected?
Or will it be fixed in another PR?

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a` (`a`)
);
tidb> desc select /*+ TIDB_HJ(t)  */ a, count(b) from t group by a order by a;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| Projection_22            | 8000.00  | root | test.t.a, 2_col_0                                                  |
| └─StreamAgg_24           | 8000.00  | root | group by:test.t.a, funcs:count(test.t.b), firstrow(test.t.a)       |
|   └─Projection_21        | 10000.00 | root | test.t.a, test.t.b                                                 |
|     └─IndexLookUp_20     | 10000.00 | root |                                                                    |
|       ├─IndexScan_18     | 10000.00 | cop  | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|       └─TableScan_19     | 10000.00 | cop  | table:t, keep order:false, stats:pseudo                            |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

tidb> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                               |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (t) in optimizer hint /*+ TIDB_HJ(t) */. Maybe you can use the table alias name |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I know this case. It's expected, but it looks weird; I will fix it in another PR. @XuHuaiyu
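
One way to read the warning in this case: a join hint's table list is only matched against the operands of joins, so a query with no join never consumes `t` and the name is reported as unmatched. The sketch below illustrates that reading only. `unmatchedHintTables` is a hypothetical helper, not a function in TiDB, and the real matching logic may differ.

```go
package main

import "fmt"

// unmatchedHintTables returns the hinted table names that no join consumed.
// joinTables lists the tables (or aliases) that appear as join operands;
// for a single-table query it is empty. Hypothetical helper for this sketch.
func unmatchedHintTables(hinted, joinTables []string) []string {
	seen := make(map[string]bool, len(joinTables))
	for _, t := range joinTables {
		seen[t] = true
	}
	var unmatched []string
	for _, h := range hinted {
		if !seen[h] {
			unmatched = append(unmatched, h)
		}
	}
	return unmatched
}

func main() {
	// select /*+ TIDB_HJ(t) */ ... from t group by a: there is no join.
	fmt.Println(unmatchedHintTables([]string{"t"}, nil))
	// select /*+ TIDB_HJ(t1) */ ... from t t1, t t2: t1 is a join operand.
	fmt.Println(unmatchedHintTables([]string{"t1"}, []string{"t1", "t2"}))
}
```

Under this reading the warning is technically correct for the single-table query, but its suggestion to use the table alias is misleading because no alias would match either.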
