
plan,executor: support IndexJoin over UnionScan #7877

Merged
9 commits merged on Oct 12, 2018

4 participants

eurekaka (Contributor) commented Oct 11, 2018

What problem does this PR solve?

Fix pingcap/docs-cn#862

Before this PR, IndexJoin was only applicable when its inner child was a DataSource. In the issue above, the join query runs inside a write transaction, so each DataSource is wrapped by a LogicalUnionScan placed on top of it, which made IndexJoin inapplicable as the join implementation.
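To make the applicability rule concrete, here is a minimal Go sketch of the planner-side check before and after this change. The types `LogicalPlan`, `DataSource` and `LogicalUnionScan` and the functions `canBeIndexJoinInnerBefore`/`canBeIndexJoinInnerAfter` are hypothetical, heavily simplified stand-ins, not TiDB's actual planner API; the sketch only models the plan shape described above.

```go
package main

import "fmt"

// LogicalPlan is a hypothetical, heavily simplified stand-in for a logical plan
// node; it is not TiDB's real interface.
type LogicalPlan interface {
	Children() []LogicalPlan
}

// DataSource stands in for a base-table access.
type DataSource struct{ Table string }

func (*DataSource) Children() []LogicalPlan { return nil }

// LogicalUnionScan stands in for the operator placed on top of a DataSource
// inside a write transaction so that uncommitted rows are visible.
type LogicalUnionScan struct{ Child LogicalPlan }

func (u *LogicalUnionScan) Children() []LogicalPlan { return []LogicalPlan{u.Child} }

// canBeIndexJoinInnerBefore models the old rule: only a bare DataSource
// may be the inner child of an IndexJoin.
func canBeIndexJoinInnerBefore(p LogicalPlan) bool {
	_, ok := p.(*DataSource)
	return ok
}

// canBeIndexJoinInnerAfter models the new rule: a UnionScan whose child is a
// DataSource qualifies as well.
func canBeIndexJoinInnerAfter(p LogicalPlan) bool {
	if us, ok := p.(*LogicalUnionScan); ok {
		p = us.Child // look through the UnionScan
	}
	_, ok := p.(*DataSource)
	return ok
}

func main() {
	// Inside a write transaction, t2 is read through a UnionScan.
	inner := &LogicalUnionScan{Child: &DataSource{Table: "t2"}}
	fmt.Println(canBeIndexJoinInnerBefore(inner)) // false
	fmt.Println(canBeIndexJoinInnerAfter(inner))  // true
}
```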

What is changed and how it works?

First, the executor is enhanced to support UnionScan as the inner child of IndexJoin; then IndexJoin is enabled for UnionScan in the planner.
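On the executor side, a UnionScan used as the inner child has to combine the rows read from storage for each outer key with the transaction's uncommitted changes. The Go sketch below models one such inner lookup under simplifying assumptions: rows are identified by a handle, the transaction's changes are held in a hypothetical `DirtyTable` (added or updated rows plus deleted handles), and `Row`, `DirtyTable` and `lookupInner` are illustrative names. It does not reproduce TiDB's executor code.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical inner-table row: Handle identifies the row,
// Key is the column the IndexJoin looks up.
type Row struct {
	Handle int64
	Key    int64
}

// DirtyTable is a hypothetical model of one table's uncommitted changes.
type DirtyTable struct {
	Added   map[int64]Row  // rows inserted or updated in the txn, by handle
	Deleted map[int64]bool // handles deleted in the txn
}

// lookupInner models one inner lookup for a single outer key: keep the
// snapshot rows matching the key that the txn neither deleted nor rewrote,
// then add the buffered rows matching the key.
func lookupInner(snapshot []Row, dirty DirtyTable, key int64) []Row {
	var out []Row
	for _, r := range snapshot {
		if r.Key != key || dirty.Deleted[r.Handle] {
			continue
		}
		if _, rewritten := dirty.Added[r.Handle]; rewritten {
			continue // the buffered version replaces the committed one
		}
		out = append(out, r)
	}
	for _, r := range dirty.Added {
		if r.Key == key {
			out = append(out, r)
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Handle < out[j].Handle })
	return out
}

func main() {
	snapshot := []Row{{Handle: 1, Key: 1}} // committed row
	dirty := DirtyTable{
		Added:   map[int64]Row{2: {Handle: 2, Key: 2}}, // inserted in this txn
		Deleted: map[int64]bool{},
	}
	fmt.Println(lookupInner(snapshot, dirty, 2)) // [{2 2}]
	fmt.Println(lookupInner(snapshot, dirty, 1)) // [{1 1}]
}
```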

Below is the output after this PR:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1;
Empty set (0.00 sec)

mysql> select * from t2;
+----+------+------+------+
| id | a    | b    | c    |
+----+------+------+------+
|  1 |    1 |    1 |    1 |
+----+------+------+------+
1 row in set (0.00 sec)

mysql> insert into t1 values(2,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(2,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;
+----+------+------+
| id | a    | b    |
+----+------+------+
|  2 |    2 |    2 |
+----+------+------+
1 row in set (0.00 sec)

mysql> select * from t2;
+----+------+------+------+
| id | a    | b    | c    |
+----+------+------+------+
|  1 |    1 |    1 |    1 |
|  2 |    2 |    2 |    2 |
+----+------+------+------+
2 rows in set (0.00 sec)

mysql> select /*+ TIDB_INLJ(t1, t2) */ * from t1 join t2 on t1.a = t2.id;
+----+------+------+----+------+------+------+
| id | a    | b    | id | a    | b    | c    |
+----+------+------+----+------+------+------+
|  2 |    2 |    2 |  2 |    2 |    2 |    2 |
+----+------+------+----+------+------+------+
1 row in set (0.00 sec)

mysql> explain select /*+ TIDB_INLJ(t1, t2) */ * from t1 join t2 on t1.a = t2.id;
+--------------------------+----------+------+---------------------------------------------------------------------------+
| id                       | count    | task | operator info                                                             |
+--------------------------+----------+------+---------------------------------------------------------------------------+
| IndexJoin_11             | 1.25     | root | inner join, inner:UnionScan_10, outer key:test.t1.a, inner key:test.t2.id |
| ├─UnionScan_12           | 10000.00 | root |                                                                           |
| │ └─TableReader_14       | 10000.00 | root | data:TableScan_13                                                         |
| │   └─TableScan_13       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
| └─UnionScan_10           | 0.00     | root |                                                                           |
|   └─TableReader_9        | 0.00     | root | data:TableScan_8                                                          |
|     └─TableScan_8        | 0.00     | cop  | table:t2, range: decided by [test.t1.a], keep order:false, stats:pseudo   |
+--------------------------+----------+------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)

Check List

Tests

  • Unit test
  • Manual test: check example pasted above

Future work

  • For each outer tuple of IndexJoin, we currently call buildAndSortAddedRows to sort the dirty table; we should consider caching the sorted dirty table for this IndexJoin operator.
  • For the TableScan or IndexScan below UnionScan, we now build a new range for each outer tuple. Should we apply this to UnionScan as well? If we do, we can no longer cache the sorted dirty table, so this is a tradeoff.
  • A related issue is how the IndexJoin cost is computed: should we update PhysicalIndexJoin::getCost to take the UnionScan overhead into account?
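The caching idea in the first bullet can be sketched as follows, assuming (hypothetically) that the added rows are sorted by the join key once per IndexJoin operator and then probed by binary search for every outer tuple. `sortedDirtyCache` and `rowsForKey` are illustrative names; the sketch does not reproduce buildAndSortAddedRows.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical buffered (uncommitted) row; Key is the join column.
type Row struct {
	Handle int64
	Key    int64
}

// sortedDirtyCache is a hypothetical per-IndexJoin cache: the added rows are
// sorted once, and every outer tuple reuses the sorted slice.
type sortedDirtyCache struct {
	added  []Row
	sorted []Row
	built  bool
	sorts  int // number of sorts performed, for illustration only
}

// ensureSorted sorts a copy of the added rows by (Key, Handle) on first use.
func (c *sortedDirtyCache) ensureSorted() {
	if c.built {
		return
	}
	c.sorted = append([]Row(nil), c.added...)
	sort.Slice(c.sorted, func(i, j int) bool {
		if c.sorted[i].Key != c.sorted[j].Key {
			return c.sorted[i].Key < c.sorted[j].Key
		}
		return c.sorted[i].Handle < c.sorted[j].Handle
	})
	c.built = true
	c.sorts++
}

// rowsForKey returns the buffered rows whose Key equals key by binary search
// over the cached sorted slice (the result aliases the cache).
func (c *sortedDirtyCache) rowsForKey(key int64) []Row {
	c.ensureSorted()
	lo := sort.Search(len(c.sorted), func(i int) bool { return c.sorted[i].Key >= key })
	hi := sort.Search(len(c.sorted), func(i int) bool { return c.sorted[i].Key > key })
	return c.sorted[lo:hi]
}

func main() {
	c := &sortedDirtyCache{added: []Row{{Handle: 5, Key: 2}, {Handle: 4, Key: 1}, {Handle: 6, Key: 2}}}
	for _, outerKey := range []int64{1, 2, 3} {
		fmt.Println(outerKey, c.rowsForKey(outerKey))
	}
	fmt.Println("sorts:", c.sorts) // sorts: 1
}
```

The `sorts` counter stays at 1 no matter how many outer tuples are probed. This is the caching side of the tradeoff in the second bullet: if the dirty rows were instead restricted by a range built from each outer tuple, the sorted slice could not simply be shared across tuples.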

eurekaka added some commits Oct 10, 2018

Contributor Author

eurekaka commented Oct 11, 2018

/run-all-tests

@eurekaka eurekaka removed the status/WIP label Oct 11, 2018

@eurekaka eurekaka changed the title [WIP] plan,executor: support IndexJoin over UnionScan plan,executor: support IndexJoin over UnionScan Oct 11, 2018

Contributor Author

eurekaka commented Oct 12, 2018

Member

zz-jason commented Oct 12, 2018

LGTM

@@ -199,10 +204,8 @@ func (ds *DataSource) tryToGetDualTask() (task, error) {
// findBestTask implements the PhysicalPlan interface.
// It will enumerate all the available indices and choose a plan with least cost.
func (ds *DataSource) findBestTask(prop *property.PhysicalProperty) (t task, err error) {
// If ds is an inner plan in an IndexJoin, the IndexJoin will generate an inner plan by itself.
// So here we do nothing.
// TODO: Add a special prop to handle IndexJoin's inner plan.

winoros Oct 12, 2018

Member

Keep this TODO?

eurekaka Oct 12, 2018

Author Contributor

I considered adding this special prop in this patch, but then realized that the current solution, which uses nil for a quick return, is clear enough, so I removed this TODO.

physicalUnionScan.SetChildren(scan)
scan = physicalUnionScan
}
return scan

lysu Oct 12, 2018

Member

Lines 465-473 and 516-524 could perhaps be extracted into a method; other than that, LGTM.
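The diff fragment above wraps the freshly built inner scan with a PhysicalUnionScan, and the review notes that this tail appears twice. A minimal Go sketch of such an extracted helper follows; the types `PhysicalPlan`, `basePlan`, `PhysicalUnionScan` and `LogicalUnionScan` and the helper name `wrapWithUnionScan` are hypothetical simplifications, not the code that was actually merged.

```go
package main

import "fmt"

// PhysicalPlan is a hypothetical, simplified stand-in for a physical operator.
type PhysicalPlan interface {
	Name() string
	SetChildren(children ...PhysicalPlan)
}

type basePlan struct {
	name     string
	children []PhysicalPlan
}

func (b *basePlan) Name() string                         { return b.name }
func (b *basePlan) SetChildren(children ...PhysicalPlan) { b.children = children }

// LogicalUnionScan is a hypothetical stand-in; nil means "no UnionScan".
type LogicalUnionScan struct{ Conditions []string }

// PhysicalUnionScan is a hypothetical stand-in for the physical operator.
type PhysicalUnionScan struct {
	basePlan
	Conditions []string
}

// wrapWithUnionScan is a hypothetical helper for the duplicated tail: if the
// inner side was read through a UnionScan, put a PhysicalUnionScan on top
// of the freshly built scan; otherwise return the scan unchanged.
func wrapWithUnionScan(scan PhysicalPlan, us *LogicalUnionScan) PhysicalPlan {
	if us == nil {
		return scan
	}
	physicalUnionScan := &PhysicalUnionScan{
		basePlan:   basePlan{name: "UnionScan"},
		Conditions: us.Conditions,
	}
	physicalUnionScan.SetChildren(scan)
	return physicalUnionScan
}

func main() {
	scan := &basePlan{name: "TableReader"}
	fmt.Println(wrapWithUnionScan(scan, nil).Name())                 // TableReader
	fmt.Println(wrapWithUnionScan(scan, &LogicalUnionScan{}).Name()) // UnionScan
}
```

Because a nil UnionScan leaves the scan untouched, both call sites could end with the same single call.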

Contributor Author

eurekaka commented Oct 12, 2018

@lysu comments addressed, PTAL.

Contributor Author

eurekaka commented Oct 12, 2018

/run-all-tests

Member

lysu commented Oct 12, 2018

LGTM

@lysu lysu added status/LGT2 and removed status/LGT1 labels Oct 12, 2018

@zz-jason zz-jason merged commit 5efcacb into pingcap:master Oct 12, 2018

4 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
jenkins-ci-tidb/build Jenkins job succeeded.
license/cla Contributor License Agreement is signed.

@eurekaka eurekaka deleted the eurekaka:index_join_union_scan branch Oct 12, 2018

lysu added a commit to lysu/tidb that referenced this pull request Nov 2, 2018

lysu added a commit to lysu/tidb that referenced this pull request Dec 7, 2018
