Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: add the doc for the SEMI_JOIN_REWRITE hint #9734

Merged
merged 3 commits into from
Aug 5, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
52 changes: 52 additions & 0 deletions optimizer-hints.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,58 @@ select /*+ HASH_JOIN(t1, t2) */ * from t1, t2 where t1.id = t2.id;
>
> `TIDB_HJ` is the alias for `HASH_JOIN` in TiDB 3.0.x and earlier versions. If you are using any of these versions, you must apply the `TIDB_HJ(t1_name [, tl_name ...])` syntax for the hint. For the later versions of TiDB, `TIDB_HJ` and `HASH_JOIN` are both valid names for the hint, but `HASH_JOIN` is recommended.

### SEMI_JOIN_REWRITE()

The `SEMI_JOIN_REWRITE()` hint tells the optimizer to rewrite the semi-join query to an ordinary join query. Currently, this hint only works for `EXISTS` subqueries.

If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.

Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.

When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.

{{< copyable "sql" >}}

```sql
-- Does not use SEMI_JOIN_REWRITE() to rewrite the query.
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.a = t.a);
```

```sql
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| MergeJoin_9 | 7992.00 | root | | semi join, left key:test.t.a, right key:test.t1.a |
| ├─IndexReader_25(Build) | 9990.00 | root | | index:IndexFullScan_24 |
| │ └─IndexFullScan_24 | 9990.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true, stats:pseudo |
| └─IndexReader_23(Probe) | 9990.00 | root | | index:IndexFullScan_22 |
| └─IndexFullScan_22 | 9990.00 | cop[tikv] | table:t, index:idx(a) | keep order:true, stats:pseudo |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
```

{{< copyable "sql" >}}

```sql
-- Uses SEMI_JOIN_REWRITE() to rewrite the query.
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 FROM t1 WHERE t1.a = t.a);
```

```sql
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| IndexJoin_16 | 1.25 | root | | inner join, inner:IndexReader_15, outer key:test.t1.a, inner key:test.t.a, equal cond:eq(test.t1.a, test.t.a) |
| ├─StreamAgg_39(Build) | 1.00 | root | | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a |
| │ └─IndexReader_34 | 1.00 | root | | index:IndexFullScan_33 |
| │ └─IndexFullScan_33 | 1.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true |
| └─IndexReader_15(Probe) | 1.25 | root | | index:Selection_14 |
| └─Selection_14 | 1.25 | cop[tikv] | | not(isnull(test.t.a)) |
| └─IndexRangeScan_13 | 1.25 | cop[tikv] | table:t, index:idx(a) | range: decided by [eq(test.t.a, test.t1.a)], keep order:false, stats:pseudo |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
```

From the preceding example, you can see that when using the `SEMI_JOIN_REWRITE()` hint, TiDB can select the execution method of IndexJoin based on the driving table `t1`.

### HASH_AGG()

The `HASH_AGG()` hint tells the optimizer to use the hash aggregation algorithm in all the aggregate functions in the specified query block. This algorithm allows the query to be executed concurrently with multiple threads, which achieves a higher processing speed but consumes more memory. For example:
Expand Down
8 changes: 8 additions & 0 deletions subquery-optimization.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,3 +86,11 @@ explain select * from t where exists (select * from t2);
| └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+------------------------+----------+-----------+---------------+--------------------------------+
```

In the preceding optimization, the optimizer automatically optimizes the statement execution. In addition, you can also add the [`SEMI_JOIN_REWRITE`](/optimizer-hints.md#semi_join_rewrite) hint to further rewrite the statement.

If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.

Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.

When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.