diff --git a/TOC.md b/TOC.md index 1b9131aa64184..b1d0167289759 100644 --- a/TOC.md +++ b/TOC.md @@ -121,7 +121,6 @@ + Control Execution Plan + [Optimizer Hints](/optimizer-hints.md) + [SQL Plan Management](/sql-plan-management.md) - + [Access Tables Using `IndexMerge`](/index-merge.md) + [The Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md) + Tutorials + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md) diff --git a/index-merge.md b/index-merge.md deleted file mode 100644 index 9340a64646b80..0000000000000 --- a/index-merge.md +++ /dev/null @@ -1,100 +0,0 @@ ---- -title: Access Tables Using `IndexMerge` -summary: Learn how to access tables using the `IndexMerge` query execution plan. -aliases: ['/docs/stable/index-merge/','/docs/v4.0/index-merge/','/docs/stable/reference/performance/index-merge/'] ---- - -# Access Tables Using `IndexMerge` - -`IndexMerge` is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans. - -This document introduces the applicable scenarios, a use case, and how to enable `IndexMerge`. - -## Applicable scenarios - -For each table involved in the SQL query, the TiDB optimizer during the physical optimization used to choose one of the following three access methods based on the cost estimation: - -- `TableScan`: Scans the table data, with `_tidb_rowid` as the key. -- `IndexScan`: Scans the index data, with the index column values as the key. -- `IndexLookUp`: Gets the `_tidb_rowid` set from the index, with the index column values as the key, and then retrieves the corresponding data rows of the tables. - -The above methods can use only one index per table. In some cases, the selected execution plan is not optimal. 
For example: - -{{< copyable "sql" >}} - -```sql -create table t(a int, b int, c int, unique key(a), unique key(b)); -explain select * from t where a = 1 or b = 1; -``` - -In the above query, the filter condition is a `WHERE` clause that uses `OR` as the connector. Because you can use only one index per table, `a = 1` cannot be pushed down to the index `a`; neither can `b = 1` be pushed down to the index `b`. To ensure that the result is correct, the execution plan of `TableScan` is generated for the query: - -``` -+-------------------------+----------+-----------+---------------+--------------------------------------+ -| id | estRows | task | access object | operator info | -+-------------------------+----------+-----------+---------------+--------------------------------------+ -| TableReader_7 | 8000.00 | root | | data:Selection_6 | -| └─Selection_6 | 8000.00 | cop[tikv] | | or(eq(test.t.a, 1), eq(test.t.b, 1)) | -| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | -+-------------------------+----------+-----------+---------------+--------------------------------------+ -``` - -The full table scan is inefficient when a huge volume of data exists in `t`, but the query returns only two rows at most. To handle such a scenario, `IndexMerge` is introduced in TiDB to access tables. - -## Use case - -`IndexMerge` allows the optimizer to use multiple indexes per table, and merge the results returned by each index before further operation. 
Take the [above query](#applicable-scenarios) as an example, the generated execution plan is shown as follows: - -``` -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -| id | estRows | task | access object | operator info | -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -| IndexMerge_11 | 2.00 | root | | | -| ├─IndexRangeScan_8(Build) | 1.00 | cop[tikv] | table:t, index:a(a) | range:[1,1], keep order:false, stats:pseudo | -| ├─IndexRangeScan_9(Build) | 1.00 | cop[tikv] | table:t, index:b(b) | range:[1,1], keep order:false, stats:pseudo | -| └─TableRowIDScan_10(Probe) | 2.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -``` - -The structure of the `IndexMerge` execution plan is similar to that of the `IndexLookUp`, both of which consist of index scans and full table scans. However, the index scan part of `IndexMerge` might include multiple `IndexScan`s. When the primary key index of the table is the integer type, index scans might even include `TableScan`. 
For example: - -{{< copyable "sql" >}} - -```sql -create table t(a int primary key, b int, c int, unique key(b)); -``` - -``` -Query OK, 0 rows affected (0.01 sec) -``` - -{{< copyable "sql" >}} - -```sql -explain select * from t where a = 1 or b = 1; -``` - -``` -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -| id | estRows | task | access object | operator info | -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -| IndexMerge_11 | 2.00 | root | | | -| ├─TableRangeScan_8(Build) | 1.00 | cop[tikv] | table:t | range:[1,1], keep order:false, stats:pseudo | -| ├─IndexRangeScan_9(Build) | 1.00 | cop[tikv] | table:t, index:b(b) | range:[1,1], keep order:false, stats:pseudo | -| └─TableRowIDScan_10(Probe) | 2.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | -+--------------------------------+---------+-----------+---------------------+---------------------------------------------+ -4 rows in set (0.01 sec) -``` - -Note that `IndexMerge` is used only when the optimizer cannot use a single index to access the table. If the condition in the query expression is `a = 1 and b = 1`, the optimizer uses the index `a` or the index `b`, instead of `IndexMerge`, to access the table. - -## Enable `IndexMerge` - -`IndexMerge` is disabled by default. Enable the `IndexMerge` in one of two ways: - -- Set the `tidb_enable_index_merge` system variable to `1`; -- Use the SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query. - - > **Note:** - > - > The SQL Hint has a higher priority over the system variable. 
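The two ways of enabling `IndexMerge` listed above can be sketched as follows. This is a minimal illustration, not output taken from the change itself: it reuses the table definition from the "Applicable scenarios" section, and it assumes that the unique keys received the default index names `a` and `b` (which matches the `index:a(a)` and `index:b(b)` entries in the plan shown earlier).

```sql
-- Schema from the "Applicable scenarios" section; the index names `a` and `b`
-- are the defaults assumed for the two unnamed unique keys.
CREATE TABLE t (a INT, b INT, c INT, UNIQUE KEY (a), UNIQUE KEY (b));

-- Option 1: enable IndexMerge for the current session through the system variable.
SET @@tidb_enable_index_merge = 1;
EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 1;

-- Option 2: request IndexMerge for a single statement with the optimizer hint.
-- Per the note above, the hint takes priority over the system variable.
EXPLAIN SELECT /*+ USE_INDEX_MERGE(t, a, b) */ * FROM t WHERE a = 1 OR b = 1;
```

The hint is scoped to one statement, which may be preferable when only a few queries benefit from `IndexMerge` and the session default should stay unchanged.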
diff --git a/query-execution-plan.md b/query-execution-plan.md index af7e45b52966c..527cb6cb8f9e4 100644 --- a/query-execution-plan.md +++ b/query-execution-plan.md @@ -1,16 +1,16 @@ --- title: Understand the Query Execution Plan summary: Learn about the execution plan information returned by the `EXPLAIN` statement in TiDB. -aliases: ['/docs/stable/query-execution-plan/','/docs/v4.0/query-execution-plan/','/docs/stable/reference/performance/understanding-the-query-execution-plan/'] +aliases: ['/docs/stable/query-execution-plan/','/docs/stable/reference/performance/understanding-the-query-execution-plan/','/docs/stable/index-merge/','/docs/stable/reference/performance/index-merge/','/tidb/stable/index-merge'] --- # Understand the Query Execution Plan -Based on the details of your tables, the TiDB optimizer chooses the most efficient query execution plan, which consists of a series of operators. This document details the execution plan information returned by the `EXPLAIN` statement in TiDB. +Based on the latest statistics of your tables, the TiDB optimizer chooses the most efficient query execution plan, which consists of a series of operators. This document details the execution plan in TiDB. ## `EXPLAIN` overview -The result of the `EXPLAIN` statement provides information about how TiDB executes SQL queries: +You can use the `EXPLAIN` command in TiDB to view the execution plan. The result of the `EXPLAIN` statement provides information about how TiDB executes SQL queries: - `EXPLAIN` works together with statements such as `SELECT` and `DELETE`. - When you execute the `EXPLAIN` statement, TiDB returns the final optimized physical execution plan. In other words, `EXPLAIN` displays the complete information about how TiDB executes the SQL statement, such as in which order, how tables are joined, and what the expression tree looks like. 
@@ -84,6 +84,10 @@ Currently, calculation tasks of TiDB can be divided into two categories: cop tas One of the goals of SQL optimization is to push the calculation down to TiKV as much as possible. The Coprocessor in TiKV supports most of the built-in SQL functions (including the aggregate functions and the scalar functions), SQL `LIMIT` operations, index scans, and table scans. However, all `Join` operations can only be performed as root tasks in TiDB. +### Access Object overview + +Access Object is the data item accessed by the operator, including `table`, `partition`, and `index` (if any). Only operators that directly access the data have this information. + ### Range query In the `WHERE`/`HAVING`/`ON` conditions, the TiDB optimizer analyzes the result returned by the primary key query or the index key query. For example, these conditions might include comparison operators of the numeric and date type, such as `>`, `<`, `=`, `>=`, `<=`, and the character type such as `LIKE`. @@ -153,6 +157,8 @@ The `IndexLookUp_6` operator has two child nodes: `IndexFullScan_4(Build)` and ` This execution plan is not as efficient as using `TableReader` to perform a full table scan, because `IndexLookUp` performs an extra index scan (which comes with additional overhead), apart from the table scan. +For table scan operations, the operator info column in the `explain` table shows whether the data is sorted. In the above example, the `keep order:false` in the `IndexFullScan` operator indicates that the data is unsorted. The `stats:pseudo` in the operator info means that there are no statistics, or that the statistics will not be used for estimation because they are outdated. For other scan operations, the operator info contains similar information.
+ #### `TableReader` example {{< copyable "sql" >}} @@ -178,32 +184,44 @@ In the above example, the child node of the `TableReader_7` operator is `Selecti #### `IndexMerge` example -{{< copyable "sql" >}} +`IndexMerge` is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans. ```sql -set @@tidb_enable_index_merge = 1; -explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1; +mysql> explain select * from t where a = 1 or b = 1; ++-------------------------+----------+-----------+---------------+--------------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+---------------+--------------------------------------+ +| TableReader_7 | 8000.00 | root | | data:Selection_6 | +| └─Selection_6 | 8000.00 | cop[tikv] | | or(eq(test.t.a, 1), eq(test.t.b, 1)) | +| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+---------------+--------------------------------------+ +mysql> set @@tidb_enable_index_merge = 1; +mysql> explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1; ++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+ +| id | estRows | task | access object | operator info | ++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+ +| IndexMerge_16 | 6666.67 | root | | | +| ├─IndexRangeScan_13(Build) | 3333.33 | cop[tikv] | table:t, index:idx_a(a) | range:(1,+inf], keep order:false, stats:pseudo | +| ├─IndexRangeScan_14(Build) | 3333.33 | cop[tikv] | table:t, index:idx_b(b) | range:(1,+inf], keep order:false, stats:pseudo | 
+| └─TableRowIDScan_15(Probe) | 6666.67 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+ ``` -```sql -+------------------------------+---------+-----------+-------------------------+------------------------------------------------+ -| id | estRows | task | access object | operator info | -+------------------------------+---------+-----------+-------------------------+------------------------------------------------+ -| IndexMerge_16 | 6666.67 | root | | | -| ├─IndexRangeScan_13(Build) | 3333.33 | cop[tikv] | table:t, index:idx_a(a) | range:(1,+inf], keep order:false, stats:pseudo | -| ├─IndexRangeScan_14(Build) | 3333.33 | cop[tikv] | table:t, index:idx_b(b) | range:(1,+inf], keep order:false, stats:pseudo | -| └─TableRowIDScan_15(Probe) | 6666.67 | cop[tikv] | table:t | keep order:false, stats:pseudo | -+------------------------------+---------+-----------+-------------------------+------------------------------------------------+ -4 rows in set (0.00 sec) -``` +In the above query, the filter condition is a `WHERE` clause that uses `OR` as the connector. Without `IndexMerge`, you can use only one index per table. `a = 1` cannot be pushed down to the index `a`; neither can `b = 1` be pushed down to the index `b`. The full table scan is inefficient when a huge volume of data exists in `t`. To handle such a scenario, `IndexMerge` is introduced in TiDB to access tables. -`IndexMerge` makes it possible that multiple indexes are used during table scans. In the above example, the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meet the conditions according to these `RowID`s. 
+`IndexMerge` allows the optimizer to use multiple indexes per table and merge the results returned by each index, which produces the second execution plan (`IndexMerge_16`) in the example above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meets the conditions according to these `RowID`s. + +For a scan operation that is performed on a specific range of data, such as `IndexRangeScan`/`TableRangeScan`, the `operator info` column in the result has additional information about the scan range compared with other scan operations like `IndexFullScan`/`TableFullScan`. In the above example, the `range:(1,+inf]` in the `IndexRangeScan_13` operator indicates that the operator scans the values greater than 1, up to positive infinity. > **Note:** > -> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions. +> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions. You can enable `IndexMerge` in one of two ways: +> +> - Set the `tidb_enable_index_merge` system variable to 1; > -> You can enable `IndexMerge` by configuring the `session` or `global` variables: execute the `set @@tidb_enable_index_merge = 1;` statement in the client.
+> - Use the SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query. +> +> The SQL Hint has a higher priority than the system variable. ### Read the aggregated execution plan @@ -239,6 +257,8 @@ Generally speaking, `Hash Aggregate` is executed in two stages. - One is on the Coprocessor of TiKV/TiFlash, with the intermediate results of the aggregation function calculated when the table scan operator reads the data. - The other is at the TiDB layer, with the final result calculated through aggregating the intermediate results of all Coprocessor Tasks. +The operator info column in the `explain` table also records other information about `Hash Aggregation`. Pay attention to which aggregate function `Hash Aggregation` uses. In the above example, the operator info of the `Hash Aggregation` operator is `funcs:count(Column#7)->Column#4`. It means that `Hash Aggregation` uses the aggregate function `count` for calculation. The operator info of the `Stream Aggregation` operator in the following example is the same as this one. + #### `Stream Aggregate` example The `Stream Aggregation` operator usually takes up less memory than `Hash Aggregate`. In some scenarios, `Stream Aggregation` executes faster than `Hash Aggregate`. In the case of a large amount of data or insufficient system memory, it is recommended to use the `Stream Aggregate` operator. An example is as follows: @@ -309,6 +329,8 @@ The execution process of `Hash Join` is as follows: 4. Use the data of the `Probe` side to probe the Hash Table. 5. Return qualified data to the user. +The operator info column in the `explain` table also records other information about `Hash Join`, including whether the query is an Inner Join or an Outer Join, and what the Join conditions are. In the above example, the query is an Inner Join, where the Join condition `equal:[eq(test.t1.id, test.t2.id)]` partly corresponds to the query statement `where t1.id = t2.id`.
The operator info of the other Join operators in the following examples is similar to this one. + #### `Merge Join` example The `Merge Join` operator usually uses less memory than `Hash Join`. However, `Merge Join` might take longer to be executed. When the amount of data is large, or the system memory is insufficient, it is recommended to use `Merge Join`. The following is an example: @@ -470,9 +492,14 @@ EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00 After adding the index, use `IndexScan_24` to directly read the data that meets the `start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'` condition. The estimated number of rows to be scanned decreases from 19117643.00 to 8166.73. In the test environment, the execution time of this query decreases from 50.41 seconds to 0.01 seconds. +## Operator-related system variables + +Based on MySQL, TiDB defines some special system variables and syntax to optimize performance. Some system variables are related to specific operators, such as the concurrency of the operator, the upper limit of the operator memory, and whether to use partitioned tables. These can be controlled by system variables, thereby affecting the efficiency of each operator. + ## See also * [EXPLAIN](/sql-statements/sql-statement-explain.md) * [EXPLAIN ANALYZE](/sql-statements/sql-statement-explain-analyze.md) * [ANALYZE TABLE](/sql-statements/sql-statement-analyze-table.md) * [TRACE](/sql-statements/sql-statement-trace.md) +* [System Variables](/system-variables.md) diff --git a/whats-new-in-tidb-4.0.md b/whats-new-in-tidb-4.0.md index 0fc87466b2e78..0fb74ec2d3b2a 100644 --- a/whats-new-in-tidb-4.0.md +++ b/whats-new-in-tidb-4.0.md @@ -56,7 +56,7 @@ TiUP is a new package manager tool introduced in v4.0 that is used to manage all - Add the `FLASHBACK` statement to support recovering the truncated tables. See [`Flashback Table`](/sql-statements/sql-statement-flashback-table.md) for details. 
- Support writing the intermediate results of Join and Sort to the local disk when you make queries, which avoids the Out of Memory (OOM) issue because the queries occupy excessive memory. This also improves system stability. - Optimize the output of `EXPLAIN` and `EXPLAIN ANALYZE`. More information is shown in the result, which improves troubleshooting efficiency. See [Explain Analyze](/sql-statements/sql-statement-explain-analyze.md) and [Explain](/sql-statements/sql-statement-explain.md) for details. -- Support using the Index Merge feature to access tables. When you make a query on a single table, the TiDB optimizer automatically reads multiple index data according to the query condition and makes a union of the result, which improves the performance of querying on a single table. See [Index Merge](/index-merge.md) for details. +- Support using the Index Merge feature to access tables. When you make a query on a single table, the TiDB optimizer automatically reads multiple index data according to the query condition and makes a union of the result, which improves the performance of querying on a single table. See [Index Merge](/query-execution-plan.md#indexmerge-example) for details. - Support the expression index feature (**experimental**). The expression index is also called the function-based index. When you create an index, the index fields do not have to be a specific column but can be an expression calculated from one or more columns. This feature is useful for quickly accessing the calculation-based tables. See [Expression index](/sql-statements/sql-statement-create-index.md) for details. - Support `AUTO_RANDOM` keys as an extended syntax for the TiDB columnar attribute (**experimental**). `AUTO_RANDOM` is designed to address the hotspot issue caused by the auto-increment column and provides a low-cost migration solution from MySQL for users who work with auto-increment columns. See [`AUTO_RANDOM` Key](/auto-random.md) for details. 
- Add system tables that provide information of cluster topology, configuration, logs, hardware, operating systems, and slow queries, which helps DBAs to quickly learn, analyze system metrics. See [SQL Diagnosis](/information-schema/information-schema-sql-diagnostics.md) for details.
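
To make the "union of the result" wording in the Index Merge entry concrete, the `a = 1 or b = 1` query from the moved section can be read as roughly equivalent to the rewrite below. This is only a conceptual sketch: TiDB merges row IDs inside the `IndexMerge` operator rather than rewriting the SQL, and the table definition is the one from the removed `index-merge.md`. The equivalence also relies on `a` being unique, so that deduplicating by row value (what `UNION` does) gives the same result as deduplicating by row ID.

```sql
-- Table definition taken from the removed index-merge.md.
CREATE TABLE t (a INT, b INT, c INT, UNIQUE KEY (a), UNIQUE KEY (b));

-- Original query: a single index cannot serve both branches of the OR.
SELECT * FROM t WHERE a = 1 OR b = 1;

-- Conceptual equivalent: each branch can use its own index, and UNION
-- removes the row that satisfies both conditions from appearing twice.
SELECT * FROM t WHERE a = 1
UNION
SELECT * FROM t WHERE b = 1;
```

Comparing the `EXPLAIN` output of the two forms is one way to check whether the optimizer chose `IndexMerge` for the original query.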