planner: add documents for cost model variables (#9802)
Oreoxmt committed Aug 5, 2022
1 parent d7a0b25 commit 5e05281
Showing 5 changed files with 80 additions and 0 deletions.
1 change: 1 addition & 0 deletions TOC.md
@@ -212,6 +212,7 @@
- [Statistics](/statistics.md)
- [Wrong Index Solution](/wrong-index-solution.md)
- [Distinct Optimization](/agg-distinct-optimization.md)
- [Cost Model](/cost-model.md)
- [Prepare Execution Plan Cache](/sql-prepared-plan-cache.md)
- Control Execution Plans
- [Overview](/control-execution-plan.md)
52 changes: 52 additions & 0 deletions cost-model.md
@@ -0,0 +1,52 @@
---
title: Cost Model
summary: Learn how the cost model used by TiDB works during physical optimization.
---

# Cost Model

TiDB uses a cost model to choose indexes and operators during [physical optimization](/sql-physical-optimization.md). The following diagram illustrates the process:

![CostModel](/media/cost-model.png)

TiDB calculates the access cost of each index and the execution cost of each physical operator in a plan (such as HashJoin and IndexJoin), and then chooses the plan with the lowest cost.

The following is a simplified example to explain how the cost model works. Suppose that there is a table `t`:

```sql
mysql> SHOW CREATE TABLE t;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t | CREATE TABLE `t` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
KEY `b` (`b`),
KEY `c` (`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

When executing the `SELECT * FROM t WHERE b < 100 AND c < 100` statement, suppose that TiDB estimates that 20 rows meet the `b < 100` condition and 500 rows meet the `c < 100` condition, and that the length of an `INT` index is 8. TiDB then calculates the costs of the two indexes as follows:

+ The cost of index `b` = row count of `b < 100` × length of index `b` = 20 × 8 = 160
+ The cost of index `c` = row count of `c < 100` × length of index `c` = 500 × 8 = 4000

Because the cost of index `b` is lower, TiDB chooses `b` as the index.

The preceding example is simplified and only used to explain the basic principle. In real SQL executions, the TiDB cost model is more complex.
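To see which index the optimizer actually picks for this query on your own data, you can inspect the plan with `EXPLAIN`. The following is a minimal sketch that assumes the table `t` shown above already exists and has statistics collected; the chosen index depends on the real row-count estimates, so your plan might differ from the simplified calculation above.

```sql
-- Refresh statistics so that row-count estimates reflect the current data
ANALYZE TABLE t;

-- Show the chosen plan; the `access object` column names the index used (for example, index b or index c)
EXPLAIN SELECT * FROM t WHERE b < 100 AND c < 100;
```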

## Cost Model Version 2

> **Warning:**
>
> - Cost Model Version 2 is currently an experimental feature. It is not recommended that you use it for production environments.
> - Switching the version of the cost model might cause changes to query plans.

TiDB v6.2.0 introduces Cost Model Version 2, a new cost model.

Cost Model Version 2 uses more accurate regression to calibrate the cost formulas and adjusts some of them, which makes it more accurate than Cost Model Version 1.

To switch the cost model version, set the [`tidb_cost_model_version`](/system-variables.md#tidb_cost_model_version-new-in-v620) variable.
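For example, you can try Cost Model Version 2 in a single session and compare plans before changing anything cluster-wide. This is a sketch that assumes the example table `t` from earlier in this document; because the feature is experimental, switching back to version 1 afterward is shown as well.

```sql
-- Check the cost model version used by the current session
SELECT @@SESSION.tidb_cost_model_version;

-- Switch only the current session to Cost Model Version 2 (experimental)
SET SESSION tidb_cost_model_version = 2;

-- Inspect the plan under version 2 and compare it with the plan under version 1
EXPLAIN SELECT * FROM t WHERE b < 100 AND c < 100;

-- Switch the current session back to Cost Model Version 1
SET SESSION tidb_cost_model_version = 1;
```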
Binary file added media/cost-model.png
1 change: 1 addition & 0 deletions sql-physical-optimization.md
@@ -12,3 +12,4 @@ Physical optimization is cost-based optimization, which makes a physical executi
- In [Introduction to Statistics](/statistics.md), you will learn what statistics TiDB collects to obtain the data distribution of a table.
- [Wrong Index Solution](/wrong-index-solution.md) introduces how to use the right index when you find the index is selected wrongly.
- [Distinct Optimization](/agg-distinct-optimization.md) introduces an optimization related to the `DISTINCT` keyword during physical optimization. In this section, you will learn its advantages and disadvantages and how to use it.
- [Cost Model](/cost-model.md) introduces how TiDB chooses an optimal execution plan based on the cost model during physical optimization.
26 changes: 26 additions & 0 deletions system-variables.md
@@ -681,6 +681,21 @@ MPP is a distributed computing framework provided by the TiFlash engine, which a

Constraint checking is always performed in place for pessimistic transactions (default).

### tidb_cost_model_version <span class="version-mark">New in v6.2.0</span>

> **Warning:**
>
> - Cost Model Version 2 is currently an experimental feature. It is not recommended that you use it for production environments.
> - Switching the version of the cost model might cause changes to query plans.

- Scope: SESSION | GLOBAL
- Persists to cluster: Yes
- Default value: `1`
- Value options: `1`, `2`
- TiDB v6.2.0 introduces [Cost Model Version 2](/cost-model.md#cost-model-version-2), which is more accurate than the previous version in internal tests.
- To enable Cost Model Version 2, set `tidb_cost_model_version` to `2`. If you set this variable to `1`, Cost Model Version 1 is used.
- The cost model version affects the plan decisions of the optimizer. For more details, see [Cost Model](/cost-model.md).
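Because this variable has GLOBAL scope and persists to the cluster, you can also change the default for new sessions. The following sketch shows one way to do it; note that, as with other global system variables, `SET GLOBAL` is expected to apply to sessions created afterward rather than to the session that runs it.

```sql
-- Make Cost Model Version 2 the default for new sessions across the cluster
SET GLOBAL tidb_cost_model_version = 2;

-- Verify the global value
SHOW GLOBAL VARIABLES LIKE 'tidb_cost_model_version';
```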

### tidb_current_ts

- Scope: SESSION
@@ -956,6 +971,17 @@ Constraint checking is always performed in place for pessimistic transactions (d
- This variable is used to control whether to enable TiDB mutation checker, which is a tool used to check consistency between data and indexes during the execution of DML statements. If the checker returns an error for a statement, TiDB rolls back the execution of the statement. Enabling this variable causes a slight increase in CPU usage. For more information, see [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md).
- For new clusters of v6.0.0 or later versions, the default value is `ON`. For existing clusters that upgrade from versions earlier than v6.0.0, the default value is `OFF`.

### tidb_enable_new_cost_interface <span class="version-mark">New in v6.2.0</span>

- Scope: SESSION | GLOBAL
- Persists to cluster: Yes
- Type: Boolean
- Default value: `ON`
- Value options: `OFF` and `ON`
- TiDB v6.2.0 refactors the implementation of the previous cost model. This variable controls whether to enable the refactored implementation.
- This variable is enabled by default because the refactored implementation uses the same cost formulas as before and does not change plan decisions.
- If your cluster is upgraded from v6.1 to v6.2, this variable remains `OFF`, and it is recommended that you enable it manually. If your cluster is upgraded from a version earlier than v6.1, this variable is set to `ON` by default.
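After an upgrade from v6.1, you can check the current value and enable the refactored implementation manually, as recommended above. A minimal sketch:

```sql
-- Check whether the refactored cost model implementation is enabled
SHOW GLOBAL VARIABLES LIKE 'tidb_enable_new_cost_interface';

-- Enable it for the whole cluster (the variable persists to the cluster)
SET GLOBAL tidb_enable_new_cost_interface = ON;
```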

### tidb_enable_new_only_full_group_by_check <span class="version-mark">New in v6.1.0</span>

- Scope: SESSION | GLOBAL