From d39eb5cd04db8814168e67441a31f0e3a00f0dd3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Tue, 30 Jun 2020 14:47:47 +0800
Subject: [PATCH 01/22] Create choose-index.md

---
 choose-index.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 choose-index.md

diff --git a/choose-index.md b/choose-index.md
new file mode 100644
index 0000000000000..1bb3aea713b35
--- /dev/null
+++ b/choose-index.md
@@ -0,0 +1,77 @@
+---
+title: Index Selection
+category: performance
+---
+
+# Index Selection
+
+Reading data from storage engines is one of the most time-consuming parts during the SQL execution. Up to now, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether you select a suitable index or not.
+
+This section introduces how to select a index to access a table, and some related ways to control index selection.
+
+## Access Tables
+
+Before introducing index selection, it is important to understand how TiDB accesses tables, what triggers they are, what differences they make, and what the pros and cons are.
+
+### The operators of access tables
+
+| Operator | Trigger Conditions | Applicable Scenarios | Explanations |
+| :------- | :------- | :------- | :---- |
+| PointGet / BatchPointGet | The scope of access tables is one or more single point ranges. | Any scene | If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface. |
+| TableReader | None | Any scene | It is generally considered as the least efficient operator that scan table data directly from the TiKV . It can be selected only if there is a range query on the _tidb_rowid column, or if there are no other access tables operators to choose from. |
+| TableReader | The table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column storage. If you need to calculate a small number of columns and a large number of rows, It is recommended to choose this operator. |
+| IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |
+| IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the index. | Same as IndexReader. | Since the index does not completely cover calculated columns, it needs to retrieve rows from a table after reading indexes. There is an extra cost compared to IndexReader operator. |
+
+> Note:
+>
+> The TableReader operator is based on the _tidb_rowid column index, TiFlash is a column storage index, so the choice of index is the choice of a access tables operator.
+
+## Index Selection
+
+TiDB provides a heuristic rule named Skyline-Pruning based on the cost estimation of each access tables operator.It can reduce the probability of wrong index selection caused by wrong estimation.
+
+### Skyline-Pruning
+
+Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed:
+
+- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do.
+
+- Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.
+
+- How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.
+
+For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than Idx_b, then idx_a is preferred.
+
+### Selection Based on the Cost Estimation
+
+After using the Skyline-Pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of access tables requires the following considerations:
+
+- The average length of each row of the indexed data in the storage engine.
+- The number of rows in the query range generated by the index.
+- The cost for retrieving rows from a table.
+- The number of ranges generated by index during the query executing.
+
+According to these factors and the cost model, the optimizer selects a index with the lowest cost to access the table.
+
+#### Common Problems with Cost Selection Tunning.
+
+1. The estimated number of rows is not accurate?
+
+    This is usually due to stale or inaccurate statistics. You can re-execute the analyze table or modify the parameters of the analyze table.
+
+2. Statistics are accurate, why read TiFlash faster, and the optimizer chose the TiKV?
+
+    At present, the cost model of distinguishing from TiFlash and TiKV is still rough. You can decrease the value of tidb_opt_seek_factor parameter, then the optimizer prefers to choose TiFlash.
+
+3. The statistics are accurate. One index need to retrieve rows from tables, but it actually executes faster than the index that do not retrieve rows from tables. Why select the index that do not retrieve rows from tables?
+
+    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of tidb_opt_network_factor parameter in order to reduce the cost for retrieving rows from tables.
+
+## Control Index Selection
+
+The index selection can be controlled by a single query through [Optimizer Hints](/optimizer-hints.md).
+
+- `USE_INDEX` / `IGNORE_INDEX` can force the optimizer to use / not use certain indexes.。
+
+- `READ_FROM_STORAGE` can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries.

From e05b1bfadb1440f00d3b7027860122f8d59fd3bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Tue, 30 Jun 2020 15:05:18 +0800
Subject: [PATCH 02/22] Update choose-index.md

---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 1bb3aea713b35..ebac844ae9a4b 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -13,7 +13,7 @@ This section introduces how to select a index to access a table, and some relate
 
 Before introducing index selection, it is important to understand how TiDB accesses tables, what triggers they are, what differences they make, and what the pros and cons are.
 
-### The operators of access tables
+### The Operators of Access Tables
 
 | Operator | Trigger Conditions | Applicable Scenarios | Explanations |
 | :------- | :------- | :------- | :---- |

From 5c6dc112e0e6e2f1e493fa1800ba348a99bc8116 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Tue, 30 Jun 2020 15:05:51 +0800
Subject: [PATCH 03/22] Update choose-index.md

---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index ebac844ae9a4b..4a2e0275e061d 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -54,7 +54,7 @@ After using the Skyline-Pruning rule to rule out inappropriate indexes, the sele
 
 According to these factors and the cost model, the optimizer selects a index with the lowest cost to access the table.
 
-#### Common Problems with Cost Selection Tunning.
+#### Common Problems with Cost Selection Tunning
 
 1. The estimated number of rows is not accurate?
 

From ce40a03c2816b196de92ad3a2b07e0ef62759078 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Tue, 30 Jun 2020 15:07:18 +0800
Subject: [PATCH 04/22] Update choose-index.md

---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 4a2e0275e061d..ce0e615080a30 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -72,6 +72,6 @@ According to these factors and the cost model, the optimizer selects a index wit
 
 The index selection can be controlled by a single query through [Optimizer Hints](/optimizer-hints.md).
 
-- `USE_INDEX` / `IGNORE_INDEX` can force the optimizer to use / not use certain indexes.。
+- `USE_INDEX` / `IGNORE_INDEX` can force the optimizer to use / not use certain indexes.
 
 - `READ_FROM_STORAGE` can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries.

From 995c3aa36c5030318ddca4d013ae30cf44991726 Mon Sep 17 00:00:00 2001
From: yikeke
Date: Mon, 13 Jul 2020 17:10:14 +0800
Subject: [PATCH 05/22] unify docs styles and update some format, wording issues

---
 choose-index.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/choose-index.md b/choose-index.md
index ce0e615080a30..4719d21c68f2d 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -1,19 +1,19 @@
 ---
 title: Index Selection
-category: performance
+summary: Choose the best indexes for TiDB query optimization.
 ---
 
 # Index Selection
 
-Reading data from storage engines is one of the most time-consuming parts during the SQL execution. Up to now, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether you select a suitable index or not.
+Reading data from storage engines is one of the most time-consuming steps during the SQL execution. Currently, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether you select a suitable index or not.
 
-This section introduces how to select a index to access a table, and some related ways to control index selection.
+This document introduces how to select an index to access a table, and some related ways to control index selection.
 
-## Access Tables
+## Access tables
 
-Before introducing index selection, it is important to understand how TiDB accesses tables, what triggers they are, what differences they make, and what the pros and cons are.
+Before introducing index selection, it is important to understand the ways TiDB accesses tables, what triggers each way, what differences each way makes, and what the pros and cons are.
 
-### The Operators of Access Tables
+### Operators for accessing tables
 
 | Operator | Trigger Conditions | Applicable Scenarios | Explanations |
 | :------- | :------- | :------- | :---- |
@@ -23,15 +23,15 @@ Before introducing index selection, it is important to understand how TiDB acces
 | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |
 | IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the index. | Same as IndexReader. | Since the index does not completely cover calculated columns, it needs to retrieve rows from a table after reading indexes. There is an extra cost compared to IndexReader operator. |
 
-> Note:
->
-> The TableReader operator is based on the _tidb_rowid column index, TiFlash is a column storage index, so the choice of index is the choice of a access tables operator.
+> **Note:**
+>
+> The TableReader operator is based on the `_tidb_rowid` column index, and TiFlash is a column storage index, so the choice of index is the choice of a access tables operator.
 
-## Index Selection
+## Index selection rules
 
 TiDB provides a heuristic rule named Skyline-Pruning based on the cost estimation of each access tables operator.It can reduce the probability of wrong index selection caused by wrong estimation.
 
-### Skyline-Pruning
+### Skyline-pruning
 
 Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed:
 
@@ -43,7 +43,7 @@ Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, th
 
 For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than Idx_b, then idx_a is preferred.
 
-### Selection Based on the Cost Estimation
+### Selection based on cost estimation
 
 After using the Skyline-Pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of access tables requires the following considerations:
 
@@ -68,7 +68,7 @@ According to these factors and the cost model, the optimizer selects a index wit
 
     In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of tidb_opt_network_factor parameter in order to reduce the cost for retrieving rows from tables.
 
-## Control Index Selection
+## Control index selection
 
 The index selection can be controlled by a single query through [Optimizer Hints](/optimizer-hints.md).
 

From e6d949012144efe39a4bc9f35cd4e3b1989466f6 Mon Sep 17 00:00:00 2001
From: yikeke
Date: Wed, 15 Jul 2020 11:13:32 +0800
Subject: [PATCH 06/22] Update TOC.md

---
 TOC.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/TOC.md b/TOC.md
index ce4643a991335..0f644febf9c51 100644
--- a/TOC.md
+++ b/TOC.md
@@ -101,6 +101,7 @@
   + Logic Optimization
     + [Join Reorder](/join-reorder.md)
   + Physical Optimization
+    + [Index Selection](/choose-index.md)
     + [Statistics](/statistics.md)
   + Control Execution Plan
     + [Optimizer Hints](/optimizer-hints.md)

From 935de503fde755c52e1dea556159cf743db3c751 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 17:56:00 +0800
Subject: [PATCH 07/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 4719d21c68f2d..597e858ae4def 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -54,7 +54,7 @@ After using the Skyline-Pruning rule to rule out inappropriate indexes, the sele
 
 According to these factors and the cost model, the optimizer selects a index with the lowest cost to access the table.
 
-#### Common Problems with Cost Selection Tunning
+#### Common tuning problems with cost estimation based selection
 
 1. The estimated number of rows is not accurate?
 

From 375bbf67689d38fc438850d39c02e52d523badeb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:03:36 +0800
Subject: [PATCH 08/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 597e858ae4def..19af836a5b227 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -17,7 +17,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 
 | Operator | Trigger Conditions | Applicable Scenarios | Explanations |
 | :------- | :------- | :------- | :---- |
-| PointGet / BatchPointGet | The scope of access tables is one or more single point ranges. | Any scene | If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface. |
+| PointGet / BatchPointGet | When accessing tables in one or more single point ranges. | Any scenario | If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface. |
 | TableReader | None | Any scene | It is generally considered as the least efficient operator that scan table data directly from the TiKV . It can be selected only if there is a range query on the _tidb_rowid column, or if there are no other access tables operators to choose from. |
 | TableReader | The table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column storage. If you need to calculate a small number of columns and a large number of rows, It is recommended to choose this operator. |
 | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |

From b9c31508148484036d342fd2edf232aba98c5811 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:04:16 +0800
Subject: [PATCH 09/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 19af836a5b227..76730feff579f 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -18,7 +18,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 | Operator | Trigger Conditions | Applicable Scenarios | Explanations |
 | :------- | :------- | :------- | :---- |
 | PointGet / BatchPointGet | When accessing tables in one or more single point ranges. | Any scenario | If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface. |
-| TableReader | None | Any scene | It is generally considered as the least efficient operator that scan table data directly from the TiKV . It can be selected only if there is a range query on the _tidb_rowid column, or if there are no other access tables operators to choose from. |
+| TableReader | None | Any scenario | It is generally considered as the least efficient operator that scans table data directly from the TiKV layer. It can be selected only if there is a range query on the `_tidb_rowid` column, or if there are no other operators for accessing tables to choose from. |
 | TableReader | The table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column storage. If you need to calculate a small number of columns and a large number of rows, It is recommended to choose this operator. |
 | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |
 | IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the index. | Same as IndexReader. | Since the index does not completely cover calculated columns, it needs to retrieve rows from a table after reading indexes. There is an extra cost compared to IndexReader operator. |

From 323af6b18490b75d15a87e0cb5b0ff60d409752b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:04:40 +0800
Subject: [PATCH 10/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 76730feff579f..5dc731615dcb5 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -21,7 +21,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 | TableReader | None | Any scenario | It is generally considered as the least efficient operator that scans table data directly from the TiKV layer. It can be selected only if there is a range query on the `_tidb_rowid` column, or if there are no other operators for accessing tables to choose from. |
 | TableReader | The table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column storage. If you need to calculate a small number of columns and a large number of rows, It is recommended to choose this operator. |
 | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |
-| IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the index. | Same as IndexReader. | Since the index does not completely cover calculated columns, it needs to retrieve rows from a table after reading indexes. There is an extra cost compared to IndexReader operator. |
+| IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the indexes. | Same as IndexReader. | Since the index does not completely cover calculated columns, TiDB needs to retrieve rows from a table after reading indexes. There is an extra cost compared to the IndexReader operator. |
 
 > **Note:**
 >

From c8b2d28ac93ffc064de2e2a7c320828665c06baf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:05:16 +0800
Subject: [PATCH 11/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 5dc731615dcb5..60f45242ec92e 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -19,7 +19,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 | :------- | :------- | :------- | :---- |
 | PointGet / BatchPointGet | When accessing tables in one or more single point ranges. | Any scenario | If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface. |
 | TableReader | None | Any scenario | It is generally considered as the least efficient operator that scans table data directly from the TiKV layer. It can be selected only if there is a range query on the `_tidb_rowid` column, or if there are no other operators for accessing tables to choose from. |
-| TableReader | The table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column storage. If you need to calculate a small number of columns and a large number of rows, It is recommended to choose this operator. |
+| TableReader | A table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | Tiflash is column-based storage. If you need to calculate a small number of columns and a large number of rows, it is recommended to choose this operator. |
 | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. |
 | IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the indexes. | Same as IndexReader. | Since the index does not completely cover calculated columns, TiDB needs to retrieve rows from a table after reading indexes. There is an extra cost compared to the IndexReader operator. |
 

From 8b355c1fd63cfe424257fdc0295f065f437da61a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:05:49 +0800
Subject: [PATCH 12/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 60f45242ec92e..a0663bfc806b4 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -25,7 +25,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 
 > **Note:**
 >
-> The TableReader operator is based on the `_tidb_rowid` column index, and TiFlash is a column storage index, so the choice of index is the choice of a access tables operator.
+> The TableReader operator is based on the `_tidb_rowid` column index, and TiFlash uses a column storage index, so the selection of index is the selection of an operator for accessing tables.
 
 ## Index selection rules
 

From 663fcd64e2489ae022dd71575b502463b38f3168 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:06:09 +0800
Subject: [PATCH 13/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index a0663bfc806b4..f049cac57e680 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -29,7 +29,7 @@ Before introducing index selection, it is important to understand the ways TiDB
 
 ## Index selection rules
 
-TiDB provides a heuristic rule named Skyline-Pruning based on the cost estimation of each access tables operator.It can reduce the probability of wrong index selection caused by wrong estimation.
+TiDB provides a heuristic rule named skyline-pruning based on the cost estimation of each operator for accessing tables. It can reduce the probability of wrong index selection caused by wrong estimation.
 
 ### Skyline-pruning
 

From ce7604037190ccdc6488b488730c4e9700dc9b8e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:06:28 +0800
Subject: [PATCH 14/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index f049cac57e680..21a7893347660 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -45,7 +45,7 @@ For these three dimensions, if an index named idx_a is not worse than the index
 
 ### Selection based on cost estimation
 
-After using the Skyline-Pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of access tables requires the following considerations:
+After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations:
 
 - The average length of each row of the indexed data in the storage engine.
 - The number of rows in the query range generated by the index.

From e31355651a9ca49b1542353a4247f87f76179c87 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:06:38 +0800
Subject: [PATCH 15/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 21a7893347660..382012b95f2aa 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -50,7 +50,7 @@ After using the skyline-pruning rule to rule out inappropriate indexes, the sele
 - The average length of each row of the indexed data in the storage engine.
 - The number of rows in the query range generated by the index.
 - The cost for retrieving rows from a table.
-- The number of ranges generated by index during the query executing.
+- The number of ranges generated by index during the query execution.
 
 According to these factors and the cost model, the optimizer selects a index with the lowest cost to access the table.
 

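Aside on the skyline-pruning rule that the document under review describes: the rule reads as a dominance check over three dimensions, applied before any cost comparison. A minimal Python sketch of that check follows, under stated assumptions. The `Candidate` class, its field names, and the functions `dominates` and `skyline_prune` are hypothetical names chosen for illustration and are not TiDB APIs, and counting covered access conditions is a simplification of however the optimizer actually compares them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one access path on the three dimensions."""
    name: str
    needs_table_lookup: bool  # True if rows must be fetched from the table afterwards
    satisfies_order: bool     # True if the index already provides the required order
    covered_conditions: int   # how many access conditions the indexed columns cover


def _scores(c):
    # Normalize so that a higher value is better on every dimension.
    return (not c.needs_table_lookup, c.satisfies_order, c.covered_conditions)


def dominates(a, b):
    """Return True if a is not worse than b everywhere and better somewhere."""
    pairs = list(zip(_scores(a), _scores(b)))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def skyline_prune(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Illustrative candidates (assumed values, not taken from a real plan).
idx_a = Candidate("idx_a", needs_table_lookup=False, satisfies_order=True, covered_conditions=2)
idx_b = Candidate("idx_b", needs_table_lookup=True, satisfies_order=True, covered_conditions=2)
idx_c = Candidate("idx_c", needs_table_lookup=True, satisfies_order=False, covered_conditions=3)

survivors = skyline_prune([idx_a, idx_b, idx_c])
```

In this sketch `idx_b` is pruned because `idx_a` ties it on order and coverage and avoids the table lookup, while `idx_c` survives because it covers more access conditions. Only the survivors would then be compared by estimated cost.
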
From 1dc76fd21c685bd3aff4bf416eb11b182caf190f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:06:53 +0800
Subject: [PATCH 16/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 382012b95f2aa..a761a8a9853e3 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -52,7 +52,7 @@ After using the skyline-pruning rule to rule out inappropriate indexes, the sele
 - The cost for retrieving rows from a table.
 - The number of ranges generated by index during the query execution.
 
-According to these factors and the cost model, the optimizer selects a index with the lowest cost to access the table.
+According to these factors and the cost model, the optimizer selects an index with the lowest cost to access the table.
 
 #### Common tuning problems with cost estimation based selection
 

From 954f3f31364211bd62dd8e06dd87c999dfa06c7b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:07:19 +0800
Subject: [PATCH 17/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index a761a8a9853e3..24895dd9a5508 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -58,7 +58,7 @@ According to these factors and the cost model, the optimizer selects an index wi
 
 1. The estimated number of rows is not accurate?
 
-    This is usually due to stale or inaccurate statistics. You can re-execute the analyze table or modify the parameters of the analyze table.
+    This is usually due to stale or inaccurate statistics. You can re-execute the `analyze table` statement or modify the parameters of the `analyze table` statement.
 
 2. Statistics are accurate, why read TiFlash faster, and the optimizer chose the TiKV?
 

From 1bccc7120e9a257fb3ce4bf7b0133d16f30ee8c9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:08:11 +0800
Subject: [PATCH 18/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 24895dd9a5508..6c1c2986fee15 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -60,7 +60,7 @@ According to these factors and the cost model, the optimizer selects an index wi
 
     This is usually due to stale or inaccurate statistics. You can re-execute the `analyze table` statement or modify the parameters of the `analyze table` statement.
 
-2. Statistics are accurate, why read TiFlash faster, and the optimizer chose the TiKV?
+2. Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV?
 
     At present, the cost model of distinguishing from TiFlash and TiKV is still rough. You can decrease the value of tidb_opt_seek_factor parameter, then the optimizer prefers to choose TiFlash.
 

From 3438cabe71ad9883d12819d0922f4bb878786744 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:08:44 +0800
Subject: [PATCH 19/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 6c1c2986fee15..786dcf1124c83 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -62,7 +62,7 @@ According to these factors and the cost model, the optimizer selects an index wi
 
 2. Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV?
 
-    At present, the cost model of distinguishing from TiFlash and TiKV is still rough. You can decrease the value of tidb_opt_seek_factor parameter, then the optimizer prefers to choose TiFlash.
+    At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of `tidb_opt_seek_factor` parameter, then the optimizer prefers to choose TiFlash.
 
 3. The statistics are accurate. One index need to retrieve rows from tables, but it actually executes faster than the index that do not retrieve rows from tables. Why select the index that do not retrieve rows from tables?
 

From 2da51e7ad7d859a4694f778ce20ee2451858ffa3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:09:39 +0800
Subject: [PATCH 20/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 786dcf1124c83..5924187a9d8fd 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -64,7 +64,7 @@ According to these factors and the cost model, the optimizer selects an index wi
 
     At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of `tidb_opt_seek_factor` parameter, then the optimizer prefers to choose TiFlash.
 
-3. The statistics are accurate. One index need to retrieve rows from tables, but it actually executes faster than the index that do not retrieve rows from tables. Why select the index that do not retrieve rows from tables?
+3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B that does not retrieve rows from tables. Why does the optimizer choose Index B?
 
     In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of tidb_opt_network_factor parameter in order to reduce the cost for retrieving rows from tables.
 

From 9530d322e3207455149ed669d328a220e7b89d4d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com>
Date: Wed, 15 Jul 2020 18:10:07 +0800
Subject: [PATCH 21/22] Update choose-index.md

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 choose-index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/choose-index.md b/choose-index.md
index 5924187a9d8fd..dd15b0ac925a9 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -66,7 +66,7 @@ According to these factors and the cost model, the optimizer selects an index wi
 
 3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B that does not retrieve rows from tables. Why does the optimizer choose Index B?
 
-    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of tidb_opt_network_factor parameter in order to reduce the cost for retrieving rows from tables.
+    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of `tidb_opt_network_factor` parameter to reduce the cost of retrieving rows from tables.
## Control index selection From 43e74691b585be9883e6b98d912275da895b260c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E8=8B=97=E9=9D=92=E5=88=A9?= <51319517+miaoqingli@users.noreply.github.com> Date: Wed, 15 Jul 2020 18:10:28 +0800 Subject: [PATCH 22/22] Update choose-index.md Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com> --- choose-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/choose-index.md b/choose-index.md index dd15b0ac925a9..26280ff192b7d 100644 --- a/choose-index.md +++ b/choose-index.md @@ -74,4 +74,4 @@ The index selection can be controlled by a single query through [Optimizer Hints - `USE_INDEX` / `IGNORE_INDEX` can force the optimizer to use / not use certain indexes. -- `READ_FROM_STORAGE` can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries. +- `READ_FROM_STORAGE` can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries.