Create choose-index.md #3079
Merged
25 commits
- d39eb5c Create choose-index.md (miaoqingli)
- e05b1bf Update choose-index.md (miaoqingli)
- 5c6dc11 Update choose-index.md (miaoqingli)
- ce40a03 Update choose-index.md (miaoqingli)
- c721550 Merge branch 'master' into master (miaoqingli)
- 995c3aa unify docs styles and update some format, wording issues
- e6d9490 Update TOC.md
- 935de50 Update choose-index.md (miaoqingli)
- 375bbf6 Update choose-index.md (miaoqingli)
- b9c3150 Update choose-index.md (miaoqingli)
- 323af6b Update choose-index.md (miaoqingli)
- c8b2d28 Update choose-index.md (miaoqingli)
- 8b355c1 Update choose-index.md (miaoqingli)
- 663fcd6 Update choose-index.md (miaoqingli)
- ce76040 Update choose-index.md (miaoqingli)
- e313556 Update choose-index.md (miaoqingli)
- 1dc76fd Update choose-index.md (miaoqingli)
- 954f3f3 Update choose-index.md (miaoqingli)
- 1bccc71 Update choose-index.md (miaoqingli)
- 3438cab Update choose-index.md (miaoqingli)
- 2da51e7 Update choose-index.md (miaoqingli)
- 9530d32 Update choose-index.md (miaoqingli)
- 43e7469 Update choose-index.md (miaoqingli)
- b5ef72c Merge branch 'master' into master (yikeke)
- ce493a1 Merge branch 'master' into master (yikeke)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| --- | ||
| title: Index Selection | ||
| summary: Choose the best indexes for TiDB query optimization. | ||
| --- | ||
|
|
||
| # Index Selection | ||
|
|
||
| Reading data from storage engines is one of the most time-consuming steps in SQL execution. Currently, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether a suitable index is selected. | ||
|
|
||
| This document describes how TiDB selects an index to access a table, and the ways to control index selection. | ||
|
|
||
| ## Access tables | ||
|
|
||
| Before introducing index selection, it is important to understand how TiDB accesses tables: what triggers each access method, how the methods differ, and what their pros and cons are. | ||
|
|
||
| ### Operators for accessing tables | ||
|
|
||
| | Operator | Trigger Conditions | Applicable Scenarios | Explanations | | ||
| | :------- | :------- | :------- | :---- | | ||
| | PointGet / BatchPointGet | When accessing tables in one or more single point ranges. | Any scenario | If triggered, it is usually considered the fastest operator, because it calls the kvget interface directly to perform the calculations rather than calling the coprocessor interface. | | ||
| | TableReader | None | Any scenario | It is generally considered the least efficient operator, because it scans table data directly from the TiKV layer. It is selected only if there is a range query on the `_tidb_rowid` column, or if no other operator for accessing tables is available. | | ||
| | TableReader | A table has a replica on the TiFlash node. | There are fewer columns to read, but many rows to evaluate. | TiFlash is column-based storage. If you need to calculate a small number of columns over a large number of rows, it is recommended to choose this operator. | | ||
| | IndexReader | A table has one or more indexes, and the columns needed for the calculation are included in the indexes. | When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns. | When multiple indexes exist, a reasonable index is selected based on the cost estimation. | | ||
| | IndexLookupReader | A table has one or more indexes, and the columns needed for calculation are not completely included in the indexes. | Same as IndexReader. | Since the index does not completely cover calculated columns, TiDB needs to retrieve rows from a table after reading indexes. There is an extra cost compared to the IndexReader operator. | | ||
|
|
||
| > **Note:** | ||
| > | ||
| > The TableReader operator is based on the `_tidb_rowid` column index, and TiFlash uses a column storage index, so selecting an index amounts to selecting an operator for accessing tables. | ||
|
|
||
| ## Index selection rules | ||
|
|
||
| On top of the cost estimation of each operator for accessing tables, TiDB provides a heuristic rule named skyline-pruning. This rule reduces the probability of selecting a wrong index because of inaccurate estimation. | ||
|
|
||
| ### Skyline-pruning | ||
|
|
||
| Skyline-pruning is a heuristic filtering rule for indexes. An index is judged on the following three dimensions: | ||
|
|
||
| - Whether TiDB needs to retrieve rows from the table when it uses the index to access the table (that is, whether the plan generated by the index uses the IndexReader operator or the IndexLookupReader operator). Indexes that do not require retrieving rows from the table are better on this dimension than indexes that do. | ||
|
|
||
| - Whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the order required by the query are better on this dimension than indexes that do not. | ||
|
|
||
| - How many access conditions are covered by the indexed columns. An "access condition" is a `WHERE` condition that can be converted to a column range. The more access conditions an indexed column set covers, the better the index is on this dimension. | ||
|
|
||
| If an index named `idx_a` is not worse than an index named `idx_b` on any of these three dimensions and is better than `idx_b` on at least one of them, then `idx_a` is preferred. | ||
|
|
||
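The dominance rule above can be sketched in a few lines of Python. This is a minimal illustration, not TiDB's implementation: the `Candidate` class, its field names, and the sample indexes are hypothetical, and comparing access conditions by count is a simplification taken directly from the wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate index on the three dimensions."""
    name: str
    no_table_lookup: bool        # True: rows need not be retrieved from the table
    satisfies_order: bool        # True: the index provides the required order
    access_condition_count: int  # number of access conditions the index covers


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is not worse than b on every dimension and better on at least one."""
    dims_a = (a.no_table_lookup, a.satisfies_order, a.access_condition_count)
    dims_b = (b.no_table_lookup, b.satisfies_order, b.access_condition_count)
    not_worse = all(x >= y for x, y in zip(dims_a, dims_b))
    better_somewhere = any(x > y for x, y in zip(dims_a, dims_b))
    return not_worse and better_somewhere


def skyline_prune(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates for one query.
idx_a = Candidate("idx_a", no_table_lookup=False, satisfies_order=True, access_condition_count=2)
idx_b = Candidate("idx_b", no_table_lookup=False, satisfies_order=False, access_condition_count=1)
idx_c = Candidate("idx_c", no_table_lookup=True, satisfies_order=False, access_condition_count=1)

survivors = skyline_prune([idx_a, idx_b, idx_c])
print([c.name for c in survivors])
```

In this example `idx_b` is pruned because `idx_a` is at least as good on every dimension and better on two of them. `idx_a` and `idx_c` both survive, because each is better than the other on some dimension, so the choice between them is left to cost estimation.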
| ### Selection based on cost estimation | ||
|
|
||
| After the skyline-pruning rule rules out inappropriate indexes, the selection among the remaining indexes is based entirely on the cost estimation. The cost estimation of accessing tables considers the following factors: | ||
|
|
||
| - The average length of each row of the indexed data in the storage engine. | ||
| - The number of rows in the query range generated by the index. | ||
| - The cost for retrieving rows from a table. | ||
| - The number of ranges generated by the index during the query execution. | ||
|
|
||
| According to these factors and the cost model, the optimizer selects an index with the lowest cost to access the table. | ||
|
|
||
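To make the role of the four factors concrete, here is a toy cost comparison in Python. It is only a sketch under stated assumptions: the formula, the weights `SCAN_WEIGHT`, `SEEK_WEIGHT`, and `LOOKUP_WEIGHT`, and the sample numbers are all hypothetical and do not reproduce TiDB's actual cost model.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
SCAN_WEIGHT = 1.0     # cost per byte scanned
SEEK_WEIGHT = 20.0    # cost per range that has to be located
LOOKUP_WEIGHT = 50.0  # cost per row retrieved from the table


@dataclass(frozen=True)
class AccessPath:
    """Hypothetical description of one way to access a table."""
    name: str
    avg_row_bytes: float      # average length of each row read through this path
    est_rows: float           # estimated number of rows in the query range
    range_count: int          # number of ranges generated by the index
    needs_table_lookup: bool  # True if rows must be retrieved from the table


def toy_cost(path: AccessPath) -> float:
    """Combine the four factors into a single number (toy formula)."""
    scan = path.est_rows * path.avg_row_bytes * SCAN_WEIGHT
    seek = path.range_count * SEEK_WEIGHT
    lookup = path.est_rows * LOOKUP_WEIGHT if path.needs_table_lookup else 0.0
    return scan + seek + lookup


paths = [
    AccessPath("table_scan", avg_row_bytes=100.0, est_rows=10000.0, range_count=1, needs_table_lookup=False),
    AccessPath("idx_cover", avg_row_bytes=16.0, est_rows=1000.0, range_count=1, needs_table_lookup=False),
    AccessPath("idx_lookup", avg_row_bytes=8.0, est_rows=100.0, range_count=1, needs_table_lookup=True),
]

cheapest = min(paths, key=toy_cost)
print(cheapest.name, toy_cost(cheapest))
```

With these made-up numbers the path that retrieves rows from the table is still the cheapest, because its estimated row count is small. Raising `LOOKUP_WEIGHT` or the estimated row count shifts the choice toward the covering index, which is why inaccurate row estimates or poorly tuned factors can lead to a different index being selected.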
| #### Common tuning problems with cost estimation based selection | ||
|
|
||
| 1. The estimated number of rows is not accurate? | ||
|
|
||
| This is usually due to stale or inaccurate statistics. You can re-execute the `ANALYZE TABLE` statement or modify the parameters of the `ANALYZE TABLE` statement. | ||
|
|
||
| 2. Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV? | ||
|
|
||
| At present, the cost model for distinguishing TiFlash from TiKV is still rough. You can decrease the value of the `tidb_opt_seek_factor` parameter so that the optimizer prefers TiFlash. | ||
|
|
||
| 3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B, which does not retrieve rows from tables. Why does the optimizer choose Index B? | ||
|
|
||
| In this case, the estimated cost of retrieving rows from tables may be too large. You can decrease the value of the `tidb_opt_network_factor` parameter to reduce the cost of retrieving rows from tables. | ||
|
|
||
| ## Control index selection | ||
|
|
||
| Index selection can be controlled for a single query through [Optimizer Hints](/optimizer-hints.md). | ||
|
|
||
| - `USE_INDEX` / `IGNORE_INDEX` can force the optimizer to use / not use certain indexes. | ||
|
|
||
| - `READ_FROM_STORAGE` can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries. | ||