-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide settings to prefer index scan over table scan #18808
Comments
@zz-jason If we implemented this variable, should it be picked up to |
Also for implementation notes, this can be implemented in |
If we limit the affected session/server only executes the queries which are expected to return in seconds, we may treat this option as an always good option. |
Implement it as a variable is the most clean way that user don't need to change any part of their production env(like SQL text or their codes) just modify the variable is enough. If we implement it as a hint, maybe directly a hint |
duplicated with #15623 |
I think having a generic setting to disable table scans is error prone. What if it is a very small table (such as a list of category definitions)? If it is an OLTP workload, my experience has been when users ask for such a setting, what they really want is a way of refusing really expensive queries. So for example if the estimated query cost is really high return an error and don't execute the statement. Switching from an expensive table scan to a more expensive index scan (with a double read) could make this situation worse. |
@nullnotnil This feature is not disabling table scan in all cases. It only disables table scan when there is an available index for current statement.
For users there might be no difference for: a normal SQL statement suddently begin refused to be executed , vs a normal SQL statement suddenly takes a long time to be executed -- business logic cannot continue anyway. Additionally the estimated query cost may never be correct, i.e. very low, when TiDB incorrectly selected a table scan over index scan, because that's why TiDB choose it. This option is not a performance optimization. The execution of some sql statements can be slower after this option is enabled due to double read. It is a stability improvement though, by trading off such edge case performance (when force index scan lead to double read) with stability: you will not suffer from more than 10x slower surprise in production. |
Sure. We can even backport it to release 3.0. |
I prefer using an instance-level config to control this behavior:
|
Is an instance-level config enough? |
We are grown-up and can we have all? (session + instance + global) |
There is a good use-case for all three:
Most variables should actually follow this pattern. Global (cluster-wide) is quite risky. |
setting it globally is risky. we should allow an instance-level setting to avoid changes to the existing applications. @xiaodong-ji PTAL |
@SunRunAway Can we formally provide some infrastructures / features for such A/B testing & gradually spread pattern as proposed in #18808 (comment) ? As I know there are some limitations to implement a session+instance+global variable now. |
see #19332 |
There are a lot of different ways to do this, we might have to spend some time gathering the use cases. For example, they might choose 5% randomly, or apply it to internal users first (most likely to give feedback). The easiest way for us to offer is per-instance. Second easiest would be user-specific variables, but I think it is easiest to keep out of scope for now. |
Feature Request
Is your feature request related to a problem? Please describe:
There is a notable number of stability problems caused by a SQL statement becomes slow suddenly due to a wrong execution plan that uses table scan instead of index scan. The reason is that in some cases table scan can be faster than index scan, due to no table lookup. CBO may choose wrong plan due to outdated or inaccurate statistics, which results in the risk of 10x+ slower.
Describe the feature you'd like:
Provide a global variable, a session variable (and optionally a statement hint / bind syntax) that allows optimizer to always prefer index scan over table scan, ignoring their costs (since costs can be inaccurate). The name can be
tidb_opt_always_prefer_index
.Note that prefer index scan also means prefer TiKV over TiFlash when there IS corresponding index hit. So it could impact analytical queries significantly. This should be reminded in the documentation if this feature is implemented.
MongoDB provides a similar but more aggressive option: No Table Scan
This idea is previously raised in SIG planner
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
The text was updated successfully, but these errors were encountered: