From 6bec04bd4369fa13ef984ecebd18cc45fe06b8fc Mon Sep 17 00:00:00 2001 From: Null not nil <67764674+nullnotnil@users.noreply.github.com> Date: Tue, 28 Jul 2020 00:43:09 -0600 Subject: [PATCH 1/5] cherry pick #3418 to release-3.1 Signed-off-by: ti-srebot --- faq/sql-faq.md | 241 +++++++++++++++++++ faq/tidb-faq.md | 4 +- sql-statements/sql-statement-set-variable.md | 6 +- 3 files changed, 248 insertions(+), 3 deletions(-) create mode 100644 faq/sql-faq.md diff --git a/faq/sql-faq.md b/faq/sql-faq.md new file mode 100644 index 0000000000000..791f4a1eb1edd --- /dev/null +++ b/faq/sql-faq.md @@ -0,0 +1,241 @@ +--- +title: SQL FAQs +summary: Learn about the FAQs related to TiDB SQL. +--- + +# SQL FAQs + +This document summarizes the FAQs related to SQL operations in TiDB. + +## What are the MySQL variables that TiDB is compatible with? + +See [System Variables](/system-variables.md). + +## Does TiDB support `SELECT FOR UPDATE`? + +Yes. When using pessimistic locking (the default since TiDB v3.0) the `SELECT FOR UPDATE` execution behaves similar to MySQL. + +When using optimistic locking, `SELECT FOR UPDATE` does not lock data when the transaction is started, but checks conflicts when the transaction is committed. If the check reveals conflicts, the committing transaction rolls back. + +## Can the codec of TiDB guarantee that the UTF-8 string is memcomparable? Is there any coding suggestion if our key needs to support UTF-8? + +TiDB uses the UTF-8 character set by default and currently only supports UTF-8. The string of TiDB uses the memcomparable format. + +## What is the maximum number of statements in a transaction? + +The maximum number of statements in a transaction is 5000 by default. + +## Why does the auto-increment ID of the later inserted data is smaller than that of the earlier inserted data in TiDB? + +The auto-increment ID feature in TiDB is only guaranteed to be automatically incremental and unique but is not guaranteed to be allocated sequentially. 
Currently, TiDB is allocating IDs in batches. If data is inserted into multiple TiDB servers simultaneously, the allocated IDs are not sequential. When multiple threads concurrently insert data to multiple `tidb-server` instances, the auto-increment ID of the later inserted data may be smaller. TiDB allows specifying `AUTO_INCREMENT` for the integer field, but allows only one `AUTO_INCREMENT` field in a single table. For details, see [Auto-increment ID](/mysql-compatibility.md#auto-increment-id). + +## How do I modify the `sql_mode` in TiDB? + +TiDB supports modifying the [`sql_mode`](/sql-mode.md) as a [system variable](/system-variables.md#sql_mode), as in MySQL. Currently, TiDB does not permit modifying the `sql mode` in a configuration file, but system variable changes made with [`SET GLOBAL`](/sql-statements/sql-statement-set-variable.md) propagate to all TiDB servers in the cluster and persist across restarts. + +## Error: `java.sql.BatchUpdateExecption:statement count 5001 exceeds the transaction limitation` while using Sqoop to write data into TiDB in batches + +In Sqoop, `--batch` means committing 100 statements in each batch, but by default each statement contains 100 SQL statements. So, 100 * 100 = 10000 SQL statements, which exceeds 5000, the maximum number of statements allowed in a single TiDB transaction. + +Two solutions: + +- Add the `-Dsqoop.export.records.per.statement=10` option as follows: + + {{< copyable "shell-regular" >}} + + ```bash + sqoop export \ + -Dsqoop.export.records.per.statement=10 \ + --connect jdbc:mysql://mysql.example.com/sqoop \ + --username sqoop ${user} \ + --password ${passwd} \ + --table ${tab_name} \ + --export-dir ${dir} \ + --batch + ``` + +- You can also increase the limited number of statements in a single TiDB transaction, but this will consume more memory. + +## Does TiDB have a function like the Flashback Query in Oracle? Does it support DDL? + + Yes, it does. And it supports DDL as well. 
For details, see [how TiDB reads data from history versions](/read-historical-data.md). + +## Does TiDB release space immediately after deleting data? + +None of the `DELETE`, `TRUNCATE` and `DROP` operations release data immediately. For the `TRUNCATE` and `DROP` operations, after the TiDB GC (Garbage Collection) time (10 minutes by default), the data is deleted and the space is released. For the `DELETE` operation, the data is deleted but the space is not released according to TiDB GC. When subsequent data is written into RocksDB and executes `COMPACT`, the space is reused. + +## Does TiDB support the `REPLACE INTO` syntax? + +Yes. The exception being that `LOAD DATA` does not currently support the `REPLACE INTO` syntax. + +## Why does the query speed get slow after data is deleted? + +Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the [Region Merge](/best-practices/massive-regions-best-practices.md) feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://pingcap.com/blog/2017-07-24-tidbbestpractice/#write). + +## What should I do if it is slow to reclaim storage space after deleting data? + +You can configure concurrent GC to increase the speed of reclaiming storage space. The default concurrency is 1, and you can modify it to at most 50% of the number of TiKV instances using the following command: + +{{< copyable "sql" >}} + +```sql +update mysql.tidb set VARIABLE_VALUE="3" where VARIABLE_NAME="tikv_gc_concurrency"; +``` + +## Does `SHOW PROCESSLIST` display the system process ID? + +The display content of TiDB `SHOW PROCESSLIST` is almost the same as that of MySQL `SHOW PROCESSLIST`. TiDB `show processlist` does not display the system process ID. The ID that it displays is the current session ID. 
The differences between TiDB `show processlist` and MySQL `show processlist` are as follows: + +- As TiDB is a distributed database, the `tidb-server` instance is a stateless engine for parsing and executing the SQL statements (for details, see [TiDB architecture](/tidb-architecture.md)). `show processlist` displays the session list executed in the `tidb-server` instance that the user logs in to from the MySQL client, not the list of all the sessions running in the cluster. But MySQL is a standalone database and its `show processlist` displays all the SQL statements executed in MySQL. +- The `State` column in TiDB is not continually updated during query execution. As TiDB supports parallel query, each statement may be in multiple _states_ at once, and thus it is difficult to simplify to a single value. + +## How to control or change the execution priority of SQL commits? + +TiDB supports changing the priority on a [per-session](/system-variables.md#tidb_force_priority), [global](/tidb-configuration-file.md#force-priority) or individual statement basis. Priority has the following meaning: + +- `HIGH_PRIORITY`: this statement has a high priority, that is, TiDB gives priority to this statement and executes it first. + +- `LOW_PRIORITY`: this statement has a low priority, that is, TiDB reduces the priority of this statement during the execution period. + +You can combine the above two parameters with the DML of TiDB to use them. For example: + +1. Adjust the priority by writing SQL statements in the database: + + {{< copyable "sql" >}} + + ```sql + select HIGH_PRIORITY | LOW_PRIORITY count(*) from table_name; + insert HIGH_PRIORITY | LOW_PRIORITY into table_name insert_values; + delete HIGH_PRIORITY | LOW_PRIORITY from table_name; + update HIGH_PRIORITY | LOW_PRIORITY table_reference set assignment_list where where_condition; + replace HIGH_PRIORITY | LOW_PRIORITY into table_name; + ``` + +2. The full table scan statement automatically adjusts itself to a low priority. 
`analyze` has a low priority by default. + +## What's the trigger strategy for `auto analyze` in TiDB? + +Trigger strategy: `auto analyze` is automatically triggered when the number of pieces of data in a new table reaches 1000 and this table has no write operation within one minute. + +When the modified number or the current total row number is larger than `tidb_auto_analyze_ratio`, the `analyze` statement is automatically triggered. The default value of `tidb_auto_analyze_ratio` is 0.5, indicating that this feature is enabled by default. To ensure safety, its minimum value is 0.3 when the feature is enabled, and it must be smaller than `pseudo-estimate-ratio` whose default value is 0.8, otherwise pseudo statistics will be used for a period of time. It is recommended to set `tidb_auto_analyze_ratio` to 0.5. + +## Can I use hints to override the optimizer behavior? + +TiDB supports multiple ways to override the default query optimizer behavior, including [hints](/optimizer-hints.md) and [SQL Plan Management](/sql-plan-management.md). The basic usage is similar to MySQL, with several TiDB specific extensions: + +{{< copyable "sql" >}} + +```sql +SELECT column_name FROM table_name USE INDEX(index_name)WHERE where_condition; +``` + +## Why the `Information schema is changed` error is reported? + +TiDB handles the SQL statement using the `schema` of the time and supports online asynchronous DDL change. A DML statement and a DDL statement might be executed at the same time and you must ensure that each statement is executed using the same `schema`. Therefore, when the DML operation meets the ongoing DDL operation, the `Information schema is changed` error might be reported. Some improvements have been made to prevent too many error reportings during the DML operation. 
+ +Now, there are still a few reasons for this error reporting (the latter two are unrelated to tables): + ++ Some tables involved in the DML operation are the same tables involved in the ongoing DDL operation. ++ The DML operation goes on for a long time. During this period, many DDL statements have been executed, which causes more than 1024 `schema` version changes. You can modify this default value by modifying the `tidb_max_delta_schema_count` variable. ++ The TiDB server that accepts the DML request is not able to load `schema information` for a long time (possibly caused by the connection failure between TiDB and PD or TiKV). During this period, many DDL statements have been executed, which causes more than 100 `schema` version changes. + +> **Note:** +> +> + Currently, TiDB does not cache all the `schema` version changes. +> + For each DDL operation, the number of `schema` version changes is the same with the number of corresponding `schema state` version changes. +> + Different DDL operations cause different number of `schema` version changes. For example, the `CREATE TABLE` statement causes one `schema` version change while the `ADD COLUMN` statement causes four. + +## What are the causes of the "Information schema is out of date" error? + +When executing a DML statement, if TiDB fails to load the latest schema within a DDL lease (45s by default), the `Information schema is out of date` error might occur. Possible causes are: + +- The TiDB instance that executed this DML was killed, and the transaction execution corresponding to this DML statement took longer than a DDL lease. When the transaction was committed, the error occurred. +- TiDB failed to connect to PD or TiKV while executing this DML statement. As a result, TiDB failed to load schema within a DDL lease or disconnected from PD due to the keepalive setting. + +## Error is reported when executing DDL statements under high concurrency? 
+ +When you execute DDL statements (such as creating tables in batches) under high concurrency, a very few of these statements might fail because of key conflicts during the concurrent execution. + +It is recommended to keep the number of concurrent DDL statements under 20. Otherwise, you need to retry the failed statements from the client. + +## SQL optimization + +### TiDB execution plan description + +See [Understand the Query Execution Plan](/query-execution-plan.md). + +### Statistics collection + +See [Introduction to Statistics](/statistics.md). + +### How to optimize `select count(1)`? + +The `count(1)` statement counts the total number of rows in a table. Improving the degree of concurrency can significantly improve the speed. To modify the concurrency, refer to the [document](/system-variables.md#tidb_distsql_scan_concurrency). But it also depends on the CPU and I/O resources. TiDB accesses TiKV in every query. When the amount of data is small, all MySQL is in memory, and TiDB needs to conduct a network access. + +Recommendations: + +1. Improve the hardware configuration. See [Software and Hardware Requirements](/hardware-and-software-requirements.md). +2. Improve the concurrency. The default value is 10. You can improve it to 50 and have a try. But usually the improvement is 2-4 times of the default value. +3. Test the `count` in the case of large amount of data. +4. Optimize the TiKV configuration. See [Tune TiKV Thread Performance](/tune-tikv-thread-performance.md) and [Tune TiKV Memory Performance](/tune-tikv-memory-performance.md). +5. Enable the [Coprocessor Cache](/coprocessor-cache.md). + +### How to view the progress of the current DDL job? + +You can use `admin show ddl` to view the progress of the current DDL job. The operation is as follows: + +{{< copyable "sql" >}} + +```sql +admin show ddl; +``` + +``` +*************************** 1. 
row *************************** + SCHEMA_VER: 140 + OWNER: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc +RUNNING_JOBS: ID:121, Type:add index, State:running, SchemaState:write reorganization, SchemaID:1, TableID:118, RowCount:77312, ArgLen:0, start time: 2018-12-05 16:26:10.652 +0800 CST, Err:, ErrCount:0, SnapshotVersion:404749908941733890 + SELF_ID: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc +``` + +From the above results, you can get that the `add index` operation is being processed currently. You can also get from the `RowCount` field of the `RUNNING_JOBS` column that now the `add index` operation has added 77312 rows of indexes. + +### How to view the DDL job? + +- `admin show ddl`: to view the running DDL job +- `admin show ddl jobs`: to view all the results in the current DDL job queue (including tasks that are running and waiting to run) and the last ten results in the completed DDL job queue +- `admin show ddl job queries 'job_id' [, 'job_id'] ...`: to view the original SQL statement of the DDL task corresponding to the `job_id`; the `job_id` only searches the running DDL job and the last ten results in the DDL history job queue. + +### Does TiDB support CBO (Cost-Based Optimization)? If yes, to what extent? + +Yes. TiDB uses the cost-based optimizer. The cost model and statistics are constantly optimized. TiDB also supports join algorithms like hash join and sort-merge join. + +### How to determine whether I need to execute `analyze` on a table? + +View the `Healthy` field using `show stats_healthy` and generally you need to execute `analyze` on a table when the field value is smaller than 60. + +### What is the ID rule when a query plan is presented as a tree? What is the execution order for this tree? + +No rule exists for these IDs but the IDs are unique. When IDs are generated, a counter works and adds one when one plan is generated. The execution order has nothing to do with the ID. 
The whole query plan is a tree and the execution process starts from the root node and the data is returned to the upper level continuously. For details about the query plan, see [Understanding the TiDB Query Execution Plan](/query-execution-plan.md). + +### In the TiDB query plan, `cop` tasks are in the same root. Are they executed concurrently? + +Currently the computing tasks of TiDB belong to two different types of tasks: `cop task` and `root task`. + +`cop task` is the computing task which is pushed down to the KV end for distributed execution; `root task` is the computing task for single point execution on the TiDB end. + +Generally the input data of `root task` comes from `cop task`; when `root task` processes data, `cop task` of TiKV can processes data at the same time and waits for the pull of `root task` of TiDB. Therefore, `cop` tasks can be considered as executed concurrently; but their data has an upstream and downstream relationship. During the execution process, they are executed concurrently during some time. For example, the first `cop task` is processing the data in [100, 200] and the second `cop task` is processing the data in [1, 100]. For details, see [Understanding the TiDB Query Plan](/query-execution-plan.md). + +## Database optimization + +### Edit TiDB options + +See [The TiDB Command Options](/command-line-flags-for-tidb-configuration.md). + +### How to scatter the hotspots? + +In TiDB, data is divided into Regions for management. Generally, the TiDB hotspot means the Read/Write hotspot in a Region. In TiDB, for the table whose primary key (PK) is not an integer or which has no PK, you can properly break Regions by configuring `SHARD_ROW_ID_BITS` to scatter the Region hotspots. For details, see the introduction of `SHARD_ROW_ID_BITS` in [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md). 
+ +### Tune TiKV performance + +See [Tune TiKV Thread Performance](/tune-tikv-thread-performance.md) and [Tune TiKV Memory Performance](/tune-tikv-memory-performance.md). diff --git a/faq/tidb-faq.md b/faq/tidb-faq.md index b04ad08d93d65..b50bd29e9bf4f 100644 --- a/faq/tidb-faq.md +++ b/faq/tidb-faq.md @@ -127,9 +127,9 @@ It is recommended to use the official standard statements when modifying the use The auto-increment ID feature in TiDB is only guaranteed to be automatically incremental and unique but is not guaranteed to be allocated sequentially. Currently, TiDB is allocating IDs in batches. If data is inserted into multiple TiDB servers simultaneously, the allocated IDs are not sequential. When multiple threads concurrently insert data to multiple `tidb-server` instances, the auto-increment ID of the later inserted data may be smaller. TiDB allows specifying `AUTO_INCREMENT` for the integer field, but allows only one `AUTO_INCREMENT` field in a single table. For details, see [MySQL Compatibility](/mysql-compatibility.md#auto-increment-id). -#### How to modify the `sql_mode` in TiDB except using the `set` command? +#### How do I modify the `sql_mode` in TiDB? -The configuration method of TiDB `sql_mode` is different from that of MySQL `sql_mode`. TiDB does not support using the configuration file to configure `sql\_mode` of the database; it only supports using the `set` command to configure `sql\_mode` of the database. You can use `set @@global.sql_mode = 'STRICT_TRANS_TABLES';` to configure it. +TiDB supports modifying the [`sql_mode`](/sql-mode.md) as a [system variable](/system-variables.md#sql_mode), as in MySQL. Currently, TiDB does not permit modifying the sql mode in a configuration file, but system variable changes made with [`SET GLOBAL`](/sql-statements/sql-statement-set-variable.md) propagate to all TiDB servers in the cluster and persist across restarts. 
#### Does TiDB support modifying the MySQL version string of the server to a specific one that is required by the security vulnerability scanning tool? diff --git a/sql-statements/sql-statement-set-variable.md b/sql-statements/sql-statement-set-variable.md index bdbfeed991c1d..81514391ef40a 100644 --- a/sql-statements/sql-statement-set-variable.md +++ b/sql-statements/sql-statement-set-variable.md @@ -66,7 +66,11 @@ mysql> SHOW SESSION VARIABLES LIKE 'sql_mode'; ## MySQL compatibility -This statement is understood to be fully compatible with MySQL. Any compatibility differences should be [reported via an issue](/report-issue.md) on GitHub. +The following behavior differences apply: + +* Changes made with `SET GLOBAL` will be propagated to all TiDB instances in the cluster. This differs from MySQL, where changes do not propagate to replicas. +* TiDB presents several variables as both readable and settable. This is required for MySQL compatibility, because it is common for both applications and connectors to read MySQL variables. For example: JDBC connectors both read and set query cache settings, despite not relying on the behavior. +* Changes made with `SET GLOBAL` will persist through TiDB server restarts. This means that `SET GLOBAL` in TiDB behaves more similar to `SET PERSIST` as available in MySQL 8.0 and above. ## See also From 83fd51bf045474cf7122ef2c1818f97bafc1c42b Mon Sep 17 00:00:00 2001 From: Null not nil <67764674+nullnotnil@users.noreply.github.com> Date: Tue, 28 Jul 2020 14:52:48 -0600 Subject: [PATCH 2/5] Delete sql-faq.md --- faq/sql-faq.md | 241 ------------------------------------------------- 1 file changed, 241 deletions(-) delete mode 100644 faq/sql-faq.md diff --git a/faq/sql-faq.md b/faq/sql-faq.md deleted file mode 100644 index 791f4a1eb1edd..0000000000000 --- a/faq/sql-faq.md +++ /dev/null @@ -1,241 +0,0 @@ ---- -title: SQL FAQs -summary: Learn about the FAQs related to TiDB SQL. 
---- - -# SQL FAQs - -This document summarizes the FAQs related to SQL operations in TiDB. - -## What are the MySQL variables that TiDB is compatible with? - -See [System Variables](/system-variables.md). - -## Does TiDB support `SELECT FOR UPDATE`? - -Yes. When using pessimistic locking (the default since TiDB v3.0) the `SELECT FOR UPDATE` execution behaves similar to MySQL. - -When using optimistic locking, `SELECT FOR UPDATE` does not lock data when the transaction is started, but checks conflicts when the transaction is committed. If the check reveals conflicts, the committing transaction rolls back. - -## Can the codec of TiDB guarantee that the UTF-8 string is memcomparable? Is there any coding suggestion if our key needs to support UTF-8? - -TiDB uses the UTF-8 character set by default and currently only supports UTF-8. The string of TiDB uses the memcomparable format. - -## What is the maximum number of statements in a transaction? - -The maximum number of statements in a transaction is 5000 by default. - -## Why does the auto-increment ID of the later inserted data is smaller than that of the earlier inserted data in TiDB? - -The auto-increment ID feature in TiDB is only guaranteed to be automatically incremental and unique but is not guaranteed to be allocated sequentially. Currently, TiDB is allocating IDs in batches. If data is inserted into multiple TiDB servers simultaneously, the allocated IDs are not sequential. When multiple threads concurrently insert data to multiple `tidb-server` instances, the auto-increment ID of the later inserted data may be smaller. TiDB allows specifying `AUTO_INCREMENT` for the integer field, but allows only one `AUTO_INCREMENT` field in a single table. For details, see [Auto-increment ID](/mysql-compatibility.md#auto-increment-id). - -## How do I modify the `sql_mode` in TiDB? - -TiDB supports modifying the [`sql_mode`](/sql-mode.md) as a [system variable](/system-variables.md#sql_mode), as in MySQL. 
Currently, TiDB does not permit modifying the `sql mode` in a configuration file, but system variable changes made with [`SET GLOBAL`](/sql-statements/sql-statement-set-variable.md) propagate to all TiDB servers in the cluster and persist across restarts. - -## Error: `java.sql.BatchUpdateExecption:statement count 5001 exceeds the transaction limitation` while using Sqoop to write data into TiDB in batches - -In Sqoop, `--batch` means committing 100 statements in each batch, but by default each statement contains 100 SQL statements. So, 100 * 100 = 10000 SQL statements, which exceeds 5000, the maximum number of statements allowed in a single TiDB transaction. - -Two solutions: - -- Add the `-Dsqoop.export.records.per.statement=10` option as follows: - - {{< copyable "shell-regular" >}} - - ```bash - sqoop export \ - -Dsqoop.export.records.per.statement=10 \ - --connect jdbc:mysql://mysql.example.com/sqoop \ - --username sqoop ${user} \ - --password ${passwd} \ - --table ${tab_name} \ - --export-dir ${dir} \ - --batch - ``` - -- You can also increase the limited number of statements in a single TiDB transaction, but this will consume more memory. - -## Does TiDB have a function like the Flashback Query in Oracle? Does it support DDL? - - Yes, it does. And it supports DDL as well. For details, see [how TiDB reads data from history versions](/read-historical-data.md). - -## Does TiDB release space immediately after deleting data? - -None of the `DELETE`, `TRUNCATE` and `DROP` operations release data immediately. For the `TRUNCATE` and `DROP` operations, after the TiDB GC (Garbage Collection) time (10 minutes by default), the data is deleted and the space is released. For the `DELETE` operation, the data is deleted but the space is not released according to TiDB GC. When subsequent data is written into RocksDB and executes `COMPACT`, the space is reused. - -## Does TiDB support the `REPLACE INTO` syntax? - -Yes. 
The exception being that `LOAD DATA` does not currently support the `REPLACE INTO` syntax. - -## Why does the query speed get slow after data is deleted? - -Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the [Region Merge](/best-practices/massive-regions-best-practices.md) feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://pingcap.com/blog/2017-07-24-tidbbestpractice/#write). - -## What should I do if it is slow to reclaim storage space after deleting data? - -You can configure concurrent GC to increase the speed of reclaiming storage space. The default concurrency is 1, and you can modify it to at most 50% of the number of TiKV instances using the following command: - -{{< copyable "sql" >}} - -```sql -update mysql.tidb set VARIABLE_VALUE="3" where VARIABLE_NAME="tikv_gc_concurrency"; -``` - -## Does `SHOW PROCESSLIST` display the system process ID? - -The display content of TiDB `SHOW PROCESSLIST` is almost the same as that of MySQL `SHOW PROCESSLIST`. TiDB `show processlist` does not display the system process ID. The ID that it displays is the current session ID. The differences between TiDB `show processlist` and MySQL `show processlist` are as follows: - -- As TiDB is a distributed database, the `tidb-server` instance is a stateless engine for parsing and executing the SQL statements (for details, see [TiDB architecture](/tidb-architecture.md)). `show processlist` displays the session list executed in the `tidb-server` instance that the user logs in to from the MySQL client, not the list of all the sessions running in the cluster. But MySQL is a standalone database and its `show processlist` displays all the SQL statements executed in MySQL. -- The `State` column in TiDB is not continually updated during query execution. 
As TiDB supports parallel query, each statement may be in multiple _states_ at once, and thus it is difficult to simplify to a single value. - -## How to control or change the execution priority of SQL commits? - -TiDB supports changing the priority on a [per-session](/system-variables.md#tidb_force_priority), [global](/tidb-configuration-file.md#force-priority) or individual statement basis. Priority has the following meaning: - -- `HIGH_PRIORITY`: this statement has a high priority, that is, TiDB gives priority to this statement and executes it first. - -- `LOW_PRIORITY`: this statement has a low priority, that is, TiDB reduces the priority of this statement during the execution period. - -You can combine the above two parameters with the DML of TiDB to use them. For example: - -1. Adjust the priority by writing SQL statements in the database: - - {{< copyable "sql" >}} - - ```sql - select HIGH_PRIORITY | LOW_PRIORITY count(*) from table_name; - insert HIGH_PRIORITY | LOW_PRIORITY into table_name insert_values; - delete HIGH_PRIORITY | LOW_PRIORITY from table_name; - update HIGH_PRIORITY | LOW_PRIORITY table_reference set assignment_list where where_condition; - replace HIGH_PRIORITY | LOW_PRIORITY into table_name; - ``` - -2. The full table scan statement automatically adjusts itself to a low priority. `analyze` has a low priority by default. - -## What's the trigger strategy for `auto analyze` in TiDB? - -Trigger strategy: `auto analyze` is automatically triggered when the number of pieces of data in a new table reaches 1000 and this table has no write operation within one minute. - -When the modified number or the current total row number is larger than `tidb_auto_analyze_ratio`, the `analyze` statement is automatically triggered. The default value of `tidb_auto_analyze_ratio` is 0.5, indicating that this feature is enabled by default. 
To ensure safety, its minimum value is 0.3 when the feature is enabled, and it must be smaller than `pseudo-estimate-ratio` whose default value is 0.8, otherwise pseudo statistics will be used for a period of time. It is recommended to set `tidb_auto_analyze_ratio` to 0.5. - -## Can I use hints to override the optimizer behavior? - -TiDB supports multiple ways to override the default query optimizer behavior, including [hints](/optimizer-hints.md) and [SQL Plan Management](/sql-plan-management.md). The basic usage is similar to MySQL, with several TiDB specific extensions: - -{{< copyable "sql" >}} - -```sql -SELECT column_name FROM table_name USE INDEX(index_name)WHERE where_condition; -``` - -## Why the `Information schema is changed` error is reported? - -TiDB handles the SQL statement using the `schema` of the time and supports online asynchronous DDL change. A DML statement and a DDL statement might be executed at the same time and you must ensure that each statement is executed using the same `schema`. Therefore, when the DML operation meets the ongoing DDL operation, the `Information schema is changed` error might be reported. Some improvements have been made to prevent too many error reportings during the DML operation. - -Now, there are still a few reasons for this error reporting (the latter two are unrelated to tables): - -+ Some tables involved in the DML operation are the same tables involved in the ongoing DDL operation. -+ The DML operation goes on for a long time. During this period, many DDL statements have been executed, which causes more than 1024 `schema` version changes. You can modify this default value by modifying the `tidb_max_delta_schema_count` variable. -+ The TiDB server that accepts the DML request is not able to load `schema information` for a long time (possibly caused by the connection failure between TiDB and PD or TiKV). 
During this period, many DDL statements have been executed, which causes more than 100 `schema` version changes. - -> **Note:** -> -> + Currently, TiDB does not cache all the `schema` version changes. -> + For each DDL operation, the number of `schema` version changes is the same with the number of corresponding `schema state` version changes. -> + Different DDL operations cause different number of `schema` version changes. For example, the `CREATE TABLE` statement causes one `schema` version change while the `ADD COLUMN` statement causes four. - -## What are the causes of the "Information schema is out of date" error? - -When executing a DML statement, if TiDB fails to load the latest schema within a DDL lease (45s by default), the `Information schema is out of date` error might occur. Possible causes are: - -- The TiDB instance that executed this DML was killed, and the transaction execution corresponding to this DML statement took longer than a DDL lease. When the transaction was committed, the error occurred. -- TiDB failed to connect to PD or TiKV while executing this DML statement. As a result, TiDB failed to load schema within a DDL lease or disconnected from PD due to the keepalive setting. - -## Error is reported when executing DDL statements under high concurrency? - -When you execute DDL statements (such as creating tables in batches) under high concurrency, a very few of these statements might fail because of key conflicts during the concurrent execution. - -It is recommended to keep the number of concurrent DDL statements under 20. Otherwise, you need to retry the failed statements from the client. - -## SQL optimization - -### TiDB execution plan description - -See [Understand the Query Execution Plan](/query-execution-plan.md). - -### Statistics collection - -See [Introduction to Statistics](/statistics.md). - -### How to optimize `select count(1)`? - -The `count(1)` statement counts the total number of rows in a table. 
Increasing the degree of concurrency can significantly improve the speed. To modify the concurrency, refer to the [document](/system-variables.md#tidb_distsql_scan_concurrency). But it also depends on the CPU and I/O resources. TiDB accesses TiKV in every query. When the amount of data is small, MySQL can serve all of it from memory, whereas TiDB still needs to perform a network access. - -Recommendations: - -1. Improve the hardware configuration. See [Software and Hardware Requirements](/hardware-and-software-requirements.md). -2. Increase the concurrency. The default value is 10. You can try raising it to 50, but usually the value is raised to 2-4 times the default. -3. Test `count` with a large amount of data. -4. Optimize the TiKV configuration. See [Tune TiKV Thread Performance](/tune-tikv-thread-performance.md) and [Tune TiKV Memory Performance](/tune-tikv-memory-performance.md). -5. Enable the [Coprocessor Cache](/coprocessor-cache.md). - -### How to view the progress of the current DDL job? - -You can use `admin show ddl` to view the progress of the current DDL job. The operation is as follows: - -{{< copyable "sql" >}} - -```sql -admin show ddl; -``` - -``` -*************************** 1. row *************************** - SCHEMA_VER: 140 - OWNER: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc -RUNNING_JOBS: ID:121, Type:add index, State:running, SchemaState:write reorganization, SchemaID:1, TableID:118, RowCount:77312, ArgLen:0, start time: 2018-12-05 16:26:10.652 +0800 CST, Err:, ErrCount:0, SnapshotVersion:404749908941733890 - SELF_ID: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc -``` - -The output above shows that the `add index` operation is currently being processed. The `RowCount` field of the `RUNNING_JOBS` column shows that the `add index` operation has added 77312 rows of indexes so far. - -### How to view the DDL job?
- -- `admin show ddl`: to view the running DDL job -- `admin show ddl jobs`: to view all the results in the current DDL job queue (including tasks that are running and waiting to run) and the last ten results in the completed DDL job queue -- `admin show ddl job queries 'job_id' [, 'job_id'] ...`: to view the original SQL statement of the DDL task corresponding to the `job_id`; the search for the `job_id` covers only the running DDL job and the last ten results in the DDL history job queue. - -### Does TiDB support CBO (Cost-Based Optimization)? If yes, to what extent? - -Yes. TiDB uses a cost-based optimizer. The cost model and statistics are constantly optimized. TiDB also supports join algorithms like hash join and sort-merge join. - -### How to determine whether I need to execute `analyze` on a table? - -View the `Healthy` field using `show stats_healthy`. Generally, you need to execute `analyze` on a table when the field value is smaller than 60. - -### What is the ID rule when a query plan is presented as a tree? What is the execution order for this tree? - -No rule exists for these IDs, but the IDs are unique. IDs are generated by a counter that adds one each time a plan is generated. The execution order has nothing to do with the ID. The whole query plan is a tree: execution starts from the root node, and data is returned to the upper level continuously. For details about the query plan, see [Understanding the TiDB Query Execution Plan](/query-execution-plan.md). - -### In the TiDB query plan, `cop` tasks are in the same root. Are they executed concurrently? - -Currently the computing tasks of TiDB belong to two different types of tasks: `cop task` and `root task`. - -`cop task` is the computing task which is pushed down to the KV end for distributed execution; `root task` is the computing task for single point execution on the TiDB end.
- -Generally the input data of `root task` comes from `cop task`; when `root task` processes data, `cop task` of TiKV can process data at the same time and waits for the pull of `root task` of TiDB. Therefore, `cop` tasks can be considered as executed concurrently, but their data has an upstream and downstream relationship. During execution, they run concurrently for part of the time. For example, the first `cop task` is processing the data in [100, 200] and the second `cop task` is processing the data in [1, 100]. For details, see [Understanding the TiDB Query Plan](/query-execution-plan.md). - -## Database optimization - -### Edit TiDB options - -See [The TiDB Command Options](/command-line-flags-for-tidb-configuration.md). - -### How to scatter the hotspots? - -In TiDB, data is divided into Regions for management. Generally, the TiDB hotspot means the Read/Write hotspot in a Region. In TiDB, for the table whose primary key (PK) is not an integer or which has no PK, you can properly break Regions by configuring `SHARD_ROW_ID_BITS` to scatter the Region hotspots. For details, see the introduction of `SHARD_ROW_ID_BITS` in [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md). - -### Tune TiKV performance - -See [Tune TiKV Thread Performance](/tune-tikv-thread-performance.md) and [Tune TiKV Memory Performance](/tune-tikv-memory-performance.md). From e4395b2881358a881c60c33999caf0332dd60419 Mon Sep 17 00:00:00 2001 From: Null not nil <67764674+nullnotnil@users.noreply.github.com> Date: Tue, 28 Jul 2020 15:01:04 -0600 Subject: [PATCH 3/5] Update tidb-faq.md --- faq/tidb-faq.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/faq/tidb-faq.md b/faq/tidb-faq.md index b50bd29e9bf4f..65059d3d0fe29 100644 --- a/faq/tidb-faq.md +++ b/faq/tidb-faq.md @@ -86,9 +86,11 @@ Currently, [TiDB documentation](https://pingcap.com/docs/) is the most important See [The System Variables](/system-variables.md).
-#### Does TiDB support `select for update`? +#### Does TiDB support `SELECT FOR UPDATE`? -Yes. But it differs from MySQL in syntax. As a distributed database, TiDB uses the optimistic lock. `select for update` does not lock data when the transaction is started, but checks conflicts when the transaction is committed. If the check reveals conflicts, the committing transaction rolls back. +Yes. When using pessimistic locking (the default since TiDB v3.0) the `SELECT FOR UPDATE` execution behaves similar to MySQL. + +When using optimistic locking, `SELECT FOR UPDATE` does not lock data when the transaction is started, but checks conflicts when the transaction is committed. If the check reveals conflicts, the committing transaction rolls back. #### Can the codec of TiDB guarantee that the UTF-8 string is memcomparable? Is there any coding suggestion if our key needs to support UTF-8? @@ -590,9 +592,9 @@ Trigger strategy: `auto analyze` is automatically triggered when the number of p When the modified number or the current total row number is larger than `tidb_auto_analyze_ratio`, the `analyze` statement is automatically triggered. The default value of `tidb_auto_analyze_ratio` is 0.5, indicating that this feature is enabled by default. To ensure safety, its minimum value is 0.3 when the feature is enabled, and it must be smaller than `pseudo-estimate-ratio` whose default value is 0.8, otherwise pseudo statistics will be used for a period of time. It is recommended to set `tidb_auto_analyze_ratio` to 0.5. -#### How to use a specific index with hint in a SQL statement? +#### Can I use hints to override the optimizer behavior? -Its usage is similar to MySQL: +TiDB supports multiple [hints](/optimizer-hints.md) to override the default query optimizer behavior. 
The basic usage is similar to MySQL, with several TiDB specific extensions: {{< copyable "sql" >}} From 11e32effef50cb3ac581c6eda998f87b8f21c013 Mon Sep 17 00:00:00 2001 From: Null not nil <67764674+nullnotnil@users.noreply.github.com> Date: Tue, 28 Jul 2020 15:05:49 -0600 Subject: [PATCH 4/5] Update tidb-faq.md --- faq/tidb-faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/faq/tidb-faq.md b/faq/tidb-faq.md index 65059d3d0fe29..fdbdcb76fa726 100644 --- a/faq/tidb-faq.md +++ b/faq/tidb-faq.md @@ -131,7 +131,7 @@ The auto-increment ID feature in TiDB is only guaranteed to be automatically inc #### How do I modify the `sql_mode` in TiDB? -TiDB supports modifying the [`sql_mode`](/sql-mode.md) as a [system variable](/system-variables.md#sql_mode), as in MySQL. Currently, TiDB does not permit modifying the sql mode in a configuration file, but system variable changes made with [`SET GLOBAL`](/sql-statements/sql-statement-set-variable.md) propagate to all TiDB servers in the cluster and persist across restarts. +TiDB supports modifying the [`sql_mode`](/sql-mode.md) as a [system variable](/system-variables.md), as in MySQL. Currently, TiDB does not permit modifying the sql mode in a configuration file, but system variable changes made with [`SET GLOBAL`](/sql-statements/sql-statement-set-variable.md) propagate to all TiDB servers in the cluster and persist across restarts. #### Does TiDB support modifying the MySQL version string of the server to a specific one that is required by the security vulnerability scanning tool? 
From 4690c5c6f2bef09182952069466a91779eb46be8 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Wed, 29 Jul 2020 10:48:36 +0800 Subject: [PATCH 5/5] Update faq/tidb-faq.md --- faq/tidb-faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/faq/tidb-faq.md b/faq/tidb-faq.md index fdbdcb76fa726..2924969ad5114 100644 --- a/faq/tidb-faq.md +++ b/faq/tidb-faq.md @@ -84,7 +84,7 @@ Currently, [TiDB documentation](https://pingcap.com/docs/) is the most important #### What are the MySQL variables that TiDB is compatible with? -See [The System Variables](/system-variables.md). +See [System Variables](/system-variables.md). #### Does TiDB support `SELECT FOR UPDATE`?