+
+
+Before v6.2.0, the process of handling asynchronous schema changes in the TiDB SQL layer is as follows:
+
+1. MySQL Client sends a DDL request to a TiDB server.
+
+2. After receiving the request, a TiDB server parses and optimizes the request at the MySQL Protocol layer, and then sends it to the TiDB SQL layer for execution.
+
+ Once the SQL layer of TiDB receives the DDL request, it starts the `start job` module to encapsulate the request into a specific DDL job (that is, a DDL task), and then stores this job in the corresponding DDL job queue in the KV layer based on the statement type. The corresponding worker is notified of the job that requires processing.
+
+3. When receiving the notification to process the job, the worker determines whether it has the role of the DDL Owner. If it does, it directly processes the job. Otherwise, it exits without any processing.
+
+ If a TiDB server is not the Owner role, then another node must be the Owner. The worker of the node in the Owner role periodically checks whether there is an available job that can be executed. If such a job is identified, the worker will process the job.
+
+4. After the worker processes the Job, it removes the job from the job queue in the KV layer and places it in the `job history queue`. The `start job` module that encapsulated the job periodically checks the ID of the job in the `job history queue` to see whether it has been processed. If so, the entire DDL operation corresponding to the job ends.
+
+5. TiDB server returns the DDL processing result to the MySQL Client.
+
+Before TiDB v6.2.0, the DDL execution framework had the following limitations:
+
+- The TiKV cluster only has two queues: `general job queue` and `add index job queue`, which handle logical DDL and physical DDL, respectively.
+- The DDL Owner always processes DDL jobs in a first-in-first-out way.
+- The DDL Owner can only execute one DDL task of the same type (either logical or physical) at a time, which is relatively strict, and affects the user experience.
+
+These limitations might lead to some "unintended" DDL blocking behavior. For more details, see [SQL FAQ - DDL Execution](/faq/sql-faq.md#ddl-execution).
+
+
+
+
+Before TiDB v6.2.0, the Owner could only execute one DDL task of the same type (either logical or physical) at a time. This restriction was relatively strict and affected the user experience.
+
+If there is no dependency between DDL tasks, parallel execution does not affect data correctness and consistency. For example, user A adds an index to the `T1` table, while user B deletes a column from the `T2` table. These two DDL statements can be executed in parallel.
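+
+As a sketch, the scenario above corresponds to two independent sessions (table, column, and index names are illustrative):
+
+```sql
+-- Session of user A: a physical DDL statement on table T1
+ALTER TABLE T1 ADD INDEX idx_name (name);
+
+-- Session of user B: a logical DDL statement on table T2
+ALTER TABLE T2 DROP COLUMN city;
+```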
+
+To improve the user experience of DDL execution, starting from v6.2.0, TiDB enables the Owner to determine the relevance of DDL tasks. The logic is as follows:
+
++ DDL statements to be performed on the same table are mutually blocked.
++ `DROP DATABASE` and DDL statements that affect all objects in the database are mutually blocked.
++ Adding indexes and column type changes on different tables can be executed concurrently.
++ A logical DDL statement must wait for the previous logical DDL statement to finish before it can be executed.
++ In other cases, DDL statements are executed concurrently whenever possible.
+
+Specifically, TiDB v6.2.0 upgrades the DDL execution framework in the following aspects:
+
++ The DDL Owner can execute DDL tasks in parallel based on the preceding logic.
++ The first-in-first-out issue in the DDL Job queue has been addressed. The DDL Owner no longer selects the first job in the queue, but instead selects the job that can be executed at the current time.
++ The number of workers that handle physical DDL statements has been increased, enabling multiple physical DDL statements to be executed in parallel.
+
+ Because all DDL tasks in TiDB are implemented using an online change approach, TiDB can determine the relevance of new DDL jobs through the Owner, and schedule DDL tasks based on this information. This approach enables the distributed database to achieve the same level of DDL concurrency as traditional databases.
+
+The concurrent DDL framework enhances the execution capability of DDL statements in TiDB, making it more compatible with the usage patterns of commercial databases.
+
+
+
+
+## Best practices
+
+### Balance the physical DDL execution speed and the impact on application load through system variables
+
+When executing physical DDL statements (including adding indexes or column type changes), you can adjust the values of the following system variables to balance the speed of DDL execution and the impact on application load:
+
+- [`tidb_ddl_reorg_worker_cnt`](/system-variables.md#tidb_ddl_reorg_worker_cnt): This variable sets the number of reorg workers for a DDL operation, which controls the concurrency of backfilling.
+
+- [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size): This variable sets the batch size for a DDL operation in the `re-organize` phase, which controls the amount of data to be backfilled in each batch.
+
+ Recommended values:
+
+ - If there is no other load, you can increase the values of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to speed up the `ADD INDEX` operation. For example, you can set the values of the two variables to `20` and `2048`, respectively.
+    - If there is other load, you can decrease the values of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to minimize the impact on other applications. For example, you can set the values of these variables to `4` and `256`, respectively.
+
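+As a sketch, both variables can be adjusted dynamically with `SET` statements. The values below are the examples given above, not universal recommendations, and you should verify the variable scope for your TiDB version:
+
+```sql
+-- No other load: speed up backfilling
+SET GLOBAL tidb_ddl_reorg_worker_cnt = 20;
+SET GLOBAL tidb_ddl_reorg_batch_size = 2048;
+
+-- Application load present: reduce the impact
+SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
+SET GLOBAL tidb_ddl_reorg_batch_size = 256;
+```
+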
+> **Tip:**
+>
+> - The preceding two variables can be dynamically adjusted during the execution of a DDL task, and take effect in the next transaction batch.
+> - Choose the appropriate time to execute the DDL operation based on the type of the operation and the application load pressure. For example, it is recommended to run the `ADD INDEX` operation when the application load is low.
+> - Because adding an index is a long-running task, TiDB executes it in the background after the command is sent. Even if the TiDB server that received the request goes down, the execution is not affected.
+
+### Quickly create many tables by concurrently sending DDL requests
+
+A table creation operation takes about 50 milliseconds. The actual time taken to create a table might be longer because of the framework limitations.
+
+To create tables faster, it is recommended to send multiple DDL requests concurrently. If you send DDL requests serially, and not to the Owner node, the table creation speed is very slow.
+
+### Make multiple changes in a single `ALTER` statement
+
+Starting from v6.2.0, TiDB supports modifying multiple schema objects (such as columns and indexes) of a table in a single `ALTER` statement while ensuring the atomicity of the entire statement. Therefore, it is recommended to make multiple changes in a single `ALTER` statement.
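+
+For example, the following sketch combines three changes in one atomic statement (table, column, and index names are illustrative):
+
+```sql
+ALTER TABLE t
+    ADD COLUMN c2 INT,
+    DROP COLUMN c3,
+    ADD INDEX idx_c1 (c1);
+```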
+
+### Check the read and write performance
+
+When TiDB is adding an index, the phase of backfilling data will cause read and write pressure on the cluster. After the `ADD INDEX` command is sent and the `write reorg` phase starts, it is recommended to check the read and write performance metrics of TiDB and TiKV on the Grafana dashboard and the application response time, to determine whether the `ADD INDEX` operation affects the cluster.
+
+## DDL-related commands
+
+- `ADMIN SHOW DDL`: Used to view the status of TiDB DDL operations, including the current schema version number, the DDL ID and address of the DDL Owner, the DDL task and SQL being executed, and the DDL ID of the current TiDB instance. For details, see [`ADMIN SHOW DDL`](/sql-statements/sql-statement-admin-show-ddl.md#admin-show-ddl).
+
+- `ADMIN SHOW DDL JOBS`: Used to view the detailed status of DDL tasks running in the cluster environment. For details, see [`ADMIN SHOW DDL JOBS`](/sql-statements/sql-statement-admin-show-ddl.md#admin-show-ddl-jobs).
+
+- `ADMIN SHOW DDL JOB QUERIES job_id [, job_id]`: Used to view the original SQL statement of the DDL task corresponding to the `job_id`. For details, see [`ADMIN SHOW DDL JOB QUERIES`](/sql-statements/sql-statement-admin-show-ddl.md#admin-show-ddl-job-queries).
+
+- `ADMIN CANCEL DDL JOBS job_id [, job_id]`: Used to cancel DDL tasks that have been submitted but not completed. After the cancellation is completed, the SQL statement that executes the DDL task returns the `ERROR 8214 (HY000): Cancelled DDL job` error.
+
+ If a completed DDL task is canceled, you can see the `DDL Job:90 not found` error in the `RESULT` column, which means that the task has been removed from the DDL waiting queue.
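+
+As a sketch of a typical workflow with these commands (the job ID `86` is illustrative):
+
+```sql
+ADMIN SHOW DDL;                -- check the owner and the running DDL jobs
+ADMIN SHOW DDL JOBS;           -- find the job ID of the target DDL task
+ADMIN SHOW DDL JOB QUERIES 86; -- view the original SQL statement of job 86
+ADMIN CANCEL DDL JOBS 86;      -- cancel job 86 if it is not yet completed
+```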
+
+## Common questions
+
+For common questions about DDL execution, see [SQL FAQ - DDL execution](/faq/sql-faq.md#ddl-execution).
diff --git a/faq/sql-faq.md b/faq/sql-faq.md
index 926d65c1ed591..2716880bc9339 100644
--- a/faq/sql-faq.md
+++ b/faq/sql-faq.md
@@ -237,9 +237,43 @@ TiDB supports multiple ways to override the default query optimizer behavior, in
SELECT column_name FROM table_name USE INDEX(index_name) WHERE where_condition;
```
-## Why the `Information schema is changed` error is reported?
+## DDL Execution
-TiDB handles the SQL statement using the `schema` of the time and supports online asynchronous DDL change. A DML statement and a DDL statement might be executed at the same time and you must ensure that each statement is executed using the same `schema`. Therefore, when the DML operation meets the ongoing DDL operation, the `Information schema is changed` error might be reported. Some improvements have been made to prevent too many error reportings during the DML operation.
+This section lists issues related to DDL statement execution. For detailed explanations on the DDL execution principles, see [Execution Principles and Best Practices of DDL Statements](/ddl-introduction.md).
+
+### How long does it take to perform various DDL operations?
+
+Assume that DDL operations are not blocked, each TiDB server can update the schema version normally, and the DDL Owner node is running normally. In this case, the estimated time for various DDL operations is as follows:
+
+| DDL Operation Type | Estimated Time |
+|:----------|:-----------|
+| Reorg DDL, such as `ADD INDEX`, `MODIFY COLUMN` (Reorg type data changes) | Depends on the amount of data, system load, and DDL parameter settings. |
+| General DDL (DDL types other than Reorg), such as `CREATE DATABASE`, `CREATE TABLE`, `DROP DATABASE`, `DROP TABLE`, `TRUNCATE TABLE`, `ALTER TABLE ADD`, `ALTER TABLE DROP`, `MODIFY COLUMN` (only changes metadata), `DROP INDEX` | About 1 second |
+
+> **Note:**
+>
+> The preceding table shows the estimated time for each type of operation. The actual time might differ.
+
+### Possible reasons why DDL execution is slow
+
+- In a user session, if there is a non-auto-commit DML statement before a DDL statement, and if the commit operation of the non-auto-commit DML statement is slow, it will cause the DDL statement to execute slowly. That is, TiDB commits the uncommitted DML statement before executing the DDL statement.
+
+- When multiple DDL statements are executed together, the execution of the later DDL statements might be slower because they might need to wait in queue. Queuing scenarios include:
+
+    - The same type of DDL statements need to be queued. For example, both `CREATE TABLE` and `CREATE DATABASE` are general DDL statements, so when both operations are executed at the same time, they need to be queued. Starting from TiDB v6.2.0, parallel DDL statements are supported, but to avoid DDL execution using too many TiDB computing resources, there is also a concurrency limit. Queuing occurs when the number of concurrent DDL statements exceeds this limit.
+ - The DDL operations performed on the same table have a dependency relationship between them. The later DDL statement needs to wait for the previous DDL operation to complete.
+
+- After the cluster is started normally, the execution time of the first DDL operation might be relatively long because the DDL module is electing the DDL Owner.
+
+- TiDB is terminated abnormally (including power-off situations), which prevents TiDB from communicating with PD normally. Or TiDB is terminated by the `kill -9` command, which prevents TiDB from clearing its registration data from PD in a timely manner.
+
+- A communication problem occurs between a TiDB node in the cluster and PD or TiKV, which prevents TiDB from obtaining the latest schema version information in time.
+
+### What triggers the `Information schema is changed` error?
+
+When executing SQL statements, TiDB determines the schema version of an object based on the isolation level and processes the SQL statement accordingly. TiDB also supports online asynchronous DDL changes. When you execute DML statements, DDL statements might be executed at the same time, and TiDB must ensure that each SQL statement is executed on the same schema. Therefore, if a DDL operation is ongoing while a DML statement is being executed, TiDB might report an `Information schema is changed` error.
+
+Starting from v6.4.0, TiDB has implemented a [metadata lock mechanism](/metadata-lock.md), which allows the coordinated execution of DML statements and DDL schema changes, and avoids most `Information schema is changed` errors.
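+
+As a sketch, you can check whether the mechanism is enabled through the `tidb_enable_metadata_lock` system variable (verify the default value and scope for your TiDB version):
+
+```sql
+SHOW VARIABLES LIKE 'tidb_enable_metadata_lock';
+SET GLOBAL tidb_enable_metadata_lock = ON;
+```
+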
Now, there are still a few causes for this error reporting:
@@ -256,19 +290,45 @@ In the preceding causes, only Cause 1 is related to tables. Cause 1 and Cause 2
> + For each DDL operation, the number of `schema` version changes is the same with the number of corresponding `schema state` version changes.
> + Different DDL operations cause different number of `schema` version changes. For example, the `CREATE TABLE` statement causes one `schema` version change while the `ADD COLUMN` statement causes four.
-## What are the causes of the "Information schema is out of date" error?
+### What are the causes of the "Information schema is out of date" error?
-When executing a DML statement, if TiDB fails to load the latest schema within a DDL lease (45s by default), the `Information schema is out of date` error might occur. Possible causes are:
+Before TiDB v6.5.0, when executing a DML statement, if TiDB fails to load the latest schema within a DDL lease (45s by default), the `Information schema is out of date` error might occur. Possible causes are:
- The TiDB instance that executed this DML was killed, and the transaction execution corresponding to this DML statement took longer than a DDL lease. When the transaction was committed, the error occurred.
- TiDB failed to connect to PD or TiKV while executing this DML statement. As a result, TiDB failed to load schema within a DDL lease or disconnected from PD due to the keepalive setting.
-## Error is reported when executing DDL statements under high concurrency?
+### Error is reported when executing DDL statements under high concurrency?
When you execute DDL statements (such as creating tables in batches) under high concurrency, a very few of these statements might fail because of key conflicts during the concurrent execution.
It is recommended to keep the number of concurrent DDL statements under 20. Otherwise, you need to retry the failed statements from the client.
+### Why is DDL execution blocked?
+
+Before TiDB v6.2.0, TiDB allocates DDL statements to two first-in-first-out queues based on the type of DDL statements. More specifically, Reorg DDLs go to the Reorg queue and General DDLs go to the general queue. Because of the first-in-first-out limitation and the need for serial execution of DDL statements on the same table, multiple DDL statements might be blocked during execution.
+
+For example, consider the following DDL statements:
+
+- DDL 1: `CREATE INDEX idx ON t(a);`
+- DDL 2: `ALTER TABLE t ADD COLUMN b int;`
+- DDL 3: `CREATE TABLE t1(a int);`
+
+Due to the limitation of the first-in-first-out queue, DDL 3 must wait for DDL 2 to execute. Also, because DDL statements on the same table need to be executed in serial, DDL 2 must wait for DDL 1 to execute. Therefore, DDL 3 needs to wait for DDL 1 to be executed first, even if they operate on different tables.
+
+Starting from TiDB v6.2.0, the TiDB DDL module uses a concurrent framework. In the concurrent framework, there is no longer the limitation of the first-in-first-out queue. Instead, TiDB picks up the DDL task that can be executed from all DDL tasks. Additionally, the number of Reorg workers has been expanded, approximately to `CPU/4` per node. This allows TiDB to build indexes for multiple tables simultaneously in the concurrent framework.
+
+Whether your cluster is newly created or upgraded from an earlier version, TiDB automatically uses the concurrent framework in v6.2.0 and later versions. You do not need to make any manual adjustments.
+
+### Identify the cause of stuck DDL execution
+
+1. Rule out the other causes of slow DDL statement execution.
+2. Use one of the following methods to identify the DDL Owner node:
+ - Use `curl http://{TiDBIP}:10080/info/all` to obtain the owner of the current cluster.
+ - View the owner during a specific time period from the monitoring dashboard **DDL** > **DDL META OPM**.
+
+3. Take action based on the result:
+    - If the owner does not exist, try manually triggering owner election with `curl -X POST http://{TiDBIP}:10080/ddl/owner/resign`.
+    - If the owner exists, export the Goroutine stack and check the possible stuck location.
+
## SQL optimization
### TiDB execution plan description
diff --git a/media/ddl-owner.png b/media/ddl-owner.png
new file mode 100644
index 0000000000000..b2524329b9ee1
Binary files /dev/null and b/media/ddl-owner.png differ
diff --git a/sql-statements/sql-statement-admin-cancel-ddl.md b/sql-statements/sql-statement-admin-cancel-ddl.md
index 375883115c132..fe95d83092080 100644
--- a/sql-statements/sql-statement-admin-cancel-ddl.md
+++ b/sql-statements/sql-statement-admin-cancel-ddl.md
@@ -8,6 +8,8 @@ category: reference
The `ADMIN CANCEL DDL` statement allows you to cancel a running DDL job. The `job_id` can be found by running [`ADMIN SHOW DDL JOBS`](/sql-statements/sql-statement-admin-show-ddl.md).
+The `ADMIN CANCEL DDL` statement also allows you to cancel a DDL job that has been submitted but not yet completed. After the cancellation, the SQL statement that executes the DDL job returns the `ERROR 8214 (HY000): Cancelled DDL job` error. If you cancel a DDL job that has already been completed, you can see the `DDL Job:90 not found` error in the `RESULT` column, which indicates that the job has been removed from the DDL waiting queue.
+
## Synopsis
```ebnf+diagram
diff --git a/sql-statements/sql-statement-admin-show-ddl.md b/sql-statements/sql-statement-admin-show-ddl.md
index ac0ed77c28abf..ecbad399bcac9 100644
--- a/sql-statements/sql-statement-admin-show-ddl.md
+++ b/sql-statements/sql-statement-admin-show-ddl.md
@@ -24,7 +24,7 @@ WhereClauseOptional ::=
### `ADMIN SHOW DDL`
-To view the currently running DDL jobs, use `ADMIN SHOW DDL`:
+To view the status of the currently running DDL jobs, use `ADMIN SHOW DDL`. The output includes the current schema version, the DDL ID and address of the owner, the running DDL jobs and SQL statements, and the DDL ID of the current TiDB instance.
{{< copyable "sql" >}}
@@ -44,7 +44,57 @@ mysql> ADMIN SHOW DDL;
### `ADMIN SHOW DDL JOBS`
-To view all the results in the current DDL job queue (including tasks that are running and waiting to be run) and the last ten results in the completed DDL job queue, use `ADMIN SHOW DDL JOBS`:
+The `ADMIN SHOW DDL JOBS` statement is used to view all the results in the current DDL job queue, including running and queuing tasks, as well as the latest ten results in the completed DDL job queue. The returned result fields are described as follows:
+
+