diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 5733a733d5795..749d010af8e4e 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -429,6 +429,7 @@ - [`BACKUP`](/sql-statements/sql-statement-backup.md) - [`BATCH`](/sql-statements/sql-statement-batch.md) - [`BEGIN`](/sql-statements/sql-statement-begin.md) + - [`CANCEL DISTRIBUTION JOB`](/sql-statements/sql-statement-cancel-distribution-job.md) - [`CANCEL IMPORT JOB`](/sql-statements/sql-statement-cancel-import-job.md) - [`COMMIT`](/sql-statements/sql-statement-commit.md) - [`CREATE [GLOBAL|SESSION] BINDING`](/sql-statements/sql-statement-create-binding.md) @@ -446,6 +447,7 @@ - [`DELETE`](/sql-statements/sql-statement-delete.md) - [`DESC`](/sql-statements/sql-statement-desc.md) - [`DESCRIBE`](/sql-statements/sql-statement-describe.md) + - [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) - [`DO`](/sql-statements/sql-statement-do.md) - [`DROP [GLOBAL|SESSION] BINDING`](/sql-statements/sql-statement-drop-binding.md) - [`DROP DATABASE`](/sql-statements/sql-statement-drop-database.md) @@ -509,6 +511,7 @@ - [`SHOW CREATE TABLE`](/sql-statements/sql-statement-show-create-table.md) - [`SHOW CREATE USER`](/sql-statements/sql-statement-show-create-user.md) - [`SHOW DATABASES`](/sql-statements/sql-statement-show-databases.md) + - [`SHOW DISTRIBUTION JOBS`](/sql-statements/sql-statement-show-distribution-jobs.md) - [`SHOW ENGINES`](/sql-statements/sql-statement-show-engines.md) - [`SHOW ERRORS`](/sql-statements/sql-statement-show-errors.md) - [`SHOW FIELDS FROM`](/sql-statements/sql-statement-show-fields-from.md) @@ -531,6 +534,7 @@ - [`SHOW STATS_META`](/sql-statements/sql-statement-show-stats-meta.md) - [`SHOW STATS_TOPN`](/sql-statements/sql-statement-show-stats-topn.md) - [`SHOW STATUS`](/sql-statements/sql-statement-show-status.md) + - [`SHOW TABLE DISTRIBUTION`](/sql-statements/sql-statement-show-table-distribution.md) - [`SHOW TABLE NEXT_ROW_ID`](/sql-statements/sql-statement-show-table-next-rowid.md) - [`SHOW TABLE REGIONS`](/sql-statements/sql-statement-show-table-regions.md) - [`SHOW TABLE STATUS`](/sql-statements/sql-statement-show-table-status.md) @@ -735,6 +739,7 @@ - [Table Filter](/table-filter.md) - [URI Formats of External Storage Services](/external-storage-uri.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) + - [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md) - [Batch Processing](/batch-processing.md) - [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md) - [Notifications](/tidb-cloud/notifications.md) diff --git a/TOC.md b/TOC.md index 2065601e434f4..f8f91d339e782 100644 --- a/TOC.md +++ b/TOC.md @@ -813,6 +813,7 @@ - [`BATCH`](/sql-statements/sql-statement-batch.md) - [`BEGIN`](/sql-statements/sql-statement-begin.md) - [`CALIBRATE RESOURCE`](/sql-statements/sql-statement-calibrate-resource.md) + - [`CANCEL DISTRIBUTION JOB`](/sql-statements/sql-statement-cancel-distribution-job.md) - [`CANCEL IMPORT JOB`](/sql-statements/sql-statement-cancel-import-job.md) - [`COMMIT`](/sql-statements/sql-statement-commit.md) - [`CREATE BINDING`](/sql-statements/sql-statement-create-binding.md) @@ -830,6 +831,7 @@ - [`DELETE`](/sql-statements/sql-statement-delete.md) - [`DESC`](/sql-statements/sql-statement-desc.md) - [`DESCRIBE`](/sql-statements/sql-statement-describe.md) + - [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) - [`DO`](/sql-statements/sql-statement-do.md) - [`DROP 
BINDING`](/sql-statements/sql-statement-drop-binding.md) - [`DROP DATABASE`](/sql-statements/sql-statement-drop-database.md) @@ -894,6 +896,7 @@ - [`SHOW CREATE TABLE`](/sql-statements/sql-statement-show-create-table.md) - [`SHOW CREATE USER`](/sql-statements/sql-statement-show-create-user.md) - [`SHOW DATABASES`](/sql-statements/sql-statement-show-databases.md) + - [`SHOW DISTRIBUTION JOBS`](/sql-statements/sql-statement-show-distribution-jobs.md) - [`SHOW ENGINES`](/sql-statements/sql-statement-show-engines.md) - [`SHOW ERRORS`](/sql-statements/sql-statement-show-errors.md) - [`SHOW FIELDS FROM`](/sql-statements/sql-statement-show-fields-from.md) @@ -916,6 +919,7 @@ - [`SHOW STATS_META`](/sql-statements/sql-statement-show-stats-meta.md) - [`SHOW STATS_TOPN`](/sql-statements/sql-statement-show-stats-topn.md) - [`SHOW STATUS`](/sql-statements/sql-statement-show-status.md) + - [`SHOW TABLE DISTRIBUTION`](/sql-statements/sql-statement-show-table-distribution.md) - [`SHOW TABLE NEXT_ROW_ID`](/sql-statements/sql-statement-show-table-next-rowid.md) - [`SHOW TABLE REGIONS`](/sql-statements/sql-statement-show-table-regions.md) - [`SHOW TABLE STATUS`](/sql-statements/sql-statement-show-table-status.md) @@ -1078,6 +1082,7 @@ - [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md) - [URI Formats of External Storage Services](/external-storage-uri.md) - [Interaction Test on Online Workloads and `ADD INDEX` Operations](/benchmark/online-workloads-and-add-index-operations.md) + - [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md) - FAQs - [FAQ Summary](/faq/faq-overview.md) - [TiDB FAQs](/faq/tidb-faq.md) diff --git a/ddl_embedded_analyze.md b/ddl_embedded_analyze.md new file mode 100644 index 0000000000000..4056c761bf3c4 --- /dev/null +++ b/ddl_embedded_analyze.md @@ -0,0 +1,177 @@ +--- +title: "`ANALYZE` Embedded in DDL Statements" +summary: This document describes the `ANALYZE` feature embedded in DDL statements for newly created or reorganized indexes, which ensures that statistics for new indexes are updated promptly. +--- + +# `ANALYZE` Embedded in DDL Statements Introduced in v8.5.4 + +This document describes the `ANALYZE` feature embedded in the following two types of DDL statements: + +- DDL statements that create new indexes: [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) +- DDL statements that reorganize existing indexes: [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) + +When this feature is enabled, TiDB automatically runs an `ANALYZE` (statistics collection) operation before the new or reorganized index becomes visible to users. This prevents inaccurate optimizer estimates and potential plan changes caused by temporarily unavailable statistics after index creation or reorganization. + +## Usage scenarios + +In scenarios where DDL operations alternately add or modify indexes, existing stable queries might suffer from estimation bias because the new index lacks statistics, causing the optimizer to choose suboptimal plans. For more information, see [Issue #57948](https://github.com/pingcap/tidb/issues/57948). 
+ +For example: + +```sql +CREATE TABLE t (a INT, b INT); +INSERT INTO t VALUES (1, 1), (2, 2), (3, 3); +INSERT INTO t SELECT * FROM t; -- * N times + +ALTER TABLE t ADD INDEX idx_a (a); + +EXPLAIN SELECT * FROM t WHERE a > 4; +``` + +``` ++-------------------------+-----------+-----------+---------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+-----------+-----------+---------------+--------------------------------+ +| TableReader_8 | 131072.00 | root | | data:Selection_7 | +| └─Selection_7 | 131072.00 | cop[tikv] | | gt(test.t.a, 4) | +| └─TableFullScan_6 | 393216.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+-----------+-----------+---------------+--------------------------------+ +3 rows in set (0.002 sec) +``` + +In the preceding plan, because the newly created index has no statistics yet, TiDB can only rely on heuristic rules for path estimation. Unless the index access path requires no table lookup and has a significantly lower cost, the optimizer tends to choose the more stable existing path. In the preceding example, it chooses a full table scan. However, from the data distribution perspective, `t.a > 4` actually returns 0 rows. If the new index `idx_a` were used, the query could quickly locate relevant rows and avoid the full table scan. In this example, because statistics are not promptly collected after the DDL creates the index, the generated plan is not optimal, but the optimizer continues to use the original plan so query performance does not sharply regress. However, according to [Issue #57948](https://github.com/pingcap/tidb/issues/57948), in some cases heuristics might cause an unreasonable comparison between old and new indexes, pruning the index that the original plan relies on and ultimately falling back to a full table scan. + +Starting from v8.5.0, TiDB has improved heuristic comparisons between indexes and behaviors when statistics are missing. Still, in some complex scenarios, embedding `ANALYZE` in DDL is the best way to prevent plan changes. You can control whether to run embedded `ANALYZE` during index creation or reorganization with the system variable [`tidb_stats_update_during_ddl`](/system-variables.md#tidb_stats_update_during_ddl-new-in-v854-and-v900). The default value is `OFF`. + +## `ADD INDEX` DDL + +When `tidb_stats_update_during_ddl` is `ON`, executing [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) automatically runs an embedded `ANALYZE` operation after the Reorg phase finishes. This `ANALYZE` operation collects statistics for the newly created index before the index becomes visible to users, and then `ADD INDEX` proceeds with its remaining phases. + +Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously. 
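+
+To enable this feature for the entire cluster before running such DDL statements, you can set the variable at the `GLOBAL` scope. The following is a minimal sketch that also reads the value back:
+
+```sql
+-- Enable DDL-embedded ANALYZE cluster-wide (the default value is OFF).
+SET GLOBAL tidb_stats_update_during_ddl = ON;
+
+-- Confirm the current value.
+SHOW VARIABLES LIKE 'tidb_stats_update_during_ddl';
+```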
+ +For example: + +```sql +CREATE TABLE t (a INT, b INT, c INT); +Query OK, 0 rows affected (0.011 sec) + +INSERT INTO t VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3); +Query OK, 3 rows affected (0.003 sec) +Records: 3 Duplicates: 0 Warnings: 0 + +SET @@tidb_stats_update_during_ddl = 1; +Query OK, 0 rows affected (0.001 sec) + +ALTER TABLE t ADD INDEX idx (a, b); +Query OK, 0 rows affected (0.049 sec) +``` + +```sql +EXPLAIN SELECT a FROM t WHERE a > 1; +``` + +``` ++------------------------+---------+-----------+--------------------------+----------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------+---------+-----------+--------------------------+----------------------------------+ +| IndexReader_7 | 4.00 | root | | index:IndexRangeScan_6 | +| └─IndexRangeScan_6 | 4.00 | cop[tikv] | table:t, index:idx(a, b) | range:(1,+inf], keep order:false | ++------------------------+---------+-----------+--------------------------+----------------------------------+ +2 rows in set (0.002 sec) +``` + +```sql +SHOW STATS_HISTOGRAMS WHERE table_name = "t"; +``` + +``` ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage | ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +| test | t | | a | 0 | 2025-10-30 20:17:57 | 3 | 0 | 0.5 | 1 | allLoaded | 155 | 0 | 155 | 0 | +| test | t | | idx | 1 | 2025-10-30 20:17:57 | 3 | 0 | 0 | 0 | allLoaded | 182 | 0 | 182 | 0 | ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +2 rows in set (0.013 sec) +``` + +```sql +ADMIN SHOW DDL JOBS 1; +``` + +``` ++--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+ +| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS | ++--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+ +| 151 | test | t | add index | write reorganization | 2 | 148 | 6291456 | 2025-10-29 00:14:47.181000 | 2025-10-29 00:14:47.183000 | NULL | running | analyzing, txn-merge, max_node_count=3 | ++--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+ +1 rows in set (0.001 sec) +``` + +From the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can 
see that after the execution of the `ADD INDEX` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics. + +## DDL for reorganizing existing indexes + +When `tidb_stats_update_during_ddl` is `ON`, executing [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) or [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) that reorganizes an index will also run an embedded `ANALYZE` operation after the Reorg phase completes. The mechanism is the same as for `ADD INDEX`: + +- Start collecting statistics before the index becomes visible. +- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously. + +For example: + +```sql +CREATE TABLE s (a VARCHAR(10), INDEX idx (a)); +Query OK, 0 rows affected (0.012 sec) + +INSERT INTO s VALUES (1), (2), (3); +Query OK, 3 rows affected (0.003 sec) +Records: 3 Duplicates: 0 Warnings: 0 + +SET @@tidb_stats_update_during_ddl = 1; +Query OK, 0 rows affected (0.001 sec) + +ALTER TABLE s MODIFY COLUMN a INT; +Query OK, 0 rows affected (0.056 sec) + +EXPLAIN SELECT * FROM s WHERE a > 1; +``` + +``` ++------------------------+---------+-----------+-----------------------+----------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------+---------+-----------+-----------------------+----------------------------------+ +| IndexReader_7 | 2.00 | root | | index:IndexRangeScan_6 | +| └─IndexRangeScan_6 | 2.00 | cop[tikv] | table:s, index:idx(a) | range:(1,+inf], keep order:false | ++------------------------+---------+-----------+-----------------------+----------------------------------+ +2 rows in set (0.005 sec) +``` + +```sql +SHOW STATS_HISTOGRAMS WHERE table_name = "s"; +``` + +``` ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage | ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 2 | 1 | allLoaded | 158 | 0 | 158 | 0 | +| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 1 | 1 | allLoaded | 155 | 0 | 155 | 0 | +| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | allLoaded | 158 | 0 | 158 | 0 | +| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | 
allLoaded | 155 | 0 | 155 | 0 | ++---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+ +4 rows in set (0.008 sec) +``` + +```sql +ADMIN SHOW DDL JOBS 1; +``` + +``` ++--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+ +| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS | ++--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+ +| 153 | test | s | modify column | write reorganization | 2 | 148 | 12582912 | 2025-10-29 00:26:49.240000 | 2025-10-29 00:26:49.244000 | NULL | running | analyzing | ++--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+ +1 rows in set (0.001 sec) +``` + +From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `MODIFY COLUMN` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics. diff --git a/develop/dev-guide-sample-application-java-jdbc.md b/develop/dev-guide-sample-application-java-jdbc.md index 2de789c41ac10..9e4f66f36b89f 100644 --- a/develop/dev-guide-sample-application-java-jdbc.md +++ b/develop/dev-guide-sample-application-java-jdbc.md @@ -310,6 +310,17 @@ Unless you need to write complex SQL statements, it is recommended to use [ORM]( - Reduce [boilerplate code](https://en.wikipedia.org/wiki/Boilerplate_code) for managing connections and transactions. - Manipulate data with data objects instead of a number of SQL statements. +### MySQL compatibility + +In MySQL, when you insert data into a `DECIMAL` column, if the number of decimal places exceeds the column's defined scale, MySQL automatically truncates the extra digits and inserts the truncated data successfully, regardless of how many extra decimal places there are. + +In TiDB v8.5.3 and earlier versions: + +- If the number of decimal places exceeds the defined scale but does not exceed 72, TiDB also automatically truncates the extra digits and inserts the truncated data successfully. +- However, if the number of decimal places exceeds 72, the insertion fails and returns an error. + +Starting from TiDB v8.5.4, TiDB aligns its behavior with MySQL: regardless of how many extra decimal places there are, it automatically truncates the extra digits and inserts the truncated data successfully. 
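+
+The following minimal sketch illustrates this behavior difference. The table and column names are hypothetical, and the inserted literal has 75 decimal places, which exceeds both the column scale and the 72-decimal-place threshold described above:
+
+```sql
+CREATE TABLE decimal_demo (v DECIMAL(10, 2));
+
+-- In MySQL and in TiDB v8.5.4 or later, the extra digits are truncated and the row is inserted.
+-- In TiDB v8.5.3 and earlier, this statement returns an error because the literal has more than 72 decimal places.
+INSERT INTO decimal_demo VALUES (3.141592653589793238462643383279502884197169399375105820974944592307816406286);
+
+-- On versions where the insertion succeeds, the stored value is 3.14.
+SELECT v FROM decimal_demo;
+```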
+ ## Next steps - Learn more usage of MySQL Connector/J from [the documentation of MySQL Connector/J](https://dev.mysql.com/doc/connector-j/en/). diff --git a/follower-read.md b/follower-read.md index 76156e996f61a..672263f7880c3 100644 --- a/follower-read.md +++ b/follower-read.md @@ -5,15 +5,36 @@ summary: This document describes the use and implementation of Follower Read. # Follower Read -When a read hotspot appears in a Region, the Region leader can become a read bottleneck for the entire system. In this situation, enabling the Follower Read feature can significantly reduce the load of the leader, and improve the throughput of the whole system by balancing the load among multiple followers. This document introduces the use and implementation mechanism of Follower Read. +In TiDB, to ensure high availability and data safety, TiKV stores multiple replicas for each Region, one of which is the leader and the others are followers. By default, all read and write requests are processed by the leader. The Follower Read feature enables TiDB to read data from follower replicas of a Region while maintaining strong consistency, thereby reducing the read workload on the leader and improving the overall read throughput of the cluster. -## Overview + + +When performing Follower Read, TiDB selects an appropriate replica based on the topology information. Specifically, TiDB uses the `zone` label to identify local replicas: if the `zone` label of a TiDB node is the same as that of the target TiKV node, TiDB considers the replica as a local replica. For more information, see [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md). + + + + + +When performing Follower Read, TiDB selects an appropriate replica based on the topology information. Specifically, TiDB uses the `zone` label to identify local replicas: if the `zone` label of a TiDB node is the same as that of the target TiKV node, TiDB considers the replica as a local replica. The `zone` label is set automatically in TiDB Cloud. + + -The Follower Read feature refers to using any follower replica of a Region to serve a read request under the premise of strongly consistent reads. This feature improves the throughput of the TiDB cluster and reduces the load of the leader. It contains a series of load balancing mechanisms that offload TiKV read loads from the leader replica to the follower replica in a Region. TiKV's Follower Read implementation provides users with strongly consistent reads. +By enabling followers to handle read requests, Follower Read achieves the following goals: + +- Distribute read hotspots and reduce the leader workload. +- Prioritize local replica reads in multi-AZ or multi-datacenter deployments to minimize cross-AZ traffic. + +## Usage scenarios + +Follower Read is suitable for the following scenarios: + +- Applications with heavy read requests or significant read hotspots. +- Multi-AZ deployments where you want to prioritize reading from local replicas to reduce cross-AZ bandwidth usage. +- Read-write separation architectures that you want to further improve overall read performance. > **Note:** > -> To achieve strongly consistent reads, the follower node currently needs to request the current execution progress from the leader node (that is `ReadIndex`), which causes an additional network request overhead. Therefore, the main benefits of Follower Read are to isolate read requests from write requests in the cluster and to increase overall read throughput. 
+> To ensure strong consistency of the read results, Follower Read communicates with the leader before reading to confirm the latest commit progress (by executing the Raft `ReadIndex` operation). This introduces an additional network interaction. Therefore, Follower Read is most effective where a large number of read requests exist or read-write isolation is required. However, for low-latency single queries, the performance improvement might not be significant. ## Usage @@ -29,7 +50,24 @@ Scope: SESSION | GLOBAL Default: leader -This variable is used to set the expected data read mode. +This variable defines the expected data read mode. Starting from v8.5.4, this variable only takes effect on read-only SQL statements. + +In scenarios where you need to reduce cross-AZ traffic by reading from local replicas, the following configurations are recommended: + +- `leader`: the default value, providing the best performance. +- `closest-adaptive`: minimizes cross-AZ traffic while keeping performance loss to a minimum. +- `closest-replicas`: maximizes cross-AZ traffic savings but might cause some performance degradation. + +If you are using other configurations, refer to the following table to modify them to the recommended configurations: + +| Current configuration | Recommended configuration | +| ------------- | ------------- | +| `follower` | `closest-replicas` | +| `leader-and-follower` | `closest-replicas` | +| `prefer-leader` | `closest-adaptive` | +| `learner` | `closest-replicas` | + +If you want to use a more precise read replica selection policy, refer to the full list of available configurations as follows: - When you set the value of `tidb_replica_read` to `leader` or an empty string, TiDB maintains its default behavior and sends all read operations to the leader replica to perform. - When you set the value of `tidb_replica_read` to `follower`, TiDB selects a follower replica of the Region to perform read operations. If the Region has learner replicas, TiDB also considers them for reads with the same priority. If no available follower or learner replicas exist for the current Region, TiDB reads from the leader replica. @@ -56,18 +94,46 @@ This variable is used to set the expected data read mode. + + +## Basic monitoring + +You can check the [**TiDB** > **KV Request** > **Read Req Traffic** panel (New in v8.5.4)](/grafana-tidb-dashboard.md#kv-request) to determine whether to enable Follower Read and observe the traffic reduction effect after enabling it. + + + ## Implementation mechanism -Before the Follower Read feature was introduced, TiDB applied the strong leader principle and submitted all read and write requests to the leader node of a Region to handle. Although TiKV can distribute Regions evenly on multiple physical nodes, for each Region, only the leader can provide external services. The other followers can do nothing to handle read requests but receive the data replicated from the leader at all times and prepare for voting to elect a leader in case of a failover. +Before the Follower Read feature was introduced, TiDB applied the strong leader principle and submitted all read and write requests to the leader node of a Region to handle. Although TiKV can distribute Regions evenly on multiple physical nodes, for each Region, only the leader can provide external services. The other followers cannot handle read requests, and they only receive the data replicated from the leader at all times and prepare for voting to elect a leader in case of a failover. 
-To allow data reading in the follower node without violating linearizability or affecting Snapshot Isolation in TiDB, the follower node needs to use `ReadIndex` of the Raft protocol to ensure that the read request can read the latest data that has been committed on the leader. At the TiDB level, the Follower Read feature simply needs to send the read request of a Region to a follower replica based on the load balancing policy. +Follower Read includes a set of load balancing mechanisms that offload TiKV read requests from the leader replica to a follower replica in a Region. To allow data reading from the follower node without violating linearizability or affecting Snapshot Isolation in TiDB, the follower node needs to use `ReadIndex` of the Raft protocol to ensure that the read request can read the latest data that has been committed on the leader node. At the TiDB level, the Follower Read feature simply needs to send the read request of a Region to a follower replica based on the load balancing policy. ### Strongly consistent reads When the follower node processes a read request, it first uses `ReadIndex` of the Raft protocol to interact with the leader of the Region, to obtain the latest commit index of the current Raft group. After the latest commit index of the leader is applied locally to the follower, the processing of a read request starts. +![read-index-flow](/media/follower-read/read-index.png) + ### Follower replica selection strategy -Because the Follower Read feature does not affect TiDB's Snapshot Isolation transaction isolation level, TiDB adopts the round-robin strategy to select the follower replica. Currently, for the coprocessor requests, the granularity of the Follower Read load balancing policy is at the connection level. For a TiDB client connected to a specific Region, the selected follower is fixed, and is switched only when it fails or the scheduling policy is adjusted. +The Follower Read feature does not affect TiDB's Snapshot Isolation transaction isolation level. TiDB selects a replica based on the `tidb_replica_read` configuration for the first read attempt. From the second retry onward, TiDB prioritizes ensuring successful reads. Therefore, when the selected follower node becomes inaccessible or has other errors, TiDB switches to the leader for service. + +#### `leader` + +- Always selects the leader replica for reads, regardless of its location. + +#### `closest-replicas` + +- When the replica in the same AZ as TiDB is the leader node, TiDB does not perform Follower Read from it. +- When the replica in the same AZ as TiDB is a follower node, TiDB performs Follower Read from it. + +#### `closest-adaptive` + +- If the estimated result is not large enough, TiDB uses the `leader` policy and does not perform Follower Read. +- If the estimated result is large enough, TiDB uses the `closest-replicas` policy. + +### Follower Read performance overhead + +To ensure strong data consistency, Follower Read performs a `ReadIndex` operation regardless of how much data is read, which inevitably consumes additional TiKV CPU resources. Therefore, in small-query scenarios (such as point queries), the performance loss of Follower Read is relatively more obvious. Moreover, because the traffic reduced by local reads for small queries is limited, Follower Read is more recommended for large queries or batch reading scenarios. -However, for the non-coprocessor requests, such as a point query, the granularity of the Follower Read load balancing policy is at the transaction level. 
For a TiDB transaction on a specific Region, the selected follower is fixed, and is switched only when it fails or the scheduling policy is adjusted. If a transaction contains both point queries and coprocessor requests, the two types of requests are scheduled for reading separately according to the preceding scheduling policy. In this case, even if a coprocessor request and a point query are for the same Region, TiDB processes them as independent events. +When `tidb_replica_read` is set to `closest-adaptive`, TiDB does not perform Follower Read for small queries. As a result, under various workloads, the additional CPU overhead on TiKV is typically no more than 10% compared with the `leader` policy. diff --git a/grafana-tidb-dashboard.md b/grafana-tidb-dashboard.md index 73d0026926d3e..b7526e1d954ad 100644 --- a/grafana-tidb-dashboard.md +++ b/grafana-tidb-dashboard.md @@ -123,9 +123,14 @@ The following metrics relate to requests sent to TiKV. Retry requests are counte - **local**: the number of requests per second that attempt a stale read in the local zone - Stale Read Req Traffic: - **cross-zone-in**: the incoming traffic of responses to requests that attempt a stale read in a remote zone - - **cross-zone-out**: the outgoing traffic of requests that attempt a stale read in a remote zone + - **cross-zone-out**: the outgoing traffic of responses to requests that attempt a stale read in a remote zone - **local-in**: the incoming traffic of responses to requests that attempt a stale read in the local zone - **local-out**: the outgoing traffic of requests that attempt a stale read in the local zone +- Read Req Traffic + - **leader-local**: traffic generated by Leader Read processing read requests in the local zone + - **leader-cross-zone**: traffic generated by Leader Read processing read requests in a remote zone + - **follower-local**: traffic generated by Follower Read processing read requests in the local zone + - **follower-cross-zone**: traffic generated by Follower Read processing read requests in a remote zone ### PD Client diff --git a/media/follower-read/read-index.png b/media/follower-read/read-index.png new file mode 100644 index 0000000000000..b20a7047f905a Binary files /dev/null and b/media/follower-read/read-index.png differ diff --git a/partitioned-table.md b/partitioned-table.md index 5bc8b7f31522e..3782ac0ab9f94 100644 --- a/partitioned-table.md +++ b/partitioned-table.md @@ -1699,13 +1699,13 @@ CREATE TABLE t (a varchar(20), b blob, ERROR 8264 (HY000): Global Index is needed for index 'a', since the unique index is not including all partitioning columns, and GLOBAL is not given as IndexOption ``` -#### Global indexes +### Global indexes Before the introduction of global indexes, TiDB created a local index for each partition, leading to [a limitation](#partitioning-keys-primary-keys-and-unique-keys) that primary keys and unique keys had to include the partition key to ensure data uniqueness. Additionally, when querying data across multiple partitions, TiDB needed to scan the data of each partition to return results. -To address these issues, TiDB introduces the global indexes feature in v8.3.0. A global index covers the data of the entire table with a single index, allowing primary keys and unique keys to maintain global uniqueness without including all partition keys. 
Moreover, global indexes can access index data across multiple partitions in a single operation, significantly improving query performance for non-partitioned keys instead of looking up in one local index for each partition. +To address these issues, TiDB introduces the global indexes feature in v8.3.0. A global index covers the data of the entire table with a single index, allowing primary keys and unique keys to maintain global uniqueness without including all partition keys. Moreover, global indexes can access index data across multiple partitions in a single operation instead of looking up the local index for each partition, significantly improving query performance for non-partitioned keys. Starting from v8.5.4, non-unique indexes can also be created as global indexes. -To create a global index for a primary key or unique key, you can add the `GLOBAL` keyword in the index definition. +To create a global index, you can add the `GLOBAL` keyword in the index definition. > **Note:** > @@ -1718,13 +1718,14 @@ CREATE TABLE t1 ( col3 INT NOT NULL, col4 INT NOT NULL, UNIQUE KEY uidx12(col1, col2) GLOBAL, - UNIQUE KEY uidx3(col3) + UNIQUE KEY uidx3(col3), + KEY idx1(col1) GLOBAL ) PARTITION BY HASH(col3) PARTITIONS 4; ``` -In the preceding example, the unique index `uidx12` is a global index, while `uidx3` is a regular unique index. +In the preceding example, the unique index `uidx12` and non-unique index `idx1` are global indexes, while `uidx3` is a regular unique index. Note that a **clustered index** cannot be a global index, as shown in the following example: @@ -1756,7 +1757,8 @@ Create Table: CREATE TABLE `t1` ( `col3` int NOT NULL, `col4` int NOT NULL, UNIQUE KEY `uidx12` (`col1`,`col2`) /*T![global_index] GLOBAL */, - UNIQUE KEY `uidx3` (`col3`) + UNIQUE KEY `uidx3` (`col3`), + KEY `idx1` (`col1`) /*T![global_index] GLOBAL */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin PARTITION BY HASH (`col3`) PARTITIONS 4 1 row in set (0.00 sec) @@ -1775,26 +1777,23 @@ SELECT * FROM INFORMATION_SCHEMA.TIDB_INDEXES WHERE table_name='t1'; | test | t1 | 0 | uidx12 | 1 | col1 | NULL | | NULL | 1 | YES | NO | 1 | | test | t1 | 0 | uidx12 | 2 | col2 | NULL | | NULL | 1 | YES | NO | 1 | | test | t1 | 0 | uidx3 | 1 | col3 | NULL | | NULL | 2 | YES | NO | 0 | +| test | t1 | 1 | idx1 | 1 | col1 | NULL | | NULL | 3 | YES | NO | 1 | +--------------+------------+------------+----------+--------------+-------------+----------+---------------+------------+----------+------------+-----------+-----------+ 3 rows in set (0.00 sec) ``` -When partitioning a non-partitioned table or repartitioning an already partitioned table, you can update the indexes to be global indexes or revert them to local indexes as needed: +When partitioning a non-partitioned table or repartitioning an already partitioned table, you can update the indexes to be global indexes or local indexes as needed. + +For example, the following SQL statement repartitions table `t1` based on the `col1` column, updates the global indexes `uidx12` and `idx1` to local indexes, and updates the local index `uidx3` to a global index. Because `uidx3` is a unique index on the `col3` column, it must be a global index to ensure the uniqueness of `col3` across all partitions. `uidx12` and `idx1` are indexes on the `col1` column, which means they can be either global or local indexes. 
```sql -ALTER TABLE t1 PARTITION BY HASH (col1) PARTITIONS 3 UPDATE INDEXES (uidx12 LOCAL, uidx3 GLOBAL); +ALTER TABLE t1 PARTITION BY HASH (col1) PARTITIONS 3 UPDATE INDEXES (uidx12 LOCAL, uidx3 GLOBAL, idx1 LOCAL); ``` -##### Limitations of global indexes +#### Limitations of global indexes - If the `GLOBAL` keyword is not explicitly specified in the index definition, TiDB creates a local index by default. - The `GLOBAL` and `LOCAL` keywords only apply to partitioned tables and do not affect non-partitioned tables. In other words, there is no difference between a global index and a local index in non-partitioned tables. -- Currently, TiDB only supports creating unique global indexes on unique columns. If you need to create a global index on a non-unique column, you can include a primary key in the global index to create a composite index. For example, if the non-unique column is `col3` and the primary key is `col1`, you can use the following statement to create a global index on the non-unique column `col3`: - - ```sql - ALTER TABLE ... ADD UNIQUE INDEX(col3, col1) GLOBAL; - ``` - - DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION` also trigger updates to global indexes. These DDL operations need to wait for the global index updates to complete before returning results, which increases the execution time accordingly. This is particularly evident in data archiving scenarios, such as `DROP PARTITION` and `TRUNCATE PARTITION`. Without global indexes, these operations can typically complete immediately. However, with global indexes, the execution time increases as the number of indexes that need to be updated grows. - Tables with global indexes do not support the `EXCHANGE PARTITION` operation. - By default, the primary key of a partitioned table is a clustered index and must include the partition key. If you require the primary key to exclude the partition key, you can explicitly specify the primary key as a non-clustered global index when creating the table, for example, `PRIMARY KEY(col1, col2) NONCLUSTERED GLOBAL`. @@ -1928,7 +1927,7 @@ select * from t; 5 rows in set (0.00 sec) ``` -### Dynamic pruning mode +## Dynamic pruning mode TiDB accesses partitioned tables in either `dynamic` or `static` mode. `dynamic` mode is used by default since v6.3.0. However, dynamic partitioning is effective only after the full table-level statistics, or global statistics, are collected. If you enable the `dynamic` pruning mode before global statistics collection is completed, TiDB remains in the `static` mode until global statistics are fully collected. For detailed information about global statistics, see [Collect statistics of partitioned tables in dynamic pruning mode](/statistics.md#collect-statistics-of-partitioned-tables-in-dynamic-pruning-mode). @@ -2133,7 +2132,7 @@ From example 2, you can see that in `dynamic` mode, the execution plan with Inde Currently, `static` pruning mode does not support plan cache for both prepared and non-prepared statements. -#### Update statistics of partitioned tables in dynamic pruning mode +### Update statistics of partitioned tables in dynamic pruning mode 1. 
Locate all partitioned tables: diff --git a/sql-statements/sql-statement-cancel-distribution-job.md b/sql-statements/sql-statement-cancel-distribution-job.md new file mode 100644 index 0000000000000..5cc3f28161b77 --- /dev/null +++ b/sql-statements/sql-statement-cancel-distribution-job.md @@ -0,0 +1,46 @@ +--- +title: CANCEL DISTRIBUTION JOB +summary: An overview of the usage of CANCEL DISTRIBUTION JOB in TiDB. +--- + +# CANCEL DISTRIBUTION JOB New in v8.5.4 + +The `CANCEL DISTRIBUTION JOB` statement is used to cancel a Region scheduling task created using the [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) statement in TiDB. + + + +> **Note:** +> +> This feature is not available on [{{{ .starter }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-serverless) and [{{{ .essential }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#essential) clusters. + + + +## Synopsis + +```ebnf+diagram +CancelDistributionJobsStmt ::= + 'CANCEL' 'DISTRIBUTION' 'JOB' JobID +``` + +## Examples + +The following example cancels the distribution job with ID `1`: + +```sql +CANCEL DISTRIBUTION JOB 1; +``` + +The output is as follows: + +``` +Query OK, 0 rows affected (0.01 sec) +``` + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +* [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) +* [`SHOW DISTRIBUTION JOBS`](/sql-statements/sql-statement-show-distribution-jobs.md) \ No newline at end of file diff --git a/sql-statements/sql-statement-distribute-table.md b/sql-statements/sql-statement-distribute-table.md new file mode 100644 index 0000000000000..89ff8803eb34a --- /dev/null +++ b/sql-statements/sql-statement-distribute-table.md @@ -0,0 +1,128 @@ +--- +title: DISTRIBUTE TABLE +summary: An overview of the usage of DISTRIBUTE TABLE for the TiDB database. +--- + +# DISTRIBUTE TABLE New in v8.5.4 + +> **Warning:** +> +> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + + +> **Note:** +> +> This feature is not available on [{{{ .starter }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-serverless) and [{{{ .essential }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#essential) clusters. + + + +The `DISTRIBUTE TABLE` statement redistributes and reschedules Regions of a specified table to achieve a balanced distribution at the table level. Executing this statement helps prevent Regions from being concentrated on a few TiFlash or TiKV nodes, addressing the issue of uneven region distribution in the table. + +## Synopsis + +```ebnf+diagram +DistributeTableStmt ::= + "DISTRIBUTE" "TABLE" TableName PartitionNameListOpt "RULE" EqOrAssignmentEq Identifier "ENGINE" EqOrAssignmentEq Identifier "TIMEOUT" EqOrAssignmentEq Identifier + +TableName ::= + (SchemaName ".")? Identifier + +PartitionNameList ::= + "PARTITION" "(" PartitionName ("," PartitionName)* ")" +``` + +## Parameter description + +When redistributing Regions in a table using the `DISTRIBUTE TABLE` statement, you can specify the storage engine (such as TiFlash or TiKV) and different Raft roles (such as Leader, Learner, or Voter) for balanced distribution. + +- `RULE`: specifies which Raft role's Region to balance and schedule. 
Optional values are `"leader-scatter"`, `"peer-scatter"`, and `"learner-scatter"`. +- `ENGINE`: specifies the storage engine. Optional values are `"tikv"` and `"tiflash"`. +- `TIMEOUT`: specifies the timeout limit for the scatter operation. If PD does not complete the scatter within this time, the scatter task will automatically exit. When this parameter is not specified, the default value is `"30m"`. + +## Examples + +Redistribute the Regions of the Leaders in the table `t1` on TiKV: + +```sql +CREATE TABLE t1 (a INT); +... +DISTRIBUTE TABLE t1 RULE = "leader-scatter" ENGINE = "tikv" TIMEOUT = "1h"; +``` + +``` ++--------+ +| JOB_ID | ++--------+ +| 100 | ++--------+ +``` + +Redistribute the Regions of the Learners in the table `t2` on TiFlash: + +```sql +CREATE TABLE t2 (a INT); +... +DISTRIBUTE TABLE t2 RULE = "learner-scatter" ENGINE = "tiflash"; +``` + +``` ++--------+ +| JOB_ID | ++--------+ +| 101 | ++--------+ +``` + +Redistribute the Regions of the Peers in the table `t3`'s `p1` and `p2` partitions on TiKV: + +```sql +CREATE TABLE t3 ( a INT, b INT, INDEX idx(b)) PARTITION BY RANGE( a ) ( + PARTITION p1 VALUES LESS THAN (10000), + PARTITION p2 VALUES LESS THAN (20000), + PARTITION p3 VALUES LESS THAN (MAXVALUE) ); +... +DISTRIBUTE TABLE t3 PARTITION (p1, p2) RULE = "peer-scatter" ENGINE = "tikv"; +``` + +``` ++--------+ +| JOB_ID | ++--------+ +| 102 | ++--------+ +``` + +Redistribute the Regions of the Learner in the table `t4`'s `p1` and `p2` partitions on TiFlash: + +```sql +CREATE TABLE t4 ( a INT, b INT, INDEX idx(b)) PARTITION BY RANGE( a ) ( + PARTITION p1 VALUES LESS THAN (10000), + PARTITION p2 VALUES LESS THAN (20000), + PARTITION p3 VALUES LESS THAN (MAXVALUE) ); +... +DISTRIBUTE TABLE t4 PARTITION (p1, p2) RULE = "learner-scatter" ENGINE="tiflash"; +``` + +``` ++--------+ +| JOB_ID | ++--------+ +| 103 | ++--------+ +``` + +## Notes + +When you execute the `DISTRIBUTE TABLE` statement to redistribute Regions of a table, the Region distribution result might be affected by the PD hotspot scheduler. After the redistribution, the Region distribution of this table might become imbalanced again over time. + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +- [`SHOW DISTRIBUTION JOBS`](/sql-statements/sql-statement-show-distribution-jobs.md) +- [`SHOW TABLE DISTRIBUTION`](/sql-statements/sql-statement-show-table-distribution.md) +- [`SHOW TABLE REGIONS`](/sql-statements/sql-statement-show-table-regions.md) +- [`CANCEL DISTRIBUTION JOB`](/sql-statements/sql-statement-cancel-distribution-job.md) \ No newline at end of file diff --git a/sql-statements/sql-statement-show-distribution-jobs.md b/sql-statements/sql-statement-show-distribution-jobs.md new file mode 100644 index 0000000000000..f7d1ec48f0ad0 --- /dev/null +++ b/sql-statements/sql-statement-show-distribution-jobs.md @@ -0,0 +1,51 @@ +--- +title: SHOW DISTRIBUTION JOBS +summary: An overview of the usage of SHOW DISTRIBUTION JOBS for the TiDB database. +--- + +# SHOW DISTRIBUTION JOBS New in v8.5.4 + +The `SHOW DISTRIBUTION JOBS` statement shows all current Region distribution jobs. + + + +> **Note:** +> +> This feature is not available on [{{{ .starter }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-serverless) and [{{{ .essential }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#essential) clusters. 
+ + + +## Synopsis + +```ebnf+diagram +ShowDistributionJobsStmt ::= + "SHOW" "DISTRIBUTION" "JOBS" +``` + +## Examples + +Show all current Region distribution jobs: + +```sql +SHOW DISTRIBUTION JOBS; +``` + +``` ++--------+----------+-------+----------------+--------+----------------+-----------+---------------------+---------------------+---------------------+ +| Job_ID | Database | Table | Partition_List | Engine | Rule | Status | Create_Time | Start_Time | Finish_Time | ++--------+----------+-------+----------------+--------+----------------+-----------+---------------------+---------------------+---------------------+ +| 100 | test | t1 | NULL | tikv | leader-scatter | finished | 2025-04-24 16:09:55 | 2025-04-24 16:09:55 | 2025-04-24 17:09:59 | +| 101 | test | t2 | NULL | tikv | learner-scatter| cancelled | 2025-05-08 15:33:29 | 2025-05-08 15:33:29 | 2025-05-08 15:33:37 | +| 102 | test | t5 | p1,p2 | tikv | peer-scatter | cancelled | 2025-05-21 15:32:44 | 2025-05-21 15:32:47 | 2025-05-21 15:32:47 | ++--------+----------+-------+----------------+--------+----------------+-----------+---------------------+---------------------+---------------------+ +``` + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +- [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) +- [`SHOW TABLE DISTRIBUTION`](/sql-statements/sql-statement-show-table-distribution.md) +- [`CANCEL DISTRIBUTION JOB`](/sql-statements/sql-statement-cancel-distribution-job.md) \ No newline at end of file diff --git a/sql-statements/sql-statement-show-table-distribution.md b/sql-statements/sql-statement-show-table-distribution.md new file mode 100644 index 0000000000000..489a2194b71f2 --- /dev/null +++ b/sql-statements/sql-statement-show-table-distribution.md @@ -0,0 +1,69 @@ +--- +title: SHOW TABLE DISTRIBUTION +summary: An overview of the usage of SHOW TABLE DISTRIBUTION for the TiDB database. +--- + +# SHOW TABLE DISTRIBUTION New in v8.5.4 + +The `SHOW TABLE DISTRIBUTION` statement shows the Region distribution information for a specified table. + + + +> **Note:** +> +> This feature is not available on [{{{ .starter }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-cloud-serverless) and [{{{ .essential }}}](https://docs.pingcap.com/tidbcloud/select-cluster-tier#essential) clusters. + + + +## Synopsis + +```ebnf+diagram +ShowTableDistributionStmt ::= + "SHOW" "TABLE" TableName "DISTRIBUTIONS" + +TableName ::= + (SchemaName ".")? 
Identifier +``` + +## Examples + +Show the Region distribution of the table `t`: + +```sql +CREATE TABLE `t` ( + `a` int DEFAULT NULL, + `b` int DEFAULT NULL, + KEY `idx` (`b`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +PARTITION BY RANGE (`a`) +(PARTITION `p1` VALUES LESS THAN (10000), + PARTITION `p2` VALUES LESS THAN (MAXVALUE)); +SHOW TABLE t DISTRIBUTIONS; +``` + +``` ++----------------+----------+------------+---------------------+-------------------+--------------------+-------------------+--------------------+--------------------------+-------------------------+--------------------------+------------------------+-----------------------+------------------------+ +| PARTITION_NAME | STORE_ID | STORE_TYPE | REGION_LEADER_COUNT | REGION_PEER_COUNT | REGION_WRITE_BYTES | REGION_WRITE_KEYS | REGION_WRITE_QUERY | REGION_LEADER_READ_BYTES | REGION_LEADER_READ_KEYS | REGION_LEADER_READ_QUERY | REGION_PEER_READ_BYTES | REGION_PEER_READ_KEYS | REGION_PEER_READ_QUERY | ++----------------+----------+------------+---------------------+-------------------+--------------------+-------------------+--------------------+--------------------------+-------------------------+--------------------------+------------------------+-----------------------+------------------------+ +| p1 | 1 | tikv | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p1 | 15 | tikv | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p1 | 4 | tikv | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p1 | 5 | tikv | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p1 | 6 | tikv | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p2 | 1 | tikv | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p2 | 15 | tikv | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p2 | 4 | tikv | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p2 | 5 | tikv | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +| p2 | 6 | tikv | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++----------------+----------+------------+---------------------+-------------------+--------------------+-------------------+--------------------+--------------------------+-------------------------+--------------------------+------------------------+-----------------------+------------------------+ +``` + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +- [`DISTRIBUTE TABLE`](/sql-statements/sql-statement-distribute-table.md) +- [`SHOW DISTRIBUTION JOBS`](/sql-statements/sql-statement-show-distribution-jobs.md) +- [`CANCEL DISTRIBUTION JOB`](/sql-statements/sql-statement-cancel-distribution-job.md) \ No newline at end of file diff --git a/system-variables.md b/system-variables.md index 0d55b2a0abf83..0d4fb3573afbb 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1644,6 +1644,14 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; +### tidb_stats_update_during_ddl New in v8.5.4 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Default value: `OFF` +- This variable controls whether to enable DDL-embedded `ANALYZE`. When enabled, DDL statements that create new indexes ([`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) or reorganize existing indexes ([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)) automatically collect statistics before the index becomes visible. 
For more information, see [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md). + ### tidb_enable_dist_task New in v7.1.0 - Scope: GLOBAL @@ -4113,8 +4121,8 @@ As shown in this diagram, when [`tidb_enable_paging`](#tidb_enable_paging-new-in - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No - Type: Duration -- Default value: `60s` -- The newly started TiFlash node does not provide services. To prevent queries from failing, TiDB limits the tidb-server sending queries to the newly started TiFlash node. This variable indicates the time range in which the newly started TiFlash node is not sent requests. +- Default value: `0s`. In v8.5.3 and earlier versions, the default value is `60s`. +- The newly started TiFlash node does not provide services. To prevent queries from failing, TiDB limits the tidb-server from sending queries to the newly started TiFlash node. This variable indicates the time range in which the newly started TiFlash node is not sent requests. ### tidb_multi_statement_mode New in v4.0.11 @@ -4357,6 +4365,24 @@ mysql> desc select count(distinct a) from test.t; - Default value: `OFF` - This variable controls whether to enable the [Cross-database binding](/sql-plan-management.md#cross-database-binding) feature. +### tidb_opt_enable_no_decorrelate_in_select New in v8.5.4 + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: Boolean +- Default value: `OFF` +- This variable controls whether the optimizer applies the [`NO_DECORRELATE()`](/optimizer-hints.md#no_decorrelate) hint for all queries that contain a subquery in the `SELECT` list. + +### tidb_opt_enable_semi_join_rewrite New in v8.5.4 + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Boolean +- Default value: `OFF` +- This variable controls whether the optimizer applies the [`SEMI_JOIN_REWRITE()`](/optimizer-hints.md#semi_join_rewrite) hint for all queries that contain subqueries. + ### tidb_opt_fix_control New in v6.5.3 and v7.1.0 @@ -5435,7 +5461,7 @@ SHOW WARNINGS; - Type: Enumeration - Default value: `leader` - Possible values: `leader`, `follower`, `leader-and-follower`, `prefer-leader`, `closest-replicas`, `closest-adaptive`, and `learner`. The `learner` value is introduced in v6.6.0. -- This variable is used to control where TiDB reads data. +- This variable is used to control where TiDB reads data. Starting from v8.5.4, this variable only takes effect on read-only SQL statements. - For more details about usage and implementation, see [Follower read](/follower-read.md). ### tidb_restricted_read_only New in v5.2.0 diff --git a/ticdc/monitor-ticdc.md b/ticdc/monitor-ticdc.md index 7f22a0a5994ff..fadd9e0d7fa78 100644 --- a/ticdc/monitor-ticdc.md +++ b/ticdc/monitor-ticdc.md @@ -15,7 +15,9 @@ cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://ro ## Metrics for TiCDC in the new architecture -The monitoring dashboard **TiCDC-New-Arch** for [TiCDC New Architecture](/ticdc/ticdc-architecture.md) is not managed by TiUP yet. To view the related monitoring data on Grafana, you need to manually import the TiCDC monitoring metrics file: +The monitoring dashboard for [TiCDC in the new architecture](/ticdc/ticdc-architecture.md) is **TiCDC-New-Arch**. 
For TiDB clusters of v8.5.4 and later versions, this monitoring dashboard is integrated into Grafana during cluster deployment or upgrade, so no manual operation is required. + +If your cluster version is earlier than v8.5.4, you need to manually import the TiCDC monitoring metrics file: 1. Download the monitoring metrics file for TiCDC in the new architecture: diff --git a/ticdc/ticdc-architecture.md b/ticdc/ticdc-architecture.md index 6dd0a66b7b5da..7eb5f15e41f95 100644 --- a/ticdc/ticdc-architecture.md +++ b/ticdc/ticdc-architecture.md @@ -5,7 +5,9 @@ summary: Introduces the features, architectural design, deployment guide, and no # TiCDC New Architecture -Starting from [TiCDC v8.5.4-release.1](https://github.com/pingcap/ticdc/releases/tag/v8.5.4-release.1), TiCDC introduces a new architecture that improves the performance, scalability, and stability of real-time data replication while reducing resource costs. This new architecture redesigns TiCDC core components and optimizes its data processing workflows, offering the following advantages: +Starting from [TiCDC v8.5.4-release.1](https://github.com/pingcap/ticdc/releases/tag/v8.5.4-release.1), TiCDC introduces a new architecture that improves the performance, scalability, and stability of real-time data replication while reducing resource costs. + +This new architecture redesigns TiCDC core components and optimizes its data processing workflows, while maintaining compatibility with the configuration, usage, and APIs of the [classic TiCDC architecture](/ticdc/ticdc-classic-architecture.md). It offers the following advantages: - **Higher single-node performance**: a single node can replicate up to 500,000 tables, achieving replication throughput of up to 190 MiB/s on a single node in wide table scenarios. - **Enhanced scalability**: cluster replication capability scales almost linearly. A single cluster can expand to over 100 nodes, support more than 10,000 changefeeds, and replicate millions of tables within a single changefeed. @@ -102,20 +104,61 @@ In addition, the new TiCDC architecture currently does not support splitting lar ## Upgrade guide -The TiCDC new architecture can only be deployed in TiDB clusters of v7.5.0 or later versions. Before deployment, make sure your TiDB cluster meets this version requirement. +TiCDC in the new architecture can only be deployed in TiDB clusters of v7.5.0 or later versions. Before deployment, make sure your TiDB cluster meets this requirement. + +You can deploy TiCDC nodes in the new architecture using TiUP or TiDB Operator. + +### Deploy a new TiDB cluster with TiCDC nodes in the new architecture + + +
+ +When deploying a new TiDB cluster of v8.5.4 or later using TiUP, you can also deploy TiCDC nodes in the new architecture at the same time. To do so, you only need to add the TiCDC-related section and set `newarch: true` in the configuration file that TiUP uses to start the TiDB cluster. The following is an example: + +```yaml +cdc_servers: + - host: 10.0.1.20 + config: + newarch: true + - host: 10.0.1.21 + config: + newarch: true +``` + +For more TiCDC deployment information, see [Deploy a new TiDB cluster that includes TiCDC using TiUP](/ticdc/deploy-ticdc.md#deploy-a-new-tidb-cluster-that-includes-ticdc-using-tiup). + +
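As a rough usage sketch, suppose the topology above is saved to a hypothetical file named `topology.yaml`; the cluster name, version, and SSH user below are placeholders rather than values taken from this document:

```shell
# Deploy a new cluster from the topology file whose cdc_servers section sets newarch: true.
tiup cluster deploy <cluster-name> v8.5.4 topology.yaml --user root -p

# Start the cluster (and initialize the root password) once deployment completes.
tiup cluster start <cluster-name> --init
```

Because the `cdc_servers` nodes are deployed with `newarch: true` from the start, no separate patch step is needed for this path.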
+
+ +When deploying a new TiDB cluster of v8.5.4 or later using TiDB Operator, you can also deploy TiCDC nodes in the new architecture at the same time. To do so, you only need to add the TiCDC-related section and set `newarch = true` in the cluster configuration file. The following is an example: + +```yaml +spec: + ticdc: + baseImage: pingcap/ticdc + version: v8.5.4 + replicas: 3 + config: + newarch = true +``` + +For more TiCDC deployment information, see [Fresh TiCDC deployment](https://docs.pingcap.com/tidb-in-kubernetes/stable/deploy-ticdc/#fresh-ticdc-deployment). + +
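As a rough usage sketch, assume the manifest above is saved to a hypothetical file named `tidb-cluster.yaml` and applied to a namespace of your choice; the label selector below is an assumption based on TiDB Operator's usual component labels:

```shell
# Create or update the TidbCluster with the TiCDC section that enables the new architecture.
kubectl apply -f tidb-cluster.yaml -n <namespace>

# Check that the TiCDC Pods reach the Running state.
kubectl get pods -n <namespace> -l app.kubernetes.io/component=ticdc
```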
+
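Whichever deployment method you choose, it is worth confirming that the TiCDC nodes are up before creating changefeeds. A minimal check for a TiUP-managed cluster might look like the following sketch (the cluster name is a placeholder):

```shell
# The cdc entries in the output should show a status of Up.
tiup cluster display <cluster-name>
```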
-You can deploy the TiCDC new architecture using TiUP or TiDB Operator. +### Deploy TiCDC nodes in the new architecture in an existing TiDB cluster
-To deploy the TiCDC new architecture using TiUP, take the following steps: +To deploy TiCDC nodes in the new architecture using TiUP, take the following steps: 1. If your TiDB cluster does not have TiCDC nodes yet, refer to [Scale out a TiCDC cluster](/scale-tidb-using-tiup.md#scale-out-a-ticdc-cluster) to add new TiCDC nodes in the cluster. Otherwise, skip this step. -2. Download the TiCDC binary package for the new architecture. +2. If your TiDB cluster version is earlier than v8.5.4, you need to manually download the TiCDC binary package of the new architecture and then use it to patch your TiDB cluster. Otherwise, skip this step. - The download link follows this format: `https://tiup-mirrors.pingcap.com/cdc-${version}-${os}-${arch}.tar.gz`, where `${version}` is the TiCDC version, `${os}` is your operating system, and `${arch}` is the platform the component runs on (`amd64` or `arm64`). + The download link follows this format: `https://tiup-mirrors.pingcap.com/cdc-${version}-${os}-${arch}.tar.gz`, where `${version}` is the TiCDC version (see [TiCDC releases for the new architecture](https://github.com/pingcap/ticdc/releases) for available versions), `${os}` is your operating system, and `${arch}` is the platform the component runs on (`amd64` or `arm64`). For example, to download the binary package of TiCDC v8.5.4-release.1 for Linux (x86-64), run the following command: @@ -158,9 +201,9 @@ To deploy the TiCDC new architecture using TiUP, take the following steps:
-To deploy the TiCDC new architecture using TiDB Operator, take the following steps: +To deploy TiCDC nodes in the new architecture in an existing TiDB cluster using TiDB Operator, take the following steps: -- If your TiDB cluster does not include a TiCDC component, refer to [Add TiCDC to an existing TiDB cluster](https://docs.pingcap.com/tidb-in-kubernetes/stable/deploy-ticdc/#add-ticdc-to-an-existing-tidb-cluster) to add new TiCDC nodes. When doing so, specify the TiCDC image version as the new architecture version in the cluster configuration file. +- If your TiDB cluster does not include a TiCDC component, refer to [Add TiCDC to an existing TiDB cluster](https://docs.pingcap.com/tidb-in-kubernetes/stable/deploy-ticdc/#add-ticdc-to-an-existing-tidb-cluster) to add new TiCDC nodes. When doing so, specify the TiCDC image version as the new architecture version in the cluster configuration file. For available versions, see [TiCDC releases for the new architecture](https://github.com/pingcap/ticdc/releases). For example: @@ -205,7 +248,7 @@ To deploy the TiCDC new architecture using TiDB Operator, take the following ste kubectl apply -f ${cluster_name} -n ${namespace} ``` - 3. Resume all replication tasks: + 3. Resume all replication tasks of the changefeeds: ```shell kubectl exec -it ${pod_name} -n ${namespace} -- sh @@ -223,7 +266,7 @@ To deploy the TiCDC new architecture using TiDB Operator, take the following ste After deploying the TiCDC nodes with the new architecture, you can continue using the same commands as in the classic architecture. There is no need to learn new commands or modify the commands used in the classic architecture. -For example, to create a replication task in a new architecture TiCDC node, run the following command: +For example, to create a replication task for a new TiCDC node in the new architecture, run the following command: ```shell cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" @@ -239,6 +282,6 @@ For more command usage methods and details, see [Manage Changefeeds](/ticdc/ticd ## Monitoring -Currently, the monitoring dashboard **TiCDC-New-Arch** for the TiCDC new architecture is not managed by TiUP yet. To view this dashboard on Grafana, you need to manually import the [TiCDC monitoring metrics file](https://github.com/pingcap/ticdc/blob/master/metrics/grafana/ticdc_new_arch.json). +The monitoring dashboard for TiCDC in the new architecture is **TiCDC-New-Arch**. For TiDB clusters of v8.5.4 and later versions, this monitoring dashboard is integrated into Grafana during cluster deployment or upgrade, so no manual operation is required. If your cluster version is earlier than v8.5.4, you need to manually import the [TiCDC monitoring metrics file](https://github.com/pingcap/ticdc/blob/master/metrics/grafana/ticdc_new_arch.json) to enable monitoring. -For detailed descriptions of each monitoring metric, see [Metrics for TiCDC in the new architecture](/ticdc/monitor-ticdc.md#metrics-for-ticdc-in-the-new-architecture). \ No newline at end of file +For importing steps and detailed descriptions of each monitoring metric, see [Metrics for TiCDC in the new architecture](/ticdc/monitor-ticdc.md#metrics-for-ticdc-in-the-new-architecture). 
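To extend the changefeed example above, the following sketch shows one way to confirm that the replication task is running; the server address and changefeed ID simply mirror the placeholders used in that example:

```shell
# List all changefeeds handled by this TiCDC server and their current state.
cdc cli changefeed list --server=http://127.0.0.1:8300

# Query the detailed status of the example changefeed.
cdc cli changefeed query --server=http://127.0.0.1:8300 --changefeed-id="simple-replication-task"
```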
diff --git a/tiflash/tiflash-configuration.md b/tiflash/tiflash-configuration.md index b9b907207e0f1..a0af41cb10dba 100644 --- a/tiflash/tiflash-configuration.md +++ b/tiflash/tiflash-configuration.md @@ -241,6 +241,13 @@ The following configuration items only take effect for the TiFlash disaggregated - This configuration item only takes effect for the TiFlash disaggregated storage and compute architecture mode. For details, see [TiFlash Disaggregated Storage and Compute Architecture and S3 Support](/tiflash/tiflash-disaggregated-and-s3.md). - Value options: `"tiflash_write"`, `"tiflash_compute"` +##### `graceful_wait_shutdown_timeout` New in v8.5.4 + +- Controls the maximum wait time when shutting down a TiFlash server. During this period, TiFlash continues running unfinished MPP tasks but does not accept new ones. If all running MPP tasks finish before this timeout, TiFlash shuts down immediately; otherwise, it is forcibly shut down after the wait time expires. +- Default value: `600` +- Unit: seconds +- While the TiFlash server is waiting to shut down (in the grace period), TiDB will not send new MPP tasks to it. + #### flash.proxy ##### `addr`
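For illustration, the following is a minimal sketch of how `graceful_wait_shutdown_timeout` might be set in the TiFlash configuration file. The placement under the `flash` section is an assumption based on the surrounding documentation structure; confirm the exact location for your version before applying it:

```toml
[flash]
# Assumed section placement. Wait up to 10 minutes for running MPP tasks
# to finish before the TiFlash server is forcibly shut down.
graceful_wait_shutdown_timeout = 600
```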