From 4d7aa9a5177cda2dd20f946d96001d38a3becbcc Mon Sep 17 00:00:00 2001
From: Ti Chi Robot
Date: Fri, 9 Jun 2023 09:56:46 +0800
Subject: [PATCH 1/2] This is an automated cherry-pick of #13816

Signed-off-by: ti-chi-bot
---
 tiflash/use-tiflash.md | 59 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 5a0d8931f6c21..765dad6ef31df 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -96,7 +96,7 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan
     ```sql
     SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '300MiB';
     ```
-    
+
     After adjusting the preceding configurations, you cannot observe the acceleration for now, as the replication speed is still restricted by the PD limit globally.
 
 2. Use [PD Control](https://docs.pingcap.com/tidb/stable/pd-control) to progressively ease the new replica speed limit.
@@ -485,6 +485,63 @@ TiFlash provides the following two global/session variables to control whether t
 - [`tidb_broadcast_join_threshold_size`](/system-variables.md#tidb_broadcast_join_threshold_count-new-in-v50): The unit of the value is bytes. If the table size (in the unit of bytes) is less than the value of the variable, the Broadcast Hash Join algorithm is used. Otherwise, the Shuffled Hash Join algorithm is used.
 - [`tidb_broadcast_join_threshold_count`](/system-variables.md#tidb_broadcast_join_threshold_count-new-in-v50): The unit of the value is rows. If the objects of the join operation belong to a subquery, the optimizer cannot estimate the size of the subquery result set, so the size is determined by the number of rows in the result set. If the estimated number of rows in the subquery is less than the value of this variable, the Broadcast Hash Join algorithm is used. Otherwise, the Shuffled Hash Join algorithm is used.
 
+<<<<<<< HEAD
+=======
+### Known issues of MPP
+
+In the current version, TiFlash uses the `start_ts` of a query as the unique key of the query. In most cases, the `start_ts` of each query can uniquely identify a query, but in the following cases, different queries have the same `start_ts`:
+
+- All queries in the same transaction have the same `start_ts`.
+- When you use [`tidb_snapshot`](/system-variables.md#tidb_snapshot) to read data at a specific historical time point, the same time point is manually specified.
+- When [Stale Read](/stale-read.md) is enabled, the same time point is manually specified.
+
+When `start_ts` cannot uniquely represent the MPP query, if TiFlash detects that different queries have the same `start_ts` at a given time, TiFlash might report an error. Typical error cases are as follows:
+
+- When multiple queries with the same `start_ts` are sent to TiFlash at the same time, you might encounter the `task has been registered` error.
+- When multiple simple queries with `LIMIT` are executed continuously in the same transaction, once the `LIMIT` condition is met, TiDB sends a cancel request to TiFlash to cancel the query. This request also uses `start_ts` to identify the query to be canceled. If there are other queries with the same `start_ts` in TiFlash, these queries might be canceled by mistake. An example of this issue can be found in [#43426](https://github.com/pingcap/tidb/issues/43426).
+
+This issue is fixed in TiDB v6.6.0. It is recommended to use the [latest LTS version](https://docs.pingcap.com/tidb/stable).
+
+## Data validation
+
+### User scenarios
+
+Data corruptions are usually caused by serious hardware failures. In such cases, even if you attempt to manually recover data, your data become less reliable.
+
+To ensure data integrity, by default, TiFlash performs basic data validation on data files, using the `City128` algorithm. In the event of any data validation failure, TiFlash immediately reports an error and exits, avoiding secondary disasters caused by inconsistent data. At this time, you need to manually intervene and replicate the data again before you can restore the TiFlash node.
+
+Starting from v5.4.0, TiFlash introduces more advanced data validation features. TiFlash uses the `XXH3` algorithm by default and allows you to customize the validation frame and algorithm.
+
+### Validation mechanism
+
+The validation mechanism builds upon the DeltaTree File (DTFile). DTFile is the storage file that persists TiFlash data. DTFile has three formats:
+
+| Version | State | Validation mechanism | Notes |
+| :-- | :-- | :-- |:-- |
+| V1 | Deprecated | Hashes are embedded in data files. | |
+| V2 | Default | Hashes are embedded in data files. | Compared to V1, V2 adds statistics of column data. |
+| V3 | Manually enable | V3 contains metadata and token data checksum, and supports multiple hash algorithms. | New in v5.4.0. |
+
+DTFile is stored in the `stable` folder in the data file directory. All formats currently enabled are in folder format, which means the data is stored in multiple files under a folder with a name like `dmf_`.
+
+#### Use data validation
+
+TiFlash supports both automatic and manual data validation:
+
+* Automatic data validation:
+    * TiFlash enables the V2 validation mechanism by default.
+    * To enable V3 validation mechanism, refer to [TiFlash configuration file](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file).
+* Manual data validation. Refer to [`DTTool inspect`](/tiflash/tiflash-command-line-flags.md#dttool-inspect).
+
+> **Warning:**
+>
+> After you enable the V3 validation mechanism, the newly generated DTFile cannot be directly read by TiFlash earlier than v5.4.0. Since v5.4.0, TiFlash supports both V2 and V3 and does not actively upgrade or downgrade versions. If you need to upgrade or downgrade versions for existing files, you need to manually [switch versions](/tiflash/tiflash-command-line-flags.md#dttool-migrate).
+
+#### Validation tool
+
+In addition to automatic data validation performed when TiFlash reads data, a tool for manually checking data integrity is introduced in v5.4.0. For details, refer to [DTTool](/tiflash/tiflash-command-line-flags.md#dttool-inspect).
+
+>>>>>>> 4391d0e25 (Add a known issue for TiFlash mpp. (#13734) (#13816))
 ## Notes
 
 Currently, TiFlash does not support some features. These features might be incompatible with the native TiDB:

From 7a121f9889c056969d58c7f0fc20b61a3e87ac42 Mon Sep 17 00:00:00 2001
From: Ran
Date: Fri, 9 Jun 2023 10:21:10 +0800
Subject: [PATCH 2/2] Update use-tiflash.md

---
 tiflash/use-tiflash.md | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

diff --git a/tiflash/use-tiflash.md b/tiflash/use-tiflash.md
index 765dad6ef31df..88769b5d47b5e 100644
--- a/tiflash/use-tiflash.md
+++ b/tiflash/use-tiflash.md
@@ -485,8 +485,6 @@ TiFlash provides the following two global/session variables to control whether t
 - [`tidb_broadcast_join_threshold_size`](/system-variables.md#tidb_broadcast_join_threshold_count-new-in-v50): The unit of the value is bytes. If the table size (in the unit of bytes) is less than the value of the variable, the Broadcast Hash Join algorithm is used. Otherwise, the Shuffled Hash Join algorithm is used.
 - [`tidb_broadcast_join_threshold_count`](/system-variables.md#tidb_broadcast_join_threshold_count-new-in-v50): The unit of the value is rows. If the objects of the join operation belong to a subquery, the optimizer cannot estimate the size of the subquery result set, so the size is determined by the number of rows in the result set. If the estimated number of rows in the subquery is less than the value of this variable, the Broadcast Hash Join algorithm is used. Otherwise, the Shuffled Hash Join algorithm is used.
 
-<<<<<<< HEAD
-=======
 ### Known issues of MPP
 
 In the current version, TiFlash uses the `start_ts` of a query as the unique key of the query. In most cases, the `start_ts` of each query can uniquely identify a query, but in the following cases, different queries have the same `start_ts`:
@@ -502,46 +500,6 @@ When `start_ts` cannot uniquely represent the MPP query, if TiFlash detects that
 
 This issue is fixed in TiDB v6.6.0. It is recommended to use the [latest LTS version](https://docs.pingcap.com/tidb/stable).
 
-## Data validation
-
-### User scenarios
-
-Data corruptions are usually caused by serious hardware failures. In such cases, even if you attempt to manually recover data, your data become less reliable.
-
-To ensure data integrity, by default, TiFlash performs basic data validation on data files, using the `City128` algorithm. In the event of any data validation failure, TiFlash immediately reports an error and exits, avoiding secondary disasters caused by inconsistent data. At this time, you need to manually intervene and replicate the data again before you can restore the TiFlash node.
-
-Starting from v5.4.0, TiFlash introduces more advanced data validation features. TiFlash uses the `XXH3` algorithm by default and allows you to customize the validation frame and algorithm.
-
-### Validation mechanism
-
-The validation mechanism builds upon the DeltaTree File (DTFile). DTFile is the storage file that persists TiFlash data. DTFile has three formats:
-
-| Version | State | Validation mechanism | Notes |
-| :-- | :-- | :-- |:-- |
-| V1 | Deprecated | Hashes are embedded in data files. | |
-| V2 | Default | Hashes are embedded in data files. | Compared to V1, V2 adds statistics of column data. |
-| V3 | Manually enable | V3 contains metadata and token data checksum, and supports multiple hash algorithms. | New in v5.4.0. |
-
-DTFile is stored in the `stable` folder in the data file directory. All formats currently enabled are in folder format, which means the data is stored in multiple files under a folder with a name like `dmf_`.
-
-#### Use data validation
-
-TiFlash supports both automatic and manual data validation:
-
-* Automatic data validation:
-    * TiFlash enables the V2 validation mechanism by default.
-    * To enable V3 validation mechanism, refer to [TiFlash configuration file](/tiflash/tiflash-configuration.md#configure-the-tiflashtoml-file).
-* Manual data validation. Refer to [`DTTool inspect`](/tiflash/tiflash-command-line-flags.md#dttool-inspect).
-
-> **Warning:**
->
-> After you enable the V3 validation mechanism, the newly generated DTFile cannot be directly read by TiFlash earlier than v5.4.0. Since v5.4.0, TiFlash supports both V2 and V3 and does not actively upgrade or downgrade versions. If you need to upgrade or downgrade versions for existing files, you need to manually [switch versions](/tiflash/tiflash-command-line-flags.md#dttool-migrate).
-
-#### Validation tool
-
-In addition to automatic data validation performed when TiFlash reads data, a tool for manually checking data integrity is introduced in v5.4.0. For details, refer to [DTTool](/tiflash/tiflash-command-line-flags.md#dttool-inspect).
-
->>>>>>> 4391d0e25 (Add a known issue for TiFlash mpp. (#13734) (#13816))
 ## Notes
 
 Currently, TiFlash does not support some features. These features might be incompatible with the native TiDB: