From f1517b27862e04165c56a17780036e889895186d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 08:56:14 +0200
Subject: [PATCH 1/6] Small updates to TiKV config docs

---
 glossary.md                | 4 ++++
 tikv-configuration-file.md | 8 ++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/glossary.md b/glossary.md
index e5fe009cc7eac..85cc0687d40f6 100644
--- a/glossary.md
+++ b/glossary.md
@@ -75,6 +75,10 @@ Leader/Follower/Learner each corresponds to a role in a Raft group of [peers](#r
 
 Starting from v5.0, TiDB introduces Massively Parallel Processing (MPP) architecture through TiFlash nodes, which shares the execution workloads of large join queries among TiFlash nodes. When the MPP mode is enabled, TiDB, based on cost, determines whether to use the MPP framework to perform the calculation. In the MPP mode, the join keys are redistributed through the Exchange operation while being calculated, which distributes the calculation pressure to each TiFlash node and speeds up the calculation. For more information, see [Use TiFlash MPP Mode](/tiflash/use-tiflash-mpp-mode.md).
 
+### MVCC
+
+Multiversion concurrency control is used by TiDB to allow concurrent access to data. See also [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) on wikipedia.
+
 ## O
 
 ### Old value
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 13051f9e2532e..a06e2e14c09de 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -466,7 +466,7 @@ Configuration items related to storage.
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
 > - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters). Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
-+ TTL is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
++ [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`
 
 ### `ttl-check-poll-interval`
@@ -488,9 +488,9 @@ Configuration items related to storage.
 + Value options:
     + `1`: Uses API V1, does not encode the data passed from the client, and stores data as it is. In versions earlier than v6.1.0, TiKV uses API V1 by default.
     + `2`: Uses API V2:
-        + The data is stored in the Multi-Version Concurrency Control (MVCC) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
+        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
         + Data is scoped according to different usage and API V2 supports co-existence of TiDB, Transactional KV, and RawKV applications in a single cluster.
-        + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on `enable-ttl` explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
+        + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on [`enable-ttl`](#enable-ttl) explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
         + When API V2 is enabled, you need to deploy at least one tidb-server instance to reclaim obsolete data. This tidb-server instance can provide read and write services at the same time. To ensure high availability, you can deploy multiple tidb-server instances.
         + Client support is required for API V2. For details, see the corresponding instruction of the client for the API V2.
         + Since v6.2.0, Change Data Capture (CDC) for RawKV is supported. Refer to [RawKV CDC](https://tikv.org/docs/latest/concepts/explore-tikv-features/cdc/cdc).
@@ -1675,7 +1675,7 @@ Configuration items related to `rocksdb.defaultcf.titan`.
 
 + The zstd dictionary compression size. The default value is `"0KiB"`, which means to disable the zstd dictionary compression. In this case, Titan compresses data based on single values, whereas RocksDB compresses data based on blocks (`32KiB` by default). When the average size of Titan values is less than `32KiB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the store size in Titan can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on whether the value content is suitable for compression and the similarity among different values. You can enable the zstd dictionary compression to increase the compression ratio by configuring `zstd-dict-size` (for example, set it to `16KiB`). The actual store size can be lower than that of RocksDB. But the zstd dictionary compression might lead to about 10% performance regression in specific workloads.
 + Default value: `"0KiB"`
-+ Unit: KiB|MiB|GiB 
++ Unit: KiB|MiB|GiB
 
 ### `blob-cache-size`
 
From f85ff24f8b75fbc11c6e89f93e9f239b98d21265 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 09:31:38 +0200
Subject: [PATCH 2/6] Remove incorrect information

---
 tikv-configuration-file.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index a06e2e14c09de..1b7ef73bffdc2 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -464,7 +464,6 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
-> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters). Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`
From 377a68e4a6794ea88650e332880e796dca092336 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 09:40:21 +0200
Subject: [PATCH 3/6] Updated to keep the warning, but mention api-version=2

---
 tikv-configuration-file.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 1b7ef73bffdc2..4cb41bfbc5212 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -464,6 +464,7 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
+> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless it has `api-version = 2`. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`
From 6c026a79f61d7e2cf1e7895515eeeb4d1aefa275 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Fri, 21 Jun 2024 10:15:59 +0200
Subject: [PATCH 4/6] Update tikv-configuration-file.md

Co-authored-by: Aolin
---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 4cb41bfbc5212..9afd18beae3ce 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -464,7 +464,7 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
-> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless it has `api-version = 2`. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
+> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless `storage.api-version = 2` is configured. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`
From 516e7ba3d510f0d497b15ca13f04de7a2f3a9bd1 Mon Sep 17 00:00:00 2001
From: Aolin
Date: Thu, 27 Jun 2024 15:28:29 +0800
Subject: [PATCH 5/6] Apply suggestions from code review

---
 glossary.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/glossary.md b/glossary.md
index 85cc0687d40f6..031519e9d2c5b 100644
--- a/glossary.md
+++ b/glossary.md
@@ -75,9 +75,9 @@ Leader/Follower/Learner each corresponds to a role in a Raft group of [peers](#r
 
 Starting from v5.0, TiDB introduces Massively Parallel Processing (MPP) architecture through TiFlash nodes, which shares the execution workloads of large join queries among TiFlash nodes. When the MPP mode is enabled, TiDB, based on cost, determines whether to use the MPP framework to perform the calculation. In the MPP mode, the join keys are redistributed through the Exchange operation while being calculated, which distributes the calculation pressure to each TiFlash node and speeds up the calculation. For more information, see [Use TiFlash MPP Mode](/tiflash/use-tiflash-mpp-mode.md).
 
-### MVCC
+### Multi-version concurrency control (MVCC)
 
-Multiversion concurrency control is used by TiDB to allow concurrent access to data. See also [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) on wikipedia.
+[MVCC](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) is a concurrency control mechanism in TiDB and other databases. It processes the memory read by transactions to achieve concurrent access to TiDB, thereby avoiding blocking caused by conflicts between concurrent reads and writes.
 
 ## O
 
From 59547a31b29dcb5384ccaa5f6675175bf65aab9c Mon Sep 17 00:00:00 2001
From: Aolin
Date: Thu, 27 Jun 2024 15:42:24 +0800
Subject: [PATCH 6/6] fix link

---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 9afd18beae3ce..505c58ea09b7a 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -488,7 +488,7 @@ Configuration items related to storage.
 + Value options:
     + `1`: Uses API V1, does not encode the data passed from the client, and stores data as it is. In versions earlier than v6.1.0, TiKV uses API V1 by default.
     + `2`: Uses API V2:
-        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
+        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#multi-version-concurrency-control-mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
         + Data is scoped according to different usage and API V2 supports co-existence of TiDB, Transactional KV, and RawKV applications in a single cluster.
         + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on [`enable-ttl`](#enable-ttl) explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
         + When API V2 is enabled, you need to deploy at least one tidb-server instance to reclaim obsolete data. This tidb-server instance can provide read and write services at the same time. To ensure high availability, you can deploy multiple tidb-server instances.