From e9bfc6cc522bb2a3e0962c7307a3906ac1e705a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 08:56:14 +0200
Subject: [PATCH 1/6] Small updates to TiKV config docs

---
 glossary.md                | 4 ++++
 tikv-configuration-file.md | 8 ++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/glossary.md b/glossary.md
index 2f5b5ee5f6b98..80efeed775242 100644
--- a/glossary.md
+++ b/glossary.md
@@ -76,6 +76,10 @@ Leader/Follower/Learner each corresponds to a role in a Raft group of [peers](#r
 
 Starting from v5.0, TiDB introduces Massively Parallel Processing (MPP) architecture through TiFlash nodes, which shares the execution workloads of large join queries among TiFlash nodes. When the MPP mode is enabled, TiDB, based on cost, determines whether to use the MPP framework to perform the calculation. In the MPP mode, the join keys are redistributed through the Exchange operation while being calculated, which distributes the calculation pressure to each TiFlash node and speeds up the calculation. For more information, see [Use TiFlash MPP Mode](/tiflash/use-tiflash-mpp-mode.md).
 
+### MVCC
+
+Multiversion concurrency control is used by TiDB to allow concurrent access to data. See also [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) on wikipedia.
+
 ## O
 
 ### Old value
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 18b194d0bf992..4c64410db841d 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -467,7 +467,7 @@ Configuration items related to storage.
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
 > - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters). Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
-+ TTL is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
++ [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`
 
 ### `ttl-check-poll-interval`
@@ -489,9 +489,9 @@ Configuration items related to storage.
 + Value options:
     + `1`: Uses API V1, does not encode the data passed from the client, and stores data as it is. In versions earlier than v6.1.0, TiKV uses API V1 by default.
     + `2`: Uses API V2:
-        + The data is stored in the Multi-Version Concurrency Control (MVCC) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
+        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
         + Data is scoped according to different usage and API V2 supports co-existence of TiDB, Transactional KV, and RawKV applications in a single cluster.
-        + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on `enable-ttl` explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
+        + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on [`enable-ttl`](#enable-ttl) explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
         + When API V2 is enabled, you need to deploy at least one tidb-server instance to reclaim obsolete data. This tidb-server instance can provide read and write services at the same time. To ensure high availability, you can deploy multiple tidb-server instances.
         + Client support is required for API V2. For details, see the corresponding instruction of the client for the API V2.
         + Since v6.2.0, Change Data Capture (CDC) for RawKV is supported. Refer to [RawKV CDC](https://tikv.org/docs/latest/concepts/explore-tikv-features/cdc/cdc).
@@ -1676,7 +1676,7 @@ Configuration items related to `rocksdb.defaultcf.titan`.
 
 + The zstd dictionary compression size. The default value is `"0KiB"`, which means to disable the zstd dictionary compression. In this case, Titan compresses data based on single values, whereas RocksDB compresses data based on blocks (`32KiB` by default). When the average size of Titan values is less than `32KiB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the store size in Titan can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on whether the value content is suitable for compression and the similarity among different values. You can enable the zstd dictionary compression to increase the compression ratio by configuring `zstd-dict-size` (for example, set it to `16KiB`). The actual store size can be lower than that of RocksDB. But the zstd dictionary compression might lead to about 10% performance regression in specific workloads.
 + Default value: `"0KiB"`
-+ Unit: KiB|MiB|GiB 
++ Unit: KiB|MiB|GiB
 
 ### `blob-cache-size`
 

From 95b3908aa2a3a73c04ab23e82378ab6d81cb4247 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 09:31:38 +0200
Subject: [PATCH 2/6] Remove incorrect information

---
 tikv-configuration-file.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 4c64410db841d..02197daaa1a56 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -465,7 +465,6 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
-> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters). Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`

From 4bfa8dab092f4d1a4e2d49edcbc8e49815de3d5a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Thu, 20 Jun 2024 09:40:21 +0200
Subject: [PATCH 3/6] Updated to keep the warning, but mention api-version=2

---
 tikv-configuration-file.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 02197daaa1a56..ee3efc8ef7f66 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -465,6 +465,7 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
+> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless it has `api-version = 2`. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`

From 7770ebfb7c282446ebd30bc78985b6c08d7ba0d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Fri, 21 Jun 2024 10:15:59 +0200
Subject: [PATCH 4/6] Update tikv-configuration-file.md

Co-authored-by: Aolin
---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index ee3efc8ef7f66..b0cb6a8f97ceb 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -465,7 +465,7 @@ Configuration items related to storage.
 > **Warning:**
 >
 > - Set `enable-ttl` to `true` or `false` **ONLY WHEN** deploying a new TiKV cluster. **DO NOT** modify the value of this configuration item in an existing TiKV cluster. TiKV clusters with different `enable-ttl` values use different data formats. Therefore, if you modify the value of this item in an existing TiKV cluster, the cluster will store data in different formats, which causes the "can't enable TTL on a non-ttl" error when you restart the TiKV cluster.
-> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless it has `api-version = 2`. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
+> - Use `enable-ttl` **ONLY IN** a TiKV cluster. **DO NOT** use this configuration item in a cluster that has TiDB nodes (which means setting `enable-ttl` to `true` in such clusters) unless `storage.api-version = 2` is configured. Otherwise, critical issues such as data corruption and the upgrade failure of TiDB clusters will occur.
 
 + [TTL](/time-to-live.md) is short for "Time to live". If this item is enabled, TiKV automatically deletes data that reaches its TTL. To set the value of TTL, you need to specify it in the requests when writing data via the client. If the TTL is not specified, it means that TiKV does not automatically delete the corresponding data.
 + Default value: `false`

From df81d1cb633504b4c6121923b2b203af9d88192e Mon Sep 17 00:00:00 2001
From: Aolin
Date: Thu, 27 Jun 2024 15:28:29 +0800
Subject: [PATCH 5/6] Apply suggestions from code review

---
 glossary.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/glossary.md b/glossary.md
index 80efeed775242..39b96d20f7288 100644
--- a/glossary.md
+++ b/glossary.md
@@ -76,9 +76,9 @@ Leader/Follower/Learner each corresponds to a role in a Raft group of [peers](#r
 
 Starting from v5.0, TiDB introduces Massively Parallel Processing (MPP) architecture through TiFlash nodes, which shares the execution workloads of large join queries among TiFlash nodes. When the MPP mode is enabled, TiDB, based on cost, determines whether to use the MPP framework to perform the calculation. In the MPP mode, the join keys are redistributed through the Exchange operation while being calculated, which distributes the calculation pressure to each TiFlash node and speeds up the calculation. For more information, see [Use TiFlash MPP Mode](/tiflash/use-tiflash-mpp-mode.md).
 
-### MVCC
+### Multi-version concurrency control (MVCC)
 
-Multiversion concurrency control is used by TiDB to allow concurrent access to data. See also [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) on wikipedia.
+[MVCC](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) is a concurrency control mechanism in TiDB and other databases. It processes the memory read by transactions to achieve concurrent access to TiDB, thereby avoiding blocking caused by conflicts between concurrent reads and writes.
 
 ## O
 

From 579206eab8fc14958a03e830aef3f2b70aa37ed5 Mon Sep 17 00:00:00 2001
From: Aolin
Date: Thu, 27 Jun 2024 15:42:24 +0800
Subject: [PATCH 6/6] fix link

---
 tikv-configuration-file.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index b0cb6a8f97ceb..42954d95fba06 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -489,7 +489,7 @@ Configuration items related to storage.
 + Value options:
     + `1`: Uses API V1, does not encode the data passed from the client, and stores data as it is. In versions earlier than v6.1.0, TiKV uses API V1 by default.
     + `2`: Uses API V2:
-        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
+        + The data is stored in the [Multi-Version Concurrency Control (MVCC)](/glossary.md#multi-version-concurrency-control-mvcc) format, where the timestamp is obtained from PD (which is TSO) by tikv-server.
         + Data is scoped according to different usage and API V2 supports co-existence of TiDB, Transactional KV, and RawKV applications in a single cluster.
         + When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on [`enable-ttl`](#enable-ttl) explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
         + When API V2 is enabled, you need to deploy at least one tidb-server instance to reclaim obsolete data. This tidb-server instance can provide read and write services at the same time. To ensure high availability, you can deploy multiple tidb-server instances.
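
The interaction between `storage.enable-ttl` and `storage.api-version` that this series documents can be sketched as a small consistency check. This is only an illustration: `check_storage_config` and its `has_tidb_nodes` argument are hypothetical names, not part of TiKV or any TiDB tooling, and treating a missing `api-version` as `1` is an assumption made for the sketch.

```python
from typing import Dict, List


def check_storage_config(storage: Dict[str, object], has_tidb_nodes: bool) -> List[str]:
    """Return the documented conflicts found in a [storage] section (hypothetical helper)."""
    problems = []
    # Assumption for this sketch: a missing api-version is treated as API V1.
    api_version = storage.get("api-version", 1)
    # The docs state that enable-ttl defaults to false.
    enable_ttl = storage.get("enable-ttl", False)

    # API V2 expects enable-ttl to be turned on explicitly.
    if api_version == 2 and not enable_ttl:
        problems.append("api-version = 2 expects enable-ttl = true")

    # With TiDB nodes present, enable-ttl = true is only allowed under API V2.
    if enable_ttl and has_tidb_nodes and api_version != 2:
        problems.append("enable-ttl = true with TiDB nodes requires api-version = 2")

    return problems


if __name__ == "__main__":
    print(check_storage_config({"api-version": 2, "enable-ttl": True}, has_tidb_nodes=True))
    print(check_storage_config({"enable-ttl": True}, has_tidb_nodes=True))
```

The check takes an already-parsed dictionary rather than reading a file, so it does not assume any particular TOML parser. It also does not cover the first warning bullet (changing `enable-ttl` on an existing cluster), which depends on cluster state rather than on the config values alone.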