
*: refine wording of data safety (#1557)

lilin90 committed Sep 27, 2019
1 parent 07af9a5 commit 703e3385d45f55f9f35db1679db4eb9a1ef3afee
@@ -59,7 +59,7 @@ enabled = true

A higher log level also means better performance for TiKV.

As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data security is extremely sensitive, `sync-log` can be disabled in raftstore.
As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data safety is extremely important, `sync-log` can be disabled in raftstore.
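For reference, a minimal sketch of where that option lives (the surrounding contents of the TiKV configuration file are omitted; whether to disable it depends on your data safety requirements):

```toml
# Sketch of the raftstore section of the TiKV configuration file.
# Leave sync-log enabled when data safety is critical; disable it only
# when the Raft majority across nodes is considered sufficient protection.
[raftstore]
sync-log = false
```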

There are two Column Families (Default CF and Write CF) in a TiKV cluster, which are mainly used to store different types of data. For the Sysbench test, the Column Family used to import data has a constant proportion among TiDB clusters:

@@ -661,7 +661,7 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For write performance when `sync-log` is set to `false`, see [Performance test result for TiDB using Sysbench](https://github.com/pingcap/docs/blob/master/dev/benchmark/sysbench-v4.md).

#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data security? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?
#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data safety? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?

Data is redundantly replicated between TiKV nodes using the [Raft consensus algorithm](https://raft.github.io/) to ensure recoverability should a node failure occur. Only when the data has been written into more than 50% of the replicas will the application return ACK (two out of three nodes). However, theoretically, two nodes might crash. Therefore, except for scenarios with less strict requirements on data safety but extreme requirements on performance, it is strongly recommended that you enable the `sync-log` mode.

@@ -34,7 +34,7 @@ Besides the operation of the Kubernetes cluster itself, there are the following

## What is the recommended deployment topology when I use TiDB Operator to orchestrate a TiDB cluster on a public cloud?

To achieve high availability and data security, it is recommended that you deploy the TiDB cluster in at least three availability zones in a production environment.
To achieve high availability and data safety, it is recommended that you deploy the TiDB cluster in at least three availability zones in a production environment.

In terms of the deployment topology relationship between the TiDB cluster and TiDB services, TiDB Operator supports the following three deployment modes. Each mode has its own merits and demerits, so your choice must be based on actual application needs.

@@ -16,7 +16,7 @@ Kubernetes currently supports statically allocated local storage. To create a lo

For more information, refer to [Kubernetes local storage](https://kubernetes.io/docs/concepts/storage/volumes/#local) and [local-static-provisioner document](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner#overview).
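As an illustration only, a common way to feed disks to local-static-provisioner is to mount them under its discovery directory (the `/mnt/disks` path and the device name below are assumptions; see the linked documents for the authoritative setup):

```shell
# Hypothetical example: each filesystem mounted under the discovery
# directory is exposed as one statically provisioned local PV.
sudo mkdir -p /mnt/disks/disk0
sudo mount /dev/nvme0n1 /mnt/disks/disk0   # device name is an assumption
```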

## Data security
## Data safety

By default, when a local PV is released, the provisioner recycles it. To prevent data from being recycled automatically, you must set the reclaim policy of your storage class to `Retain`. After confirming that a PV's data can be deleted, modify its reclaim policy to `Delete`. For how to change the reclaim policy of a PV in Kubernetes, refer to [this document](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
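A sketch of the corresponding commands, following the linked Kubernetes document (the PV name is a placeholder):

```shell
# Keep the data when the PV is released: set the reclaim policy to Retain.
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# After confirming that the data can be deleted, switch the policy to Delete.
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
```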

@@ -54,7 +54,7 @@ To use the diagnostic mode for troubleshooting:

## Recover the cluster after accidental deletion

TiDB Operator uses PV (Persistent Volume) and PVC (Persistent Volume Claim) to store persistent data. If you accidentally delete a cluster using `helm delete`, the PV/PVC objects and data are still retained to ensure data security.
TiDB Operator uses PV (Persistent Volume) and PVC (Persistent Volume Claim) to store persistent data. If you accidentally delete a cluster using `helm delete`, the PV/PVC objects and data are still retained to ensure data safety.

To restore the cluster at this time, use the `helm install` command to create a cluster with the same name. The retained PV/PVC and data are reused.
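For illustration, a sketch of that recovery step with Helm 2 (in use when this was written), assuming the cluster was originally deployed from the `pingcap/tidb-cluster` chart; the release name, namespace, chart version, and values file are placeholders:

```shell
# Recreate the cluster under the same release name; the retained PV/PVC
# objects are matched by name, so the existing data is reused.
helm install pingcap/tidb-cluster \
  --name=<release-name> \
  --namespace=<namespace> \
  --version=<chart-version> \
  -f values.yaml
```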

@@ -449,7 +449,7 @@ TiKV implements the Column Family (CF) feature of RocksDB. By default, the KV da

#### If the leader node is down, will the service be affected? How long?

TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data security. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single Region leader fails, a new Region leader is soon elected after a maximum of 2 * lease time (lease time is 10 seconds).
TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data safety. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single Region leader fails, a new Region leader is soon elected after a maximum of 2 * lease time (lease time is 10 seconds).

#### What are the TiKV scenarios that take up high I/O, memory, CPU, and exceed the parameter configuration?

@@ -485,9 +485,9 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For the test about `sync-log = false`, see [Performance test result for TiDB using Sysbench](benchmark/sysbench.md).

#### Can Raft + multiple replicas in the upper layer implement complete data security? Is it required to apply the strictest mode to standalone storage?
#### Can Raft + multiple replicas in the upper layer implement complete data safety? Is it required to apply the strictest mode to standalone storage?

Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data security, such as scenarios in the financial industry, you need to enable `sync-log`.
Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data safety, such as scenarios in the financial industry, you need to enable `sync-log`.

#### In data writing using the Raft protocol, multiple network roundtrips occur. What is the actual write delay?

@@ -609,7 +609,7 @@ TiKV implements the Column Family (CF) feature of RocksDB. By default, the KV da

#### If the leader node is down, will the service be affected? How long?

TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data security. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single Region leader fails, a new Region leader is soon elected after a maximum of 2 * lease time (lease time is 10 seconds).
TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data safety. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single Region leader fails, a new Region leader is soon elected after a maximum of 2 * lease time (lease time is 10 seconds).

#### What are the TiKV scenarios that take up high I/O, memory, CPU, and exceed the parameter configuration?

@@ -645,9 +645,9 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For the test about `sync-log = false`, see [Performance test result for TiDB using Sysbench](benchmark/sysbench.md).

#### Can Raft + multiple replicas in the upper layer implement complete data security? Is it required to apply the strictest mode to standalone storage?
#### Can Raft + multiple replicas in the upper layer implement complete data safety? Is it required to apply the strictest mode to standalone storage?

Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data security, such as scenarios in the financial industry, you need to enable `sync-log`.
Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data safety, such as scenarios in the financial industry, you need to enable `sync-log`.

#### In data writing using the Raft protocol, multiple network roundtrips occur. What is the actual write delay?

@@ -608,7 +608,7 @@ TiKV implements the Column Family (CF) feature of RocksDB. By default, the KV da

#### If a node is down, will the service be affected? How long?

TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data security. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single leader fails, a follower is soon elected as Region leader after a maximum of 2 * lease time (lease time is 10 seconds).
TiDB uses Raft to replicate data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data safety. The default number of replicas in each Region is 3. Based on the Raft protocol, a leader is elected in each Region, and if a single leader fails, a follower is soon elected as Region leader after a maximum of 2 * lease time (lease time is 10 seconds).

#### What are the TiKV scenarios that take up high I/O, memory, CPU, and exceed the parameter configuration?

@@ -644,9 +644,9 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For the test about `sync-log = false`, see [Performance test result for TiDB using Sysbench](benchmark/sysbench.md).

#### Can Raft + multiple replicas in the upper layer implement complete data security? Is it required to apply the strictest mode to standalone storage?
#### Can Raft + multiple replicas in the upper layer implement complete data safety? Is it required to apply the strictest mode to standalone storage?

Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data security, such as scenarios in the financial industry, you need to enable `sync-log`.
Raft uses strong consistency: the application returns ACK only after the data has been written into more than 50% of the nodes (two out of three nodes). In this case, data consistency is guaranteed. However, theoretically, two nodes might crash. Therefore, for scenarios that have a strict requirement on data safety, such as scenarios in the financial industry, you need to enable `sync-log`.

#### In data writing using the Raft protocol, multiple network roundtrips occur. What is the actual write delay?

@@ -58,7 +58,7 @@ enabled = true

A higher log level also means better performance for TiKV.

As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data security is extremely sensitive, `sync-log` can be disabled in raftstore.
As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data safety is extremely important, `sync-log` can be disabled in raftstore.

There are two Column Families (Default CF and Write CF) in a TiKV cluster, which are mainly used to store different types of data. For the Sysbench test, the Column Family used to import data has a constant proportion among TiDB clusters:

@@ -663,7 +663,7 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For write performance when `sync-log` is set to `false`, see [Performance test result for TiDB using Sysbench](https://github.com/pingcap/docs/blob/master/dev/benchmark/sysbench-v4.md).

#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data security? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?
#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data safety? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?

Data is redundantly replicated between TiKV nodes using the [Raft consensus algorithm](https://raft.github.io/) to ensure recoverability should a node failure occur. Only when the data has been written into more than 50% of the replicas will the application return ACK (two out of three nodes). However, theoretically, two nodes might crash. Therefore, except for scenarios with less strict requirements on data safety but extreme requirements on performance, it is strongly recommended that you enable the `sync-log` mode.

@@ -60,7 +60,7 @@ enabled = true

A higher log level also means better performance for TiKV.

As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data security is extremely sensitive, `sync-log` can be disabled in raftstore.
As TiKV is deployed in clusters, the Raft algorithm can guarantee that data is written into most of the nodes. Therefore, except in scenarios where data safety is extremely important, `sync-log` can be disabled in raftstore.

There are two Column Families (Default CF and Write CF) in a TiKV cluster, which are mainly used to store different types of data. For the Sysbench test, the Column Family used to import data has a constant proportion among TiDB clusters:

@@ -662,7 +662,7 @@ WAL belongs to ordered writing, and currently, we do not apply a unique configur

Generally, enabling `sync-log` reduces about 30% of the performance. For write performance when `sync-log` is set to `false`, see [Performance test result for TiDB using Sysbench](https://github.com/pingcap/docs/blob/master/dev/benchmark/sysbench-v4.md).

#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data security? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?
#### Can Raft + multiple replicas in the TiKV architecture achieve absolute data safety? Is it necessary to apply the strictest mode (`sync-log = true`) to standalone storage?

Data is redundantly replicated between TiKV nodes using the [Raft consensus algorithm](https://raft.github.io/) to ensure recoverability should a node failure occur. Only when the data has been written into more than 50% of the replicas will the application return ACK (two out of three nodes). However, theoretically, two nodes might crash. Therefore, except for scenarios with less strict requirements on data safety but extreme requirements on performance, it is strongly recommended that you enable the `sync-log` mode.

@@ -35,7 +35,7 @@ Besides the operation of the Kubernetes cluster itself, there are the following

## What is the recommended deployment topology when I use TiDB Operator to orchestrate a TiDB cluster on a public cloud?

To achieve high availability and data security, it is recommended that you deploy the TiDB cluster in at least three availability zones in a production environment.
To achieve high availability and data safety, it is recommended that you deploy the TiDB cluster in at least three availability zones in a production environment.

In terms of the deployment topology relationship between the TiDB cluster and TiDB services, TiDB Operator supports the following three deployment modes. Each mode has its own merits and demerits, so your choice must be based on actual application needs.

@@ -16,7 +16,7 @@ Kubernetes currently supports statically allocated local storage. To create a lo

For more information, refer to [Kubernetes local storage](https://kubernetes.io/docs/concepts/storage/volumes/#local) and [local-static-provisioner document](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner#overview).

## Data security
## Data safety

By default, when a local PV is released, the provisioner recycles it. To prevent data from being recycled automatically, you must set the reclaim policy of your storage class to `Retain`. After confirming that a PV's data can be deleted, modify its reclaim policy to `Delete`. For how to change the reclaim policy of a PV in Kubernetes, refer to [this document](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/).

@@ -55,7 +55,7 @@ To use the diagnostic mode for troubleshooting:

## Recover the cluster after accidental deletion

TiDB Operator uses PV (Persistent Volume) and PVC (Persistent Volume Claim) to store persistent data. If you accidentally delete a cluster using `helm delete`, the PV/PVC objects and data are still retained to ensure data security.
TiDB Operator uses PV (Persistent Volume) and PVC (Persistent Volume Claim) to store persistent data. If you accidentally delete a cluster using `helm delete`, the PV/PVC objects and data are still retained to ensure data safety.

To restore the cluster at this time, use the `helm install` command to create a cluster with the same name. The retained PV/PVC and data are reused.
