Merged

29 commits
3d32ffb
Update TOC.md
ireneontheway Sep 4, 2020
f0f1dd1
update files
ireneontheway Sep 4, 2020
5d2625d
Fix links
ireneontheway Sep 4, 2020
7323b8b
Merge branch 'master' into update-location-awareness
ireneontheway Sep 4, 2020
f9922a0
Update configure-placement-rules.md
ireneontheway Sep 4, 2020
5d2bd13
Merge branch 'update-location-awareness' of https://github.com/ireneo…
ireneontheway Sep 4, 2020
82601a2
Merge branch 'master' into update-location-awareness
yikeke Sep 4, 2020
bab16aa
Merge branch 'master' into update-location-awareness
yikeke Sep 4, 2020
821a00b
Update TOC.md
ireneontheway Sep 7, 2020
faa600b
Apply suggestions from code review
ireneontheway Sep 7, 2020
18a4474
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 8, 2020
bc1c586
Merge remote-tracking branch 'upstream/master' into update-location-a…
ireneontheway Sep 8, 2020
eb1120e
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 8, 2020
5ae8a46
Merge branch 'master' into update-location-awareness
ireneontheway Sep 8, 2020
9a19f7d
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 8, 2020
a6bb97a
Merge branch 'update-location-awareness' of https://github.com/ireneo…
ireneontheway Sep 8, 2020
de3906f
Merge branch 'master' into update-location-awareness
ireneontheway Sep 9, 2020
860ffad
Apply suggestions from code review
ireneontheway Sep 14, 2020
667caae
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 14, 2020
1183679
Apply suggestions from code review
ireneontheway Sep 14, 2020
b2df899
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 14, 2020
e03873a
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 14, 2020
89c967b
Update schedule-replicas-by-topology-labels.md
TomShawn Sep 14, 2020
4d9c773
Merge branch 'master' into update-location-awareness
TomShawn Sep 14, 2020
0233719
Update schedule-replicas-by-topology-labels.md
TomShawn Sep 16, 2020
9f2de87
Merge branch 'master' into update-location-awareness
ireneontheway Sep 17, 2020
dbd5321
Update schedule-replicas-by-topology-labels.md
ireneontheway Sep 17, 2020
a866d26
Merge branch 'update-location-awareness' of https://github.com/ireneo…
ireneontheway Sep 17, 2020
a9f1799
Merge branch 'master' into update-location-awareness
TomShawn Sep 17, 2020
1 change: 1 addition & 0 deletions TOC.md
@@ -499,6 +499,7 @@
+ [TiCDC Overview](/ticdc/ticdc-overview.md)
+ [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
+ [Table Filter](/table-filter.md)
+ [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md)
+ FAQs
+ [TiDB FAQs](/faq/tidb-faq.md)
+ [SQL FAQs](/faq/sql-faq.md)
2 changes: 1 addition & 1 deletion configure-placement-rules.md
@@ -49,7 +49,7 @@ The following table shows the meaning of each field in a rule:

The meaning and function of `LocationLabels` are the same as those earlier than v4.0. For example, if you have deployed `[zone,rack,host]` that defines a three-layer topology, the cluster has multiple zones (Availability Zones), each zone has multiple racks, and each rack has multiple hosts. When performing scheduling, PD first tries to place the Region's peers in different zones. If this attempt fails (for example, there are three replicas but only two zones in total), PD guarantees to place these replicas in different racks. If the number of racks is not enough to guarantee isolation, PD then tries host-level isolation.

The meaning and function of `IsolationLevel` are elaborated in [Cluster topology configuration](/location-awareness.md). For example, if you have deployed `[zone,rack,host]` that defines a three-layer topology with `LocationLabels` and set `IsolationLevel` to `zone`, PD ensures that all peers of each Region are placed in different zones during scheduling. If the minimum isolation level restriction on `IsolationLevel` cannot be met (for example, 3 replicas are configured but there are only 2 data zones in total), PD does not try to make up replicas to meet this restriction. The default value of `IsolationLevel` is an empty string, which means that it is disabled.
The meaning and function of `IsolationLevel` are elaborated in [Cluster topology configuration](/schedule-replicas-by-topology-labels.md). For example, if you have deployed `[zone,rack,host]` that defines a three-layer topology with `LocationLabels` and set `IsolationLevel` to `zone`, PD ensures that all peers of each Region are placed in different zones during scheduling. If the minimum isolation level restriction on `IsolationLevel` cannot be met (for example, 3 replicas are configured but there are only 2 data zones in total), PD does not try to make up replicas to meet this restriction. The default value of `IsolationLevel` is an empty string, which means that it is disabled.

### Fields of the rule group

128 changes: 0 additions & 128 deletions location-awareness.md

This file was deleted.

4 changes: 2 additions & 2 deletions pd-configuration-file.md
@@ -311,13 +311,13 @@ Configuration items related to replicas

+ The topology information of a TiKV cluster
+ Default value: `[]`
+ [Cluster topology configuration](/location-awareness.md)
+ [Cluster topology configuration](/schedule-replicas-by-topology-labels.md)

### `isolation-level`

+ The minimum topological isolation level of a TiKV cluster
+ Default value: `""`
+ [Cluster topology configuration](/location-awareness.md)
+ [Cluster topology configuration](/schedule-replicas-by-topology-labels.md)

### `strictly-match-label`

194 changes: 194 additions & 0 deletions schedule-replicas-by-topology-labels.md
@@ -0,0 +1,194 @@
---
title: Schedule Replicas by Topology Labels
summary: Learn how to schedule replicas by topology labels.
aliases: ['/docs/dev/location-awareness/','/docs/dev/how-to/deploy/geographic-redundancy/location-awareness/','/tidb/dev/location-awareness']
---

# Schedule Replicas by Topology Labels

To improve the high availability and disaster recovery capability of TiDB clusters, it is recommended that TiKV nodes be physically scattered as much as possible. For example, TiKV nodes can be distributed on different racks or even in different data centers. According to the topology information of TiKV, the PD scheduler automatically performs scheduling in the background to isolate the replicas of each Region as much as possible, which maximizes the capability of disaster recovery.

To make this mechanism effective, you need to properly configure TiKV and PD so that the topology information of the cluster, especially the TiKV location information, is reported to PD during deployment. Before you begin, see [Deploy TiDB Using TiUP](/production-deployment-using-tiup.md) first.

## Configure `labels` based on the cluster topology

### Configure `labels` for TiKV

You can use command-line flags or the TiKV configuration file to bind some attributes in the form of key-value pairs to TiKV. These attributes are called `labels`. After TiKV starts, it reports its `labels` to PD, so you can identify the location of TiKV nodes.

Assume that the topology has three layers: zone > rack > host. You can use these labels (zone, rack, host) to set the TiKV location in one of the following ways:

+ Use the command-line flag:

{{< copyable "" >}}

```shell
tikv-server --labels zone=<zone>,rack=<rack>,host=<host>
```

+ Configure in the TiKV configuration file:

{{< copyable "" >}}

```toml
[server]
labels = "zone=<zone>,rack=<rack>,host=<host>"
```
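The comma-separated `key=value` form used above can be illustrated with a short Python sketch. This is a hypothetical parser for demonstration only, not TiKV code:

```python
# Hypothetical parser illustrating the comma-separated key=value label
# format accepted by the --labels flag and the [server] labels entry.
def parse_labels(raw):
    """Split "zone=z1,rack=r1,host=h1" into a {key: value} dict."""
    return dict(pair.split("=", 1) for pair in raw.split(","))

print(parse_labels("zone=z1,rack=r1,host=h1"))
# {'zone': 'z1', 'rack': 'r1', 'host': 'h1'}
```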

### Configure `location-labels` for PD

As described above, labels can be any key-value pairs that describe TiKV attributes. However, PD cannot identify location-related labels or the layer relationship among them on its own. Therefore, you need to make the following configuration for PD to understand the TiKV node topology.

+ If the PD cluster is not initialized, configure `location-labels` in the PD configuration file:

{{< copyable "" >}}

```toml
[replication]
location-labels = ["zone", "rack", "host"]
```

+ If the PD cluster is already initialized, use the pd-ctl tool to make online changes:

{{< copyable "shell-regular" >}}

```bash
pd-ctl config set location-labels zone,rack,host
```

The `location-labels` configuration is an array of strings, and each item corresponds to the key of TiKV `labels`. The sequence of each key represents the layer relationship of different labels.
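In other words, the ordered keys describe a path from the outermost layer to the innermost one. The following Python sketch (a hypothetical illustration, not PD code) shows how the key order determines at which layer two stores are isolated from each other:

```python
# Hypothetical sketch: ordered location labels form a hierarchy, so two
# stores can be compared layer by layer from the outermost key inward.
LOCATION_LABELS = ["zone", "rack", "host"]

def location_path(store_labels):
    """Return a store's location as a tuple ordered by LOCATION_LABELS."""
    return tuple(store_labels.get(key, "") for key in LOCATION_LABELS)

def first_differing_layer(a, b):
    """Return the outermost layer index at which two stores differ,
    or len(LOCATION_LABELS) if they share the same full location."""
    for i, (x, y) in enumerate(zip(location_path(a), location_path(b))):
        if x != y:
            return i
    return len(LOCATION_LABELS)

s1 = {"zone": "z1", "rack": "r1", "host": "h1"}
s2 = {"zone": "z1", "rack": "r2", "host": "h1"}
print(first_differing_layer(s1, s2))  # 1: the stores differ at the "rack" layer
```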

> **Note:**
>
> You must configure `location-labels` for PD and `labels` for TiKV at the same time for the configurations to take effect. Otherwise, PD does not perform scheduling according to the topology.

### Configure `isolation-level` for PD

If `location-labels` has been configured, you can further enhance the topological isolation requirements on TiKV clusters by configuring `isolation-level` in the PD configuration file.

Assume that you have created a three-layer cluster topology (zone -> rack -> host) by configuring `location-labels` according to the instructions above. You can then set `isolation-level` to `zone` as follows:

{{< copyable "" >}}

```toml
[replication]
isolation-level = "zone"
```

If the PD cluster is already initialized, you need to use the pd-ctl tool to make online changes:

{{< copyable "shell-regular" >}}

```bash
pd-ctl config set isolation-level zone
```

The `isolation-level` configuration is a string, which needs to correspond to one of the keys of `location-labels`. This parameter limits the minimum and mandatory isolation level requirement on the TiKV cluster topology.

> **Note:**
>
> `isolation-level` is empty by default, which means there is no mandatory restriction on the isolation level. To set it, you need to configure `location-labels` for PD and ensure that the value of `isolation-level` is one of the `location-labels` names.
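The constraint in the note above can be expressed as a small validation sketch in Python (a hypothetical check for illustration, not PD code):

```python
# Hypothetical check mirroring the constraint described in the note:
# isolation-level must be empty (disabled) or one of the location-labels.
def validate_isolation_level(isolation_level, location_labels):
    if isolation_level and isolation_level not in location_labels:
        raise ValueError(
            f"isolation-level {isolation_level!r} must be one of {location_labels}"
        )

validate_isolation_level("zone", ["zone", "rack", "host"])  # valid
validate_isolation_level("", ["zone", "rack", "host"])      # valid: disabled
```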

Contributor

@ireneontheway Please align pingcap/docs-cn#4502 in this document, thanks!

Contributor Author

Done in dbd5321

### Configure a cluster using TiUP (recommended)

When using TiUP to deploy a cluster, you can configure the TiKV location in the [initialization configuration file](/production-deployment-using-tiup.md#step-3-edit-the-initialization-configuration-file). TiUP will generate the corresponding TiKV and PD configuration files during deployment.

In the following example, a two-layer topology of `zone/host` is defined. The TiKV nodes of the cluster are distributed among three zones, each of which has two hosts. In z1, two TiKV instances are deployed on each host; in z2 and z3, one TiKV instance is deployed on each host. `tikv-n` represents the IP address of the `n`th TiKV node.

```yaml
server_configs:
  pd:
    replication.location-labels: ["zone", "host"]

tikv_servers:
  # z1
  - host: tikv-1
    config:
      server.labels:
        zone: z1
        host: h1
  - host: tikv-2
    config:
      server.labels:
        zone: z1
        host: h1
  - host: tikv-3
    config:
      server.labels:
        zone: z1
        host: h2
  - host: tikv-4
    config:
      server.labels:
        zone: z1
        host: h2
  # z2
  - host: tikv-5
    config:
      server.labels:
        zone: z2
        host: h1
  - host: tikv-6
    config:
      server.labels:
        zone: z2
        host: h2
  # z3
  - host: tikv-7
    config:
      server.labels:
        zone: z3
        host: h1
  - host: tikv-8
    config:
      server.labels:
        zone: z3
        host: h2
```

For details, see [Geo-distributed Deployment topology](/geo-distributed-deployment-topology.md).

<details>
<summary> <strong>Configure a cluster using TiDB Ansible</strong> </summary>

When using TiDB Ansible to deploy a cluster, you can directly configure the TiKV location in the `inventory.ini` file. `tidb-ansible` will generate the corresponding TiKV and PD configuration files during deployment.

In the following example, a two-layer topology of `zone/host` is defined. The TiKV nodes of the cluster are distributed among three zones, each zone with two hosts. In z1, two TiKV instances are deployed per host. In z2 and z3, one TiKV instance is deployed per host.

```ini
[tikv_servers]
# z1
tikv-1 labels="zone=z1,host=h1"
tikv-2 labels="zone=z1,host=h1"
tikv-3 labels="zone=z1,host=h2"
tikv-4 labels="zone=z1,host=h2"
# z2
tikv-5 labels="zone=z2,host=h1"
tikv-6 labels="zone=z2,host=h2"
# z3
tikv-7 labels="zone=z3,host=h1"
tikv-8 labels="zone=z3,host=h2"

[pd_servers:vars]
location_labels = ["zone", "host"]
```

</details>

## PD schedules based on topology labels

PD schedules replicas according to the label layer to make sure that different replicas of the same data are scattered as much as possible.

Take the topology in the previous section as an example.

Assume that the number of cluster replicas is 3 (`max-replicas=3`). Because there are 3 zones in total, PD ensures that the 3 replicas of each Region are placed in z1, z2, and z3 respectively. In this way, the TiDB cluster is still available when one zone fails.

Then, assume that the number of cluster replicas is 5 (`max-replicas=5`). Because there are only 3 zones in total, PD cannot guarantee the isolation of each replica at the zone level. In this situation, the PD scheduler will ensure replica isolation at the host level. In other words, multiple replicas of a Region might be distributed in the same zone but not on the same host.
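The layer-by-layer fallback described above can be sketched as a toy placement search in Python. This is an illustration of the idea under simplified assumptions, not PD's actual scheduling algorithm:

```python
# Toy sketch (not PD's algorithm): choose stores for a Region's replicas
# so that distinctness is maximized at the outermost label layer first,
# then at each inner layer.
from itertools import combinations

LOCATION_LABELS = ["zone", "host"]

def placement_score(stores):
    """For each layer, count distinct label paths up to that layer."""
    return tuple(
        len({tuple(s[k] for k in LOCATION_LABELS[: i + 1]) for s in stores})
        for i in range(len(LOCATION_LABELS))
    )

def place_replicas(stores, n):
    """Pick the n stores with the best layer-by-layer distinctness."""
    return max(combinations(stores, n), key=placement_score)

# The 8-store, 3-zone topology from the TiUP example above.
stores = [
    {"zone": z, "host": h}
    for z, h in [("z1", "h1"), ("z1", "h1"), ("z1", "h2"), ("z1", "h2"),
                 ("z2", "h1"), ("z2", "h2"), ("z3", "h1"), ("z3", "h2")]
]
three = place_replicas(stores, 3)
print(sorted({s["zone"] for s in three}))  # 3 replicas span all 3 zones
```

With `n=5`, no selection can cover more than 3 distinct zones, so the tie is broken at the next layer: the chosen stores cover 5 distinct `(zone, host)` paths, mirroring the host-level fallback described above.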

In the case of the 5-replica configuration, if z3 fails or is isolated as a whole and cannot be recovered after a period of time (controlled by `max-store-down-time`), PD makes up the 5 replicas through scheduling. At this time, only 4 hosts are available, so host-level isolation cannot be guaranteed and multiple replicas might be scheduled to the same host. However, if `isolation-level` is set to `zone` instead of being left empty, it specifies the minimum physical isolation requirement for Region replicas: PD ensures that replicas of the same Region are scattered among different zones, and it does not perform scheduling that violates this restriction, even if the `max-replicas` requirement cannot be met as a result.

For example, a TiKV cluster is distributed across three zones, z1, z2, and z3. Each Region has three replicas as required, and PD distributes the three replicas of the same Region across these three zones. If a power outage occurs in z1 and cannot be recovered after a period of time, PD determines that the Region replicas in z1 are no longer available. However, because `isolation-level` is set to `zone`, PD must strictly guarantee that different replicas of the same Region are not scheduled in the same zone. Because both z2 and z3 already have a replica, PD does not perform any scheduling under the minimum isolation level restriction of `isolation-level`, even though each Region is left with only two replicas.
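The mandatory restriction in this example can be sketched as a simple admission check (a hypothetical illustration, not PD code):

```python
# Toy sketch of the mandatory check implied by isolation-level = "zone":
# a candidate store is rejected if it shares the isolation-level label
# value with any surviving replica of the Region.
def can_add_replica(existing, candidate, isolation_level="zone"):
    return all(candidate[isolation_level] != s[isolation_level]
               for s in existing)

# z1 is down; the Region's surviving replicas are in z2 and z3.
existing = [{"zone": "z2", "host": "h1"}, {"zone": "z3", "host": "h1"}]
print(can_add_replica(existing, {"zone": "z2", "host": "h2"}))  # False
print(can_add_replica(existing, {"zone": "z3", "host": "h2"}))  # False
# No surviving store is in another zone, so the lost replica stays unfilled.
```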

Similarly, when `isolation-level` is set to `rack`, the minimum isolation level applies to different racks in the same zone. With this configuration, isolation at the zone layer is still guaranteed first if possible. When isolation at the zone level cannot be guaranteed, PD tries to avoid scheduling different replicas to the same rack in the same zone. Scheduling works similarly when `isolation-level` is set to `host`: PD first guarantees isolation at the rack level, and then at the host level.

In summary, PD maximizes the disaster recovery capability of the cluster according to the current topology. Therefore, to achieve a certain level of disaster recovery, deploy machines across more sites at the corresponding topology level than the number of `max-replicas`. TiDB also provides mandatory configuration items such as `isolation-level` so that you can more flexibly control the topological isolation level of data according to different scenarios.