improve location-awareness #3174
Merged
yikeke
merged 21 commits into
pingcap:docs-special-week
from
disksing:update-location-awareness
May 27, 2020
Commits
8488846
improve location-awareness
disksing 9e159f2
Update location-awareness.md
disksing df2ffcd
Update location-awareness.md
disksing 97639a2
Update location-awareness.md
disksing 941bc30
Update location-awareness.md
disksing ce58dec
Update location-awareness.md
disksing 6fff09e
Update location-awareness.md
disksing 80e140f
Update location-awareness.md
disksing 889b2b8
Update location-awareness.md
disksing 0e549b4
Update location-awareness.md
disksing 69de45e
Update location-awareness.md
disksing 541401f
Update location-awareness.md
disksing e94b5f1
rename the file and add an alias
8a6fe80
update title
925b6ed
Merge branch 'docs-special-week' into update-location-awareness
yikeke 2df29f1
Merge branch 'docs-special-week' into update-location-awareness
yikeke 964770a
fix CI
d460d3d
Merge branch 'docs-special-week' into update-location-awareness
disksing 1b7ba66
Merge branch 'docs-special-week' into update-location-awareness
yikeke 22bfb72
fix 2 links
38e3a1f
Update TOC.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| --- | ||
| title: Schedule Replicas by Topology Labels | ||
| category: how-to | ||
| aliases: ['/docs-cn/dev/how-to/deploy/geographic-redundancy/location-awareness/','/docs-cn/dev/location-awareness/'] | ||
| --- | ||
|
|
||
| # Schedule Replicas by Topology Labels | ||
|
|
||
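| To improve the high availability and disaster recovery capability of a TiDB cluster, we recommend spreading TiKV nodes as widely as possible at the physical level, for example across different racks or even different data centers. Based on the topology information of TiKV, the PD scheduler automatically schedules in the background to isolate the replicas of each Region as much as possible, which maximizes the disaster recovery capability. | ||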
|
|
||
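| For this mechanism to take effect, you need to configure the deployment properly so that the cluster topology, especially the location of each TiKV node, is reported to PD. Before reading this document, make sure you have read the [TiDB Ansible deployment guide](/online-deployment-using-ansible.md) and the [Docker deployment guide](/test-deployment-using-docker.md). | ||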
|
|
||
| ## Configure labels based on the cluster topology | ||
|
|
||
| ### Configure `labels` for TiKV | ||
|
|
||
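| TiKV lets you bind attributes to a node as key-value pairs, either through command-line flags or in the configuration file. These attributes are called labels. After TiKV starts, it reports its labels to PD, so you can use labels to identify the geographic location of a TiKV node. | ||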
|
|
||
| For example, if the cluster topology has three layers, zone (data center) -> rack -> host, you can use these three labels to set the location of a TiKV node. | ||
|
|
||
| Using command-line flags: | ||
|
|
||
| {{< copyable "" >}} | ||
|
|
||
| ``` | ||
| tikv-server --labels zone=<zone>,rack=<rack>,host=<host> | ||
| ``` | ||
|
|
||
| Using the configuration file: | ||
|
|
||
| {{< copyable "" >}} | ||
|
|
||
| ```toml | ||
| [server] | ||
| labels = { zone = "<zone>", rack = "<rack>", host = "<host>" } | ||
| ``` | ||
|
|
||
| ### Configure `location-labels` for PD | ||
|
|
||
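| As described above, labels can be any key-value pairs that describe TiKV attributes. PD cannot tell which labels identify geographic location, nor how those labels are layered. PD therefore needs its own configuration to understand the TiKV node topology. | ||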
|
|
||
| The PD configuration item is called `location-labels`. You can set it in the PD configuration file: | ||
|
|
||
| {{< copyable "" >}} | ||
|
|
||
| ```toml | ||
| [replication] | ||
| location-labels = ["zone", "rack", "host"] | ||
| ``` | ||
|
|
||
| After the PD cluster has been initialized, you need to use the pd-ctl tool to change this setting online: | ||
|
|
||
| {{< copyable "shell-regular" >}} | ||
|
|
||
| ```bash | ||
| pd-ctl config set location-labels zone,rack,host | ||
| ``` | ||
|
|
||
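| The `location-labels` configuration is an array of strings. Each item corresponds to a key in the TiKV `labels`, and the order of the keys represents the hierarchy of the labels, from the outermost layer to the innermost. | ||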
|
|
||
| > **Note:** | ||
| > | ||
| > You must configure both `location-labels` for PD and `labels` for TiKV. Otherwise, PD does not schedule according to the topology. | ||
|
|
||
| ### Configure using TiDB Ansible | ||
|
|
||
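| If you deploy the cluster with TiDB Ansible, you can configure all location-related settings in the `inventory.ini` file. tidb-ansible generates the corresponding TiKV and PD configuration files during deployment. | ||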
|
|
||
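| The following example defines a two-layer `zone/host` topology. The TiKV nodes are distributed across three zones, each with two hosts. Each host in z1 runs two TiKV instances, and each host in z2 and z3 runs one instance. | ||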
|
|
||
| ``` | ||
| [tikv_servers] | ||
| # z1 | ||
| tikv-1 labels="zone=z1,host=h1" | ||
| tikv-2 labels="zone=z1,host=h1" | ||
| tikv-3 labels="zone=z1,host=h2" | ||
| tikv-4 labels="zone=z1,host=h2" | ||
| # z2 | ||
| tikv-5 labels="zone=z2,host=h1" | ||
| tikv-6 labels="zone=z2,host=h2" | ||
| # z3 | ||
| tikv-7 labels="zone=z3,host=h1" | ||
| tikv-8 labels="zone=z3,host=h2" | ||
|
|
||
| [pd_servers:vars] | ||
| location_labels = ["zone", "host"] | ||
| ``` | ||
|
|
||
| ## PD scheduling based on topology labels | ||
|
|
||
| When scheduling replicas, PD follows the label hierarchy to keep the replicas of the same data as dispersed as possible. | ||
|
|
||
| The following analysis uses the topology from the previous section as an example. | ||
|
|
||
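| Suppose the number of replicas is 3 (`max-replicas=3`). Because there are 3 zones in total, PD ensures that the 3 replicas of each Region are placed in z1, z2, and z3 respectively. This way, the TiDB cluster remains available when any one data center fails. | ||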
|
|
||
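| Suppose the number of replicas is 5 (`max-replicas=5`). Because there are only 3 zones, PD cannot guarantee isolation of every replica at the zone level. The PD scheduler then falls back to guaranteeing isolation at the host level. In other words, multiple replicas of a Region might be placed in the same zone, but not on the same host. | ||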
|
|
||
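| With 5 replicas configured, if z3 fails or is isolated as a whole and does not recover within a period of time (controlled by `max-store-down-time`), PD schedules new replicas to restore the count to 5. At that point only 4 hosts are available (two in z1 and two in z2), so host-level isolation can no longer be guaranteed, and multiple replicas might be scheduled to the same host. | ||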
|
|
||
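| In summary, PD maximizes the disaster recovery capability of the cluster according to the current topology. Therefore, to achieve a certain level of disaster recovery, | ||
| you need to provide more machines than the number of replicas (`max-replicas`) at the corresponding level of the topology. | ||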
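
The fallback behavior described in the scheduling section can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it is not PD's actual scheduling code, and the names `parse_labels`, `location`, `shared_depth`, `place_replicas`, and `STORES` are hypothetical. It applies a simple greedy rule: each new replica goes to the store that shares the shortest label prefix with the replicas already placed.

```python
# Hypothetical greedy placement sketch; NOT the algorithm PD implements.
# Stores mirror the inventory.ini example (zone/host topology).
STORES = {
    "tikv-1": "zone=z1,host=h1", "tikv-2": "zone=z1,host=h1",
    "tikv-3": "zone=z1,host=h2", "tikv-4": "zone=z1,host=h2",
    "tikv-5": "zone=z2,host=h1", "tikv-6": "zone=z2,host=h2",
    "tikv-7": "zone=z3,host=h1", "tikv-8": "zone=z3,host=h2",
}


def parse_labels(text):
    """Turn 'zone=z1,host=h1' into {'zone': 'z1', 'host': 'h1'}."""
    return dict(pair.split("=", 1) for pair in text.split(","))


def location(labels, location_labels):
    """Order label values by the hierarchy given in location_labels."""
    return tuple(labels.get(key, "") for key in location_labels)


def shared_depth(a, b):
    """Count how many leading layers two locations have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def place_replicas(stores, location_labels, max_replicas):
    """Pick stores one by one, preferring the best-isolated candidate."""
    paths = {name: location(parse_labels(text), location_labels)
             for name, text in stores.items()}
    chosen = []
    for _ in range(min(max_replicas, len(paths))):
        def cost(name):
            depths = [shared_depth(paths[name], paths[c]) for c in chosen]
            # Worst overlap first, then total overlap, then name as tie-break.
            return (max(depths, default=0), sum(depths), name)
        chosen.append(min((n for n in paths if n not in chosen), key=cost))
    return chosen


if __name__ == "__main__":
    for replicas in (3, 5):
        print(replicas, place_replicas(STORES, ["zone", "host"], replicas))
```

Under this rule, 3 replicas end up in three different zones, and 5 replicas end up on five different hosts. If the z3 stores are removed before placing 5 replicas, only four distinct hosts remain, so two replicas necessarily share a host, which matches the analysis above.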