zh, en: add doc for tidb scheduler (#69)
* zh, en: add doc for tidb scheduler

* address comments from Daniel
ran-huang committed Mar 30, 2020
1 parent 81aee28 commit 6b7ca1a
Showing 5 changed files with 139 additions and 0 deletions.
2 changes: 2 additions & 0 deletions en/TOC.md
@@ -52,5 +52,7 @@
+ Tools
- [tkctl](use-tkctl.md)
- [TiDB Toolkit](tidb-toolkit.md)
+ Components
- [TiDB Scheduler](tidb-scheduler.md)
- [Troubleshoot](troubleshoot.md)
- [FAQs](faq.md)
69 changes: 69 additions & 0 deletions en/tidb-scheduler.md
@@ -0,0 +1,69 @@
---
title: TiDB Scheduler
summary: Learn what TiDB Scheduler is and how it works.
category: reference
---

# TiDB Scheduler

TiDB Scheduler is a TiDB implementation of the [Kubernetes scheduler extender](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/scheduler_extender.md). It adds new scheduling rules to Kubernetes. This document introduces these scheduling rules and how TiDB Scheduler works.

## TiDB cluster scheduling requirements

A TiDB cluster includes three key components: PD, TiKV, and TiDB. Each consists of multiple nodes: PD is a Raft cluster, and TiKV is a multi-Raft group cluster. PD and TiKV components are stateful. The default scheduling rules of the Kubernetes scheduler cannot meet the high availability scheduling requirements of the TiDB cluster, so the Kubernetes scheduling rules need to be extended.

TiDB Scheduler implements the following customized scheduling rules:

### PD component

Scheduling rule 1: Make sure that the number of PD instances scheduled on each node is less than `Replicas / 2`. For example:

| PD cluster size (Replicas) | Maximum number of PD instances that can be scheduled on each node |
| ------------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
| ... | |
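
As an illustration of how this rule maps to the table above, the following Go sketch (a hypothetical helper written for this document, not the actual `tidb-scheduler` implementation) computes the per-node cap for a given number of PD replicas:

```go
package main

import "fmt"

// maxPDPerNode reproduces the table above: the number of PD instances on one
// node must stay below Replicas / 2, with a floor of 1 so that very small
// clusters can still be scheduled.
func maxPDPerNode(replicas int) int {
	limit := (replicas - 1) / 2 // largest integer strictly less than replicas/2
	if limit < 1 {
		limit = 1
	}
	return limit
}

func main() {
	for _, r := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("replicas=%d -> at most %d PD instance(s) per node\n", r, maxPDPerNode(r))
	}
}
```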

### TiKV component

Scheduling rule 2: If the number of Kubernetes nodes is less than three (in this case, TiKV cannot achieve high availability), scheduling is not limited; otherwise, the number of TiKV instances that can be scheduled on each node is no more than `ceil(Replicas / 3)`. For example:

| TiKV cluster size (Replicas) | Maximum number of TiKV instances that can be scheduled on each node | Best scheduling distribution |
| ------------- | ------------- | ------------- |
| 3 | 1 | 1,1,1 |
| 4 | 2 | 1,1,2 |
| 5 | 2 | 1,2,2 |
| 6 | 2 | 2,2,2 |
| 7 | 3 | 2,2,3 |
| 8 | 3 | 2,3,3 |
| ... | | |
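
The same kind of sketch (again a hypothetical helper, not the actual implementation) reproduces the TiKV table, with no limit applied when the Kubernetes cluster has fewer than three nodes:

```go
package main

import "fmt"

// maxTiKVPerNode reproduces the table above: with fewer than three Kubernetes
// nodes, scheduling is not limited; otherwise each node may hold at most
// ceil(Replicas / 3) TiKV instances.
func maxTiKVPerNode(kubeNodes, replicas int) (limit int, unrestricted bool) {
	if kubeNodes < 3 {
		return 0, true // high availability is impossible anyway, so no limit
	}
	return (replicas + 2) / 3, false // integer form of ceil(replicas / 3)
}

func main() {
	for _, r := range []int{3, 4, 5, 6, 7, 8} {
		limit, _ := maxTiKVPerNode(3, r)
		fmt.Printf("replicas=%d -> at most %d TiKV instance(s) per node\n", r, limit)
	}
}
```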

### TiDB component

Scheduling rule 3: When you perform a rolling update to a TiDB instance, the instance tends to be scheduled back to its original node.

This ensures stable scheduling and is helpful for the scenario where the Node IP and NodePort are manually configured in the load balancer (LB) backend. Because the instance returns to its original node after the upgrade, the Node IP does not change and you do not need to adjust the LB configuration, which reduces the impact of the rolling update on the cluster.

## How TiDB Scheduler works

![TiDB Scheduler Overview](/media/tidb-scheduler-overview.png)

TiDB Scheduler adds customized scheduling rules by implementing Kubernetes [Scheduler extender](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/scheduler_extender.md).

The TiDB Scheduler component is deployed as one or more Pods, but only one Pod works at a time. Each Pod has two Containers: one is the native `kube-scheduler`, and the other is `tidb-scheduler`, implemented as a Kubernetes scheduler extender.

The `.spec.schedulerName` attribute of PD, TiDB, and TiKV Pods created by TiDB Operator is set to `tidb-scheduler`. This means that TiDB Scheduler is used to schedule these Pods.

If you are using a testing cluster and do not require high availability, you can change `.spec.schedulerName` to `default-scheduler` to use the built-in Kubernetes scheduler.
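
For reference, the snippet below (an illustration using the Kubernetes Go client types; the Pod itself is hypothetical, and TiDB Operator normally sets this field for you) shows where `schedulerName` sits in a Pod spec:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Illustration only: TiDB Operator sets this field automatically on the
	// PD, TiKV, and TiDB Pods it creates. Changing the value to
	// "default-scheduler" hands the Pod back to the built-in scheduler.
	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			SchedulerName: "tidb-scheduler",
		},
	}
	fmt.Println(pod.Spec.SchedulerName) // prints: tidb-scheduler
}
```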

The scheduling process of a Pod is as follows:

- First, `kube-scheduler` pulls all Pods whose `.spec.schedulerName` is `tidb-scheduler`. Each Pod is first filtered using the default Kubernetes scheduling rules.
- Then, `kube-scheduler` sends a request to the `tidb-scheduler` service. `tidb-scheduler` filters the candidate nodes through the customized scheduling rules described above and returns the schedulable nodes to `kube-scheduler`.
- Finally, `kube-scheduler` determines the node to schedule the Pod to (a sketch of this filter exchange follows the list).
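
The exchange between `kube-scheduler` and the extender is an HTTP call: `kube-scheduler` POSTs the Pod and its candidate nodes, and the extender answers with the subset that also passes the customized rules. The Go sketch below is an illustration only, not the actual `tidb-scheduler` code: the request and response structs are simplified stand-ins for the Kubernetes extender types, `satisfiesHARules` is a hypothetical placeholder for rules 1 to 3 above, and the listen address is arbitrary.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Simplified shapes of the scheduler extender "filter" request and response.
// Only the fields needed for this illustration are included.
type filterArgs struct {
	Pod       map[string]interface{} `json:"pod"`
	NodeNames []string               `json:"nodenames"`
}

type filterResult struct {
	NodeNames   []string          `json:"nodenames"`
	FailedNodes map[string]string `json:"failedNodes"`
	Error       string            `json:"error,omitempty"`
}

// satisfiesHARules is a stand-in for the per-component checks described above.
func satisfiesHARules(node string) bool { return true }

func main() {
	// kube-scheduler POSTs its candidate nodes here; the extender answers with
	// the subset that also satisfies the customized scheduling rules.
	http.HandleFunc("/filter", func(w http.ResponseWriter, r *http.Request) {
		var args filterArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		result := filterResult{FailedNodes: map[string]string{}}
		for _, node := range args.NodeNames {
			if satisfiesHARules(node) {
				result.NodeNames = append(result.NodeNames, node)
			} else {
				result.FailedNodes[node] = "violates a TiDB Scheduler rule"
			}
		}
		_ = json.NewEncoder(w).Encode(result)
	})
	log.Fatal(http.ListenAndServe(":8080", nil)) // port chosen for illustration
}
```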

If a Pod cannot be scheduled, see the [troubleshooting document](troubleshoot.md#the-pod-is-in-the-pending-state) to diagnose and solve the issue.
Binary file added media/tidb-scheduler-overview.png
2 changes: 2 additions & 0 deletions zh/TOC.md
@@ -60,5 +60,7 @@
+ Tools
- [tkctl](use-tkctl.md)
- [TiDB Toolkit](tidb-toolkit.md)
+ Components
  - [TiDB Scheduler](tidb-scheduler.md)
+ [Troubleshoot](troubleshoot.md)
+ [FAQs](faq.md)
66 changes: 66 additions & 0 deletions zh/tidb-scheduler.md
@@ -0,0 +1,66 @@
---
title: TiDB Scheduler
summary: Learn what TiDB Scheduler is and how it works.
category: reference
---

# TiDB Scheduler

TiDB Scheduler is a TiDB implementation of the [Kubernetes scheduler extender](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/scheduler_extender.md). TiDB Scheduler is used to add new scheduling rules to Kubernetes. This document introduces how TiDB Scheduler works.

## TiDB cluster scheduling requirements

A TiDB cluster includes three core components: PD, TiKV, and TiDB. Each component consists of multiple nodes: PD is a Raft cluster, and TiKV is a multi-Raft-group cluster. Both of these components are stateful. The default scheduling rules of the Kubernetes scheduler cannot meet the high availability scheduling requirements of the TiDB cluster, so the Kubernetes scheduling rules need to be extended.

Currently, TiDB Scheduler implements the following customized scheduling rules.

### PD component

Scheduling rule 1: Make sure that the number of PD instances scheduled on each node is less than `Replicas / 2`. For example:

| PD cluster size (Replicas) | Maximum number of PD instances that can be scheduled on each node |
| ------------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
| ... | |

### TiKV component

Scheduling rule 2: If the number of Kubernetes nodes is less than three (in this case, TiKV cannot achieve high availability), scheduling is not limited; otherwise, the number of TiKV instances that can be scheduled on each node is calculated as `ceil(Replicas/3)`. For example:

| TiKV cluster size (Replicas) | Maximum number of TiKV instances that can be scheduled on each node | Best scheduling distribution |
| ------------- | ------------- | ------------- |
| 3 | 1 | 1,1,1 |
| 4 | 2 | 1,1,2 |
| 5 | 2 | 1,2,2 |
| 6 | 2 | 2,2,2 |
| 7 | 3 | 2,2,3 |
| 8 | 3 | 2,3,3 |
| ... | | |

### TiDB component

Scheduling rule 3: When you perform a rolling update to a TiDB instance, the instance tends to be scheduled back to its original node.

This ensures stable scheduling and is helpful for the scenario where the Node IP and NodePort are manually configured in the load balancer (LB) backend: it avoids having to readjust the LB when the Node IP changes after a cluster upgrade, and thus reduces the impact of the rolling update on the cluster.

## How TiDB Scheduler works

![How TiDB Scheduler works](/media/tidb-scheduler-overview.png)

TiDB Scheduler adds customized scheduling rules by implementing the Kubernetes [Scheduler extender](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/scheduler_extender.md).

The TiDB Scheduler component is deployed as one or more Pods, but only one Pod works at a time. Each Pod has two Containers: one is the native `kube-scheduler`, and the other is `tidb-scheduler`, implemented as a Kubernetes scheduler extender.

The `.spec.schedulerName` attribute of the PD, TiDB, and TiKV Pods created by TiDB Operator is set to `tidb-scheduler`, which means that the TiDB Scheduler custom scheduler is used for the scheduling. If you are using a testing cluster and do not require high availability, you can change `.spec.schedulerName` to `default-scheduler` to use the built-in Kubernetes scheduler.

The scheduling process of a Pod is as follows:

- `kube-scheduler` pulls all Pods whose `.spec.schedulerName` is `tidb-scheduler`; each Pod is first filtered using the default Kubernetes scheduling rules;
- After that, `kube-scheduler` sends a request to the `tidb-scheduler` service. `tidb-scheduler` filters the candidate nodes through the customized scheduling rules described above and returns the remaining schedulable nodes to `kube-scheduler`;
- Finally, `kube-scheduler` determines the node to schedule the Pod to.

If a Pod cannot be scheduled, see [this document](troubleshoot.md#pod-处于-pending-状态) to diagnose and solve the issue.
