Vertical scaling of TiKV and PD #191
Creating a new stateful set seems a promising approach if users don't care about data migration during vertical scaling and only want to reduce cost. I think we should keep this low priority until users really want this feature.
We are putting new users in a very difficult situation because there is no way they can know what instance size they need. Can we document the recommended approach right now? Would it be to go offline and do a backup + restore?
Our official documentation already has the recommendations: https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md I'll add this link to the user guide documentation. With a new stateful set, the old pods have to go offline and their data has to be migrated to the new stateful set's pods, because the PVCs and PVs are fresh ones; this approach keeps the TiDB service online. The backup + restore approach, by contrast, creates a new cluster, and the TiDB service has to be switched over manually, which involves a short outage.
That states the minimum recommendations for large data sets. Users may discover that they need more resources and then would like to scale up vertically. Additionally, many users can reduce cost by using fewer resources when they are starting off.
In theory, if we use cloud disks (e.g. Persistent Disk on GCP) we should be able to scale vertically with relative ease.
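As a rough sketch of what that could look like: with network-attached disks, vertical scaling can amount to raising the TiKV resource figures and letting the pods restart and reattach their volumes. The field layout below is a hypothetical Helm-values shape, not necessarily the chart's actual schema, and the numbers are illustrative:

```yaml
# Hypothetical values fragment: bump TiKV resources, then roll the pods.
# Data survives the restart because the PVs are network-attached.
tikv:
  resources:
    requests:
      cpu: "8"
      memory: 16Gi
    limits:
      cpu: "16"
      memory: 32Gi
```

This only works when the new size still fits on the existing nodes; otherwise the pods need to move to a bigger node pool, which is the migration case discussed below.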
Closed via pingcap/docs#1468. Vertical scaling of TiKV pods that exceeds the resource capacity of the current nodes is treated as a migration: https://pingcap.com/docs/stable/tidb-in-kubernetes/maintain/kubernetes-node/
Scaling horizontally is not always a substitute for scaling vertically.
For example, local SSD storage on GKE is limited to 1.5 TB. If you are near that limit with TiKV, you can scale out to get more CPU/memory (even if the cost structure is not optimal). However, if you want to reduce CPU/memory usage, the only way is to scale down vertically.
A more common workflow occurs when just starting out: it is ideal to keep your instances as small as possible while your workload is still small. However, you may eventually run into performance issues if a single machine doesn't have enough RAM.
Another related workflow is simply changing your instance type because the current one has insufficient network capacity, for example.
In general, there is an optimal cost structure for a particular workload that is satisfied by a particular instance size.
I think of this problem in terms of TiKV, but it is equally applicable to PD. With PD, I assume one never actually wants to scale horizontally as data increases.
I think the ideal scaling workflow would be: add a new node pool with instances of the desired size, deploy new TiKV processes to them (perhaps as a new stateful set), wait for the new set to catch up, evict leaders from the old set, and then remove it. As I understand it, the big challenge is avoiding overloading the cluster during these operations.
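The leader-eviction and removal steps above might be driven with pd-ctl. In this sketch the PD endpoint and the old store IDs are placeholders, and it assumes the new stateful set is already deployed and caught up:

```shell
# PD endpoint and store IDs below are illustrative placeholders.
PD="http://pd-0.pd:2379"

# Throttle scheduling first to limit migration load on the cluster
# (limit values are illustrative).
pd-ctl -u "$PD" config set leader-schedule-limit 4
pd-ctl -u "$PD" config set region-schedule-limit 8

# Evict Raft leaders from each old TiKV store.
for store in 1 2 3; do
  pd-ctl -u "$PD" scheduler add evict-leader-scheduler "$store"
done

# After each old store's leader_count drops to 0, mark it for removal
# so PD migrates its regions to the new stores.
for store in 1 2 3; do
  pd-ctl -u "$PD" store delete "$store"
done
```

`store delete` is graceful: the store goes offline only after its regions have been rescheduled, which is what keeps the service online during the swap.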