reference/tools/download: update br-related docs (#1729)
* reference/tools/download: add br download introduction

* updates the introduction of br

* update br

* Apply suggestions from code review

Co-Authored-By: TomShawn <41534398+TomShawn@users.noreply.github.com>

* update dev

* Apply suggestions from code review

Co-Authored-By: TomShawn <41534398+TomShawn@users.noreply.github.com>

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
anotherrachel and TomShawn committed Jan 9, 2020
1 parent f4ea8f2 commit 300fe07
Showing 4 changed files with 256 additions and 190 deletions.
211 changes: 116 additions & 95 deletions dev/how-to/maintain/backup-and-restore/br.md
@@ -8,6 +8,94 @@ category: how-to

Backup & Restore (BR) is a command-line tool for distributed backup and restoration of TiDB cluster data. Compared with [`mydumper`/`loader`](/dev/how-to/maintain/backup-and-restore/mydumper-loader.md), BR is more suitable for scenarios with huge data volumes. This document describes the BR command line, provides detailed usage examples and best practices, lists the usage restrictions, and introduces the implementation principles of BR.

## Usage restrictions

- BR only supports TiDB v3.1 and later versions.
- Currently, TiDB does not support backing up and restoring partitioned tables.
- Currently, you can perform restoration only on new clusters.
- It is recommended that you execute multiple backup operations serially. Otherwise, different backup operations might interfere with each other.

## Download the binary

Refer to the [download page](/dev/reference/tools/download.md#br-backup-and-restore) for more information.

## Implementation principles

BR sends the backup or restoration commands to each TiKV node. After receiving these commands, TiKV performs the corresponding backup or restoration operations. Each TiKV node has a path in which the backup files generated in the backup operation are stored and from which the stored backup files are read during the restoration.
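
For example, before the first backup, you can make sure this path exists on every TiKV node. The following is a minimal sketch; the host names and the backup path are hypothetical and only for illustration:

{{< copyable "shell-regular" >}}

```shell
# Create the backup path on each TiKV node (hypothetical host list).
for host in tikv-1 tikv-2 tikv-3; do
    ssh "${host}" "mkdir -p /tmp/backup"
done
```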

### Backup principle

When BR performs a backup operation, it first obtains the following information from PD:

- The current TS (timestamp) as the time of the backup snapshot
- The TiKV node information of the current cluster

Based on this information, BR starts a TiDB instance internally to obtain the database or table information corresponding to the TS, and filters out the system databases (`information_schema`, `performance_schema`, `mysql`) at the same time.

According to the backup sub-command, BR adopts the following two types of backup logic:

- Full backup: BR traverses all the tables and constructs the KV range to be backed up according to each table.
- Single table backup: BR constructs the KV range to be backed up according to a single table.

Finally, BR collects the KV ranges to be backed up and sends the complete backup request to the TiKV nodes of the cluster.

The structure of the request:

```
BackupRequest{
ClusterId, // The cluster ID.
StartKey, // The starting key of the backup (backed up).
EndKey, // The ending key of the backup (not backed up).
EndVersion, // The backup snapshot time.
StorageBackend, // The path where backup files are stored.
RateLimit, // Backup speed (MB/s).
Concurrency, // The number of threads for the backup operation (4 by default).
}
```

After receiving the backup request, the TiKV node traverses all Region leaders on the node to find the Regions that overlap with the KV ranges in this request. The TiKV node backs up some or all of the data within the range, and generates the corresponding SST file.

After finishing backing up the data of the corresponding Region, the TiKV node returns the metadata to BR. BR collects the metadata and stores it in the `backupmeta` file, which is used for restoration.

If checksum is enabled when you execute the backup command, BR calculates the checksum of each backed-up table for data verification.
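
For reference, a backup command with the table checksum explicitly enabled might look like the following. This is a sketch: the `--checksum` flag name is an assumption here, so confirm it with `br backup full -h` for your BR version:

{{< copyable "shell-regular" >}}

```shell
# Back up the whole cluster with the checksum calculation enabled
# (the --checksum flag name is assumed; verify with `br backup full -h`).
br backup full \
    --pd "${PDIP}:2379" \
    --storage "local:///tmp/backup" \
    --checksum=true \
    --log-file backupfull.log
```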

#### Types of backup files

Two types of backup files are generated in the path where backup files are stored:

- **The SST file**: stores the data that the TiKV node backed up.
- **The `backupmeta` file**: stores the metadata of this backup operation, including the number of backup files, and the key range, size, and SHA-256 hash of each backup file.
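
Because the `backupmeta` file records the SHA-256 hash of each backup file, you can spot-check the integrity of the files with standard tools. A minimal sketch, using the example backup path from this document:

{{< copyable "shell-regular" >}}

```shell
# Compute the SHA-256 hash of every file in the backup path and compare
# the results with the values recorded in the backupmeta file.
sha256sum /tmp/backup/*
```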

#### The format of the SST file name

The SST file is named in the format of `storeID_regionID_regionEpoch_keyHash_cf`, where

- `storeID` is the TiKV node ID;
- `regionID` is the Region ID;
- `regionEpoch` is the version number of the Region;
- `keyHash` is the Hash (sha256) value of the startKey of a range, which ensures the uniqueness of a key;
- `cf` indicates the [Column Family](/dev/reference/performance/tune-tikv.md#tune-tikv-performance) of RocksDB (`default` or `write` by default).
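
For example, a file name like the following (a purely hypothetical example) breaks down as shown in the comments:

```
# Hypothetical backup file name:
#   5_120_36_9f86d081...e7_write
# storeID     = 5
# regionID    = 120
# regionEpoch = 36
# keyHash     = 9f86d081...e7 (SHA-256 of the range's startKey)
# cf          = write
```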

### Restoration principle

During the data restoration process, BR performs the following tasks in order:

1. It parses the `backupmeta` file in the backup path, and then starts a TiDB instance internally to create the corresponding databases and tables based on the parsed information.

2. It aggregates the parsed SST files according to the tables.

3. It pre-splits Regions according to the key range of the SST file so that every Region corresponds to at least one SST file.

4. It traverses each table to be restored and the SST files corresponding to each table.

5. It finds the Region corresponding to the SST file and sends a request to the corresponding TiKV node for downloading the file. Then it sends a request for loading the file after the file is successfully downloaded.

After TiKV receives the request to load the SST file, TiKV uses the Raft mechanism to ensure the strong consistency of the SST data. After the downloaded SST file is loaded successfully, the file is deleted asynchronously.

After the restoration operation is completed, BR performs a checksum calculation on the restored data to compare the stored data with the backed up data.
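
You can also compare a restored table against its source manually from any MySQL client by using TiDB's `ADMIN CHECKSUM TABLE` statement. The following is a sketch; the host and the table name are placeholders:

{{< copyable "shell-regular" >}}

```shell
# Run TiDB's admin checksum on a restored table (placeholder host and table).
mysql -h "${TiDBIP}" -P 4000 -u root -e "ADMIN CHECKSUM TABLE test.usertable;"
```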

![br-arch](/media/br-arch.png)

## Command-line description

A `br` command consists of sub-commands, options, and parameters.
@@ -44,7 +132,7 @@ A `br` command consists of multiple layers of sub-commands. Currently, BR has the following three sub-commands:
Each of the above three sub-commands can include the following three sub-commands to specify the scope of an operation (see the sketch after this list):

* `full`: used to back up or restore all the cluster data.
* `db`: used to restore the specified database of the cluster.
* `db`: used to back up or restore the specified database of the cluster.
* `table`: used to back up or restore a single table in the specified database of the cluster.
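
For instance, the operation and scope sub-commands compose as follows. This sketch is based on the examples later in this document; the database and table names are placeholders:

{{< copyable "shell-regular" >}}

```shell
# Back up the whole cluster, one database, or one table.
br backup full  --pd "${PDIP}:2379" --storage "local:///tmp/backup"
br backup db    --pd "${PDIP}:2379" --storage "local:///tmp/backup" --db test
br backup table --pd "${PDIP}:2379" --storage "local:///tmp/backup" --db test --table usertable
```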

### Common options
@@ -108,7 +196,31 @@ br backup full \
Full Backup <---------/................................................> 17.12%.
```

### Back up data of a single table
### Back up a database

To back up a database in the cluster, execute the `br backup db` command. To get help on this command, execute `br backup db -h` or `br backup db --help`.

**Usage example:**

Back up the data of the `test` database to the `/tmp/backup` path on each TiKV node and write the `backupmeta` file to this path.

{{< copyable "shell-regular" >}}

```shell
br backup db \
--pd "${PDIP}:2379" \
--db test \
--storage "local:///tmp/backup" \
--ratelimit 120 \
--concurrency 4 \
--log-file backuptable.log
```

In the above command, `--db` specifies the name of the database to be backed up. For descriptions of other options, see [Back up all the cluster data](/dev/how-to/maintain/backup-and-restore/br.md#back-up-all-the-cluster-data).

A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. BR then verifies the backup data to ensure data safety.

### Back up a table

To back up the data of a single table in the cluster, execute the `br backup table` command. To get help on this command, execute `br backup table -h` or `br backup table --help`.

@@ -225,34 +337,6 @@ In the above command, `--table` specifies the name of the table to be restored.
- It is recommended that you mount a shared storage (for example, NFS) on the backup path specified by `-s`, to make it easier to collect and manage backup files.
- It is recommended that you use a storage hardware with high throughput, because the throughput of a storage hardware limits the backup and restoration speed.
- It is recommended that you perform the backup operation during off-peak hours to minimize the impact on applications.
- To speed up the restoration, use `pd-ctl` to remove the balance-related schedulers before the restoration and add these schedulers back after the restoration, as shown below.

Remove schedulers:

{{< copyable "shell-regular" >}}

```shell
./pd-ctl -u ${PDIP}:2379 scheduler remove balance-hot-region-scheduler
./pd-ctl -u ${PDIP}:2379 scheduler remove balance-leader-scheduler
./pd-ctl -u ${PDIP}:2379 scheduler remove balance-region-scheduler
```

Add schedulers:

{{< copyable "shell-regular" >}}

```shell
./pd-ctl -u ${PDIP}:2379 scheduler add balance-hot-region-scheduler
./pd-ctl -u ${PDIP}:2379 scheduler add balance-leader-scheduler
./pd-ctl -u ${PDIP}:2379 scheduler add balance-region-scheduler
```

## Usage restrictions

- BR only supports TiDB v3.1 and later versions.
- TiDB cannot perform a backup operation while DDL operations are being executed.
- Currently, TiDB does not support backing up and restoring partitioned tables.
- Currently, you can perform restoration only on new clusters.

## Examples

@@ -323,8 +407,7 @@ bin/br backup full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file backup.
```

```
[INFO] [client.go:288] ["Backup Ranges"] [take=2m25.801322134s]
[INFO] [schema.go:114] ["backup checksum finished"] [take=4.842154366s]
[INFO] [collector.go:165] ["Full backup summary: total backup ranges: 2, total success: 2, total failed: 0, total take(s): 0.00, total kv: 4, total size(Byte): 133, avg speed(Byte/s): 27293.78"] ["backup total regions"=2] ["backup checksum"=1.640969ms] ["backup fast checksum"=227.885µs]
```

### Restoration
@@ -340,67 +423,5 @@ bin/br restore full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file restor
```

```
[INFO] [client.go:345] [RestoreAll] [take=2m8.907369337s]
[INFO] [client.go:435] ["Restore Checksum"] [take=6.385818026s]
[INFO] [collector.go:165] ["Full Restore summary: total restore tables: 1, total success: 1, total failed: 0, total take(s): 0.26, total kv: 20000, total size(MB): 10.98, avg speed(MB/s): 41.95"] ["restore files"=3] ["restore ranges"=2] ["split region"=0.562369381s] ["restore checksum"=36.072769ms]
```

## Implementation principles

BR sends the backup and restoration commands to each TiKV node. After receiving these commands, TiKV performs the corresponding backup and restoration operations. Each TiKV node has a path in which the backup files generated in the backup operation are stored and from which the stored backup files are read during the restoration.

### Backup principle

When BR performs a backup operation, it first obtains the following information from PD:

- The current TS (timestamp) as the time of the backup snapshot
- The TiKV node information of the current cluster

Based on this information, BR starts a TiDB instance internally to obtain the database or table information corresponding to the TS, and filters out the system databases (`information_schema`, `performance_schema`, `mysql`) at the same time.

According to the backup sub-command, BR adopts the following two types of backup logic:

- Full backup: BR traverses all the tables and constructs the KV range to be backed up according to each table.
- Single table backup: BR constructs the KV range to be backed up according to a single table.

Finally, BR collects the KV ranges to be backed up and sends the complete backup request to the TiKV nodes of the cluster.

The structure of the request:

```
backup.BackupRequest{
ClusterId: clusterID, // The cluster ID.
StartKey: startKey, // The starting key of the backup (backed up).
EndKey: endKey, // The ending key of the backup (not backed up).
StartVersion: backupTS, // The backup snapshot time.
...
Path: path, // The path where backup files are stored.
RateLimit: rateLimit, // Backup speed (MB/s).
Concurrency: concurrency, // The number of threads for the backup operation (4 by default).
}
```

After receiving the backup request, the TiKV node traverses all Region leaders on the node to find the Regions that overlap with the KV ranges in this request. The TiKV node backs up some or all of the data within the range, and generates the corresponding SST file (named in the format of `storeID_regionID_regionEpoch_tableID`) in the backup path.

After finishing backing up the data of the corresponding Region, the TiKV node returns the metadata to BR. BR collects the metadata and stores it in the `backupMeta` file which is used for restoration.

If checksum is enabled when you execute the backup command, BR calculates the checksum of each backed up table for data check.

### Restoration principle

During the data restoration process, BR performs the following tasks in order:

1. It parses the `backupMeta` file in the backup path, and then starts a TiDB instance internally to create the corresponding databases and tables based on the parsed information.

2. It aggregates the parsed SST files by table (a `GroupBy` operation).

3. It pre-splits Regions according to the key range of the SST file so that every Region corresponds to at least one SST file.

4. It traverses each table to be restored and the SST files corresponding to each table.

5. It finds the Region corresponding to the SST file and sends a request to the corresponding TiKV node for downloading the file. Then it sends a request for loading the file after the file is successfully downloaded.

After TiKV receives the request to load the SST file, TiKV uses the Raft mechanism to ensure the strong consistency of the SST data. After the downloaded SST file is loaded successfully, the file is deleted asynchronously.

After the restoration operation is completed, BR performs a checksum calculation on the restored data to compare the stored data with the backed up data.

![br-arch](/media/br-arch.png)
12 changes: 12 additions & 0 deletions dev/reference/tools/download.md
@@ -35,6 +35,18 @@ Download [TiDB Lightning](/dev/reference/tools/tidb-lightning/overview.md) by using the download link in the following table:
>
> `{version}` in the above download link indicates the version number of TiDB Lightning. For example, the download link for `v3.0.5` is `https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.tar.gz`. You can also download the latest unpublished version by replacing `{version}` with `latest`.

## BR (backup and restore)

Download [BR](/dev/how-to/maintain/backup-and-restore/br.md) by using the download link in the following table:

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| `http://download.pingcap.org/tidb-toolkit-{version}-linux-amd64.tar.gz` | Linux | amd64 | `http://download.pingcap.org/tidb-toolkit-{version}-linux-amd64.sha256` |

> **Note:**
>
> `{version}` in the above download link indicates the version number of BR. For example, the download link for `v3.1.0-beta` is `http://download.pingcap.org/tidb-toolkit-v3.1.0-beta-linux-amd64.tar.gz`. You can also download the latest unpublished version by replacing `{version}` with `latest`.
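
For example, you can download the toolkit package together with its checksum file and verify the package before unpacking it. A minimal sketch, using the `v3.1.0-beta` version from the note above:

```shell
# Download the toolkit package and its SHA256 checksum file.
wget http://download.pingcap.org/tidb-toolkit-v3.1.0-beta-linux-amd64.tar.gz
wget http://download.pingcap.org/tidb-toolkit-v3.1.0-beta-linux-amd64.sha256
# Compare the computed hash with the published one (the layout of the
# .sha256 file may vary, so compare the hash values manually).
sha256sum tidb-toolkit-v3.1.0-beta-linux-amd64.tar.gz
cat tidb-toolkit-v3.1.0-beta-linux-amd64.sha256
# Unpack the toolkit, which contains the br binary.
tar -xzf tidb-toolkit-v3.1.0-beta-linux-amd64.tar.gz
```
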
## TiDB DM (Data Migration)

Download [DM](/dev/reference/tools/data-migration/overview.md) by using the download link in the following table:
