Cleaned up and added TiDB Server info (#1076)
* Cleaned up and added TiDB Server info

* More doc cleanup

* Fixed table of node mappings

* Clarified command line parameters

* Fixed letter casing and indenting

* Clarified need to download binlogctl
kolbe committed Apr 23, 2019
1 parent 9365012 commit 395c183
tools/tidb-binlog-cluster.md

# TiDB-Binlog Cluster User Guide

This document introduces the architecture and the deployment of the cluster version of TiDB-Binlog.

TiDB-Binlog is a tool used to collect binlog data from TiDB and provide real-time backup and synchronization to downstream platforms.

TiDB-Binlog has the following features:

* Data synchronization: synchronize the data in the TiDB cluster to other databases
* Real-time backup and recovery: back up the data in the TiDB cluster, and recover in case of cluster outages

## Architecture

### Pump

Pump is used to record the binlogs generated in TiDB, sort the binlogs based on the commit time of the transaction, and send binlogs to Drainer for consumption.

### Drainer

Drainer collects and merges binlogs from each Pump, converts the binlog to SQL or data of a specific format, and synchronizes the data to a specific downstream platform.

## Main features

* Multiple Pumps form a cluster which can scale out horizontally
* TiDB uses the built-in Pump Client to send the binlog to each Pump
* Pump stores binlogs and sends the binlogs to Drainer in order
* Drainer reads binlogs of each Pump, merges and sorts the binlogs, and sends the binlogs downstream

## Hardware requirements


## Notes

* You need to use TiDB v2.0.8-binlog, v2.1.0-rc.5 or a later version. Older versions of TiDB cluster are not compatible with the cluster version of TiDB-Binlog.
* When TiDB is running, you need to guarantee that at least one Pump is running normally.
* To enable the TiDB-Binlog service in TiDB server, use the `-enable-binlog` startup parameter in TiDB, or add `enable=true` to the `[binlog]` section of the TiDB server configuration file.
* Make sure that the TiDB-Binlog service is enabled in all TiDB instances in the same cluster, otherwise upstream and downstream data inconsistency might occur during data synchronization. If you want to temporarily run a TiDB instance where the TiDB-Binlog service is not enabled, set `run_ddl=false` in the TiDB configuration file.
* Drainer does not support the `rename` DDL operation on tables in `ignore schemas` (the schemas in the filter list).
* If you want to start Drainer in an existing TiDB cluster, generally you need to make a full backup of the cluster data, obtain `savepoint`, import the data to the target database, and then start Drainer to synchronize the incremental data from `savepoint`.
* Drainer supports synchronizing binlogs to MySQL, TiDB, Kafka or local files. If you need to synchronize binlogs to other destinations, you can set Drainer to synchronize the binlog to Kafka and read the data in Kafka for customized processing. See [Binlog Slave Client User Guide](../tools/binlog-slave-client.md).
* To use TiDB-Binlog for recovering incremental data, set the downstream to `pb` (local files in the proto buffer format). Drainer converts the binlog to data in the specified proto buffer format and writes the data to local files. In this way, you can use [Reparo](../tools/reparo.md) to recover data incrementally (see the configuration sketch after this list).
* Pump and Drainer have several states, including `online`, `paused`, and `offline`. If you press Ctrl + C or kill the process, both Pump and Drainer become `paused`. A paused Pump does not need to send all of its binlog data to Drainer. If you need to take Pump out of service for a long period of time (or are permanently removing Pump from the cluster), use `binlogctl` to make Pump offline. The same goes for Drainer.
* If the downstream is MySQL, MariaDB, or another TiDB cluster, you can use [sync-diff-inspector](../tools/sync-diff-inspector.md) to verify the data after data synchronization.
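
For the `pb` case, the relevant Drainer settings might look like the following sketch (section and key names are assumptions based on the TiDB-Binlog docs; verify against the `drainer.toml` shipped with your release):

```toml
# Sketch of a drainer.toml fragment for incremental backup output.
# Key names are assumptions; check the drainer.toml in your release.
[syncer]
# Write binlogs as proto buffer files instead of executing SQL downstream.
db-type = "pb"

[syncer.to]
# Directory where Drainer writes the proto buffer binlog files.
dir = "/data/drainer-binlog"
```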

## TiDB-Binlog deployment

This section shows two methods of deploying TiDB-Binlog:

- [Deploy TiDB-Binlog using TiDB-Ansible](#deploy-tidb-binlog-using-tidb-ansible)
- [Deploy TiDB-Binlog using a Binary package](#deploy-tidb-binlog-using-binary)

It is recommended to deploy TiDB-Binlog using TiDB-Ansible. If you just want to do simple testing, you can deploy TiDB-Binlog using a Binary package.

### Deploy TiDB-Binlog using TiDB-Ansible

To start Drainer:

```bash
$ ansible-playbook start_drainer.yml
```
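
Deploying with TiDB-Ansible assumes the Pump hosts are declared in tidb-ansible's `inventory.ini`; a sketch, with the group name assumed from tidb-ansible conventions and the addresses taken from the example topology below:

```ini
# Hypothetical inventory.ini fragment (group name per tidb-ansible
# conventions): the two Pump nodes from the example topology.
[pump_servers]
192.168.0.11
192.168.0.12
```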

### Deploy TiDB-Binlog using a Binary package

#### Download the official Binary package

Run the following commands to download the packages:

```bash
version=v2.1.8 # or "latest" for nightly builds
wget https://download.pingcap.org/tidb-$version-linux-amd64.{tar.gz,sha256}

# Check the file integrity. If the result is OK, the file is correct.
sha256sum -c tidb-$version-linux-amd64.sha256
```

For TiDB v2.1.0 GA or later versions, Pump and Drainer are already included in the TiDB download package. For other TiDB versions, you need to download Pump and Drainer separately using the following command:

```bash
wget https://download.pingcap.org/tidb-binlog-$version-linux-amd64.{tar.gz,sha256}

# Check the file integrity. If the result is OK, the file is correct.
sha256sum -c tidb-binlog-$version-linux-amd64.sha256
```

#### Usage example

Assuming that you have three PD nodes, one TiDB node, two Pump nodes, and one Drainer node, the information of each node is as follows:

| Node | IP |
| ---------|:------------:|
| TiDB | 192.168.0.10 |
| PD1 | 192.168.0.16 |
| PD2 | 192.168.0.15 |
| PD3 | 192.168.0.14 |
| Pump | 192.168.0.11 |
| Pump | 192.168.0.12 |
| Drainer | 192.168.0.13 |

The following part shows how to use Pump and Drainer based on the nodes above.

1. Deploy Pump using the binary.

- To view the command line parameters of Pump, execute `./bin/pump -help`:

```
Usage of Pump:
```
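
Based on the flags above, a minimal Pump start might look like the following sketch (addresses from the node table above; flag names as printed by `./bin/pump -help`, not a canonical command):

```bash
# A sketch, not the canonical start command: addresses come from the
# node table above; flag names are taken from the -help output.
./bin/pump -addr=192.168.0.11:8250 \
    -pd-urls=http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379 \
    -data-dir=data.pump \
    -log-file=pump.log &
```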

2. Deploy Drainer using the binary.

- To view the command line parameters of Drainer, execute `./bin/drainer -help`:

```
Usage of Drainer:
```

- Starting Drainer:

> **Note:** If the downstream is MySQL/TiDB, to guarantee data integrity, you need to make a full backup of the data, restore it to the downstream, and obtain the `initial-commit-ts` value before the initial start of Drainer. For details, see [Deploy Drainer](#step-3-deploy-drainer).
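A minimal start command might look like this sketch (assumes a `drainer.toml` next to the binary; `-initial-commit-ts` is the savepoint from the full backup described in the note):

```bash
# Sketch: replace {initial-commit-ts} with the savepoint obtained
# from the full backup described in the note above.
./bin/drainer -config drainer.toml -initial-commit-ts {initial-commit-ts} &
```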

If the same parameter is set both on the command line and in the configuration file, the command-line value is used.
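
For example (a hypothetical illustration; `-addr` is one of the flags listed in the `-help` output):

```bash
# Even if drainer.toml also sets addr, the value given here wins.
./bin/drainer -config drainer.toml -addr 192.168.0.13:8249
```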

3. Starting TiDB server:

- After starting Pump and Drainer, start TiDB server with binlog enabled by adding this section to the TiDB server configuration file:

```
[binlog]
enable=true
```

- TiDB server obtains the addresses of registered Pumps from PD and streams binlog data to all of them. If there are no registered Pump instances, TiDB server either refuses to start or blocks until a Pump instance comes online.
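
For example, a start command might look like this sketch (assumes a `tidb.toml` containing the `[binlog]` section above; `-config`, `-store`, and `-path` are standard tidb-server flags):

```bash
# Sketch: tidb.toml contains the [binlog] section shown above;
# -path points at the PD endpoints from the node table.
./bin/tidb-server -config tidb.toml \
    -store tikv \
    -path "192.168.0.16:2379,192.168.0.15:2379,192.168.0.14:2379"
```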


## TiDB-Binlog operations

### Pump/Drainer state
For how to pause, close, check, and modify the state of Drainer, see the binlogctl guide below.

#### Download `binlogctl`

Your distribution of TiDB or TiDB-Binlog may already include `binlogctl`. If not, download it:

```bash
wget https://download.pingcap.org/binlogctl-new-linux-amd64.{tar.gz,sha256}

# Check the file integrity. It should return OK.
sha256sum -c binlogctl-new-linux-amd64.sha256
```
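
As a sketch of typical usage (subcommand and flag names assumed from the TiDB-Binlog docs; the PD address comes from the node table above):

```bash
# List registered Pump instances and their states.
./binlogctl -pd-urls=http://192.168.0.16:2379 -cmd pumps

# Take a Pump offline gracefully (the node-id here is hypothetical).
./binlogctl -pd-urls=http://192.168.0.16:2379 -cmd offline-pump -node-id 192.168.0.11:8250
```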
