From 0f8e3a76ef0cc22ed41328715c65bd974b7b6945 Mon Sep 17 00:00:00 2001 From: yikeke Date: Fri, 5 Jun 2020 17:16:39 +0800 Subject: [PATCH 1/4] dumpling: add export-or-backup-using-dumpling.md --- TOC.md | 1 + export-or-backup-using-dumpling.md | 136 +++++++++++++++++++++++++++++ mydumper-overview.md | 2 +- 3 files changed, 138 insertions(+), 1 deletion(-) create mode 100644 export-or-backup-using-dumpling.md diff --git a/TOC.md b/TOC.md index 623e5248fcb2a..645c9627c3bea 100644 --- a/TOC.md +++ b/TOC.md @@ -87,6 +87,7 @@ - [Common Ansible Operations](/maintain-tidb-using-ansible.md) + Backup and Restore - [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) + - [Use Dumpling for export or backup](/export-or-backup-using-dumpling.md) - [Use BR](/br/backup-and-restore-tool.md) - [BR Usage Scenarios](/br/backup-and-restore-use-cases.md) - [BR storages](/br/backup-and-restore-storages.md) diff --git a/export-or-backup-using-dumpling.md b/export-or-backup-using-dumpling.md new file mode 100644 index 0000000000000..ab7cd1f97997d --- /dev/null +++ b/export-or-backup-using-dumpling.md @@ -0,0 +1,136 @@ +--- +title: Export or Backup Data Using Dumpling +summary: Use the Dumpling tool to export or backup data in TiDB. +category: how-to +--- + +# Export or Backup Data Using Dumpling + +This document introduces how to use the [Dumpling](https://github.com/pingcap/dumpling) tool to export or backup data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to complete logical full backup or export. + +For backups of SST files (KV pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md). For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md). + +When using Dumpling, you need to execute the export command on a running cluster. 
This document assumes that there is a TiDB instance listening on `127.0.0.1:4000` and that this TiDB instance has a root user without a password. + +## Export data from TiDB + +Export data using the following command: + +{{< copyable "shell-regular" >}} + +```shell +dumpling \ + -u root \ + -P 4000 \ + -H 127.0.0.1 \ + --filetype sql \ + --threads 32 \ + -o /tmp/test \ + -F $(( 1024 * 1024 * 256 )) +``` + +In the above command, `-H`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`. + +Dumpling exports all tables (except for system tables) in the entire database by default. You can use `--where ` to select the records to be exported. If the exported data is in CSV format (CSV files can be exported using `--filetype csv`), you can also use `--sql ` to export records selected by the specified SQL statement. + +For example, you can export all records that match `id < 100` in `test.sbtest1` using the following command: + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -H 127.0.0.1 \ + -o /tmp/test \ + --filetype csv \ + --sql 'select * from `test`.`sbtest1` where id < 100' +``` + +Note that the `--sql` option can be used only for exporting CSV files for now. However, you can use `--where` to filter the rows to be exported, and use the following command to export all rows with `id < 100`: + +> **Note:** +> +> Dumpling executes the `select * from where id < 100` statement on all tables to be exported. If any table does not have the specified field, the export fails. + +{{< copyable "shell-regular" >}} + +```shell +./dumpling \ + -u root \ + -P 4000 \ + -H 127.0.0.1 \ + -o /tmp/test \ + --where "id < 100" +``` + +> **Note:** +> +> Currently, Dumpling does not support exporting only certain tables specified by users (i.e. `-T` flag, see [this issue](https://github.com/pingcap/dumpling/issues/76)).
If you do need this feature, you can use [MyDumper](/backup-and-restore-using-mydumper-lightning.md) instead. + +默认情况下,导出的文件会存储到 `./export-` 目录下。常用参数如下: +The exported file is stored in the `. /export-` directory by default. Commonly used parameters are as follows: + +- `-o` 用于选择存储导出文件的目录。 +- `-F` 选项用于指定单个文件的最大大小(和 MyDumper 不同,这里的单位是字节)。 +- `-r` 选项用于指定单个文件的最大记录数(或者说,数据库中的行数)。 + +- `-o` is used to select the directory where the exported file will be stored. +- `-F` is used to specify the maximum size of a single file (different from MyDumper, the unit here is byte). +- `-r` is used to specify the maximum number of records (or, rather, the number of rows in the database) for a single file. + +利用以上参数可以让 Dumpling 的并行度更高。 +You can use the above parameters to provide Dumpling with a higher degree of parallelism. + +还有一个尚未在上面展示出来的标志是 `--consistency `,这个标志控制导出数据“一致性保证”的方式。对于 TiDB 来说,默认情况下,会通过获取某个时间戳的快照来保证一致性(即 `--consistency snapshot`)。在使用 snapshot 来保证一致性的时候,可以使用 `--snapshot` 参数指定要备份的时间戳。还可以使用以下的一致性级别: +Another flag that has not yet been shown above is the `--consistency `, which controls the way in which data is exported for "consistency assurance". For TiDB, consistency is ensured by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to back up. You can also use the following levels of consistency: + + +- `flush`:使用 [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) 来保证一致性。 +- `snapshot`:获取指定时间戳的一致性快照并导出。 +- `lock`:为待导出的所有表上读锁。 +- `none`:不做任何一致性保证。 +- `auto`:对 MySQL 使用 `flush`,对 TiDB 使用 `snapshot`。 + +- `FLUSH`: use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency. +- `snapshot`: Get a consistent snapshot of the specified timestamp and export it. 
+- `lock`: Add locks to read all tables to be exported. +- `none`: No guarantee of consistency. +- `auto`: use `flush` for MySQL and `snapshot` for TiDB. + +一切完成之后,你应该可以在 `/tmp/test` 看到导出的文件了: +After everything is done, you can see the exported file in `/tmp/test`: + +```shell +$ ls -lh /tmp/test | awk '{print $5 "\t" $9}' + +140B metadata +66B test-schema-create.sql +300B test.sbtest1-schema.sql +190K test.sbtest1.0.sql +300B test.sbtest2-schema.sql +190K test.sbtest2.0.sql +300B test.sbtest3-schema.sql +190K test.sbtest3.0.sql +``` + +另外,假如数据量非常大,可以提前调长 GC 时间,以避免因为导出过程中发生 GC 导致导出失败: +In addition, if the data volume is very large, you can extend the GC time in advance to avoid export failure due to GC occurring during the export process. + +{{< copyable "sql" >}} + +```sql +update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; +``` + +在操作结束之后,再将 GC 时间调回原样(默认是 `10m`): +After the operation is completed, you can set the GC time back to the same (default is `10m`): + +{{< copyable "sql" >}} + +```sql +update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; +``` + +最后,所有的这些导出数据都可以用 [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md) 导入回 TiDB。 +Finally, you can import all this exported data back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md). diff --git a/mydumper-overview.md b/mydumper-overview.md index e163ccdd60aca..87b17e56def6b 100644 --- a/mydumper-overview.md +++ b/mydumper-overview.md @@ -9,7 +9,7 @@ aliases: ['/docs/dev/reference/tools/mydumper/'] ## What is Mydumper? -[Mydumper](https://github.com/pingcap/mydumper) is a fork project optimized for TiDB. It is recommended to use this tool for logical backups of TiDB. +[Mydumper](https://github.com/pingcap/mydumper) is a fork project optimized for TiDB. You can use this tool for logical backups of TiDB. It can be [downloaded](/download-ecosystem-tools.md) as part of the Enterprise Tools package. 
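Because Dumpling's `-F` value is a size in bytes (unlike MyDumper), it can help to derive it from a size in MiB instead of typing a large number by hand. The following is a minimal sketch; `mib_to_bytes` is a hypothetical shell function written for this illustration, not a Dumpling feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Dumpling: convert a size in MiB into the
# byte count expected by Dumpling's -F option.
mib_to_bytes() {
  local mib="$1"
  # Accept only positive integers without leading zeros, so that bash
  # arithmetic never interprets the value as octal.
  if ! [[ "$mib" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: mib_to_bytes <positive integer>" >&2
    return 1
  fi
  echo $(( mib * 1024 * 1024 ))
}

# Equivalent to the -F $(( 1024 * 1024 * 256 )) value in the first example.
mib_to_bytes 256   # prints 268435456

# Possible use with the example cluster from this document:
#   dumpling -u root -P 4000 -H 127.0.0.1 -o /tmp/test -F "$(mib_to_bytes 256)"
```

Rejecting leading zeros is a deliberate choice: bash arithmetic would otherwise read a value such as `08` as an invalid octal number.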
From 6def07700a3d7dc1dbe7b158365130dd3b972c3c Mon Sep 17 00:00:00 2001 From: yikeke Date: Fri, 5 Jun 2020 17:49:17 +0800 Subject: [PATCH 2/4] Update export-or-backup-using-dumpling.md --- export-or-backup-using-dumpling.md | 46 +++++++++++------------------- 1 file changed, 16 insertions(+), 30 deletions(-) diff --git a/export-or-backup-using-dumpling.md b/export-or-backup-using-dumpling.md index ab7cd1f97997d..254c841d388d4 100644 --- a/export-or-backup-using-dumpling.md +++ b/export-or-backup-using-dumpling.md @@ -68,42 +68,31 @@ Note that the `--sql` option can be used only for exporting CSV files for now. H > > Currently, Dumpling does not support exporting only certain tables specified by users (i.e. `-T` flag, see [this issue](https://github.com/pingcap/dumpling/issues/76)). If you do need this feature, you can use [MyDumper](/backup-and-restore-using-mydumper-lightning.md) instead. -默认情况下,导出的文件会存储到 `./export-` 目录下。常用参数如下: The exported file is stored in the `. /export-` directory by default. Commonly used parameters are as follows: -- `-o` 用于选择存储导出文件的目录。 -- `-F` 选项用于指定单个文件的最大大小(和 MyDumper 不同,这里的单位是字节)。 -- `-r` 选项用于指定单个文件的最大记录数(或者说,数据库中的行数)。 +- `-o` is used to select the directory where the exported files are stored. +- `-F` option is used to specify the maximum size of a single file (the unit here is byte, different from MyDumper). +- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file. -- `-o` is used to select the directory where the exported file will be stored. -- `-F` is used to specify the maximum size of a single file (different from MyDumper, the unit here is byte). -- `-r` is used to specify the maximum number of records (or, rather, the number of rows in the database) for a single file. - -利用以上参数可以让 Dumpling 的并行度更高。 You can use the above parameters to provide Dumpling with a higher degree of parallelism. 
-还有一个尚未在上面展示出来的标志是 `--consistency `,这个标志控制导出数据“一致性保证”的方式。对于 TiDB 来说,默认情况下,会通过获取某个时间戳的快照来保证一致性(即 `--consistency snapshot`)。在使用 snapshot 来保证一致性的时候,可以使用 `--snapshot` 参数指定要备份的时间戳。还可以使用以下的一致性级别: -Another flag that has not yet been shown above is the `--consistency `, which controls the way in which data is exported for "consistency assurance". For TiDB, consistency is ensured by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to back up. You can also use the following levels of consistency: - +Another flag that is not mentioned above is `--consistency `, which controls how Dumpling guarantees the consistency of the exported data. For TiDB, consistency is ensured by default by taking a snapshot at a certain timestamp (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. The following consistency levels are available: -- `flush`:使用 [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) 来保证一致性。 -- `snapshot`:获取指定时间戳的一致性快照并导出。 -- `lock`:为待导出的所有表上读锁。 -- `none`:不做任何一致性保证。 -- `auto`:对 MySQL 使用 `flush`,对 TiDB 使用 `snapshot`。 - -- `FLUSH`: use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency. +- `flush`: Use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency. - `snapshot`: Get a consistent snapshot of the specified timestamp and export it. -- `lock`: Add locks to read all tables to be exported. -- `none`: No guarantee of consistency. -- `auto`: use `flush` for MySQL and `snapshot` for TiDB. +- `lock`: Add read locks on all tables to be exported. +- `none`: No guarantee for consistency.
+- `auto`: Use `flush` for MySQL and `snapshot` for TiDB. -一切完成之后,你应该可以在 `/tmp/test` 看到导出的文件了: After everything is done, you can see the exported file in `/tmp/test`: +{{< copyable "shell-regular" >}} + ```shell -$ ls -lh /tmp/test | awk '{print $5 "\t" $9}' +ls -lh /tmp/test | awk '{print $5 "\t" $9}' +``` +``` 140B metadata 66B test-schema-create.sql 300B test.sbtest1-schema.sql @@ -114,8 +103,7 @@ $ ls -lh /tmp/test | awk '{print $5 "\t" $9}' 190K test.sbtest3.0.sql ``` -另外,假如数据量非常大,可以提前调长 GC 时间,以避免因为导出过程中发生 GC 导致导出失败: -In addition, if the data volume is very large, you can extend the GC time in advance to avoid export failure due to GC occurring during the export process. +In addition, if the data volume is very large, to avoid export failure due to GC during the export process, you can extend the GC time in advance: {{< copyable "sql" >}} @@ -123,8 +111,7 @@ In addition, if the data volume is very large, you can extend the GC time in adv update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; ``` -在操作结束之后,再将 GC 时间调回原样(默认是 `10m`): -After the operation is completed, you can set the GC time back to the same (default is `10m`): +After your operation is completed, set the GC time back (the default value is `10m`): {{< copyable "sql" >}} @@ -132,5 +119,4 @@ After the operation is completed, you can set the GC time back to the same (defa update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; ``` -最后,所有的这些导出数据都可以用 [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md) 导入回 TiDB。 -Finally, you can import all this exported data back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md). +Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-tidb-backend.md). 
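To catch a mistyped `--consistency` value before starting a long export, a wrapper can check it against the levels listed above. The following is a minimal sketch under stated assumptions: `build_dumpling_cmd` is a hypothetical function, the connection flags are copied from the example cluster in this document, and the function only prints the command so that it can be checked without a running cluster.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Dumpling: validate a --consistency level
# against the levels listed in this document and print (not run) the command.
build_dumpling_cmd() {
  local level="$1"
  local snapshot_ts="${2:-}"
  # Connection settings follow the example cluster used in this document.
  local cmd="dumpling -u root -P 4000 -H 127.0.0.1 -o /tmp/test --consistency $level"

  case "$level" in
    snapshot)
      # Only add --snapshot when a timestamp was given.
      if [ -n "$snapshot_ts" ]; then
        cmd="$cmd --snapshot $snapshot_ts"
      fi
      ;;
    flush|lock|none|auto)
      # Conservative choice of this sketch: accept a timestamp only together
      # with an explicit snapshot level.
      if [ -n "$snapshot_ts" ]; then
        echo "a snapshot timestamp requires --consistency snapshot" >&2
        return 1
      fi
      ;;
    *)
      echo "unsupported consistency level: $level" >&2
      return 1
      ;;
  esac
  echo "$cmd"
}

build_dumpling_cmd snapshot
# prints: dumpling -u root -P 4000 -H 127.0.0.1 -o /tmp/test --consistency snapshot
```

Rejecting a timestamp for `auto` is stricter than necessary when exporting from TiDB, where `auto` resolves to `snapshot`; a real script could relax that check, or call `dumpling` directly once the checks pass.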
From a3aeb3863400b0b866d40702f12b26c6380adc8f Mon Sep 17 00:00:00 2001 From: Keke Yi <40977455+yikeke@users.noreply.github.com> Date: Mon, 8 Jun 2020 13:47:43 +0800 Subject: [PATCH 3/4] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com> --- export-or-backup-using-dumpling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/export-or-backup-using-dumpling.md b/export-or-backup-using-dumpling.md index 254c841d388d4..68c8450ff3a67 100644 --- a/export-or-backup-using-dumpling.md +++ b/export-or-backup-using-dumpling.md @@ -6,7 +6,7 @@ category: how-to # Export or Backup Data Using Dumpling -This document introduces how to use the [Dumpling](https://github.com/pingcap/dumpling) tool to export or backup data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to complete logical full backup or export. +This document introduces how to use the [Dumpling](https://github.com/pingcap/dumpling) tool to export or back up data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to make a logical full backup or export. For backups of SST files (KV pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md). For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md). @@ -68,7 +68,7 @@ Note that the `--sql` option can be used only for exporting CSV files for now. H > > Currently, Dumpling does not support exporting only certain tables specified by users (i.e. `-T` flag, see [this issue](https://github.com/pingcap/dumpling/issues/76)). If you do need this feature, you can use [MyDumper](/backup-and-restore-using-mydumper-lightning.md) instead. -The exported file is stored in the `. /export-` directory by default.
Commonly used parameters are as follows: +The exported file is stored in the `./export-` directory by default. Commonly used parameters are as follows: - `-o` is used to select the directory where the exported files are stored. - `-F` option is used to specify the maximum size of a single file (the unit here is byte, different from MyDumper). From 1e88db054ecc665651404ad1a6c5a866b1bddba3 Mon Sep 17 00:00:00 2001 From: Keke Yi <40977455+yikeke@users.noreply.github.com> Date: Wed, 10 Jun 2020 11:01:38 +0800 Subject: [PATCH 4/4] Update TOC.md Co-authored-by: Lilian Lee --- TOC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TOC.md b/TOC.md index b5a455c9be293..996d08162f07f 100644 --- a/TOC.md +++ b/TOC.md @@ -91,7 +91,7 @@ - [Common Ansible Operations](/maintain-tidb-using-ansible.md) + Backup and Restore - [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) - - [Use Dumpling for export or backup](/export-or-backup-using-dumpling.md) + - [Use Dumpling for Export or Backup](/export-or-backup-using-dumpling.md) + Use BR - [Use BR](/br/backup-and-restore-tool.md) - [BR Use Cases](/br/backup-and-restore-use-cases.md)
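The GC adjustment described in this document has to be undone after the export, so it can be convenient to generate both statements from one place. The following is a minimal sketch; `gc_life_time_sql` is a hypothetical helper, and the accepted duration format (digits followed by `s`, `m`, or `h`) is a deliberately narrow assumption of this sketch, not the full syntax TiDB accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of TiDB or Dumpling: print the statement that
# changes tikv_gc_life_time, after a simple sanity check on the value.
gc_life_time_sql() {
  local duration="$1"
  # Assumption of this sketch: only plain values such as 10m or 720h are used.
  local re='^[0-9]+(s|m|h)$'
  if ! [[ "$duration" =~ $re ]]; then
    echo "unexpected duration: $duration" >&2
    return 1
  fi
  printf "update mysql.tidb set VARIABLE_VALUE = '%s' where VARIABLE_NAME = 'tikv_gc_life_time';\n" "$duration"
}

gc_life_time_sql 720h

# Possible workflow against the example cluster (needs a MySQL client):
#   mysql -h 127.0.0.1 -P 4000 -u root -e "$(gc_life_time_sql 720h)"
#   dumpling -u root -P 4000 -H 127.0.0.1 -o /tmp/test
#   mysql -h 127.0.0.1 -P 4000 -u root -e "$(gc_life_time_sql 10m)"
```

A real script might also install a `trap` on `EXIT` so that the `10m` value is restored even when the export fails partway through.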