dumpling: refined as per cn doc (#12064) (#12087)
ti-chi-bot committed Jan 18, 2023
1 parent bc8faae commit c9398c9
Showing 2 changed files with 37 additions and 33 deletions.
68 changes: 36 additions & 32 deletions dumpling-overview.md
@@ -3,42 +3,46 @@ title: Dumpling Overview
summary: Use the Dumpling tool to export data from TiDB.
---

# Dumpling Overview
# Use Dumpling to Export Data

This document introduces the data export tool - [Dumpling](https://github.com/pingcap/dumpling). Dumpling exports data stored in TiDB/MySQL as SQL or CSV data files and can be used to make a logical full backup or export.
This document introduces the data export tool - [Dumpling](https://github.com/pingcap/dumpling). Dumpling exports data stored in TiDB/MySQL as SQL or CSV data files and can be used to make a logical full backup or export. Dumpling also supports exporting data to Amazon S3.

For backups of SST files (key-value pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md). For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md).
You can get Dumpling using [TiUP](/tiup/tiup-overview.md) by running `tiup install dumpling`. Afterwards, you can use `tiup dumpling ...` to run Dumpling.

> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Go, and supports more optimizations that are specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
>
> For the overview of Mydumper, refer to [v4.0 Mydumper documentation](https://docs.pingcap.com/tidb/v4.0/backup-and-restore-using-mydumper-lightning).
Dumpling is also included in the tidb-toolkit installation package and can be [downloaded here](/download-ecosystem-tools.md#dumpling).

## Improvements of Dumpling compared with Mydumper
For detailed usage of Dumpling, use the `--help` option or refer to [Option list of Dumpling](#option-list-of-dumpling).

1. Support exporting data in multiple formats, including SQL and CSV
2. Support the [table-filter](https://github.com/pingcap/tidb-tools/blob/master/pkg/table-filter/README.md) feature, which makes it easier to filter data
3. Support exporting data to Amazon S3 cloud storage.
4. More optimizations are made for TiDB:
- Support configuring the memory limit of a single TiDB SQL statement
- Support automatic adjustment of TiDB GC time for TiDB v4.0.0 and above
- Use TiDB's hidden column `_tidb_rowid` to optimize the performance of concurrent data export from a single table
- For TiDB, you can set the value of [`tidb_snapshot`](/read-historical-data.md#how-tidb-reads-data-from-history-versions) to specify the time point of the data backup. This ensures the consistency of the backup, instead of using `FLUSH TABLES WITH READ LOCK` to ensure the consistency.
When using Dumpling, you need to execute the export command on a running cluster.

## Dumpling introduction
<CustomContent platform="tidb">

Dumpling is written in Go. The Github project is [pingcap/dumpling](https://github.com/pingcap/dumpling).
TiDB also provides other tools that you can choose to use as needed.

For detailed usage of Dumpling, use the `--help` option or refer to [Option list of Dumpling](#option-list-of-dumpling).
- For backups of SST files (key-value pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md).
- For real-time backups of incremental data, refer to [TiCDC](/ticdc/ticdc-overview.md).
- All exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md).

When using Dumpling, you need to execute the export command on a running cluster. This document assumes that there is a TiDB instance on the `127.0.0.1:4000` host and that this TiDB instance has a root user without a password.
</CustomContent>

You can get Dumpling using [TiUP](/tiup/tiup-overview.md) by running `tiup install dumpling`. Afterwards, you can use `tiup dumpling ...` to run Dumpling.
> **Note:**
>
> PingCAP previously maintained a fork of the [mydumper project](https://github.com/maxbube/mydumper) with enhancements specific to TiDB. This fork has since been replaced by [Dumpling](/dumpling-overview.md), which has been rewritten in Go, and supports more optimizations that are specific to TiDB. It is strongly recommended that you use Dumpling instead of mydumper.
>
> For more information on Mydumper, refer to [v4.0 Mydumper documentation](https://docs.pingcap.com/tidb/v4.0/backup-and-restore-using-mydumper-lightning).
Dumpling is also included in the tidb-toolkit installation package and can be [downloaded here](/download-ecosystem-tools.md#dumpling).
Compared to Mydumper, Dumpling has the following improvements:

## Export data from TiDB/MySQL
- Support exporting data in multiple formats, including SQL and CSV.
- Support the [table-filter](https://github.com/pingcap/tidb-tools/blob/master/pkg/table-filter/README.md) feature, which makes it easier to filter data.
- Support exporting data to Amazon S3 cloud storage.
- More optimizations are made for TiDB:
- Support configuring the memory limit of a single TiDB SQL statement.
- Support automatic adjustment of TiDB GC time for TiDB v4.0.0 and later versions.
- Use TiDB's hidden column `_tidb_rowid` to optimize the performance of concurrent data export from a single table.
- For TiDB, you can set the value of [`tidb_snapshot`](/read-historical-data.md#how-tidb-reads-data-from-history-versions) to specify the time point of the data backup. This ensures the consistency of the backup, instead of using `FLUSH TABLES WITH READ LOCK` to ensure the consistency.

## Export data from TiDB or MySQL

### Required privileges

@@ -50,6 +54,8 @@ Dumpling is also included in the tidb-toolkit installation package and can be [d

### Export to SQL files

This document assumes that there is a TiDB instance on the 127.0.0.1:4000 host and that this TiDB instance has a root user without a password.

Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the `--filetype sql` flag:

{{< copyable "shell-regular" >}}
@@ -166,11 +172,11 @@ For example, you can export all records that match `id < 100` in `test.sbtest1`

### Export data to Amazon S3 cloud storage

Since v4.0.8, Dumpling supports exporting data to cloud storages. If you need to back up data to Amazon's S3 backend storage, you need to specify the S3 storage path in the `-o` parameter.
Starting from v4.0.8, Dumpling supports exporting data to cloud storages. If you need to back up data to Amazon S3, you need to specify the Amazon S3 storage path in the `-o` parameter.

You need to create an S3 bucket in the specified region (see the [Amazon documentation - How do I create an S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html)). If you also need to create a folder in the bucket, see the [Amazon documentation - Creating a folder](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html).
You need to create an Amazon S3 bucket in the specified region (see the [Amazon documentation - How do I create an S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html)). If you also need to create a folder in the bucket, see the [Amazon documentation - Creating a folder](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html).

Pass `SecretKey` and `AccessKey` of the account with the permission to access the S3 backend storage to the Dumpling node as environment variables.
Pass `SecretKey` and `AccessKey` of the account with the permission to access the Amazon S3 backend storage to the Dumpling node as environment variables.

{{< copyable "shell-regular" >}}

@@ -261,7 +267,7 @@ With the above options specified, Dumpling can have a quicker speed of data expo

> **Note:**
>
> In most scenarios, you do not need to adjust the default data consistency options of Dumpling (the default value is `auto`).
> The default value is `auto` for the data consistency option. In most scenarios, you do not need to adjust the default data consistency options of Dumpling.
Dumpling uses the `--consistency <consistency level>` option to control the way in which data is exported for "consistency assurance". When using snapshot for consistency, you can use the `--snapshot` option to specify the timestamp to be backed up. You can also use the following levels of consistency:

@@ -290,7 +296,7 @@ ls -lh /tmp/test | awk '{print $5 "\t" $9}'
190K test.sbtest3.0.sql
```

### Export historical data snapshot of TiDB
### Export historical data snapshots of TiDB

Dumpling can export the data of a certain [tidb_snapshot](/read-historical-data.md#how-tidb-reads-data-from-history-versions) with the `--snapshot` option specified.
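
A minimal sketch of a snapshot export, assuming the TiDB instance on `127.0.0.1:4000` with a passwordless `root` user that this document uses elsewhere. The timestamp and the `/tmp/test` output directory are hypothetical values chosen for illustration. The script only prints the command so that it can be reviewed before it is run:

```shell
# Hypothetical point in time to export; replace it with your own value.
SNAPSHOT="2020-07-02 17:12:45"

# Assemble the Dumpling command. Host, port, user, and output
# directory are assumptions for this sketch.
DUMPLING_CMD="tiup dumpling -h 127.0.0.1 -P 4000 -u root -o /tmp/test --snapshot \"${SNAPSHOT}\""

# Dry run: print the command instead of executing it.
echo "${DUMPLING_CMD}"
```

To perform the export, run the printed command. The chosen point in time must not be older than the cluster's GC safe point, otherwise the historical data can no longer be read.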

@@ -313,7 +319,7 @@ When Dumpling is exporting a large single table from TiDB, Out of Memory (OOM) m
+ Reduce the value of `--tidb-mem-quota-query` to `8589934592` (8 GB) or lower. `--tidb-mem-quota-query` controls the memory usage of a single query statement in TiDB.
+ Adjust the `--params "tidb_distsql_scan_concurrency=5"` parameter. [`tidb_distsql_scan_concurrency`](/system-variables.md#tidb_distsql_scan_concurrency) is a session variable which controls the concurrency of the scan operations in TiDB.
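
The two options above can be combined in a single invocation. The following is a sketch only: the host, port, user, and `/tmp/test` output directory are assumptions for illustration, and the script prints the command rather than running it.

```shell
# 8 GB expressed in bytes, as expected by --tidb-mem-quota-query.
MEM_QUOTA=$((8 * 1024 * 1024 * 1024))

# Assemble the Dumpling command. Connection details and output
# directory are assumptions for this sketch.
DUMPLING_CMD="tiup dumpling -h 127.0.0.1 -P 4000 -u root -o /tmp/test --tidb-mem-quota-query ${MEM_QUOTA} --params \"tidb_distsql_scan_concurrency=5\""

# Dry run: print the command instead of executing it.
echo "${DUMPLING_CMD}"
```

Here "8 GB" means 8 × 1024³ bytes, which matches the `8589934592` value mentioned above. If OOM still occurs, lower the quota further and run the printed command again.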

### TiDB GC settings when exporting a large volume of data
### Set TiDB GC when exporting a large volume of data (more than 1 TB)

When exporting data from TiDB, if the TiDB version is later than or equal to v4.0.0 and Dumpling can access the PD address of the TiDB cluster, Dumpling automatically extends the GC time without affecting the original cluster.

@@ -333,8 +339,6 @@ After your operation is completed, set the GC time back (the default value is `1
SET GLOBAL tidb_gc_life_time = '10m';
```

Finally, all the exported data can be imported back to TiDB using [TiDB Lightning](/tidb-lightning/tidb-lightning-backends.md).

## Option list of Dumpling

| Options | Usage | Default value |
2 changes: 1 addition & 1 deletion read-historical-data.md
@@ -174,4 +174,4 @@ To restore data from an older version, you can use one of the following methods:

- For simple cases, use `SELECT` after setting the `tidb_snapshot` variable and copy-paste the output, or use `SELECT ... INTO LOCAL OUTFILE` and use `LOAD DATA` to import the data later on.

- Use [Dumpling](/dumpling-overview.md#export-historical-data-snapshot-of-tidb) to export a historical snapshot. Dumpling performs well in exporting larger sets of data.
- Use [Dumpling](/dumpling-overview.md#export-historical-data-snapshots-of-tidb) to export a historical snapshot. Dumpling performs well in exporting larger sets of data.
