From 8bca779e286c761f1ecc92288716e69c2c2f11a7 Mon Sep 17 00:00:00 2001 From: kennytm Date: Fri, 3 Jul 2020 15:20:17 +0800 Subject: [PATCH 1/3] cherry pick #3065 to release-3.0 Signed-off-by: ti-srebot --- TOC.md | 235 ++++++ br/backup-and-restore-tool.md | 668 ++++++++++++++++++ table-filter.md | 252 +++++++ .../tidb-lightning-configuration.md | 29 + tidb-lightning/tidb-lightning-glossary.md | 16 +- 5 files changed, 1194 insertions(+), 6 deletions(-) create mode 100644 br/backup-and-restore-tool.md create mode 100644 table-filter.md diff --git a/TOC.md b/TOC.md index 4b5c6142ad2f0..9157d14b028d3 100644 --- a/TOC.md +++ b/TOC.md @@ -88,6 +88,7 @@ - [Identify Slow Queries](/identify-slow-queries.md) - [Identify Expensive Queries](/identify-expensive-queries.md) + Scale +<<<<<<< HEAD - [Scale using Ansible](/scale-tidb-using-ansible.md) - [Scale a TiDB Cluster](/horizontal-scale.md) + Upgrade @@ -96,6 +97,130 @@ - [TiDB Troubleshooting Map](/tidb-troubleshooting-map.md) - [Troubleshoot Cluster Setup](/troubleshoot-tidb-cluster.md) - [Troubleshoot TiDB Lightning](/troubleshoot-tidb-lightning.md) +======= + + [Use TiUP (Recommended)](/scale-tidb-using-tiup.md) + + [Use TiDB Ansible](/scale-tidb-using-ansible.md) + + [Use TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/scale-a-tidb-cluster) + + Backup and Restore + + [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) + + [Use Dumpling for Export or Backup](/export-or-backup-using-dumpling.md) + + Use BR Tool + + [Use BR Tool](/br/backup-and-restore-tool.md) + + [BR Use Cases](/br/backup-and-restore-use-cases.md) + + [BR storages](/br/backup-and-restore-storages.md) + + [Configure Time Zone](/configure-time-zone.md) + + [Daily Checklist](/daily-check.md) + + [Manage TiCDC Cluster and Replication Tasks](/ticdc/manage-ticdc.md) + + [Maintain TiFlash](/tiflash/maintain-tiflash.md) + + [Maintain TiDB Using TiUP](/maintain-tidb-using-tiup.md) + + [Maintain TiDB Using Ansible](/maintain-tidb-using-ansible.md) ++ Monitor and Alert + + [Monitoring Framework Overview](/tidb-monitoring-framework.md) + + [Monitoring API](/tidb-monitoring-api.md) + + [Deploy Monitoring Services](/deploy-monitoring-services.md) + + [TiDB Cluster Alert Rules](/alert-rules.md) + + [TiFlash Alert Rules](/tiflash/tiflash-alert-rules.md) ++ Troubleshoot + + [Identify Slow Queries](/identify-slow-queries.md) + + [SQL Diagnostics](/system-tables/system-table-sql-diagnostics.md) + + [Identify Expensive Queries](/identify-expensive-queries.md) + + [Statement Summary Tables](/statement-summary-tables.md) + + [Troubleshoot Cluster Setup](/troubleshoot-tidb-cluster.md) + + [TiDB Troubleshooting Map](/tidb-troubleshooting-map.md) + + [Troubleshoot TiCDC](/ticdc/troubleshoot-ticdc.md) + + [Troubleshoot TiFlash](/tiflash/troubleshoot-tiflash.md) ++ Performance Tuning + + System Tuning + + [Operating System Tuning](/tune-operating-system.md) + + Software Tuning + + Configuration + + [Tune TiDB Memory](/configure-memory-usage.md) + + [Tune TiKV Threads](/tune-tikv-thread-performance.md) + + [Tune TiKV Memory](/tune-tikv-memory-performance.md) + + [TiKV Follower Read](/follower-read.md) + + [TiFlash Tuning](/tiflash/tune-tiflash-performance.md) + + [Coprocessor Cache](/coprocessor-cache.md) + + SQL Tuning + + [SQL Tuning with `EXPLAIN`](/query-execution-plan.md) + + SQL Optimization + + [SQL Optimization Process](/sql-optimization-concepts.md) + + Logic Optimization + + [Join Reorder](/join-reorder.md) + + Physical Optimization + + 
[Statistics](/statistics.md) + + Control Execution Plan + + [Optimizer Hints](/optimizer-hints.md) + + [SQL Plan Management](/sql-plan-management.md) + + [Access Tables Using `IndexMerge`](/index-merge.md) ++ Tutorials + + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md) + + [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md) + + Best Practices + + [Use TiDB](/tidb-best-practices.md) + + [Java Application Development](/best-practices/java-app-best-practices.md) + + [Use HAProxy](/best-practices/haproxy-best-practices.md) + + [Highly Concurrent Write](/best-practices/high-concurrency-best-practices.md) + + [Grafana Monitoring](/best-practices/grafana-monitor-best-practices.md) + + [PD Scheduling](/best-practices/pd-scheduling-best-practices.md) + + [TiKV Performance Tuning with Massive Regions](/best-practices/massive-regions-best-practices.md) + + [Use Placement Rules](/configure-placement-rules.md) + + [Use Load Base Split](/configure-load-base-split.md) + + [Use Store Limit](/configure-store-limit.md) ++ TiDB Ecosystem Tools + + [Overview](/ecosystem-tool-user-guide.md) + + [Use Cases](/ecosystem-tool-user-case.md) + + [Download](/download-ecosystem-tools.md) + + Backup & Restore (BR) + + [BR FAQ](/br/backup-and-restore-faq.md) + + [Use BR Tool](/br/backup-and-restore-tool.md) + + [BR Use Cases](/br/backup-and-restore-use-cases.md) + + TiDB Binlog + + [Overview](/tidb-binlog/tidb-binlog-overview.md) + + [Deploy](/tidb-binlog/deploy-tidb-binlog.md) + + [Maintain](/tidb-binlog/maintain-tidb-binlog-cluster.md) + + [Configure](/tidb-binlog/tidb-binlog-configuration-file.md) + + [Pump](/tidb-binlog/tidb-binlog-configuration-file.md#pump) + + [Drainer](/tidb-binlog/tidb-binlog-configuration-file.md#drainer) + + [Upgrade](/tidb-binlog/upgrade-tidb-binlog.md) + + [Monitor](/tidb-binlog/monitor-tidb-binlog-cluster.md) + + [Reparo](/tidb-binlog/tidb-binlog-reparo.md) + + [binlogctl](/tidb-binlog/binlog-control.md) + + [Binlog Slave Client](/tidb-binlog/binlog-slave-client.md) + + [TiDB Binlog Relay Log](/tidb-binlog/tidb-binlog-relay-log.md) + + [Bidirectional Replication Between TiDB Clusters](/tidb-binlog/bidirectional-replication-between-tidb-clusters.md) + + [Glossary](/tidb-binlog/tidb-binlog-glossary.md) + + Troubleshoot + + [Troubleshoot](/tidb-binlog/troubleshoot-tidb-binlog.md) + + [Handle Errors](/tidb-binlog/handle-tidb-binlog-errors.md) + + [FAQ](/tidb-binlog/tidb-binlog-faq.md) + + TiDB Lightning + + [Overview](/tidb-lightning/tidb-lightning-overview.md) + + [Tutorial](/get-started-with-tidb-lightning.md) + + [Deploy](/tidb-lightning/deploy-tidb-lightning.md) + + [Configure](/tidb-lightning/tidb-lightning-configuration.md) + + Key Features + + [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md) + + [Table Filter](/table-filter.md) + + [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md) + + [TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) + + [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md) + + [Monitor](/tidb-lightning/monitor-tidb-lightning.md) + + [Troubleshoot](/troubleshoot-tidb-lightning.md) + + [FAQ](/tidb-lightning/tidb-lightning-faq.md) + + [Glossary](/tidb-lightning/tidb-lightning-glossary.md) + + [TiCDC](/ticdc/ticdc-overview.md) + + sync-diff-inspector + + [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md) + + [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md) + + [Data 
Check in Sharding Scenarios](/sync-diff-inspector/shard-diff.md) + + [Data Check for TiDB Upstream/Downstream Clusters](/sync-diff-inspector/upstream-downstream-diff.md) + + [Loader](/loader-overview.md) + + [Mydumper](/mydumper-overview.md) + + [Syncer](/syncer-overview.md) + + TiSpark + + [Quick Start](/get-started-with-tispark.md) + + [User Guide](/tispark-overview.md) +>>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) + Reference + SQL - [MySQL Compatibility](/mysql-compatibility.md) @@ -143,6 +268,7 @@ - [`SET`](/data-type-string.md#set-type) - [JSON Type](/data-type-json.md) + Functions and Operators +<<<<<<< HEAD - [Function and Operator Reference](/functions-and-operators/functions-and-operators-overview.md) - [Type Conversion in Expression Evaluation](/functions-and-operators/type-conversion-in-expression-evaluation.md) - [Operators](/functions-and-operators/operators.md) @@ -350,6 +476,115 @@ - [TiKV Control](/tikv-control.md) - [TiDB Control](/tidb-control.md) - [TiDB in Kubernetes](https://pingcap.com/docs/tidb-in-kubernetes/stable/) +======= + + [Overview](/functions-and-operators/functions-and-operators-overview.md) + + [Type Conversion in Expression Evaluation](/functions-and-operators/type-conversion-in-expression-evaluation.md) + + [Operators](/functions-and-operators/operators.md) + + [Control Flow Functions](/functions-and-operators/control-flow-functions.md) + + [String Functions](/functions-and-operators/string-functions.md) + + [Numeric Functions and Operators](/functions-and-operators/numeric-functions-and-operators.md) + + [Date and Time Functions](/functions-and-operators/date-and-time-functions.md) + + [Bit Functions and Operators](/functions-and-operators/bit-functions-and-operators.md) + + [Cast Functions and Operators](/functions-and-operators/cast-functions-and-operators.md) + + [Encryption and Compression Functions](/functions-and-operators/encryption-and-compression-functions.md) + + [Information Functions](/functions-and-operators/information-functions.md) + + [JSON Functions](/functions-and-operators/json-functions.md) + + [Aggregate (GROUP BY) Functions](/functions-and-operators/aggregate-group-by-functions.md) + + [Window Functions](/functions-and-operators/window-functions.md) + + [Miscellaneous Functions](/functions-and-operators/miscellaneous-functions.md) + + [Precision Math](/functions-and-operators/precision-math.md) + + [List of Expressions for Pushdown](/functions-and-operators/expressions-pushed-down.md) + + [Constraints](/constraints.md) + + [Generated Columns](/generated-columns.md) + + [SQL Mode](/sql-mode.md) + + Transactions + + [Overview](/transaction-overview.md) + + [Isolation Levels](/transaction-isolation-levels.md) + + [Optimistic Transactions](/optimistic-transaction.md) + + [Pessimistic Transactions](/pessimistic-transaction.md) + + Garbage Collection (GC) + + [Overview](/garbage-collection-overview.md) + + [Configuration](/garbage-collection-configuration.md) + + [Views](/views.md) + + [Partitioning](/partitioned-table.md) + + [Character Set and Collation](/character-set-and-collation.md) + + System Tables + + [`mysql`](/system-tables/system-table-overview.md) + + [`information_schema`](/system-tables/system-table-information-schema.md) + + sql-diagnosis + + [`cluster_info`](/system-tables/system-table-cluster-info.md) + + [`cluster_hardware`](/system-tables/system-table-cluster-hardware.md) + + [`cluster_config`](/system-tables/system-table-cluster-config.md) + + 
[`cluster_load`](/system-tables/system-table-cluster-load.md) + + [`cluster_systeminfo`](/system-tables/system-table-cluster-systeminfo.md) + + [`cluster_log`](/system-tables/system-table-cluster-log.md) + + [`metrics_schema`](/system-tables/system-table-metrics-schema.md) + + [`metrics_tables`](/system-tables/system-table-metrics-tables.md) + + [`metrics_summary`](/system-tables/system-table-metrics-summary.md) + + [`inspection_result`](/system-tables/system-table-inspection-result.md) + + [`inspection_summary`](/system-tables/system-table-inspection-summary.md) + + UI + + TiDB Dashboard + + [Overview](/dashboard/dashboard-intro.md) + + Maintain + + [Deploy](/dashboard/dashboard-ops-deploy.md) + + [Reverse Proxy](/dashboard/dashboard-ops-reverse-proxy.md) + + [Secure](/dashboard/dashboard-ops-security.md) + + [Access](/dashboard/dashboard-access.md) + + [Overview Page](/dashboard/dashboard-overview.md) + + [Cluster Info Page](/dashboard/dashboard-cluster-info.md) + + [Key Visualizer Page](/dashboard/dashboard-key-visualizer.md) + + SQL Statements Analysis + + [SQL Statements Page](/dashboard/dashboard-statement-list.md) + + [SQL Details Page](/dashboard/dashboard-statement-details.md) + + [Slow Queries Page](/dashboard/dashboard-slow-query.md) + + Cluster Diagnostics + + [Access Cluster Diagnostics Page](/dashboard/dashboard-diagnostics-access.md) + + [View Diagnostics Report](/dashboard/dashboard-diagnostics-report.md) + + [Use Diagnostics](/dashboard/dashboard-diagnostics-usage.md) + + [Search Logs Page](/dashboard/dashboard-log-search.md) + + [Profile Instances Page](/dashboard/dashboard-profiling.md) + + [FAQ](/dashboard/dashboard-faq.md) + + CLI + + [tikv-ctl](/tikv-control.md) + + [pd-ctl](/pd-control.md) + + [tidb-ctl](/tidb-control.md) + + [pd-recover](/pd-recover.md) + + Command Line Flags + + [tidb-server](/command-line-flags-for-tidb-configuration.md) + + [tikv-server](/command-line-flags-for-tikv-configuration.md) + + [tiflash-server](/tiflash/tiflash-command-line-flags.md) + + [pd-server](/command-line-flags-for-pd-configuration.md) + + Configuration File Parameters + + [tidb-server](/tidb-configuration-file.md) + + [tikv-server](/tikv-configuration-file.md) + + [tiflash-server](/tiflash/tiflash-configuration.md) + + [pd-server](/pd-configuration-file.md) + + System Variables + + [MySQL System Variables](/system-variables.md) + + [TiDB Specific System Variables](/tidb-specific-system-variables.md) + + Storage Engines + + TiFlash + + [Overview](/tiflash/tiflash-overview.md) + + [Use TiFlash](/tiflash/use-tiflash.md) + + TiUP + + [Documentation Guide](/tiup/tiup-documentation-guide.md) + + [Overview](/tiup/tiup-overview.md) + + [Terminology and Concepts](/tiup/tiup-terminology-and-concepts.md) + + [Manage TiUP Components](/tiup/tiup-component-management.md) + + [FAQ](/tiup/tiup-faq.md) + + [Troubleshooting Guide](/tiup/tiup-troubleshooting-guide.md) + + TiUP Components + + [tiup-playground](/tiup/tiup-playground.md) + + [tiup-cluster](/tiup/tiup-cluster.md) + + [tiup-mirror](/tiup/tiup-mirror.md) + + [tiup-bench](/tiup/tiup-bench.md) + + [Telemetry](/telemetry.md) + + [Errors Codes](/error-codes.md) + + [TiCDC Overview](/ticdc/ticdc-overview.md) + + [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md) + + [Table Filter](/table-filter.md) +>>>>>>> 68375a7... 
tidb-lightning,br: replaced black-white-list by table-filter (#3065)
+ FAQs
  - [TiDB FAQs](/faq/tidb-faq.md)
  - [TiDB Lightning FAQs](/tidb-lightning/tidb-lightning-faq.md)
diff --git a/br/backup-and-restore-tool.md b/br/backup-and-restore-tool.md
new file mode 100644
index 0000000000000..de5403a45fdb8
--- /dev/null
+++ b/br/backup-and-restore-tool.md
@@ -0,0 +1,668 @@
+---
+title: Use BR to Back up and Restore Data
+summary: Learn how to back up and restore data of the TiDB cluster using BR.
+category: how-to
+aliases: ['/docs/dev/br/backup-and-restore-tool/','/docs/dev/reference/tools/br/br/','/docs/dev/how-to/maintain/backup-and-restore/br/']
+---
+
+# Use BR to Back up and Restore Data
+
+[Backup & Restore](http://github.com/pingcap/br) (BR) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with [`mydumper`/`loader`](/backup-and-restore-using-mydumper-lightning.md), BR is more suitable for scenarios that involve huge volumes of data. This document describes the BR command line, provides detailed use examples, best practices, and restrictions, and introduces the implementation principles of BR.
+
+## Usage restrictions
+
+- BR only supports TiDB v3.1 and later versions.
+- Currently, TiDB does not support backing up and restoring partitioned tables.
+- Currently, you can perform restoration only on new clusters.
+- It is recommended that you execute multiple backup operations serially. Otherwise, different backup operations might interfere with each other.
+
+## Recommended deployment configuration
+
+- It is recommended that you deploy BR on the PD node.
+- It is recommended that you mount a high-performance SSD to BR nodes and all TiKV nodes. A 10-gigabit network card is recommended. Otherwise, bandwidth is likely to be the performance bottleneck during the backup and restore process.
+
+## Download Binary
+
+Refer to the [download page](/download-ecosystem-tools.md#br-backup-and-restore) for more information.
+
+## Implementation principles
+
+BR sends the backup or restoration commands to each TiKV node. After receiving these commands, TiKV performs the corresponding backup or restoration operations. Each TiKV node has a path in which the backup files generated in the backup operation are stored and from which the stored backup files are read during the restoration.
+
+### Backup principle
+
+When BR performs a backup operation, it first obtains the following information from PD:
+
+- The current TS (timestamp) as the time of the backup snapshot
+- The TiKV node information of the current cluster
+
+Based on this information, BR starts a TiDB instance internally to obtain the database or table information corresponding to the TS, and filters out the system databases (`information_schema`, `performance_schema`, `mysql`) at the same time.
+
+According to the backup sub-command, BR adopts the following two types of backup logic:
+
+- Full backup: BR traverses all the tables and constructs the KV range to be backed up according to each table.
+- Single table backup: BR constructs the KV range to be backed up according to a single table.
+
+Finally, BR collects the KV ranges to be backed up and sends the complete backup request to the TiKV nodes of the cluster.
+
+The structure of the request:
+
+```
+BackupRequest{
+    ClusterId,      // The cluster ID.
+    StartKey,       // The starting key of the backup (backed up).
+    EndKey,         // The ending key of the backup (not backed up).
+    StartVersion,   // The version of the last backup snapshot, used for the incremental backup.
+    EndVersion,     // The backup snapshot time.
+    StorageBackend, // The path where backup files are stored.
+    RateLimit,      // Backup speed (MB/s).
+}
+```
+
+After receiving the backup request, the TiKV node traverses all Region leaders on the node to find the Regions that overlap with the KV ranges in this request. The TiKV node backs up some or all of the data within the range, and generates the corresponding SST file.
+
+After finishing backing up the data of the corresponding Region, the TiKV node returns the metadata to BR. BR collects the metadata and stores it in the `backupmeta` file, which is used for restoration.
+
+If `StartVersion` is not `0`, the backup is treated as an incremental backup. In addition to KVs, BR also collects DDLs between `[StartVersion, EndVersion)`. During data restoration, these DDLs are restored first.
+
+If checksum is enabled when you execute the backup command, BR calculates the checksum of each backed up table for data check.
+
+#### Types of backup files
+
+Two types of backup files are generated in the path where backup files are stored:
+
+- **The SST file**: stores the data that the TiKV node backed up.
+- **The `backupmeta` file**: stores the metadata of this backup operation, including the number, the key range, the size, and the Hash (sha256) value of the backup files.
+
+#### The format of the SST file name
+
+The SST file is named in the format of `storeID_regionID_regionEpoch_keyHash_cf`, where
+
+- `storeID` is the TiKV node ID;
+- `regionID` is the Region ID;
+- `regionEpoch` is the version number of the Region;
+- `keyHash` is the Hash (sha256) value of the startKey of a range, which ensures the uniqueness of a key;
+- `cf` indicates the [Column Family](/tune-tikv-memory-performance.md) of RocksDB (`default` or `write` by default).
+
+### Restoration principle
+
+During the data restoration process, BR performs the following tasks in order:
+
+1. It parses the `backupmeta` file in the backup path, and then starts a TiDB instance internally to create the corresponding databases and tables based on the parsed information.
+
+2. It aggregates the parsed SST files according to the tables.
+
+3. It pre-splits Regions according to the key range of the SST file so that every Region corresponds to at least one SST file.
+
+4. It traverses each table to be restored and the SST files corresponding to each table.
+
+5. It finds the Region corresponding to the SST file and sends a request to the corresponding TiKV node for downloading the file. Then it sends a request for loading the file after the file is successfully downloaded.
+
+After TiKV receives the request to load the SST file, TiKV uses the Raft mechanism to ensure the strong consistency of the SST data. After the downloaded SST file is loaded successfully, the file is deleted asynchronously.
+
+After the restoration operation is completed, BR performs a checksum calculation on the restored data to compare the stored data with the backed up data.
+
+![br-arch](/media/br-arch.png)
+
+## Command-line description
+
+A `br` command consists of sub-commands, options, and parameters.
+
+* Sub-command: the characters without `-` or `--`.
+* Option: the characters that start with `-` or `--`.
+* Parameter: the characters that immediately follow a sub-command or an option and are passed to that sub-command or option.
+
+This is a complete `br` command:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br backup full --pd "${PDIP}:2379" -s "local:///tmp/backup"
+```
+
+Explanations for the above command are as follows:
+
+* `backup`: the sub-command of `br`.
+* `full`: the sub-command of `backup`.
+* `-s` (or `--storage`): the option that specifies the path where the backup files are stored.
+* `"local:///tmp/backup"`: the parameter of `-s`. `/tmp/backup` is the path in the local disk where the backed up files of each TiKV node are stored.
+* `--pd`: the option that specifies the Placement Driver (PD) service address.
+* `"${PDIP}:2379"`: the parameter of `--pd`.
+
+> **Note:**
+>
+> - When the `local` storage is used, the backup data are scattered in the local file system of each node.
+>
+> - It is **not recommended** to back up to a local disk in the production environment because you **have to** manually aggregate these data to complete the data restoration. For more information, see [Restore Cluster Data](#restore-cluster-data).
+>
+> - Aggregating these backup data might cause redundancy and bring trouble to operation and maintenance. Even worse, if you restore data without aggregating these data first, you will receive a rather confusing `SST file not found` error message.
+>
+> - It is recommended to mount the NFS disk on each node, or back up to the `S3` object storage.
+
+### Sub-commands
+
+A `br` command consists of multiple layers of sub-commands. Currently, BR has the following sub-commands:
+
+* `br backup`: used to back up the data of the TiDB cluster.
+* `br restore`: used to restore the data of the TiDB cluster.
+
+Each of the above sub-commands might further include the following three sub-commands to specify the scope of an operation:
+
+* `full`: used to back up or restore all the cluster data.
+* `db`: used to back up or restore the specified database of the cluster.
+* `table`: used to back up or restore a single table in the specified database of the cluster.
+
+### Common options
+
+* `--pd`: used for connection, specifying the PD server address. For example, `"${PDIP}:2379"`.
+* `-h` (or `--help`): used to get help on all sub-commands. For example, `br backup --help`.
+* `-V` (or `--version`): used to check the version of BR.
+* `--ca`: specifies the path to the trusted CA certificate in the PEM format.
+* `--cert`: specifies the path to the SSL certificate in the PEM format.
+* `--key`: specifies the path to the SSL certificate key in the PEM format.
+* `--status-addr`: specifies the listening address through which BR provides statistics to Prometheus.
+
+## Back up cluster data
+
+To back up the cluster data, use the `br backup` command. You can add the `full`, `db`, or `table` sub-command to specify the scope of your backup operation: the whole cluster, a database, or a single table.
+
+If the backup time might exceed the [`tikv_gc_life_time`](/garbage-collection-configuration.md#tikv_gc_life_time) configuration, which is `10m0s` by default (`10m0s` means 10 minutes), increase the value of this configuration.
+
+For example, set `tikv_gc_life_time` to `720h`:
+
+{{< copyable "sql" >}}
+
+```sql
+mysql -h${TiDBIP} -P4000 -u${TIDB_USER} ${password_str} -Nse \
+    "update mysql.tidb set variable_value='720h' where variable_name='tikv_gc_life_time'";
+```
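+
+After the backup operation is completed, you can set `tikv_gc_life_time` back to its original value so that GC resumes its normal schedule. The following is a minimal sketch that assumes the value before the backup was the default `10m0s`; substitute your own previous value if it was different:
+
+{{< copyable "sql" >}}
+
+```sql
+# Assumes the value before the backup was the default 10m0s.
+mysql -h${TiDBIP} -P4000 -u${TIDB_USER} ${password_str} -Nse \
+    "update mysql.tidb set variable_value='10m0s' where variable_name='tikv_gc_life_time'";
+```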
+
+### Back up all the cluster data
+
+To back up all the cluster data, execute the `br backup full` command. To get help on this command, execute `br backup full -h` or `br backup full --help`.
+
+**Usage example:**
+
+Back up all the cluster data to the `/tmp/backup` path of each TiKV node and write the `backupmeta` file to this path.
+
+> **Note:**
+>
+> + If the backup disk and the service disk are different, tests show that a full-speed online backup reduces the QPS of the read-only online service by about 15%-25%. If you want to reduce the impact on QPS, use `--ratelimit` to limit the rate.
+>
+> + If the backup disk and the service disk are the same, the backup competes with the service for I/O resources. This might decrease the QPS of the read-only online service by more than half. Therefore, it is highly **not recommended** to back up the online service data to the TiKV data disk.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br backup full \
+    --pd "${PDIP}:2379" \
+    --storage "local:///tmp/backup" \
+    --ratelimit 120 \
+    --log-file backupfull.log
+```
+
+Explanations for some options in the above command are as follows:
+
+* `--ratelimit`: specifies the maximum speed at which a backup operation is performed (MiB/s) on each TiKV node.
+* `--log-file`: specifies writing the BR log to the `backupfull.log` file.
+
+A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. Then the BR also checks the backup data to ensure data safety. The progress bar is displayed as follows:
+
+```shell
+br backup full \
+    --pd "${PDIP}:2379" \
+    --storage "local:///tmp/backup" \
+    --ratelimit 120 \
+    --log-file backupfull.log
+Full Backup <---------/................................................> 17.12%.
+```
+
+### Back up a database
+
+To back up a database in the cluster, execute the `br backup db` command. To get help on this command, execute `br backup db -h` or `br backup db --help`.
+
+**Usage example:**
+
+Back up the data of the `test` database to the `/tmp/backup` path on each TiKV node and write the `backupmeta` file to this path.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br backup db \
+    --pd "${PDIP}:2379" \
+    --db test \
+    --storage "local:///tmp/backup" \
+    --ratelimit 120 \
+    --log-file backuptable.log
+```
+
+In the above command, `--db` specifies the name of the database to be backed up. For descriptions of other options, see [Back up all the cluster data](/br/backup-and-restore-tool.md#back-up-all-the-cluster-data).
+
+A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. Then the BR also checks the backup data to ensure data safety.
+
+### Back up a table
+
+To back up the data of a single table in the cluster, execute the `br backup table` command. To get help on this command, execute `br backup table -h` or `br backup table --help`.
+
+**Usage example:**
+
+Back up the data of the `test.usertable` table to the `/tmp/backup` path on each TiKV node and write the `backupmeta` file to this path.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br backup table \
+    --pd "${PDIP}:2379" \
+    --db test \
+    --table usertable \
+    --storage "local:///tmp/backup" \
+    --ratelimit 120 \
+    --log-file backuptable.log
+```
+
+The `table` sub-command has two options:
+
+* `--db`: specifies the database name.
+* `--table`: specifies the table name.
+
+For descriptions of other options, see [Back up all cluster data](#back-up-all-the-cluster-data).
+
+A progress bar is displayed in the terminal during the backup operation. When the progress bar advances to 100%, the backup is complete.
Then the BR also checks the backup data to ensure data safety. + +### Back up with table filter + +To back up multiple tables with more complex criteria, execute the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`. + +**Usage example:** + +The following command backs up the data of all tables in the form `db*.tbl*` to the `/tmp/backup` path on each TiKV node and writes the `backupmeta` file to this path. + +{{< copyable "shell-regular" >}} + +```shell +br backup full \ + --pd "${PDIP}:2379" \ + --filter 'db*.tbl*' \ + --storage "local:///tmp/backup" \ + --ratelimit 120 \ + --log-file backupfull.log +``` + +### Back up data to Amazon S3 backend + +If you back up the data to the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3. + +You can refer to the [AWS Official Document](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 `Bucket` in the specified `Region`. You can also refer to another [AWS Official Document](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html) to create a `Folder` in the `Bucket`. + +Pass `SecretKey` and `AccessKey` of the account that has privilege to access the S3 backend to the BR node. Here `SecretKey` and `AccessKey` are passed as environment variables. Then pass the privilege to the TiKV node through BR. + +{{< copyable "shell-regular" >}} + +```shell +export AWS_ACCESS_KEY_ID=${AccessKey} +export AWS_SECRET_ACCESS_KEY=${SecretKey} +``` + +When backing up using BR, explicitly specify the parameters `--s3.region` and `--send-credentials-to-tikv`. `--s3.region` indicates the region where S3 is located, and `--send-credentials-to-tikv` means passing the privilege to access S3 to the TiKV node. + +{{< copyable "shell-regular" >}} + +```shell +br backup full \ + --pd "${PDIP}:2379" \ + --storage "s3://${Bucket}/${Folder}" \ + --s3.region "${region}" \ + --send-credentials-to-tikv=true \ + --log-file backuptable.log +``` + +### Back up incremental data + +If you want to back up incrementally, you only need to specify the **last backup timestamp** `--lastbackupts`. + +The incremental backup has two limitations: + +- The incremental backup needs to be under a different path from the previous full backup. +- No GC (Garbage Collection) happens between the start time of the incremental backup and `lastbackupts`. + +To back up the incremental data between `(LAST_BACKUP_TS, current PD timestamp]`, execute the following command: + +{{< copyable "shell-regular" >}} + +```shell +br backup full\ + --pd ${PDIP}:2379 \ + -s local:///home/tidb/backupdata/incr \ + --lastbackupts ${LAST_BACKUP_TS} +``` + +To get the timestamp of the last backup, execute the `validate` command. For example: + +{{< copyable "shell-regular" >}} + +```shell +LAST_BACKUP_TS=`br validate decode --field="end-version" -s local:///home/tidb/backupdata` +``` + +In the above example, the incremental backup data includes the newly written data and the DDLs between `(LAST_BACKUP_TS, current PD timestamp]`. When restoring data, BR restores DDLs first and then restores the written data. + +### Back up Raw KV (experimental feature) + +> **Warning:** +> +> This feature is experimental and not thoroughly tested. It is highly **not recommended** to use this feature in the production environment. + +In some scenarios, TiKV might run independently of TiDB. 
Given that, BR also supports bypassing the TiDB layer and backing up data in TiKV.
+
+For example, you can execute the following command to back up all keys between `[0x31, 0x3130303030303030)` in the default CF to `$BACKUP_DIR`:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br backup raw --pd $PD_ADDR \
+    -s "local://$BACKUP_DIR" \
+    --start 31 \
+    --end 3130303030303030 \
+    --format hex \
+    --cf default
+```
+
+Here, the parameters of `--start` and `--end` are decoded using the method specified by `--format` before being sent to TiKV. Currently, the following methods are available:
+
+- "raw": The input string is directly encoded as a key in binary format.
+- "hex": The default encoding method. The input string is treated as a hexadecimal number.
+- "escape": First escape the input string, and then encode it into binary format.
+
+## Restore cluster data
+
+To restore the cluster data, use the `br restore` command. You can add the `full`, `db`, or `table` sub-command to specify the scope of your restoration: the whole cluster, a database, or a single table.
+
+> **Note:**
+>
+> If you use the local storage, you **must** copy all backed up SST files to every TiKV node in the path specified by `--storage`.
+>
+> Even though each TiKV node eventually only needs to read a part of all the SST files, they all need full access to the complete archive because:
+>
+> - Data are replicated into multiple peers. When ingesting SSTs, these files have to be present on *all* peers. This is unlike backup, where reading from a single node is enough.
+> - Where each peer is scattered to during the restore is random, so it is not known in advance which node will read which file.
+>
+> These issues can be avoided by using shared storage, for example, by mounting an NFS on the local path or by using S3. With network storage, every node can automatically read every SST file, so these caveats no longer apply.
+
+### Restore all the backup data
+
+To restore all the backup data to the cluster, execute the `br restore full` command. To get help on this command, execute `br restore full -h` or `br restore full --help`.
+
+**Usage example:**
+
+Restore all the backup data in the `/tmp/backup` path to the cluster.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore full \
+    --pd "${PDIP}:2379" \
+    --storage "local:///tmp/backup" \
+    --ratelimit 128 \
+    --log-file restorefull.log
+```
+
+Explanations for some options in the above command are as follows:
+
+* `--ratelimit`: specifies the maximum speed at which a restoration operation is performed (MiB/s) on each TiKV node.
+* `--log-file`: specifies writing the BR log to the `restorefull.log` file.
+
+A progress bar is displayed in the terminal during the restoration. When the progress bar advances to 100%, the restoration is complete. Then the BR also checks the backup data to ensure data safety.
+
+```shell
+br restore full \
+    --pd "${PDIP}:2379" \
+    --storage "local:///tmp/backup" \
+    --log-file restorefull.log
+Full Restore <---------/...............................................> 17.12%.
+```
+
+### Restore a database
+
+To restore a database to the cluster, execute the `br restore db` command. To get help on this command, execute `br restore db -h` or `br restore db --help`.
+
+**Usage example:**
+
+Restore a database backed up in the `/tmp/backup` path to the cluster.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore db \
+    --pd "${PDIP}:2379" \
+    --db "test" \
+    --storage "local:///tmp/backup" \
+    --log-file restorefull.log
+```
+
+In the above command, `--db` specifies the name of the database to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data).
+
+### Restore a table
+
+To restore a single table to the cluster, execute the `br restore table` command. To get help on this command, execute `br restore table -h` or `br restore table --help`.
+
+**Usage example:**
+
+Restore a table backed up in the `/tmp/backup` path to the cluster.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore table \
+    --pd "${PDIP}:2379" \
+    --db "test" \
+    --table "usertable" \
+    --storage "local:///tmp/backup" \
+    --log-file restorefull.log
+```
+
+In the above command, `--table` specifies the name of the table to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data) and [Restore a database](#restore-a-database).
+
+### Restore with table filter
+
+To restore multiple tables with more complex criteria, execute the `br restore full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.
+
+**Usage example:**
+
+The following command restores a subset of tables backed up in the `/tmp/backup` path to the cluster.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore full \
+    --pd "${PDIP}:2379" \
+    --filter 'db*.tbl*' \
+    --storage "local:///tmp/backup" \
+    --log-file restorefull.log
+```
+
+### Restore data from Amazon S3 backend
+
+If you restore data from the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3.
+
+Pass `SecretKey` and `AccessKey` of the account that has privilege to access the S3 backend to the BR node. Here `SecretKey` and `AccessKey` are passed as environment variables. Then pass the privilege to the TiKV node through BR.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+export AWS_ACCESS_KEY_ID=${AccessKey}
+export AWS_SECRET_ACCESS_KEY=${SecretKey}
+```
+
+When restoring data using BR, explicitly specify the parameters `--s3.region` and `--send-credentials-to-tikv`. `--s3.region` indicates the region where S3 is located, and `--send-credentials-to-tikv` means passing the privilege to access S3 to the TiKV node.
+
+`Bucket` and `Folder` in the `--storage` parameter represent the S3 bucket and the folder where the data to be restored is located.
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore full \
+    --pd "${PDIP}:2379" \
+    --storage "s3://${Bucket}/${Folder}" \
+    --s3.region "${region}" \
+    --send-credentials-to-tikv=true \
+    --log-file restorefull.log
+```
+
+For descriptions of other options in the above command, see [Restore all the backup data](#restore-all-the-backup-data).
+
+### Restore incremental data
+
+Restoring incremental data is similar to [restoring full data using BR](#restore-all-the-backup-data). Note that when restoring incremental data, make sure that all the data backed up before `last backup ts` has been restored to the target cluster.
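+
+The following is a minimal sketch of that sequence. It reuses the example paths from [Back up incremental data](#back-up-incremental-data); adjust the paths to wherever your full and incremental backups are actually stored:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+# Restore the full backup first (example path from the incremental backup section).
+br restore full --pd "${PDIP}:2379" -s local:///home/tidb/backupdata
+# Then restore the incremental data that was backed up after the full backup.
+br restore full --pd "${PDIP}:2379" -s local:///home/tidb/backupdata/incr
+```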
+
+### Restore Raw KV (experimental feature)
+
+> **Warning:**
+>
+> This feature is experimental and not thoroughly tested. It is highly **not recommended** to use this feature in the production environment.
+
+Similar to [backing up Raw KV](#back-up-raw-kv-experimental-feature), you can execute the following command to restore Raw KV:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+br restore raw --pd $PD_ADDR \
+    -s "local://$BACKUP_DIR" \
+    --start 31 \
+    --end 3130303030303030 \
+    --format hex \
+    --cf default
+```
+
+In the above example, all the backed up keys in the range `[0x31, 0x3130303030303030)` are restored to the TiKV cluster. The encoding methods of these keys are identical to those of [keys during the backup process](#back-up-raw-kv-experimental-feature).
+
+### Online restore (experimental feature)
+
+> **Warning:**
+>
+> This feature is experimental and not thoroughly tested. It also relies on the unstable `Placement Rules` feature of PD. It is highly **not recommended** to use this feature in the production environment.
+
+During data restoration, writing too much data affects the performance of the online cluster. To avoid this effect as much as possible, BR supports [Placement rules](/configure-placement-rules.md) to isolate resources. In this case, downloading and importing SST are only performed on a few specified nodes (or "restore nodes" for short). To complete the online restore, take the following steps.
+
+1. Configure PD and enable Placement Rules:
+
+    {{< copyable "shell-regular" >}}
+
+    ```shell
+    echo "config set enable-placement-rules true" | pd-ctl
+    ```
+
+2. Edit the TiKV configuration file of the "restore node" and add the "restore" label to the `server` configuration item:
+
+    {{< copyable "" >}}
+
+    ```
+    [server]
+    labels = { exclusive = "restore" }
+    ```
+
+3. Start TiKV of the "restore node" and restore the backed up files using BR. Compared with the offline restore, you only need to add the `--online` flag:
+
+    {{< copyable "shell-regular" >}}
+
+    ```
+    br restore full \
+        -s "local://$BACKUP_DIR" \
+        --pd $PD_ADDR \
+        --online
+    ```
+
+## Best practices
+
+- It is recommended that you mount a shared storage (for example, NFS) on the backup path specified by `-s`, to make it easier to collect and manage backup files.
+- It is recommended that you use storage hardware with high throughput, because the throughput of the storage hardware limits the backup and restoration speed.
+- It is recommended that you perform the backup operation during off-peak hours to minimize the impact on applications.
+
+For more recommended practices of using BR, refer to [BR Usage Scenarios](/br/backup-and-restore-use-cases.md).
+
+## Examples
+
+This section shows how to back up and restore the data of an existing cluster. You can estimate the performance of backup and restoration based on the machine performance, configuration, and data volume.
+
+### Data volume and machine configuration
+
+Suppose that the backup and restoration operations are performed on 10 tables in the TiKV cluster, each table with 5 million rows of data. The total data volume is 35 GB.
+
+```sql
+MySQL [sbtest]> show tables;
++------------------+
+| Tables_in_sbtest |
++------------------+
+| sbtest1          |
+| sbtest10         |
+| sbtest2          |
+| sbtest3          |
+| sbtest4          |
+| sbtest5          |
+| sbtest6          |
+| sbtest7          |
+| sbtest8          |
+| sbtest9          |
++------------------+
+
+MySQL [sbtest]> select count(*) from sbtest1;
++----------+
+| count(*) |
++----------+
+| 5000000  |
++----------+
+1 row in set (1.04 sec)
+```
+
+The table structure is as follows:
+
+```sql
+CREATE TABLE `sbtest1` (
+  `id` int(11) NOT NULL AUTO_INCREMENT,
+  `k` int(11) NOT NULL DEFAULT '0',
+  `c` char(120) NOT NULL DEFAULT '',
+  `pad` char(60) NOT NULL DEFAULT '',
+  PRIMARY KEY (`id`),
+  KEY `k_1` (`k`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=5138499
+```
+
+Suppose that 4 TiKV nodes are used, each with the following configuration:
+
+| CPU | Memory | Disk | Number of replicas |
+| :-------- | :------ | :---- | :------------------ |
+| 16 cores | 32 GB | SSD | 3 |
+
+### Backup
+
+Before the backup operation, check the following two items:
+
+- You have set `tikv_gc_life_time` to a larger value so that the backup operation is not interrupted because the snapshot data it needs is garbage-collected.
+- No DDL statement is being executed on the TiDB cluster.
+
+Then execute the following command to back up all the cluster data:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+bin/br backup full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file backup.log
+```
+
+```
+[INFO] [collector.go:165] ["Full backup summary: total backup ranges: 2, total success: 2, total failed: 0, total take(s): 0.00, total kv: 4, total size(Byte): 133, avg speed(Byte/s): 27293.78"] ["backup total regions"=2] ["backup checksum"=1.640969ms] ["backup fast checksum"=227.885µs]
+```
+
+### Restoration
+
+Before the restoration, make sure that the TiKV cluster to be restored is a new cluster.
+
+Then execute the following command to restore all the cluster data:
+
+{{< copyable "shell-regular" >}}
+
+```shell
+bin/br restore full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file restore.log
+```
+
+```
+[INFO] [collector.go:165] ["Full Restore summary: total restore tables: 1, total success: 1, total failed: 0, total take(s): 0.26, total kv: 20000, total size(MB): 10.98, avg speed(MB/s): 41.95"] ["restore files"=3] ["restore ranges"=2] ["split region"=0.562369381s] ["restore checksum"=36.072769ms]
+```
diff --git a/table-filter.md b/table-filter.md
new file mode 100644
index 0000000000000..80f44ac522ec8
--- /dev/null
+++ b/table-filter.md
@@ -0,0 +1,252 @@
+---
+title: Table Filter
+summary: Usage of table filter feature in TiDB tools.
+category: reference
+aliases: ['/docs/dev/tidb-lightning/tidb-lightning-table-filter/','/docs/dev/reference/tools/tidb-lightning/table-filter/','/tidb/dev/tidb-lightning-table-filter/']
+---
+
+# Table Filter
+
+The TiDB ecosystem tools operate on all the databases by default, but oftentimes only a subset is needed. For example, you might only want to work with schemas in the form of `foo*` and `bar*` and nothing else.
+
+Since TiDB 4.0, all TiDB ecosystem tools share a common filter syntax to define subsets. This document describes how to use the table filter feature.
+
+## Usage
+
+### CLI
+
+Table filters can be applied to the tools using multiple `-f` or `--filter` command line parameters. Each filter is in the form of `db.table`, where each part can be a wildcard (further explained in the [next section](#wildcards)). The following lists the example usage in each tool.
+ +* [BR](/br/backup-and-restore-tool.md): + + {{< copyable "shell-regular" >}} + + ```shell + ./br backup full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup' + # ^~~~~~~~~~~~~~~~~~~~~~~ + ./br restore full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup' + # ^~~~~~~~~~~~~~~~~~~~~~~ + ``` + +* [Dumpling](/export-or-backup-using-dumpling.md): + + {{< copyable "shell-regular" >}} + + ```shell + ./dumpling -f 'foo*.*' -f 'bar*.*' -P 3306 -o /tmp/data/ + # ^~~~~~~~~~~~~~~~~~~~~~~ + ``` + +* [Lightning](/tidb-lightning/tidb-lightning-overview.md): + + {{< copyable "shell-regular" >}} + + ```shell + ./tidb-lightning -f 'foo*.*' -f 'bar*.*' -d /tmp/data/ --backend tidb + # ^~~~~~~~~~~~~~~~~~~~~~~ + ``` + +### TOML configuration files + +Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool. + +* Lightning: + + ```toml + [mydumper] + filter = ['foo*.*', 'bar*.*'] + ``` + +* [TiCDC](/ticdc/ticdc-overview.md): + + ```toml + [filter] + rules = ['foo*.*', 'bar*.*'] + + [[sink.dispatchers]] + matcher = ['db1.*', 'db2.*', 'db3.*'] + dispatcher = 'ts' + ``` + +## Syntax + +### Plain table names + +Each table filter rule consists of a "schema pattern" and a "table pattern", separated by a dot (`.`). Tables whose fully-qualified name matches the rules are accepted. + +``` +db1.tbl1 +db2.tbl2 +db3.tbl3 +``` + +A plain name must only consist of valid [identifier characters](/schema-object-names.md), such as: + +* digits (`0` to `9`) +* letters (`a` to `z`, `A` to `Z`) +* `$` +* `_` +* non ASCII characters (U+0080 to U+10FFFF) + +All other ASCII characters are reserved. Some punctuations have special meanings, as described in the next section. + +### Wildcards + +Each part of the name can be a wildcard symbol described in [fnmatch(3)](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13): + +* `*` — matches zero or more characters +* `?` — matches one character +* `[a-z]` — matches one character between "a" and "z" inclusively +* `[!a-z]` — matches one character except "a" to "z". + +``` +db[0-9].tbl[0-9a-f][0-9a-f] +data.* +*.backup_* +``` + +"Character" here means a Unicode code point, such as: + +* U+00E9 (é) is 1 character. +* U+0065 U+0301 (é) are 2 characters. +* U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters. + +### File import + +To import a file as the filter rule, include an `@` at the beginning of the rule to specify the file name. The table filter parser treats each line of the imported file as additional filter rules. + +For example, if a file `config/filter.txt` has the following content: + +``` +employees.* +*.WorkOrder +``` + +the following two invocations are equivalent: + +```bash +./dumpling -f '@config/filter.txt' +./dumpling -f 'employees.*' -f '*.WorkOrder' +``` + +A filter file cannot further import another file. + +### Comments and blank lines + +Inside a filter file, leading and trailing white-spaces of every line are trimmed. Furthermore, blank lines (empty strings) are ignored. + +A leading `#` marks a comment and is ignored. `#` not at start of line is considered syntax error. + +``` +# this line is a comment +db.table # but this part is not comment and may cause error +``` + +### Exclusion + +An `!` at the beginning of the rule means the pattern after it is used to exclude tables from being processed. This effectively turns the filter into a block list. 
+ +``` +*.* +#^ note: must add the *.* to include all tables first +!*.Password +!employees.salaries +``` + +### Escape character + +To turn a special character into an identifier character, precede it with a backslash `\`. + +``` +db\.with\.dots.* +``` + +For simplicity and future compatibility, the following sequences are prohibited: + +* `\` at the end of the line after trimming whitespaces (use `[ ]` to match a literal whitespace at the end). +* `\` followed by any ASCII alphanumeric character (`[0-9a-zA-Z]`). In particular, C-like escape sequences like `\0`, `\r`, `\n` and `\t` currently are meaningless. + +### Quoted identifier + +Besides `\`, special characters can also be suppressed by quoting using `"` or `` ` ``. + +``` +"db.with.dots"."tbl\1" +`db.with.dots`.`tbl\2` +``` + +The quotation mark can be included within an identifier by doubling itself. + +``` +"foo""bar".`foo``bar` +# equivalent to: +foo\"bar.foo\`bar +``` + +Quoted identifiers cannot span multiple lines. + +It is invalid to partially quote an identifier: + +``` +"this is "invalid*.* +``` + +### Regular expression + +In case very complex rules are needed, each pattern can be written as a regular expression delimited with `/`: + +``` +/^db\d{2,}$/./^tbl\d{2,}$/ +``` + +These regular expressions use the [Go dialect](https://pkg.go.dev/regexp/syntax?tab=doc). The pattern is matched if the identifier contains a substring matching the regular expression. For instance, `/b/` matches `db01`. + +> **Note:** +> +> Every `/` in the regular expression must be escaped as `\/`, including inside `[…]`. You cannot place an unescaped `/` between `\Q…\E`. + +## Multiple rules + +When a table name matches none of the rules in the filter list, the default behavior is to ignore such unmatched tables. + +To build a block list, an explicit `*.*` must be used as the first rule, otherwise all tables will be excluded. + +```bash +# every table will be filtered out +./dumpling -f '!*.Password' + +# only the "Password" table is filtered out, the rest are included. +./dumpling -f '*.*' -f '!*.Password' +``` + +In a filter list, if a table name matches multiple patterns, the last match decides the outcome. For instance: + +``` +# rule 1 +employees.* +# rule 2 +!*.dep* +# rule 3 +*.departments +``` + +The filtered outcome is as follows: + +| Table name | Rule 1 | Rule 2 | Rule 3 | Outcome | +|-----------------------|--------|--------|--------|------------------| +| irrelevant.table | | | | Default (reject) | +| employees.employees | ✓ | | | Rule 1 (accept) | +| employees.dept_emp | ✓ | ✓ | | Rule 2 (reject) | +| employees.departments | ✓ | ✓ | ✓ | Rule 3 (accept) | +| else.departments | | ✓ | ✓ | Rule 3 (accept) | + +> **Note:** +> +> In TiDB tools, the system schemas are always excluded regardless of the table filter settings. The system schemas are: +> +> * `INFORMATION_SCHEMA` +> * `PERFORMANCE_SCHEMA` +> * `METRICS_SCHEMA` +> * `INSPECTION_SCHEMA` +> * `mysql` +> * `sys` diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 797f1e4e79efa..0872929a9e42b 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -134,6 +134,26 @@ no-schema = false # schema encoding. character-set = "auto" +<<<<<<< HEAD +======= +# Assumes the input data are "strict" to speed up processing. 
+# Implications of strict-format = true are: +# * in CSV, every value cannot contain literal new lines (U+000A and U+000D, or \r and \n) even +# when quoted, i.e. new lines are strictly used to separate rows. +# Strict format allows Lightning to quickly locate split positions of a large file for parallel +# processing. However, if the input data is not strict, it may split a valid data in half and +# corrupt the result. +# The default value is false for safety over speed. +strict-format = false + +# If strict-format is true, Lightning will split large CSV files into multiple chunks to process in +# parallel. max-region-size is the maximum size of each chunk after splitting. +# max-region-size = 268_435_456 # Byte (default = 256 MB) + +# Only import tables if these wildcard rules are matched. See the corresponding section for details. +filter = ['*.*'] + +>>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) # Configures how CSV files are parsed. [mydumper.csv] # Separator between fields, should be an ASCII character. @@ -205,10 +225,13 @@ analyze = true switch-mode = "5m" # Duration between which an import progress is printed to the log. log-progress = "5m" +<<<<<<< HEAD # Table filter options. See the corresponding section for details. # [black-white-list] # ... +======= +>>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) ``` ### TiKV Importer @@ -289,7 +312,13 @@ min-available-ratio = 0.05 | -V | Prints program version | | | -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` | | -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` | +<<<<<<< HEAD | --log-file *file* | Log file path | `lightning.log-file` | +======= +| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` | +| --backend *backend* | [Delivery backend](/tidb-lightning/tidb-lightning-tidb-backend.md) (`importer` or `tidb`) | `tikv-importer.backend` | +| --log-file *file* | Log file path (default = a temporary file in `/tmp`) | `lightning.log-file` | +>>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) | --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` | | --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` | | --pd-urls *host:port* | PD endpoint address | `tidb.pd-addr` | diff --git a/tidb-lightning/tidb-lightning-glossary.md b/tidb-lightning/tidb-lightning-glossary.md index 573c34f7a386e..de4900dcc561f 100644 --- a/tidb-lightning/tidb-lightning-glossary.md +++ b/tidb-lightning/tidb-lightning-glossary.md @@ -35,12 +35,6 @@ Back end is the destination where TiDB Lightning sends the parsed result. Also s See [TiDB Lightning TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) for details. -### Black-white list - -A configuration list that specifies which tables to be imported and which should be excluded. - -See [TiDB Lightning Table Filter](/tidb-lightning/tidb-lightning-table-filter.md) for details. - ## C @@ -101,6 +95,16 @@ Engines use TiKV Importer's `import-dir` as temporary storage, which are sometim See also [data engine](/tidb-lightning/tidb-lightning-glossary.md#data-engine) and [index engine](/tidb-lightning/tidb-lightning-glossary.md#index-engine). + + +## F + +### Filter + +A configuration list that specifies which tables to be imported or excluded. + +See [Table Filter](/table-filter.md) for details. 
+ ## I From f4dca8c8c8a48480a7d2705c216b3f1388c2bde9 Mon Sep 17 00:00:00 2001 From: Ran Date: Fri, 3 Jul 2020 16:46:56 +0800 Subject: [PATCH 2/3] delete br file; resolve conflicts; update toc --- TOC.md | 238 +------ br/backup-and-restore-tool.md | 668 ------------------ table-filter.md | 24 +- .../tidb-lightning-configuration.md | 31 +- 4 files changed, 4 insertions(+), 957 deletions(-) delete mode 100644 br/backup-and-restore-tool.md diff --git a/TOC.md b/TOC.md index 9157d14b028d3..f19e8cbbe8b0c 100644 --- a/TOC.md +++ b/TOC.md @@ -88,7 +88,6 @@ - [Identify Slow Queries](/identify-slow-queries.md) - [Identify Expensive Queries](/identify-expensive-queries.md) + Scale -<<<<<<< HEAD - [Scale using Ansible](/scale-tidb-using-ansible.md) - [Scale a TiDB Cluster](/horizontal-scale.md) + Upgrade @@ -97,130 +96,6 @@ - [TiDB Troubleshooting Map](/tidb-troubleshooting-map.md) - [Troubleshoot Cluster Setup](/troubleshoot-tidb-cluster.md) - [Troubleshoot TiDB Lightning](/troubleshoot-tidb-lightning.md) -======= - + [Use TiUP (Recommended)](/scale-tidb-using-tiup.md) - + [Use TiDB Ansible](/scale-tidb-using-ansible.md) - + [Use TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/scale-a-tidb-cluster) - + Backup and Restore - + [Use Mydumper and TiDB Lightning](/backup-and-restore-using-mydumper-lightning.md) - + [Use Dumpling for Export or Backup](/export-or-backup-using-dumpling.md) - + Use BR Tool - + [Use BR Tool](/br/backup-and-restore-tool.md) - + [BR Use Cases](/br/backup-and-restore-use-cases.md) - + [BR storages](/br/backup-and-restore-storages.md) - + [Configure Time Zone](/configure-time-zone.md) - + [Daily Checklist](/daily-check.md) - + [Manage TiCDC Cluster and Replication Tasks](/ticdc/manage-ticdc.md) - + [Maintain TiFlash](/tiflash/maintain-tiflash.md) - + [Maintain TiDB Using TiUP](/maintain-tidb-using-tiup.md) - + [Maintain TiDB Using Ansible](/maintain-tidb-using-ansible.md) -+ Monitor and Alert - + [Monitoring Framework Overview](/tidb-monitoring-framework.md) - + [Monitoring API](/tidb-monitoring-api.md) - + [Deploy Monitoring Services](/deploy-monitoring-services.md) - + [TiDB Cluster Alert Rules](/alert-rules.md) - + [TiFlash Alert Rules](/tiflash/tiflash-alert-rules.md) -+ Troubleshoot - + [Identify Slow Queries](/identify-slow-queries.md) - + [SQL Diagnostics](/system-tables/system-table-sql-diagnostics.md) - + [Identify Expensive Queries](/identify-expensive-queries.md) - + [Statement Summary Tables](/statement-summary-tables.md) - + [Troubleshoot Cluster Setup](/troubleshoot-tidb-cluster.md) - + [TiDB Troubleshooting Map](/tidb-troubleshooting-map.md) - + [Troubleshoot TiCDC](/ticdc/troubleshoot-ticdc.md) - + [Troubleshoot TiFlash](/tiflash/troubleshoot-tiflash.md) -+ Performance Tuning - + System Tuning - + [Operating System Tuning](/tune-operating-system.md) - + Software Tuning - + Configuration - + [Tune TiDB Memory](/configure-memory-usage.md) - + [Tune TiKV Threads](/tune-tikv-thread-performance.md) - + [Tune TiKV Memory](/tune-tikv-memory-performance.md) - + [TiKV Follower Read](/follower-read.md) - + [TiFlash Tuning](/tiflash/tune-tiflash-performance.md) - + [Coprocessor Cache](/coprocessor-cache.md) - + SQL Tuning - + [SQL Tuning with `EXPLAIN`](/query-execution-plan.md) - + SQL Optimization - + [SQL Optimization Process](/sql-optimization-concepts.md) - + Logic Optimization - + [Join Reorder](/join-reorder.md) - + Physical Optimization - + [Statistics](/statistics.md) - + Control Execution Plan - + [Optimizer Hints](/optimizer-hints.md) - + [SQL Plan 
Management](/sql-plan-management.md) - + [Access Tables Using `IndexMerge`](/index-merge.md) -+ Tutorials - + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md) - + [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md) - + Best Practices - + [Use TiDB](/tidb-best-practices.md) - + [Java Application Development](/best-practices/java-app-best-practices.md) - + [Use HAProxy](/best-practices/haproxy-best-practices.md) - + [Highly Concurrent Write](/best-practices/high-concurrency-best-practices.md) - + [Grafana Monitoring](/best-practices/grafana-monitor-best-practices.md) - + [PD Scheduling](/best-practices/pd-scheduling-best-practices.md) - + [TiKV Performance Tuning with Massive Regions](/best-practices/massive-regions-best-practices.md) - + [Use Placement Rules](/configure-placement-rules.md) - + [Use Load Base Split](/configure-load-base-split.md) - + [Use Store Limit](/configure-store-limit.md) -+ TiDB Ecosystem Tools - + [Overview](/ecosystem-tool-user-guide.md) - + [Use Cases](/ecosystem-tool-user-case.md) - + [Download](/download-ecosystem-tools.md) - + Backup & Restore (BR) - + [BR FAQ](/br/backup-and-restore-faq.md) - + [Use BR Tool](/br/backup-and-restore-tool.md) - + [BR Use Cases](/br/backup-and-restore-use-cases.md) - + TiDB Binlog - + [Overview](/tidb-binlog/tidb-binlog-overview.md) - + [Deploy](/tidb-binlog/deploy-tidb-binlog.md) - + [Maintain](/tidb-binlog/maintain-tidb-binlog-cluster.md) - + [Configure](/tidb-binlog/tidb-binlog-configuration-file.md) - + [Pump](/tidb-binlog/tidb-binlog-configuration-file.md#pump) - + [Drainer](/tidb-binlog/tidb-binlog-configuration-file.md#drainer) - + [Upgrade](/tidb-binlog/upgrade-tidb-binlog.md) - + [Monitor](/tidb-binlog/monitor-tidb-binlog-cluster.md) - + [Reparo](/tidb-binlog/tidb-binlog-reparo.md) - + [binlogctl](/tidb-binlog/binlog-control.md) - + [Binlog Slave Client](/tidb-binlog/binlog-slave-client.md) - + [TiDB Binlog Relay Log](/tidb-binlog/tidb-binlog-relay-log.md) - + [Bidirectional Replication Between TiDB Clusters](/tidb-binlog/bidirectional-replication-between-tidb-clusters.md) - + [Glossary](/tidb-binlog/tidb-binlog-glossary.md) - + Troubleshoot - + [Troubleshoot](/tidb-binlog/troubleshoot-tidb-binlog.md) - + [Handle Errors](/tidb-binlog/handle-tidb-binlog-errors.md) - + [FAQ](/tidb-binlog/tidb-binlog-faq.md) - + TiDB Lightning - + [Overview](/tidb-lightning/tidb-lightning-overview.md) - + [Tutorial](/get-started-with-tidb-lightning.md) - + [Deploy](/tidb-lightning/deploy-tidb-lightning.md) - + [Configure](/tidb-lightning/tidb-lightning-configuration.md) - + Key Features - + [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md) - + [Table Filter](/table-filter.md) - + [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md) - + [TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) - + [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md) - + [Monitor](/tidb-lightning/monitor-tidb-lightning.md) - + [Troubleshoot](/troubleshoot-tidb-lightning.md) - + [FAQ](/tidb-lightning/tidb-lightning-faq.md) - + [Glossary](/tidb-lightning/tidb-lightning-glossary.md) - + [TiCDC](/ticdc/ticdc-overview.md) - + sync-diff-inspector - + [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md) - + [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md) - + [Data Check in Sharding Scenarios](/sync-diff-inspector/shard-diff.md) - + [Data Check for TiDB Upstream/Downstream 
Clusters](/sync-diff-inspector/upstream-downstream-diff.md) - + [Loader](/loader-overview.md) - + [Mydumper](/mydumper-overview.md) - + [Syncer](/syncer-overview.md) - + TiSpark - + [Quick Start](/get-started-with-tispark.md) - + [User Guide](/tispark-overview.md) ->>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) + Reference + SQL - [MySQL Compatibility](/mysql-compatibility.md) @@ -268,7 +143,6 @@ - [`SET`](/data-type-string.md#set-type) - [JSON Type](/data-type-json.md) + Functions and Operators -<<<<<<< HEAD - [Function and Operator Reference](/functions-and-operators/functions-and-operators-overview.md) - [Type Conversion in Expression Evaluation](/functions-and-operators/type-conversion-in-expression-evaluation.md) - [Operators](/functions-and-operators/operators.md) @@ -449,6 +323,7 @@ - [Overview](/ecosystem-tool-user-guide.md) - [Use Cases](/ecosystem-tool-user-case.md) - [Download](/download-ecosystem-tools.md) + - [Table Filter](/table-filter.md) - [Mydumper](/mydumper-overview.md) - [Syncer](/syncer-overview.md) - [Loader](/loader-overview.md) @@ -458,7 +333,7 @@ - [Deployment](/tidb-lightning/deploy-tidb-lightning.md) - [Configuration](/tidb-lightning/tidb-lightning-configuration.md) - [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md) - - [Table Filter](/tidb-lightning/tidb-lightning-table-filter.md) + - [Table Filter](/table-filter.md) - [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md) - [TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) - [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md) @@ -476,115 +351,6 @@ - [TiKV Control](/tikv-control.md) - [TiDB Control](/tidb-control.md) - [TiDB in Kubernetes](https://pingcap.com/docs/tidb-in-kubernetes/stable/) -======= - + [Overview](/functions-and-operators/functions-and-operators-overview.md) - + [Type Conversion in Expression Evaluation](/functions-and-operators/type-conversion-in-expression-evaluation.md) - + [Operators](/functions-and-operators/operators.md) - + [Control Flow Functions](/functions-and-operators/control-flow-functions.md) - + [String Functions](/functions-and-operators/string-functions.md) - + [Numeric Functions and Operators](/functions-and-operators/numeric-functions-and-operators.md) - + [Date and Time Functions](/functions-and-operators/date-and-time-functions.md) - + [Bit Functions and Operators](/functions-and-operators/bit-functions-and-operators.md) - + [Cast Functions and Operators](/functions-and-operators/cast-functions-and-operators.md) - + [Encryption and Compression Functions](/functions-and-operators/encryption-and-compression-functions.md) - + [Information Functions](/functions-and-operators/information-functions.md) - + [JSON Functions](/functions-and-operators/json-functions.md) - + [Aggregate (GROUP BY) Functions](/functions-and-operators/aggregate-group-by-functions.md) - + [Window Functions](/functions-and-operators/window-functions.md) - + [Miscellaneous Functions](/functions-and-operators/miscellaneous-functions.md) - + [Precision Math](/functions-and-operators/precision-math.md) - + [List of Expressions for Pushdown](/functions-and-operators/expressions-pushed-down.md) - + [Constraints](/constraints.md) - + [Generated Columns](/generated-columns.md) - + [SQL Mode](/sql-mode.md) - + Transactions - + [Overview](/transaction-overview.md) - + [Isolation Levels](/transaction-isolation-levels.md) - + [Optimistic Transactions](/optimistic-transaction.md) - + [Pessimistic 
Transactions](/pessimistic-transaction.md) - + Garbage Collection (GC) - + [Overview](/garbage-collection-overview.md) - + [Configuration](/garbage-collection-configuration.md) - + [Views](/views.md) - + [Partitioning](/partitioned-table.md) - + [Character Set and Collation](/character-set-and-collation.md) - + System Tables - + [`mysql`](/system-tables/system-table-overview.md) - + [`information_schema`](/system-tables/system-table-information-schema.md) - + sql-diagnosis - + [`cluster_info`](/system-tables/system-table-cluster-info.md) - + [`cluster_hardware`](/system-tables/system-table-cluster-hardware.md) - + [`cluster_config`](/system-tables/system-table-cluster-config.md) - + [`cluster_load`](/system-tables/system-table-cluster-load.md) - + [`cluster_systeminfo`](/system-tables/system-table-cluster-systeminfo.md) - + [`cluster_log`](/system-tables/system-table-cluster-log.md) - + [`metrics_schema`](/system-tables/system-table-metrics-schema.md) - + [`metrics_tables`](/system-tables/system-table-metrics-tables.md) - + [`metrics_summary`](/system-tables/system-table-metrics-summary.md) - + [`inspection_result`](/system-tables/system-table-inspection-result.md) - + [`inspection_summary`](/system-tables/system-table-inspection-summary.md) - + UI - + TiDB Dashboard - + [Overview](/dashboard/dashboard-intro.md) - + Maintain - + [Deploy](/dashboard/dashboard-ops-deploy.md) - + [Reverse Proxy](/dashboard/dashboard-ops-reverse-proxy.md) - + [Secure](/dashboard/dashboard-ops-security.md) - + [Access](/dashboard/dashboard-access.md) - + [Overview Page](/dashboard/dashboard-overview.md) - + [Cluster Info Page](/dashboard/dashboard-cluster-info.md) - + [Key Visualizer Page](/dashboard/dashboard-key-visualizer.md) - + SQL Statements Analysis - + [SQL Statements Page](/dashboard/dashboard-statement-list.md) - + [SQL Details Page](/dashboard/dashboard-statement-details.md) - + [Slow Queries Page](/dashboard/dashboard-slow-query.md) - + Cluster Diagnostics - + [Access Cluster Diagnostics Page](/dashboard/dashboard-diagnostics-access.md) - + [View Diagnostics Report](/dashboard/dashboard-diagnostics-report.md) - + [Use Diagnostics](/dashboard/dashboard-diagnostics-usage.md) - + [Search Logs Page](/dashboard/dashboard-log-search.md) - + [Profile Instances Page](/dashboard/dashboard-profiling.md) - + [FAQ](/dashboard/dashboard-faq.md) - + CLI - + [tikv-ctl](/tikv-control.md) - + [pd-ctl](/pd-control.md) - + [tidb-ctl](/tidb-control.md) - + [pd-recover](/pd-recover.md) - + Command Line Flags - + [tidb-server](/command-line-flags-for-tidb-configuration.md) - + [tikv-server](/command-line-flags-for-tikv-configuration.md) - + [tiflash-server](/tiflash/tiflash-command-line-flags.md) - + [pd-server](/command-line-flags-for-pd-configuration.md) - + Configuration File Parameters - + [tidb-server](/tidb-configuration-file.md) - + [tikv-server](/tikv-configuration-file.md) - + [tiflash-server](/tiflash/tiflash-configuration.md) - + [pd-server](/pd-configuration-file.md) - + System Variables - + [MySQL System Variables](/system-variables.md) - + [TiDB Specific System Variables](/tidb-specific-system-variables.md) - + Storage Engines - + TiFlash - + [Overview](/tiflash/tiflash-overview.md) - + [Use TiFlash](/tiflash/use-tiflash.md) - + TiUP - + [Documentation Guide](/tiup/tiup-documentation-guide.md) - + [Overview](/tiup/tiup-overview.md) - + [Terminology and Concepts](/tiup/tiup-terminology-and-concepts.md) - + [Manage TiUP Components](/tiup/tiup-component-management.md) - + [FAQ](/tiup/tiup-faq.md) - + 
[Troubleshooting Guide](/tiup/tiup-troubleshooting-guide.md) - + TiUP Components - + [tiup-playground](/tiup/tiup-playground.md) - + [tiup-cluster](/tiup/tiup-cluster.md) - + [tiup-mirror](/tiup/tiup-mirror.md) - + [tiup-bench](/tiup/tiup-bench.md) - + [Telemetry](/telemetry.md) - + [Errors Codes](/error-codes.md) - + [TiCDC Overview](/ticdc/ticdc-overview.md) - + [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md) - + [Table Filter](/table-filter.md) ->>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) + FAQs - [TiDB FAQs](/faq/tidb-faq.md) - [TiDB Lightning FAQs](/tidb-lightning/tidb-lightning-faq.md) diff --git a/br/backup-and-restore-tool.md b/br/backup-and-restore-tool.md deleted file mode 100644 index de5403a45fdb8..0000000000000 --- a/br/backup-and-restore-tool.md +++ /dev/null @@ -1,668 +0,0 @@ ---- -title: Use BR to Back up and Restore Data -summary: Learn how to back up and restore data of the TiDB cluster using BR. -category: how-to -aliases: ['/docs/dev/br/backup-and-restore-tool/','/docs/dev/reference/tools/br/br/','/docs/dev/how-to/maintain/backup-and-restore/br/'] ---- - -# Use BR to Back up and Restore Data - -[Backup & Restore](http://github.com/pingcap/br) (BR) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with [`mydumper`/`loader`](/backup-and-restore-using-mydumper-lightning.md), BR is more suitable for scenarios of huge data volume. This document describes the BR command line, detailed use examples, best practices, restrictions, and introduces the implementation principles of BR. - -## Usage restrictions - -- BR only supports TiDB v3.1 and later versions. -- Currently, TiDB does not support backing up and restoring partitioned tables. -- Currently, you can perform restoration only on new clusters. -- It is recommended that you execute multiple backup operations serially. Otherwise, different backup operations might interfere with each other. - -## Recommended deployment configuration - -- It is recommended that you deploy BR on the PD node. -- It is recommended that you mount a high-performance SSD to BR nodes and all TiKV nodes. A 10-gigabit network card is recommended. Otherwise, bandwidth is likely to be the performance bottleneck during the backup and restore process. - -## Download Binary - -Refer to the [download page](/download-ecosystem-tools.md#br-backup-and-restore) for more information. - -## Implementation principles - -BR sends the backup or restoration commands to each TiKV node. After receiving these commands, TiKV performs the corresponding backup or restoration operations. Each TiKV node has a path in which the backup files generated in the backup operation are stored and from which the stored backup files are read during the restoration. - -### Backup principle - -When BR performs a backup operation, it first obtains the following information from PD: - -- The current TS (timestamp) as the time of the backup snapshot -- The TiKV node information of the current cluster - -According to these information, BR starts a TiDB instance internally to obtain the database or table information corresponding to the TS, and filters out the system databases (`information_schema`, `performance_schema`, `mysql`) at the same time. - -According to the backup sub-command, BR adopts the following two types of backup logic: - -- Full backup: BR traverses all the tables and constructs the KV range to be backed up according to each table. 
-- Single table backup: BR constructs the KV range to be backed up according a single table. - -Finally, BR collects the KV range to be backed up and sends the complete backup request to the TiKV node of the cluster. - -The structure of the request: - -``` -BackupRequest{ - ClusterId, // The cluster ID. - StartKey, // The starting key of the backup (backed up). - EndKey, // The ending key of the backup (not backed up). - StartVersion, // The version of the last backup snapshot, used for the incremental backup. - EndVersion, // The backup snapshot time. - StorageBackend, // The path where backup files are stored. - RateLimit, // Backup speed (MB/s). -} -``` - -After receiving the backup request, the TiKV node traverses all Region leaders on the node to find the Regions that overlap with the KV ranges in this request. The TiKV node backs up some or all of the data within the range, and generates the corresponding SST file. - -After finishing backing up the data of the corresponding Region, the TiKV node returns the metadata to BR. BR collects the metadata and stores it in the `backupmeta` file which is used for restoration. - -If `StartVersion` is not `0`, the backup is seen as an incremental backup. In addition to KVs, BR also collects DDLs between `[StartVersion, EndVersion)`. During data restoration, these DDLs are restored first. - -If checksum is enabled when you execute the backup command, BR calculates the checksum of each backed up table for data check. - -#### Types of backup files - -Two types of backup files are generated in the path where backup files are stored: - -- **The SST file**: stores the data that the TiKV node backed up. -- **The `backupmeta` file**: stores the metadata of this backup operation, including the number, the key range, the size, and the Hash (sha256) value of the backup files. - -#### The format of the SST file name - -The SST file is named in the format of `storeID_regionID_regionEpoch_keyHash_cf`, where - -- `storeID` is the TiKV node ID; -- `regionID` is the Region ID; -- `regionEpoch` is the version number of the Region; -- `keyHash` is the Hash (sha256) value of the startKey of a range, which ensures the uniqueness of a key; -- `cf` indicates the [Column Family](/tune-tikv-memory-performance.md) of RocksDB (`default` or `write` by default). - -### Restoration principle - -During the data restoration process, BR performs the following tasks in order: - -1. It parses the `backupmeta` file in the backup path, and then starts a TiDB instance internally to create the corresponding databases and tables based on the parsed information. - -2. It aggregates the parsed SST files according to the tables. - -3. It pre-splits Regions according to the key range of the SST file so that every Region corresponds to at least one SST file. - -4. It traverses each table to be restored and the SST file corresponding to each tables. - -5. It finds the Region corresponding to the SST file and sends a request to the corresponding TiKV node for downloading the file. Then it sends a request for loading the file after the file is successfully downloaded. - -After TiKV receives the request to load the SST file, TiKV uses the Raft mechanism to ensure the strong consistency of the SST data. After the downloaded SST file is loaded successfully, the file is deleted asynchronously. - -After the restoration operation is completed, BR performs a checksum calculation on the restored data to compare the stored data with the backed up data. 
- -![br-arch](/media/br-arch.png) - -## Command-line description - -A `br` command consists of sub-commands, options, and parameters. - -* Sub-command: the characters without `-` or `--`. -* Option: the characters that start with `-` or `--`. -* Parameter: the characters that immediately follow behind and are passed to the sub-command or the option. - -This is a complete `br` command: - -{{< copyable "shell-regular" >}} - -```shell -br backup full --pd "${PDIP}:2379" -s "local:///tmp/backup" -``` - -Explanations for the above command are as follows: - -* `backup`: the sub-command of `br`. -* `full`: the sub-command of `backup`. -* `-s` (or `--storage`): the option that specifies the path where the backup files are stored. -* `"local:///tmp/backup"`: the parameter of `-s`. `/tmp/backup` is the path in the local disk where the backed up files of each TiKV node are stored. -* `--pd`: the option that specifies the Placement Driver (PD) service address. -* `"${PDIP}:2379"`: the parameter of `--pd`. - -> **Note:** -> -> - When the `local` storage is used, the backup data are scattered in the local file system of each node. -> -> - It is **not recommended** to back up to a local disk in the production environment because you **have to** manually aggregate these data to complete the data restoration. For more information, see [Restore Cluster Data](#restore-cluster-data). -> -> - Aggregating these backup data might cause redundancy and bring troubles to operation and maintenance. Even worse, if restoring data without aggregating these data, you can receive a rather confusing error message `SST file not found`. -> -> - It is recommended to mount the NFS disk on each node, or back up to the `S3` object storage. - -### Sub-commands - -A `br` command consists of multiple layers of sub-commands. Currently, BR has the following three sub-commands: - -* `br backup`: used to back up the data of the TiDB cluster. -* `br restore`: used to restore the data of the TiDB cluster. - -Each of the above three sub-commands might still include the following three sub-commands to specify the scope of an operation: - -* `full`: used to back up or restore all the cluster data. -* `db`: used to back up or restore the specified database of the cluster. -* `table`: used to back up or restore a single table in the specified database of the cluster. - -### Common options - -* `--pd`: used for connection, specifying the PD server address. For example, `"${PDIP}:2379"`. -* `-h` (or `--help`): used to get help on all sub-commands. For example, `br backup --help`. -* `-V` (or `--version`): used to check the version of BR. -* `--ca`: specifies the path to the trusted CA certificate in the PEM format. -* `--cert`: specifies the path to the SSL certificate in the PEM format. -* `--key`: specifies the path to the SSL certificate key in the PEM format. -* `--status-addr`: specifies the listening address through which BR provides statistics to Prometheus. - -## Back up cluster data - -To back up the cluster data, use the `br backup` command. You can add the `full` or `table` sub-command to specify the scope of your backup operation: the whole cluster or a single table. - -If the backup time might exceed the [`tikv_gc_life_time`](/garbage-collection-configuration.md#tikv_gc_life_time) configuration which is `10m0s` by default (`10m0s` means 10 minutes), increase the value of this configuration. 
- -For example, set `tikv_gc_life_time` to `720h`: - -{{< copyable "sql" >}} - -```sql -mysql -h${TiDBIP} -P4000 -u${TIDB_USER} ${password_str} -Nse \ - "update mysql.tidb set variable_value='720h' where variable_name='tikv_gc_life_time'"; -``` - -### Back up all the cluster data - -To back up all the cluster data, execute the `br backup full` command. To get help on this command, execute `br backup full -h` or `br backup full --help`. - -**Usage example:** - -Back up all the cluster data to the `/tmp/backup` path of each TiKV node and write the `backupmeta` file to this path. - -> **Note:** -> -> + If the backup disk and the service disk are different, it has been tested that online backup reduces QPS of the read-only online service by about 15%-25% in case of full-speed backup. If you want to reduce the impact on QPS, use `--ratelimit` to limit the rate. -> -> + If the backup disk and the service disk are the same, the backup competes with the service for I/O resources. This might decrease the QPS of the read-only online service by more than half. Therefore, it is **highly not recommended** to back up the online service data to the TiKV data disk. - -{{< copyable "shell-regular" >}} - -```shell -br backup full \ - --pd "${PDIP}:2379" \ - --storage "local:///tmp/backup" \ - --ratelimit 120 \ - --log-file backupfull.log -``` - -Explanations for some options in the above command are as follows: - -* `--ratelimit`: specifies the maximum speed at which a backup operation is performed (MiB/s) on each TiKV node. -* `--log-file`: specifies writing the BR log to the `backupfull.log` file. - -A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. Then the BR also checks the backup data to ensure data safety. The progress bar is displayed as follows: - -```shell -br backup full \ - --pd "${PDIP}:2379" \ - --storage "local:///tmp/backup" \ - --ratelimit 120 \ - --log-file backupfull.log -Full Backup <---------/................................................> 17.12%. -``` - -### Back up a database - -To back up a database in the cluster, execute the `br backup db` command. To get help on this command, execute `br backup db -h` or `br backup db --help`. - -**Usage example:** - -Back up the data of the `test` database to the `/tmp/backup` path on each TiKV node and write the `backupmeta` file to this path. - -{{< copyable "shell-regular" >}} - -```shell -br backup db \ - --pd "${PDIP}:2379" \ - --db test \ - --storage "local:///tmp/backup" \ - --ratelimit 120 \ - --log-file backuptable.log -``` - -In the above command, `--db` specifies the name of the database to be backed up. For descriptions of other options, see [Back up all the cluster data](/br/backup-and-restore-tool.md#back-up-all-the-cluster-data). - -A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. Then the BR also checks the backup data to ensure data safety. - -### Back up a table - -To back up the data of a single table in the cluster, execute the `br backup table` command. To get help on this command, execute `br backup table -h` or `br backup table --help`. - -**Usage example:** - -Back up the data of the `test.usertable` table to the `/tmp/backup` path on each TiKV node and write the `backupmeta` file to this path. 
- -{{< copyable "shell-regular" >}} - -```shell -br backup table \ - --pd "${PDIP}:2379" \ - --db test \ - --table usertable \ - --storage "local:///tmp/backup" \ - --ratelimit 120 \ - --log-file backuptable.log -``` - -The `table` sub-command has two options: - -* `--db`: specifies the database name -* `--table`: specifies the table name. - -For descriptions of other options, see [Back up all cluster data](#back-up-all-the-cluster-data). - -A progress bar is displayed in the terminal during the backup operation. When the progress bar advances to 100%, the backup is complete. Then the BR also checks the backup data to ensure data safety. - -### Back up with table filter - -To back up multiple tables with more complex criteria, execute the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`. - -**Usage example:** - -The following command backs up the data of all tables in the form `db*.tbl*` to the `/tmp/backup` path on each TiKV node and writes the `backupmeta` file to this path. - -{{< copyable "shell-regular" >}} - -```shell -br backup full \ - --pd "${PDIP}:2379" \ - --filter 'db*.tbl*' \ - --storage "local:///tmp/backup" \ - --ratelimit 120 \ - --log-file backupfull.log -``` - -### Back up data to Amazon S3 backend - -If you back up the data to the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3. - -You can refer to the [AWS Official Document](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 `Bucket` in the specified `Region`. You can also refer to another [AWS Official Document](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html) to create a `Folder` in the `Bucket`. - -Pass `SecretKey` and `AccessKey` of the account that has privilege to access the S3 backend to the BR node. Here `SecretKey` and `AccessKey` are passed as environment variables. Then pass the privilege to the TiKV node through BR. - -{{< copyable "shell-regular" >}} - -```shell -export AWS_ACCESS_KEY_ID=${AccessKey} -export AWS_SECRET_ACCESS_KEY=${SecretKey} -``` - -When backing up using BR, explicitly specify the parameters `--s3.region` and `--send-credentials-to-tikv`. `--s3.region` indicates the region where S3 is located, and `--send-credentials-to-tikv` means passing the privilege to access S3 to the TiKV node. - -{{< copyable "shell-regular" >}} - -```shell -br backup full \ - --pd "${PDIP}:2379" \ - --storage "s3://${Bucket}/${Folder}" \ - --s3.region "${region}" \ - --send-credentials-to-tikv=true \ - --log-file backuptable.log -``` - -### Back up incremental data - -If you want to back up incrementally, you only need to specify the **last backup timestamp** `--lastbackupts`. - -The incremental backup has two limitations: - -- The incremental backup needs to be under a different path from the previous full backup. -- No GC (Garbage Collection) happens between the start time of the incremental backup and `lastbackupts`. - -To back up the incremental data between `(LAST_BACKUP_TS, current PD timestamp]`, execute the following command: - -{{< copyable "shell-regular" >}} - -```shell -br backup full\ - --pd ${PDIP}:2379 \ - -s local:///home/tidb/backupdata/incr \ - --lastbackupts ${LAST_BACKUP_TS} -``` - -To get the timestamp of the last backup, execute the `validate` command. 
For example: - -{{< copyable "shell-regular" >}} - -```shell -LAST_BACKUP_TS=`br validate decode --field="end-version" -s local:///home/tidb/backupdata` -``` - -In the above example, the incremental backup data includes the newly written data and the DDLs between `(LAST_BACKUP_TS, current PD timestamp]`. When restoring data, BR restores DDLs first and then restores the written data. - -### Back up Raw KV (experimental feature) - -> **Warning:** -> -> This feature is experimental and not thoroughly tested. It is highly **not recommended** to use this feature in the production environment. - -In some scenarios, TiKV might run independently of TiDB. Given that, BR also supports bypassing the TiDB layer and backing up data in TiKV. - -For example, you can execute the following command to back up all keys between `[0x31, 0x3130303030303030)` in the default CF to `$BACKUP_DIR`: - -{{< copyable "shell-regular" >}} - -```shell -br backup raw --pd $PD_ADDR \ - -s "local://$BACKUP_DIR" \ - --start 31 \ - --end 3130303030303030 \ - --format hex \ - --cf default -``` - -Here, the parameters of `--start` and `--end` are decoded using the method specified by `--format` before being sent to TiKV. Currently, the following methods are available: - -- "raw": The input string is directly encoded as a key in binary format. -- "hex": The default encoding method. The input string is treated as a hexadecimal number. -- "escape": First escape the input string, and then encode it into binary format. - -## Restore cluster data - -To restore the cluster data, use the `br restore` command. You can add the `full`, `db` or `table` sub-command to specify the scope of your restoration: the whole cluster, a database or a single table. - -> **Note:** -> -> If you use the local storage, you **must** copy all back up SST files to every TiKV node in the path specified by `--storage`. -> -> Even if each TiKV node eventually only need to read a part of the all SST files, they all need full access to the complete archive because: -> -> - Data are replicated into multiple peers. When ingesting SSTs, these files have to be present on *all* peers. This is unlike back up where reading from a single node is enough. -> - Where each peer is scattered to during restore is random. We don't know in advance which node will read which file. -> -> These can be avoided using shared storage, for example mounting an NFS on the local path, or using S3. With network storage, every node can automatically read every SST file, so these caveats no longer apply. - -### Restore all the backup data - -To restore all the backup data to the cluster, execute the `br restore full` command. To get help on this command, execute `br restore full -h` or `br restore full --help`. - -**Usage example:** - -Restore all the backup data in the `/tmp/backup` path to the cluster. - -{{< copyable "shell-regular" >}} - -```shell -br restore full \ - --pd "${PDIP}:2379" \ - --storage "local:///tmp/backup" \ - --ratelimit 128 \ - --log-file restorefull.log -``` - -Explanations for some options in the above command are as follows: - -* `--ratelimit`: specifies the maximum speed at which a restoration operation is performed (MiB/s) on each TiKV node. -* `--log-file`: specifies writing the BR log to the `restorefull.log` file. - -A progress bar is displayed in the terminal during the restoration. When the progress bar advances to 100%, the restoration is complete. Then the BR also checks the backup data to ensure data safety. 
- -```shell -br restore full \ - --pd "${PDIP}:2379" \ - --storage "local:///tmp/backup" \ - --log-file restorefull.log -Full Restore <---------/...............................................> 17.12%. -``` - -### Restore a database - -To restore a database to the cluster, execute the `br restore db` command. To get help on this command, execute `br restore db -h` or `br restore db --help`. - -**Usage example:** - -Restore a database backed up in the `/tmp/backup` path to the cluster. - -{{< copyable "shell-regular" >}} - -```shell -br restore db \ - --pd "${PDIP}:2379" \ - --db "test" \ - --storage "local:///tmp/backup" \ - --log-file restorefull.log -``` - -In the above command, `--db` specifies the name of the database to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data)). - -### Restore a table - -To restore a single table to the cluster, execute the `br restore table` command. To get help on this command, execute `br restore table -h` or `br restore table --help`. - -**Usage example:** - -Restore a table backed up in the `/tmp/backup` path to the cluster. - -{{< copyable "shell-regular" >}} - -```shell -br restore table \ - --pd "${PDIP}:2379" \ - --db "test" \ - --table "usertable" \ - --storage "local:///tmp/backup" \ - --log-file restorefull.log -``` - -In the above command, `--table` specifies the name of the table to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data) and [Restore a database](#restore-a-database). - -### Restore with table filter - -To restore multiple tables with more complex criteria, execute the `br restore full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`. - -**Usage example:** - -The following command restores a subset of tables backed up in the `/tmp/backup` path to the cluster. - -{{< copyable "shell-regular" >}} - -```shell -br restore full \ - --pd "${PDIP}:2379" \ - --filter 'db*.tbl*' \ - --storage "local:///tmp/backup" \ - --log-file restorefull.log -``` - -### Restore data from Amazon S3 backend - -If you restore data from the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3. - -Pass `SecretKey` and `AccessKey` of the account that has privilege to access the S3 backend to the BR node. Here `SecretKey` and `AccessKey` are passed as environment variables. Then pass the privilege to the TiKV node through BR. - -{{< copyable "shell-regular" >}} - -```shell -export AWS_ACCESS_KEY_ID=${AccessKey} -export AWS_SECRET_ACCESS_KEY=${SecretKey} -``` - -When restoring data using BR, explicitly specify the parameters `--s3.region` and `--send-credentials-to-tikv`. `--s3.region` indicates the region where S3 is located, and `--send-credentials-to-tikv` means passing the privilege to access S3 to the TiKV node. - -`Bucket` and `Folder` in the `--storage` parameter represent the S3 bucket and the folder where the data to be restored is located. - -{{< copyable "shell-regular" >}} - -```shell -br restore full \ - --pd "${PDIP}:2379" \ - --storage "s3://${Bucket}/${Folder}" \ - --s3.region "${region}" \ - --send-credentials-to-tikv=true \ - --log-file restorefull.log -``` - -In the above command, `--table` specifies the name of the table to be restored. For descriptions of other options, see [Restore a database](#restore-a-database). 
- -### Restore incremental data - -Restoring incremental data is similar to [restoring full data using BR](#restore-all-the-backup-data). Note that when restoring incremental data, make sure that all the data backed up before `last backup ts` has been restored to the target cluster. - -### Restore Raw KV (experimental feature) - -> **Warning:** -> -> This feature is in the experiment, without being thoroughly tested. It is highly **not recommended** to use this feature in the production environment. - -Similar to [backing up Raw KV](#back-up-raw-kv-experimental-feature), you can execute the following command to restore Raw KV: - -{{< copyable "shell-regular" >}} - -```shell -br restore raw --pd $PD_ADDR \ - -s "local://$BACKUP_DIR" \ - --start 31 \ - --end 3130303030303030 \ - --format hex \ - --cf default -``` - -In the above example, all the backed up keys in the range `[0x31, 0x3130303030303030)` are restored to the TiKV cluster. The coding methods of these keys are identical to that of [keys during the backup process](#back-up-raw-kv-experimental-feature) - -### Online restore (experimental feature) - -> **Warning:** -> -> This feature is in the experiment, without being thoroughly tested. It also relies on the unstable `Placement Rules` feature of PD. It is highly **not recommended** to use this feature in the production environment. - -During data restoration, writing too much data affects the performance of the online cluster. To avoid this effect as much as possible, BR supports [Placement rules](/configure-placement-rules.md) to isolate resources. In this case, downloading and importing SST are only performed on a few specified nodes (or "restore nodes" for short). To complete the online restore, take the following steps. - -1. Configure PD, and start Placement rules: - - {{< copyable "shell-regular" >}} - - ```shell - echo "config set enable-placement-rules true" | pd-ctl - ``` - -2. Edit the configuration file of the "restore node" in TiKV, and specify "restore" to the `server` configuration item: - - {{< copyable "" >}} - - ``` - [server] - labels = { exclusive = "restore" } - ``` - -3. Start TiKV of the "restore node" and restore the backed up files using BR. Compared with the offline restore, you only need to add the `--online` flag: - - {{< copyable "shell-regular" >}} - - ``` - br restore full \ - -s "local://$BACKUP_DIR" \ - --pd $PD_ADDR \ - --online - ``` - -## Best practices - -- It is recommended that you mount a shared storage (for example, NFS) on the backup path specified by `-s`, to make it easier to collect and manage backup files. -- It is recommended that you use a storage hardware with high throughput, because the throughput of a storage hardware limits the backup and restoration speed. -- It is recommended that you perform the backup operation during off-peak hours to minimize the impact on applications. - -For more recommended practices of using BR, refer to [BR Usage Scenarios](/br/backup-and-restore-use-cases.md). - -## Examples - -This section shows how to back up and restore the data of an existing cluster. You can estimate the performance of backup and restoration based on machine performance, configuration and data volume. - -### Data volume and machine configuration - -Suppose that the backup and restoration operations are performed on 10 tables in the TiKV cluster, each table with 5 million rows of data. The total data volume is 35 GB. 
- -```sql -MySQL [sbtest]> show tables; -+------------------+ -| Tables_in_sbtest | -+------------------+ -| sbtest1 | -| sbtest10 | -| sbtest2 | -| sbtest3 | -| sbtest4 | -| sbtest5 | -| sbtest6 | -| sbtest7 | -| sbtest8 | -| sbtest9 | -+------------------+ - -MySQL [sbtest]> select count(*) from sbtest1; -+----------+ -| count(*) | -+----------+ -| 5000000 | -+----------+ -1 row in set (1.04 sec) -``` - -The table structure is as follows: - -```sql -CREATE TABLE `sbtest1` ( - `id` int(11) NOT NULL AUTO_INCREMENT, - `k` int(11) NOT NULL DEFAULT '0', - `c` char(120) NOT NULL DEFAULT '', - `pad` char(60) NOT NULL DEFAULT '', - PRIMARY KEY (`id`), - KEY `k_1` (`k`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=5138499 -``` - -Suppose that 4 TiKV nodes is used, each with the following configuration: - -| CPU | Memory | Disk | Number of replicas | -| :-------- | :------ | :---- | :------------------ | -| 16 cores | 32 GB | SSD | 3 | - -### Backup - -Before the backup operation, check the following two items: - -- You have set `tikv_gc_life_time` set to a larger value so that the backup operation will not be interrupted because of data loss. -- No DDL statement is being executed on the TiDB cluster. - -Then execute the following command to back up all the cluster data: - -{{< copyable "shell-regular" >}} - -```shell -bin/br backup full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file backup.log -``` - -``` -[INFO] [collector.go:165] ["Full backup summary: total backup ranges: 2, total success: 2, total failed: 0, total take(s): 0.00, total kv: 4, total size(Byte): 133, avg speed(Byte/s): 27293.78"] ["backup total regions"=2] ["backup checksum"=1.640969ms] ["backup fast checksum"=227.885µs] -``` - -### Restoration - -Before the restoration, make sure that the TiKV cluster to be restored is a new cluster. - -Then execute the following command to restore all the cluster data: - -{{< copyable "shell-regular" >}} - -```shell -bin/br restore full -s local:///tmp/backup --pd "${PDIP}:2379" --log-file restore.log -``` - -``` -[INFO] [collector.go:165] ["Full Restore summary: total restore tables: 1, total success: 1, total failed: 0, total take(s): 0.26, total kv: 20000, total size(MB): 10.98, avg speed(MB/s): 41.95"] ["restore files"=3] ["restore ranges"=2] ["split region"=0.562369381s] ["restore checksum"=36.072769ms] -``` diff --git a/table-filter.md b/table-filter.md index 80f44ac522ec8..d2831c8c6dbfa 100644 --- a/table-filter.md +++ b/table-filter.md @@ -2,7 +2,7 @@ title: Table Filter summary: Usage of table filter feature in TiDB tools. category: reference -aliases: ['/docs/dev/tidb-lightning/tidb-lightning-table-filter/','/docs/dev/reference/tools/tidb-lightning/table-filter/','/tidb/dev/tidb-lightning-table-filter/'] +aliases: ['/docs/v3.0/tidb-lightning/tidb-lightning-table-filter/','/docs/v3.0/reference/tools/tidb-lightning/table-filter/','/tidb/v3.0/tidb-lightning-table-filter/'] --- # Table Filter @@ -17,17 +17,6 @@ Since TiDB 4.0, all TiDB ecosystem tools share a common filter syntax to define Table filters can be applied to the tools using multiple `-f` or `--filter` command line parameters. Each filter is in the form of `db.table`, where each part can be a wildcard (further explained in the [next section](#wildcards)). The following lists the example usage in each tool. 
-* [BR](/br/backup-and-restore-tool.md): - - {{< copyable "shell-regular" >}} - - ```shell - ./br backup full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup' - # ^~~~~~~~~~~~~~~~~~~~~~~ - ./br restore full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup' - # ^~~~~~~~~~~~~~~~~~~~~~~ - ``` - * [Dumpling](/export-or-backup-using-dumpling.md): {{< copyable "shell-regular" >}} @@ -57,17 +46,6 @@ Table filters in TOML files are specified as [array of strings](https://toml.io/ filter = ['foo*.*', 'bar*.*'] ``` -* [TiCDC](/ticdc/ticdc-overview.md): - - ```toml - [filter] - rules = ['foo*.*', 'bar*.*'] - - [[sink.dispatchers]] - matcher = ['db1.*', 'db2.*', 'db3.*'] - dispatcher = 'ts' - ``` - ## Syntax ### Plain table names diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 0872929a9e42b..11752fc466d59 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -134,26 +134,9 @@ no-schema = false # schema encoding. character-set = "auto" -<<<<<<< HEAD -======= -# Assumes the input data are "strict" to speed up processing. -# Implications of strict-format = true are: -# * in CSV, every value cannot contain literal new lines (U+000A and U+000D, or \r and \n) even -# when quoted, i.e. new lines are strictly used to separate rows. -# Strict format allows Lightning to quickly locate split positions of a large file for parallel -# processing. However, if the input data is not strict, it may split a valid data in half and -# corrupt the result. -# The default value is false for safety over speed. -strict-format = false - -# If strict-format is true, Lightning will split large CSV files into multiple chunks to process in -# parallel. max-region-size is the maximum size of each chunk after splitting. -# max-region-size = 268_435_456 # Byte (default = 256 MB) - # Only import tables if these wildcard rules are matched. See the corresponding section for details. filter = ['*.*'] ->>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) # Configures how CSV files are parsed. [mydumper.csv] # Separator between fields, should be an ASCII character. @@ -225,13 +208,6 @@ analyze = true switch-mode = "5m" # Duration between which an import progress is printed to the log. log-progress = "5m" -<<<<<<< HEAD - -# Table filter options. See the corresponding section for details. -# [black-white-list] -# ... -======= ->>>>>>> 68375a7... tidb-lightning,br: replaced black-white-list by table-filter (#3065) ``` ### TiKV Importer @@ -312,13 +288,8 @@ min-available-ratio = 0.05 | -V | Prints program version | | | -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` | | -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` | -<<<<<<< HEAD -| --log-file *file* | Log file path | `lightning.log-file` | -======= | -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` | -| --backend *backend* | [Delivery backend](/tidb-lightning/tidb-lightning-tidb-backend.md) (`importer` or `tidb`) | `tikv-importer.backend` | -| --log-file *file* | Log file path (default = a temporary file in `/tmp`) | `lightning.log-file` | ->>>>>>> 68375a7... 
tidb-lightning,br: replaced black-white-list by table-filter (#3065) +| --log-file *file* | Log file path | `lightning.log-file` | | --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` | | --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` | | --pd-urls *host:port* | PD endpoint address | `tidb.pd-addr` | From e602a99159fdd3d17e9e9389fc21da34d78f2e36 Mon Sep 17 00:00:00 2001 From: Ran Date: Mon, 6 Jul 2020 14:11:03 +0800 Subject: [PATCH 3/3] Update table-filter.md Co-authored-by: kennytm --- table-filter.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/table-filter.md b/table-filter.md index d2831c8c6dbfa..59b64ef888b47 100644 --- a/table-filter.md +++ b/table-filter.md @@ -9,7 +9,7 @@ aliases: ['/docs/v3.0/tidb-lightning/tidb-lightning-table-filter/','/docs/v3.0/r The TiDB ecosystem tools operate on all the databases by default, but oftentimes only a subset is needed. For example, you only want to work with the schemas in the form of `foo*` and `bar*` and nothing else. -Since TiDB 4.0, all TiDB ecosystem tools share a common filter syntax to define subsets. This document describes how to use the table filter feature. +Several TiDB ecosystem tools share a common filter syntax to define subsets. This document describes how to use the table filter feature. ## Usage