TiFlash results materialization #11121

Merged
merged 25 commits into from Dec 8, 2022
Commits
decfcd5
add translations
qiancai Nov 2, 2022
70586e5
wording updates
qiancai Nov 2, 2022
b565cd2
refine
qiancai Nov 2, 2022
6d063d4
Update tiflash-results-materialization.md
qiancai Nov 2, 2022
0fe40b7
Update tiflash-results-materialization.md
qiancai Nov 2, 2022
13b1d2a
align with Chinese changes
qiancai Dec 1, 2022
f3555ca
align with Chinese changes
qiancai Dec 1, 2022
fdb3d0f
refine the wording
qiancai Dec 1, 2022
fda0eca
Update tiflash-results-materialization.md
qiancai Dec 1, 2022
27c36ef
Update experimental-features.md
qiancai Dec 2, 2022
b10b881
align with Chinese
qiancai Dec 6, 2022
6275d37
Update system-variables.md
qiancai Dec 6, 2022
8e876cb
Update tiflash-results-materialization.md
qiancai Dec 6, 2022
8480880
Update tiflash-results-materialization.md
qiancai Dec 6, 2022
4a31afc
Merge remote-tracking branch 'upstream/master' into tiflash-results-m…
qiancai Dec 6, 2022
904a944
add tiflash-results-materialization.md for TiDB Cloud
qiancai Dec 6, 2022
6f78f5d
sync changes
qiancai Dec 7, 2022
3f4d716
refine the sentences
qiancai Dec 7, 2022
551b4fa
Update tiflash-results-materialization.md
qiancai Dec 7, 2022
7924bef
use custom content for tidb-cloud
qiancai Dec 7, 2022
8e503d2
Update tiflash-results-materialization.md
qiancai Dec 7, 2022
0ae3d7a
Update system-variables.md
qiancai Dec 7, 2022
a2b66a0
Apply suggestions from code review
qiancai Dec 8, 2022
184c67c
Update tiflash-results-materialization.md
qiancai Dec 8, 2022
bd0b9d7
remove a sentence that is under discussion
qiancai Dec 8, 2022
1 change: 1 addition & 0 deletions TOC-tidb-cloud.md
@@ -88,6 +88,7 @@
- [Read Data from TiFlash](/tiflash/use-tidb-to-read-tiflash.md)
- [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
- [Supported Push-down Calculations](/tiflash/tiflash-supported-pushdown-calculations.md)
- [TiFlash Query Result Materialization](/tiflash/tiflash-results-materialization.md)
- [Compatibility](/tiflash/tiflash-compatibility.md)
- [Scale a TiDB Cluster](/tidb-cloud/scale-tidb-cluster.md)
- [Pause or Resume a TiDB Cluster](/tidb-cloud/pause-or-resume-tidb-cluster.md)
1 change: 1 addition & 0 deletions TOC.md
@@ -874,6 +874,7 @@
- [Use TiSpark to Read TiFlash Replicas](/tiflash/use-tispark-to-read-tiflash.md)
- [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
- [Supported Push-down Calculations](/tiflash/tiflash-supported-pushdown-calculations.md)
- [TiFlash Query Result Materialization](/tiflash/tiflash-results-materialization.md)
- [Data Validation](/tiflash/tiflash-data-validation.md)
- [Compatibility](/tiflash/tiflash-compatibility.md)
- [Telemetry](/telemetry.md)
1 change: 1 addition & 0 deletions experimental-features.md
@@ -36,6 +36,7 @@ Elastic scheduling feature. It enables the TiDB cluster to dynamically scale out
+ [Range INTERVAL partitioning](/partitioned-table.md#range-interval-partitioning) (Introduced in v6.3.0)
+ [Add index acceleration](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) (Introduced in v6.3.0)
+ [Restore a cluster to a specific point in time using the `FLASHBACK CLUSTER TO TIMESTAMP` syntax](/sql-statements/sql-statement-flashback-to-timestamp.md) (Introduced in v6.4.0)
+ [TiFlash Query Result Materialization](/tiflash/tiflash-results-materialization.md) (Introduced in v6.5.0)

## Storage

7 changes: 5 additions & 2 deletions system-variables.md
@@ -1823,13 +1823,16 @@ Query OK, 0 rows affected (0.09 sec)

> **Warning:**
>
> The feature controlled by this variable is not fully functional in the current TiDB version. Do not change the default value.
> The feature controlled by this variable is experimental in the current TiDB version. It is not recommended that you use it for production environments.

- Scope: SESSION | GLOBAL
- Persists to cluster: Yes
- Type: Boolean
- Default value: `OFF`
- This variable controls whether read requests in SQL write statements can be pushed down to TiFlash.
- This variable controls whether read operations in SQL statements containing `INSERT`, `DELETE`, and `UPDATE` can be pushed down to TiFlash. For example:

- `SELECT` queries in `INSERT INTO SELECT` statements (typical usage scenario: [TiFlash query result materialization](/tiflash/tiflash-results-materialization.md))
- `WHERE` condition filtering in `UPDATE` and `DELETE` statements

### tidb_enable_top_sql <span class="version-mark">New in v5.4.0</span>

81 changes: 81 additions & 0 deletions tiflash/tiflash-results-materialization.md
@@ -0,0 +1,81 @@
---
title: TiFlash Query Result Materialization
summary: Learn how to save the query results of TiFlash in a transaction.
---

# TiFlash Query Result Materialization

> **Warning:**
>
> This is an experimental feature, which might be changed or removed without prior notice. The syntax and implementation might change before GA. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) in GitHub.

This document describes how to save TiFlash query results to a specified TiDB table in an `INSERT INTO SELECT` transaction.

Starting from v6.5.0, TiDB supports saving TiFlash query results in tables, that is, TiFlash query result materialization. During the execution of the `INSERT INTO SELECT` statement, if TiDB pushes down the `SELECT` subquery to TiFlash, the TiFlash query result can be saved to a TiDB table specified in `INSERT INTO`. For TiDB versions earlier than v6.5.0, the TiFlash query results are read-only, so if you want to save TiFlash query results, you have to obtain them from the application level, and then save them in a separate transaction or process.

> **Note:**
>
> - By default ([`tidb_allow_mpp = ON`](/system-variables.md#tidb_allow_mpp-new-in-v50)), the TiDB optimizer intelligently chooses to push down queries to TiKV or TiFlash based on the query cost. To enforce that the queries are pushed down to TiFlash, you can set the system variable [`tidb_enforce_mpp`](/system-variables.md#tidb_enforce_mpp-new-in-v51) to `ON`.
> - During the experimental phase, this feature is disabled by default. To enable this feature, you can set the system variable [`tidb_enable_tiflash_read_for_write_stmt`](/system-variables.md#tidb_enable_tiflash_read_for_write_stmt-new-in-v630) to `ON`.
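
As a minimal sketch of the settings described in the note above, the following statements enable the feature for the current session only. Choosing session scope is an assumption made for illustration, and forcing pushdown with `tidb_enforce_mpp` is optional.

```sql
-- Allow the read part of INSERT, UPDATE, and DELETE statements
-- to be pushed down to TiFlash (disabled by default during the experimental phase).
SET SESSION tidb_enable_tiflash_read_for_write_stmt = ON;

-- Optional: enforce that queries are pushed down to TiFlash instead of
-- letting the optimizer choose by cost. Relies on tidb_allow_mpp being ON (the default).
SET SESSION tidb_enforce_mpp = ON;

-- Confirm the current values.
SHOW VARIABLES LIKE 'tidb_enable_tiflash_read_for_write_stmt';
SHOW VARIABLES LIKE 'tidb_enforce_mpp';
```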

The syntax of `INSERT INTO SELECT` is as follows.

```sql
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
[PARTITION (partition_name [, partition_name] ...)]
[(col_name [, col_name] ...)]
SELECT ...
[ON DUPLICATE KEY UPDATE assignment_list]

value:
{expr | DEFAULT}

assignment:
col_name = value

assignment_list:
assignment [, assignment] ...
```

For example, you can save the query result from table `t1` in the `SELECT` clause to table `t2` with the following `INSERT INTO SELECT` statement:

```sql
INSERT INTO t2 (name, country)
SELECT app_name, country FROM t1;
```
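
To check whether the `SELECT` part of such a statement can be served by TiFlash, one possible approach is to make sure the source table has a TiFlash replica and then inspect the execution plan. The sketch below assumes that `t1` and `t2` live in a schema named `test`, which is a hypothetical name used only for illustration.

```sql
-- Create one TiFlash replica for the source table (skip if it already exists).
ALTER TABLE t1 SET TIFLASH REPLICA 1;

-- Check replication status; AVAILABLE = 1 means the replica can serve reads.
-- 'test' is an assumed schema name.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

-- Show the plan of the statement without executing it.
EXPLAIN INSERT INTO t2 (name, country)
SELECT app_name, country FROM t1;
```

If the task column of the plan mentions `tiflash` for the operators that read `t1`, the read is planned to run on TiFlash. Otherwise, check the two system variables described in the note above.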

## Typical and recommended usage scenarios

- Efficient BI solutions

For many BI applications, analysis queries are very heavy. For example, when many users access and refresh a report at the same time, the BI application has to handle a large number of concurrent query requests. To handle this situation effectively, you can use `INSERT INTO SELECT` to save the query results of the report in a TiDB table. End users can then query the result table directly when the report is refreshed, which avoids repeated computation and analysis. Similarly, saving historical analysis results further reduces the computation volume for analysis over long time ranges. For example, if report `A` analyzes daily sales profit, you can save its results to a result table `T` using `INSERT INTO SELECT`. Then, when you need a report `B` that analyzes the sales profit of the past month, you can use the daily results in table `T` directly. This approach not only greatly reduces the computation volume but also improves query response speed and reduces system load.

- Serving online applications with TiFlash

The number of concurrent requests supported by TiFlash depends on the volume of data and the complexity of the queries, but it typically does not exceed 100 QPS. You can use `INSERT INTO SELECT` to save TiFlash query results, and then use the result tables to serve highly concurrent online requests. The data in the result tables can be refreshed in the background at a low frequency (for example, every 0.5 seconds), which keeps the load well below the TiFlash concurrency limit while still maintaining a high level of data freshness.
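
The BI scenario above (report `A` feeding report `B`) can be sketched as follows. The table names `sales` and `daily_sales_profit` and their columns are hypothetical and stand in for the report's source data and the result table `T`. The sketch also assumes that `sales` already has a TiFlash replica.

```sql
-- Hypothetical source table: sales(order_date DATE, profit DECIMAL(18, 2)).
-- Hypothetical result table (table T in the text), one row per day.
CREATE TABLE IF NOT EXISTS daily_sales_profit (
    report_date  DATE PRIMARY KEY,
    total_profit DECIMAL(20, 2)
);

-- Report A: materialize yesterday's aggregate. The SELECT part is the
-- heavy analytical query that can be pushed down to TiFlash.
INSERT INTO daily_sales_profit (report_date, total_profit)
SELECT order_date, SUM(profit)
FROM sales
WHERE order_date = CURRENT_DATE - INTERVAL 1 DAY
GROUP BY order_date;

-- Report B: the monthly figure reads the small result table
-- instead of rescanning the raw sales data.
SELECT SUM(total_profit) AS profit_last_30_days
FROM daily_sales_profit
WHERE report_date >= CURRENT_DATE - INTERVAL 30 DAY;
```

The same pattern applies to the online-serving scenario: a background job runs the `INSERT INTO SELECT` periodically, and online requests read only the result table.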

## Execution process

* During the execution of the `INSERT INTO SELECT` statement, TiFlash first returns the query results of the `SELECT` clause to a TiDB server node in the cluster, and then that TiDB server writes the results to the target table (which can have a TiFlash replica).
* The execution of the `INSERT INTO SELECT` statement guarantees ACID properties.

## Restrictions

<CustomContent platform="tidb">

* The TiDB memory limit on the `INSERT INTO SELECT` statement can be adjusted using the system variable [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query). Starting from v6.5.0, it is not recommended to use [txn-total-size-limit](/tidb-configuration-file.md#txn-total-size-limit) to control transaction memory size.

For more information, see [TiDB memory control](/configure-memory-usage.md).

</CustomContent>

<CustomContent platform="tidb-cloud">

* The TiDB memory limit on the `INSERT INTO SELECT` statement can be adjusted using the system variable [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query). Starting from v6.5.0, it is not recommended to use [txn-total-size-limit](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#txn-total-size-limit) to control transaction memory size.

For more information, see [TiDB memory control](https://docs.pingcap.com/tidb/stable/configure-memory-usage).

</CustomContent>

* TiDB has no hard limit on the concurrency of the `INSERT INTO SELECT` statement, but it is recommended to consider the following practices:

* When a "write transaction" is large, for example close to 1 GiB, it is recommended to limit the concurrency to 10 or fewer.
* When a "write transaction" is small, for example less than 100 MiB, it is recommended to limit the concurrency to 30 or fewer.
* Determine the concurrency based on testing results and specific circumstances.
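
As a rough illustration of the memory restriction above, the per-query memory quota can be adjusted before running a large `INSERT INTO SELECT`. The 2 GiB value and the session scope are assumptions chosen only for this example. Pick a value based on your own testing.

```sql
-- Set the memory quota for queries in this session to 2 GiB.
-- The value is in bytes: 2 * 1024 * 1024 * 1024 = 2147483648 (illustrative only).
SET SESSION tidb_mem_quota_query = 2147483648;

-- Verify the value that is now in effect.
SELECT @@tidb_mem_quota_query;
```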