From 4979bfa7aa4eee3400ed8c2343db52ffb8450a5c Mon Sep 17 00:00:00 2001 From: JoyinQin <56883733+Joyinqin@users.noreply.github.com> Date: Wed, 22 Jul 2020 23:03:16 +0800 Subject: [PATCH 01/15] partitioning: update partition pruning --- partition-pruning.md | 260 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 260 insertions(+) create mode 100644 partition-pruning.md diff --git a/partition-pruning.md b/partition-pruning.md new file mode 100644 index 0000000000000..32f85856410dc --- /dev/null +++ b/partition-pruning.md @@ -0,0 +1,260 @@ +# Partition pruning + +Partition pruning is a performance optimization only when the target table is a partitioned table. Partition pruning analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. + +## Application scenarios for partition pruning + +TiDB supports two types of partitioned tables: Range partitioned tables and Hash partitioned tables, for which partition pruning applies different application scenarios. + +### Application of partition pruning in Hash partitioned tables + +#### Scenarios for partition pruning in Hash partitioned tables + +Only the query condition of equal comparison can support the partition pruning of the Hash partitioned tables. 
+ +{{< copyable "sql" >}} + +```sql +create table t (x int) partition by hash(x) partitions 4; +explain select * from t where x = 1; +``` + +```sql + ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +| TableReader_8 | 10.00 | root | | data:Selection_7 | +| └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) | +| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +``` + +In this SQL statement, it can be known from the condition `x = 1` that all results fall in one partition. The value `1` can be confirmed to be in the partition `p1` after passing through the Hash partition. Therefore, only the partition `p1` needs to be scanned, and there is no need to access the `p2`, `p3`, and `p4` partitions that will not have matching results. From the execution plan, there is only one `TableFullScan` operator, and the `p1` partition is specified in the `access object`, confirming that `partition pruning` takes effect. + +#### Scenarios that cannot use partition pruning in Hash partitioned tables + +##### Scenario one + +Partition pruning cannot take effect when it is not certain that the results of some query conditions, such as `in`, `between`, `> < >= <=`, etc., are only in one partition. 
+ +{{< copyable "sql" >}} + +```sql +create table t (x int) partition by hash(x) partitions 4; +explain select * from t where x > 2; +``` + +```sql ++------------------------------+----------+-----------+-----------------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------------+----------+-----------+-----------------------+--------------------------------+ +| Union_10 | 13333.33 | root | | | +| ├─TableReader_13 | 3333.33 | root | | data:Selection_12 | +| │ └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | +| │ └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | +| ├─TableReader_16 | 3333.33 | root | | data:Selection_15 | +| │ └─Selection_15 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | +| │ └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +| ├─TableReader_19 | 3333.33 | root | | data:Selection_18 | +| │ └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | +| │ └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | +| └─TableReader_22 | 3333.33 | root | | data:Selection_21 | +| └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | +| └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo | ++------------------------------+----------+-----------+-----------------------+--------------------------------+ +``` + +In this case, partition pruning does not take effect because the condition `x> 2` cannot determine the corresponding Hash partition. + +##### Scenario two + +Since the optimization rule of partition pruning is done during the query plan phase, it does not apply for those cases that filter conditions are unknown until the execution phase. 
+ +{{< copyable "sql" >}} + +```sql +create table t1 (x int) partition by hash(x) partitions 4; create table t2 (x int); +explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x < 2); +``` + +```sql ++--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ +| id | estRows | task | access object | operator info | ++--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ +| Projection_13 | 9990.00 | root | | test.t2.x | +| └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] | +| ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | +| │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | +| │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +| └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | +| └─MaxOneRow_20 | 1.00 | root | | | +| └─Union_21 | 2.00 | root | | | +| ├─TableReader_24 | 2.00 | root | | data:Selection_23 | +| │ └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | +| │ └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | +| └─TableReader_27 | 2.00 | root | | data:Selection_26 | +| └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | +| └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | ++--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ +``` + +Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query.
Theoretically, the filter condition of `t1.x = val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning does not take effect. + +### Application of partition pruning in Range partitioned tables + +#### Scenarios for partition pruning in Range partitioned tables + +##### Scenario one + +Partition pruning supports the query condition of equal comparison. + +{{< copyable "sql" >}} + +```sql +create table t (x int) partition by range (x) ( + partition p0 values less than (5), + partition p1 values less than (10), + partition p2 values less than (15) + ); +explain select * from t where x = 3; +``` + +```sql ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +| TableReader_8 | 10.00 | root | | data:Selection_7 | +| └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) | +| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+-----------------------+--------------------------------+ +``` + +In this SQL statement, partition pruning supports the query condition `in` of equal comparison. 
+ +{{< copyable "sql" >}} + +```sql +create table t (x int) partition by range (x) ( + partition p0 values less than (5), + partition p1 values less than (10), + partition p2 values less than (15) + ); +explain select * from t where x in(1,13); +``` + +```sql ++-----------------------------+----------+-----------+-----------------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-----------------------------+----------+-----------+-----------------------+--------------------------------+ +| Union_8 | 40.00 | root | | | +| ├─TableReader_11 | 20.00 | root | | data:Selection_10 | +| │ └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | +| │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | +| └─TableReader_14 | 20.00 | root | | data:Selection_13 | +| └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | +| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | ++-----------------------------+----------+-----------+-----------------------+--------------------------------+ +``` + +In this SQL statement, the condition `x in(1,13)` means that all results fall in a few partitions. All records with `x = 1` are in partition `p0`, and all records with `x = 13` are in partition `p2`, so only the `p0` and `p2` partitions need to be accessed. + +##### Scenario two + +Partition pruning supports the query condition of interval comparison, such as `between`, `> < = >= <=`.
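The interval matching behind these Range examples can be sketched in a few lines. This is a minimal, hypothetical illustration, not TiDB's implementation: it assumes each partition is described only by its sorted `VALUES LESS THAN` bound and that every condition has already been turned into a closed interval `[lo, hi]`. The `prune_range` name is made up for this sketch.

```python
import bisect


def prune_range(upper_bounds, lo, hi):
    """Return the indexes of RANGE partitions that may hold values in [lo, hi].

    upper_bounds[i] is the VALUES LESS THAN bound of partition i, sorted in
    ascending order, so partition i holds upper_bounds[i-1] <= v < upper_bounds[i]
    (partition 0 has no lower bound).
    """
    if lo > hi:
        return []
    # First partition whose bound is strictly greater than lo,
    # that is, the partition that contains lo.
    first = bisect.bisect_right(upper_bounds, lo)
    if first == len(upper_bounds):
        return []  # lo is beyond the last bound, so no partition can match
    # Partition that contains hi, clamped to the last partition when hi is
    # beyond every bound.
    last = min(bisect.bisect_right(upper_bounds, hi), len(upper_bounds) - 1)
    return list(range(first, last + 1))


bounds = [5, 10, 15]  # p0 < 5, p1 < 10, p2 < 15, as in the examples
print(prune_range(bounds, 3, 3))   # x = 3 -> [0]
print(prune_range(bounds, 7, 14))  # x between 7 and 14 -> [1, 2]
# x in (1, 13) is the union of two single-value intervals -> [0, 2]
print(sorted(set(prune_range(bounds, 1, 1)) | set(prune_range(bounds, 13, 13))))
```

Treating `in` as a union of single-value intervals is one simple way to reuse the same lookup for equality, `in`, and `between` conditions.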
+ +{{< copyable "sql" >}} + +```sql +create table t (x int) partition by range (x) ( + partition p0 values less than (5), + partition p1 values less than (10), + partition p2 values less than (15) + ); +explain select * from t where x between 7 and 14; +``` + +```sql ++-----------------------------+----------+-----------+-----------------------+-----------------------------------+ +| id | estRows | task | access object | operator info | ++-----------------------------+----------+-----------+-----------------------+-----------------------------------+ +| Union_8 | 500.00 | root | | | +| ├─TableReader_11 | 250.00 | root | | data:Selection_10 | +| │ └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | +| │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +| └─TableReader_14 | 250.00 | root | | data:Selection_13 | +| └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | +| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | ++-----------------------------+----------+-----------+-----------------------+-----------------------------------+ +``` + +##### Scenario three + +For Range partition, for partition pruning to take effect, the partition expression must be in the simple form of `fn(col)` and the `fn` function must be monotonous. In addition, the query condition must be one of `>, <, =, >=`, and `<=`. + +If the `fn` function is monotonous, for any `x` and `y`, if `x > y`, then `fn(x) > fn(y)`. Then this `fn` function can be called strictly monotonous. For any x and y, if `x > y`, then `fn(x) >= fn(y)`. In this case, `fn` could also be called "monotonous". Theoretically, all monotonous functions, strictly or not, are supported by partition pruning. 
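Why monotonicity is enough can be shown with a small sketch. It is hypothetical and assumes a non-decreasing `fn`: then `col > c` implies `fn(col) >= fn(c)`, so any partition whose upper bound is at or below `fn(c)` can be skipped. The `to_days` helper below is an assumed stand-in built on Python's `date.toordinal()`, which counts from year 1 rather than year 0 like TiDB's `TO_DAYS()`; the constant offset does not affect monotonicity. The sketch uses dates rather than datetimes.

```python
import bisect
from datetime import date


def to_days(d):
    # Hypothetical stand-in for TO_DAYS(): date.toordinal() counts days from
    # 0001-01-01 instead of year 0, but the two differ by a constant, so it is
    # monotonic in the same way.
    return d.toordinal()


def prune_greater_than(upper_bounds, fn, c):
    """Partitions that may hold rows with col > c under PARTITION BY RANGE (fn(col)).

    fn is assumed to be monotonic (non-decreasing): col > c implies
    fn(col) >= fn(c), so only partitions whose range reaches fn(c) can match.
    """
    lower = fn(c)
    # Partition i holds fn(col) < upper_bounds[i]. Skip every partition whose
    # bound is <= lower, because none of its rows can satisfy the filter.
    first = bisect.bisect_right(upper_bounds, lower)
    return list(range(first, len(upper_bounds)))


bounds = [to_days(date(2020, 4, 1)), to_days(date(2020, 5, 1))]  # p0, p1
print(prune_greater_than(bounds, to_days, date(2020, 4, 18)))  # [1], only p1
```

Using `>=` on the transformed side is what makes non-strict monotonic functions safe: several values of `col` may share one `fn(col)`, so the bound must stay inclusive.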
+ +In fact, partition pruning in TiDB only support those monotonous functions: + +```sql +unix_timestamp +to_days +``` + +For example, partition pruning takes effect when the partition expression is in the form of `fn(col)` where `fn` is the monotonic function `to_days`: + +{{< copyable "sql" >}} + +```sql +create table t (id datetime) partition by range (to_days(id)) ( + partition p0 values less than (to_days('2020-04-01')), + partition p1 values less than (to_days('2020-05-01'))); +explain select * from t where id > '2020-04-18'; +``` + +```sql ++-------------------------+----------+-----------+-----------------------+-------------------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+-----------------------+-------------------------------------------+ +| TableReader_8 | 3333.33 | root | | data:Selection_7 | +| └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) | +| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+-----------------------+-------------------------------------------+ +``` + +#### Scenarios that cannot use partition pruning in Range partitioned tables + +Because partition pruning is performed while the query plan is generated, it does not apply to cases where the filter conditions are unknown until the execution phase.
+ +{{< copyable "sql" >}} + +```sql +create table t1 (x int) partition by range (x) ( + partition p0 values less than (5), + partition p1 values less than (10)); +create table t2 (x int); +explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2); +``` + +```sql ++--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ +| Projection_13 | 9990.00 | root | | test.t2.x | +| └─Apply_15 | 9990.00 | root | | CARTESIAN inner join, other cond:lt(test.t2.x, test.t1.x) | +| ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | +| │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | +| │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +| └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | +| └─MaxOneRow_20 | 1.00 | root | | | +| └─Union_21 | 2.00 | root | | | +| ├─TableReader_24 | 2.00 | root | | data:Selection_23 | +| │ └─Selection_23 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | +| │ └─TableFullScan_22 | 2.50 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | +| └─TableReader_27 | 2.00 | root | | data:Selection_26 | +| └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | +| └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | +14 rows in set (0.00 sec) +``` + +Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. 
Theoretically, the filter condition of `t1.x> val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning does not take effect. \ No newline at end of file From 679c44347487615ca252619ef7febfa29610f28f Mon Sep 17 00:00:00 2001 From: JoyinQin <56883733+Joyinqin@users.noreply.github.com> Date: Wed, 22 Jul 2020 23:04:01 +0800 Subject: [PATCH 02/15] Update partition-pruning.md --- partition-pruning.md | 1 + 1 file changed, 1 insertion(+) diff --git a/partition-pruning.md b/partition-pruning.md index 32f85856410dc..bcd576f92bc03 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -254,6 +254,7 @@ explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x | └─TableReader_27 | 2.00 | root | | data:Selection_26 | | └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | | └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | ++--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ 14 rows in set (0.00 sec) ``` From b7317d676b2005118d7550f24fae5045ea63ac03 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 10:51:23 +0800 Subject: [PATCH 03/15] Update partition-pruning.md --- partition-pruning.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/partition-pruning.md b/partition-pruning.md index bcd576f92bc03..1ea8c3bda6760 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,6 +1,6 @@ # Partition pruning -Partition pruning is a performance optimization only when the target table is a partitioned table. 
Partition pruning analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. +Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. ## Application scenarios for partition pruning @@ -36,7 +36,7 @@ In this SQL statement, it can be known from the condition `x = 1` that all resul ##### Scenario one -Partition pruning cannot take effect when it is not certain that the results of some query conditions, such as `in`, `between`, `> < >= <=`, etc., are only in one partition. +Partition pruning cannot take effect when it is not certain that the results of some query conditions, such as `in`, `between`, `> < >= <=`, are only in one partition. {{< copyable "sql" >}} @@ -99,7 +99,7 @@ explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ ``` -Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x = val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning does not take effect. +Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x = val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning will not take effect. 
### Application of partition pruning in Range partitioned tables @@ -130,7 +130,7 @@ explain select * from t where x = 3; +-------------------------+----------+-----------+-----------------------+--------------------------------+ ``` -In this SQL statement, partition pruning supports the query condition `in` of equal comparison. +In this case, partition pruning supports the query condition `in` of equal comparison. {{< copyable "sql" >}} @@ -161,7 +161,7 @@ In this SQL statement, it can be known from the condition `x in(1,13)` that all ##### Scenario two -Partition pruning supports the query condition of interval comparison, such as `between`, `> < = >= <=`. +Partition pruning supports the query condition of interval comparison,such as `between`, `> < = >= <=`. {{< copyable "sql" >}} @@ -192,9 +192,7 @@ explain select * from t where x between 7 and 14; For Range partition, for partition pruning to take effect, the partition expression must be in the simple form of `fn(col)` and the `fn` function must be monotonous. In addition, the query condition must be one of `>, <, =, >=`, and `<=`. -If the `fn` function is monotonous, for any `x` and `y`, if `x > y`, then `fn(x) > fn(y)`. Then this `fn` function can be called strictly monotonous. For any x and y, if `x > y`, then `fn(x) >= fn(y)`. In this case, `fn` could also be called "monotonous". Theoretically, all monotonous functions, strictly or not, are supported by partition pruning. - -In fact, partition pruning in TiDB only support those monotonous functions: +If the `fn` function is monotonous, for any `x` and `y`, if `x > y`, then `fn(x) > fn(y)`. Then this `fn` function can be called strictly monotonous. For any `x` and `y`, if `x > y`, then `fn(x) >= fn(y)`. In this case, `fn` could also be called "monotonous". Theoretically, all monotonous functions, strictly or not, are supported by partition pruning. 
In fact, partition pruning in TiDB only support those monotonous functions: ```sql unix_timestamp @@ -258,4 +256,4 @@ explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x 14 rows in set (0.00 sec) ``` -Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x> val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning does not take effect. \ No newline at end of file +Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x> val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning will not take effect. From 7c45fad9930c6898cc58f7e75f7a422aa7cfdf6b Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 16:34:31 +0800 Subject: [PATCH 04/15] Apply suggestions from code review Co-authored-by: Null not nil <67764674+nullnotnil@users.noreply.github.com> --- partition-pruning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index 1ea8c3bda6760..4e557fa322699 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,4 +1,4 @@ -# Partition pruning +# Partition Pruning Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. 
From 55a0f38c20c5a58612e5abd15f47527712f470f4 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 17:21:59 +0800 Subject: [PATCH 05/15] Update partition-pruning.md --- partition-pruning.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index 4e557fa322699..d3ee221458d11 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,6 +1,10 @@ +--- +title: Partition Pruning +--- + # Partition Pruning -Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. +Partition pruning is a performance optimization only when target tables are partitioned tables. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. ## Application scenarios for partition pruning From a2ce6ee2ca235ef90222b3fa787801a4af7e0c3b Mon Sep 17 00:00:00 2001 From: JoyinQin <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 17:46:28 +0800 Subject: [PATCH 06/15] Revert "Update partition-pruning.md" This reverts commit 55a0f38c20c5a58612e5abd15f47527712f470f4. --- partition-pruning.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/partition-pruning.md b/partition-pruning.md index d3ee221458d11..4e557fa322699 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,10 +1,6 @@ ---- -title: Partition Pruning ---- - # Partition Pruning -Partition pruning is a performance optimization only when target tables are partitioned tables. 
It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. +Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. ## Application scenarios for partition pruning From eb2b074e7151f836a14c3227d7f9708e65b6a96d Mon Sep 17 00:00:00 2001 From: JoyinQin <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 17:48:29 +0800 Subject: [PATCH 07/15] Revert "Apply suggestions from code review" This reverts commit 7c45fad9930c6898cc58f7e75f7a422aa7cfdf6b. --- partition-pruning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index 4e557fa322699..1ea8c3bda6760 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,4 +1,4 @@ -# Partition Pruning +# Partition pruning Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. 
From ddd1bd829ad475e3de33068a3011d13cf62f7d25 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 17:49:30 +0800 Subject: [PATCH 08/15] Apply suggestions from code review Co-authored-by: Null not nil <67764674+nullnotnil@users.noreply.github.com> --- partition-pruning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index 1ea8c3bda6760..d5059ddcd6ec1 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,6 +1,6 @@ # Partition pruning -Partition pruning is a performance optimization only when the target table is a partitioned table. It analyzes the filter conditions in the query statements, selects only the partitions that may meet the conditions, and does not scan the partitions that do not match, thereby significantly reducing the amount of calculated data. +Partition pruning is a performance optimization that applies to partitioned tables. It analyzes the filter conditions in query statements and eliminates (_prunes_) partitions from consideration when they do not contain any data that will be required. By eliminating the non-required partitions, TiDB reduces the amount of data that needs to be accessed, which can significantly improve query execution times.
## Application scenarios for partition pruning From 159be5ed9f96325391b6f95f9fcc878f6cebd479 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Thu, 23 Jul 2020 17:51:29 +0800 Subject: [PATCH 09/15] Update partition-pruning.md --- partition-pruning.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index d5059ddcd6ec1..a21962a83ada7 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,4 +1,8 @@ -# Partition pruning +--- +title: Partition Pruning +--- + +# Partition Pruning Partition pruning is a performance optimization that applies to partitioned tables. It analyzes the filter conditions in query statements and eliminates (_prunes_) partitions from consideration when they do not contain any data that will be required. By eliminating the non-required partitions, TiDB reduces the amount of data that needs to be accessed, which can significantly improve query execution times. From 54a65b15d2af8661b5ae97f648fe593e4959db1e Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Fri, 24 Jul 2020 16:39:22 +0800 Subject: [PATCH 10/15] Update partition-pruning.md --- partition-pruning.md | 1 + 1 file changed, 1 insertion(+) diff --git a/partition-pruning.md b/partition-pruning.md index a21962a83ada7..eeac8a9c4c942 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,5 +1,6 @@ --- title: Partition Pruning +summary:Introduce application scenarios of TiDB partition pruning.
--- # Partition Pruning From 5d597d28207231b3e3eee06bb32e8cbb7a28b8b7 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Fri, 24 Jul 2020 16:54:28 +0800 Subject: [PATCH 11/15] Update partition-pruning.md --- partition-pruning.md | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index eeac8a9c4c942..a2b0446bcbc64 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -7,6 +7,40 @@ summary:Introduce application scenarios of TiDB partition pruning. Partition pruning is a performance optimization that applies to partitioned tables. It analyzes the filter conditions in query statements and eliminates (_prunes_) partitions from consideration when they do not contain any data that will be required. By eliminating the non-required partitions, TiDB reduces the amount of data that needs to be accessed, which can significantly improve query execution times. +The following is an example: + +{{< copyable "sql" >}} + +```sql + +CREATE TABLE t1 ( + id INT NOT NULL PRIMARY KEY, + pad VARCHAR(100) +) +PARTITION BY RANGE COLUMNS(id) ( + PARTITION p0 VALUES LESS THAN (100), + PARTITION p1 VALUES LESS THAN (200), + PARTITION p2 VALUES LESS THAN (MAXVALUE) +); + +INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3'); +EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120; + +...
+ +mysql> EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120; ++----------------------------+---------+-----------+------------------------+------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------+---------+-----------+------------------------+------------------------------------------------+ +| PartitionUnion_8 | 80.00 | root | | | +| ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 | +| │ └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo | +| └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 | +| └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo | ++----------------------------+---------+-----------+------------------------+------------------------------------------------+ +5 rows in set (0.00 sec) +``` + ## Application scenarios for partition pruning TiDB supports two types of partitioned tables: Range partitioned tables and Hash partitioned tables, for which partition pruning applies different application scenarios. 
@@ -25,7 +59,6 @@ explain select * from t where x = 1; ``` ```sql - +-------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+--------------------------------+ From 8831461a84e44257342cf95f6cc6af76905fa4b5 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Mon, 27 Jul 2020 09:10:18 +0800 Subject: [PATCH 12/15] Update partition-pruning.md --- partition-pruning.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/partition-pruning.md b/partition-pruning.md index a2b0446bcbc64..11234cf97dfe6 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -25,10 +25,9 @@ PARTITION BY RANGE COLUMNS(id) ( INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3'); EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120; +``` -... - -mysql> EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120; +```sql +----------------------------+---------+-----------+------------------------+------------------------------------------------+ | id | estRows | task | access object | operator info | +----------------------------+---------+-----------+------------------------+------------------------------------------------+ From 2fd66cbbefcdefbb950efeb6fd73f109870dbfb5 Mon Sep 17 00:00:00 2001 From: JoyinQ <56883733+Joyinqin@users.noreply.github.com> Date: Mon, 27 Jul 2020 20:37:55 +0800 Subject: [PATCH 13/15] Update partition-pruning.md --- partition-pruning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/partition-pruning.md b/partition-pruning.md index 11234cf97dfe6..3497dbf92b042 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,6 +1,6 @@ --- title: Partition Pruning -summary:Introduce application scenarios of TiDB partition pruning. +summary: Introduce application scenarios of TiDB partition pruning. 
--- # Partition Pruning From 70dba44e9f80bef38ee304553518a672e4cce74d Mon Sep 17 00:00:00 2001 From: lilin90 Date: Mon, 27 Jul 2020 20:50:53 +0800 Subject: [PATCH 14/15] Optimize description and format till line 110 --- TOC.md | 1 + partition-pruning.md | 27 +++++++++++++++------------ partitioned-table.md | 2 +- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/TOC.md b/TOC.md index 2c0c701c491bb..7e305c13c39ec 100644 --- a/TOC.md +++ b/TOC.md @@ -111,6 +111,7 @@ + [Column Pruning](/column-pruning.md) + [Decorrelation of Correlated Subquery](/correlated-subquery-optimization.md) + [Predicates Push Down](/predicates-push-down.md) + + [Partition Pruning](/partition-pruning.md) + [TopN and Limit Push Down](/topn-limit-push-down.md) + [Join Reorder](/join-reorder.md) + Physical Optimization diff --git a/partition-pruning.md b/partition-pruning.md index 11234cf97dfe6..8372075b37114 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -1,6 +1,6 @@ --- title: Partition Pruning -summary:Introduce application scenarios of TiDB partition pruning. +summary: Learn about the usage scenarios of TiDB partition pruning. --- # Partition Pruning @@ -12,7 +12,6 @@ The following is an example: {{< copyable "sql" >}} ```sql - CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY, pad VARCHAR(100) @@ -40,15 +39,17 @@ EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120; 5 rows in set (0.00 sec) ``` -## Application scenarios for partition pruning +## Usage scenarios of partition pruning + +The usage scenarios of partition pruning are different for the two types of partitioned tables: Range partitioned tables and Hash partitioned tables. -TiDB supports two types of partitioned tables: Range partitioned tables and Hash partitioned tables, for which partition pruning applies different application scenarios. 
+### Use partition pruning in Hash partitioned tables -### Application of partition pruning in Hash partitioned tables +This section describes the applicable and inapplicable usage scenarios of partition pruning in Hash partitioned tables. -#### Scenarios for partition pruning in Hash partitioned tables +#### Applicable scenarios in Hash partitioned tables -Only the query condition of equal comparison can support the partition pruning of the Hash partitioned tables. +Only the query condition of equal comparison supports partition pruning in Hash partitioned tables. {{< copyable "sql" >}} @@ -67,13 +68,15 @@ explain select * from t where x = 1; +-------------------------+----------+-----------+-----------------------+--------------------------------+ ``` -In this SQL statement, it can be known from the condition `x = 1` that all results fall in one partition. The value `1` can be confirmed to be in the partition `p1` after passing through the Hash partition. Therefore, only the partition `p1` needs to be scanned, and there is no need to access the `p2`, `p3`, and `p4` partitions that will not have matching results. From the execution plan, there is only one `TableFullScan` operator, and the `p1` partition is specified in the `access object`, confirming that `partition pruning` takes effect. +In the SQL statement above, the condition `x = 1` shows that all results fall in one partition: hashing the value `1` maps it to the `p1` partition. Therefore, only the `p1` partition needs to be scanned, and there is no need to access the `p0`, `p2`, and `p3` partitions, which cannot contain matching rows. The execution plan shows only one `TableFullScan` operator, with the `p1` partition specified in `access object`, which confirms that partition pruning takes effect.
+#### Inapplicable scenarios in Hash partitioned tables -#### Scenarios that cannot use partition pruning in Hash partitioned tables +This section describes two inapplicable usage scenarios of partition pruning in Hash partitioned tables. ##### Scenario one -Partition pruning cannot take effect when it is not certain that the results of some query conditions, such as `in`, `between`, `> < >= <=`, are only in one partition. +If a query condition (such as `in`, `between`, `>`, `<`, `>=`, or `<=`) cannot guarantee that the query result falls in only one partition, partition pruning cannot be used. For example: {{< copyable "sql" >}} @@ -102,7 +105,7 @@ explain select * from t where x > 2; +------------------------------+----------+-----------+-----------------------+--------------------------------+ ``` -In this case, partition pruning does not take effect because the condition `x> 2` cannot determine the corresponding Hash partition. +In this case, partition pruning is inapplicable because the corresponding Hash partition cannot be confirmed by the `x > 2` condition. ##### Scenario two @@ -138,7 +141,7 @@ explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x = val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning will not take effect.
-### Application of partition pruning in Range partitioned tables +### Use partition pruning in Range partitioned tables #### Scenarios for partition pruning in Range partitioned tables diff --git a/partitioned-table.md b/partitioned-table.md index 4f5e78e8b124c..0ab749102c2d6 100644 --- a/partitioned-table.md +++ b/partitioned-table.md @@ -483,7 +483,7 @@ ERROR 8200 (HY000): Unsupported optimize partition ## Partition pruning -Partition pruning is an optimization which is based on a very simple idea - do not scan the partitions that do not match. +[Partition pruning](/partition-pruning.md) is an optimization which is based on a very simple idea - do not scan the partitions that do not match. Assume that you create a partitioned table `t1`: From 7b2f377bbacadcc1bcaea6764a1a2a17eaa2c4ee Mon Sep 17 00:00:00 2001 From: lilin90 Date: Tue, 28 Jul 2020 13:42:43 +0800 Subject: [PATCH 15/15] Update description and format --- partition-pruning.md | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/partition-pruning.md b/partition-pruning.md index 8372075b37114..7206b39752f72 100644 --- a/partition-pruning.md +++ b/partition-pruning.md @@ -47,9 +47,9 @@ The usage scenarios of partition pruning are different for the two types of part This section describes the applicable and inapplicable usage scenarios of partition pruning in Hash partitioned tables. -#### Applicable scenarios in Hash partitioned tables +#### Applicable scenario in Hash partitioned tables -Only the query condition of equal comparison supports partition pruning in Hash partitioned tables. +Partition pruning applies only to the query condition of equality comparison in Hash partitioned tables. 
{{< copyable "sql" >}} @@ -109,7 +109,7 @@ In this case, partition pruning is inapplicable because the corresponding Hash p ##### Scenario two -Since the optimization rule of partition pruning is done during the query plan phase, it does not apply for those cases that filter conditions are unknown until the execution phase. +Because the rule optimization of partition pruning is performed during the generation phase of the query plan, partition pruning is not suitable for scenarios where the filter conditions can be obtained only during the execution phase. For example: {{< copyable "sql" >}} @@ -139,15 +139,19 @@ explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ ``` -Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x = val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning will not take effect. +Each time this query reads a row from `t2`, it queries the `t1` partitioned table. Theoretically, the filter condition `t1.x = val` is known at that point, but partition pruning takes effect only in the generation phase of the query plan, not in the execution phase, so this condition cannot be used to prune the partitions of `t1`. ### Use partition pruning in Range partitioned tables -#### Scenarios for partition pruning in Range partitioned tables +This section describes the applicable and inapplicable usage scenarios of partition pruning in Range partitioned tables. -##### Scenario one +#### Applicable scenarios in Range partitioned tables -Partition pruning supports the query condition of equal comparison. +This section describes three applicable usage scenarios of partition pruning in Range partitioned tables.
+ +##### Scenario one + +Partition pruning applies to the query condition of equality comparison in Range partitioned tables. For example: {{< copyable "sql" >}} @@ -170,7 +174,7 @@ explain select * from t where x = 3; +-------------------------+----------+-----------+-----------------------+--------------------------------+ ``` -In this case, partition pruning supports the query condition `in` of equal comparison. +Partition pruning also applies to the equality comparison that uses the `in` query condition. For example: {{< copyable "sql" >}} @@ -197,11 +201,11 @@ explain select * from t where x in(1,13); +-----------------------------+----------+-----------+-----------------------+--------------------------------+ ``` -In this SQL statement, it can be known from the condition `x in(1,13)` that all results fall in a few partitions. After analysis, it is found that all records of `x = 1` are in partition `p0`, and all records of `x = 13` are in partition `p2`, so only `p0` and `p2` partitions need to be accessed. +In the SQL statement above, the `x in(1,13)` condition shows that all results fall in a few partitions. All records of `x = 1` are in the `p0` partition, and all records of `x = 13` are in the `p2` partition, so only the `p0` and `p2` partitions need to be accessed. ##### Scenario two -Partition pruning supports the query condition of interval comparison,such as `between`, `> < = >= <=`. +Partition pruning applies to the query condition of interval comparison, such as `between`, `>`, `<`, `=`, `>=`, or `<=`. For example: {{< copyable "sql" >}} @@ -230,16 +234,16 @@ explain select * from t where x between 7 and 14; ##### Scenario three -For Range partition, for partition pruning to take effect, the partition expression must be in the simple form of `fn(col)` and the `fn` function must be monotonous. In addition, the query condition must be one of `>, <, =, >=`, and `<=`.
+Partition pruning applies to the scenario where the partition expression is in the simple form of `fn(col)`, the query condition is one of `>`, `<`, `=`, `>=`, or `<=`, and the `fn` function is monotonic. -If the `fn` function is monotonous, for any `x` and `y`, if `x > y`, then `fn(x) > fn(y)`. Then this `fn` function can be called strictly monotonous. For any `x` and `y`, if `x > y`, then `fn(x) >= fn(y)`. In this case, `fn` could also be called "monotonous". Theoretically, all monotonous functions, strictly or not, are supported by partition pruning. In fact, partition pruning in TiDB only support those monotonous functions: +If, for any `x` and `y`, `x > y` implies `fn(x) > fn(y)`, then `fn` is strictly monotonic. If, for any `x` and `y`, `x > y` implies `fn(x) >= fn(y)`, then `fn` is monotonic. Theoretically, partition pruning supports all monotonic functions, strict or not. Currently, TiDB supports only the following monotonic functions: -```sql +``` unix_timestamp to_days ``` -For example, partition pruning takes effect when the partition expression is in the form of `fn(col)` where the `fn` is monotonous function `to_days`: +For example, partition pruning takes effect when the partition expression is in the form of `fn(col)`, where `fn` is the monotonic function `to_days`: {{< copyable "sql" >}} @@ -260,9 +264,9 @@ explain select * from t where id > '2020-04-18'; +-------------------------+----------+-----------+-----------------------+-------------------------------------------+ ``` -#### Scenarios that cannot use partition pruning in Range partitioned tables +#### Inapplicable scenario in Range partitioned tables -Since the rule optimization of partition pruning is done during the query plan phase, it does not apply for those cases that filter conditions are unknown until the execution phase.
+Because the rule optimization of partition pruning is performed during the generation phase of the query plan, partition pruning is not suitable for scenarios where the filter conditions can be obtained only during the execution phase. For example: {{< copyable "sql" >}} @@ -296,4 +300,4 @@ explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x 14 rows in set (0.00 sec) ``` -Each time this query reads a row from `t2`, it will go to the partitioned table `t1` for query. Theoretically, the filter condition of `t1.x> val` will be met at this time, but in fact, the partition pruning only affects the query plan in the generation phase, not the execution phase, so the pruning will not take effect. +Each time this query reads a row from `t2`, it queries the `t1` partitioned table. Theoretically, the `t1.x > val` filter condition is known at that point, but partition pruning takes effect only in the generation phase of the query plan, not in the execution phase, so this condition cannot be used to prune the partitions of `t1`.
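The pruning rules described in this patch series can be modeled with a short Python sketch. It is a simplified illustration, not TiDB's implementation, and it rests on several assumptions. The partition bounds are hypothetical. The helper names `partition_of`, `prune_range`, `prune_hash`, and the `UNKNOWN` marker are invented for illustration. Python's `date.toordinal` stands in for `to_days` because both are monotonic. Hash partitions are assumed to be chosen as `value mod num_partitions` for non-negative values, which is consistent with the `x = 1` to `p1` example.

```python
import bisect
import math
from datetime import date
from typing import Any, Callable, List, Optional

# Hypothetical stand-in for VALUES LESS THAN (MAXVALUE) on the last partition.
MAXVALUE = math.inf
# Hypothetical marker for a bound known only at execution time,
# for example a value that comes from a correlated subquery.
UNKNOWN = object()


def partition_of(bounds: List[float], v: float) -> int:
    """Index i of the Range partition with bounds[i-1] <= v < bounds[i].

    Returns len(bounds) when v is not below any bound (no partition holds it).
    """
    return bisect.bisect_right(bounds, v)


def prune_range(bounds: List[float], lo: Any = None, hi: Any = None,
                fn: Callable[[Any], float] = lambda v: v) -> List[int]:
    """Partitions that may hold rows with lo <= col <= hi (None = unbounded).

    fn is the partition expression and must be monotonic (non-decreasing), so
    lo <= col <= hi implies fn(lo) <= fn(col) <= fn(hi). A strict bound such as
    col > lo can be passed as inclusive: that may keep one extra partition but
    never drops a partition that holds matching rows.
    """
    if lo is UNKNOWN or hi is UNKNOWN:
        # Pruning runs while the plan is generated, so a value that exists only
        # during execution cannot narrow the scan: keep every partition.
        return list(range(len(bounds)))
    last = len(bounds) - 1
    first = 0 if lo is None else partition_of(bounds, fn(lo))
    end = last if hi is None else min(partition_of(bounds, fn(hi)), last)
    return list(range(first, end + 1))


def prune_hash(num_partitions: int, eq_value: Optional[int] = None) -> List[int]:
    """Hash pruning, assuming partition = value mod num_partitions (value >= 0).

    eq_value is the constant of an equality condition; None means any other
    condition (in, between, >, <, ...), which cannot be mapped to one partition.
    """
    if eq_value is None:
        return list(range(num_partitions))
    return [eq_value % num_partitions]


# Hypothetical bounds: p0 LESS THAN (10), p1 LESS THAN (20), p2 LESS THAN (MAXVALUE).
bounds = [10, 20, MAXVALUE]
print(prune_range(bounds, 7, 14))        # x BETWEEN 7 AND 14 -> [0, 1]
print(prune_range(bounds, lo=UNKNOWN))   # x > (correlated subquery) -> [0, 1, 2]

# date.toordinal is used as a stand-in for the monotonic to_days function.
days = [date(2020, 4, 1).toordinal(), date(2020, 5, 1).toordinal(), MAXVALUE]
print(prune_range(days, lo=date(2020, 4, 18), fn=date.toordinal))  # -> [1, 2]
print(prune_hash(4, 1))                  # x = 1 with PARTITIONS 4 -> [1]
```

The range helper relies on the property described for Scenario three. Because `fn` never decreases, the interval `[fn(lo), fn(hi)]` covers every value the partition expression can take for matching rows. As a result, only the partitions overlapping that interval have to be scanned.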