From 7b8163c26a12f14970ae04e96182abeac87170dc Mon Sep 17 00:00:00 2001 From: "Zhuomin(Charming) Liu" Date: Tue, 14 Jul 2020 10:56:43 +0800 Subject: [PATCH] cherry pick #3061 to release-4.0 Signed-off-by: ti-srebot --- TOC.md | 1 + predicates-push-down.md | 150 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 151 insertions(+) create mode 100644 predicates-push-down.md diff --git a/TOC.md b/TOC.md index 78563a9f7a8b6..4ff1fccb03f52 100644 --- a/TOC.md +++ b/TOC.md @@ -103,6 +103,7 @@ + SQL Optimization + [SQL Optimization Process](/sql-optimization-concepts.md) + Logic Optimization + + [Predicates Push Down](/predicates-push-down.md) + [Join Reorder](/join-reorder.md) + Physical Optimization + [Statistics](/statistics.md) diff --git a/predicates-push-down.md b/predicates-push-down.md new file mode 100644 index 0000000000000..4aac41488e49b --- /dev/null +++ b/predicates-push-down.md @@ -0,0 +1,150 @@ +--- +title: Predicates Push Down +summary: Introduce one of the TiDB's logic optimization rules—Predicate Push Down (PPD). +--- + +# Predicates Push Down (PPD) + +This document introduces one of the TiDB's logic optimization rules—Predicate Push Down (PPD). It aims to help you understand the predicate push down and know its applicable and inapplicable scenarios. + +PPD pushes down selection operators to data source as close as possible to complete data filtering as early as possible, which significantly reduces the cost of data transmission or computation. + +## Examples + +The following cases describe the optimization of PPD. Case 1, 2, and 3 are scenarios where PPD is applicable, and Case 4, 5, and 6 are scenarios where PPD is not applicable. + +### Case 1: push predicates to storage layer + +```sql +create table t(id int primary key, a int); +explain select * from t where a < 1; ++-------------------------+----------+-----------+---------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+---------------+--------------------------------+ +| TableReader_7 | 3323.33 | root | | data:Selection_6 | +| └─Selection_6 | 3323.33 | cop[tikv] | | lt(test.t.a, 1) | +| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+---------------+--------------------------------+ +3 rows in set (0.00 sec) +``` + +In this query, pushing down the predicate `a < 1` to the TiKV layer to filter the data can reduce the overhead of network transmission. + +### Case 2: push predicates to storage layer + +```sql +create table t(id int primary key, a int not null); +explain select * from t where a < substring('123', 1, 1); ++-------------------------+----------+-----------+---------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+---------------+--------------------------------+ +| TableReader_7 | 3323.33 | root | | data:Selection_6 | +| └─Selection_6 | 3323.33 | cop[tikv] | | lt(test.t.a, 1) | +| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+---------------+--------------------------------+ +``` + +This query has the same execution plan as the query in case 1, because the input parameters of the `substring` of the predicate `a < substring('123', 1, 1)` are constants, so they can be calculated in advance. Then the predicate is simplified to the equivalent predicate `a < 1`. After that, TiDB can push `a < 1` down to TiKV. + +### Case 3: push predicates below join operator + +```sql +create table t(id int primary key, a int not null); +create table s(id int primary key, a int not null); +explain select * from t join s on t.a = s.a where t.a < 1; ++------------------------------+----------+-----------+---------------+--------------------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------------+----------+-----------+---------------+--------------------------------------------+ +| HashJoin_8 | 4154.17 | root | | inner join, equal:[eq(test.t.a, test.s.a)] | +| ├─TableReader_15(Build) | 3323.33 | root | | data:Selection_14 | +| │ └─Selection_14 | 3323.33 | cop[tikv] | | lt(test.s.a, 1) | +| │ └─TableFullScan_13 | 10000.00 | cop[tikv] | table:s | keep order:false, stats:pseudo | +| └─TableReader_12(Probe) | 3323.33 | root | | data:Selection_11 | +| └─Selection_11 | 3323.33 | cop[tikv] | | lt(test.t.a, 1) | +| └─TableFullScan_10 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++------------------------------+----------+-----------+---------------+--------------------------------------------+ +7 rows in set (0.00 sec) +``` + +In this query, the predicate `t.a < 1` is pushed below join to filter in advance, which can reduce the calculation overhead of join. + +In addition,This SQL statement has an inner join executed, and the `ON` condition is `t.a = s.a`. The predicate `s.a <1` can be derived from `t.a < 1` and pushed down to `s` table below the join operator. Filtering the `s` table can further reduce the calculation overhead of join. + +### Case 4: predicates that are not supported by storage layers cannot be pushed down + +```sql +create table t(id int primary key, a int not null); +desc select * from t where substring('123', a, 1) = '1'; ++-------------------------+---------+-----------+---------------+----------------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+---------+-----------+---------------+----------------------------------------+ +| Selection_7 | 2.00 | root | | eq(substring("123", test.t.a, 1), "1") | +| └─TableReader_6 | 2.00 | root | | data:TableFullScan_5 | +| └─TableFullScan_5 | 2.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+---------+-----------+---------------+----------------------------------------+ +``` + +In this query, there is a predicate `substring('123', a, 1) = '1'`. + +From the `explain` results, we can see that the predicate is not pushed down to TiKV for calculation. This is because the TiKV coprocessor does not support the built-in function `substring`. + +### Case 5: predicates of inner tables on the outer join can't be pushed down + +```sql +create table t(id int primary key, a int not null); +create table s(id int primary key, a int not null); +explain select * from t left join s on t.a = s.a where s.a is null; ++-------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| Selection_7 | 10000.00 | root | | isnull(test.s.a) | +| └─HashJoin_8 | 12500.00 | root | | left outer join, equal:[eq(test.t.a, test.s.a)] | +| ├─TableReader_13(Build) | 10000.00 | root | | data:TableFullScan_12 | +| │ └─TableFullScan_12 | 10000.00 | cop[tikv] | table:s | keep order:false, stats:pseudo | +| └─TableReader_11(Probe) | 10000.00 | root | | data:TableFullScan_10 | +| └─TableFullScan_10 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------------+----------+-----------+---------------+-------------------------------------------------+ +6 rows in set (0.00 sec) +``` + +In this query,there is a predicate `s.a is null` on the inner table `s`。 + +From the `explain` results,we can see that the predicate is not pushed below join operator. This is because the outer join fills the inner table with `NULL` values when the `on` condition isn't satisfied, and the predicate `s.a is null` is used to filter the results after the join. If it is pushed down to the inner table below join, the execution plan is not equivalent to the original one. + +### Case 6: the predicates which contain user variables cannot be pushed down + +```sql +create table t(id int primary key, a char); +set @a = 1; +explain select * from t where a < @a; ++-------------------------+----------+-----------+---------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+----------+-----------+---------------+--------------------------------+ +| Selection_5 | 8000.00 | root | | lt(test.t.a, getvar("a")) | +| └─TableReader_7 | 10000.00 | root | | data:TableFullScan_6 | +| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+----------+-----------+---------------+--------------------------------+ +3 rows in set (0.00 sec) +``` + +In this query,there is a predicate `a < @a` on table `t`. The `@a` of the predicate is a user variable. + +As can be seen from `explain` results, the predicate is not like case 2, which is simplified to `a < 1` and pushed down to TiKV. This is because the value of the user variable `@a` may change during the computation, and TiKV is not aware of the changes. So TiDB does not replace `@a` with `1`, and does not push down it to TiKV. + +An example to help you understand is as follows: + +```sql +create table t(id int primary key, a int); +insert into t values(1, 1), (2,2); +set @a = 1; +select id, a, @a:=@a+1 from t where a = @a; ++----+------+----------+ +| id | a | @a:=@a+1 | ++----+------+----------+ +| 1 | 1 | 2 | +| 2 | 2 | 3 | ++----+------+----------+ +2 rows in set (0.00 sec) +``` + +As you can see from this query, the value of `@a` will change during the query. So if you replace `a = @a` with `a = 1` and push it down to TiKV, it's not an equivalent execution plan.