From a3e7fe415526858c358ed75297093edc06dc014c Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 30 Jun 2020 20:56:42 +0800 Subject: [PATCH 01/17] add topn-limit-push-down add topn-limit-push-down --- topn-limit-push-down.md | 124 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 124 insertions(+) create mode 100644 topn-limit-push-down.md diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md new file mode 100644 index 0000000000000..bb88e5ae9754a --- /dev/null +++ b/topn-limit-push-down.md @@ -0,0 +1,124 @@ +--- +title: TopN and Limit operator push down +category: performance +--- + +# TopN and Limit operator push down + +The Limit clause in SQL corresponds to the Limit operator node in the TiDB execution plan tree, and the ORDER BY clause corresponds to the Sort operator. So adjacent Limit operator and Sort operator will be merged into the TopN operator node. It means that top N records will be returned according to a sorting rule. On the other hand, the Limit operator is equivalent to the TopN operator node with a null sorting rule. + +Similar to predicate pushdown, TopN (and Limit, the same below) will be pushed down in the execution plan tree as close to the data source as possible for completing the data filtering as early as possible, thereby significantly reducing data transmission or Calculated overhead. + +Please refer to [Optimization Rules and Blacklist for Expression Pushdown](blacklist-control-plan.md) when you want to close the rule. + +## Examples + +The following describes TopN pushdown through some examples. + +### Example 1: Push down to the Coprocessor in the storage layer + +{{< copyable "sql" >}} + +```sql +create table t(id int primary key, a int not null); +explain select * from t order by a limit 10; +``` + +``` ++----------------------------+----------+-----------+---------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------+----------+-----------+---------------+--------------------------------+ +| TopN_7 | 10.00 | root | | test.t.a, offset:0, count:10 | +| └─TableReader_15 | 10.00 | root | | data:TopN_14 | +| └─TopN_14 | 10.00 | cop[tikv] | | test.t.a, offset:0, count:10 | +| └─TableFullScan_13 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++----------------------------+----------+-----------+---------------+--------------------------------+ +4 rows in set (0.00 sec) +``` + +In this query, the TopN operator node was pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB integrates the data, the final filtering is performed. + +### Example 2: TopN can be pushed down into the Join (sorting only depends on the columns in the outer table) + +{{< copyable "sql" >}} + +```sql +create table t(id int primary key, a int not null); +create table s(id int primary key, a int not null); +explain select * from t left join s on t.a = s.a order by t.a limit 10; +``` + +``` ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| TopN_12 | 10.00 | root | | test.t.a, offset:0, count:10 | +| └─HashJoin_17 | 12.50 | root | | left outer join, equal:[eq(test.t.a, test.s.a)] | +| ├─TopN_18(Build) | 10.00 | root | | test.t.a, offset:0, count:10 | +| │ └─TableReader_26 | 10.00 | root | | data:TopN_25 | +| │ └─TopN_25 | 10.00 | cop[tikv] | | test.t.a, offset:0, count:10 | +| │ └─TableFullScan_24 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | +| └─TableReader_30(Probe) | 10000.00 | root | | data:TableFullScan_29 | +| └─TableFullScan_29 | 10000.00 | cop[tikv] | table:s | keep order:false, stats:pseudo | ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +8 rows in set (0.01 sec) +``` + +In this query, the sorting of the TopN operator only depends on the columns in the outer table t, so TopN can be pushed down before Join, to reduce the computational of Join. Besides, TiDB also pushed TopN down to the storage layer. + +### Example 3: TopN can't be pushed down into the Join + +{{< copyable "sql" >}} + +```sql +create table t(id int primary key, a int not null); +create table s(id int primary key, a int not null); +explain select * from t join s on t.a = s.a order by t.id limit 10; +``` + +``` ++-------------------------------+----------+-----------+---------------+--------------------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------------+----------+-----------+---------------+--------------------------------------------+ +| TopN_12 | 10.00 | root | | test.t.id, offset:0, count:10 | +| └─HashJoin_16 | 12500.00 | root | | inner join, equal:[eq(test.t.a, test.s.a)] | +| ├─TableReader_21(Build) | 10000.00 | root | | data:TableFullScan_20 | +| │ └─TableFullScan_20 | 10000.00 | cop[tikv] | table:s | keep order:false, stats:pseudo | +| └─TableReader_19(Probe) | 10000.00 | root | | data:TableFullScan_18 | +| └─TableFullScan_18 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------------+----------+-----------+---------------+--------------------------------------------+ +6 rows in set (0.00 sec) +``` + +TopN can't be pushed down into the Inner Join. Taking the query above as an example, if you get 100 records after Join, then do TopN can leave 10 records. If the remaining 10 records are filtered before TopN, there may be 5 left after the Join is done. It will make results differences. + +That is the reason why opN can't be pushed down into the Outer Join for the Inner table. TopN can not be pushed down when sorting by multiple columns on multiple tables. such as `t.a+s.a`.Only if the TopN exclusively depends on sorting outer table columns, it can be pushed down. + +### Example 4: Conversion of TopN to Limit + +{{< copyable "sql" >}} + +```sql +create table t(id int primary key, a int not null); +create table s(id int primary key, a int not null); +explain select * from t left join s on t.a = s.a order by t.id limit 10; +``` + +``` ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +| TopN_12 | 10.00 | root | | test.t.id, offset:0, count:10 | +| └─HashJoin_17 | 12.50 | root | | left outer join, equal:[eq(test.t.a, test.s.a)] | +| ├─Limit_21(Build) | 10.00 | root | | offset:0, count:10 | +| │ └─TableReader_31 | 10.00 | root | | data:Limit_30 | +| │ └─Limit_30 | 10.00 | cop[tikv] | | offset:0, count:10 | +| │ └─TableFullScan_29 | 10.00 | cop[tikv] | table:t | keep order:true, stats:pseudo | +| └─TableReader_35(Probe) | 10000.00 | root | | data:TableFullScan_34 | +| └─TableFullScan_34 | 10000.00 | cop[tikv] | table:s | keep order:false, stats:pseudo | ++----------------------------------+----------+-----------+---------------+-------------------------------------------------+ +8 rows in set (0.00 sec) + +``` + +In the above query, TopN is first pushed to the outer table t. Because TopN needs to sort by `t.id` which is the primary key, it can be directly readout in order (`keep order: true`) without extra sorting in TopN, and Simplified to Limit. From a05e32284a2600e52dde2409795ef1f3a8bf0ebd Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 30 Jun 2020 21:03:42 +0800 Subject: [PATCH 02/17] Update topn-limit-push-down.md --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index bb88e5ae9754a..fdab6387dadd6 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -9,7 +9,7 @@ The Limit clause in SQL corresponds to the Limit operator node in the TiDB execu Similar to predicate pushdown, TopN (and Limit, the same below) will be pushed down in the execution plan tree as close to the data source as possible for completing the data filtering as early as possible, thereby significantly reducing data transmission or Calculated overhead. -Please refer to [Optimization Rules and Blacklist for Expression Pushdown](blacklist-control-plan.md) when you want to close the rule. +Please refer to [Optimization Rules and Blacklist for Expression Pushdown](/blacklist-control-plan.md) when you want to close the rule. ## Examples From a4678115774f3cd435af81e2a072437093fcdbfe Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:18:24 +0800 Subject: [PATCH 03/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index fdab6387dadd6..e362b6cd93c9e 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -7,7 +7,7 @@ category: performance The Limit clause in SQL corresponds to the Limit operator node in the TiDB execution plan tree, and the ORDER BY clause corresponds to the Sort operator. So adjacent Limit operator and Sort operator will be merged into the TopN operator node. It means that top N records will be returned according to a sorting rule. On the other hand, the Limit operator is equivalent to the TopN operator node with a null sorting rule. -Similar to predicate pushdown, TopN (and Limit, the same below) will be pushed down in the execution plan tree as close to the data source as possible for completing the data filtering as early as possible, thereby significantly reducing data transmission or Calculated overhead. +Similar to predicate pushdown, TopN and Limit are pushed down in the execution plan tree to a position as close to the data source as possible so that the required data is filtered at an early stage. In this way, the pushdown significantly reduces the overhead of data transmission and calculation. Please refer to [Optimization Rules and Blacklist for Expression Pushdown](/blacklist-control-plan.md) when you want to close the rule. From ab724079602796eb58369fd15a416a9930c1a100 Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:18:54 +0800 Subject: [PATCH 04/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index e362b6cd93c9e..3164231e74093 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -9,7 +9,7 @@ The Limit clause in SQL corresponds to the Limit operator node in the TiDB execu Similar to predicate pushdown, TopN and Limit are pushed down in the execution plan tree to a position as close to the data source as possible so that the required data is filtered at an early stage. In this way, the pushdown significantly reduces the overhead of data transmission and calculation. -Please refer to [Optimization Rules and Blacklist for Expression Pushdown](/blacklist-control-plan.md) when you want to close the rule. +To disable this rule, refer to [Optimization Rules and Blocklist for Expression Pushdown](/blocklist-control-plan.md). ## Examples From 4cf6c04f65c99239453505b512c30d4f349690af Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:19:06 +0800 Subject: [PATCH 05/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 3164231e74093..6289778197a64 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -1,5 +1,6 @@ --- -title: TopN and Limit operator push down +title: TopN and Limit Operator Pushdown +summary: Learn the implementation of TopN and Limit pushdown. category: performance --- From 0af278d4835d823fc891c17647585857aafd09ad Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:20:15 +0800 Subject: [PATCH 06/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 6289778197a64..51c5448125d75 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -14,7 +14,7 @@ To disable this rule, refer to [Optimization Rules and Blocklist for Expression ## Examples -The following describes TopN pushdown through some examples. +This section describes TopN pushdown through some examples. ### Example 1: Push down to the Coprocessor in the storage layer From c512a558cb0de1f28bd0a9d6caa5dc73e9486dd5 Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:20:28 +0800 Subject: [PATCH 07/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 51c5448125d75..81547f6687c8c 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -122,4 +122,4 @@ explain select * from t left join s on t.a = s.a order by t.id limit 10; ``` -In the above query, TopN is first pushed to the outer table t. Because TopN needs to sort by `t.id` which is the primary key, it can be directly readout in order (`keep order: true`) without extra sorting in TopN, and Simplified to Limit. +In the query above, TopN is first pushed to the outer table `t`. TopN needs to sort by `t.id`, which is the primary key and can be directly read in order (`keep order: true`) without extra sorting in TopN. Therefore, TopN is simplied as Limit. From b67e22fe3875d3342d407b0f331858443594eaa7 Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:20:46 +0800 Subject: [PATCH 08/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 81547f6687c8c..151af1d92226c 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -95,7 +95,7 @@ TopN can't be pushed down into the Inner Join. Taking the query above as an exam That is the reason why opN can't be pushed down into the Outer Join for the Inner table. TopN can not be pushed down when sorting by multiple columns on multiple tables. such as `t.a+s.a`.Only if the TopN exclusively depends on sorting outer table columns, it can be pushed down. -### Example 4: Conversion of TopN to Limit +### Example 4: Convert TopN to Limit {{< copyable "sql" >}} From ab84b89f84bcd7029aed9dcb63ed5aac1d44c45a Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:25:21 +0800 Subject: [PATCH 09/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 151af1d92226c..1fff966ade0e4 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -91,7 +91,7 @@ explain select * from t join s on t.a = s.a order by t.id limit 10; 6 rows in set (0.00 sec) ``` -TopN can't be pushed down into the Inner Join. Taking the query above as an example, if you get 100 records after Join, then do TopN can leave 10 records. If the remaining 10 records are filtered before TopN, there may be 5 left after the Join is done. It will make results differences. +TopN cannot be pushed down before `Inner Join`. Taking the query above as an example, if you get 100 records after Join, then you can have 10 records left after TopN. However, if TopN is performed first to get 10 records, only 5 records are left after Join. The pushdown might result in different results. That is the reason why opN can't be pushed down into the Outer Join for the Inner table. TopN can not be pushed down when sorting by multiple columns on multiple tables. such as `t.a+s.a`.Only if the TopN exclusively depends on sorting outer table columns, it can be pushed down. From f25082489b44bb5f1795cfde562ea887e06d260b Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:29:58 +0800 Subject: [PATCH 10/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 1fff966ade0e4..903407cd99883 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -37,7 +37,7 @@ explain select * from t order by a limit 10; 4 rows in set (0.00 sec) ``` -In this query, the TopN operator node was pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB integrates the data, the final filtering is performed. +In this query, the TopN operator node is pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB aggregates the data, the final filtering is performed. ### Example 2: TopN can be pushed down into the Join (sorting only depends on the columns in the outer table) From 7c59f504c782a18e4e1a207ffc6a4a6e6206243d Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:32:21 +0800 Subject: [PATCH 11/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 903407cd99883..71fd94a14fb69 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -39,7 +39,7 @@ explain select * from t order by a limit 10; In this query, the TopN operator node is pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB aggregates the data, the final filtering is performed. -### Example 2: TopN can be pushed down into the Join (sorting only depends on the columns in the outer table) +### Example 2: TopN can be pushed down into Join (sorting only depends on the columns in the outer table) {{< copyable "sql" >}} From 65671a55bd2ab2c802e2d6bc7a2cb62b4c0aadc7 Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:36:33 +0800 Subject: [PATCH 12/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 71fd94a14fb69..e4395ee45b676 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -65,7 +65,7 @@ explain select * from t left join s on t.a = s.a order by t.a limit 10; 8 rows in set (0.01 sec) ``` -In this query, the sorting of the TopN operator only depends on the columns in the outer table t, so TopN can be pushed down before Join, to reduce the computational of Join. Besides, TiDB also pushed TopN down to the storage layer. +In this query, the collation of the TopN operator only depends on the columns in the outer table `t`, so a calculation can be performed before pushing down TopN to Join, to reduce the calculation cost of the Join operation. Besides, TiDB also pushes TopN down to the storage layer. ### Example 3: TopN can't be pushed down into the Join From cc31a9c41b63478cbb9747afd34bcbe2ec1550bc Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:37:40 +0800 Subject: [PATCH 13/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index e4395ee45b676..e6fe305608fc7 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -93,7 +93,7 @@ explain select * from t join s on t.a = s.a order by t.id limit 10; TopN cannot be pushed down before `Inner Join`. Taking the query above as an example, if you get 100 records after Join, then you can have 10 records left after TopN. However, if TopN is performed first to get 10 records, only 5 records are left after Join. The pushdown might result in different results. -That is the reason why opN can't be pushed down into the Outer Join for the Inner table. TopN can not be pushed down when sorting by multiple columns on multiple tables. such as `t.a+s.a`.Only if the TopN exclusively depends on sorting outer table columns, it can be pushed down. +Similarly, TopN can neither be pushed down into the Outer Join for the inner table, nor can it be pushed down when its collation is related to columns on multiple tables, such as `t.a+s.a`. Only when the collation of TopN exclusively depends on columns on the outer table, can it be pushed down. ### Example 4: Convert TopN to Limit From 9959dcce681aaa64f48023431c4f3c7a597dc5ac Mon Sep 17 00:00:00 2001 From: Se-L Date: Tue, 7 Jul 2020 20:37:51 +0800 Subject: [PATCH 14/17] Update topn-limit-push-down.md Co-authored-by: Ran --- topn-limit-push-down.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index e6fe305608fc7..e4e43ed5e1837 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -6,7 +6,7 @@ category: performance # TopN and Limit operator push down -The Limit clause in SQL corresponds to the Limit operator node in the TiDB execution plan tree, and the ORDER BY clause corresponds to the Sort operator. So adjacent Limit operator and Sort operator will be merged into the TopN operator node. It means that top N records will be returned according to a sorting rule. On the other hand, the Limit operator is equivalent to the TopN operator node with a null sorting rule. +In the TiDB execution plan tree, the `LIMIT` clause in SQL corresponds to the Limit operator node, and the `ORDER BY` clause corresponds to the Sort operator node. The adjacent Limit operator and Sort operator are combined as the TopN operator node, which means that the top N records are returned according to a certain collation. That is to say, a Limit operator is equivalent to a TopN operator node with a null collation. Similar to predicate pushdown, TopN and Limit are pushed down in the execution plan tree to a position as close to the data source as possible so that the required data is filtered at an early stage. In this way, the pushdown significantly reduces the overhead of data transmission and calculation. From d5b2d14520be78c5f8ac21b69029b6bff424dca4 Mon Sep 17 00:00:00 2001 From: Ran Date: Wed, 8 Jul 2020 13:24:44 +0800 Subject: [PATCH 15/17] update wording; add the doc to TOC --- TOC.md | 1 + topn-limit-push-down.md | 22 ++++++++++++---------- 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/TOC.md b/TOC.md index ce4643a991335..b382299463571 100644 --- a/TOC.md +++ b/TOC.md @@ -99,6 +99,7 @@ + SQL Optimization + [SQL Optimization Process](/sql-optimization-concepts.md) + Logic Optimization + + [TopN and Limit Pushdown](/topn-limit-push-down.md) + [Join Reorder](/join-reorder.md) + Physical Optimization + [Statistics](/statistics.md) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index e4e43ed5e1837..6aaaa3a675b62 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -1,22 +1,24 @@ --- title: TopN and Limit Operator Pushdown -summary: Learn the implementation of TopN and Limit pushdown. +summary: Learn the implementation of TopN and Limit operator pushdown. category: performance --- -# TopN and Limit operator push down +# TopN and Limit Operator Pushdown + +This document describes the implementation of TopN and Limit operator pushdown. In the TiDB execution plan tree, the `LIMIT` clause in SQL corresponds to the Limit operator node, and the `ORDER BY` clause corresponds to the Sort operator node. The adjacent Limit operator and Sort operator are combined as the TopN operator node, which means that the top N records are returned according to a certain collation. That is to say, a Limit operator is equivalent to a TopN operator node with a null collation. Similar to predicate pushdown, TopN and Limit are pushed down in the execution plan tree to a position as close to the data source as possible so that the required data is filtered at an early stage. In this way, the pushdown significantly reduces the overhead of data transmission and calculation. -To disable this rule, refer to [Optimization Rules and Blocklist for Expression Pushdown](/blocklist-control-plan.md). +To disable this rule, refer to [Optimization Rules and Blocklist for Expression Pushdown](/blocklist-control-plan.md). ## Examples -This section describes TopN pushdown through some examples. +This section illustrates TopN pushdown through some examples. -### Example 1: Push down to the Coprocessor in the storage layer +### Example 1: Push down to the Coprocessors in the storage layer {{< copyable "sql" >}} @@ -39,7 +41,7 @@ explain select * from t order by a limit 10; In this query, the TopN operator node is pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB aggregates the data, the final filtering is performed. -### Example 2: TopN can be pushed down into Join (sorting only depends on the columns in the outer table) +### Example 2: TopN can be pushed down into Join (the collation only depends on the columns in the outer table) {{< copyable "sql" >}} @@ -67,7 +69,7 @@ explain select * from t left join s on t.a = s.a order by t.a limit 10; In this query, the collation of the TopN operator only depends on the columns in the outer table `t`, so a calculation can be performed before pushing down TopN to Join, to reduce the calculation cost of the Join operation. Besides, TiDB also pushes TopN down to the storage layer. -### Example 3: TopN can't be pushed down into the Join +### Example 3: TopN cannot be pushed down before Join {{< copyable "sql" >}} @@ -91,9 +93,9 @@ explain select * from t join s on t.a = s.a order by t.id limit 10; 6 rows in set (0.00 sec) ``` -TopN cannot be pushed down before `Inner Join`. Taking the query above as an example, if you get 100 records after Join, then you can have 10 records left after TopN. However, if TopN is performed first to get 10 records, only 5 records are left after Join. The pushdown might result in different results. +TopN cannot be pushed down before `Inner Join`. Taking the query above as an example, if you get 100 records after Join, then you can have 10 records left after TopN. However, if TopN is performed first to get 10 records, only 5 records are left after Join. In such cases, the pushdown results in different results. -Similarly, TopN can neither be pushed down into the Outer Join for the inner table, nor can it be pushed down when its collation is related to columns on multiple tables, such as `t.a+s.a`. Only when the collation of TopN exclusively depends on columns on the outer table, can it be pushed down. +Similarly, TopN can neither be pushed down to the inner table of Outer Join, nor can it be pushed down when its collation is related to columns on multiple tables, such as `t.a+s.a`. Only when the collation of TopN exclusively depends on columns on the outer table, can TopN be pushed down. ### Example 4: Convert TopN to Limit @@ -122,4 +124,4 @@ explain select * from t left join s on t.a = s.a order by t.id limit 10; ``` -In the query above, TopN is first pushed to the outer table `t`. TopN needs to sort by `t.id`, which is the primary key and can be directly read in order (`keep order: true`) without extra sorting in TopN. Therefore, TopN is simplied as Limit. +In the query above, TopN is first pushed to the outer table `t`. TopN needs to sort by `t.id`, which is the primary key and can be directly read in order (`keep order: true`) without extra sorting in TopN. Therefore, TopN is simplified as Limit. From 6c2e2c827d1a577ec498737f09db36901d26e082 Mon Sep 17 00:00:00 2001 From: Ran Date: Tue, 14 Jul 2020 16:53:08 +0800 Subject: [PATCH 16/17] Update topn-limit-push-down.md Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> --- topn-limit-push-down.md | 1 - 1 file changed, 1 deletion(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 6aaaa3a675b62..2c5ea83365a83 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -1,7 +1,6 @@ --- title: TopN and Limit Operator Pushdown summary: Learn the implementation of TopN and Limit operator pushdown. -category: performance --- # TopN and Limit Operator Pushdown From edf4f85683067ab6aa8818e214ed94ad55d8c05e Mon Sep 17 00:00:00 2001 From: Ran Date: Tue, 14 Jul 2020 17:06:53 +0800 Subject: [PATCH 17/17] collation -> sorting rule; update title --- topn-limit-push-down.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/topn-limit-push-down.md b/topn-limit-push-down.md index 2c5ea83365a83..c8dfe67031100 100644 --- a/topn-limit-push-down.md +++ b/topn-limit-push-down.md @@ -1,13 +1,13 @@ --- -title: TopN and Limit Operator Pushdown +title: TopN and Limit Operator Push Down summary: Learn the implementation of TopN and Limit operator pushdown. --- -# TopN and Limit Operator Pushdown +# TopN and Limit Operator Push Down This document describes the implementation of TopN and Limit operator pushdown. -In the TiDB execution plan tree, the `LIMIT` clause in SQL corresponds to the Limit operator node, and the `ORDER BY` clause corresponds to the Sort operator node. The adjacent Limit operator and Sort operator are combined as the TopN operator node, which means that the top N records are returned according to a certain collation. That is to say, a Limit operator is equivalent to a TopN operator node with a null collation. +In the TiDB execution plan tree, the `LIMIT` clause in SQL corresponds to the Limit operator node, and the `ORDER BY` clause corresponds to the Sort operator node. The adjacent Limit operator and Sort operator are combined as the TopN operator node, which means that the top N records are returned according to a certain sorting rule. That is to say, a Limit operator is equivalent to a TopN operator node with a null sorting rule. Similar to predicate pushdown, TopN and Limit are pushed down in the execution plan tree to a position as close to the data source as possible so that the required data is filtered at an early stage. In this way, the pushdown significantly reduces the overhead of data transmission and calculation. @@ -40,7 +40,7 @@ explain select * from t order by a limit 10; In this query, the TopN operator node is pushed down to TiKV for data filtering, and each Coprocessor returns only 10 records to TiDB. After TiDB aggregates the data, the final filtering is performed. -### Example 2: TopN can be pushed down into Join (the collation only depends on the columns in the outer table) +### Example 2: TopN can be pushed down into Join (the sorting rule only depends on the columns in the outer table) {{< copyable "sql" >}} @@ -66,7 +66,7 @@ explain select * from t left join s on t.a = s.a order by t.a limit 10; 8 rows in set (0.01 sec) ``` -In this query, the collation of the TopN operator only depends on the columns in the outer table `t`, so a calculation can be performed before pushing down TopN to Join, to reduce the calculation cost of the Join operation. Besides, TiDB also pushes TopN down to the storage layer. +In this query, the sorting rule of the TopN operator only depends on the columns in the outer table `t`, so a calculation can be performed before pushing down TopN to Join, to reduce the calculation cost of the Join operation. Besides, TiDB also pushes TopN down to the storage layer. ### Example 3: TopN cannot be pushed down before Join @@ -94,7 +94,7 @@ explain select * from t join s on t.a = s.a order by t.id limit 10; TopN cannot be pushed down before `Inner Join`. Taking the query above as an example, if you get 100 records after Join, then you can have 10 records left after TopN. However, if TopN is performed first to get 10 records, only 5 records are left after Join. In such cases, the pushdown results in different results. -Similarly, TopN can neither be pushed down to the inner table of Outer Join, nor can it be pushed down when its collation is related to columns on multiple tables, such as `t.a+s.a`. Only when the collation of TopN exclusively depends on columns on the outer table, can TopN be pushed down. +Similarly, TopN can neither be pushed down to the inner table of Outer Join, nor can it be pushed down when its sorting rule is related to columns on multiple tables, such as `t.a+s.a`. Only when the sorting rule of TopN exclusively depends on columns on the outer table, can TopN be pushed down. ### Example 4: Convert TopN to Limit