From edb3d33463a50e8af9648143076f686dc173819b Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 13 Jul 2020 17:03:59 +0800
Subject: [PATCH 1/4] add column pruning

---
 TOC.md            |  1 +
 column-pruning.md | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)
 create mode 100644 column-pruning.md

diff --git a/TOC.md b/TOC.md
index ad1f7d71200b6..8ed103a836dcf 100644
--- a/TOC.md
+++ b/TOC.md
@@ -104,6 +104,7 @@
   + SQL Optimization
     + [SQL Optimization Process](/sql-optimization-concepts.md)
     + Logic Optimization
+      + [Column Pruning](/column-pruning.md)
       + [Join Reorder](/join-reorder.md)
     + Physical Optimization
       + [Statistics](/statistics.md)
diff --git a/column-pruning.md b/column-pruning.md
new file mode 100644
index 0000000000000..c4b175e068c96
--- /dev/null
+++ b/column-pruning.md
@@ -0,0 +1,21 @@
+---
+title: Column Pruning
+summary: Learn about the usage of column pruning in TiDB.
+category: performance
+---
+
+# Column Pruning
+
+The basic idea of column pruning is that for columns not used in the operator, the optimizer does not need to retain them during optimization. Removing these columns reduces the use of I/O resources and facilitates the subsequent optimization. The following is an example of column repetition:
+
+Suppose there are four columns (a, b, c, and d) in table t. You can execute the following statement:
+
+{{< copyable "sql" >}}
+
+```sql
+select a from t where b> 5
+```
+
+In this query, only column a and column b are used, and column c and column d are redundant. Regarding the query plan of this statement, the `Selection` operator uses column b. Then the `DataSource` operator uses columns a and column b. Columns c and column d can be pruned since the `DataSource` operator does not read them.
+
+Therefore, when TiDB performs a top-down scan during the logic optimization phase, unnecessary columns are pruned to reduce waste of resources. This scanning process is called "Column Pruning", corresponding to the `columnPruner` rule. If you want to close this rule, refer to [The Blocklist of Optimization Rules and Expression Pushdown](/blacklist-control-plan.md).

From aef5cc88e228d1597ded5e09808e2a7f965ad78f Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 13 Jul 2020 17:25:18 +0800
Subject: [PATCH 2/4] remove category

---
 column-pruning.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/column-pruning.md b/column-pruning.md
index c4b175e068c96..f2bf81a272588 100644
--- a/column-pruning.md
+++ b/column-pruning.md
@@ -1,7 +1,6 @@
 ---
 title: Column Pruning
 summary: Learn about the usage of column pruning in TiDB.
-category: performance
 ---
 
 # Column Pruning

From b89bc5a626b6f0fa7630a6220f66281d3e4c0dc8 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 13 Jul 2020 20:42:21 +0800
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 column-pruning.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/column-pruning.md b/column-pruning.md
index f2bf81a272588..6b3de909f7a93 100644
--- a/column-pruning.md
+++ b/column-pruning.md
@@ -15,6 +15,6 @@ Suppose there are four columns (a, b, c, and d) in table t. You can execute the
 select a from t where b> 5
 ```
 
-In this query, only column a and column b are used, and column c and column d are redundant. Regarding the query plan of this statement, the `Selection` operator uses column b. Then the `DataSource` operator uses columns a and column b. Columns c and column d can be pruned since the `DataSource` operator does not read them.
+In this query, only column a and column b are used, and column c and column d are redundant. Regarding the query plan of this statement, the `Selection` operator uses column b. Then the `DataSource` operator uses columns a and column b. Columns c and column d can be pruned because the `DataSource` operator does not read them.
 
-Therefore, when TiDB performs a top-down scan during the logic optimization phase, unnecessary columns are pruned to reduce waste of resources. This scanning process is called "Column Pruning", corresponding to the `columnPruner` rule. If you want to close this rule, refer to [The Blocklist of Optimization Rules and Expression Pushdown](/blacklist-control-plan.md).
+Therefore, when TiDB performs a top-down scanning during the logic optimization phase, redundant columns are pruned to reduce waste of resources. This scanning process is called "Column Pruning", corresponding to the `columnPruner` rule. If you want to disable this rule, refer to [The Blocklist of Optimization Rules and Expression Pushdown](/blacklist-control-plan.md).

From 3f2aa6adf105cbd3a1e1b8886cb8cfeaff4d90ce Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Fri, 17 Jul 2020 14:45:20 +0800
Subject: [PATCH 4/4] update link

---
 column-pruning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/column-pruning.md b/column-pruning.md
index 6b3de909f7a93..d018ab15084a6 100644
--- a/column-pruning.md
+++ b/column-pruning.md
@@ -17,4 +17,4 @@ select a from t where b> 5
 
 In this query, only column a and column b are used, and column c and column d are redundant. Regarding the query plan of this statement, the `Selection` operator uses column b. Then the `DataSource` operator uses columns a and column b. Columns c and column d can be pruned because the `DataSource` operator does not read them.
 
-Therefore, when TiDB performs a top-down scanning during the logic optimization phase, redundant columns are pruned to reduce waste of resources. This scanning process is called "Column Pruning", corresponding to the `columnPruner` rule. If you want to disable this rule, refer to [The Blocklist of Optimization Rules and Expression Pushdown](/blacklist-control-plan.md).
+Therefore, when TiDB performs a top-down scanning during the logic optimization phase, redundant columns are pruned to reduce waste of resources. This scanning process is called "Column Pruning", corresponding to the `columnPruner` rule. If you want to disable this rule, refer to [The Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md).
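
Note on the top-down scan that column-pruning.md describes: it may be easier to follow with a toy model. The Python sketch below is only an illustration under stated assumptions. The `DataSource`, `Selection`, and `Projection` classes and their `prune` methods are hypothetical stand-ins and are not TiDB's `columnPruner` implementation. Each operator passes the set of columns it needs down to its child, and the data source keeps only what was requested.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Toy scan operator (hypothetical, not a TiDB type)."""
    table: str
    columns: list

    def prune(self, required):
        # Keep only the columns that some parent operator asked for.
        self.columns = [c for c in self.columns if c in required]


@dataclass
class Selection:
    """Toy filter operator; `used` holds the columns its condition reads."""
    child: object
    used: set

    def prune(self, required):
        # The child must supply the parent's columns plus the filter's own.
        self.child.prune(required | self.used)


@dataclass
class Projection:
    """Toy root operator; `output` is the select list."""
    child: object
    output: list

    def prune(self, required):
        # As the root of this toy plan, every output column is required,
        # so the incoming `required` set is ignored here.
        self.child.prune(set(self.output))


# Plan for: select a from t where b > 5
source = DataSource("t", ["a", "b", "c", "d"])
plan = Projection(Selection(source, {"b"}), ["a"])
plan.prune(set())
print(source.columns)  # ['a', 'b']
```

In this sketch the projection requests column a, the selection adds column b, and the data source drops columns c and d, which matches the example query in the document.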