From b2c36e9f0c0831b0977a779e388799f1ef3cff0f Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 13:48:36 -0600 Subject: [PATCH 1/9] Update understanding-the-query-execution-plan.md --- sql/understanding-the-query-execution-plan.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index 39a29d93e2a18..c2fe826715fc2 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -29,6 +29,14 @@ Currently, the `EXPLAIN` statement returns the following four columns: id, count | task | the task that the current operator belongs to. The current execution plan contains two types of tasks: 1) the **root** task that runs on the TiDB server; 2) the **cop** task that runs concurrently on the TiKV server. The topological relations of the current execution plan in the task level is that a root task can be followed by many cop tasks. The root task uses the output of cop task as the input. The cop task executes the tasks that TiDB pushes to TiKV. Each cop task scatters in the TiKV cluster and is executed by multiple processes. | | operator info | The details about each operator. The information of each operator differs from others, see [Operator Info](#operator-info).| +### Example Usage + +Using the [bikeshare example database](bikeshare-example-database.md): + +``` +SELECT 1 FROM DUAL; +``` + ## Overview ### Introduction to task From 347f469fdc09247a99b73f6e534b7b978e8b4104 Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 14:22:46 -0600 Subject: [PATCH 2/9] Added Example Usage --- sql/understanding-the-query-execution-plan.md | 36 +++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index c2fe826715fc2..c3f9dfe0990ff 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -31,12 +31,44 @@ Currently, the `EXPLAIN` statement returns the following four columns: id, count ### Example Usage -Using the [bikeshare example database](bikeshare-example-database.md): +Using the [bikeshare example database](../bikeshare-example-database.md): ``` -SELECT 1 FROM DUAL; +mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| StreamAgg_20 | 1.00 | root | funcs:count(col_0) | +| └─TableReader_21 | 1.00 | root | data:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─Selection_19 | 8166.73 | cop | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) | +| └─TableScan_18 | 19117643.00 | cop | table:trips, range:[-inf,+inf], keep order:false | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +5 rows in set (0.00 sec) ``` +Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the 
criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` rows to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` rows to the client. + +The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: + +``` +mysql> ALTER TABLE trips ADD INDEX (start_date); +.. +mmysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| StreamAgg_25 | 1.00 | root | funcs:count(col_0) | +| └─IndexReader_26 | 1.00 | root | index:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─IndexScan_24 | 8166.73 | cop | table:trips, index:start_date, range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +4 rows in set (0.01 sec) + +``` + +In the revisted `EXPLAIN` we can see the count of rows scanned has reduced via the use of an index. On a reference system, this reduced query execution time reduced from 50.41 seconds to 0.00 seconds! + ## Overview ### Introduction to task From 381d973a31b0dff5cdb0911832f2707fa3f64e2f Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 14:23:23 -0600 Subject: [PATCH 3/9] Update understanding-the-query-execution-plan.md --- sql/understanding-the-query-execution-plan.md | 1 - 1 file changed, 1 deletion(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index c3f9dfe0990ff..996e214119a13 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -64,7 +64,6 @@ mmysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 | └─IndexScan_24 | 8166.73 | cop | table:trips, index:start_date, range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false | +------------------------+---------+------+--------------------------------------------------------------------------------------------------+ 4 rows in set (0.01 sec) - ``` In the revisted `EXPLAIN` we can see the count of rows scanned has reduced via the use of an index. On a reference system, this reduced query execution time reduced from 50.41 seconds to 0.00 seconds! 
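The index added in the example above is created without an explicit name, so it can later be inspected or dropped using the name generated from its first column. A minimal sketch, assuming the auto-generated index name is `start_date` (the usual MySQL-compatible default for a single-column index):

```
mysql> SHOW INDEX FROM trips;                    -- confirm the secondary index on start_date exists
mysql> ALTER TABLE trips DROP INDEX start_date;  -- remove it again if no longer wanted (assumes the default name)
```

Re-running the `EXPLAIN` statement after dropping the index should show a `TableScan`-based plan again.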
From 550a95f952c23580f9b217c8347b440955556a07 Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 14:23:43 -0600 Subject: [PATCH 4/9] Update understanding-the-query-execution-plan.md --- sql/understanding-the-query-execution-plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index 996e214119a13..d70f471381517 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -54,7 +54,7 @@ The good news with this query is that most of the work is pushed down to the cop ``` mysql> ALTER TABLE trips ADD INDEX (start_date); .. -mmysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; +mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; +------------------------+---------+------+--------------------------------------------------------------------------------------------------+ | id | count | task | operator info | +------------------------+---------+------+--------------------------------------------------------------------------------------------------+ From 2bdf420220f0b3d2324a1ecc0f39e5a0e8900b79 Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 14:32:37 -0600 Subject: [PATCH 5/9] Update understanding-the-query-execution-plan.md --- sql/understanding-the-query-execution-plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index d70f471381517..d754cafa08f9a 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -47,7 +47,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0 5 rows in set (0.00 sec) ``` -Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` rows to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` rows to the client. +Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` row to the client. The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. 
However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: From 2110b0f9b65849fce8d43328905724f443e823e7 Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Wed, 5 Sep 2018 14:33:54 -0600 Subject: [PATCH 6/9] Changed double space to single --- sql/understanding-the-query-execution-plan.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index d754cafa08f9a..8ba054f6db7a0 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -47,9 +47,9 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0 5 rows in set (0.00 sec) ``` -Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` row to the client. +Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` row to the client. -The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: +The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: ``` mysql> ALTER TABLE trips ADD INDEX (start_date); @@ -66,7 +66,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0 4 rows in set (0.01 sec) ``` -In the revisted `EXPLAIN` we can see the count of rows scanned has reduced via the use of an index. On a reference system, this reduced query execution time reduced from 50.41 seconds to 0.00 seconds! +In the revisted `EXPLAIN` we can see the count of rows scanned has reduced via the use of an index. On a reference system, this reduced query execution time reduced from 50.41 seconds to 0.00 seconds! ## Overview From 422f4df7dbdd2798fc69b6dd673ab56c4361142c Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Mon, 10 Sep 2018 07:18:53 -0600 Subject: [PATCH 7/9] Addressed PR feedback --- sql/understanding-the-query-execution-plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index 8ba054f6db7a0..fae54a837c19f 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -47,7 +47,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0 5 rows in set (0.00 sec) ``` -Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. 
Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). Each of the TiKV nodes return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return `1.00` row to the client. +Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client. The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: From 634bf2cd9d1b8567c8a929941c979974d548ad5e Mon Sep 17 00:00:00 2001 From: Morgan Tocker Date: Tue, 11 Sep 2018 21:55:08 -0600 Subject: [PATCH 8/9] Address PR feedback --- sql/understanding-the-query-execution-plan.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index fae54a837c19f..835da7e1447a6 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -29,7 +29,7 @@ Currently, the `EXPLAIN` statement returns the following four columns: id, count | task | the task that the current operator belongs to. The current execution plan contains two types of tasks: 1) the **root** task that runs on the TiDB server; 2) the **cop** task that runs concurrently on the TiKV server. The topological relations of the current execution plan in the task level is that a root task can be followed by many cop tasks. The root task uses the output of cop task as the input. The cop task executes the tasks that TiDB pushes to TiKV. Each cop task scatters in the TiKV cluster and is executed by multiple processes. | | operator info | The details about each operator. The information of each operator differs from others, see [Operator Info](#operator-info).| -### Example Usage +### Example usage Using the [bikeshare example database](../bikeshare-example-database.md): @@ -49,7 +49,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0 Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client. -The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer way required for query execution. 
However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`:
+The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer is required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`:
```
mysql> ALTER TABLE trips ADD INDEX (start_date);
..
@@ -66,7 +66,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0
4 rows in set (0.01 sec)
```
-In the revisted `EXPLAIN` we can see the count of rows scanned has reduced via the use of an index. On a reference system, this reduced query execution time reduced from 50.41 seconds to 0.00 seconds!
+In the revisited `EXPLAIN` you can see the count of rows scanned has reduced via the use of an index. On a reference system, the query execution time reduced from 50.41 seconds to 0.00 seconds!
## Overview
From 74713a551f019d2702c8bcd9e6b65a94eb10d2c3 Mon Sep 17 00:00:00 2001
From: Morgan Tocker
Date: Tue, 11 Sep 2018 21:56:26 -0600
Subject: [PATCH 9/9] Update understanding-the-query-execution-plan.md
---
sql/understanding-the-query-execution-plan.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md
index 835da7e1447a6..c840af4b03570 100644
--- a/sql/understanding-the-query-execution-plan.md
+++ b/sql/understanding-the-query-execution-plan.md
@@ -47,7 +47,7 @@ mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 0
5 rows in set (0.00 sec)
```
-Here we can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client.
+Here you can see that the coprocessor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet these criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client.
The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer is required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`:
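The `count` column shown in these plans is an estimate derived from table statistics, so stale statistics can make both the numbers reported by `EXPLAIN` and the optimizer's choice of plan misleading. A minimal, illustrative sketch of refreshing the statistics by hand after a large data load, reusing the `trips` table from the example:

```
mysql> ANALYZE TABLE trips;   -- rebuild table statistics so row-count estimates stay close to the real data
mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
```

With up-to-date statistics, the estimates attached to operators such as the index scan above should track the actual number of trips in the requested date range more closely.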