diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index 39a29d93e2a18..c840af4b03570 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -29,6 +29,45 @@ Currently, the `EXPLAIN` statement returns the following four columns: id, count | task | the task that the current operator belongs to. The current execution plan contains two types of tasks: 1) the **root** task that runs on the TiDB server; 2) the **cop** task that runs concurrently on the TiKV server. The topological relations of the current execution plan in the task level is that a root task can be followed by many cop tasks. The root task uses the output of cop task as the input. The cop task executes the tasks that TiDB pushes to TiKV. Each cop task scatters in the TiKV cluster and is executed by multiple processes. | | operator info | The details about each operator. The information of each operator differs from others, see [Operator Info](#operator-info).| +### Example usage + +Using the [bikeshare example database](../bikeshare-example-database.md): + +``` +mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| StreamAgg_20 | 1.00 | root | funcs:count(col_0) | +| └─TableReader_21 | 1.00 | root | data:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─Selection_19 | 8166.73 | cop | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) | +| └─TableScan_18 | 19117643.00 | cop | table:trips, range:[-inf,+inf], keep order:false | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +5 rows in set (0.00 sec) +``` + +Here you can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client. + +The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer is required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: + +``` +mysql> ALTER TABLE trips ADD INDEX (start_date); +.. +mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| StreamAgg_25 | 1.00 | root | funcs:count(col_0) | +| └─IndexReader_26 | 1.00 | root | index:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─IndexScan_24 | 8166.73 | cop | table:trips, index:start_date, range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +4 rows in set (0.01 sec) +``` + +In the revisited `EXPLAIN` you can see the count of rows scanned has reduced via the use of an index. On a reference system, the query execution time reduced from 50.41 seconds to 0.00 seconds! + ## Overview ### Introduction to task