Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 39 additions & 0 deletions sql/understanding-the-query-execution-plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,45 @@ Currently, the `EXPLAIN` statement returns the following four columns: id, count
| task | the task that the current operator belongs to. The current execution plan contains two types of tasks: 1) the **root** task that runs on the TiDB server; 2) the **cop** task that runs concurrently on the TiKV server. The topological relations of the current execution plan in the task level is that a root task can be followed by many cop tasks. The root task uses the output of cop task as the input. The cop task executes the tasks that TiDB pushes to TiKV. Each cop task scatters in the TiKV cluster and is executed by multiple processes. |
| operator info | The details about each operator. The information of each operator differs from others, see [Operator Info](#operator-info).|

### Example usage

Using the [bikeshare example database](../bikeshare-example-database.md):

```
mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+
| id | count | task | operator info |
+--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | funcs:count(col_0) |
| └─TableReader_21 | 1.00 | root | data:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) |
| └─Selection_19 | 8166.73 | cop | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) |
| └─TableScan_18 | 19117643.00 | cop | table:trips, range:[-inf,+inf], keep order:false |
+--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
```

Here you can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client.

The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer is required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`:

```
mysql> ALTER TABLE trips ADD INDEX (start_date);
..
mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+------------------------+---------+------+--------------------------------------------------------------------------------------------------+
| id | count | task | operator info |
+------------------------+---------+------+--------------------------------------------------------------------------------------------------+
| StreamAgg_25 | 1.00 | root | funcs:count(col_0) |
| └─IndexReader_26 | 1.00 | root | index:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) |
| └─IndexScan_24 | 8166.73 | cop | table:trips, index:start_date, range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false |
+------------------------+---------+------+--------------------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)
```

In the revisited `EXPLAIN` you can see the count of rows scanned has reduced via the use of an index. On a reference system, the query execution time reduced from 50.41 seconds to 0.00 seconds!

## Overview

### Introduction to task
Expand Down