*: make `explain` support `explain analyze` #7827

lysu · 2018-10-02T04:13:55Z

What problem does this PR solve?

Currently, our `explain` command only displays the physical plan without executing it.

But sometimes we also need execution info to help us find real plan problems or executor performance bottlenecks.

This PR adds an `analyze` option to the `explain` command (just like PostgreSQL's `explain analyze`, see https://www.postgresql.org/docs/9.1/static/sql-explain.html). It causes the statement to be actually executed, not only planned.

It works like this:

mysql> explain analyze select a_id, count(*) from v left join `u` on `u`.id = v.u_id where p_id = 1 and `u`.t > '2018-08-01' and v.t>'2018-08-01' group by a_id;
+------------------------------+---------+------+---------------------------------------------------------------------------------------+---------------------------------------+
| id                           | count   | task | operator info                                                                         | execution_info                        |
+------------------------------+---------+------+---------------------------------------------------------------------------------------+---------------------------------------+
| Projection_7                 | 2.67    | root | test.v.a_id, 5_col_0                                                                  | actual_time:0.504215, loops:1, rows:0 |
| └─HashAgg_10                 | 2.67    | root | group by:test.v.a_id, funcs:count(1), firstrow(test.v.a_id)                           | time:0.497296, loops:1, rows:0        |
|   └─IndexJoin_18             | 4.17    | root | inner join, inner:TableReader_17, outer key:test.v.u_id, inner key:test.u.id          | time:0.394209, loops:1, rows:0        |
|     ├─IndexLookUp_40         | 3.33    | root |                                                                                       | time:0.365342, loops:1, rows:0        |
|     │ ├─IndexScan_37         | 10.00   | cop  | table:v, index:p_id, u_id, range:[1,1], keep order:false, stats:pseudo                |                                       |
|     │ └─Selection_39         | 3.33    | cop  | gt(test.v.t, 2018-08-01 00:00:00.000000)                                              |                                       |
|     │   └─TableScan_38       | 10.00   | cop  | table:v, keep order:false, stats:pseudo                                               |                                       |
|     └─TableReader_17         | 3333.33 | root | data:Selection_16                                                                     |                                       |
|       └─Selection_16         | 3333.33 | cop  | gt(test.u.t, 2018-08-01 00:00:00.000000)                                              |                                       |
|         └─TableScan_15       | 10.00   | cop  | table:u, range: decided by [test.v.u_id], keep order:false, stats:pseudo              |                                       |
+------------------------------+---------+------+---------------------------------------------------------------------------------------+---------------------------------------+
10 rows in set (0.00 sec)

The difference is the additional `execution_info` column, and the statement has actually been executed.

What is changed and how it works?

Change `Executor`'s `Next` from a method to a func field so that it can be rebound.
Bind either the normal `Next` or the explain version of `Next` once, during the plan-build phase.
Maintain `ExecStats` (a map) and initialize each `ExecStat` entry during plan building (this is single-threaded, so the map itself has no race condition, but for the entries...).
Add a switch at the `SessionCtx.StmtCtx` level to choose normal or explain execution.
Modify `Explain` to turn the switch on, actually execute the plan, and output the additional column.
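The func-field binding described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not TiDB's code: `Chunk`, `ExecStat`, `bindNext`, `newProducer` and `drain` are simplified or hypothetical stand-ins. The executor's `Next` only calls a func field that is bound once at build time, either to the plain implementation or to a wrapper that records time, loops and rows.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Chunk is a hypothetical stand-in for chunk.Chunk that only tracks a row count.
type Chunk struct{ rows int }

// NumRows mirrors the chunk.Chunk method of the same name.
func (c *Chunk) NumRows() int { return c.rows }

// ExecStat accumulates what the execution info column reports.
type ExecStat struct {
	loops   int32         // number of Next calls
	rows    int64         // total rows returned
	consume time.Duration // total time spent in Next
}

// Record accounts for one Next call.
func (s *ExecStat) Record(d time.Duration, rows int) {
	s.loops++
	s.rows += int64(rows)
	s.consume += d
}

type nextFunc func(ctx context.Context, chk *Chunk) error

// baseExecutor stores Next as a func field so the builder can rebind it.
type baseExecutor struct {
	next     nextFunc
	execStat *ExecStat // nil when not running explain analyze
}

// Next dispatches to whichever version was bound at build time.
func (e *baseExecutor) Next(ctx context.Context, chk *Chunk) error { return e.next(ctx, chk) }

// bindNext runs once while building and picks the normal or the explain version.
func bindNext(e *baseExecutor, impl nextFunc, analyze bool) {
	if !analyze {
		e.next = impl
		return
	}
	e.execStat = &ExecStat{}
	e.next = func(ctx context.Context, chk *Chunk) error {
		start := time.Now()
		err := impl(ctx, chk)
		e.execStat.Record(time.Since(start), chk.NumRows())
		return err
	}
}

// newProducer is a hypothetical operator body emitting `batches` chunks of `size` rows.
func newProducer(batches, size int) nextFunc {
	return func(ctx context.Context, chk *Chunk) error {
		chk.rows = 0
		if batches > 0 {
			chk.rows = size
			batches--
		}
		return nil
	}
}

// drain calls Next until an empty chunk signals the end of data.
func drain(e *baseExecutor) error {
	chk := &Chunk{}
	for {
		if err := e.Next(context.Background(), chk); err != nil {
			return err
		}
		if chk.NumRows() == 0 {
			return nil
		}
	}
}

func main() {
	e := &baseExecutor{}
	bindNext(e, newProducer(3, 2), true)
	if err := drain(e); err != nil {
		panic(err)
	}
	fmt.Printf("loops:%d, rows:%d\n", e.execStat.loops, e.execStat.rows)
}
```

Three 2-row chunks plus the final empty call give 4 loops and 6 rows. With `analyze` off, no wrapper is installed and `execStat` stays nil.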

Remaining question

Cop tasks still need to get execution info from TiKV.

Check List

Tests

Old test
Manual test (add detailed scripts or steps below)

Code changes

Has exported function/method change
Has exported variable/fields change
Has interface methods change

Side effects

No

Related changes

Need to update the documentation
Need to be included in the release note

lysu · 2018-10-02T04:18:16Z

/run-all-tests

shenli · 2018-10-02T13:06:00Z

/cc @zhexuany
@lysu Is this the same thing as tracing?

lysu · 2018-10-03T10:26:12Z

@shenli I think it's a different perspective on the same problem.

`explain analyze` only focuses on plan execution stats to help find plan problems (so in this "view" we can combine plan details with execution info). PostgreSQL-like databases seem to have both `explain analyze` and tracing at the same time.

`explain analyze` collects and uses aggregated stats (sum of time, rows, count) at the executor level, while tracing collects lower-level data (timeline, rows per call, call spans... so it can find more detailed problems) and displays it at a custom level. So tracing would need some aggregation to get the same view; that is not simple, and it would collect other info that is useless for plan analysis.

It's still a proposal and needs more discussion; maybe eventually we can make tracing data the stats source for this command too. 😆

zz-jason · 2018-10-08T02:05:30Z

@lysu we don't need to bind different next() functions. We only need to add an if statement in the next() function of every operator to record the executor statistics, and these executor statistics can also be reused in the trace statement.
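A minimal sketch of the "if statement in every `Next()`" alternative, with a hypothetical `batchExec` operator and a simplified `RuntimeStats` (names assumed, not TiDB's actual types). When `explain analyze` is off, the stats pointer is nil and the only cost is one nil check per call. The deferred closure reads `start` and the named result `n` when `Next` returns; deferring `Record(time.Since(start), n)` directly would evaluate its arguments too early.

```go
package main

import (
	"fmt"
	"time"
)

// RuntimeStats is a simplified, hypothetical version of the per-executor counters.
type RuntimeStats struct {
	loops   int32
	rows    int64
	consume time.Duration
}

// Record accounts for one Next call.
func (s *RuntimeStats) Record(d time.Duration, rows int) {
	s.loops++
	s.rows += int64(rows)
	s.consume += d
}

// batchExec is a hypothetical operator emitting `remaining` rows, at most `batch` per call.
type batchExec struct {
	runtimeStats *RuntimeStats // nil unless the statement is explain analyze
	remaining    int
	batch        int
}

// Next returns how many rows this call produced; 0 means no more data.
func (e *batchExec) Next() (n int) {
	if e.runtimeStats != nil {
		start := time.Now()
		// Runs when Next returns, so it sees the final n and the full elapsed time.
		defer func() { e.runtimeStats.Record(time.Since(start), n) }()
	}
	n = e.batch
	if n > e.remaining {
		n = e.remaining
	}
	e.remaining -= n
	return n
}

func main() {
	e := &batchExec{runtimeStats: &RuntimeStats{}, remaining: 5, batch: 2}
	for e.Next() > 0 {
	}
	fmt.Printf("loops:%d, rows:%d\n", e.runtimeStats.loops, e.runtimeStats.rows)
}
```

For five rows in batches of two, this records four loops (2, 2, 1 and the final empty call) and five rows.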

eurekaka

Nice work, LGTM up till now.

winoros · 2018-10-08T05:39:32Z

Side effects
Possible performance regression

Will there be that?

winoros · 2018-10-08T05:48:58Z

Do we need to record the loop information?

eurekaka · 2018-10-08T06:43:32Z

Side effects
Possible performance regression

Will there be that?

Deeper call stack I guess.

lysu · 2018-10-08T06:56:44Z

Yes, the call stack will get deeper. The current approach changes Next() from a method to a function pointer to make wrapping Next easier, but calling through a function pointer seems slower than the previous way. Using @zz-jason's if-statement approach would perform better, but it requires modifying every operator.

And loops seem useful when the chunk size is heavily customized and the number of Next calls is not easy to deduce from rows.

zz-jason · 2018-10-08T07:00:18Z

@lysu both ways require modifying every operator.

lysu · 2018-10-08T07:58:40Z

@zz-jason yes, it needs changes to every operator to support the cut point, but the "tracking logic" isn't distributed to every operator; it lives in just one place:

	e.nextFunc = func(ctx context.Context, chk *chunk.Chunk) error {
		start := time.Now()
		err := nextFunc(ctx, chk)
		e.execStat.Record(time.Now().Sub(start), chk.NumRows())
		return err
	}

And new operators that follow the current idiom won't need to take care of the "tracking logic" any more.

Metrics are classic cross-cutting logic, just like https://en.wikipedia.org/wiki/Aspect-oriented_programming or Go's http.Handler middleware (Prometheus instrumentation doesn't need to be called inside every HTTP handler).

But performance is a concern, so I will change it to the if-statement approach later 😆

winoros · 2018-10-08T11:50:04Z

Will there be a test that checks its explain result?

lysu · 2018-10-09T06:07:49Z

@winoros the time part changes on every run, so I added some simple tests in https://github.com/pingcap/tidb/pull/7827/files#diff-ddcc9b9aba1b5bc2d2338389e13a3bd8R40 PTAL

lysu · 2018-10-09T06:11:48Z

/run-all-tests

lysu · 2018-10-11T11:07:54Z

@zz-jason @jackysp PTAL if free

zz-jason · 2018-10-11T11:16:13Z

util/execdetails/execdetails.go

+type ExecStats map[string]*ExecStat
+
+// ExecStat collects one executor's execution info.
+type ExecStat struct {


How about: s/ExecStat/RuntimeStats/

Please add comments for each struct field.

zz-jason · 2018-10-11T11:16:49Z

util/execdetails/execdetails.go

+}
+
+// NewExecutorStats creates new executor collector.
+func NewExecutorStats() ExecStats {


s/NewExecutorStats/NewRuntimeStats/

zz-jason · 2018-10-11T11:22:32Z

util/execdetails/execdetails.go

@@ -52,3 +54,55 @@ func (d ExecDetails) String() string {
 	}
 	return strings.Join(parts, " ")
 }
+
+// ExecStats collects executors's execution info.
+type ExecStats map[string]*ExecStat


Why use a map to store the runtime statistics of all the executors? Maybe it's simpler to make each executor hold an ExecStat object?

Because I want to get the RuntimeStat from a PhysicalPlan, this map gives an ExplainID-to-RuntimeStat mapping. Or do we have a better way to get the Executor from a PhysicalPlan? 🐱

And it seems some physical plans have a "1 to N" relationship with executors (e.g. IndexLookUpExecutor).

Besides, a map lookup also makes it easy to combine the coprocessor's RuntimeStats in the future?

zz-jason · 2018-10-11T11:28:08Z

executor/update.go

@@ -112,6 +114,12 @@ func (e *UpdateExec) canNotUpdate(handle types.Datum) bool {

 // Next implements the Executor Next interface.
 func (e *UpdateExec) Next(ctx context.Context, chk *chunk.Chunk) error {
+	if e.execStat != nil {
+		start := time.Now()
+		defer func() {


it can be simplified to:

defer e.execStat.Record(time.Now().Sub(start), chk.NumRows())

Does the update statement support the explain statement?

update supports it: it executes the selectExec part without modifying records... but this line was added without much thought and can be removed too... 😹

zz-jason · 2018-10-11T11:29:50Z

executor/simple.go

@@ -49,6 +50,12 @@ type SimpleExec struct {

 // Next implements the Executor Next interface.
 func (e *SimpleExec) Next(ctx context.Context, chk *chunk.Chunk) (err error) {


Most of the statements executed by SimpleExec cannot be explained.

zz-jason · 2018-10-11T11:30:09Z

executor/show.go

@@ -64,6 +64,12 @@ type ShowExec struct {

 // Next implements the Executor Next interface.
 func (e *ShowExec) Next(ctx context.Context, chk *chunk.Chunk) error {


The show statement cannot be explained.

zz-jason · 2018-10-11T11:30:19Z

executor/set.go

@@ -43,6 +43,12 @@ type SetExecutor struct {

 // Next implements the Executor Next interface.
 func (e *SetExecutor) Next(ctx context.Context, chk *chunk.Chunk) error {


crazycs520 · 2018-10-11T14:06:29Z

executor/aggregate.go

@@ -501,6 +502,10 @@ func (w *HashAggFinalWorker) run(ctx sessionctx.Context, waitGroup *sync.WaitGro

 // Next implements the Executor Next interface.
 func (e *HashAggExec) Next(ctx context.Context, chk *chunk.Chunk) error {
+	if e.runtimeStat != nil {
+		start := time.Now()
+		defer e.runtimeStat.Record(time.Now().Sub(start), chk.NumRows())


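A deferred function's arguments are evaluated when the defer statement executes, not when the deferred call runs:

	package main

	import (
		"fmt"
		"time"
	)

	func main() {
		foo()
		foo1()
	}

	// foo defers a closure, so time.Since(start) is evaluated when foo returns.
	func foo() {
		start := time.Now()
		defer func() { fmt.Println(time.Since(start)) }()
		time.Sleep(1 * time.Second)
	}

	// foo1 defers fmt.Println directly, so its argument is evaluated at the defer statement.
	func foo1() {
		start := time.Now()
		defer fmt.Println(time.Since(start))
		time.Sleep(1 * time.Second)
	}

output:

	1.001687191s
	319ns

Hmm... that's right. @zz-jason you misled me. 🤣

😂 I'm sorry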

jackysp · 2018-10-12T11:08:05Z

executor/revoke.go

@@ -15,7 +15,6 @@ package executor

 import (
 	"fmt"
-


Is this modification caused by gofmt?

jackysp · 2018-10-12T11:08:42Z

executor/prepared_test.go

@@ -178,11 +178,6 @@ func (s *testSuite) TestPrepared(c *C) {
 		_, _, fields, err = tk.Se.PrepareStmt("update prepare3 set a = ?")
 		c.Assert(err, IsNil)
 		c.Assert(len(fields), Equals, 0)
-
-		// Coverage.


Why remove this test?

jackysp · 2018-10-12T11:08:55Z

executor/load_stats.go

@@ -15,7 +15,6 @@ package executor

 import (
 	"encoding/json"
-


Is this modification caused by gofmt?

- rename - remove unused code

jackysp

LGTM

zz-jason · 2018-10-12T12:01:46Z

it seems that there is only one LGTM?

jackysp · 2018-10-12T12:08:44Z

It seems @eurekaka sent another LGTM.

zz-jason · 2018-10-12T12:13:19Z

planner/core/planbuilder.go

+		case ast.ExplainFormatROW:
+			retFields := []string{"id", "count", "task", "operator info"}
+			if explain.Analyze {
+				retFields = append(retFields, "execution_info")


s/execution_info/execution info/

zz-jason · 2018-10-12T12:15:56Z

planner/core/common_plans.go

-	row := []string{e.prettyIdentifier(p.ExplainID(), indent, isLastChild), count, TaskType, operatorInfo}
+	row := []string{e.prettyIdentifier(p.ExplainID(), indent, isLastChild), count, taskType, operatorInfo}
+	if e.Analyze {
+		runtimeStat := e.ctx.GetSessionVars().StmtCtx.RuntimeStats


s/runtimeStat/runtimeStats/

zz-jason · 2018-10-12T12:19:13Z

executor/executor.go

@@ -130,6 +134,7 @@ func newBaseExecutor(ctx sessionctx.Context, schema *expression.Schema, id strin
 		schema:       schema,
 		initCap:      ctx.GetSessionVars().MaxChunkSize,
 		maxChunkSize: ctx.GetSessionVars().MaxChunkSize,
+		runtimeStat:  ctx.GetSessionVars().StmtCtx.RuntimeStats.GetRuntimeStat(id),


e.runtimeStat will always be set, no matter whether it is in the explain analyze statement.

In the non-analyze case ctx.GetSessionVars().StmtCtx.RuntimeStats == nil, so GetRuntimeStat(id) returns nil quickly. That only adds one extra nil assignment, but it looks more uniform?

Actually it doesn't even add an extra memory write, because it's in the struct initializer.

zz-jason · 2018-10-12T12:21:35Z

executor/executor.go

@@ -75,6 +78,7 @@ type baseExecutor struct {
 	maxChunkSize  int
 	children      []Executor
 	retFieldTypes []*types.FieldType
+	runtimeStat   *execdetails.RuntimeStat


s/Stat/Stats/
stats is short for statistics in our codebase.

but runtimeStats is a collection of runtimeStat

how about s/RuntimeStat/ExecutorStats/ and keep RuntimeStats a collection of the ExecutorStats?

it feels stranger 🤣

then how about s/RuntimeStats/RuntimeStatsColl/ and s/RuntimeStat/RuntimeStats/?

zz-jason · 2018-10-12T12:25:48Z

executor/builder.go

@@ -659,6 +659,33 @@ func (b *executorBuilder) buildTrace(v *plannercore.Trace) Executor {

 // buildExplain builds a explain executor. `e.rows` collects final result to `ExplainExec`.
 func (b *executorBuilder) buildExplain(v *plannercore.Explain) Executor {
+	if v.Analyze {


We should move this to the function body of ExplainExec.Next().

lysu · 2018-10-12T12:27:30Z

@zz-jason shall I open another PR to fix this?

zz-jason · 2018-10-12T12:28:56Z

@lysu OK

zz-jason · 2018-10-12T12:32:53Z

util/execdetails/execdetails.go

+	if e == nil {
+		return ""
+	}
+	return fmt.Sprintf("time:%f, loops:%d, rows:%d", time.Duration(e.consume).Seconds()*1e3, e.loop, e.rows)


it's better to specify the time unit in the string message. for example: "time:%fms, loops:%d, rows:%d"
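Following that suggestion, a minimal sketch of a `String` method that labels the unit. The struct is a simplified stand-in, and `consume` is assumed to hold nanoseconds, matching the `time.Duration(e.consume)` conversion in the reviewed snippet.

```go
package main

import (
	"fmt"
	"time"
)

// RuntimeStats is a simplified stand-in; consume is assumed to be nanoseconds.
type RuntimeStats struct {
	loops   int32
	rows    int64
	consume int64
}

// String reports time in milliseconds and says so explicitly.
func (e *RuntimeStats) String() string {
	if e == nil {
		return ""
	}
	ms := time.Duration(e.consume).Seconds() * 1e3
	return fmt.Sprintf("time:%fms, loops:%d, rows:%d", ms, e.loops, e.rows)
}

func main() {
	s := &RuntimeStats{loops: 3, rows: 42, consume: int64(1500 * time.Microsecond)}
	fmt.Println(s.String())
}
```

An alternative is to format `time.Duration(e.consume)` with `%v`, which picks a unit automatically (e.g. `1.5ms`).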

zz-jason · 2018-10-12T12:35:56Z

util/execdetails/execdetails.go

+	return runtimecStat
+}
+
+func (e RuntimeStats) String() string {


Can this method be removed? BTW, the result is not stable, because it is built by ranging over a map.

It's currently unused and can be removed.
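If such a method is ever needed, a stable version can sort the plan IDs before formatting, since Go deliberately randomizes map iteration order. A minimal sketch with simplified, hypothetical types:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RuntimeStats is a simplified per-plan entry.
type RuntimeStats struct {
	loops int32
	rows  int64
}

// RuntimeStatsColl maps plan IDs to their stats.
type RuntimeStatsColl map[string]*RuntimeStats

// String sorts plan IDs first so the output is deterministic.
func (c RuntimeStatsColl) String() string {
	ids := make([]string, 0, len(c))
	for id := range c {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		s := c[id]
		parts = append(parts, fmt.Sprintf("%s{loops:%d, rows:%d}", id, s.loops, s.rows))
	}
	return strings.Join(parts, "; ")
}

func main() {
	coll := RuntimeStatsColl{"Projection_7": {loops: 1}, "HashAgg_10": {loops: 1}}
	fmt.Println(coll.String())
}
```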

zz-jason · 2018-10-12T12:42:22Z

executor/distsql.go

@@ -453,6 +458,7 @@ func (e *IndexLookUpExecutor) startIndexWorker(ctx context.Context, kvRanges []k
 func (e *IndexLookUpExecutor) startTableWorker(ctx context.Context, workCh <-chan *lookupTableTask) {
 	lookupConcurrencyLimit := e.ctx.GetSessionVars().IndexLookupConcurrency
 	e.tblWorkerWg.Add(lookupConcurrencyLimit)
+	e.baseExecutor.ctx.GetSessionVars().StmtCtx.RuntimeStats.GetRuntimeStat(e.id + "_tableReader")


I think this collected stats is never used by the ExplainExec

You mean the `e.id + "_tableReader"` part?

Because you only retrieve the runtime statistics of an operator by the operator id.

https://github.com/pingcap/tidb/pull/7827/files#diff-31b1cb91591c88f55f52073d912b1e82R455

Oh, this is by design: GetRuntimeStat pre-initializes the e.id + "_tableReader" slot in RuntimeStatsColl. After this call we fork goroutines that create executors, so pre-initializing prevents a race condition on modifying the RuntimeStatsColl map, and RuntimeStatsColl can live without a mutex.

I think we don't need to collect the runtime statistics of the table worker of the IndexLookupExecutor, because it is never used and presented to the user in the ExplainExec.

ok, _tableReader can be removed~

zz-jason · 2018-10-12T13:11:55Z

Another thing: for index join, its inner child is built again and again according to the outer join keys. We need to collect all the runtime statistics of the built inner children and aggregate them into a single result to present to the user.

lysu · 2018-10-12T13:23:52Z

Yes. Since the inner child has a unique plan ID (e.g. IndexLookUp_8), the inner executors created multiple times share the same RuntimeStats instance and their stats sum up.

zz-jason · 2018-10-12T13:29:04Z

util/execdetails/execdetails.go

+}
+
+// GetRuntimeStat gets execStat for a executor.
+func (e RuntimeStats) GetRuntimeStat(planID string) *RuntimeStat {


maybe we need a mutex to prevent data races on the map, considering that the inner children of the index join are executed in parallel.

Do we really have no way to pre-initialize the inner planID's entry to keep it lock-free?

OK, GetRuntimeStat is a low-frequency operation... let's lock.

I think a lock is much easier.

lysu added status/WIP proposal component/tools labels Oct 2, 2018

eurekaka added the status/all tests passed label Oct 8, 2018

zz-jason added sig/execution SIG execution and removed component/tools proposal labels Oct 8, 2018

eurekaka reviewed Oct 8, 2018

lysu removed the status/WIP label Oct 8, 2018

zz-jason reviewed Oct 11, 2018

crazycs520 reviewed Oct 11, 2018

lysu requested a review from jackysp October 12, 2018 10:52

jackysp reviewed Oct 12, 2018

jackysp reviewed Oct 12, 2018

executor: collect executor time/loop/row count

57105b8

lysu added 7 commits October 12, 2018 19:16

executor: fix rebase question

28db228

address comment

30e0278

- rename - remove unuse code

rename some variable

2873ff1

fix defer bug

cd6e960

address comment

0de7f6e

address comment

a1b68b4

Merge branch 'master' into dev/track-exec

e3411ca

jackysp approved these changes Oct 12, 2018

jackysp merged commit d21f294 into pingcap:master Oct 12, 2018

zz-jason reviewed Oct 12, 2018

This was referenced Oct 12, 2018

executor: refine explain analyze #7888

Merged

*: make explain support explain anaylze (#7827)(#7888) #7925

Merged

ngaut pushed a commit that referenced this pull request Oct 18, 2018

*: make explain support explain anaylze (#7827)(#7888) (#7925)

52d5ee2
