Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: print an expensive log when a query exceeds time threshold #10350

Merged
merged 31 commits into from Jun 5, 2019

Conversation

Projects
None yet
7 participants
@XuHuaiyu
Copy link
Contributor

commented May 5, 2019

What problem does this PR solve?

Print an expensive log when a query exceeds time threshold

What is changed and how it works?

  1. Keep the old check of the expensive query, but only use it to set the low priority of a query.
  2. Check the execution time for a query, and print an expensive log when the query exceeds the time threshold.
  3. Introduce a global/server level system variable tidb_expensive_query_time_threshold to set the time threshold.

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. Set tidb_expensive_query_time_threshold to 10s.
  2. Run
tidb> select *, sleep(4) from t;
+------+----------+
| a    | sleep(4) |
+------+----------+
|    1 |        0 |
|    1 |        0 |
|    1 |        0 |
|    1 |        0 |
+------+----------+
4 rows in set (16.01 sec)

We can get:

[2019/05/05 16:24:48.003 +08:00] [WARN] [adapter.go:521] [expensive_query] [conn=1] [request_count=1] [num_cop_tasks=1] [process_avg_time=0s] [process_p90_time=0s] [process_max_time=0s] [process_max_addr=store1] [wait_avg_time=0s] [wait_p90_time=0s] [wait_max_time=0s] [wait_max_addr=store1] [stats=t:pseudo] [conn_id=1] [user=root@127.0.0.1] [database=test] [table_ids="[39]"] [txn_start_ts=408169920069107712] [mem_max=107] [sql="select *, sleep(4) from t"]

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
Show resolved Hide resolved config/config.go Outdated

XuHuaiyu added some commits May 5, 2019

XuHuaiyu added some commits May 5, 2019

DefTiDBUseFastAnalyze = false
DefTiDBSkipIsolationLevelCheck = false
DefTiDBWaitTableSplitFinish = false
DefTiDBExpensiveQueryTimeThreshold = 60 // 30s

This comment has been minimized.

Copy link
@lamxTyler

lamxTyler May 5, 2019

Member

Why 30s? Shouldn't it be 60s?

}
if ticker != nil {
defer ticker.Stop()
<-ticker.C

This comment has been minimized.

Copy link
@lamxTyler

lamxTyler May 5, 2019

Member

If the ticker is already closed, will it still log the query?

This comment has been minimized.

Copy link
@XuHuaiyu

This comment has been minimized.

Copy link
@lamxTyler

lamxTyler May 5, 2019

Member

https://play.golang.org/p/R7kMIiy5lyc, seems that this function will never quit in this case.

@lysu
Copy link
Member

left a comment

maybe we need a simple performance test result, it seems per request a tick will have some performance impact in old go version.https://blog.csdn.net/siddontang/article/details/18370541

// LogExpensiveQuery logs the queries which exceed the time threshold or memory threshold.
func LogExpensiveQuery(ctx context.Context, sctx sessionctx.Context, ticker *time.Ticker, p plannercore.Plan, sql string) {
level := log.GetLevel()
if level > zapcore.WarnLevel {

This comment has been minimized.

Copy link
@lysu

lysu May 5, 2019

Member

maybe we can run without the creation of ticker and watch goroutine when level > zapcore.WarnLevel

a.StartTime = time.Now()
a.ExpensiveQueryTicker = time.NewTicker(time.Second * time.Duration(a.Ctx.GetSessionVars().ExpensiveQueryTimeThreshold))

This comment has been minimized.

Copy link
@lysu

lysu May 5, 2019

Member

it doesn't repeat running, maybe time.AfterFunc is more suitable? but I do not confirm cc @tiancaiamao (context.Without use time.AfterFunc)

Show resolved Hide resolved executor/adapter.go Outdated
@XuHuaiyu

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

@lysu
I made a simple sysbench test on mock-tikv, the performance loss was apparent. I'll try to refine it.

sysbench --config-file=config --num-threads=128 oltp_read_only run

master branch

SQL statistics:
    queries performed:
        read:                            29638
        write:                           0
        other:                           4234
        total:                           33872
    transactions:                        2117   (197.37 per sec.)
    queries:                             33872  (3157.98 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.7237s
    total number of events:              2117

Latency (ms):
         min:                                  137.03
         avg:                                  630.70
         max:                                 1133.19
         95th percentile:                      787.74
         sum:                              1335187.60

Threads fairness:
    events (avg/stddev):           16.5391/0.56
    execution time (avg/stddev):   10.4312/0.22 

this commit

SQL statistics:
    queries performed:
        read:                            25102
        write:                           0
        other:                           3586
        total:                           28688
    transactions:                        1793   (165.25 per sec.)
    queries:                             28688  (2643.93 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.8478s
    total number of events:              1793

Latency (ms):
         min:                                   43.01
         avg:                                  742.07
         max:                                 1226.68
         95th percentile:                      926.33
         sum:                              1330539.57

Threads fairness:
    events (avg/stddev):           14.0078/0.52
    execution time (avg/stddev):   10.3948/0.29
@codecov

This comment has been minimized.

Copy link

commented May 8, 2019

Codecov Report

Merging #10350 into master will decrease coverage by 0.1316%.
The diff coverage is 33.6734%.

@@               Coverage Diff               @@
##             master    #10350        +/-   ##
===============================================
- Coverage   79.6706%   79.539%   -0.1316%     
===============================================
  Files           414       415         +1     
  Lines         87863     87943        +80     
===============================================
- Hits          70001     69949        -52     
- Misses        12680     12809       +129     
- Partials       5182      5185         +3

@XuHuaiyu XuHuaiyu requested a review from lysu May 8, 2019

Show resolved Hide resolved session/session.go Outdated
Show resolved Hide resolved util/memory/action.go Outdated
}
processInfo := sm.ShowProcessList()
for connID, info := range processInfo {
if len(info.Info) == 0 || info.Time.Equal(lastQueryStartTime[connID]) {

This comment has been minimized.

Copy link
@qw4990

qw4990 May 24, 2019

Contributor

Do we need to clear this map periodically?

@XuHuaiyu XuHuaiyu requested a review from qw4990 May 24, 2019

Show resolved Hide resolved util/expensivequery/expensivequery.go Outdated
lastQueryStartTime[connID] = info.Time
}
}
if newInterval := time.Second * time.Duration(sctx.GetSessionVars().ExpensiveQueryTimeThreshold); curInterval != newInterval {

This comment has been minimized.

Copy link
@tiancaiamao

tiancaiamao May 24, 2019

Contributor

ExpensiveQueryTimeThreshold is a global session variable, how could changing a global session variable take effect on a existing session? (sctx is the first bootstrap session)

This comment has been minimized.

Copy link
@XuHuaiyu

XuHuaiyu May 27, 2019

Author Contributor

It's a global-level and server-level variable.
A server-level variable can take effect on an existing session.

This comment has been minimized.

Copy link
@tiancaiamao

tiancaiamao May 28, 2019

Contributor

server-level variable need a global variable ...

Show resolved Hide resolved util/expensivequery/expensivequery.go Outdated

XuHuaiyu added some commits May 27, 2019

@XuHuaiyu XuHuaiyu requested a review from tiancaiamao May 27, 2019

@qw4990
Copy link
Contributor

left a comment

LGTM, please resolve conflicts

XuHuaiyu added some commits Jun 5, 2019

@tiancaiamao

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

LGTM

@tiancaiamao

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

/run-all-tests

@tiancaiamao tiancaiamao merged commit f67352d into pingcap:master Jun 5, 2019

16 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 33.6734% of diff hit (target 0%)
Details
codecov/project 79.539% (-0.1317%) compared to 54e4894
Details
idc-jenkins-ci-tidb/build Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/build_check_race Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev_2 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/common-test job succeeded
Details
idc-jenkins-ci-tidb/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/mybatis-test job succeeded
Details
idc-jenkins-ci-tidb/sqllogic-test-1 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/sqllogic-test-2 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/unit-test Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details

@XuHuaiyu XuHuaiyu deleted the XuHuaiyu:expensive_log_time branch Jun 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.