Skip to content
Please note that GitHub no longer supports Internet Explorer.

We recommend upgrading to the latest Microsoft Edge, Google Chrome, or Firefox.

Learn more
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: support tagging SQL statements in pprof results #14312

Merged
merged 4 commits into from Jan 2, 2020

Conversation

@lysu
Copy link
Member

lysu commented Jan 2, 2020

What problem does this PR solve?

Related PR: *: replace conf item pprof_sql_cpu with srv var tidb_pprof_sql_cpu

In TiDB, we often use Go profiler (pprof) to analyze which parts of the code consume most of the CPU resources. But in actual user scenarios, when we’re troubleshooting the 100% CPU usage issue, it would also be of great help to know which parts of the SQL statement to over-consume CPU.

After v1.9, Go supports adding profiler labels that allow users to tag key-value pairs in pprof results, so we think tagging SQL statements in pprof would help. (ref: https://rakyll.org/profiler-labels/)

This PR adds SQL labels to TiDB's Go profiler.

What is changed and how it works?

Profiler labels are not free. There can be some costs to call runtime set goroutine, so this PR adds a config item and combine it with http://0.0.0.0:10080/reload-config to hot reload it.

[performance]
pprof-sql-cpu = false

The config item’s default value is false. You can set it to true and run curl http://0.0.0.0:10080/reload-config to analyze which parts of the SQL statement consume most of the CPU time.

Then, run curl http://0.0.0.0:10080/debug/pprof/profile --output p or curl http://0.0.0.0:10080/debug/pprof/zip --output tidb_debug.zip to get pprof results.

In the pprof results, you can observe tags as below:

go tool pprof  p1                                        
File: ___go_build_github_com_pingcap_tidb_tidb_server
Build ID: b1542297fdc1541f774e2ab7c6cc62de595ee7f1
Type: cpu
Time: Jan 2, 2020 at 11:55am (CST)
Duration: 30s, Total samples = 300ms (    1%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 sql: Total 40.0ms
      20.0ms (50.00%): insert into t ( a ) values ( ? )
      20.0ms (50.00%): select * from t where a > ?

image

NOTE:

  1. You need to set pprof-sql-cpu back to false and curl http://0.0.0.0:10080/reload-config after checking the pprof result.

  2. TiDB will only tag the SQL statements that are executed after setting pprof-sql-cpu to true.

Check List

Tests

  • Manual test
set conf and run pprof

Code changes

  • add config item
  • add reload
  • setup labels

Side effects

  • n/a

Related changes

  • Need to cherry-pick to the release branch

Release note

  • support tag sql in pprof result.

This change is Reviewable

tidb-server/main.go Outdated Show resolved Hide resolved
lysu added 2 commits Jan 2, 2020
Copy link
Member

jackysp left a comment

LGTM

@jackysp jackysp added the status/LGT1 label Jan 2, 2020
server/conn.go Show resolved Hide resolved
@qw4990
qw4990 approved these changes Jan 2, 2020
Copy link
Contributor

qw4990 left a comment

LGTM

@sre-bot

This comment has been minimized.

Copy link

sre-bot commented Jan 2, 2020

/run-all-tests

@sre-bot sre-bot merged commit 5d76de9 into pingcap:master Jan 2, 2020
14 checks passed
14 checks passed
idc-jenkins-ci-tidb/build Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/build_check_race Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev_2 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/common-test job succeeded
Details
idc-jenkins-ci-tidb/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-copr-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/mybatis-test job succeeded
Details
idc-jenkins-ci-tidb/sqllogic-test-1 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/sqllogic-test-2 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/unit-test Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details
@sre-bot

This comment has been minimized.

Copy link

sre-bot commented Jan 2, 2020

cherry pick to release-3.0 failed

@lysu

This comment has been minimized.

Copy link
Member Author

lysu commented Jan 3, 2020

/run-cherry-picker

@lysu lysu deleted the lysu:tag-sql-in-pprof branch Jan 3, 2020
@sre-bot

This comment has been minimized.

Copy link

sre-bot commented Jan 3, 2020

cherry pick to release-3.0 failed

@lysu

This comment has been minimized.

Copy link
Member Author

lysu commented Jan 7, 2020

Benchmark Report when set pprof-sql-cpu = true

Run Sysbench Performance Test on VMs

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: b398ef4d2ead6ae98287d9489976a7dc8dba8e49
+++ tidb: 91144ca41fcd86801e85ca8be2d748f3b3885568
tikv: 1211ed161c093ab180e42879146dedd05c4c6fb5
pd: 4498b6f203588f556cb6f6ebcdcb55f0afff0efa
================================================================================
oltp_update_index:
    * QPS: 4246.60 ± 0.04% (std=1.15) delta: 0.06% (p=0.363)
    * Latency p50: 30.14 ± 0.03% (std=0.01) delta: -0.07%
    * Latency p99: 54.84 ± 3.64% (std=1.22) delta: -1.81%
            
oltp_insert:
    * QPS: 7567.92 ± 0.04% (std=2.21) delta: 0.47% (p=0.029)
    * Latency p50: 16.91 ± 0.04% (std=0.00) delta: -0.49%
    * Latency p99: 29.07 ± 4.09% (std=0.78) delta: -1.76%
            
oltp_read_write:
    * QPS: 15694.90 ± 0.25% (std=27.78) delta: -0.52% (p=0.072)
    * Latency p50: 163.45 ± 0.21% (std=0.25) delta: 0.55%
    * Latency p99: 326.02 ± 1.81% (std=4.79) delta: 0.90%
            
oltp_point_select:
    * QPS: 41161.17 ± 0.17% (std=53.14) delta: -3.90% (p=0.020)
    * Latency p50: 3.11 ± 0.21% (std=0.00) delta: 4.02%
    * Latency p99: 10.09 ± 0.00% (std=0.00) delta: 3.70%
            
oltp_update_non_index:
    * QPS: 4715.66 ± 0.83% (std=28.71) delta: -0.12% (p=0.341)
    * Latency p50: 27.14 ± 0.84% (std=0.17) delta: 0.14%
    * Latency p99: 43.92 ± 2.41% (std=0.75) delta: 2.54%
            

Originally posted by @sre-bot in #14362 (comment)

@sre-bot

This comment has been minimized.

Copy link

sre-bot commented Jan 7, 2020

@lysu No command or invalid command

1 similar comment
@sre-bot

This comment has been minimized.

Copy link

sre-bot commented Jan 7, 2020

@lysu No command or invalid command

@yikeke yikeke changed the title server: support tag sql in pprof result server: support tagging SQL statements in pprof results Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.