server: add rpcserver to get other tidb server info for diagnostics #13693

Merged
merged 23 commits into from Nov 26, 2019

Conversation

@crazycs520
Member

crazycs520 commented Nov 22, 2019

What problem does this PR solve?

  • Add an RPC server for retrieving information from other TiDB servers, for use in diagnostics.
  • Add the diagnostics gRPC service.

Example:

>select * from INFORMATION_SCHEMA.TIDB_CLUSTER_LOAD where value != "0" order by device;
+------+---------------+-------------+---------------+---------------------+---------------------+
| TYPE | ADDRESS       | DEVICE_TYPE | DEVICE_NAME   | KEY                 | VALUE               |
+------+---------------+-------------+---------------+---------------------+---------------------+
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | inodes-total        | 4.294967279e+09     |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | used                | 3.10417584128e+11   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | free                | 1.88119605248e+11   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | total               | 4.98799333376e+11   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | inodes-free         | 4.292144101e+09     |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | used-percent        | 62.26568262972274   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | fstype              | hfs                 |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | path                | /                   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | inodes-used-percent | 0.06573223534912057 |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk0s3  | inodes-used         | 2.823178e+06        |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | used                | 8.3307286528e+10    |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | free                | 4.4500492288e+10    |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | inodes-free         | 4.3463652e+07       |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | fstype              | ntfs                |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | total               | 1.27807778816e+11   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | path                | /Volumes/Untitled   |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | inodes-total        | 4.3623656e+07       |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | inodes-used-percent | 0.3667826465530537  |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | inodes-used         | 160004              |
| tidb | 0.0.0.0:10080 | disk        | /dev/disk1s3  | used-percent        | 65.18170278816467   |
| tidb | 0.0.0.0:10080 | net         | XHC0          | name                | XHC0                |
| tidb | 0.0.0.0:10080 | net         | XHC20         | name                | XHC20               |
| tidb | 0.0.0.0:10080 | net         | awdl0         | name                | awdl0               |
| tidb | 0.0.0.0:10080 | net         | awdl0         | bytes-sent          | 16264               |
| tidb | 0.0.0.0:10080 | net         | awdl0         | packets-sent        | 82                  |
| tidb | 0.0.0.0:10080 | cpu         | cpu           | load1               | 2.3154296875        |
| tidb | 0.0.0.0:10080 | cpu         | cpu           | load15              | 3.78076171875       |
| tidb | 0.0.0.0:10080 | cpu         | cpu           | load5               | 3.185546875         |
| tidb | 0.0.0.0:10080 | cpu         | cpu-0         | usage               | 18.181818181818183  |
| tidb | 0.0.0.0:10080 | cpu         | cpu-10        | usage               | 20                  |
| tidb | 0.0.0.0:10080 | cpu         | cpu-2         | usage               | 10                  |
| tidb | 0.0.0.0:10080 | cpu         | cpu-4         | usage               | 11.11111111111111   |
| tidb | 0.0.0.0:10080 | cpu         | cpu-6         | usage               | 10                  |
| tidb | 0.0.0.0:10080 | cpu         | cpu-8         | usage               | 10                  |
| tidb | 0.0.0.0:10080 | disk        | devfs         | used-percent        | 100                 |
| tidb | 0.0.0.0:10080 | disk        | devfs         | inodes-used-percent | 100                 |
| tidb | 0.0.0.0:10080 | disk        | devfs         | inodes-used         | 668                 |
| tidb | 0.0.0.0:10080 | disk        | devfs         | inodes-total        | 668                 |
| tidb | 0.0.0.0:10080 | disk        | devfs         | path                | /dev                |
| tidb | 0.0.0.0:10080 | disk        | devfs         | used                | 197632              |
| tidb | 0.0.0.0:10080 | disk        | devfs         | total               | 197632              |
| tidb | 0.0.0.0:10080 | disk        | devfs         | fstype              | devfs               |
| tidb | 0.0.0.0:10080 | disk        | disk0         | write-count         | 907824              |
| tidb | 0.0.0.0:10080 | disk        | disk0         | write-time          | 106176              |
| tidb | 0.0.0.0:10080 | disk        | disk0         | io-time             | 193230              |
| tidb | 0.0.0.0:10080 | disk        | disk0         | read-bytes          | 1.1506114048e+10    |
| tidb | 0.0.0.0:10080 | disk        | disk0         | read-time           | 87053               |
| tidb | 0.0.0.0:10080 | disk        | disk0         | serial-number       |                     |
| tidb | 0.0.0.0:10080 | disk        | disk0         | read-count          | 325149              |
| tidb | 0.0.0.0:10080 | disk        | disk0         | name                | disk0               |
| tidb | 0.0.0.0:10080 | disk        | disk0         | label               |                     |
| tidb | 0.0.0.0:10080 | disk        | disk0         | write-bytes         | 3.7249970688e+10    |
| tidb | 0.0.0.0:10080 | disk        | disk1         | read-time           | 64                  |
| tidb | 0.0.0.0:10080 | disk        | disk1         | io-time             | 64                  |
| tidb | 0.0.0.0:10080 | disk        | disk1         | read-bytes          | 4.211712e+06        |
| tidb | 0.0.0.0:10080 | disk        | disk1         | read-count          | 1057                |
| tidb | 0.0.0.0:10080 | disk        | disk1         | name                | disk1               |
| tidb | 0.0.0.0:10080 | disk        | disk1         | serial-number       |                     |
| tidb | 0.0.0.0:10080 | disk        | disk1         | label               |                     |
| tidb | 0.0.0.0:10080 | net         | en0           | name                | en0                 |
| tidb | 0.0.0.0:10080 | net         | en1           | packets-recv        | 1.578578e+06        |
| tidb | 0.0.0.0:10080 | net         | en1           | name                | en1                 |

.
.
.
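
The VALUE column above holds numbers as strings, some in scientific notation. As a small worked check, the sample rows for /dev/disk0s3 are consistent with used-percent being used / (used + free) * 100 rather than used / total. This is an observation from the sample data only, not something the PR states, and the helper name usedPercent is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// usedPercent parses the VALUE strings of the "used" and "free" keys and
// returns used / (used + free) * 100. The formula is an assumption inferred
// from the sample rows, not taken from the PR's code.
func usedPercent(used, free string) (float64, error) {
	u, err := strconv.ParseFloat(used, 64)
	if err != nil {
		return 0, fmt.Errorf("parse used: %w", err)
	}
	f, err := strconv.ParseFloat(free, 64)
	if err != nil {
		return 0, fmt.Errorf("parse free: %w", err)
	}
	if u+f == 0 {
		return 0, fmt.Errorf("used and free are both zero")
	}
	return u / (u + f) * 100, nil
}

func main() {
	// Values copied from the /dev/disk0s3 rows in the sample output.
	pct, err := usedPercent("3.10417584128e+11", "1.88119605248e+11")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f\n", pct) // prints 62.27
}
```

The same calculation on the /dev/disk1s3 rows gives roughly 65.18, which also agrees with the used-percent value shown for that device.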

What is changed and how it works?

  • Add a gRPC server that listens on port 10080, sharing that port with the HTTP server.
  • Implement the diagnostics gRPC service.
  • Add the INFORMATION_SCHEMA.TIDB_CLUSTER_LOAD_INFO system table.
  • The diagnostics repo will be used by both TiDB and PD.
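
One common way to serve gRPC and plain HTTP on a single port is to look at each incoming request and dispatch on its HTTP version and Content-Type. The sketch below shows that idea with the Go standard library and stub handlers only. The names looksLikeGRPC, splitHandler, and route are hypothetical, and the PR itself may share the port differently (for example by multiplexing at the connection level).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// looksLikeGRPC reports whether a request with the given HTTP major version
// and Content-Type header looks like a gRPC call (HTTP/2 + application/grpc).
func looksLikeGRPC(protoMajor int, contentType string) bool {
	return protoMajor == 2 && strings.HasPrefix(contentType, "application/grpc")
}

// splitHandler sends gRPC-looking requests to grpcH and everything else to httpH.
func splitHandler(grpcH, httpH http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if looksLikeGRPC(r.ProtoMajor, r.Header.Get("Content-Type")) {
			grpcH.ServeHTTP(w, r)
			return
		}
		httpH.ServeHTTP(w, r)
	})
}

// route pushes one synthetic request through splitHandler and returns the
// name of the stub handler that answered it. Demo helper only.
func route(protoMajor int, contentType string) string {
	grpcStub := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "grpc")
	})
	httpStub := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "http")
	})
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	req.ProtoMajor = protoMajor
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	splitHandler(grpcStub, httpStub).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(route(2, "application/grpc")) // prints "grpc"
	fmt.Println(route(1, "text/plain"))       // prints "http"
}
```

In a real server the gRPC stub would be replaced by an actual gRPC server that implements http.Handler, and the listener would need HTTP/2 enabled for the gRPC branch to be reachable.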

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Code changes

  • Has exported function/method change

Side effects

Related changes

Release note

  • Add RPC server to get other tidb server info.
@lonng lonng mentioned this pull request Nov 22, 2019
26 of 60 tasks complete
@codecov


codecov bot commented Nov 22, 2019

Codecov Report

Merging #13693 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #13693   +/-   ##
===========================================
  Coverage   80.0622%   80.0622%           
===========================================
  Files           474        474           
  Lines        116573     116573           
===========================================
  Hits          93331      93331           
  Misses        15872      15872           
  Partials       7370       7370
@crazycs520 crazycs520 requested a review from djshow832 Nov 22, 2019
infoschema/tables.go (outdated)
infoschema/tables.go (outdated)
infoschema/tables.go (outdated)
}

func dataForClusterLoadInfo() ([][]types.Datum, error) {
serversInfo, err := infosync.GetAllServerInfo(context.Background())

@lonng

lonng Nov 22, 2019

Member

Does this retrieve load info only from TiDB? It would be better to retrieve all cluster components from TiDB_CLUSTER_INFO.

@crazycs520

crazycs520 Nov 22, 2019

Author Member

Will address this, but only after PD/TiKV implement the diagnostics gRPC service.

infoschema/tables.go (outdated)
crazycs520 added 3 commits Nov 22, 2019
@lonng

Member

lonng commented Nov 22, 2019

I think all keys should be snake case, e.g. read_time instead of readTime (or kebab case, like read-time).
What do you think? @djshow832 @bb7133

infoschema/tables.go (outdated)
@crazycs520

Member Author

crazycs520 commented Nov 25, 2019

@lonng Great, I have already changed the camel-case names to kebab case.
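
For illustration, the renaming discussed in this thread (readTime becoming read-time) can be sketched as a small conversion function. camelToKebab is a hypothetical helper written for this note, not code from the PR, and it makes the simplifying assumption that every upper-case letter starts a new word.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// camelToKebab converts a camelCase key such as "readTime" to kebab case
// ("read-time"). It inserts a hyphen before every upper-case letter except
// the first character, so acronyms like "IOTime" are not handled specially.
func camelToKebab(key string) string {
	var b strings.Builder
	for i, r := range key {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('-')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	for _, k := range []string{"readTime", "inodesUsedPercent", "load1"} {
		fmt.Println(k, "->", camelToKebab(k))
	}
}
```

Keys that contain no upper-case letters, such as load1, pass through unchanged.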

crazycs520 added 3 commits Nov 25, 2019
infoschema/tables.go (outdated)
infoschema/tables.go (outdated)
infoschema/tables.go (outdated)
"tikv,127.0.0.1:11080," + mockAddr,
}
fpExpr := `return("` + strings.Join(instances, ";") + `")`
c.Assert(failpoint.Enable("github.com/pingcap/tidb/infoschema/mockClusterInfo", fpExpr), IsNil)

@lonng

lonng Nov 25, 2019

Member

I have some concerns about this failpoint: it may make the unit tests unstable, because different test cases share the same failpoint. It would be better to refactor the failpoint logic and use failpoint.InjectContext to avoid interleaving between different test suites.

@crazycs520

crazycs520 Nov 25, 2019

Author Member

This test already runs in SerialSuites to avoid that.
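
The failpoint in the excerpt above is enabled with a return("...") expression whose payload is a ';'-separated list of instances, each with comma-separated fields. As a rough sketch of how such a payload could be decoded on the receiving side: the instance type, the parseInstances function, and the meaning given to the third field are all assumptions made for this example, and the address values in main are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a hypothetical record for one mocked cluster component.
type instance struct {
	Type       string
	Address    string
	StatusAddr string // assumed meaning of the third field
}

// parseInstances decodes a payload of the form
// "type,address,statusAddr;type,address,statusAddr".
func parseInstances(payload string) ([]instance, error) {
	var out []instance
	for _, item := range strings.Split(payload, ";") {
		fields := strings.Split(item, ",")
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed instance %q: want 3 fields, got %d", item, len(fields))
		}
		out = append(out, instance{Type: fields[0], Address: fields[1], StatusAddr: fields[2]})
	}
	return out, nil
}

func main() {
	mockAddr := "127.0.0.1:10080" // made-up value standing in for the test's mockAddr
	payload := strings.Join([]string{
		"tidb,127.0.0.1:4000," + mockAddr,
		"tikv,127.0.0.1:11080," + mockAddr,
	}, ";")
	instances, err := parseInstances(payload)
	if err != nil {
		panic(err)
	}
	for _, in := range instances {
		fmt.Printf("%s at %s (status %s)\n", in.Type, in.Address, in.StatusAddr)
	}
}
```

Because the payload is a single global string, two suites that enable the same failpoint with different payloads at the same time would overwrite each other, which is the interleaving concern raised in this thread.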

@@ -0,0 +1,41 @@
// Copyright 2019 PingCAP, Inc.

@lonng

lonng Nov 25, 2019

Member

I would prefer to change the rpcserver subpackage to a services/service subpackage. What do you think? @djshow832 @bb7133 @zimulala

@djshow832

djshow832 Nov 25, 2019

Contributor

How about services/rpc_server?

@crazycs520

crazycs520 Nov 25, 2019

Author Member

Already moved to services/rpcserver.

@crazycs520

crazycs520 Nov 26, 2019

Author Member

Already moved into the server package; a new package is not needed. Thanks.

@lonng

lonng Nov 26, 2019

Member

And change the PR title.

rpcserver/rpc_server.go (outdated)
infoschema/tables_test.go (outdated)
infoschema/tables.go (outdated)
crazycs520 added 4 commits Nov 25, 2019
@lonng

Member

lonng commented Nov 25, 2019

It seems the rpcserver subpackage contains only 40 lines of code. Do we really need a separate package?

crazycs520 added 2 commits Nov 25, 2019
server/rpc_server.go (outdated)
crazycs520 added 3 commits Nov 26, 2019
@crazycs520

Member Author

crazycs520 commented Nov 26, 2019

/run-all-tests

@crazycs520 crazycs520 changed the title from "rpcserver: add rpcserver to get other tidb server info for diagnostics" to "server: add rpcserver to get other tidb server info for diagnostics" Nov 26, 2019
@crazycs520

Member Author

crazycs520 commented Nov 26, 2019

/run-all-tests

@djshow832

Contributor

djshow832 commented Nov 26, 2019

LGTM

@lonng
lonng approved these changes Nov 26, 2019
Member

lonng left a comment

LGTM

@lonng

Member

lonng commented Nov 26, 2019

/merge

@lonng lonng merged commit 9fd3e92 into pingcap:master Nov 26, 2019
14 checks passed

  • idc-jenkins-ci-tidb/build: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/build_check_race: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/check_dev: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/check_dev_2: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/common-test: job succeeded
  • idc-jenkins-ci-tidb/integration-common-test: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/integration-compatibility-test: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/integration-copr-test: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/integration-ddl-test: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/mybatis-test: job succeeded
  • idc-jenkins-ci-tidb/sqllogic-test-1: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/sqllogic-test-2: Jenkins job succeeded.
  • idc-jenkins-ci-tidb/unit-test: Jenkins job succeeded.
  • license/cla: Contributor License Agreement is signed.
4 participants