From b6a74d2a39f9be6517d0fa1aba00f3d5acf1174b Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 14:51:35 +0800 Subject: [PATCH 1/9] Update system-table-cluster-log.md --- system-tables/system-table-cluster-log.md | 36 +++++++++++------------ 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/system-tables/system-table-cluster-log.md b/system-tables/system-table-cluster-log.md index 0a01809ae1a79..91459de68b817 100644 --- a/system-tables/system-table-cluster-log.md +++ b/system-tables/system-table-cluster-log.md @@ -40,35 +40,33 @@ Field description: > **Note:** > -> + All fields of the cluster log table are pushed down to the corresponding instance for execution. So to reduce the overhead of using the cluster log table, specify as many conditions as possible. For example, the `select * from cluter_log where instance='tikv-1'` statement only executes the log search on the `tikv-1` instance. +> + All fields of the cluster log table are pushed down to the corresponding instance for execution. To reduce the overhead of using the cluster log table, you must specify the keywords used for the search, the time range, and as many conditions as possible. For example, `select * from cluster_log where message like '%ddl%' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00' and type='tidb'`. > -> + The `message` field supports the `like` and `regexp` regular expressions, and the corresponding pattern is encoded as `regexp`. Specifying multiple `message` conditions is equivalent to the `pipeline` form of the `grep` command. For example, executing the `select * from cluster_log where message like 'coprocessor%' and message regexp '.*slow.*'` statement is equivalent to executing `grep 'coprocessor' xxx.log | grep -E '.*slow.*'` on all cluster instances. +> + The `message` field supports the `like` and `regexp` regular expressions, and the corresponding pattern is encoded as `regexp`. 
Specifying multiple `message` conditions is equivalent to the `pipeline` form of the `grep` command. For example, executing the `select * from cluster_log where message like 'coprocessor%' and message regexp '.*slow.*' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00'` statement is equivalent to executing `grep 'coprocessor' xxx.log | grep -E '.*slow.*'` on all cluster instances. The following example shows how to query the execution process of a DDL statement using the `CLUSTER_LOG` table: {{< copyable "sql" >}} ```sql -select * from information_schema.cluster_log where message like '%ddl%' and message like '%job%58%' and type='tidb' and time > '2020-03-27 15:39:00'; +select time,instance,left(message,150) from information_schema.cluster_log where message like '%ddl%job%ID.80%' and type='tidb' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00' ``` ```sql -+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| TIME | TYPE | INSTANCE | LEVEL | MESSAGE | -+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4008 | INFO | [ddl_worker.go:253] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 
15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0; "] | -| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4008 | INFO | [ddl.go:457] ["[ddl] start DDL job"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0"] [query="create table t3 (a int, b int,c int)"] | -| 2020/03/27 15:39:36.879 | tidb | 172.16.5.40:4009 | INFO | [ddl_worker.go:554] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:0, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0"] | -| 2020/03/27 15:39:36.936 | tidb | 172.16.5.40:4009 | INFO | [ddl_worker.go:739] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=35] ["take time"=52.165811ms] [job="ID:58, Type:create table, State:done, SchemaState:public, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0"] | -| 2020/03/27 15:39:36.938 | tidb | 172.16.5.40:4009 | INFO | [ddl_worker.go:359] ["[ddl] finish DDL job"] [worker="worker 1, tp general"] [job="ID:58, Type:create table, State:synced, SchemaState:public, SchemaID:1, TableID:57, RowCount:0, ArgLen:0, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0"] | -| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4009 | INFO | [ddl_worker.go:253] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0; "] | -| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4009 | INFO | [ddl.go:457] ["[ddl] start DDL job"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, 
start time: 2020-03-27 15:39:36.129 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0"] [query="create table t3 (a int, b int,c int)"] | -| 2020/03/27 15:39:37.141 | tidb | 172.16.5.40:4008 | INFO | [ddl.go:489] ["[ddl] DDL job is finished"] [jobID=58] | -+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ +| time | instance | left(message,150) | ++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 2020/05/18 21:37:54.784 | 127.0.0.1:4002 | [ddl_worker.go:261] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:79, Ro | +| 2020/05/18 21:37:54.784 | 127.0.0.1:4002 | [ddl.go:477] ["[ddl] start DDL job"] [job="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:79, RowCount:0, ArgLen:1, start | +| 2020/05/18 21:37:55.327 | 127.0.0.1:4000 | [ddl_worker.go:568] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, Ta | +| 2020/05/18 21:37:55.381 | 127.0.0.1:4000 | [ddl_worker.go:763] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=70] ["take time"=50.809848ms] [job="ID:80, Type: | +| 2020/05/18 21:37:55.382 | 127.0.0.1:4000 | [ddl_worker.go:359] ["[ddl] finish DDL job"] [worker="worker 1, tp 
general"] [job="ID:80, Type:create table, State:synced, SchemaState:public, SchemaI | +| 2020/05/18 21:37:55.786 | 127.0.0.1:4002 | [ddl.go:509] ["[ddl] DDL job is finished"] [jobID=80] | ++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` -The above query results show the following process: +The above query results show the process of executing a DDL: -1. The request with a DDL JOB ID of `58` is sent to the `172.16.5.40: 4008` TiDB instance. -2. The `172.16.5.40: 4009` TiDB instance processes this DDL request, which indicates that the `172.16.5.40: 4009` instance is the DDL owner at that time. -3. The request with a DDL JOB ID of `58` has been processed. +1. The request with a DDL JOB ID of `80` is sent to the `127.0.0.1:4002` TiDB instance. +2. The `127.0.0.1:4000` TiDB instance processes this DDL request, which indicates that the `127.0.0.1:4000` instance is the DDL owner at that time. +3. The request with a DDL JOB ID of `80` has been processed. From 25cf7dc6ab9d2772709e77d3370ba0afd573cf42 Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 15:39:00 +0800 Subject: [PATCH 2/9] Update system-table-inspection-result.md --- system-tables/system-table-inspection-result.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/system-tables/system-table-inspection-result.md b/system-tables/system-table-inspection-result.md index 1d1e2dd089b01..276503af8fe82 100644 --- a/system-tables/system-table-inspection-result.md +++ b/system-tables/system-table-inspection-result.md @@ -39,17 +39,17 @@ desc information_schema.inspection_result; Field description: * `RULE`: The name of the diagnosis rule. Currently, the following rules are available: - * `config`: The consistency check of configuration. 
If the same configuration is inconsistent on different instances, a `warning` diagnosis result is generated. + * `config`: Checks the consistency and the rationality of configuration. If the same configuration is inconsistent on different instances, a `warning` diagnosis result is generated. * `version`: The consistency check of version. If the same version is inconsistent on different instances, a `warning` diagnosis result is generated. - * `node-load`: If the current system load is too high, the corresponding `warning` diagnosis result is generated. + * `node-load`: Checks the server load. If the current system load is too high, the corresponding `warning` diagnosis result is generated. * `critical-error`: Each module of the system defines critical errors. If a critical error exceeds the threshold within the corresponding time period, a warning diagnosis result is generated. - * `threshold-check`: The diagnosis system checks the thresholds of a large number of metrics. If a threshold is exceeded, the corresponding diagnosis information is generated. + * `threshold-check`: The diagnosis system checks the thresholds of key metrics. If a threshold is exceeded, the corresponding diagnosis information is generated. * `ITEM`: Each rule diagnoses different items. This field indicates the specific diagnosis items corresponding to each rule. * `TYPE`: The instance type of the diagnosis. The optional values are `tidb`, `pd`, and `tikv`. * `INSTANCE`: The specific address of the diagnosed instance. * `STATUS_ADDRESS`: The HTTP API service address of the instance. * `VALUE`: The value of a specific diagnosis item. -* `REFERENCE`: The reference value (threshold value) for this diagnosis item. If the difference between `VALUE` and the threshold is very large, the corresponding diagnosis information is generated. +* `REFERENCE`: The reference value (threshold value) for this diagnosis item. If `VALUE` exceeds the threshold, the corresponding diagnosis information is generated. 
* `SEVERITY`: The severity level. The optional values are `warning` and `critical`. * `DETAILS`: Diagnosis details, which might also contain SQL statement(s) or document links for further diagnosis. @@ -160,7 +160,7 @@ select * from information_schema.inspection_result where rule='critical-error'; ## Diagnosis rules -The diagnosis module contains a series of rules. These rules compare the results with the preset thresholds after querying the existing monitoring tables and cluster information tables. If the results exceed the thresholds or fall below the thresholds, the result of `warning` or `critical` is generated and the corresponding information is provided in the `details` column. +The diagnosis module contains a series of rules. These rules compare the results with the thresholds after querying the existing monitoring tables and cluster information tables. If the results exceed the thresholds, a diagnosis of `warning` or `critical` is generated and the corresponding information is provided in the `details` column. You can query the existing diagnosis rules by querying the `inspection_rules` system table: @@ -274,9 +274,9 @@ The `threshold-check` diagnosis rule checks whether the following metrics in the | Component | Monitoring metric | Monitoring table | Expected value | Description | | :---- | :---- | :---- | :---- | :---- | -| TiDB | tso-duration | pd_tso_wait_duration | < 50ms | The time it takes to get the transaction TSO timestamp. | +| TiDB | tso-duration | pd_tso_wait_duration | < 50ms | The wait duration of getting the TSO of a transaction. | | TiDB | get-token-duration | tidb_get_token_duration | < 1ms | Queries the time it takes to get the token. The related TiDB configuration item is [`token-limit`](/command-line-flags-for-tidb-configuration.md#token-limit). 
| -| TiDB | load-schema-duration | tidb_load_schema_duration | < 1s | The time it takes for TiDB to update and load the schema metadata.| +| TiDB | load-schema-duration | tidb_load_schema_duration | < 1s | The time it takes for TiDB to update the schema metadata.| | TiKV | scheduler-cmd-duration | tikv_scheduler_command_duration | < 0.1s | The time it takes for TiKV to execute the KV `cmd` request. | | TiKV | handle-snapshot-duration | tikv_handle_snapshot_duration | < 30s | The time it takes for TiKV to handle the snapshot. | | TiKV | storage-write-duration | tikv_storage_async_request_duration | < 0.1s | The write latency of TiKV. | From e652906f2d77ee47d0020d61339a54a9f146fc51 Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 16:04:11 +0800 Subject: [PATCH 3/9] Update system-table-inspection-summary.md --- system-tables/system-table-inspection-summary.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/system-tables/system-table-inspection-summary.md b/system-tables/system-table-inspection-summary.md index 3d45131827ec3..49d9cc0ac1cf6 100644 --- a/system-tables/system-table-inspection-summary.md +++ b/system-tables/system-table-inspection-summary.md @@ -7,7 +7,7 @@ aliases: ['/docs/dev/reference/system-databases/inspection-summary/'] # INSPECTION_SUMMARY -In some scenarios, you might pay attention only to the monitoring summary of specific links or modules. For example, the number of threads for Coprocessor in the thread pool is configured as 8. If the CPU usage of Coprocessor reaches 750%, you can determine that a risk exists and Coprocessor might become a bottleneck in advance. However, some monitoring metrics vary greatly due to different user workloads, so it is difficult to define specific thresholds. It is important to troubleshoot issues in this scenario, so TiDB provides the `inspection_summary` table for link summary. 
+In some scenarios, you might need to pay attention only to the monitoring summary of specific links or modules. For example, the number of threads for Coprocessor in the thread pool is configured as 8. If the CPU usage of Coprocessor reaches 750%, you can determine that a risk exists and Coprocessor might become a bottleneck in advance. However, some monitoring metrics vary greatly due to different user workloads, so it is difficult to define specific thresholds. It is important to troubleshoot issues in this scenario, so TiDB provides the `inspection_summary` table for link summary. The structure of the `information_schema.inspection_summary` inspection summary table is as follows: @@ -48,9 +48,12 @@ Field description: Usage example: -Both the diagnosis result table and the diagnosis monitoring summary table can specify the diagnosis time range using `hint`. `select **+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` is the monitoring summary for the `2020-03-07 12:00:00` to `2020-03-07 13:00:00` period. Like the monitoring summary table, you can use the diagnosis result table to quickly find the monitoring items with large differences by comparing the data of two different periods. +Both the diagnosis result table and the diagnosis monitoring summary table can specify the diagnosis time range using `hint`. `select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` is the monitoring summary for the `2020-03-07 12:00:00` to `2020-03-07 13:00:00` period. Like the monitoring summary table, you can use the `inspection_summary` table to quickly find the monitoring items with large differences by comparing the data of two different periods. 
-See the following example that diagnoses issues within a specified range, from "2020-01-16 16:00:54.933" to "2020-01-16 16:10:54.933": +The following is an example that illustrates how to read the monitoring items of the system link by comparing two periods: + +* `(2020-01-16 16:00:54.933, 2020-01-16 16:10:54.933)` +* `(2020-01-16 16:10:54.933, 2020-01-16 16:20:54.933)` {{< copyable "sql" >}} @@ -68,7 +71,7 @@ FROM JOIN ( SELECT - /*+ time_range("2020-01-16 16:10:54.933","2020-01-16 16:20:54.933")*/ * + /*+ time_range("2020-01-16 16:10:54.933", "2020-01-16 16:20:54.933")*/ * FROM information_schema.inspection_summary WHERE rule='read-link' ) t2 ON t1.metrics_name = t2.metrics_name From d90e93d30120cf5194165b1350878b672bc959ef Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 16:29:18 +0800 Subject: [PATCH 4/9] Update system-table-metrics-schema.md --- system-tables/system-table-metrics-schema.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/system-tables/system-table-metrics-schema.md b/system-tables/system-table-metrics-schema.md index c244bba235ffe..255466acc66ea 100644 --- a/system-tables/system-table-metrics-schema.md +++ b/system-tables/system-table-metrics-schema.md @@ -7,13 +7,13 @@ aliases: ['/docs/dev/reference/system-databases/metrics-schema/'] # Metrics Schema -To dynamically observe and compare cluster conditions of different time ranges, the SQL diagnosis system introduces cluster monitoring system tables. All monitoring tables are in the metrics schema, and you can query the monitoring information using SQL statements in this schema. The data of the three monitoring-related summary tables ([`metrics_summary`](/system-tables/system-table-metrics-summary.md), [`metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md), and `inspection_result`) are all obtained by querying the monitoring tables in the metrics schema. 
Currently, many system tables are added, so you can query the information of these tables using the [`information_schema.metrics_tables`](/system-tables/system-table-metrics-tables.md) table. +To dynamically observe and compare cluster conditions of different time ranges, the SQL diagnosis system introduces cluster monitoring system tables. All monitoring tables are in the `metrics_schema` database. You can query the monitoring information using SQL statements in this schema. The data of the three monitoring-related summary tables ([`metrics_summary`](/system-tables/system-table-metrics-summary.md), [`metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md), and `inspection_result`) are all obtained by querying the monitoring tables in the metrics schema. Currently, many system tables are added, so you can query the information of these tables using the [`information_schema.metrics_tables`](/system-tables/system-table-metrics-tables.md) table. ## Overview -The following example uses the `tidb_query_duration` table to introduce the usage and working principles of the monitoring table. The working principles of other monitoring tables are similar. +To illustrate how to use the monitoring table and how it works, take the `tidb_query_duration` monitoring table in `metrics_schema` as an example. The principles of other monitoring tables are similar to `tidb_query_duration`. -Query the information related to the `tidb_query_duration` table on `information_schema.metrics_tables`: +To query the information related to the `tidb_query_duration` table on `information_schema.metrics_tables`, execute the following command: {{< copyable "sql" >}} @@ -31,8 +31,8 @@ select * from information_schema.metrics_tables where table_name='tidb_query_dur Field description: -* `TABLE_NAME`: Corresponds to the table name in the metrics schema. In this example, the table name is `tidb_query_duration`. 
-* `PROMQL`: The working principle of the monitoring table is to map SQL statements to `PromQL` and convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When getting the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression. +* `TABLE_NAME`: Corresponds to the table name in `metrics_schema`. In this example, the table name is `tidb_query_duration`. +* `PROMQL`: The working principle of the monitoring table is to first map SQL statements to `PromQL`, then request data from Prometheus, and convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When you query the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression. * `LABELS`: The label for the monitoring item. `tidb_query_duration` has two labels: `instance` and `sql_type`. * `QUANTILE`: The percentile. For monitoring data of the histogram type, a default percentile is specified. If the value of this field is `0`, it means that the monitoring item corresponding to the monitoring table is not a histogram. * `COMMENT`: Explanations for the monitoring table. You can see that the `tidb_query_duration` table is used to query the percentile time of the TiDB query execution, such as the query time of P999/P99/P90. The unit is second. @@ -112,7 +112,7 @@ desc select * from metrics_schema.tidb_query_duration where value is not null an From the result above, you can see that `PromQL`, `start_time`, `end_time`, and `step` are in the execution plan. During the execution process, TiDB calls the `query_range` HTTP API of Prometheus to query the monitoring data. -You might find that in the range of [`2020-03-25 23:40:00`, `2020-03-25 23:42:00`], each label only has three time values. 
In the execution plan, the value of `step` is 1 minute, which is determined by the following two variables: +You might find that in the range of [`2020-03-25 23:40:00`, `2020-03-25 23:42:00`], each label only has three time values. In the execution plan, the value of `step` is 1 minute, which means that the interval of these three values is 1 minute. `step` is determined by the following two session variables: * `tidb_metric_query_step`: The query resolution step width. To get the `query_range` data from Prometheus, you need to specify `start_time`, `end_time`, and `step`. `step` uses the value of this variable. * `tidb_metric_query_range_duration`: When the monitoring data is queried, the value of the `$ RANGE_DURATION` field in `PROMQL` is replaced with the value of this variable. The default value is 60 seconds. From e6cbea1b3c9efa7788c057b752f54f7d58b0a57c Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 16:32:03 +0800 Subject: [PATCH 5/9] Update system-table-metrics-tables.md --- system-tables/system-table-metrics-tables.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/system-tables/system-table-metrics-tables.md b/system-tables/system-table-metrics-tables.md index fcb8e302d7a54..3857748e2436e 100644 --- a/system-tables/system-table-metrics-tables.md +++ b/system-tables/system-table-metrics-tables.md @@ -7,7 +7,7 @@ aliases: ['/docs/dev/reference/system-databases/metrics-tables/'] # METRICS_TABLES -The `METRICS_TABLES` table provides information of all monitoring tables in the [metrics schema](/system-tables/system-table-metrics-schema.md). +The `INFORMATION_SCHEMA.METRICS_TABLES` table provides information of all monitoring tables in the [metrics_schema](/system-tables/system-table-metrics-schema.md) database. {{< copyable "sql" >}} @@ -30,7 +30,7 @@ desc information_schema.metrics_tables; Field description: * `TABLE_NAME`: Corresponds to the table name in `metrics_schema`. 
-* `PROMQL`: The working principle of the monitoring table is to map SQL statements to `PromQL` and convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When getting the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression. +* `PROMQL`: The working principle of the monitoring table is to map SQL statements to `PromQL` and convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When you query the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression. * `LABELS`: The label for the monitoring item. Each label corresponds to a column in the monitoring table. If the SQL statement contains the filter of the corresponding column, the corresponding `PromQL` changes accordingly. * `QUANTILE`: The percentile. For monitoring data of the histogram type, a default percentile is specified. If the value of this field is `0`, it means that the monitoring item corresponding to the monitoring table is not a histogram. * `COMMENT`: The comment about the monitoring table. From b3f7718bbdd678254c231fc66d875a1a2138830e Mon Sep 17 00:00:00 2001 From: toutdesuite Date: Mon, 15 Jun 2020 16:49:40 +0800 Subject: [PATCH 6/9] Update system-table-sql-diagnosis.md --- system-tables/system-table-sql-diagnosis.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/system-tables/system-table-sql-diagnosis.md b/system-tables/system-table-sql-diagnosis.md index 2e3e4bb605813..6f070e4a1b1cf 100644 --- a/system-tables/system-table-sql-diagnosis.md +++ b/system-tables/system-table-sql-diagnosis.md @@ -24,7 +24,7 @@ The SQL diagnosis system consists of three major parts: + **Cluster monitoring table**: The SQL diagnosis system introduces cluster monitoring tables. 
All of these tables are in `metrics_schema`, and you can query monitoring information using SQL statements. Compared to the visualized monitoring before v4.0, you can use this SQL-based method to perform correlated queries on all the monitoring information of the entire cluster, and compare the results of different time periods to quickly identify performance bottlenecks. Because the TiDB cluster has many monitoring metrics, the SQL diagnosis system also provides monitoring summary tables, so you can find abnormal monitoring items more easily. -+ **Automatic diagnosis**: Although you can manually execute SQL statements to query cluster information tables, cluster monitoring tables, and summary tables, the automatic diagnosis is much easier. The SQL diagnosis system performs automatic diagnosis based on the existing cluster information tables and monitoring tables, and provides relevant diagnosis result tables and diagnosis summary tables. ++ **Automatic diagnosis**: Although you can manually execute SQL statements to query cluster information tables, cluster monitoring tables, and summary tables to locate issues, the automatic diagnosis allows you to quickly locate common issues. The SQL diagnosis system performs automatic diagnosis based on the existing cluster information tables and monitoring tables, and provides relevant diagnosis result tables and diagnosis summary tables. ## Cluster information tables @@ -48,11 +48,11 @@ To dynamically observe and compare cluster conditions in different time periods, Because the TiDB cluster has many monitoring metrics, TiDB provides the following monitoring summary tables in v4.0: + The monitoring summary table [`information_schema.metrics_summary`](/system-tables/system-table-metrics-summary.md) summarizes all monitoring data for you to check each monitoring metric with higher efficiency. 
-+ The monitoring summary table [`information_schema.metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md)) also summarizes all monitoring data, but this table performs differentiated statistics according to different labels. ++ [`information_schema.metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md) also summarizes all monitoring data. In particular, this table aggregates statistics using different labels of each monitoring item. ## Automatic diagnosis -On the above cluster information tables and cluster monitoring tables, you need to manually execute SQL statements of a certain mode to troubleshoot the cluster. To improve user experience, TiDB provides diagnosis-related system tables based on the existing basic information tables, so that the diagnosis is automatically executed. The following are the system tables related to the automatic diagnosis: +On the above cluster information tables and cluster monitoring tables, you need to manually execute SQL statements to troubleshoot the cluster. TiDB v4.0 supports automatic diagnosis. You can use diagnosis-related system tables based on the existing basic information tables, so that the diagnosis is automatically executed. The following are the system tables related to the automatic diagnosis: + The diagnosis result table [`information_schema.inspection_result`](/system-tables/system-table-inspection-result.md) displays the diagnosis result of the system. The diagnosis is passively triggered. Executing `select * from inspection_result` triggers all diagnostic rules to diagnose the system, and the faults or risks in the system are displayed in the results. + The diagnosis summary table [`information_schema.inspection_summary`](/system-tables/system-table-inspection-summary.md) summarizes the monitoring information of a specific link or module. You can troubleshoot and locate problems based on the context of the entire module or link. 
From 76368da2d2ff25893e9e9b23549f5bc898f04656 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 15 Jun 2020 19:57:26 +0800
Subject: [PATCH 7/9] update wording

---
 system-tables/system-table-metrics-schema.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/system-tables/system-table-metrics-schema.md b/system-tables/system-table-metrics-schema.md
index 255466acc66ea..986d8635c0784 100644
--- a/system-tables/system-table-metrics-schema.md
+++ b/system-tables/system-table-metrics-schema.md
@@ -13,7 +13,7 @@ To dynamically observe and compare cluster conditions of different time ranges,
 
 To illustrate how to use the monitoring table and how it works, take the `tidb_query_duration` monitoring table in `metrics_schema` as an example. The principles of other monitoring tables are similar to `tidb_query_duration`.
 
-To query the information related to the `tidb_query_duration` table on `information_schema.metrics_tables`, execute the following command:
+Query the information related to the `tidb_query_duration` table on `information_schema.metrics_tables`:
 
 {{< copyable "sql" >}}
 
@@ -112,7 +112,7 @@ desc select * from metrics_schema.tidb_query_duration where value is not null an
 
 From the result above, you can see that `PromQL`, `start_time`, `end_time`, and `step` are in the execution plan. During the execution process, TiDB calls the `query_range` HTTP API of Prometheus to query the monitoring data.
 
-You might find that in the range of [`2020-03-25 23:40:00`, `2020-03-25 23:42:00`], each label only has three time values. In the execution plan, the value of `step` is 1 minute, which means that the interval of these three values is 1 minute. `step` is determined by the following two session variables:
+You might find that in the range of [`2020-03-25 23:40:00`, `2020-03-25 23:42:00`], each label only has three time values. In the execution plan, the value of `step` is 1 minute, which means that the interval of these values is 1 minute. `step` is determined by the following two session variables:
 
 * `tidb_metric_query_step`: The query resolution step width. To get the `query_range` data from Prometheus, you need to specify `start_time`, `end_time`, and `step`. `step` uses the value of this variable.
 * `tidb_metric_query_range_duration`: When the monitoring data is queried, the value of the `$RANGE_DURATION` field in `PROMQL` is replaced with the value of this variable. The default value is 60 seconds.

From 75e1b2f1070f868c83737a6a1ed7e5ed3e735e52 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 15 Jun 2020 21:59:56 +0800
Subject: [PATCH 8/9] Update system-tables/system-table-cluster-log.md

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 system-tables/system-table-cluster-log.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system-tables/system-table-cluster-log.md b/system-tables/system-table-cluster-log.md
index 91459de68b817..8062852034e6c 100644
--- a/system-tables/system-table-cluster-log.md
+++ b/system-tables/system-table-cluster-log.md
@@ -65,7 +65,7 @@ select time,instance,left(message,150) from information_schema.cluster_log where
 +-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
 ```
 
-The above query results show the process of executing a DDL:
+The query results above show the process of executing a DDL statement:
 
 1. The request with a DDL JOB ID of `80` is sent to the `127.0.0.1:4002` TiDB instance.
 2. The `127.0.0.1:4000` TiDB instance processes this DDL request, which indicates that the `127.0.0.1:4000` instance is the DDL owner at that time.
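The DDL-tracing query that produces the results discussed in this hunk follows the pattern below (the time range and job ID `80` match the example used elsewhere in this patch series):

```sql
-- Push the log search down to TiDB instances only, bound the time range to
-- limit the overhead, and truncate long log lines for readability.
select time, instance, left(message, 150)
from information_schema.cluster_log
where message like '%ddl%job%ID.80%'
  and type = 'tidb'
  and time > '2020-05-18 20:40:00'
  and time < '2020-05-18 21:40:00';
```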
From 9f6208bb50a97e2dac74ad2d67a463c47ad75a60 Mon Sep 17 00:00:00 2001
From: toutdesuite
Date: Mon, 15 Jun 2020 22:15:05 +0800
Subject: [PATCH 9/9] Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
---
 system-tables/system-table-inspection-result.md  | 2 +-
 system-tables/system-table-inspection-summary.md | 2 +-
 system-tables/system-table-metrics-schema.md     | 4 ++--
 system-tables/system-table-sql-diagnosis.md      | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/system-tables/system-table-inspection-result.md b/system-tables/system-table-inspection-result.md
index 276503af8fe82..14a646986393a 100644
--- a/system-tables/system-table-inspection-result.md
+++ b/system-tables/system-table-inspection-result.md
@@ -39,7 +39,7 @@ desc information_schema.inspection_result;
 Field description:
 
 * `RULE`: The name of the diagnosis rule. Currently, the following rules are available:
-    * `config`: Checks the consistency and the rationality of configuration. If the same configuration is inconsistent on different instances, a `warning` diagnosis result is generated.
+    * `config`: Checks whether the configuration is consistent and proper. If the same configuration is inconsistent on different instances, a `warning` diagnosis result is generated.
     * `version`: The consistency check of version. If the same version is inconsistent on different instances, a `warning` diagnosis result is generated.
     * `node-load`: Checks the server load. If the current system load is too high, the corresponding `warning` diagnosis result is generated.
     * `critical-error`: Each module of the system defines critical errors. If a critical error exceeds the threshold within the corresponding time period, a warning diagnosis result is generated.
diff --git a/system-tables/system-table-inspection-summary.md b/system-tables/system-table-inspection-summary.md
index 49d9cc0ac1cf6..416a37f1b6596 100644
--- a/system-tables/system-table-inspection-summary.md
+++ b/system-tables/system-table-inspection-summary.md
@@ -50,7 +50,7 @@ Usage example:
 
 Both the diagnosis result table and the diagnosis monitoring summary table can specify the diagnosis time range using `hint`. `select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` is the monitoring summary for the `2020-03-07 12:00:00` to `2020-03-07 13:00:00` period. Like the monitoring summary table, you can use the `inspection_summary` table to quickly find the monitoring items with large differences by comparing the data of two different periods.
 
-The following is an example that illustrates how to read the monitoring items of the system link by comparing two periods:
+The following example compares the monitoring metrics of read links in two time periods:
 
 * `(2020-01-16 16:00:54.933, 2020-01-16 16:10:54.933)`
 * `(2020-01-16 16:10:54.933, 2020-01-16 16:20:54.933)`
diff --git a/system-tables/system-table-metrics-schema.md b/system-tables/system-table-metrics-schema.md
index 986d8635c0784..c1dd0aba925e9 100644
--- a/system-tables/system-table-metrics-schema.md
+++ b/system-tables/system-table-metrics-schema.md
@@ -11,7 +11,7 @@ To dynamically observe and compare cluster conditions of different time ranges,
 
 ## Overview
 
-To illustrate how to use the monitoring table and how it works, take the `tidb_query_duration` monitoring table in `metrics_schema` as an example. The principles of other monitoring tables are similar to `tidb_query_duration`.
+Taking the `tidb_query_duration` monitoring table in `metrics_schema` as an example, this section illustrates how to use this monitoring table and how it works. The working principles of other monitoring tables are similar to `tidb_query_duration`.
 
 Query the information related to the `tidb_query_duration` table on `information_schema.metrics_tables`:
 
@@ -32,7 +32,7 @@ select * from information_schema.metrics_tables where table_name='tidb_query_dur
 Field description:
 
 * `TABLE_NAME`: Corresponds to the table name in `metrics_schema`. In this example, the table name is `tidb_query_duration`.
-* `PROMQL`: The working principle of the monitoring table is to first map SQL statements to `PromQL`, then request data from Prometheus, and convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When you query the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression.
+* `PROMQL`: The working principle of the monitoring table is to first map SQL statements to `PromQL`, then to request data from Prometheus, and to convert Prometheus results into SQL query results. This field is the expression template of `PromQL`. When you query the data of the monitoring table, the query conditions are used to rewrite the variables in this template to generate the final query expression.
 * `LABELS`: The label for the monitoring item. `tidb_query_duration` has two labels: `instance` and `sql_type`.
 * `QUANTILE`: The percentile. For monitoring data of the histogram type, a default percentile is specified. If the value of this field is `0`, it means that the monitoring item corresponding to the monitoring table is not a histogram.
 * `COMMENT`: Explanations for the monitoring table. You can see that the `tidb_query_duration` table is used to query the percentile time of the TiDB query execution, such as the query time of P999/P99/P90. The unit is second.
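Under the `PromQL` mapping described by these fields, a query against the monitoring table itself might look like the following sketch (the time range and quantile are illustrative; the column names follow the `LABELS` and `QUANTILE` fields above):

```sql
-- Each row is one (time, instance, sql_type) sample converted from the
-- Prometheus query_range result; quantile selects the percentile (P99 here).
select time, instance, sql_type, quantile, value
from metrics_schema.tidb_query_duration
where value is not null
  and time >= '2020-03-25 23:40:00'
  and time <= '2020-03-25 23:42:00'
  and quantile = 0.99;
```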
diff --git a/system-tables/system-table-sql-diagnosis.md b/system-tables/system-table-sql-diagnosis.md
index 6f070e4a1b1cf..f4db8cd96444b 100644
--- a/system-tables/system-table-sql-diagnosis.md
+++ b/system-tables/system-table-sql-diagnosis.md
@@ -48,7 +48,7 @@ To dynamically observe and compare cluster conditions in different time periods,
 Because the TiDB cluster has many monitoring metrics, TiDB provides the following monitoring summary tables in v4.0:
 
 + The monitoring summary table [`information_schema.metrics_summary`](/system-tables/system-table-metrics-summary.md) summarizes all monitoring data for you to check each monitoring metric with higher efficiency.
-+ [`information_schema.metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md) also summarizes all monitoring data. Particularly, this table aggregates statistics using different labels of each monitoring item.
++ [`information_schema.metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md) also summarizes all monitoring data. Particularly, this table aggregates statistics using different labels of each monitoring metric.
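A hedged sketch of how the per-label summary table in this hunk is typically queried (the metric name `tidb_qps`, the time window, and the column list are illustrative assumptions, not taken from this patch):

```sql
-- Aggregate one metric over a window bounded by the time_range hint;
-- metrics_summary_by_label breaks the aggregation down by each label value.
select /*+ time_range('2020-03-08 13:23:00', '2020-03-08 13:33:00') */
       metrics_name, label, avg_value, max_value
from information_schema.metrics_summary_by_label
where metrics_name = 'tidb_qps';
```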