1 change: 1 addition & 0 deletions TOC.md
@@ -80,6 +80,7 @@
+ [TiFlash Alert Rules](/tiflash/tiflash-alert-rules.md)
+ Troubleshoot
+ [Identify Slow Queries](/identify-slow-queries.md)
+ [SQL Diagnostics](/system-tables/system-table-sql-diagnostics.md)
+ [Identify Expensive Queries](/identify-expensive-queries.md)
+ [Statement Summary Tables](/statement-summary-tables.md)
+ [Troubleshoot Cluster Setup](/troubleshoot-tidb-cluster.md)
64 changes: 32 additions & 32 deletions system-tables/system-table-inspection-result.md
@@ -1,17 +1,17 @@
---
title: INSPECTION_RESULT
-summary: Learn the `INSPECTION_RESULT` diagnosis result table.
+summary: Learn the `INSPECTION_RESULT` diagnostic result table.
category: reference
aliases: ['/docs/dev/reference/system-databases/inspection-result/']
---

# INSPECTION_RESULT

-TiDB has some built-in diagnosis rules for detecting faults and hidden issues in the system.
+TiDB has some built-in diagnostic rules for detecting faults and hidden issues in the system.

-The `INSPECTION_RESULT` diagnosis feature can help you quickly find problems and reduce your repetitive manual work. You can use the `select * from information_schema.inspection_result` statement to trigger the internal diagnosis.
+The `INSPECTION_RESULT` diagnostic feature can help you quickly find problems and reduce your repetitive manual work. You can use the `select * from information_schema.inspection_result` statement to trigger the internal diagnostics.

-The structure of the `information_schema.inspection_result` diagnosis result table `information_schema.inspection_result` is as follows:
+The structure of the `information_schema.inspection_result` diagnostic result table is as follows:

{{< copyable "sql" >}}

@@ -38,22 +38,22 @@ desc information_schema.inspection_result;

Field description:

-* `RULE`: The name of the diagnosis rule. Currently, the following rules are available:
-    * `config`: Checks whether the configuration is consistent and proper. If the same configuration is inconsistent on different instances, a `warning` diagnosis result is generated.
-    * `version`: The consistency check of version. If the same version is inconsistent on different instances, a `warning` diagnosis result is generated.
-    * `node-load`: Checks the server load. If the current system load is too high, the corresponding `warning` diagnosis result is generated.
-    * `critical-error`: Each module of the system defines critical errors. If a critical error exceeds the threshold within the corresponding time period, a warning diagnosis result is generated.
-    * `threshold-check`: The diagnosis system checks the thresholds of key metrics. If a threshold is exceeded, the corresponding diagnosis information is generated.
-* `ITEM`: Each rule diagnoses different items. This field indicates the specific diagnosis items corresponding to each rule.
-* `TYPE`: The instance type of the diagnosis. The optional values are `tidb`, `pd`, and `tikv`.
+* `RULE`: The name of the diagnostic rule. Currently, the following rules are available:
+    * `config`: Checks whether the configuration is consistent and proper. If the same configuration is inconsistent on different instances, a `warning` diagnostic result is generated.
+    * `version`: The consistency check of version. If the same version is inconsistent on different instances, a `warning` diagnostic result is generated.
+    * `node-load`: Checks the server load. If the current system load is too high, the corresponding `warning` diagnostic result is generated.
+    * `critical-error`: Each module of the system defines critical errors. If a critical error exceeds the threshold within the corresponding time period, a warning diagnostic result is generated.
+    * `threshold-check`: The diagnostic system checks the thresholds of key metrics. If a threshold is exceeded, the corresponding diagnostic information is generated.
+* `ITEM`: Each rule diagnoses different items. This field indicates the specific diagnostic items corresponding to each rule.
+* `TYPE`: The instance type of the diagnostics. The optional values are `tidb`, `pd`, and `tikv`.
* `INSTANCE`: The specific address of the diagnosed instance.
* `STATUS_ADDRESS`: The HTTP API service address of the instance.
-* `VALUE`: The value of a specific diagnosis item.
-* `REFERENCE`: The reference value (threshold value) for this diagnosis item. If `VALUE` exceeds the threshold, the corresponding diagnosis information is generated.
+* `VALUE`: The value of a specific diagnostic item.
+* `REFERENCE`: The reference value (threshold value) for this diagnostic item. If `VALUE` exceeds the threshold, the corresponding diagnostic information is generated.
* `SEVERITY`: The severity level. The optional values are `warning` and `critical`.
-* `DETAILS`: Diagnosis details, which might also contain SQL statement(s) or document links for further diagnosis.
+* `DETAILS`: Diagnostic details, which might also contain SQL statement(s) or document links for further diagnostics.

-## Diagnosis example
+## Diagnostics example

Diagnose issues currently existing in the cluster.

@@ -102,7 +102,7 @@ SEVERITY | warning
DETAILS | max duration of 172.16.5.40:20151 tikv rocksdb-write-duration was too slow
```

-The following issues can be detected from the diagnosis result above:
+The following issues can be detected from the diagnostic result above:

* The first row indicates that TiDB's `log.slow-threshold` value is configured to `0`, which might affect performance.
* The second row indicates that two different TiDB versions exist in the cluster.
@@ -137,32 +137,32 @@ SEVERITY | warning
DETAILS | max duration of 172.16.5.40:10089 tidb get-token-duration is too slow
```

-The following issues can be detected from the diagnosis result above:
+The following issues can be detected from the diagnostic result above:

* The first row indicates that the `172.16.5.40:4009` TiDB instance is restarted at `2020/03/26 00:05:45.670`.
* The second row indicates that the maximum `get-token-duration` time of the `172.16.5.40:10089` TiDB instance is 0.234s, but the expected time is less than 0.001s.

-You can also specify conditions, for example, to query the `critical` level diagnosis results:
+You can also specify conditions, for example, to query the `critical` level diagnostic results:

{{< copyable "sql" >}}

```sql
select * from information_schema.inspection_result where severity='critical';
```

-Query only the diagnosis result of the `critical-error` rule:
+Query only the diagnostic result of the `critical-error` rule:

{{< copyable "sql" >}}

```sql
select * from information_schema.inspection_result where rule='critical-error';
```
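
The two filters above can also be combined. The following query is a minimal sketch that uses only the columns listed in the field description; the rule and severity values chosen here are illustrative:

{{< copyable "sql" >}}

```sql
-- Show only warning-level results of the threshold-check rule,
-- limited to the columns needed to locate the affected instance.
select rule, item, type, instance, value, reference, details
from information_schema.inspection_result
where rule = 'threshold-check'
  and severity = 'warning';
```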

-## Diagnosis rules
+## Diagnostic rules

-The diagnosis module contains a series of rules. These rules compare the results with the thresholds after querying the existing monitoring tables and cluster information tables. If the results exceed the thresholds, the diagnosis of `warning` or `critical` is generated and the corresponding information is provided in the `details` column.
+The diagnostic module contains a series of rules. These rules compare the results with the thresholds after querying the existing monitoring tables and cluster information tables. If the results exceed the thresholds, a `warning` or `critical` diagnostic result is generated and the corresponding information is provided in the `details` column.

-You can query the existing diagnosis rules by querying the `inspection_rules` system table:
+You can query the existing diagnostic rules by querying the `inspection_rules` system table:

{{< copyable "sql" >}}

@@ -182,9 +182,9 @@ select * from information_schema.inspection_rules where type='inspection';
+-----------------+------------+---------+
```

-### `config` diagnosis rule
+### `config` diagnostic rule

-In the `config` diagnosis rule, the following two diagnosis rules are executed by querying the `CLUSTER_CONFIG` system table:
+In the `config` diagnostic rule, the following two diagnostic rules are executed by querying the `CLUSTER_CONFIG` system table:

* Check whether the configuration values of the same component are consistent. Not all configuration items have this consistency check. The white list of the consistency check is as follows:

@@ -228,9 +228,9 @@ In the `config` diagnosis rule, the following two diagnosis rules are executed b
| TiDB | log.slow-threshold | larger than `0` |
| TiKV | raftstore.sync-log | `true` |

-### `version` diagnosis rule
+### `version` diagnostic rule

-The `version` diagnosis rule checks whether the version hash of the same component is consistent by querying the `CLUSTER_INFO` system table. See the following example:
+The `version` diagnostic rule checks whether the version hash of the same component is consistent by querying the `CLUSTER_INFO` system table. See the following example:

{{< copyable "sql" >}}

@@ -250,9 +250,9 @@ SEVERITY | critical
DETAILS | the cluster has 2 different tidb versions, execute the sql to see more detail: select * from information_schema.cluster_info where type='tidb'
```
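
A comparable check can be run by hand. The following query is only a sketch of an equivalent manual check, not the rule's actual implementation; it assumes that `CLUSTER_INFO` exposes the version hash in a `GIT_HASH` column:

{{< copyable "sql" >}}

```sql
-- List component types whose instances report more than one version hash.
select type, count(distinct git_hash) as hash_count
from information_schema.cluster_info
group by type
having count(distinct git_hash) > 1;
```

Under that assumption, an empty result means that every component type runs a single build.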

-### `critical-error` diagnosis rule
+### `critical-error` diagnostic rule

-In `critical-error` diagnosis rule, the following two diagnosis rules are executed:
+In the `critical-error` diagnostic rule, the following two diagnostic rules are executed:

* Detect whether the cluster has the following errors by querying the related monitoring system tables in the metrics schema:

@@ -268,9 +268,9 @@ In `critical-error` diagnosis rule, the following two diagnosis rules are execut

* Check whether any component is restarted by querying the `metrics_schema.up` monitoring table and the `CLUSTER_LOG` system table.

-### `threshold-check` diagnosis rule
+### `threshold-check` diagnostic rule

-The `threshold-check` diagnosis rule checks whether the following metrics in the cluster exceed the threshold by querying the related monitoring system tables in the metrics schema:
+The `threshold-check` diagnostic rule checks whether the following metrics in the cluster exceed the threshold by querying the related monitoring system tables in the metrics schema:

| Component | Monitoring metric | Monitoring table | Expected value | Description |
| :---- | :---- | :---- | :---- | :---- |
@@ -308,4 +308,4 @@ In addition, this rule also checks whether the CPU usage of the following thread
* storage-readpool-low-cpu
* split-check-cpu

-The built-in diagnosis rules are constantly being improved. If you have more diagnosis rules, welcome to create a PR or an issue in the [`tidb` repository](https://github.com/pingcap/tidb).
+The built-in diagnostic rules are constantly being improved. If you have more diagnostic rules to propose, you are welcome to create a PR or an issue in the [`tidb` repository](https://github.com/pingcap/tidb).
4 changes: 2 additions & 2 deletions system-tables/system-table-inspection-summary.md
@@ -48,12 +48,12 @@ Field description:

Usage example:

-Both the diagnosis result table and the diagnosis monitoring summary table can specify the diagnosis time range using `hint`. `select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` is the monitoring summary for the `2020-03-07 12:00:00` to `2020-03-07 13:00:00` period. Like the monitoring summary table, you can use the `inspection_summary` table to quickly find the monitoring items with large differences by comparing the data of two different periods.
+Both the diagnostic result table and the diagnostic monitoring summary table can specify the diagnostic time range using `hint`. `select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` is the monitoring summary for the `2020-03-07 12:00:00` to `2020-03-07 13:00:00` period. Like the monitoring summary table, you can use the `inspection_summary` table to quickly find the monitoring items with large differences by comparing the data of two different periods.
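
As a sketch of the same hint applied to the diagnostic result table, the query below reuses the time range from the paragraph above; the `severity` filter is an illustrative addition:

{{< copyable "sql" >}}

```sql
-- Restrict the diagnostics to a one-hour window and keep only critical results.
select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */ *
from information_schema.inspection_result
where severity = 'critical';
```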

The following example compares the monitoring metrics of read links in two time periods:

* `(2020-01-16 16:00:54.933, 2020-01-16 16:10:54.933)`
-* `(2020-01-16 16:10:54.933, 2020-01-16 16:20:54.933)`
+* `(2020-01-16 16:10:54.933, 2020-01-16 16:20:54.933)`

{{< copyable "sql" >}}

2 changes: 1 addition & 1 deletion system-tables/system-table-metrics-schema.md
@@ -7,7 +7,7 @@ aliases: ['/docs/dev/reference/system-databases/metrics-schema/']

# Metrics Schema

-To dynamically observe and compare cluster conditions of different time ranges, the SQL diagnosis system introduces cluster monitoring system tables. All monitoring tables are in the `metrics_schema` database. You can query the monitoring information using SQL statements in this schema. The data of the three monitoring-related summary tables ([`metrics_summary`](/system-tables/system-table-metrics-summary.md), [`metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md), and `inspection_result`) are all obtained by querying the monitoring tables in the metrics schema. Currently, many system tables are added, so you can query the information of these tables using the [`information_schema.metrics_tables`](/system-tables/system-table-metrics-tables.md) table.
+To dynamically observe and compare cluster conditions of different time ranges, the SQL diagnostic system introduces cluster monitoring system tables. All monitoring tables are in the `metrics_schema` database. You can query the monitoring information using SQL statements in this schema. The data of the three monitoring-related summary tables ([`metrics_summary`](/system-tables/system-table-metrics-summary.md), [`metrics_summary_by_label`](/system-tables/system-table-metrics-summary.md), and `inspection_result`) are all obtained by querying the monitoring tables in the metrics schema. Currently, many system tables are added, so you can query the information of these tables using the [`information_schema.metrics_tables`](/system-tables/system-table-metrics-tables.md) table.
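
For example, a lookup of monitoring tables by name might look like the following. This is a sketch that assumes `metrics_tables` has a `TABLE_NAME` column; the `%duration%` pattern is illustrative:

{{< copyable "sql" >}}

```sql
-- Find monitoring tables whose names mention durations.
select *
from information_schema.metrics_tables
where table_name like '%duration%';
```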

## Overview
