pingcap · sre-bot · May 26, 2020 · May 19, 2020 · May 20, 2020 · May 20, 2020
diff --git a/system-tables/system-table-cluster-log.md b/system-tables/system-table-cluster-log.md
@@ -40,35 +40,33 @@ desc information_schema.cluster_log;
 
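 > **Note:**
 >
-> + All fields of the log table are pushed down to the corresponding nodes for execution, so to reduce the overhead of using the cluster log table, specify as many conditions as possible. For example, `select * from cluter_log where instance='tikv-1'` runs the log search only on `tikv-1`.
+> + All fields of the log table are pushed down to the corresponding nodes for execution, so to reduce the overhead of using the cluster log table, you must specify the search keywords and a time range, and then specify as many additional conditions as possible. For example: `select * from cluster_log where message like '%ddl%' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00' and type='tidb'`.
 >
-> + The `message` field supports `like` and `regexp` matching, and the corresponding pattern is compiled into a `regexp`. Specifying multiple `message` conditions at the same time is equivalent to a `pipeline` of `grep` commands. For example, `select * from cluster_log where message like 'coprocessor%' and message regexp '.*slow.*'` is equivalent to running `grep 'coprocessor' xxx.log | grep -E '.*slow.*'` on all nodes of the cluster.
+> + The `message` field supports `like` and `regexp` matching, and the corresponding pattern is compiled into a `regexp`. Specifying multiple `message` conditions at the same time is equivalent to a `pipeline` of `grep` commands. For example, `select * from cluster_log where message like 'coprocessor%' and message regexp '.*slow.*' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00'` is equivalent to running `grep 'coprocessor' xxx.log | grep -E '.*slow.*'` on all nodes of the cluster.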
 
 The following example queries the execution process of a DDL statement:
 
 {{< copyable "sql" >}}
 
 ```sql
-select * from information_schema.cluster_log where message like '%ddl%' and message like '%job%58%' and type='tidb' and time > '2020-03-27 15:39:00';
+select time,instance,left(message,150) from information_schema.cluster_log where message like '%ddl%job%ID.80%' and type='tidb' and time > '2020-05-18 20:40:00' and time<'2020-05-18 21:40:00'
 ```
 
 ```
-+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| TIME                    | TYPE | INSTANCE         | LEVEL | MESSAGE                                                                                                                                                                                                                                                                                                                                     |
-+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4008 | INFO  | [ddl_worker.go:253] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "]                                                                       |
-| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4008 | INFO  | [ddl.go:457] ["[ddl] start DDL job"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="create table t3 (a int, b int,c int)"]                                                   |
-| 2020/03/27 15:39:36.879 | tidb | 172.16.5.40:4009 | INFO  | [ddl_worker.go:554] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:0, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]                                                             |
-| 2020/03/27 15:39:36.936 | tidb | 172.16.5.40:4009 | INFO  | [ddl_worker.go:739] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=35] ["take time"=52.165811ms] [job="ID:58, Type:create table, State:done, SchemaState:public, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] |
-| 2020/03/27 15:39:36.938 | tidb | 172.16.5.40:4009 | INFO  | [ddl_worker.go:359] ["[ddl] finish DDL job"] [worker="worker 1, tp general"] [job="ID:58, Type:create table, State:synced, SchemaState:public, SchemaID:1, TableID:57, RowCount:0, ArgLen:0, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]                                                      |
-| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4009 | INFO  | [ddl_worker.go:253] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "]                                                                       |
-| 2020/03/27 15:39:36.140 | tidb | 172.16.5.40:4009 | INFO  | [ddl.go:457] ["[ddl] start DDL job"] [job="ID:58, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:57, RowCount:0, ArgLen:1, start time: 2020-03-27 15:39:36.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="create table t3 (a int, b int,c int)"]                                                   |
-| 2020/03/27 15:39:37.141 | tidb | 172.16.5.40:4008 | INFO  | [ddl.go:489] ["[ddl] DDL job is finished"] [jobID=58]                                                                                                                                                                                                                                                                                       |
-+-------------------------+------+------------------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
+| time                    | instance       | left(message,150)                                                                                                                                      |
++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 2020/05/18 21:37:54.784 | 127.0.0.1:4002 | [ddl_worker.go:261] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:79, Ro |
+| 2020/05/18 21:37:54.784 | 127.0.0.1:4002 | [ddl.go:477] ["[ddl] start DDL job"] [job="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:79, RowCount:0, ArgLen:1, start |
+| 2020/05/18 21:37:55.327 | 127.0.0.1:4000 | [ddl_worker.go:568] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:80, Type:create table, State:none, SchemaState:none, SchemaID:1, Ta |
+| 2020/05/18 21:37:55.381 | 127.0.0.1:4000 | [ddl_worker.go:763] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=70] ["take time"=50.809848ms] [job="ID:80, Type: |
+| 2020/05/18 21:37:55.382 | 127.0.0.1:4000 | [ddl_worker.go:359] ["[ddl] finish DDL job"] [worker="worker 1, tp general"] [job="ID:80, Type:create table, State:synced, SchemaState:public, SchemaI |
+| 2020/05/18 21:37:55.786 | 127.0.0.1:4002 | [ddl.go:509] ["[ddl] DDL job is finished"] [jobID=80]                                                                                                  |
++-------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
 ```
 
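-The query results above show:
+The query results above record the execution process of a DDL statement:
 
-+ The user sends the request with DDL JOB ID `58` to the `172.16.5.40:4008` TiDB node.
-+ The `172.16.5.40:4009` TiDB node processes this DDL request, which indicates that `172.16.5.40:4009` is the DDL owner at this time.
-+ The request with DDL JOB ID 58 is completed.
++ The user sends the request with DDL JOB ID `80` to the `127.0.0.1:4002` TiDB node.
++ The `127.0.0.1:4000` TiDB node processes this DDL request, which indicates that `127.0.0.1:4000` is the DDL owner at this time.
++ The request with DDL JOB ID 80 is completed.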
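
To illustrate the `grep` pipeline semantics described in the note above, here is a minimal Python sketch. It is not TiDB's implementation: the helper names `like_to_regex` and `search_logs` and the sample log lines are hypothetical. The sketch follows standard SQL `LIKE` semantics, where the pattern must match the whole string, which is stricter than a plain `grep 'coprocessor'`.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression.

    Only the `%` and `_` wildcards are handled; escape characters are ignored.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def search_logs(lines, like_patterns=(), regexp_patterns=()):
    """Keep only the lines that satisfy every condition, like chained greps."""
    filters = [like_to_regex(p).search for p in like_patterns]
    filters += [re.compile(p).search for p in regexp_patterns]
    return [line for line in lines if all(f(line) for f in filters)]


# Hypothetical log messages, for illustration only.
sample = [
    "coprocessor request is slow",
    "coprocessor request finished",
    "raftstore apply is slow",
]
matched = search_logs(
    sample,
    like_patterns=["coprocessor%"],
    regexp_patterns=[".*slow.*"],
)
print(matched)
```

Each additional `message` condition narrows the result further, in the same way that each extra `grep` stage in a pipeline does.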
diff --git a/system-tables/system-table-inspection-result.md b/system-tables/system-table-inspection-result.md
@@ -39,23 +39,23 @@ desc information_schema.inspection_result;
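 Field description:
 
 * `RULE`: the name of the diagnostic rule. The following rules are currently implemented:
-    * `config`: configuration consistency check. If the same configuration item differs across instances, a `warning` diagnostic result is generated.
+    * `config`: configuration consistency and rationality check. If the same configuration item differs across instances, a `warning` diagnostic result is generated.
     * `version`: version consistency check. If instances of the same type run different versions, a `critical` diagnostic result is generated.
-    * `node-load`: if the current system load is too high, the corresponding `warning` diagnostic result is generated.
+    * `node-load`: server load check. If the current system load is too high, the corresponding `warning` diagnostic result is generated.
     * `critical-error`: each module of the system defines critical errors. If a critical error exceeds its threshold within the corresponding time period, a `warning` diagnostic result is generated.
-    * `threshold-check`: the diagnostic system checks a large number of metrics against thresholds. If a threshold is exceeded, the corresponding diagnostic information is generated.
+    * `threshold-check`: the diagnostic system checks some key metrics against thresholds. If a threshold is exceeded, the corresponding diagnostic information is generated.
 * `ITEM`: each rule diagnoses different items. This field indicates the specific diagnostic item under the corresponding rule.
 * `TYPE`: the type of the diagnosed instance. Possible values are `tidb`, `pd`, and `tikv`.
 * `INSTANCE`: the address of the specific diagnosed instance.
 * `STATUS_ADDRESS`: the HTTP API service address of the instance.
 * `VALUE`: the value obtained for this diagnostic item.
-* `REFERENCE`: the reference value (threshold) for this diagnostic item. If `VALUE` differs greatly from the threshold, the corresponding diagnostic information is generated.
+* `REFERENCE`: the reference value (threshold) for this diagnostic item. If `VALUE` exceeds the threshold, the corresponding diagnostic information is generated.
 * `SEVERITY`: the severity level. The value is `warning` or `critical`.
 * `DETAILS`: detailed diagnostic information, which might include SQL statements or documentation links for further investigation.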
 
 ## Diagnostic examples
 
-Diagnose the problems of the cluster at the current time.
+Diagnose the cluster at the current time.
 
 {{< copyable "sql" >}}
 
@@ -160,7 +160,7 @@ select * from information_schema.inspection_result where rule='critical-error';
 
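 ## Diagnostic rules
 
-The diagnostic module contains a series of rules. These rules query the existing monitoring tables and cluster information tables and compare the results with preset thresholds. If a result is above or below the threshold, a `warning` or `critical` result is generated, and the relevant information is provided in the `details` column.
+The diagnostic module contains a series of rules. These rules query the existing monitoring tables and cluster information tables and compare the results with thresholds. If a result exceeds the threshold, a `warning` or `critical` result is generated, and the relevant information is provided in the `details` column.
 
 You can query the existing diagnostic rules through the `inspection_rules` system table: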
 
@@ -261,7 +261,7 @@ DETAILS   | the cluster has 2 different tidb versions, execute the sql to see mo
     | TiDB | panic-count | tidb_panic_count_total_count | A panic error occurs in TiDB |
     | TiDB | binlog-error | tidb_binlog_error_total_count | An error occurs when TiDB writes binlog |
     | TiKV | critical-error | tikv_critical_error_total_coun | Critical errors of TiKV |
-    | TiKV | scheduler-is-busy       | tikv_scheduler_is_busy_total_count | The TiKV scheduler is too busy, making TiKV temporarily unavailable |
+    | TiKV | scheduler-is-busy       | tikv_scheduler_is_busy_total_count | The TiKV scheduler is too busy, which causes TiKV to be temporarily unavailable |
     | TiKV | coprocessor-is-busy | tikv_coprocessor_is_busy_total_count | The TiKV coprocessor is too busy |
     | TiKV | channel-is-full | tikv_channel_full_total_count | A channel full error occurs in TiKV |
     | TiKV | tikv_engine_write_stall | tikv_engine_write_stall | A write stall error occurs in TiKV |
@@ -274,9 +274,9 @@ DETAILS   | the cluster has 2 different tidb versions, execute the sql to see mo
 
 |  Component  | Monitoring metric | Related monitoring table | Expected value |  Description  |
 |  :----  | :----  |  :----  |  :----  |  :----  |
-| TiDB | tso-duration              | pd_tso_wait_duration                | Less than 50 ms  |  Time taken to obtain the transaction TSO timestamp |
+| TiDB | tso-duration              | pd_tso_wait_duration                | Less than 50 ms  |  Wait time for obtaining the transaction TSO timestamp |
 | TiDB | get-token-duration        | tidb_get_token_duration             | Less than 1 ms   |  Time taken for a query to obtain a token. The related TiDB configuration parameter is token-limit  |
-| TiDB | load-schema-duration      | tidb_load_schema_duration           | Less than 1 s    |  Time taken for TiDB to update and obtain table schema information |
+| TiDB | load-schema-duration      | tidb_load_schema_duration           | Less than 1 s    |  Time taken for TiDB to update table schema information |
 | TiKV | scheduler-cmd-duration    | tikv_scheduler_command_duration     | Less than 0.1 s  |  Time taken for TiKV to execute KV cmd requests |
 | TiKV | handle-snapshot-duration  | tikv_handle_snapshot_duration       | Less than 30 s   |  Time taken for TiKV to handle snapshots |
 | TiKV | storage-write-duration    | tikv_storage_async_request_duration | Less than 0.1 s  |  TiKV write latency |
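
The threshold comparison that the duration table above implies can be sketched in Python as follows. This is only an illustration under stated assumptions, not the diagnostic module's code: the bounds are copied from the rows shown in this hunk and converted to seconds, while `check_thresholds` and the observed values are hypothetical.

```python
# Expected upper bounds in seconds, copied from the table rows shown above.
EXPECTED_MAX_SECONDS = {
    "tso-duration": 0.05,
    "get-token-duration": 0.001,
    "load-schema-duration": 1.0,
    "scheduler-cmd-duration": 0.1,
    "handle-snapshot-duration": 30.0,
    "storage-write-duration": 0.1,
}


def check_thresholds(observed):
    """Return (item, value, reference) for each duration that is not below its bound."""
    results = []
    for item, value in observed.items():
        reference = EXPECTED_MAX_SECONDS.get(item)
        # The table states "less than", so a value equal to the bound is also reported.
        if reference is not None and value >= reference:
            results.append((item, value, reference))
    return results


# Hypothetical observations in seconds, for illustration only.
observed = {
    "tso-duration": 0.08,
    "get-token-duration": 0.0002,
    "storage-write-duration": 0.3,
}
for item, value, reference in check_thresholds(observed):
    print(f"{item}: value={value}s, expected less than {reference}s")
```

The tuple layout loosely mirrors the `ITEM`, `VALUE`, and `REFERENCE` fields of `inspection_result`; how the real rules derive `VALUE` from the monitoring tables is not shown in this diff.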

diff --git a/system-tables/system-table-inspection-summary.md b/system-tables/system-table-inspection-summary.md
@@ -7,7 +7,7 @@ aliases: ['/docs-cn/dev/reference/system-databases/inspection-summary/']
 
 # INSPECTION_SUMMARY
 
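-In some scenarios, users only care about the monitoring summary of a specific link or module. For example, the Coprocessor thread pool is currently configured with 8 threads. If the Coprocessor CPU usage reaches 750%, it is certain that a risk exists, or that the Coprocessor might become a bottleneck early. However, some monitoring metrics vary greatly with the user's workload, so it is difficult to define fixed thresholds. Troubleshooting problems in such scenarios is also very important, so TiDB provides `inspection_summary` to summarize links.
+In some scenarios, users only need to pay attention to the monitoring summary of a specific link or module. For example, the Coprocessor thread pool is currently configured with 8 threads. If the Coprocessor CPU usage reaches 750%, you can be sure that a risk exists, or that the Coprocessor might become a bottleneck early. However, some monitoring metrics vary greatly with the user's workload, so it is difficult to define fixed thresholds. Troubleshooting problems in such scenarios is also very important, so TiDB provides `inspection_summary` to summarize links.
 
 The structure of the diagnostic summary table `information_schema.inspection_summary` is as follows: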
 
@@ -38,7 +38,7 @@ desc information_schema.inspection_summary;
 
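 * `RULE`: the summary rule. Because new rules are continuously being added, you can query the latest rule list with `select * from inspection_rules where type='summary'`.
 * `INSTANCE`: the specific monitored instance.
-* `METRICS_NAME`: the monitoring table.
+* `METRICS_NAME`: the name of the monitoring table.
 * `QUANTILE`: valid for monitoring tables that contain `QUANTILE`. You can specify multiple percentiles through predicate pushdown. For example, `select * from inspection_summary where rule='ddl' and quantile in (0.80, 0.90, 0.99, 0.999)` summarizes DDL-related monitoring and queries the results at the 80th, 90th, 99th, and 99.9th percentiles. `AVG_VALUE`, `MIN_VALUE`, and `MAX_VALUE` indicate the average, minimum, and maximum of the aggregation, respectively.
 * `COMMENT`: the explanation of the corresponding monitoring item.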
 
@@ -48,9 +48,12 @@ desc information_schema.inspection_summary;
 
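 Usage example:
 
-Both the diagnostic result table and the diagnostic monitoring summary table can specify the time range of the diagnosis through a `hint`. For example, `select **+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` summarizes the monitoring data for the period 2020-03-07 12:00:00 - 2020-03-07 13:00:00. Like the monitoring summary table, the diagnostic result table can quickly find monitoring items with large differences by comparing the data of two different time periods. The following is an example:
+Both the diagnostic result table and the diagnostic monitoring summary table can specify the time range of the diagnosis through a `hint`. For example, `select /*+ time_range('2020-03-07 12:00:00','2020-03-07 13:00:00') */* from inspection_summary` summarizes the monitoring data for the period 2020-03-07 12:00:00 - 2020-03-07 13:00:00. Like the monitoring summary table, the `inspection_summary` system table can also quickly find monitoring items with large differences by comparing the data of two different time periods.
 
-Diagnose the faults of the cluster in the time period `"2020-01-16 16:00:54.933", "2020-01-16 16:10:54.933"`:
+The following example compares the monitoring items of the read link over the following two time periods: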
+
+* `(2020-01-16 16:00:54.933, 2020-01-16 16:10:54.933)`
+* `(2020-01-16 16:10:54.933, 2020-01-16 16:20:54.933)` 
 
 {{< copyable "sql" >}}
 
@@ -68,7 +71,7 @@ FROM
   JOIN
   (
     SELECT
-      /*+ time_range("2020-01-16 16:10:54.933","2020-01-16 16:20:54.933")*/ *
+      /*+ time_range("2020-01-16 16:10:54.933", "2020-01-16 16:20:54.933")*/ *
     FROM information_schema.inspection_summary WHERE rule='read-link'
   ) t2
   ON t1.metrics_name = t2.metrics_name
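
The join in the hunk above pairs each metric from the first time range with the same metric from the second. As a rough Python sketch of that comparison, assuming the goal is to rank metrics by how much `avg_value` changed between the two periods (the full query is not shown in this diff), the following uses a hypothetical helper `compare_periods` and made-up values:

```python
def compare_periods(t1_rows, t2_rows):
    """Join two summaries on metrics_name and rank metrics by the change in avg_value.

    Each row is a dict with `metrics_name` and `avg_value` keys.
    """
    t2_by_name = {row["metrics_name"]: row for row in t2_rows}
    compared = []
    for row in t1_rows:
        other = t2_by_name.get(row["metrics_name"])
        # Inner-join semantics; zero baselines are skipped to avoid division by zero.
        if other is None or row["avg_value"] == 0:
            continue
        ratio = other["avg_value"] / row["avg_value"]
        compared.append((row["metrics_name"], row["avg_value"], other["avg_value"], ratio))
    return sorted(compared, key=lambda item: item[3], reverse=True)


# Hypothetical summaries for the two time ranges, for illustration only.
t1 = [
    {"metrics_name": "tidb_get_token_duration", "avg_value": 0.25},
    {"metrics_name": "tikv_storage_async_request_duration", "avg_value": 0.25},
    {"metrics_name": "pd_tso_wait_duration", "avg_value": 0.5},
]
t2 = [
    {"metrics_name": "tidb_get_token_duration", "avg_value": 0.25},
    {"metrics_name": "tikv_storage_async_request_duration", "avg_value": 1.0},
]
ranked = compare_periods(t1, t2)
for name, before, after, ratio in ranked:
    print(f"{name}: {before} -> {after} (x{ratio})")
```

Metrics with the largest ratio appear first, which is one simple way to surface the monitoring items that differ most between the two periods.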