diff --git a/daily-check.md b/daily-check.md
new file mode 100644
index 0000000000000..314cb7310a99e
--- /dev/null
+++ b/daily-check.md
@@ -0,0 +1,82 @@
+---
+title: Daily Check
+summary: Learn about performance indicators of the TiDB cluster.
+category: reference
+---
+
+# Daily Check
+
+As a distributed database, TiDB is more complicated than a stand-alone database in terms of its mechanisms and monitoring items. To help you operate and maintain TiDB more conveniently, this document introduces some key performance indicators.
+
+## Key indicators of Dashboard
+
+Starting from v4.0, TiDB provides a new operation and maintenance management tool, TiDB Dashboard. This tool is integrated into the PD component. You can access TiDB Dashboard at the default address `http://${pd-ip}:${pd_port}/dashboard`.
+
+TiDB Dashboard simplifies the operation and maintenance of the TiDB database. You can view the running status of the entire TiDB cluster through one interface. The following are descriptions of some performance indicators.
+
+### Instance panel
+
+![Instance panel](/media/instance-status-panel.png)
+
++ **Status**: This indicator is used to check whether the status is normal. For an online node, this can be ignored.
++ **Up Time**: A key indicator. If you find that the `Up Time` has changed, you need to find out why the component restarted.
++ **Version**, **Deployment Directory**, **Git Hash**: Check these indicators to make sure that no instance has an inconsistent or incorrect version or deployment directory.
+
+### Host panel
+
+![Host panel](/media/host-panel.png)
+
+You can view the usage of CPU, memory, and disk. When the average usage of any resource exceeds 60%, it is recommended to start planning a scale-out. When the average usage reaches 80%, it is recommended to scale out the capacity.
+
+### SQL analysis panel
+
+![SQL analysis panel](/media/sql-analysis-panel.png)
+
+You can locate the slow SQL statements executed in the cluster and then optimize them.
+
+### Region panel
+
+![Region panel](/media/region-panel.png)
+
++ `miss-peer-region-count`: The number of Regions without enough replicas. This value is not always greater than `0`.
++ `extra-peer-region-count`: The number of Regions with extra replicas. These Regions are generated during the scheduling process.
++ `empty-region-count`: The number of empty Regions, generated by executing the `TRUNCATE TABLE`/`DROP TABLE` statement. If this number is large, you can consider enabling `Region Merge` to merge Regions across tables.
++ `pending-peer-region-count`: The number of Regions with outdated Raft logs. It is normal for a few pending peers to be generated in the scheduling process. However, it is not normal if this value stays large for a period of time.
++ `down-peer-region-count`: The number of Regions with an unresponsive peer reported by the Raft leader.
++ `offline-peer-region-count`: The number of Regions that have a peer in the offline process.
+
+Generally, it is normal for these values to be non-zero. However, it is not normal for them to stay non-zero for a long time.
+
+### KV Request Duration
+
+![TiKV request duration](/media/kv-duration-panel.png)
+
+The 99th percentile duration of KV requests in TiKV. If you find nodes with a long duration, check whether there are hot spots, or whether there are nodes with poor performance.
+
+### PD TSO Wait Duration
+
+![TiDB TSO Wait Duration](/media/pd-duration-panel.png)
+
+The time it takes for TiDB to obtain TSO from PD. The following are possible reasons for a long wait duration:
+
++ High network latency from TiDB to PD. You can manually execute the `ping` command to test the network latency.
++ High load for the TiDB server.
++ High load for the PD server.
+
+### Overview panel
+
+![Overview panel](/media/overview-panel.png)
+
+You can view the load, available memory, network traffic, and I/O utilization. When a bottleneck is found, it is recommended to scale out the capacity, or to optimize the cluster topology, SQL statements, or cluster parameters.
+
+### Exceptions
+
+![Exceptions](/media/failed-query-panel.png)
+
+You can view the errors triggered by the execution of SQL statements on each TiDB instance. These include syntax errors and primary key conflicts.
+
+### GC status
+
+![GC status](/media/garbage-collation-panel.png)
+
+You can check whether the GC (Garbage Collection) status is normal by viewing the time when the last GC happened. If the GC is abnormal, it might lead to excessive historical data, thereby decreasing the access efficiency.
diff --git a/media/failed-query-panel.png b/media/failed-query-panel.png
new file mode 100644
index 0000000000000..8777cbb993705
Binary files /dev/null and b/media/failed-query-panel.png differ
diff --git a/media/garbage-collation-panel.png b/media/garbage-collation-panel.png
new file mode 100644
index 0000000000000..81042b82f5a70
Binary files /dev/null and b/media/garbage-collation-panel.png differ
diff --git a/media/host-panel.png b/media/host-panel.png
new file mode 100644
index 0000000000000..8b4bb164b2f3f
Binary files /dev/null and b/media/host-panel.png differ
diff --git a/media/instance-status-panel.png b/media/instance-status-panel.png
new file mode 100644
index 0000000000000..1cdc92e98578b
Binary files /dev/null and b/media/instance-status-panel.png differ
diff --git a/media/kv-duration-panel.png b/media/kv-duration-panel.png
new file mode 100644
index 0000000000000..32309e7064a6b
Binary files /dev/null and b/media/kv-duration-panel.png differ
diff --git a/media/overview-panel.png b/media/overview-panel.png
new file mode 100644
index 0000000000000..353571bc628ce
Binary files /dev/null and b/media/overview-panel.png differ
diff --git a/media/pd-duration-panel.png b/media/pd-duration-panel.png
new file mode 100644
index 0000000000000..d5d3d807a9698
Binary files /dev/null and b/media/pd-duration-panel.png differ
diff --git a/media/region-panel.png b/media/region-panel.png
new file mode 100644
index 0000000000000..5b1fa4520c5ea
Binary files /dev/null and b/media/region-panel.png differ
diff --git a/media/sql-analysis-panel.png b/media/sql-analysis-panel.png
new file mode 100644
index 0000000000000..7543cd45c2437
Binary files /dev/null and b/media/sql-analysis-panel.png differ