enhance(monitoring&alerting): update docs about component monitoring and alerting
just1900 authored and zhujf1989 committed Aug 19, 2021
1 parent 21772b2 commit a578f5a
Showing 18 changed files with 122 additions and 9 deletions.
2 changes: 1 addition & 1 deletion content/zh/docs/user-guide/alerting/_index.md
@@ -1,4 +1,4 @@
---
title: "Alert Management"
-weight: 4
+weight: 5
---
2 changes: 1 addition & 1 deletion content/zh/docs/user-guide/alerting/alerting-amc.md
@@ -1,6 +1,6 @@
---
title: "Notification Policy Management"
-weight: 1
+weight: 3
---

This document describes how to manage **project-level** alert contacts and notification policies in KubeCube.
9 changes: 4 additions & 5 deletions content/zh/docs/user-guide/alerting/alerting-rule.md
@@ -1,9 +1,9 @@
---
title: "Alert Rule Management"
-weight: 2
+weight: 4
---

-This document describes how to manage alert rules in KubeCube
+This document describes how to manage **project-level** alert rules in KubeCube


## Prerequisites
@@ -12,8 +12,7 @@ weight: 2
2. Create an alert notification policy

## Alert rule management
-
-### Create an alert rule
+#### Create an alert rule

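1. Log in to the KubeCube console, select a tenant and project, expand the 【Alerting】 menu in the sidebar, select 【Alert Rules】, and click Create.
2. Fill in the basic information for the alert rule:
@@ -27,7 +26,7 @@ weight: 2
- Notification policy group: select the contacts to notify when the alert fires, together with the corresponding notification policy.
- Alert description: the description sent when the alert fires. For configuration details, see [Prometheus alerting rule templating](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#templating).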

-### Query alert status
+#### Query alert status
![list-alertrule](/imgs/user-guide/alerting/list-alertrule.png)

On the alert rule list page, you can view the status of every alert rule in the project. There are three statuses:
76 changes: 76 additions & 0 deletions content/zh/docs/user-guide/alerting/alertmanager-config.md
@@ -0,0 +1,76 @@
---
title: "Platform Component Alerting"
weight: 1
---

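This document describes how to configure **platform-level** alerting in KubeCube, including the in-cluster Alertmanager, alert notification contacts, notification routing rules, and alert rules.

## Introduction
By default, KubeCube deploys the `kubecube-monitoring` Chart on the platform. The Chart contains the `Alertmanager` component, a default `Alertmanager Config Secret`, and alert rules for the platform's basic components. The `Alertmanager Config Secret` created by KubeCube is not modified when the Chart is upgraded or deleted, so that user configuration is not lost.

## Prerequisites
Log in to the KubeCube platform as a platform administrator.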

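## Configure Alertmanager

Log in to the KubeCube platform, click 【Operations Management】, and select 【Alerting -- Global Alert Configuration】 in the sidebar. The list page shows the `Alertmanager` configuration of each cluster. Click the 【Settings】 button to configure it.

### Global configuration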

![am-global](/imgs/user-guide/alerting/am-global.png)

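To use corporate email as an alert notification method, set the following fields in the global configuration:
- `smtp_smarthost`: host name and port of the SMTP server, e.g. smtp.163.com:465
- `smtp_from`: sender email address
- `smtp_auth_username`: user name for authenticating to the mail server
- `smtp_auth_password`: password or authorization code for authenticating to the mail server

To use WeChat Work as an alert notification method, set the following fields in the global configuration:
- `wechat_api_url`: defaults to `https://qyapi.weixin.qq.com/cgi-bin/`
- `wechat_api_secret`: secret of the third-party enterprise application
- `wechat_api_corp_id`: unique ID of the WeChat Work account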
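
As a rough illustration, the global fields can be assembled and checked before they are entered on the settings page. This is a minimal Python sketch, not KubeCube's implementation: the field names are the Alertmanager `global` keys, while the helper `missing_fields`, the host `smtp.example.com`, and every credential value are hypothetical placeholders.

```python
import json

# Alertmanager `global` keys needed per notification channel.
EMAIL_FIELDS = ("smtp_smarthost", "smtp_from", "smtp_auth_username", "smtp_auth_password")
WECHAT_FIELDS = ("wechat_api_url", "wechat_api_secret", "wechat_api_corp_id")


def missing_fields(global_config, required):
    """Return the required keys that are absent or empty in global_config."""
    return [key for key in required if not global_config.get(key)]


# Hypothetical placeholder values; replace them with real server and credential data.
global_config = {
    "smtp_smarthost": "smtp.example.com:465",
    "smtp_from": "alerts@example.com",
    "smtp_auth_username": "alerts@example.com",
    "smtp_auth_password": "<authorization-code>",
    "wechat_api_url": "https://qyapi.weixin.qq.com/cgi-bin/",
    "wechat_api_corp_id": "<corp-id>",
    # wechat_api_secret is left out on purpose to show what the check reports.
}

print(json.dumps({"global": global_config}, indent=2))
print("email fields missing:", missing_fields(global_config, EMAIL_FIELDS))
print("wechat fields missing:", missing_fields(global_config, WECHAT_FIELDS))
```

The check only reports empty or absent keys. It does not verify that the mail server or the WeChat Work credentials actually work.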

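### Notification methods

The page currently supports three notification methods: Email, WeChat, and Webhook. Other methods such as Slack and OpsGenie will be supported in later releases.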
![am-receivers](/imgs/user-guide/alerting/am-receivers.png)

For the meaning of other fields, see [the receiver definition in the official Alertmanager documentation](https://www.prometheus.io/docs/alerting/latest/configuration/#receiver).
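
To make the three supported receiver types concrete, the sketch below builds one receiver of each kind as plain Python data in the shape Alertmanager expects (`email_configs`, `wechat_configs`, `webhook_configs`). The helper functions, receiver names, address, agent ID, user ID, and URL are all hypothetical, and the structure that the KubeCube page writes may differ in detail.

```python
import json


def email_receiver(name, to):
    """Receiver that sends mail to a single address."""
    return {"name": name, "email_configs": [{"to": to}]}


def wechat_receiver(name, agent_id, to_user):
    """Receiver that sends a WeChat Work message through one application."""
    return {"name": name, "wechat_configs": [{"agent_id": agent_id, "to_user": to_user}]}


def webhook_receiver(name, url):
    """Receiver that POSTs the alert payload to an HTTP endpoint."""
    return {"name": name, "webhook_configs": [{"url": url}]}


# All names and targets below are placeholders.
receivers = [
    email_receiver("ops-email", "oncall@example.com"),
    wechat_receiver("ops-wechat", agent_id="1000002", to_user="ops-user"),
    webhook_receiver("ops-webhook", "https://hooks.example.com/alerts"),
]

print(json.dumps({"receivers": receivers}, indent=2))
```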

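### Notification routing rules

The relevant settings are:
- `receiver`: the contact (receiver) defined in the previous step
- `group_by`: the labels by which the current route node groups alerts
- `matchers`: the matching rules of the current route node
- `group_wait`: how long to wait before sending the first notification for a new alert group
- `group_interval`: how long to wait between two notifications for the same alert group
- `repeat_interval`: how long to wait before resending a notification for an alert that is still firing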


![am-route](/imgs/user-guide/alerting/am-route.png)

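For the meaning of other fields, see [the route definition in the official Alertmanager documentation](https://www.prometheus.io/docs/alerting/latest/configuration/#route). The page does not yet support configuring child routes; support will be added in a later release.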
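
The routing fields can be sketched as one route node plus a small sanity check. This is an illustrative Python example under stated assumptions: `validate_route`, the receiver name `ops-email`, and the label values are hypothetical, and the duration regex is a simplified approximation of the Prometheus duration format, not the parser Alertmanager uses.

```python
import re

# Simplified duration pattern: one or more <number><unit> parts, e.g. "30s" or "1h30m".
DURATION = re.compile(r"(\d+(ms|[smhdwy]))+")


def validate_route(route, receiver_names):
    """Return a list of human-readable problems found in a single route node."""
    problems = []
    if route.get("receiver") not in receiver_names:
        problems.append(f"unknown receiver: {route.get('receiver')!r}")
    for key in ("group_wait", "group_interval", "repeat_interval"):
        value = route.get(key)
        if value is not None and not DURATION.fullmatch(value):
            problems.append(f"{key} is not a duration: {value!r}")
    return problems


# Placeholder route node using the fields described in this section.
route = {
    "receiver": "ops-email",
    "group_by": ["alertname", "cluster"],
    "matchers": ['severity="critical"'],
    "group_wait": "30s",
    "group_interval": "5m",
    "repeat_interval": "4h",
}

print(validate_route(route, {"ops-email"}))
print(validate_route({"receiver": "nobody", "group_wait": "soon"}, {"ops-email"}))
```

With these values, the first notification for a new group goes out after 30 seconds, updates to the same group are batched at least 5 minutes apart, and an alert that keeps firing is re-notified every 4 hours.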

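## Manage alert rules

### View alert rule groups
Log in to the KubeCube platform, click 【Operations Management】, and select 【Alerting -- Alert Rules】 in the sidebar. The list page shows the `PrometheusRule` configuration of each cluster. By default, KubeCube ships a built-in `PrometheusRule` for basic resources and platform components in every cluster.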

![component-alerting](/imgs/user-guide/alerting/component-alerting.png)

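### Configure alert rule content
Click the 【Settings】 button to view and configure the content of each alert rule, including:
- Expression: a `PromQL` expression
- `for`: how long the condition must hold before the alert fires
- Severity: in [Notification methods](#notification-methods) above, you can configure a Receiver for each severity level to receive its alert notifications
- Annotations
  - Summary: the summary shown in the alert notification
  - Description: the detailed description shown in the alert notification, such as the cluster and namespace of the failing Pod
  - `Runbook Url`: the troubleshooting document for this alert rule, which should be maintained inside the organization as a best practice
  - You can also click 【Show more settings】 to add further custom Annotations (key-value pairs)
- Labels: label key-value pairs attached to the alert rule. Together with [Notification routing rules](#notification-routing-rules), they enable advanced notification configuration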

![component-PrometheusRule](/imgs/user-guide/alerting/component-prometheusRule.png)

For the meaning of other fields, see the [Prometheus-Operator API documentation](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#rule).
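
For orientation, the rule fields map onto a `PrometheusRule` object roughly as sketched below. This is a hedged Python example that prints a JSON manifest. The rule name, expression, namespace, and runbook URL are hypothetical, and the annotation keys `summary`, `description`, and `runbook_url` are assumed to correspond to the Summary, Description, and `Runbook Url` fields on the page.

```python
import json


def alert_rule(alert, expr, duration, severity, summary, description, runbook_url):
    """Build one alerting rule entry for a PrometheusRule group."""
    return {
        "alert": alert,
        "expr": expr,
        "for": duration,
        "labels": {"severity": severity},
        "annotations": {
            "summary": summary,
            "description": description,
            "runbook_url": runbook_url,
        },
    }


# Placeholder rule; {{ $labels.* }} is Prometheus template syntax expanded when the alert fires.
rule = alert_rule(
    alert="ExampleTargetDown",
    expr='up{job="example"} == 0',
    duration="5m",
    severity="critical",
    summary="Target {{ $labels.instance }} is down",
    description="Job {{ $labels.job }} target {{ $labels.instance }} has been unreachable for 5 minutes.",
    runbook_url="https://runbooks.example.com/example-target-down",
)

manifest = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "PrometheusRule",
    "metadata": {"name": "example-rules", "namespace": "example-namespace"},
    "spec": {"groups": [{"name": "example.rules", "rules": [rule]}]},
}

print(json.dumps(manifest, indent=2))
```

The `severity` label is what a route's `matchers` can select on, which is how a rule's severity ends up at a particular receiver.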
2 changes: 1 addition & 1 deletion content/zh/docs/user-guide/kubediag/_index.md
@@ -1,6 +1,6 @@
---
title: "Troubleshooting System"
-weight: 6
+weight: 10
---

This document describes how to integrate the [KubeDiag](https://github.com/kubediag/kubediag) troubleshooting system with KubeCube.
4 changes: 4 additions & 0 deletions content/zh/docs/user-guide/monitoring/_index.md
@@ -0,0 +1,4 @@
---
title: "Cluster Monitoring"
weight: 5
---
34 changes: 34 additions & 0 deletions content/zh/docs/user-guide/monitoring/component-monitoring.md
@@ -0,0 +1,34 @@
---
title: "Platform Component Monitoring"
weight: 3
---

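This document describes how to view monitoring for the core platform components in KubeCube.

## Prerequisites

Log in to the KubeCube platform as a platform administrator.

## View component monitoring dashboards

KubeCube currently provides monitoring dashboards for the following components:

- Control plane Pod monitoring
- CoreDNS monitoring
- Etcd monitoring
- Kube ApiServer monitoring
- Kube Controller Manager monitoring
- Kube Proxy monitoring
- Kube Scheduler monitoring
- Kubelet monitoring
- Prometheus monitoring
- Thanos Query monitoring

To view them, log in to the KubeCube platform, click 【Operations Management】, and select 【Component Monitoring】 in the sidebar.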

![component-monitoring](/imgs/user-guide/monitoring/component-monitoring.png)

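Click a component to open its monitoring view. Taking control plane Pod monitoring as an example, click 【control-plane-pods】 to view the resource usage of control plane Pods in each cluster. For other components, the view shows the corresponding core metrics.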

![control-plane-pods](/imgs/user-guide/monitoring/control-plane-pods.png)

2 changes: 1 addition & 1 deletion content/zh/docs/user-guide/registry/_index.md
@@ -1,7 +1,7 @@
---
title: "Image Registry"
linkTitle: "Image Registry"
-weight: 5
+weight: 9
---

KubeCube supports mainstream image registries such as registry.cn-hangzhou.aliyuncs.com, docker.io, and hub.c.163.com, and also supports private registries. KubeCube
Binary file added static/imgs/user-guide/alerting/am-global.png
Binary file added static/imgs/user-guide/alerting/am-receivers.png
Binary file added static/imgs/user-guide/alerting/am-route.png
Binary file modified static/imgs/user-guide/alerting/create-amc-email.png
Binary file modified static/imgs/user-guide/alerting/create-amc-webhook.png
Binary file modified static/imgs/user-guide/alerting/create-amc-wechat.png