Modified template does not take effect #37

Closed
imlxh opened this issue Jun 28, 2020 · 23 comments

Comments

@imlxh

imlxh commented Jun 28, 2020

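I modified the DingTalk template, but alerts are still sent with the old template content. Also, if I want to iterate over all labels, how should I change the template?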

@feiyu563
Owner

feiyu563 commented Jun 28, 2020 via email

@imlxh
Author

imlxh commented Jun 29, 2020

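Oh, I see. I wasn't using the interface you mentioned. However, I noticed that when the notice says there are multiple alerts, the message that gets sent shows only the first one. I think this still needs some improvement.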

@feiyu563
Owner

feiyu563 commented Jun 29, 2020 via email

@imlxh
Author

imlxh commented Jun 29, 2020

OK. If I want to print all labels, how should the template be set up? Do I have to write each one out explicitly?

@imlxh
Author

imlxh commented Jun 29, 2020

image
Something like this. Each job's content may be different, so the alert also needs to show some key information when it is displayed.

@feiyu563
Owner

See #30.

@imlxh
Author

imlxh commented Jun 29, 2020

I have looked at that one. It is fixed and does not show all the content; the alert is rather terse.

@xmx0632

xmx0632 commented Jun 29, 2020

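@feiyu563 Hello. In Grafana I configured alerts to be sent to http://PrometheusAlert/grafana/dingding
The alert messages arrive in DingTalk, as shown:
image
However, when I follow the documentation at https://github.com/feiyu563/PrometheusAlert/blob/master/doc/readme/tpltest.md and look in the log for the received message, I cannot find the keyword the documentation describes.

Here is what I got from the log: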

2020/06/29 02:37:15.400 [I] [value.go:460] [1593398235400197882] {"evalMatches":[],"message":"nacos db exception!!!","ruleId":27,"ruleName":"db exception alert","ruleUrl":"http://localhost:3000/d/Bz_QALEiz12/nacos-1?fullscreen\u0026edit\u0026tab=alert\u0026panelId=54\u0026orgId=1","state":"alerting","tags":{},"title":"[Alerting] db exception alert"}
2020/06/29 02:37:15.400 [I] [grafana.go:155] [1593398235400197882] [dingding] {"msgtype":"markdown","markdown":{"title":"PrometheusAlert故障告警信息","text":"## PrometheusAlertGrafana故障告警信息\n\n#### db exception alert\n\n###### 告警级别:灾难\n\n###### 开始时间:2020-06-29 02:37:15\n\n##### nacos db exception!!!\n\nPrometheusAlert"},"at":{"atMobiles":["15395105573"],"isAtAll":true}}

2020/06/29 02:37:15.883 [I] [grafana.go:155] [1593398235400197882] [dingding] {"errcode":0,"errmsg":"ok"}
2020/06/29 02:37:15.883 [I] [value.go:460] [1593398235400197882] 告警消息发送完成.
2020/06/29 02:37:15.883 [D] [server.go:2802] | 172.17.0.4| 200 | 483.872963ms| match| POST /grafana/dingding r:/grafana/dingding

How should I customize the template here? Is the documentation out of sync with the code? Thanks.

@feiyu563
Owner

This is the Grafana JSON:

{"evalMatches":[],"message":"nacos db exception!!!","ruleId":27,"ruleName":"db exception alert","ruleUrl":"http://localhost:3000/d/Bz_QALEiz12/nacos-1?fullscreen\u0026edit\u0026tab=alert\u0026panelId=54\u0026orgId=1","state":"alerting","tags":{},"title":"[Alerting] db exception alert"}
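For readers who want to inspect such a payload outside PrometheusAlert, here is a minimal sketch that decodes it in Go. The type names `GrafanaAlert` and `EvalMatch` and the chosen Go types are assumptions for illustration; the fields are only those visible in the payloads logged in this thread, not an official schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GrafanaAlert mirrors only the fields visible in the logged payload.
// The type name and field types are assumptions, not an official schema.
type GrafanaAlert struct {
	EvalMatches []EvalMatch       `json:"evalMatches"`
	Message     string            `json:"message"`
	RuleID      int               `json:"ruleId"`
	RuleName    string            `json:"ruleName"`
	RuleURL     string            `json:"ruleUrl"`
	State       string            `json:"state"`
	Tags        map[string]string `json:"tags"`
	Title       string            `json:"title"`
}

// EvalMatch holds one entry of evalMatches (empty in this particular log line).
type EvalMatch struct {
	Value  float64           `json:"value"`
	Metric string            `json:"metric"`
	Tags   map[string]string `json:"tags"`
}

// decodeGrafanaAlert parses a raw webhook body into a GrafanaAlert.
func decodeGrafanaAlert(body []byte) (GrafanaAlert, error) {
	var a GrafanaAlert
	err := json.Unmarshal(body, &a)
	return a, err
}

// sample is the payload copied from the log line in this thread.
const sample = `{"evalMatches":[],"message":"nacos db exception!!!","ruleId":27,"ruleName":"db exception alert","ruleUrl":"http://localhost:3000/d/Bz_QALEiz12/nacos-1?fullscreen\u0026edit\u0026tab=alert\u0026panelId=54\u0026orgId=1","state":"alerting","tags":{},"title":"[Alerting] db exception alert"}`

func main() {
	a, err := decodeGrafanaAlert([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(a.RuleName, a.State, len(a.EvalMatches))
}
```

The top-level keys (`ruleName`, `message`, `evalMatches`, and so on) are the names a Grafana template can reference; none of them is `alerts` or `externalURL`.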

@xmx0632

xmx0632 commented Jun 29, 2020

> This is the Grafana JSON

When I try to send a message from the template test page, it reports an error. What could be the cause? Thanks.
image

@feiyu563
Owner

feiyu563 commented Jun 29, 2020

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
{{if eq $v.status "resolved"}}
## [Prometheus recovery notice]({{$v.generatorURL}})
{{ range $x,$y:=$v.labels }}
###### {{ $x }}: {{ $y }}
{{end}}
###### Start time: {{$v.startsAt}}
###### End time: {{$v.endsAt}}
##### [{{$v.annotations.description}}]({{$var}})
![Prometheus](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)
{{else}}
## [Prometheus alert]({{$v.generatorURL}})
{{ range $x,$y:=$v.labels }}
###### {{ $x }}: {{ $y }}
{{end}}
###### Start time: {{$v.startsAt}}
###### End time: {{$v.endsAt}}
##### [{{$v.annotations.description}}]({{$var}})
![Prometheus](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)
{{end}}
{{ end }}

To iterate over all labels, try the template above.
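To try the label loop outside PrometheusAlert, the sketch below renders just the inner `range` over `labels` with Go's `text/template`. The sample alert (`sampleData`, its label names and values) is hypothetical, and the template is a reduced version of the one above, not the full DingTalk template.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// labelsTpl ranges over the labels map of every alert, so no label name is hard-coded.
const labelsTpl = `{{ range $k, $v := .alerts }}{{ range $x, $y := $v.labels }}{{ $x }}: {{ $y }}
{{ end }}{{ end }}`

// renderLabels executes labelsTpl against data shaped like a decoded JSON payload.
func renderLabels(data map[string]interface{}) (string, error) {
	tpl, err := template.New("labels").Parse(labelsTpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// sampleData returns a made-up payload with one alert and three labels.
func sampleData() map[string]interface{} {
	return map[string]interface{}{
		"alerts": []interface{}{
			map[string]interface{}{
				"labels": map[string]interface{}{
					"job":       "node",
					"instance":  "10.0.0.1:9100",
					"alertname": "InstanceDown",
				},
			},
		},
	}
}

func main() {
	out, err := renderLabels(sampleData())
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Print(out)
}
```

`text/template` visits map keys in sorted order, so the labels come out alphabetically regardless of how they were inserted.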

@feiyu563
Owner

What you posted above is a Grafana alert message, so it needs a Grafana template. You used a Prometheus template, which will certainly fail.
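One way to make this kind of template/payload mismatch visible when experimenting locally is to render with the `missingkey=error` option of `text/template`. This is a sketch under assumptions: `renderStrict`, `promTpl`, and both sample bodies are made up for illustration, and nothing in this thread says PrometheusAlert itself enables that option.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// promTpl reads Prometheus-style keys (externalURL, alerts).
const promTpl = `{{ $var := .externalURL }}{{ range .alerts }}{{ .status }} {{ end }}`

// Hypothetical, trimmed-down payloads for the two sources.
const grafanaBody = `{"ruleName":"db exception alert","state":"alerting","message":"nacos db exception!!!"}`
const promBody = `{"externalURL":"http://alertmanager.example","alerts":[{"status":"firing"}]}`

// renderStrict renders tplText against a JSON body and fails on missing map keys.
func renderStrict(tplText string, body []byte) (string, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(body, &data); err != nil {
		return "", err
	}
	tpl, err := template.New("strict").Option("missingkey=error").Parse(tplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// The Grafana body has no externalURL key, so execution returns an error.
	if _, err := renderStrict(promTpl, []byte(grafanaBody)); err != nil {
		fmt.Println("grafana payload:", err)
	}
	out, err := renderStrict(promTpl, []byte(promBody))
	fmt.Printf("prometheus payload: %q %v\n", out, err)
}
```

With the default option, a missing key prints as `<no value>` and a `range` over a missing key simply produces nothing, which makes a wrong template look like an empty message rather than an error.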

@xmx0632

xmx0632 commented Jun 29, 2020

> What you posted above is a Grafana alert message, so it needs a Grafana template. You used a Prometheus template, which will certainly fail.

I see, thanks.

@xmx0632

xmx0632 commented Jun 29, 2020

After switching templates, the test still fails:
image
The log shows an error. Could this be a bug?

2020/06/29 03:16:34.596 [D] [value.go:460] [1593400594596360963] {"evalMatches":[],"message":"nacos db exception!!!","ruleId":27,"ruleName":"db exception alert","ruleUrl":"http://www.baidu.com","state":"alerting","tags":{},"title":"[Alerting] db exception alert"}
2020/06/29 03:16:34.596 [C] [panic.go:679] the request url is /prometheusalert
2020/06/29 03:16:34.596 [C] [panic.go:679] Handler crashed with error runtime error: invalid memory address or nil pointer dereference
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/runtime/panic.go:679
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/text/template/exec.go:164
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/runtime/panic.go:679
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/runtime/panic.go:199
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/runtime/signal_unix.go:394
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/text/template/exec.go:218
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/text/template/exec.go:204
2020/06/29 03:16:34.596 [C] [panic.go:679] /mnt/hgfs/code/golang/src/PrometheusAlert/controllers/prometheusalert.go:38
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/reflect/value.go:460
2020/06/29 03:16:34.596 [C] [panic.go:679] /usr/lib/golang/src/reflect/value.go:321
2020/06/29 03:16:34.597 [C] [panic.go:679] /mnt/hgfs/code/golang/pkg/mod/github.com/astaxie/beego@v1.12.1/router.go:853
2020/06/29 03:16:34.597 [C] [panic.go:679] /usr/lib/golang/src/net/http/server.go:2802
2020/06/29 03:16:34.597 [C] [panic.go:679] /usr/lib/golang/src/net/http/server.go:1890
2020/06/29 03:16:34.597 [C] [panic.go:679] /usr/lib/golang/src/runtime/asm_amd64.s:1357
2020/06/29 03:16:34.597 [server.go:3054] [HTTP] http: superfluous response.WriteHeader call from github.com/astaxie/beego/context.(*Response).WriteHeader (context.go:230)

@imlxh
Author

imlxh commented Jun 29, 2020

OK, I'll give it a try. Thank you very much.

@xmx0632

xmx0632 commented Jun 29, 2020

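After debugging, I found the cause: the Grafana template I had set up earlier contains calls to time functions. After removing the calls that involve time, everything works.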

Grafana alert

{{.ruleName}}

Alert level: critical
Start time: {{ (time.Now).Format time.Kitchen }}{{ ((time.Now).Add (time.Hour 2)).Format time.Kitchen }}
{{.message}}

Prometheus

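When the template is converted with the incoming message there is no time object, so nil is returned and the handler fails. In

tpl, _ := template.New("").Parse(tpltext.Tpl)

the error is discarded; logging it shows the actual error message: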

tpl, tplErr := template.New("").Parse(tpltext.Tpl)
if tplErr != nil {
	// Log the parse failure instead of silently dropping it.
	logs.Error("parse tpl error", tplErr)
}
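The failure can be reproduced with the standard library alone. In the sketch below, `badTpl` is the first expression from the template above; `render`, `helperFuncs`, `fixedNow`, and the `now` helper are hypothetical names, and registering a `now` function is only one possible workaround, not something PrometheusAlert is known to provide.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"time"
)

// badTpl uses "time" as if it were available; text/template has no such built-in.
const badTpl = `{{ (time.Now).Format time.Kitchen }}`

// goodTpl calls a helper named "now" that the caller registers explicitly.
const goodTpl = `{{ now.Format "3:04PM" }}`

// fixedNow is a stand-in clock so the example output is deterministic.
func fixedNow() time.Time {
	return time.Date(2020, 6, 29, 2, 37, 15, 0, time.UTC)
}

// helperFuncs returns the function map made available to templates.
func helperFuncs() template.FuncMap {
	return template.FuncMap{"now": fixedNow}
}

// render parses and executes tplText, returning errors instead of ignoring them.
func render(tplText string, funcs template.FuncMap, data interface{}) (string, error) {
	tpl, err := template.New("alert").Funcs(funcs).Parse(tplText)
	if err != nil {
		// Parse returns a nil template on failure, so it must not be executed.
		return "", err
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	if _, err := render(badTpl, helperFuncs(), nil); err != nil {
		fmt.Println("parse error:", err)
	}
	out, err := render(goodTpl, helperFuncs(), nil)
	fmt.Println(out, err)
}
```

Because `Parse` returns a nil template together with the error, calling `Execute` on the result without checking the error dereferences nil, which matches the `nil pointer dereference` inside `text/template/exec.go` in the stack trace.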

@xmx0632

xmx0632 commented Jun 29, 2020

The Grafana message received was:

{"evalMatches":[{"value":0,"metric":"up{Category=\"micro-service\", Env=\"prod\", Name=\"fabric-sidecar-1\", Usage=\"fabric-sidecar\", instance=\"10.170.0.11:9876\", job=\"micro-service\"}","tags":{"Category":"micro-service","Env":"prod","Name":"fabric-sidecar-1","Usage":"fabric-sidecar","__name__":"up","instance":"10.170.0.11:9876","job":"micro-service"}}],"message":"微服务挂了!请联系管理员检查!","ruleId":37,"ruleName":"MicroService节点状态告警","ruleUrl":"http://localhost:3000/d/yCQyX6ZMz/blockchainservicemonitor?fullscreen\u0026edit\u0026tab=alert\u0026panelId=8\u0026orgId=1","state":"alerting","tags":{},"title":"[Alerting] MicroService节点状态告警"}

I changed the corresponding Grafana alert template to:


## [Grafana alert]({{.ruleUrl}})
#### {{.ruleName}}
###### Alert level: critical
##### {{.message}}
##### {{.value}}
{{range $i, $v := .evalMatches}}
{{ if $v.tags.Category }}
##### Category: {{ $v.tags.Category }}
{{end}}
{{ if $v.tags.Usage }}
##### Usage: {{ $v.tags.Usage }}
{{end}}
{{ if $v.tags.instance }}
##### Instance: {{ $v.tags.instance }}
{{end}}
{{end}}
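If the goal is to show every tag rather than a fixed set, the same nested `range` used for Prometheus labels also works on `evalMatches[].tags`. The sketch below is an assumption-laden variant of the template above: `tagsTpl`, `renderTags`, and `trimmedBody` are hypothetical names, and the body is a shortened copy of the received message with only three of its tags.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// tagsTpl prints the rule name, then every tag of every evalMatches entry.
const tagsTpl = `{{ .ruleName }}
{{ range .evalMatches }}{{ range $k, $v := .tags }}{{ $k }}: {{ $v }}
{{ end }}{{ end }}`

// trimmedBody is a shortened copy of the Grafana message shown in this thread.
const trimmedBody = `{"ruleName":"MicroService节点状态告警","evalMatches":[{"value":0,"tags":{"Category":"micro-service","Usage":"fabric-sidecar","instance":"10.170.0.11:9876"}}]}`

// renderTags decodes a JSON body into a generic map and executes tagsTpl on it.
func renderTags(body []byte) (string, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(body, &data); err != nil {
		return "", err
	}
	tpl, err := template.New("tags").Parse(tagsTpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderTags([]byte(trimmedBody))
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Print(out)
}
```

The trade-off is that internal tags such as `__name__` in the full message would be printed too, so the explicit `if` checks above remain the better fit when only a few fields matter.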

@feiyu563
Owner

feiyu563 commented Jun 29, 2020 via email

@feiyu563
Owner

feiyu563 commented Jul 7, 2020

I'll close this issue for now.

@feiyu563 feiyu563 closed this as completed Jul 7, 2020
@Zhang21
Contributor

Zhang21 commented Jul 7, 2020

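@feiyu563 @xmx0632 How did the two of you configure Alertmanager?

I configured it according to the documentation, but doesn't that hard-code wxurl? Do the other robots I configured myself then stop taking effect? And does the configuration in app.conf stop taking effect as well? Things like time conversion no longer take effect.

The documentation says: find the custom message template you need, copy the address from the Path column of the table, replace the address or phone number inside [xxxxx] with your actual configuration, and paste it into the corresponding WebHook address setting.

I only want to use a custom template, but I still need to keep using the settings in app.conf, for multiple robots, the time zone, and so on. How can this be solved?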

@Zhang21
Contributor

Zhang21 commented Jul 7, 2020

Can I fill in only the template address, leave out the other settings, and have them read from app.conf?

@Zhang21
Contributor

Zhang21 commented Jul 7, 2020

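I even edited the database directly with sqlite3, and the default interface still does not pick up the change.

Is the default alert template hard-coded in the source? I even changed the database. Sigh!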

@pokitpeng

+1. I spent ages on this, only to find it was all for nothing.
