How to modify the DingTalk alert template #11

Closed
zerlee opened this issue Jul 16, 2019 · 20 comments
Comments

zerlee commented Jul 16, 2019

Currently the alert looks like this:

P2
PROBLEM
sh-test-master1

 all(#3) mem.memused.percent  81.08559>=80
O1 2019-07-15 14:17:00

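I would like to change it to this:

Alert level: P2
Alert type: problem
Alert metric:  all(#3) mem.memused.percent  81.08559>=80
Alert host: sh-test-master1
Alert IP:  192.168.100.100
Alert time: 2019-7-15 14:17:00
Alert description: memory usage above 80%, ongoing for 3 minutes
Alert link:
Alert contacts: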
Owner

sdvdxl commented Jul 18, 2019

Template customization has not been implemented yet.

Owner

sdvdxl commented Jul 19, 2019

@zerlee The message callback carries no link or contact information, so those two fields cannot be supported.

https://github.com/sdvdxl/falcon-message/releases/tag/v0.0.3

@sdvdxl sdvdxl closed this as completed Jul 19, 2019
Author

zerlee commented Jul 21, 2019

[root@falcon1 falcon-message-0.0.3]# ./control build
# github.com/sdvdxl/falcon-message-0.0.3
./main.go:49:57: cfg.DingTalk.TemplateFile undefined (type config.DingTalk has no field or method TemplateFile)
./main.go:75:18: assignment mismatch: 2 variables but util.HandleContent returns 1 values
./main.go:92:44: too many arguments in call to ding.Send
./main.go:92:69: cfg.DingTalk.MessageType undefined (type config.DingTalk has no field or method MessageType)

The build fails with the errors above, @sdvdxl

Owner

sdvdxl commented Jul 22, 2019

What does github.com/sdvdxl/falcon-message-0.0.3 refer to?

Author

zerlee commented Jul 22, 2019

It is the directory location:

[root@falcon1 falcon-message-0.0.3]# pwd
/root/go/src/github.com/sdvdxl/falcon-message-0.0.3

Owner

sdvdxl commented Jul 22, 2019

> It is the directory location:
>
> [root@falcon1 falcon-message-0.0.3]# pwd
> /root/go/src/github.com/sdvdxl/falcon-message-0.0.3

Why does the directory name carry the -0.0.3 suffix?

Owner

sdvdxl commented Jul 22, 2019

Please read up on how GOPATH works.

Author

zerlee commented Jul 22, 2019

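> Please read up on how GOPATH works.

@sdvdxl OK, after removing the suffix it builds fine. But a new problem has appeared, and it looks like a bug. In the dashboard I configured two DingTalk group tokens in the contact's IM field, so alerts should be sent to both groups.
With this version, testing shows that messages only reach the group for the second token, and they arrive twice, while the group for the first token receives nothing. See the screenshots:
[image: Snipaste_2019-07-22_11-35-08]
[image: Snipaste_2019-07-22_11-35-18]

After I deleted the first token, the number of messages sent went back to normal.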

Owner

sdvdxl commented Jul 22, 2019

Could the tokens be duplicated? Please post the logs, with any sensitive information masked.

Author

zerlee commented Jul 22, 2019

2019/07/22 11:28:00 message comming
2019/07/22 11:28:00 tos: [ding]:777f4f49c88137e6;4368938550f6c5  content: [P2][PROBLEM][master1][][falcon-agent挂了 all(#1) agent.alive  -1==-1][O3 2019-07-22 11:26:00]
2019/07/22 11:28:00 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: PROBLEM\n- 告警指标: agent.alive  -1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:26:00\n- 告警说明: falcon-agent挂了 all(#1),已持续3分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 11:28:00 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 11:28:00 {true ok 0}
2019/07/22 11:28:00 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: PROBLEM\n- 告警指标: agent.alive  -1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:26:00\n- 告警说明: falcon-agent挂了 all(#1),已持续3分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 11:28:00 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 11:28:00 {true ok 0}
2019/07/22 11:28:08 message comming
2019/07/22 11:28:08 tos: [ding]:ed3c2afa7713aa4777f4f49c88137e6;4366a599e5d6c9776b8550f6c5  content: [P2][OK][master1][][falcon-agent挂了 all(#1) agent.alive  1==-1][O1 2019-07-22 11:28:00]
2019/07/22 11:28:08 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: OK\n- 告警指标: agent.alive  1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:28:00\n- 告警说明: falcon-agent挂了 all(#1),已持续1分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 11:28:08 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 11:28:08 {true ok 0}
2019/07/22 11:28:08 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: OK\n- 告警指标: agent.alive  1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:28:00\n- 告警说明: falcon-agent挂了 all(#1),已持续1分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 11:28:08 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 11:28:08 {true ok 0}
2019/07/22 11:58:00 message comming
2019/07/22 11:58:00 tos: [ding]:4366ab8550f6c5  content: [P2][PROBLEM][master1][][falcon-agent挂了 all(#1) agent.alive  -1==-1][O1 2019-07-22 11:56:00]
2019/07/22 11:58:00 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: PROBLEM\n- 告警指标: agent.alive  -1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:56:00\n- 告警说明: falcon-agent挂了 all(#1),已持续1分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 11:58:00 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 11:58:00 {true ok 0}
2019/07/22 12:01:00 message comming
2019/07/22 12:01:00 tos: [ding]:4366a593e096b8550f6c5  content: [P2][PROBLEM][master1][][falcon-agent挂了 all(#1) agent.alive  -1==-1][O2 2019-07-22 11:59:00]
2019/07/22 12:01:00 message: {"markdown":{"text":"## 告警\n\n- 告警等级: P2\n- 告警类型: PROBLEM\n- 告警指标: agent.alive  -1==-1\n- 告警主机: master1\n- 告警时间: 2019-07-22 11:59:00\n- 告警说明: falcon-agent挂了 all(#1),已持续2分钟\n","title":"告警"},"msgtype":"markdown"}
2019/07/22 12:01:00 response result: {"errcode":0,"errmsg":"ok"}
2019/07/22 12:01:00 {true ok 0}
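
The `content` strings in the log above follow a bracketed layout, and the timestamp inside the last bracket differs from the time of the log line itself. The following is a rough sketch of pulling those fields apart, assuming every content string has exactly six bracketed fields as in these samples. The type `Alert` and the function `parseContent` are hypothetical names, and the reading of the `O3` prefix as an occurrence counter is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// Alert holds the bracketed fields of one callback content string.
type Alert struct {
	Priority   string    // e.g. "P2"
	Status     string    // "PROBLEM" or "OK"
	Endpoint   string    // host name
	Body       string    // note, function, metric and condition, unsplit
	Occurrence string    // e.g. "O3" (assumed to be an occurrence counter)
	Time       time.Time // timestamp carried inside the content
}

// fieldRe captures the text between each pair of square brackets.
var fieldRe = regexp.MustCompile(`\[([^\]]*)\]`)

// parseContent splits a content string into its six bracketed fields.
// The fourth field is empty in the samples above and is ignored here.
func parseContent(content string) (Alert, error) {
	m := fieldRe.FindAllStringSubmatch(content, -1)
	if len(m) != 6 {
		return Alert{}, fmt.Errorf("expected 6 bracketed fields, got %d", len(m))
	}
	tail := strings.SplitN(m[5][1], " ", 2)
	if len(tail) != 2 {
		return Alert{}, fmt.Errorf("unexpected last field %q", m[5][1])
	}
	// time.Parse treats the value as UTC; a real sender would pick a zone.
	t, err := time.Parse("2006-01-02 15:04:05", tail[1])
	if err != nil {
		return Alert{}, err
	}
	return Alert{
		Priority:   m[0][1],
		Status:     m[1][1],
		Endpoint:   m[2][1],
		Body:       m[4][1],
		Occurrence: tail[0],
		Time:       t,
	}, nil
}

func main() {
	a, err := parseContent("[P2][PROBLEM][master1][][falcon-agent挂了 all(#1) agent.alive  -1==-1][O3 2019-07-22 11:26:00]")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(a.Priority, a.Status, a.Endpoint, a.Occurrence, a.Time.Format("15:04:05"))
}
```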

Owner

sdvdxl commented Jul 22, 2019

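I added logging where the DingTalk message is sent. The cause was a misuse of the go statement: the parameter was not passed in. It has been fixed.

Please rotate the DingTalk tokens shown in the logs above as soon as possible.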

Author

zerlee commented Jul 22, 2019

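> I added logging where the DingTalk message is sent. The cause was a misuse of the go statement: the parameter was not passed in. It has been fixed.
>
> Please rotate the DingTalk tokens shown in the logs above as soon as possible.

OK, will do.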

Author

zerlee commented Jul 25, 2019

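@sdvdxl There may be some problems with the alert time and alert description in the template.

I stopped the service on port 4000 at around 15:39:00, as shown here:
[image: 2]
My alert strategy is as follows:
[image: 1]
I received the alert notifications at around 15:41:
[image: 3]
[image: 4]
[image: 5]

So my requests are:
1. Show the alert time as the time the alert was triggered, not the time the notification was sent.

2. The duration probably needs an if check: on the first notification it is 1*3=3 minutes, on the second 1*3+5=8 minutes, and on the third 1*3+5+5=13 minutes.

3. Also, whether or not the note field is set in the strategy, {{ .desc }} in the template always renders the value of the strategy's if field, "all(#3)". Could it be shown only when note is set, and omitted otherwise?

[image: 4D0985BF-CD5A-4f2e-80DE-A34D920B2AE4]

4. As the screenshot two above shows, the metric field actually contains three things: the metric, the tags, and the threshold. Could they be displayed separately, by adding two more variables, label and threshold?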
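
The duration arithmetic in request 2 (3, 8, then 13 minutes) can be written as one formula instead of a chain of if checks. The sketch below assumes a 1-minute step, 3 consecutive points before the alert fires, and a 5-minute gap between repeated notifications, which is how the numbers in the request appear to be built. The function name and parameter names are hypothetical.

```go
package main

import "fmt"

// elapsedMinutes estimates how long the condition has held when the
// occurrence-th notification is sent: stepMin*points for the first one,
// plus repeatMin for every later notification.
func elapsedMinutes(stepMin, points, repeatMin, occurrence int) int {
	if occurrence < 1 {
		return 0
	}
	return stepMin*points + (occurrence-1)*repeatMin
}

func main() {
	for n := 1; n <= 3; n++ {
		fmt.Println(n, elapsedMinutes(1, 3, 5, n))
	}
}
```

With these inputs the three notifications yield 3, 8 and 13 minutes, matching the figures in the request.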

Owner

sdvdxl commented Jul 26, 2019

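The main difficulty is that this string is hard to split, since it has no distinctive delimiters. If it always follows the all(#xx) format, it could be split on that.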

Author

zerlee commented Jul 26, 2019

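> The main difficulty is that this string is hard to split, since it has no distinctive delimiters. If it always follows the all(#xx) format, it could be split on that.

@sdvdxl Can the alert time be changed? It is a very important piece of information for an alert. The other requests are less urgent.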

Owner

sdvdxl commented Jul 26, 2019 via email

Author

zerlee commented Jul 26, 2019

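@sdvdxl
I stopped the agent at 20:55:12 and received the notifications below:

[image: 11]

[image: 12]

[image: 13]

Problems:
1. The alert time and the notification time are mixed up.
2. The alert time and the notification time always have the form xx:xx:00.
3. The notification time does not match the current time.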

Author

zerlee commented Jul 26, 2019

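However, when the alert is triggered automatically (rather than by manually stopping a process, as I did above), the times seem to be correct, except that the seconds are always 00.
[image: 16]
[image: 17]
[image: 18]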

Owner

sdvdxl commented Jul 29, 2019 via email

@sz-wuyanzu

Could you share the line of code that sets the alert time? Thanks.
