
latest version uses 100% CPU #43

Closed
christophkluenter1234 opened this issue Apr 4, 2019 · 9 comments

Comments

@christophkluenter1234

When running the latest version in Kubernetes, we saw it using 100% CPU.
The output was thousands of these lines:
Failed to parse json with key 'sections': Key path not found
We don't override the default template, so my first guess would be that
/default-message-card.tmpl might not be correct?

I will have another try on Monday and report back.

@Knappek
Collaborator

Knappek commented Apr 4, 2019

Thank you @christophkluenter1234 for reporting this. Unfortunately I wasn't able to reproduce this issue. Can you provide some information on how you've deployed prometheus-msteams? Did you use the helm chart?

This error message indicates that the rendered message card does not contain the key "sections", which is required for O365 Connector cards and which is present in default-message-card.tmpl.
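For illustration, this is roughly the shape a rendered card needs to have; the "sections" array is the key the parser looks up. A simplified Go sketch based on the fields visible in the debug logs below, not the full connector-card schema:

package main

import (
	"encoding/json"
	"os"
)

// Minimal shape of the O365 connector card the default template renders;
// "sections" is the key the error message refers to. Simplified sketch,
// not the full connector-card schema.
type messageCard struct {
	Type       string    `json:"@type"`
	Context    string    `json:"@context"`
	ThemeColor string    `json:"themeColor"`
	Summary    string    `json:"summary,omitempty"`
	Title      string    `json:"title"`
	Sections   []section `json:"sections"`
}

type section struct {
	ActivityTitle string `json:"activityTitle"`
	Facts         []fact `json:"facts"`
	Markdown      bool   `json:"markdown"`
}

type fact struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

func main() {
	card := messageCard{
		Type:       "MessageCard",
		Context:    "http://schema.org/extensions",
		ThemeColor: "808080",
		Title:      "Prometheus Alert (firing)",
		Sections: []section{{
			ActivityTitle: "[](http://example.com)",
			Facts:         []fact{{Name: "alertname", Value: "KubeSchedulerDown"}},
			Markdown:      true,
		}},
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	enc.Encode(card) // prints a card that contains the required "sections" array
}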

@psiservices-dstaples

I've just deployed the 1.1.0 helm chart and I'm getting the same behavior. The only thing I changed from the chart values was the connector endpoint. At first glance it does look like the sections key is there:

"Created a card for Microsoft Teams /alertmanager"
time="2019-04-04T21:39:00Z" level=debug msg="[{\"@type\":\"MessageCard\",\"@context\":\"http://schema.org/extensions\",\"themeColor\":\"808080\",\"summary\":\"\",\"title\":\"Prometheus Alert (firing)\",\"sections\":[{\"activityTitle\":\"[](http://example.com)\",\"facts\":[{\"name\":\"message\",\"value\":\"KubeControllerManager has disappeared from Prometheus target discovery.\"},{\"name\":\"runbook\\\\_url\",\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown\"},{\"name\":\"alertname\",\"value\":\"KubeControllerManagerDown\"},{\"name\":\"prometheus\",\"value\":\"default/soft-panda-prometheus-oper-prometheus\"},{\"name\":\"severity\",\"value\":\"critical\"}],\"markdown\":true},{\"activityTitle\":\"[](http://example.com)\",\"facts\":[{\"name\":\"message\",\"value\":\"KubeScheduler has disappeared from Prometheus target discovery.\"},{\"name\":\"runbook\\\\_url\",\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown\"},{\"name\":\"alertname\",\"value\":\"KubeSchedulerDown\"},{\"name\":\"prometheus\",\"value\":\"default/soft-panda-prometheus-oper-prometheus\"},{\"name\":\"severity\",\"value\":\"critical\"}],\"markdown\":true},{\"activityTitle\":\"[](http://example.com)\",\"facts\":[{\"name\":\"message\",\"value\":\"There are 2 different semantic versions of Kubernetes components running.\"},{\"name\":\"runbook\\\\_url\",\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch\"},{\"name\":\"alertname\",\"value\":\"KubeVersionMismatch\"},{\"name\":\"prometheus\",\"value\":\"default/soft-panda-prometheus-oper-prometheus\"},{\"name\":\"severity\",\"value\":\"warning\"}],\"markdown\":true},{\"activityTitle\":\"[](http://example.com)\",\"facts\":[{\"name\":\"message\",\"value\":\"97% throttling of CPU in namespace default for container prometheus-msteams in pod prometheus-msteams-6c947b454d-c75hq.\"},{\"name\":\"runbook\\\\_url\",\"value\":\"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh\"},{\"name\":\"alertname\",\"value\":\"CPUThrottlingHigh\"},{\"name\":\"container\\\\_name\",\"value\":\"prometheus-msteams\"},{\"name\":\"namespace\",\"value\":\"default\"},{\"name\":\"pod\\\\_name\",\"value\":\"prometheus-msteams-6c947b454d-c75hq\"},{\"name\":\"prometheus\",\"value\":\"default/soft-panda-prometheus-oper-prometheus\"},{\"name\":\"severity\",\"value\":\"warning\"}],\"markdown\":true}]}]"
time="2019-04-04T21:39:00Z" level=error msg="Failed to parse json with key 'sections': Key path not found"

@Knappek
Collaborator

Knappek commented Apr 5, 2019

Interesting. I'll have a look soon.
Additionally, I will exit the for loop in such a case so it no longer pegs the CPU at 100%.
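For illustration, a minimal sketch of that idea: on a parse error, report it and leave the loop instead of spinning on the same card. Function names here (extractSections, sendCards) are hypothetical, not the project's actual code:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// extractSections is a hypothetical helper standing in for the real JSON
// key lookup; it fails when the rendered card has no "sections" array.
func extractSections(card string) (json.RawMessage, error) {
	var payload map[string]json.RawMessage
	if err := json.Unmarshal([]byte(card), &payload); err != nil {
		return nil, err
	}
	sections, ok := payload["sections"]
	if !ok {
		return nil, fmt.Errorf("key path not found")
	}
	return sections, nil
}

// sendCards sketches the fix: on a parse error, return instead of
// retrying the same unparsable card forever at 100% CPU.
func sendCards(cards []string) error {
	for _, card := range cards {
		sections, err := extractSections(card)
		if err != nil {
			return fmt.Errorf("failed to parse json with key 'sections': %v", err)
		}
		log.Printf("would post %d bytes of sections", len(sections))
	}
	return nil
}

func main() {
	fmt.Println(sendCards([]string{`{"@type":"MessageCard"}`}))
}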

@kolikons

kolikons commented Apr 5, 2019

Hi,
I've got the same issue
poc_alertmanager_msteams | time="2019-04-05T08:11:42Z" level=error msg="Failed to parse json with key 'sections': Key path not found"
template:
{{ define "teams.card" }}
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{- if eq .Status "resolved" -}}2DC72D {{- else if eq .Status "firing" -}} {{- if eq .CommonLabels.severity "critical" -}}8C1A1A {{- else if eq .CommonLabels.severity "warning" -}}FFA500 {{- else -}}808080{{- end -}} {{- else -}}808080{{- end -}}",
  "text": "{{ .CommonAnnotations.summary }}",
  "title": "rePrometheus Alert ({{ .Status }})",
  "sections": [
    {{$externalUrl := .ExternalURL}}
    {{- range $index, $alert := .Alerts }}{{- if $index }},{{- end }}
    {
      "activityTitle": "[{{ $alert.Annotations.description }}]({{ $externalUrl }})",
      "facts": [
        {{- range $key, $value := $alert.Annotations }}
        { "name": "{{ reReplaceAll "_" "\\\\_" $key }}", "value": "{{ reReplaceAll "_" "\\\\_" $value }}" },
        {{- end -}}
        {{$c := counter}}{{ range $key, $value := $alert.Labels }}{{if call $c}},{{ end }}
        { "name": "{{ reReplaceAll "_" "\\\\_" $key }}", "value": "{{ reReplaceAll "_" "\\\\_" $value }}" }
        {{- end }}
      ],
      "markdown": true
    }
    {{- end }}
  ]
}
{{ end }}
Prometheus version: 2.8.1
Alertmanager version: 0.16.1
I think it's not a problem with the template; maybe Alertmanager sends a new parameter that your server.go doesn't expect.
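For anyone wanting to sanity-check a template like the one above locally, here is a minimal Go sketch that renders it with stubbed reReplaceAll/counter helpers and verifies the output is valid JSON containing a "sections" key. The stubs, the card.tmpl file name, and the sample payload are assumptions for testing, not the project's real code; running it against a captured Alertmanager payload should help tell whether the template or the incoming alert is at fault.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"regexp"
	"text/template"
)

func main() {
	// Stubs for the custom functions the card template uses; they mimic
	// the behaviour implied by the template (assumptions, not the real
	// implementations).
	funcs := template.FuncMap{
		"reReplaceAll": func(pattern, repl, text string) string {
			return regexp.MustCompile(pattern).ReplaceAllString(text, repl)
		},
		"counter": func() func() bool {
			n := -1
			return func() bool { n++; return n > 0 }
		},
	}

	tmpl, err := template.New("card.tmpl").Funcs(funcs).ParseFiles("card.tmpl")
	if err != nil {
		log.Fatalf("parse template: %v", err)
	}

	// A tiny hand-written payload covering the fields the template references.
	data := map[string]interface{}{
		"Status":            "firing",
		"CommonLabels":      map[string]string{"severity": "critical"},
		"CommonAnnotations": map[string]string{"summary": "test summary"},
		"ExternalURL":       "http://alertmanager.example:9093",
		"Alerts": []map[string]interface{}{{
			"Annotations": map[string]string{"description": "something broke"},
			"Labels":      map[string]string{"alertname": "test_alert", "severity": "critical"},
		}},
	}

	var buf bytes.Buffer
	if err := tmpl.ExecuteTemplate(&buf, "teams.card", data); err != nil {
		log.Fatalf("render template: %v", err)
	}

	// The rendered card must be valid JSON and contain a "sections" array.
	var card map[string]json.RawMessage
	if err := json.Unmarshal(buf.Bytes(), &card); err != nil {
		log.Fatalf("rendered card is not valid JSON: %v\n%s", err, buf.String())
	}
	if _, ok := card["sections"]; !ok {
		log.Fatal("rendered card has no 'sections' key")
	}
	fmt.Println("card renders fine and contains 'sections'")
}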

@Knappek
Collaborator

Knappek commented Apr 5, 2019

Thanks for all the feedback. I will take a look at it by Sunday at the latest.

@Knappek
Collaborator

Knappek commented Apr 8, 2019

@kolikons, @psiservices-dstaples or @christophkluenter1234, can you please set the log level to DEBUG and provide more lines of the log output? I'd like to see the incoming Prometheus alert.

@kolikons

kolikons commented Apr 9, 2019

> @kolikons, @psiservices-dstaples or @christophkluenter1234, can you please set the log level to DEBUG and provide more lines of the log output? I'd like to see the incoming Prometheus alert.

Hi @Knappek,
Hm, very strange. Right now I don't have that issue anymore; maybe some labels I've changed have fixed it.
In any case, the logs are the following:
poc_alertmanager_msteams | time="2019-04-09T06:53:29Z" level=info msg="Version: latest, Commit: 0578b32, Branch: HEAD, Build Date: 2019-03-26T21:29:58+0000"
poc_alertmanager_msteams | time="2019-04-09T06:53:29Z" level=info msg="Parsing the message card template file: /etc/msteams/card.tmpl"
poc_alertmanager_msteams | time="2019-04-09T06:53:29Z" level=info msg="Creating the server request path \"/alertmanager\" with webhook \"https://outlook.office.com/webhook/aaaa-bbbbb\""
poc_alertmanager_msteams | time="2019-04-09T06:53:29Z" level=info msg="prometheus-msteams server started listening at 0.0.0.0:2000"
poc_alertmanager_msteams | time="2019-04-09T06:55:02Z" level=info msg="/alertmanager received a request"
poc_alertmanager_msteams | time="2019-04-09T06:55:02Z" level=debug msg="{\"receiver\":\"prometheus-msteams\",\"status\":\"resolved\",\"alerts\":[{\"status\":\"resolved\",\"labels\":{\"alertname\":\"read_only_fs\",\"device\":\"/dev/sda1\",\"environment\":\"test\",\"fstype\":\"xfs\",\"hostname\":\"test.domain.com\",\"instance\":\"IP:9100\",\"job\":\"node_exporter\",\"metrics_path\":\"/metrics\",\"mountpoint\":\"/boot\",\"scheme\":\"http\",\"severity\":\"critical\"},\"annotations\":{\"description\":\"/dev/sda1 with mount point: /boot is readonly. Reported by instance IP:9100 of job node_exporter.\",\"summary\":\"Host test.domain.com has readonly file system\"},\"startsAt\":\"2019-04-08T13:43:16.431352365Z\",\"endsAt\":\"2019-04-09T06:55:01.431352365Z\",\"generatorURL\":\"http://00c1f104dd51:9090/graph?g0.expr=node_filesystem_readonly+%3D%3D+1\\u0026g0.tab=1\"}],\"groupLabels\":{\"alertname\":\"read_only_fs\",\"hostname\":\"test.domain.com\"},\"commonLabels\":{\"alertname\":\"read_only_fs\",\"device\":\"/dev/sda1\",\"environment\":\"test\",\"fstype\":\"xfs\",\"hostname\":\"test.domain.com\",\"instance\":\"IP:9100\",\"job\":\"node_exporter\",\"metrics_path\":\"/metrics\",\"mountpoint\":\"/boot\",\"scheme\":\"http\",\"severity\":\"critical\"},\"commonAnnotations\":{\"description\":\"/dev/sda1 with mount point: /boot is readonly. Reported by instance IP:9100 of job node_exporter.\",\"summary\":\"Host test.domain.com has readonly file system\"},\"externalURL\":\"http://6e4cfdbca2a7:9093\",\"version\":\"4\",\"groupKey\":\"{}:{alertname=\\\"read_only_fs\\\", hostname=\\\"test.domain.com\\\"}\"}"
poc_alertmanager_msteams | time="2019-04-09T06:55:02Z" level=debug msg="Size of message is 1057 Bytes (~1 KB)"
poc_alertmanager_msteams | time="2019-04-09T06:55:02Z" level=info msg="Created a card for Microsoft Teams /alertmanager"
poc_alertmanager_msteams | time="2019-04-09T06:55:02Z" level=debug msg="[{\"@type\":\"MessageCard\",\"@context\":\"http://schema.org/extensions\",\"themeColor\":\"2DC72D\",\"text\":\"Host test.domain.com has readonly file system\",\"title\":\"Prometheus Alert (resolved)\",\"sections\":[{\"activityTitle\":\"[/dev/sda1 with mount point: /boot is readonly. Reported by instance IP:9100 of job node_exporter.](http://6e4cfdbca2a7:9093)\",\"facts\":[{\"name\":\"description\",\"value\":\"/dev/sda1 with mount point: /boot is readonly. Reported by instance IP:9100 of job node\\\\_exporter.\"},{\"name\":\"summary\",\"value\":\"Host test.domain.com has readonly file system\"},{\"name\":\"alertname\",\"value\":\"read\\\\_only\\\\_fs\"},{\"name\":\"device\",\"value\":\"/dev/sda1\"},{\"name\":\"environment\",\"value\":\"test\"},{\"name\":\"fstype\",\"value\":\"xfs\"},{\"name\":\"hostname\",\"value\":\"test.domain.com\"},{\"name\":\"instance\",\"value\":\"IP:9100\"},{\"name\":\"job\",\"value\":\"node\\\\_exporter\"},{\"name\":\"metrics\\\\_path\",\"value\":\"/metrics\"},{\"name\":\"mountpoint\",\"value\":\"/boot\"},{\"name\":\"scheme\",\"value\":\"http\"},{\"name\":\"severity\",\"value\":\"critical\"}],\"markdown\":true}]}]"
poc_alertmanager_msteams | time="2019-04-09T06:55:03Z" level=info msg="Microsoft Teams response text: 1"
poc_alertmanager_msteams | time="2019-04-09T06:55:03Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"

@Knappek
Collaborator

Knappek commented Apr 9, 2019

I merged the PR that stops the flood of Failed to parse json with key 'sections': Key path not found log lines, so the 100% CPU usage should no longer occur.
I'll create an additional PR that solves the root cause of this issue once I'm able to reproduce it.

@christophkluenter1234
Author

The latest version works as expected. No abnormal CPU usage anymore.
