Bad data error posting alerts to alertmanager #3543
Comments
Probably the same thing as #1871.
I didn't change any config during the migration; maybe it's a change in how the alerts are bookended, which would give similar behaviour, I guess? It's something that changed between 2.0.0 and now, because the errors go away when I revert.
brian-brazil added the component/notify and kind/bug labels on Dec 8, 2017
Can you check if the time is correct on both machines?
Yes it is. Both are synced to the same NTP source, and in a manual sanity check both came up with the same time when I ran the command simultaneously inside the containers.
yylt commented on Dec 13, 2017:
startsAt = alert.ActiveAt.Add(rule.holdDuration)
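For readers skimming the thread, here is a minimal sketch of what a line like that computes; the types and field names (ActiveAt, holdDuration) are assumptions for illustration, not the actual Prometheus source:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, simplified stand-ins for illustration only.
type rule struct {
	holdDuration time.Duration // the rule's `for:` clause
}

type alertState struct {
	ActiveAt time.Time // when the alerting expression first started returning results
}

func main() {
	r := rule{holdDuration: 5 * time.Minute}
	a := alertState{ActiveAt: time.Now().Add(-10 * time.Minute)}

	// Derive the alert's start time from the moment it began firing,
	// i.e. the time it became active plus the hold ("for") duration.
	startsAt := a.ActiveAt.Add(r.holdDuration)
	fmt.Println("startsAt:", startsAt)
}
```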
I can reproduce the bug on master (and 2.1) while it doesn't occur on 2.0.
Configuration and steps to reproduce
Prometheus configuration
Rules
AlertManager configuration
Steps
The receiver never gets the resolved notification because Prometheus sends an invalid payload to AlertManager. From the AlertManager logs:
IIUC the problem is located here (Lines 266 to 270 in 09e460a): whenever an alert transitions from firing to inactive, its FiredAt timestamp is reset, so the alert is relayed to Alertmanager with a zero startsAt (prometheus/cmd/prometheus/main.go, Lines 663 to 668 in 09e460a). The rest is explained in prometheus/alertmanager#1191 (cc @Conorbro).
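To make the failure mode concrete, here is a minimal, hypothetical sketch of that transition (not the actual Prometheus source; the Alert and Notification types and field names are assumptions), showing how clearing the fired timestamp when an alert goes from firing to inactive produces a zero startsAt in the payload sent to Alertmanager:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Simplified stand-in for the per-alert state a rule keeps.
type Alert struct {
	FiredAt    time.Time
	ResolvedAt time.Time
}

// Simplified stand-in for the payload relayed to Alertmanager.
type Notification struct {
	StartsAt time.Time `json:"startsAt"`
	EndsAt   time.Time `json:"endsAt"`
}

func main() {
	a := Alert{FiredAt: time.Now().Add(-10 * time.Minute)}

	// The alert resolves. The buggy behaviour is equivalent to also
	// clearing FiredAt here instead of leaving it untouched.
	a.ResolvedAt = time.Now()
	a.FiredAt = time.Time{} // offending reset (removed by #3724)

	n := Notification{StartsAt: a.FiredAt, EndsAt: a.ResolvedAt}
	b, _ := json.Marshal(n)
	fmt.Println(string(b)) // startsAt serializes as "0001-01-01T00:00:00Z"
}
```

Go's zero time.Time marshals to "0001-01-01T00:00:00Z", which matches the invalid startsAt values reported later in this thread and which Alertmanager rejects as bad data.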
simonpasquier referenced this issue on Jan 22, 2018: Don't reset FiredAt for inactive alerts #3724 (merged)
solsson commented on Jan 23, 2018:
This is a serious regression with 2.1, requiring pod restarts. Please prioritize, as downgrade is unsupported due to persistence.
bittopaz commented on Jan 24, 2018:
We are experiencing the same problem after upgrading to v2.1.0; please prioritize.
dghubble referenced this issue on Jan 28, 2018: Update to Prometheus v2.1.0 and include Grafana dashboards #113 (merged)
jbiel commented on Jan 29, 2018 (edited):
I'm just hopping into the Prometheus ecosystem with the latest stable versions and experienced this bug during the first alerting test I performed. I've checked that time is NTP-synced on all nodes.
Prometheus: 2.1.0
hectorag commented on Jan 30, 2018:
I'm also seeing the same behaviour after updating to version 2.1.0.
jsuchenia commented on Jan 30, 2018:
@hectorag in that case I suggest staying on 2.0.0 and upgrading Alertmanager to a recent version.
bastischubert commented on Jan 31, 2018:
We (@bastischubert + @RichiH) are experiencing the same problem after upgrading to v2.1.0, running Alertmanager 0.13.0.
AM side:
Prom side:
Later I managed to capture some API calls (like this one):
The JSON "startsAt":"0001-01-01T00:00:00Z" looks fouled up to me; the other instance, still on Prometheus 1.8.2, sends proper startsAt times. Both run on the same machine, so time sync should be fine ;)
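That "0001-01-01T00:00:00Z" is exactly how Go serializes a zero time.Time. A quick way to spot such alerts in a captured request body might look like the sketch below (illustrative only; it assumes the v1 /api/v1/alerts payload is a JSON array of alerts with labels, startsAt and endsAt fields):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Minimal view of one entry in the JSON array POSTed to
// Alertmanager's /api/v1/alerts; other fields are ignored here.
type alert struct {
	Labels   map[string]string `json:"labels"`
	StartsAt time.Time         `json:"startsAt"`
	EndsAt   time.Time         `json:"endsAt"`
}

func main() {
	captured := `[{"labels":{"alertname":"Example"},
	  "startsAt":"0001-01-01T00:00:00Z",
	  "endsAt":"2018-01-31T10:00:00Z"}]`

	var alerts []alert
	if err := json.Unmarshal([]byte(captured), &alerts); err != nil {
		panic(err)
	}
	for _, a := range alerts {
		if a.StartsAt.IsZero() {
			// This is the broken payload that Alertmanager rejects as bad data.
			fmt.Printf("alert %s has a zero startsAt\n", a.Labels["alertname"])
		}
	}
}
```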
@bastischubert the bug was introduced by commit 6246137#diff-b4ad1f52631fd3bd0de49dd2ed5f0d01R269. There's #3724, which removes the offending line, but it isn't merged yet.
mst-ableton commented on Feb 1, 2018 (edited):
We hit this on
Conorbro closed this in #3724 on Feb 1, 2018
jsuchenia commented on Feb 4, 2018
RRAlex commented on Feb 9, 2018:
I don't see the log line anymore by running the
vovkanaz referenced this issue on Feb 14, 2018: The same issue in your technology stack between prometheus and alertmanager #59 (closed)
krasi-georgiev referenced this issue on Feb 27, 2018: prometheus 2.1.0 send alert return 400 #3894 (closed)
stefanprodan referenced this issue on Mar 10, 2018: [WIP] Bump alertmanager / Prometheus to RC #158 (closed)
kamusin commented on Mar 13, 2018:
Cheers! I just came across this issue on our 2.1 cluster.
The fix is available in v2.2.0, but since you're running 1.8.1 you shouldn't be affected. Please ask on the Prometheus mailing list if you need more help.
lock bot commented on Mar 22, 2019:
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
alxmk commented on Dec 4, 2017 (edited)
What did you do?
Ran prom/prometheus:master with prom/alertmanager:v0.11.0. I'm using master instead of v2.0.0 because I need to pick up the fixes in prometheus/tsdb#213 and #3508.
What did you expect to see?
Alerts posted from Prometheus to Alertmanager work as expected.
What did you see instead? Under which circumstances?
In prometheus logs:
In alertmanager logs:
Environment
Kubernetes 1.8.2
Prom:
Alertmanager:
Alertmanager had been up for 3 days prior to this issue (which occurred when I updated the Prometheus container), so I cut the rest of the logs out for clarity. I can provide them if needed.