mixin: remote-write related alert severity should take HA setup into account #7176

beorn7 · 2020-04-27T15:03:42Z

Currently, the PrometheusRemoteStorageFailures and PrometheusRemoteWriteBehind alerts are critical. However, especially with remote-write setups, many users will run HA pairs (or groups) of Prometheus servers, and the remote-write receiver will have some way of dedup'ing the incoming samples. If that's the case, a single Prometheus replica having trouble with remote-write should only be a warning. The alert should be critical only if all members of the HA group have trouble.
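
To make the proposed rule concrete, here is a minimal sketch of the severity decision for one HA group. It is an illustration only, not code from the mixin: the function name `remote_write_severity` and the replica names are hypothetical, and it assumes something upstream has already determined, per replica, whether remote-write is in trouble.

```python
from typing import Mapping, Optional


def remote_write_severity(replica_failing: Mapping[str, bool]) -> Optional[str]:
    """Return the alert severity for one HA group, or None if nothing should fire.

    replica_failing maps a (hypothetical) replica name to True when that
    replica currently has trouble with remote-write.
    """
    if not replica_failing:
        # No replicas known for this group: nothing to alert on.
        return None

    failing = [name for name, bad in replica_failing.items() if bad]
    if not failing:
        return None

    # Critical only if every member of the HA group is affected.
    if len(failing) == len(replica_failing):
        return "critical"

    # At least one healthy replica remains, so the receiver can still
    # dedup samples from it: warn instead of paging.
    return "warning"
```

With an HA pair, one failing replica yields `"warning"` and two failing replicas yield `"critical"`. A single, non-HA server is a group of one, so it keeps today's behaviour and goes straight to `"critical"`.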

beorn7 · 2020-11-02T14:39:37Z

However, thinking about it, the way Cortex currently handles HA pairs will actually not switch the replica if one falls behind…

krajorama · 2024-04-23T11:48:58Z

Hello from the bug scrub, is there any progress on this issue, @beorn7? Otherwise we'll close it next time around.

ArthurSens · 2024-04-23T11:53:19Z

I think alert severity is highly debatable, not only here but in several parts of our mixins. Some might say that if one replica is completely down, the HA setup is compromised and someone should be paged as a precaution. Others might say that data is still being ingested and it's safe to keep it like this for some time, no need to page.

What I wanted to highlight here is that alert severity is highly opinionated, and it is hard to find a one-size-fits-all solution 😬

beorn7 · 2024-04-30T16:37:18Z

I have noticed no complaints about the current state in the last 3.5 years, so let's close for now. If anyone feels the need to revisit, they can follow up here or open a new issue and we'll take it from there.

beorn7 added kind/enhancement component/documentation labels Apr 27, 2020

beorn7 self-assigned this Apr 27, 2020

cstyan mentioned this issue May 6, 2020: Remote Write Meta Issue #6333 (open, 8 tasks)

roidelapluie added the priority/P3 label Jun 22, 2020

beorn7 closed this as completed Apr 30, 2024
