The way we manipulate the config files / config maps can leave the AlertManager config in an invalid state, causing an infinite hang / crash loop.
The cause was a missing SMTP smarthost alongside an otherwise valid email receiver setup.
So the fix should come in two parts:
Dynamically detect email configs: deploy an opni SMTP smarthost if they exist, and destroy it if they go from existing -> not existing, in order to keep deployments lean.
[More importantly] Include functionality similar to the amtool check-config command when applying patches to the underlying config file / config map.
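The two parts above can be sketched roughly as follows. This is a minimal illustration, not the opni implementation: `Config`, `Receiver`, `EmailConfig`, `HasEmailConfigs`, and `ValidateSmarthost` are hypothetical, simplified stand-ins for the real AlertManager config types, and the check covers only the smarthost case described in this issue rather than everything amtool check-config validates.

```go
package main

import "fmt"

// EmailConfig, Receiver, and Config are hypothetical, simplified stand-ins
// for the corresponding AlertManager config structures.
type EmailConfig struct {
	To        string
	Smarthost string // per-receiver override; empty means "use the global one"
}

type Receiver struct {
	Name         string
	EmailConfigs []EmailConfig
}

type Config struct {
	GlobalSmarthost string
	Receivers       []Receiver
}

// HasEmailConfigs reports whether any receiver sends email. A reconciler
// could use this to decide whether an SMTP smarthost should be deployed
// or torn down.
func HasEmailConfigs(c Config) bool {
	for _, r := range c.Receivers {
		if len(r.EmailConfigs) > 0 {
			return true
		}
	}
	return false
}

// ValidateSmarthost returns an error for the first email config that has
// neither its own smarthost nor a global one to fall back on.
func ValidateSmarthost(c Config) error {
	for _, r := range c.Receivers {
		for _, e := range r.EmailConfigs {
			if e.Smarthost == "" && c.GlobalSmarthost == "" {
				return fmt.Errorf("receiver %q: email to %q has no SMTP smarthost", r.Name, e.To)
			}
		}
	}
	return nil
}

func main() {
	cfg := Config{Receivers: []Receiver{
		{Name: "ops", EmailConfigs: []EmailConfig{{To: "ops@example.com"}}},
	}}
	fmt.Println("needs smarthost deployment:", HasEmailConfigs(cfg))
	fmt.Println("validation error:", ValidateSmarthost(cfg))
}
```

Running the validation before writing the patched config map means an invalid config is rejected up front instead of being discovered through a failed rollout.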
As a follow-up to this issue, I implemented a "pre" reconciler loop that analyzes the errors returned by AlertManager's LoadConfig function, which catches a wide range of invalid configs before they are applied.
The crash-loop restart errors aren't actually much of a concern in production, since the opni alerting operator handles them by reverting when the rollout restart fails, but...
In some cases AM will start without exiting and run normally, but will prune nodes in the routing tree or delete receiver configurations under some conditions (like missing defaults). This fails silently, and it "softlocks" users out of certain configurations once AlertManager reaches that state.
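One way to surface this kind of silent failure is to compare the config that was meant to be applied against what is actually in effect. The sketch below is only an illustration under simplifying assumptions: `Snapshot`, `Route`, and `Drift` are hypothetical names, and a snapshot here holds just the receiver names and the routing tree, not a full AlertManager config.

```go
package main

import (
	"fmt"
	"sort"
)

// Route and Snapshot are hypothetical, reduced views of a config:
// only receiver names and the shape of the routing tree.
type Route struct {
	Receiver string
	Routes   []Route
}

type Snapshot struct {
	Receivers []string
	Root      Route
}

// countRoutes walks the routing tree depth-first and counts how many
// nodes reference each receiver name.
func countRoutes(r Route, counts map[string]int) {
	counts[r.Receiver]++
	for _, child := range r.Routes {
		countRoutes(child, counts)
	}
}

// Drift lists receivers and route nodes that exist in want but are
// missing from got, sorted for stable output.
func Drift(want, got Snapshot) []string {
	var out []string

	have := map[string]bool{}
	for _, name := range got.Receivers {
		have[name] = true
	}
	for _, name := range want.Receivers {
		if !have[name] {
			out = append(out, "receiver dropped: "+name)
		}
	}

	wantRoutes, gotRoutes := map[string]int{}, map[string]int{}
	countRoutes(want.Root, wantRoutes)
	countRoutes(got.Root, gotRoutes)
	for name, n := range wantRoutes {
		if missing := n - gotRoutes[name]; missing > 0 {
			out = append(out, fmt.Sprintf("route pruned: %d node(s) for receiver %s", missing, name))
		}
	}

	sort.Strings(out)
	return out
}

func main() {
	want := Snapshot{
		Receivers: []string{"default", "email"},
		Root:      Route{Receiver: "default", Routes: []Route{{Receiver: "email"}}},
	}
	got := Snapshot{
		Receivers: []string{"default"},
		Root:      Route{Receiver: "default"},
	}
	for _, line := range Drift(want, got) {
		fmt.Println(line)
	}
}
```

A non-empty result would indicate that the running state no longer matches what the user configured, which could be reported instead of leaving the user softlocked.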