-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alerting: Fetch alerts from a remote Alertmanager #75844
Alerting: Fetch alerts from a remote Alertmanager #75844
Conversation
@@ -381,7 +381,7 @@ func (am *alertmanager) buildReceiverIntegrations(receiver *alertingNotify.APIRe | |||
} | |||
|
|||
// PutAlerts receives the alerts and then sends them through the corresponding route based on whenever the alert has a receiver embedded or not | |||
func (am *alertmanager) PutAlerts(postableAlerts apimodels.PostableAlerts) error { | |||
func (am *alertmanager) PutAlerts(_ context.Context, postableAlerts apimodels.PostableAlerts) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use the context.Context
in the internal Alertmanager, but these methods need to accept one to implement the Alertmanager
interface and be able to swap Alertmanager structs in the future.
@@ -103,12 +110,7 @@ func (am *externalAlertmanager) GetSilence(ctx context.Context, silenceID string | |||
return apimodels.GettableSilence{}, err | |||
} | |||
|
|||
if res != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to add this guard to almost every method, I think it's overkill. I'd say it's safe to assume that the go-swagger
package won't introduce a breaking change like returning a nil value when no error occurred.
@@ -36,7 +36,7 @@ type ScheduleService interface { | |||
// | |||
//go:generate mockery --name AlertsSender --structname AlertsSenderMock --inpackage --filename alerts_sender_mock.go --with-expecter | |||
type AlertsSender interface { | |||
Send(key ngmodels.AlertRuleKey, alerts definitions.PostableAlerts) | |||
Send(ctx context.Context, key ngmodels.AlertRuleKey, alerts definitions.PostableAlerts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only change here is adding a context.Context
to the interface and the .Send()
calls.
I also updated the mock for this interface.
@@ -59,7 +59,7 @@ func TestProcessTicks(t *testing.T) { | |||
mockedClock := clock.NewMock() | |||
|
|||
notifier := &AlertsSenderMock{} | |||
notifier.EXPECT().Send(mock.Anything, mock.Anything).Return() | |||
notifier.EXPECT().Send(mock.Anything, mock.Anything, mock.Anything).Return() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this file are:
- Adding a first argument corresponding to the
context.Context
- Checking that the third (not second) argument is of type
definitions.PostableAlerts
…ernandezc/forward_alerts_to_remote_alertmanager
…ttps://github.com/grafana/grafana into santihernandezc/forward_alerts_to_remote_alertmanager
…ernandezc/forward_alerts_to_remote_alertmanager
…https://github.com/grafana/grafana into santihernandezc/forward_alerts_to_remote_alertmanager
…https://github.com/grafana/grafana into santihernandezc/forward_alerts_to_remote_alertmanager
…ernandezc/forward_alerts_to_remote_alertmanager
0872944
to
5c85df3
Compare
be72f88
to
0f3224b
Compare
…ernandezc/forward_alerts_to_remote_alertmanager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, but please take a look at my comments.
if !ok { | ||
t.Skip("No Alertmanager URL provided") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially dangerous - if someone makes a mistake and renames the CI variable we wouldn't be able to tell if this test is running or not. I'd rather abort the test and tell the user the devev
command they to run to make sure the test runs correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same thing and started by not doing this but the CI failed in normal integration tests.
I then checked how we handle tests for e.g. Memcached, Redis storage, Redis cache, Postgres and MySQL which are the other integration tests that use docker containers. All of them use this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, I don't love this approach but if there's so much precedence I take it it might not be a simple problem. Can you file an issue and attach it in a separate task list to the epic so that can investigate later? There are ways to simply skip these tests if we don't have the right setup in the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue created: #76798
alerts := make(alertingNotify.PostableAlerts, 0, len(postableAlerts.PostableAlerts)) | ||
for _, pa := range postableAlerts.PostableAlerts { | ||
alerts = append(alerts, &alertingNotify.PostableAlert{ | ||
Annotations: pa.Annotations, | ||
EndsAt: pa.EndsAt, | ||
StartsAt: pa.StartsAt, | ||
Alert: pa.Alert, | ||
}) | ||
} | ||
|
||
params := amalert.NewPostAlertsParamsWithContext(ctx).WithAlerts(alerts) | ||
_, err := am.amClient.Alert.PostAlerts(params) | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the implementation that we want here, I would prefer if we leave it out of this PR and just let this PR handle reads.
I'll leave you with the details so that you can file an issue and add it as a follow-up.
If you look closely at how prometheus handles sending of alerts to the alertmanager, you'll realise it's a combination of two things:
This is important because the notifier uses an internal queue to make sure alerts are delivered in batches when possible.
The good news is that we already have this implementation in Grafana, the bad news is that we'll need to adapt it so that it works with or in the externalAlertmanager
. Please note that the way that "remote URLs" are configured in the notifier is not straightforward and require some level of finesse to implement as they rely on service discovery, to top that off, the notifier is used in other parts of Grafana (hence the public API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue created #76692
} | ||
|
||
func (am *externalAlertmanager) GetAlertGroups(active, silenced, inhibited bool, filter []string, receiver string) (apimodels.AlertGroups, error) { | ||
return apimodels.AlertGroups{}, nil | ||
func (am *externalAlertmanager) PutAlerts(ctx context.Context, postableAlerts apimodels.PostableAlerts) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a recover
here as PostAlerts
can panic.
@@ -121,30 +123,62 @@ func (am *externalAlertmanager) ListSilences(ctx context.Context, filter []strin | |||
return res.Payload, nil | |||
} | |||
|
|||
func (am *externalAlertmanager) GetStatus() apimodels.GettableStatus { | |||
return apimodels.GettableStatus{} | |||
func (am *externalAlertmanager) GetAlerts(ctx context.Context, active, silenced, inhibited bool, filter []string, receiver string) (apimodels.GettableAlerts, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a recover here as GetAlerts
can panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where exactly can it panic? The res.Payload
bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAlerts(params)
panics
} | ||
|
||
func (am *externalAlertmanager) GetAlerts(active, silenced, inhibited bool, filter []string, receiver string) (apimodels.GettableAlerts, error) { | ||
return apimodels.GettableAlerts{}, nil | ||
func (am *externalAlertmanager) GetAlertGroups(ctx context.Context, active, silenced, inhibited bool, filter []string, receiver string) (apimodels.AlertGroups, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a recover here as GetAlertGroups
can panic.
@grobinson-grafana can you elaborate under which conditions one would expect the panic? From reading the client code it seems to be there as a safeguard for code generation issues on upstream - with the amount of coverage we have on this client (all of our integrations test use it) I'm not sure it's worth it the recover. But perhaps this is customary of using openapi clients? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - but I'm dubious on wether we need the whole recover as part of the client calls.
If you have 100% confidence in the codegen I'd be very happy to remove it. I figured anywhere there is an explicit panic, codegen or otherwise, better to be defensive and recover then risk one bug bringing down the entire process. |
This PR enables the proxy Alertmanager struct to retrieve alerts from the remote Alertmanager. The methods for enabling this feature are part of the
Alertmanager
interface:GetAlerts()
GetAlertGroups()
As we need to communicate to an Alertmanager via HTTP requests, these two methods now take a
context.Context
as the first argument.The
PutAlerts()
method needs to be implemented in the way described in #76692