-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alerting: Stop persisting user-defined templates to disk #83456
Conversation
} | ||
|
||
if err := json.Unmarshal(status, config); err != nil { | ||
am.logger.Error("Unable to unmarshall alertmanager config", "Err", err) | ||
} | ||
|
||
return *apimodels.NewGettableStatus(config) | ||
return *apimodels.NewGettableStatus(&config.AlertmanagerConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seem to be related to the PR. Is this the fix that makes it parse the correct format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is related, but not obviously so. I'll try to make it more obvious in a future PR though. Currently, GrafanaAlertmanager.GetStatus()
just returns the raw config you initially provided to it. It was PostableApiAlertingConfig
but now it's a PostableUserConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I am not 100% sure about the ignoring of "Templates" field, though. In Mimir and GMA it's certainly redundant but that's the AM's contract.
This should be a no-op from my understanding, since we didn't read paths from the |
AFAIU, the vanilla Alertmanager uses Templates to control what files to read from the templates directory. So the field Template can potentially contain a subset of TemplateFiles. In Mimir and GMA they are always equal because it's hardcoded to be so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@@ -63,11 +63,6 @@ func (t *TemplateService) SetTemplate(ctx context.Context, orgID int64, tmpl def | |||
revision.cfg.TemplateFiles = map[string]string{} | |||
} | |||
revision.cfg.TemplateFiles[tmpl.Name] = tmpl.Template | |||
tmpls := make([]string, 0, len(revision.cfg.TemplateFiles)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU the config still expects the template file "names" to be listed in the config under Templates
, even if the templates themselves are passed in memory, or it won't be able to load them. Is something else expanding this back out? (or, did this change in alertmanager?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU the config still expects the template file "names" to be listed in the config under Templates, even if the templates themselves are passed in memory, or it won't be able to load them.
My understanding of the existing code is that we always overwrite the Templates
field with the information from TemplateFiles
:
PersistTemplates
uses only TemplateFiles
when parsing the config: https://github.com/grafana/grafana/pull/83456/files#diff-786d520b4c7dd72f03ab0422ad82bc8f1c23c8874b60159b77431588f415a151L17
applyConfig
then overwrites the Templates
field with the results of PersistTemplates
: https://github.com/grafana/grafana/pull/83456/files#diff-4aa6a5bc542bc278b3cddc81fe825af177f1f354c9dbb08d0f01d86b7dbe414eL345
Is something else expanding this back out? (or, did this change in alertmanager?)
This PR and the grafana/alerting
sister PR effectively make this temporary saving to disk and transfer to the Templates
field unnecessary. I might be misunderstanding this part of your question though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I misunderstood you entirely. If you mean upstream prometheus alertmanager needing the Templates
file names then I don't think it does, or at least this PR appears to be working as intended. When upstream is ran through main then it relies on the configCoordinator
to subscribe to config changes and would likely require the Templates
field, but we don't use main and instead piece together an alertmanager part-by-part in grafana/alerting
GrafanaAlertmanager.ApplyConfig and manually call Run()
on the Dispatcher
and Inhibitor
.
am.logger.Debug("Neither config nor template have changed, skipping configuration sync.") | ||
// If configuration hasn't changed, we've got nothing to do. | ||
configHash := md5.Sum(rawConfig) | ||
if am.Base.ConfigHash() == configHash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing the level at which the hash operates break remote alertmanagers? (@santihernandezc may know more than me here). Will the remote alertmanager still return the hash of the inner, AM config and not this new outer object (PostableUserConfig), it'll not match up and reload needlessly - is this possible?
Do all implementations return the hash at the right level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash is computed in Grafana and sent to the remote Alertmanager, when we compare configurations we do it using that same hash we sent.
If all config hashes are derived from the same struct, then nothing should break.
Plus, the integration tests are passing 👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way it's currently implemented, this should not affect the remote alertmanagers as this code is internal to the local alertmanager and does not affect what we store in the database.
However, if this somehow changes then we'd likely want all the hashes to be based on PostableUserConfig
anyways since it contains TemplateFiles
which is how we'll be passing templates to the remote alertmanager, correct?
I'll let @santihernandezc confirm though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not affect the remote alertmanagers as this code is internal to the local alertmanager and does not affect what we store in the database.
Agree, we don't use the internal Alertmanager's hash for the remote/forked AM, we use what we receive via the MOA.
if this somehow changes then we'd likely want all the hashes to be based on PostableUserConfig anyways since it contains TemplateFiles which is how we'll be passing templates to the remote alertmanager, correct?
Yep, AFAIU that's the way we currently do it. The hash we send to the remote Alertmanager is the one we get from the database in ApplyConfig
. This hash is derived from cmd.AlertmanagerConfiguration
(permalink), which is a marshaled PostableUserConfig
(permalink).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously we relied on templates files on disk to check for template changes. Now that we no longer have those, we must rely on the configuration itself. Since PostableApiAlertingConfig does not include template strings, we use PostableUserConfig instead.
a9b111e
to
dc31023
Compare
What is this feature?
Updates Grafana Alertmanager to work with new interface from grafana/alerting#161. This change stops passing user-defined templates to the Grafana Alertmanager by persisting them to disk and instead passes them by string.
Why do we need this feature?
This PR is a general improvement and simplification, but there are also some specific reasons why we might want to do this:
FileStore
.DefaultTemplateString
is now treated like other defaults and therefore consistently applied first. User-defined templates can now rely on being able to always override defaults.Who is this feature for?
Developers mostly, though alert users wanting to consistently override default templates will see improvement here.
Which issue(s) does this PR fix?:
Fixes #76476
Special notes for your reviewer:
Config.Templates
entirely. We should no longer be using it anywhere in local Grafana AM but it's still possible external AMs could use the filenames somehow. We should make a decision one way or the other eventually, but for now keeping it is the safe bet.PostableApiAlertingConfig
with the hash ofPostableUserConfig
, which includes the template contents. This change also necessitates we modifyGetStatus
to expectPostableUserConfig
as the config.