-
Notifications
You must be signed in to change notification settings - Fork 6
feat: adjustable rule_group interval #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This allows for more expensive SLO recording rules by letting users create a custom Prometheus `rule_group.interval`. https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule_group Now instead of all rule groups using the default global evaluation interval, a custom interval can be set on all sets of recording rules for an SLO. If no `interval` is set then the global default will be assumed, matching current Sloth behavior. Resolves: slok#367
Related to how the unit test coverage works, this will make sure RuleGroupInterval is not appended to RuleGroups unless it is defined and not empty. Not requring the `yaml` file does an okay job of this but expliclity writing the rules without RuleGroupInterval is safer. It's a bit ugly and repetitive but I think that's just how Go works..
This updates tests include `interval`'s and also includes a test to make sure the rules render correctly when interval is not included.
wbh1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good! Since Sloth produces 3 rules groups, I think it'd be nice to support specifying separate intervals for each of them, though. (sloth-slo-meta-recordings, sloth-slo-alerts, and sloth-slo-sli-recordings are the 3 prefixes). For example, you may want an interval of 2m on the SLI recording rules but still 1m on the alerts.
Instead of a singular global default, now a rule_group interval can be set for every individual type of rule_group Sloth generates. The generic, `interval:all` rule will also stay and can "fill in" any missing per-rule group defaults. Along with the default behavior of doing nothing if no `interval` is specified.
Specific ones and general/specific combined
eb73e69 to
dc3994b
Compare
wbh1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as it looks good as-is, but left some comments on form.
internal/prometheus/storage.go
Outdated
| // 0s is default empty string value for time.Duration | ||
| if slo.SLO.RuleGroupInterval.String() != "0s" || slo.SLO.SLIErrorRulesInterval.String() != "0s" { | ||
| var ruleGroupIntervalDuration prommodel.Duration | ||
| var err error | ||
|
|
||
| // if we have a valid meta rule rule_group interval, use that first and overwrite any generic ones | ||
| if slo.SLO.SLIErrorRulesInterval.String() != "0s" { | ||
| ruleGroupIntervalDuration, err = prommodel.ParseDuration(slo.SLO.SLIErrorRulesInterval.String()) | ||
| } else { | ||
| ruleGroupIntervalDuration, err = prommodel.ParseDuration(slo.SLO.RuleGroupInterval.String()) | ||
| } | ||
|
|
||
| if err != nil { | ||
| return fmt.Errorf("could not parse rule_group interval duration %w", err) | ||
| } | ||
|
|
||
| ruleGroups.Groups = append(ruleGroups.Groups, ruleGroupYAMLv2{ | ||
| Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID), | ||
| RuleGroupInterval: ruleGroupIntervalDuration, | ||
| Rules: slo.Rules.SLIErrorRecRules, | ||
| }) | ||
| } else { | ||
| ruleGroups.Groups = append(ruleGroups.Groups, ruleGroupYAMLv2{ | ||
| Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID), | ||
| Rules: slo.Rules.SLIErrorRecRules, | ||
| }) | ||
|
|
||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can simplify this with a switch statement like:
group := ruleGroupYAMLv2{
Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID),
Rules: slo.Rules.SLIErrorRecRules,
}
switch {
case AlertRulesInterval.String() != "0s":
ruleGroupIntervalDuration, err = time.ParseDuration(AlertRulesInterval.String())
if err != nil {
return fmt.Errorf("could not parse rule_group interval duration for alerts %w", err)
} else {
group.RuleGroupInterval = ruleGroupIntervalDuration
}
case RuleGroupInterval.String() != "0s":
ruleGroupIntervalDuration, err = time.ParseDuration(RuleGroupInterval.String())
if err != nil {
return fmt.Errorf("could not parse default ("all") rule_group interval duration %w", err)
} else {
group.RuleGroupInterval = ruleGroupIntervalDuration
}
}
ruleGroups.Groups = append(ruleGroups.Groups, group)Co-authored-by: Will Hegedus <will@wbhegedus.me>
…issues Dennisme/bump versions fix build issues
This allows for more expensive SLO recording rules by letting users create a custom Prometheus
rule_group.interval.https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule_group
Now instead of all rule groups using the default global evaluation interval, a custom interval can be set on all sets of recording rules for an SLO.
If no
intervalis set then the global default will be assumed, matching current Sloth behavior.Resolves: slok#367