Rulegroups #2842
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
@@ -20,8 +20,12 @@ import (
	"path/filepath"
	"strings"

	yaml "github.com/ghodss/yaml"
brian-brazil
Jun 14, 2017
Contributor
Why are you using a non-standard yaml library?
brancz
Jun 14, 2017
Member
I believe the goal was to switch over to this library, as the semantics are much closer to the standard lib encoding/json package, where the Marshal/Unmarshal interface is a lot saner.
For better or worse, the fact that it is already vendored is thanks to kubernetes/client-go.
brian-brazil
Jun 14, 2017
Contributor
That's still under discussion, I don't believe we got to consensus there.
fabxc
Jun 14, 2017
Contributor
It was an experiment of mine in the first commit. Turns out, error messages for YAML parsing errors are a lot clearer. Validation and custom marshalling become tons better.
I haven't seen any drawbacks yet.
fabxc
Jun 14, 2017
Contributor
It's just a small wrapper btw.
fabxc
Jun 14, 2017
Contributor
It's a wrapper. It makes error reporting better, and it lets us do marshalling more sanely. We should probably migrate other places to it. Look at our config packages. That's where the insanity lies.
brian-brazil
Jun 14, 2017
Contributor
If we're going to migrate we should complete the discussion elsewhere and then migrate. Introducing things only in new code has a tendency to end up with a split world for rather a long time.
fabxc
Jun 15, 2017
Contributor
It's vendored already, it's entirely local to the package and gives us an immediate benefit. Any config/ refactoring is far away. I see no reason to not do this.
brian-brazil
Jun 15, 2017
Contributor
Changing the rest of our code to also use this new library being something that is far away is a strong reason not to do this.
We do not have consensus on this, and using multiple different libraries within our organisation that do the same thing is to be avoided.
gouthamve
Jun 16, 2017
Author
Member
Moved back to go-yaml. PTAL.
@@ -0,0 +1,59 @@
version: 1
brian-brazil
Jun 14, 2017
Contributor
Do we really need a version number at this stage? It's noise.
- alert: HighErrors
  expr: |
    sum without(instance) (rate(errors_total[5m]))
    / sum without(instance) (rate(requests_total[5m]))
brian-brazil
Jun 14, 2017
Contributor
nit:
sum without(instance) (rate(errors_total[5m]))
/
sum without(instance) (rate(requests_total[5m]))
fabxc
Jun 14, 2017
Contributor
I think YAML doesn't get it if the indentation on the first line is offset like this =/ It's one of the warts.
grobie
Jun 14, 2017
Member
Ugh, is this confirmed? Any workaround? This would be a serious drawback in readability.
gouthamve
Jun 14, 2017
Author
Member
I did update the file now to reflect something better. Everything has to be at the same indentation level.
juliusv
Jun 15, 2017
Member
That kind of constraint goes straight to my worries about nesting one kind of language in another one, especially if the outer one is one that cares about white space like this. I don't know what conclusion to draw from this though, as there might not be a better way.
@@ -270,18 +271,10 @@ func typeForRule(r Rule) ruleType {
// In the future a single group will be evaluated sequentially to properly handle
brian-brazil
Jun 14, 2017
Contributor
Comment needs updating
@@ -270,18 +271,10 @@ func typeForRule(r Rule) ruleType {
// In the future a single group will be evaluated sequentially to properly handle
// rule dependency.
func (g *Group) Eval(ts time.Time) {
brian-brazil
Jun 14, 2017
Contributor
The different rule groups need to have offset runtimes, like we do for targets.
}
rules = append(rules, rule)

// Groups need not be unique across filenames.
brian-brazil
Jun 14, 2017
Contributor
What is this going to look like in the UI and in metrics?
Will the labels of a group change every time it's rescheduled in a different directory?
I don't think non-unique names work with our current setup.
gouthamve
Jun 14, 2017
Author
Member
What do you mean by labels?
I haven't touched the UI yet. But it would be simple to add the grouping there, as I have now also added the filename to the Group object. Currently in the UI, all the rules will be listed without any reference to the group/file.
brian-brazil
Jun 14, 2017
Contributor
We will have a metric for how long the rule groups took to evaluate. The labels of a given group in that metric should be consistent across Prometheus invocations.
gouthamve
Jun 14, 2017
Author
Member
Can we not have name, filename as the labels?
brian-brazil
Jun 14, 2017
Contributor
The filename can change as Prometheus is rescheduled.
}

// Validate the rule and return a list of encountered errors.
func (r *Rule) Validate() (errs []error) {
brian-brazil
Jun 14, 2017
Contributor
You need to validate that the label names and annotation names are valid, and, for rules, that the label values are valid.
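A rough sketch of what such validation could look like, using only the standard library. The `Rule` type here is a cut-down stand-in for the one in the diff, the regular expression is the usual Prometheus label-name pattern, and treating "valid label value" as "valid UTF-8" is an assumption. The real code would more likely call the validity helpers on the existing model types.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// labelNameRE is the usual Prometheus label-name pattern.
var labelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// Rule is a simplified stand-in for the rule type discussed above.
type Rule struct {
	Labels      map[string]string
	Annotations map[string]string
}

// Validate collects every problem instead of stopping at the first one.
func (r *Rule) Validate() (errs []error) {
	for name, value := range r.Labels {
		if !labelNameRE.MatchString(name) {
			errs = append(errs, fmt.Errorf("invalid label name: %q", name))
		}
		// Assumption: a label value is acceptable if it is valid UTF-8.
		if !utf8.ValidString(value) {
			errs = append(errs, fmt.Errorf("invalid label value: %q", value))
		}
	}
	for name := range r.Annotations {
		if !labelNameRE.MatchString(name) {
			errs = append(errs, fmt.Errorf("invalid annotation name: %q", name))
		}
	}
	return errs
}

func main() {
	r := &Rule{
		Labels:      map[string]string{"bad-name": "page"},
		Annotations: map[string]string{"summary": "High error rate"},
	}
	for _, err := range r.Validate() {
		fmt.Println(err)
	}
}
```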
return nil, err
rgs, errs := rulefmt.ParseFile(fn)
if errs != nil {
	return nil, tsdb.MultiError(errs)
fabxc
Jun 16, 2017
Contributor
Not a good idea to pull in tsdb as a dep just for a multierror type. In fact, it's probably a good idea to have loadGroups also just return a straight slice of errors so the caller can print a single log line per encountered issue or similar. That's why we made it a slice in rulefmt to begin with.
l := model.LabelSet{
	"name":     model.LabelValue(g.name),
	"filename": model.LabelValue(g.file),
}
return l.Fingerprint()
fabxc
Jun 16, 2017
Contributor
Can we rename it to hash() and use pkg/labels.Labels.Hash()?
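For illustration only, this is roughly the shape such a `hash()` method could take. It uses FNV-1a from the standard library as a stand-in, since the point is just "a stable number derived from name and file". The suggestion above is to use `pkg/labels.Labels.Hash()`, which this sketch does not call, and the `Group` type is reduced to the two fields that matter here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Group is a reduced stand-in holding only the identity fields.
type Group struct {
	name string
	file string
}

// hash returns a stable identifier for the group. FNV-1a is an assumption
// made for this sketch, not the hash the real label package uses.
func (g *Group) hash() uint64 {
	h := fnv.New64a()
	h.Write([]byte(g.name))
	h.Write([]byte{0xff}) // separator, so ("ab", "c") is not hashed like ("a", "bc")
	h.Write([]byte(g.file))
	return h.Sum64()
}

func main() {
	g := &Group{name: "example", file: "rules/a.yaml"}
	fmt.Println(g.hash())
}
```

Note that the filename is still part of the identity, so the concern raised earlier about filenames changing when Prometheus is rescheduled applies to this value as well.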
@@ -171,16 +176,96 @@ func checkRules(t cli.Term, filename string) (int, error) {
		return 0, fmt.Errorf("is a directory")
	}

	rgs, errs := rulefmt.ParseFile(filename)
	if errs != nil {
		return 0, tsdb.MultiError(errs)
fabxc
Jun 16, 2017
Contributor
Same about importing tsdb for this; it's probably better to return the actual slice so the errors can be logged one per line. tsdb.MultiError will just concat them with ";", which will be impossible to debug with.
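A small sketch of the "return the slice, log one line per error" pattern described above. `parseRules` is a hypothetical stand-in for the real parser and only flags empty lines; the point is the caller, which keeps each problem on its own line instead of joining them with `;`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseRules is a hypothetical stand-in for a parser that reports every
// problem it finds rather than only the first one.
func parseRules(lines []string) (errs []error) {
	for i, l := range lines {
		if strings.TrimSpace(l) == "" {
			errs = append(errs, fmt.Errorf("line %d: empty rule", i+1))
		}
	}
	return errs
}

func main() {
	errs := parseRules([]string{"up == 0", "", " "})
	// One log line per problem keeps each error readable on its own.
	for _, err := range errs {
		fmt.Fprintln(os.Stderr, "FAILED:", err)
	}
	if len(errs) > 0 {
		os.Exit(1)
	}
}
```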
}
return len(rules), nil

return ioutil.WriteFile(filename+".yaml", y, 0777)
fabxc
Jun 16, 2017
Contributor
0666 for files, 0777 for dirs. That's what we are going with in general I think.
}

yamlRG.Groups[0].Rules = yamlRules
y, err := yaml.Marshal(yamlRG)
fabxc
Jun 16, 2017
Contributor
Because we are now using the plain YAML lib again this will generate rule files with basically random key ordering right?
@brian-brazil if yes, this is pretty bad UX just because we don't want to use a 100 LOC wrapper lib that's vendored already anyway.
gouthamve
Jun 16, 2017
Author
Member
Using go-yaml the fields are ordered as we declare them. ghodss/yaml produces the fields in alphabetical order.
fabxc
Jun 16, 2017
Contributor
Okay, let's move forward then. Somewhat sure I've seen things wildly mixed – but maybe I recall incorrectly.
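The ordering difference can be seen by analogy with the standard library alone: `encoding/json` emits struct fields in declaration order but sorts map keys. My understanding, not verified here, is that ghodss/yaml round-trips through JSON and a generic map-like value, which would explain its alphabetical output. The `rule` type and helper names below are made up for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rule is a made-up type; "record" is declared before "expr" on purpose.
type rule struct {
	Record string `json:"record"`
	Expr   string `json:"expr"`
}

// fromStruct marshals the struct directly: fields keep declaration order.
func fromStruct(r rule) string {
	b, _ := json.Marshal(r) // cannot fail for plain string fields
	return string(b)
}

// fromMap marshals the same data via a map: keys come out sorted.
func fromMap(r rule) string {
	b, _ := json.Marshal(map[string]string{"record": r.Record, "expr": r.Expr})
	return string(b)
}

func main() {
	r := rule{Record: "job:up:sum", Expr: "sum by (job) (up)"}
	fmt.Println(fromStruct(r))
	fmt.Println(fromMap(r))
}
```

For generated rule files, declaration order is what lets the output keep keys in the order the struct lists them.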
The offset logic isn't included here yet.
@@ -243,6 +328,11 @@ func main() {
		Run: CheckMetricsCmd,
	})

	app.Register("update-rules", &cli.Command{
		Desc: "update the rules to the new YAML format",
brian-brazil
Jun 16, 2017
Contributor
"update rule files to..." to be consistent with the above.
type RuleGroup struct {
	Name     string         `yaml:"name"`
	Interval model.Duration `yaml:"interval,omitempty"`
	Rules    []Rule         `yaml:"rules"`
brian-brazil
Jun 16, 2017
Contributor
This should have the XXX checkOverflow stuff.
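For readers unfamiliar with the pattern: as I understand it, the config structs carry a catch-all field (something like `XXX map[string]interface{}` tagged `yaml:",inline"`) that collects any keys the struct does not declare, and a helper rejects the input if that map is non-empty after unmarshalling. The sketch below shows only such a helper, modelled on that idea. The exact signature and message are assumptions, and the key sorting is added here to make the message deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkOverflow returns an error naming every unexpected key that ended up
// in the catch-all map; ctx says which section of the file was being parsed.
func checkOverflow(m map[string]interface{}, ctx string) error {
	if len(m) == 0 {
		return nil
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable message regardless of map iteration order
	return fmt.Errorf("unknown fields in %s: %s", ctx, strings.Join(keys, ", "))
}

func main() {
	// Pretend the YAML had a misspelled "intervall" key that landed in XXX.
	overflow := map[string]interface{}{"intervall": "1m"}
	if err := checkOverflow(overflow, "rule group"); err != nil {
		fmt.Println(err)
	}
}
```

Without a check like this, a typo in a key such as `interval` would be silently ignored instead of reported.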
var (
	wg sync.WaitGroup
)

for i, rule := range g.rules {
brian-brazil
Jun 16, 2017
Contributor
Should we check done after each rule?
gouthamve
Jun 16, 2017
Author
Member
I don't see why we need to. This is all going sequentially and the function wouldn't exit until all the rules are evaluated.
fabxc
Jun 16, 2017
Contributor
I think Brian's point was to actually exit if done indicates the group should terminate.
gouthamve
Jun 16, 2017
Author
Member
Done
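The check discussed in this thread can be sketched as a non-blocking `select` on the done channel after each rule. `evalRules` and its callback are hypothetical and stand in for the real sequential loop over `g.rules`; the actual change in the PR may look different.

```go
package main

import "fmt"

// evalRules evaluates rules in order and stops early once done is closed.
// It returns the number of rules that were evaluated.
func evalRules(rules []string, done <-chan struct{}, eval func(string)) int {
	n := 0
	for _, r := range rules {
		eval(r)
		n++
		// Non-blocking check after each rule: bail out if the group
		// has been told to terminate, otherwise carry on.
		select {
		case <-done:
			return n
		default:
		}
	}
	return n
}

func main() {
	done := make(chan struct{})
	rules := []string{"rule_a", "rule_b", "rule_c"}
	n := evalRules(rules, done, func(r string) {
		fmt.Println("evaluating", r)
		if r == "rule_b" {
			close(done) // simulate a shutdown arriving mid-evaluation
		}
	})
	fmt.Println("evaluated", n, "rules")
}
```

With a long rule list this keeps shutdown latency bounded by one rule evaluation rather than by the whole group.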
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
I don't think we need offset evaluation for the first iteration. Let's focus on working down breaking changes for now.
@brian-brazil This takes care of the offset no? https://github.com/Gouthamve/prometheus/blob/6b70a4d85031ecfb1404c3702237ebc0e7d3f2c8/rules/manager.go#L169
Mh, I thought this was alluding to "offset evaluation" where we evaluate recording rules not against now but now-5m for example. That can be relevant for metrics that are lagging behind like Cloudwatch.
No, that's what I was talking about. I'd thought it hadn't been merged into this PR. My "offset eval" is still at the idea stage; I want to wait until we've got Cloudwatch and friends outputting timestamps and see if it still makes sense.
The initial implementation of rule-groups. Some key points:
ghodss/yaml is used instead of go-yaml; I can see that it is already vendored.
cc @brian-brazil @fabxc @beorn7 @juliusv @grobie @SuperQ