Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for line-column numbers for invalid rules, promtool #6533

Merged
merged 18 commits into from
Jan 15, 2020

Conversation

Harkishen-Singh
Copy link
Contributor

Fixes: #1758 link

This PR uses the help of gopkg.in/yaml.v3 to implement this functionality.

line-column-number-invalid-rules

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
@Harkishen-Singh Harkishen-Singh changed the title Adds support for Line-Column numbers for invalid rules during check, promtool Adds support for line-column numbers for invalid rules, promtool Dec 30, 2019
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do a straight switch everywhere to v3? It'd be better not to have two versions around.

@@ -769,7 +769,7 @@ func reloadAndValidate(rgs *rulefmt.RuleGroups, t *testing.T, tmpFile *os.File,
_, err = tmpFile.Write(bs)
testutil.Ok(t, err)
err = ruleManager.Update(10*time.Second, []string{tmpFile.Name()}, nil)
testutil.Ok(t, err)
testutil.NotOk(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this fail now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returned now is along with line and column number. I haven't been able to figure this out yet. Can you help me with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not offhand, it'd take debugging.

Copy link
Contributor Author

@Harkishen-Singh Harkishen-Singh Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil When we marshal a struct containing the yaml.Node type, the []byte received(shown below) contains additional information from the node. This is done in the reloadAndValidate(). This will create an error from the code since the output would be like:

groups:
  - name: test
    interval: 0s
    rules:
      - record:
            kind: 8
            style: 0
            tag: '!!str'
            value: job:http_requests:rate5m
            anchor: ""
            alias: null
            content: []
            headcomment: ""
            linecomment: ""
            footcomment: ""
            line: 4
            column: 15
        expr:
            kind: 8
            style: 0
            tag: '!!str'
            value: sum by (job)(rate(http_requests_total[5m]))
            anchor: ""
            alias: null
            content: []
            headcomment: ""
            linecomment: ""
            footcomment: ""
            line: 5
            column: 13

Note: record and expr are yaml.Node.

Hence when we Parse this to rulefmt, this will return an error since record: and expr: are empty fields here and hence I used testutil.NotOk()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like we need a better test then, as this one is no longer doing what it's meant to be doing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed this by formatting the rules to the yaml v2 standards.

pkg/rulefmt/rulefmt.go Outdated Show resolved Hide resolved
if g.Name == "" {
errs = append(errs, errors.Errorf("Groupname should not be empty"))
errs = append(errs, errors.Errorf("Groupname should not be empty, line: %d, column: %d", grpsTmp.Groups[j].Line, grpsTmp.Groups[j].Column))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to add the line/col in every error?

Copy link
Contributor Author

@Harkishen-Singh Harkishen-Singh Dec 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it would be good for the user to understand and find the invalid rule if suppose in a large rules file. For present, we can find both the line and column number.

@Harkishen-Singh
Copy link
Contributor Author

Straight switch to v3 through the entire project will cause some test failures since the new package does not support UnmarshalStrict. Files using such will experience CI fails. @brian-brazil

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
@slrtbtfs
Copy link
Contributor

slrtbtfs commented Jan 2, 2020

@Harkishen-Singh FYI, when using the PromQL language server you already get some of the functionality of this PR.

@Harkishen-Singh
Copy link
Contributor Author

@slrtbtfs I don't find anything related to rule checking and validation from the readme. Please correct me if I am wrong.

if g.Name == "" {
errs = append(errs, errors.Errorf("Groupname should not be empty"))
errs = append(errs, errors.Errorf("Groupname should not be empty, line: %d, column: %d", node.Groups[j].Line, node.Groups[j].Column))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prometheus uses the <line>:<col>: <Error message> or <filename>:<line>:<col>: <Error message> format for errors in the expression browser now, since that format is used by many compilers. Also, when using that format, VS Code allows to just click on the message to get the cursor to the error position.

Might make sense to do the same here.

fmt.Println(r.Expr.Line)
errs = append(errs, errors.Errorf("field 'expr' must be set in rule, line: %d, column: %d", r.Expr.Line, r.Expr.Column))
} else if _, err := promql.ParseExpr(r.Expr.Value); err != nil {
errs = append(errs, errors.Wrapf(err, "could not parse expression, line: %d, column: %d", r.Expr.Line, r.Expr.Column))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should end up looking quite strange, since the error returned by the parser will include position information, too.

Normally the parser returns a promql.ParseErr. This struct already includes the position information.

What the PromQL language server does here, is to add the line and column where the expression starts to the line and column in the error message and then simply convert the error message to a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the position coordinates to make debugging rules easier for the user, in case of a larger file, increasing the convenience. Similar behaviour is noted in various format checkers as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's clear.

The problem with the current approach is that the output will look roughly like
could not parse expression, line: 3, column: 10: 2:14: Unexpected identifier in label matcher, expected string.

It would be better if the output looked like:
rules.yaml:4:24: could not parse expression: Unexpected identifier in label matcher, expected string. Note that the positions have been added together here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implement is, (from live example)

rules.yml: Groupname should not be empty, line: 2, column: 5
rules.yml: group "recording-node-exporter", rule 1, "heap": field 'expr' must be set in rule, line: 13, column: 12

is this acceptable? @brian-brazil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you post your rules file here? @slrtbtfs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks better now.


../../rules.yml: 2:5

That space should not be there and there should be a colon at the end of the position. To be consistent with conventions from various compilers the position should be of the form
../../rules.yml:2:5:

(nit)


../../rules.yml: 5:21 group "recording", rule 0, "foo": 4:19 could not parse expression: 1:14: parse error: could not parse remaining input "}\n"...

This one looks really confusing, since a single error message contains three different position coordinates and a verbal description of the position.

Ideally there would be only one position, directly at the beginning of the error line. Something like

../../rules.yml:5:28: group "recording", rule 0, "foo": could not parse expression: parse error: could not parse remaining input "}\n"...

Maybe even something more customized like:
../../rules.yml:5:28: Failed to parse PromQL expression: parse error: could not parse remaining input "}\n"...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the design of the rulefmt, this looks tough. If we remove the parser checks and let it handle by itself as earlier, then this can be possible, but again on the right-hand side, towards the ending of the sentence.

Copy link
Contributor

@slrtbtfs slrtbtfs Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible by just typecasting the err result of promql.ParseExpr() to a promql.ParseErr and using the fields in that data structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The space before the number is due to errors.Wrapf(). I don't think there is any way to remove this space. Any idea @slrtbtfs ?

Leaving that, rest I have given a new implementation of the logic. That solves most of what you said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By not using Wrapf() but instead constructing a custom error message from the data contained in the error.

@slrtbtfs
Copy link
Contributor

slrtbtfs commented Jan 2, 2020

That's what is referred to as "YAML support" in the README.

However you're right that this should be explained better.

Relevant issue, tracking part of what is done in this PR, too: prometheus-community/promql-langserver#75

To be clear, ATM config file validation is still WIP.

@Harkishen-Singh
Copy link
Contributor Author

@slrtbtfs I did not understand the purpose of the comment. Language server has a different purpose. Configuration checks by promtool are the general implementation which can be used by anyone, irrespective of the IDEs to ensure the proper functioning of the rules and alerts. Need not everyone to install the Prometheus language server.

@slrtbtfs
Copy link
Contributor

slrtbtfs commented Jan 2, 2020

Sure, this PR is useful independently of whether there exists a language server or not.

The point was mainly, that the use case of having better IDE support which was demoed in your screencast is also covered by the language server and pointing out that there exists other code that tries to achieve similar things.

@Harkishen-Singh
Copy link
Contributor Author

Sure. Both have their own purpose. Having said that language-server is optional for a user, its good to have it.

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
pkg/rulefmt/rulefmt.go Show resolved Hide resolved
@@ -769,7 +769,7 @@ func reloadAndValidate(rgs *rulefmt.RuleGroups, t *testing.T, tmpFile *os.File,
_, err = tmpFile.Write(bs)
testutil.Ok(t, err)
err = ruleManager.Update(10*time.Second, []string{tmpFile.Name()}, nil)
testutil.Ok(t, err)
testutil.NotOk(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not offhand, it'd take debugging.

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
@Harkishen-Singh
Copy link
Contributor Author

@brian-brazil please review.

func (r *Rule) Validate() (errs []error) {
if r.Record != "" && r.Alert != "" {
errs = append(errs, errors.Errorf("only one of 'record' and 'alert' must be set"))
func (r *RuleNode) Validate() (errs []WrapError) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only use this as an error

Copy link
Contributor Author

@Harkishen-Singh Harkishen-Singh Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wrapping this error with the node values so that we can implement according to what slrbtfs said. Do you mean to remove the wrapper logic and use the earlier error logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean the return type here should be a slice of error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil If I return a slice of error here, then the suggestion given by slrbtfs would not be possible. Please correct me if I am wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is an interface, it's weird to refer to things as errors that don't meet that interface. So I suggest adjusting things so that your errors are errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil If I return a slice of error here, then the suggestion given by slrbtfs would not be possible. Please correct me if I am wrong.

It is possible.

You can return an error and then convert it back to the underlying concrete type when needed.

https://tour.golang.org/methods/15

Copy link
Contributor Author

@Harkishen-Singh Harkishen-Singh Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is an interface, it's weird to refer to things as errors that don't meet that interface. So I suggest adjusting things so that your errors are errors.

With the current design, the errors are being returned after unwrapping, done by the Error interface function. Only after that, it is an error. I think the errs is leading to confusion even though its a slice of wrapped error along with its node value. Is this any case? @brian-brazil

@@ -769,7 +769,7 @@ func reloadAndValidate(rgs *rulefmt.RuleGroups, t *testing.T, tmpFile *os.File,
_, err = tmpFile.Write(bs)
testutil.Ok(t, err)
err = ruleManager.Update(10*time.Second, []string{tmpFile.Name()}, nil)
testutil.Ok(t, err)
testutil.NotOk(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like we need a better test then, as this one is no longer doing what it's meant to be doing

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
@Harkishen-Singh
Copy link
Contributor Author

@brian-brazil please have a look.

@roidelapluie
Copy link
Member

Do we have the same behaviour as before? e.g. the UnmarshalStrict behaviour?

@Harkishen-Singh
Copy link
Contributor Author

Harkishen-Singh commented Jan 11, 2020

I do think yes. Since when we remove the yaml.Node then the entire thing works like the v2 yaml.

}

// RuleGroups is a set of rule groups that are typically exposed in a file.
type RuleGroups struct {
Groups []RuleGroup `yaml:"groups"`
}

// RuleGroupsTest for running tests over rules.
type RuleGroupsTest struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused.

Copy link
Contributor Author

@Harkishen-Singh Harkishen-Singh Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually used inside the manager_test.go. It helps to convert yaml.Node type into default strings type so that Marshalling can be done according to the aim of the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only used in tests, then it belongs in the test file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will update that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to be here

func (r *Rule) Validate() (errs []error) {
if r.Record != "" && r.Alert != "" {
errs = append(errs, errors.Errorf("only one of 'record' and 'alert' must be set"))
func (r *RuleNode) Validate() (errs []WrapError) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is an interface, it's weird to refer to things as errors that don't meet that interface. So I suggest adjusting things so that your errors are errors.

}

// RuleGroupTest forms a testing struct for running tests over rules.
type RuleGroupTest struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil done. Please have a look.

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
@brian-brazil brian-brazil merged commit 84e6459 into prometheus:master Jan 15, 2020
@brian-brazil
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

promtool: print file/line locations for warnings/errors
4 participants