Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create "promlint" tool for linting metric types, names, and labels #1953

Closed
mdlayher opened this Issue Sep 7, 2016 · 9 comments

Comments

Projects
None yet
5 participants
@mdlayher
Copy link
Member

mdlayher commented Sep 7, 2016

I think it would be an excellent idea to create a linting tool that ensures best practices are used when creating metrics. This could become a new binary, promlint, or perhaps an addition to promtool as promtool lint.

In order to make it useful for more than one language, I think it would be a good idea to directly inspect the metrics notation typically output by prometheus and its exporters, such as:

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 133

The tool could look for issues such as:

  • incorrect metrics suffix for metric type
    • _total with a metric that is not a counter
  • use of "reserved" labels
    • quantile label with a counter metric
  • use of non-standard units
    • _milliseconds suffix instead of _seconds

As an example, the tool's output could look something like:

$ promlint http://192.168.1.1:9090/metrics
# HELP foo_total Current number of foo.
# TYPE foo_total gauge
foo_total 10
promlint: metric with '_total' suffix should be of type counter

# HELP foo_bar Some help text.
# TYPE foo_total gauge
foo_bar{quantile="1"} 1
promlint: metric of type counter should not use reserved label 'quantile'

# HELP foo_bar Some help text.
# TYPE foo_seconds_total counter
foo_milliseconds_total 10
promlint: prefer unit 'seconds' over 'milliseconds'

I think it would also make sense to allow the tool to ingest metrics in a variety of ways for maximum flexibility. Some trivial examples:

$ curl http://192.168.1.1:9090/metrics | promlint
$ promlint http://192.168.1.1:9090/metrics
$ promlint metrics.txt

I'm happy to negotiate semantics and such, but I think the general idea is sound. This is a tool I would certainly use when working with Prometheus on my own projects. In addition, since it isn't tied to any specific programming language, it would become immediately useful to a wide audience with varying programming backgrounds.

If I'm on the right track, I'd be happy to work on a prototype, and eventually submit a pull request to merge it into the project.

Thanks for your time! Looking forward to hearing your thoughts!

@mdlayher

This comment has been minimized.

Copy link
Member Author

mdlayher commented Sep 7, 2016

Apologies, looks like an issue was already opened about a similar idea: #1902. Should have searched more. :)

Leaving open for discussion on some of the proposed ideas anyway. Feel free to close if you'd like to move discussion elsewhere.

@grobie

This comment has been minimized.

Copy link
Member

grobie commented Sep 7, 2016

This is a great idea, I've been thinking about such tools a few times, although more in the context of rules files. Issue #1902 was meant to check for invalid metric outputs (e.g. for scenarios in which no client_library is being used), so an extra ticket make sense for the linter use case.

In addition to just linting /metrics, a rule linter would be even more helpful for me, as I guide other developers more often through the creation of rules than creation of new metrics. Possible lint checks:

  • common rule name format level:metric:operations
  • don't remove the job label in aggregations
  • prefer rate over irate in rules
  • prefer a rate interval of >4x scrape_interval (would need access to config)

Similar to the existing promtool check-* commands, I see we'll have different lint contexts (config, metrics, rules, ...), so it might be beneficial to include the context in the command.

Using reserved label names should be a check error and not a linter warning.

@grobie

This comment has been minimized.

Copy link
Member

grobie commented Sep 7, 2016

More lint checks:

  • Usage of operator/aggregator/keyword names as label.
@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Sep 8, 2016

For someone implementing this, the prom2json code is a good starting point for how to just read and extract metrics from a /metrics endpoint in either the text or protobuf formats: https://github.com/prometheus/prom2json (and then of course get rid of the JSON conversion and do linting instead).

@tcolgate

This comment has been minimized.

Copy link
Contributor

tcolgate commented Sep 18, 2016

@grobie perhaps:

  • ensure rateXm in rule name matches range vector
  • check for removal of _total suffix after rate

I spent some time thinking about this. It would be nice if the parser passed through the comments, that would let you override the linter in a comment before a rule/alert.

I was leaning toward promtool check-rules -lint (maybe with -lint.ignore, -lint.error , -lint.warn to let you turn linters on/off, and say which could cause a check-rules to fail)

@grobie

This comment has been minimized.

Copy link
Member

grobie commented Sep 18, 2016

What would be examples for overriding the linter?

On Sun, Sep 18, 2016, 06:25 Tristan Colgate-McFarlane <
notifications@github.com> wrote:

@grobie https://github.com/grobie perhaps:

  • ensure rateXm in rule name matches range vector
  • check for removal of _total suffix after rate

I spent some time thinking about this. It would be nice if the parser
passed through the comments, that would let you override the linter in a
comment before a rule/alert.

I was leaning toward promtool check-rules -lint (maybe with -lint.ignore,
-lint.error , -lint.warn to let you turn linters on/off, and say which
could cause a check-rules to fail)


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#1953 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAANaNHe4xjoMGk9LodjAvNi7NnNHqyaks5qrRF_gaJpZM4J2jWC
.

@tcolgate

This comment has been minimized.

Copy link
Contributor

tcolgate commented Sep 18, 2016

@grobie overrides could be used to mark known breaches of the established policy, to stop things causing build failures for instance. e.g. perhaps I have some rules (I do), that don't conform to labels:metric:funcs (e.g. we allow a kpi: prefix to force non job: aggregations into our long term storage).

It's nice to be able to fail a build on a lint error, but if you are going to have that option, you should probably provide the ability to override it.

@mdlayher

This comment has been minimized.

Copy link
Member Author

mdlayher commented Oct 19, 2016

Started messing around with this tonight.

[zsh|matt@nerr-2]:~/src/github.com/prometheus/prometheus/cmd/promtool 1 *(master) ± curl -s http://192.168.1.3:9162/metrics | ./promtool lint
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 2275.341
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 3021.139
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 359535.354
http_request_duration_microseconds_sum{handler="prometheus"} 6.7642733153e+07
http_request_duration_microseconds_count{handler="prometheus"} 1878

lint: http_request_duration_microseconds: use base unit seconds instead of unit microseconds

I'd be happy to add a few basic checks and clean it up for submission. I think starting small with lint checks makes sense, since we can always add more over time. Would love to hear others' thoughts.

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.