
Make Prometheus tolerate dynamic values #1636

Closed
mwitkow opened this Issue May 17, 2016 · 5 comments

mwitkow (Contributor) commented May 17, 2016

In grpc-ecosystem/go-grpc-prometheus#5, @brian-brazil raised the discoverability of label values as an issue when not all labels can be pre-populated.

I believe that this is better handled through metric metadata (such a concept already exists, but is not utilized). An additional annotation alongside the help string, listing "predefined" values for a given label, could then be scraped and stored by Prometheus.

brian-brazil (Member) commented May 17, 2016

If you wish to assert that a particular label set exists for a metric the standard way to do that is on the client side.

See https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics

Doing it on the server side would be quite complex, and doing it on the client has proven to work fine thus far. What you propose would only work where every combination had the same labels, which is not always the case.

@mwitkow mwitkow changed the title Add metadata hints about possible label values Make Prometheus tolerate dynamic values May 18, 2016

mwitkow (Contributor, Author) commented May 18, 2016

@brian-brazil in the other issue:

The problem is not discovery, it is correctness. I've not mentioned label value discovery so I'm not sure where you got that from.

There are real failure modes. For example, with a 100% error ratio, an alert on the successful-request ratio being too low would never fire, leaving the user blind to the total outage.

I've seen this type of problem myself many times, and postmortems mentioning this are not unknown.
The problem is that the failure mode is subtle. Everything seems to work fine until it doesn't, and the time it doesn't work is often during a nasty outage.

If there are post-mortems related to this, then it's clearly an issue with the system's design. As grpc-ecosystem/go-grpc-prometheus#5 shows, expecting third-parties to provide all the necessary interfaces is not practical.

Doing it on the server side would be quite complex, and doing it on the client has proven to work fine thus far. What you propose would only work where every combination had the same labels, which is not always the case.

The "let's expect clients to handle everything correctly" approach would work at Google, where there is significant homogeneity of technologies and libraries. Giving people a library that cannot be correctly integrated with existing interfaces (since the "obvious ways" carry the significant risks you point out), where others like statsd or logstash can, is not a good strategy for adoption.

brian-brazil (Member) commented May 18, 2016

If there are post-mortems related to this, then it's clearly an issue with the system's design.

This is a fundamental problem with time series monitoring. At some point you have to know about all the time series that are meant to exist.

The "let's expect clients to handle everything correctly" approach would work at Google, where there is significant homogeneity of technologies and libraries.

That doesn't apply here. Either the developer of the client or the end user must handle this correctly, we can't do it in the Prometheus server alone. Your proposal is about the client handling this.

Giving people a library that cannot be correctly integrated with existing interfaces (since the "obvious ways" carry the significant risks you point out), where others like statsd or logstash can, is not a good strategy for adoption.

The library can be used correctly, many applications already do so, and there are no significant risks to doing so, just a small performance hit, which in most cases is paid only at initialisation (see `evalTotal.WithLabelValues(string(ruleTypeAlert))` for example). However, you've decided not to use the interface we already offer for this, and instead to request a brand new interface.

Statsd as a component of a time series monitoring system has the same fundamental problem, and I don't believe it's possible to use it correctly.

brian-brazil (Member) commented Feb 13, 2017

We already have a standard solution to this problem in client libraries, no need to add anything on the Prometheus side.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
