Add geometric mean as vector aggregation type #438

Closed
BigCrunsh opened this Issue Jan 7, 2015 · 12 comments

3 participants

BigCrunsh commented Jan 7, 2015

It would be nice to also have the geometric mean as a vector aggregation type, since it is more appropriate for metrics that are fractions (e.g., percentage of usage, accuracy, ...).

Member

brian-brazil commented Jan 7, 2015

Typically you'd aggregate up the values you want as sums, and then take the ratio of that so you get the correct weights.

Do you have a use case where that wouldn't work?
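To make the weighting point concrete, here is a minimal Python sketch comparing a ratio of sums with a mean of per-server ratios. The server names and counter values are hypothetical, chosen only to show how a low-traffic server can dominate an unweighted mean.

```python
# Hypothetical per-server counters: (errors, requests).
# Server "a" handles almost all the traffic; server "b" handles very little.
servers = {"a": (10, 1000), "b": (5, 10)}


def ratio_of_sums(counters):
    """Sum numerators and denominators first, then divide once."""
    total_errors = sum(errors for errors, _ in counters.values())
    total_requests = sum(requests for _, requests in counters.values())
    return total_errors / total_requests


def mean_of_ratios(counters):
    """Divide per server first, then average the ratios (ignores traffic)."""
    ratios = [errors / requests for errors, requests in counters.values()]
    return sum(ratios) / len(ratios)


print(ratio_of_sums(servers))   # about 0.0149: 15 errors out of 1010 requests
print(mean_of_ratios(servers))  # about 0.255: the 10-request server counts as much as the busy one
```

The ratio of sums weights each server by the traffic it actually served, which is the "correct weights" property referred to above.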

Author

BigCrunsh commented Jan 7, 2015

I have probability-like performance measures. For these an average in the log-space (geo-mean) is more appropriate (e.g., if one of the entries drops down to zero, the "average" should be zero).

Sure, I can also sum up the numerators and divide them by the sum of the denominators, but this would be a different measure.

Member

brian-brazil commented Jan 7, 2015

I'm having difficulty visualising this, can you explain more about what these performance measures are and how they're instrumented?

beorn7 self-assigned this Jan 7, 2015

Member

beorn7 commented Jan 7, 2015

Implementing a GEOMEAN aggregator should be straightforward, and I can imagine it will be useful for a small but relevant number of users. @BigCrunsh I guess the implementation effort will be similar to the effort you needed to invest to teach us about the uses of geometric means in your area of expertise.

Feature request accepted.

Author

BigCrunsh commented Jan 7, 2015

Thx @beorn7.

Example: Assume you have 100€. On the first day you win 10%, on the second you lose 20%, and on the third you win 50%. The question is: what is your average win? To answer that, you can take the geometric mean of the daily growth factors (see, e.g., here for more details). A more extreme case shows why the arithmetic mean is not appropriate: assume that on one day you lose 100% of your money. The arithmetic mean then still gives you some non-zero value, whereas the geometric mean gives zero.

This mostly applies to all quantities that are ratios and probabilities (like error probability) - in my examples precision, recall.

Hope that helps ;)
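The arithmetic in the 100€ example can be checked with a short Python sketch. It assumes the daily changes are expressed as growth factors (1.10, 0.80, 1.50); the helper names are illustrative and not part of Prometheus.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product; becomes 0 as soon as any value is 0."""
    return math.prod(values) ** (1.0 / len(values))


start = 100.0
factors = [1.10, 0.80, 1.50]  # +10%, -20%, +50% as growth factors

end = start * math.prod(factors)  # about 132: the money actually held after three days
geo = geometric_mean(factors)     # about 1.097 per day
arith = arithmetic_mean(factors)  # about 1.133 per day

# Compounding the geometric mean for three days reproduces the real outcome;
# compounding the arithmetic mean overshoots it (about 145.6 instead of 132).
print(end, start * geo ** 3, start * arith ** 3)

# The extreme case: losing 100% on one day.
wipeout = [1.10, 0.0, 1.50]
print(geometric_mean(wipeout))   # 0.0
print(arithmetic_mean(wipeout))  # still positive
```

The geometric mean is the constant daily factor that would produce the same final amount, which is why it answers the "average win" question here.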

Member

brian-brazil commented Jan 7, 2015

From my experience, every time I've seen someone aggregating ratios in monitoring the result was not what they wanted.

> This mostly applies to all quantities that are ratios and probabilities

I've yet to see a case where it was correct to export a ratio. You want to export the numerator and denominator separately, and process them from there in Prometheus.

If this is not what you're doing, can you detail exactly what your instrumentation is?

Member

brian-brazil commented Jan 7, 2015

Here's a thought: How about adding logarithm and power scalar functions? That should allow for this for the rare use cases where it comes up and is a valid use case.

My concern is that taking a mean of ratios or mean of means is a very very common mistake in the prometheus style of monitoring, and by offering a geometric mean it may cause even more confusion in this space.

Author

BigCrunsh commented Jan 8, 2015

> My concern is that taking a mean of ratios or mean of means is a very very common mistake in the prometheus style of monitoring, and by offering a geometric mean it may cause even more confusion in this space.

@brian-brazil: Actually I agree. A mean of some counters is fine and a mean of ratios typically not, since the more appropriate equivalent is the geomean. And this is why I am asking ;)

> Here's a thought: How about adding logarithm and power scalar functions? That should allow for this for the rare use cases where it comes up and is a valid use case.

@brian-brazil: In general, I like the idea of adding logarithm and power functions (probably only because of my math background) - it indeed gives Prometheus more flexibility. However, I don't think that this is a good alternative to implementing geomean; computing the geomean explicitly via AVG, EXP, LOG is too much logic in the presentation layer for me.

> I've yet to see a case where it was correct to export a ratio. You want to export the numerator and denominator separately, and process them from there in Prometheus.
>
> If this is not what you're doing, can you detail exactly what your instrumentation is?

@brian-brazil: Again, yes I could export the numerator and denominator separately, but only to compute then the measures I am actually interested in via the dashboard? But even then, I have to compute the geomean. In my context, I would like to observe the quality of a classification system with 25 different outcomes. The quality of each of these classes is measured by precision and recall (see above). In order to come up with an overall measure, people use the geomean either to combine precision and recall or to combine them over multiple classes. Note that, taking the pure arithmetic mean over the underlying counters is not sensitive enough and thus not appropriate in this domain.
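As a rough illustration of the classification use case, the following Python sketch derives precision and recall from raw per-class counters and combines them with a geometric mean, first per class and then across classes. The class names, counter names (`tp`, `fp`, `fn`) and values are assumptions made up for this example; the thread does not describe the actual instrumentation.

```python
import math

# Hypothetical per-class counters: true positives, false positives, false negatives.
counts = {
    "cat": {"tp": 80, "fp": 20, "fn": 20},
    "dog": {"tp": 45, "fp": 5, "fn": 55},
}


def precision(c):
    # Assumes the class was predicted at least once (tp + fp > 0).
    return c["tp"] / (c["tp"] + c["fp"])


def recall(c):
    # Assumes the class occurred at least once (tp + fn > 0).
    return c["tp"] / (c["tp"] + c["fn"])


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


# Combine precision and recall per class, then combine the classes.
per_class = {
    name: geometric_mean([precision(c), recall(c)]) for name, c in counts.items()
}
overall = geometric_mean(list(per_class.values()))

print(per_class)
print(overall)
```

Because the inputs are plain counters, the same numbers could be exported separately and combined at query time, which is the approach argued for elsewhere in the thread.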

Member

brian-brazil commented Jan 8, 2015

> A mean of some counters is fine and a mean of ratios typically not, since the more appropriate equivalent is the geomean. And this is why I am asking ;)

The geomean is still incorrect in the common use cases, as what you'd actually want is a weighted geomean to allow for servers getting different amounts of traffic. This is a big part of the reason why I'm against making geomean easy to use, as people will misapply it in such cases (and get confused about what the difference between geometric and arithmetic means is).

It's correct and much easier to sum up the rates and then divide to calculate an arithmetic mean based on that.
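For readers unfamiliar with the term, a weighted geometric mean can be sketched in Python as follows. The success ratios and request counts are hypothetical, and the function is an illustration of the formula exp(sum(w_i * ln(x_i)) / sum(w_i)), not something Prometheus provides.

```python
import math


def weighted_geometric_mean(values, weights):
    """exp of the weighted average of logs; all values must be > 0."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical success ratios for two servers and the requests each one served.
success_ratios = [0.99, 0.50]
requests = [1000, 10]

unweighted = weighted_geometric_mean(success_ratios, [1, 1])
weighted = weighted_geometric_mean(success_ratios, requests)

# The unweighted value is dragged down by the server that saw only 10 requests;
# the weighted value stays close to the busy server's ratio.
print(unweighted, weighted)
```

With equal weights the function reduces to the ordinary geometric mean, which is the case a plain geomean aggregator would compute.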

> computing the geomean explicitly via AVG, EXP, LOG is too much logic in the presentation layer for me.

What's your presentation layer?

This seems like an extremely rare use case to me (you're the first person to have it in my ~8 years of monitoring experience, and I'm not yet convinced your use case is correct), so I think it being a bit more wordy is not the end of the world.

> Again, yes I could export the numerator and denominator separately, but only to compute then the measures I am actually interested in via the dashboard?

Yes, this is the most common use case for Prometheus and how it's intended to be used.

> In my context, I would like to observe the quality of a classification system with 25 different outcomes.

This sounds like a good place to use labels.

> Note that, taking the pure arithmetic mean over the underlying counters is not sensitive enough and thus not appropriate in this domain.

That seems very odd to me, why's that the case?

If the precision of a double is going to be an issue for the counters, I'd expect it to be an issue for any other way of calculating the same result too.
I'll also note that if you care that much about sensitivity, then Prometheus is not an appropriate solution for you, as there are a number of races inherent in this type of system and other issues that will cause minor inaccuracies. See http://prometheus.github.io/docs/introduction/overview/#when-doesn't-it-fit?

The rule of prometheus monitoring is to do all of your aggregation and processing in Prometheus, and keep the instrumentation logic completely dumb. If you find yourself taking a rate or ratio in your binary, then you need to think how to do it just with counters and rates.

Member

beorn7 commented Jan 8, 2015

Again, I don't think geomean will be useful for system monitoring. But the vision for Prometheus (as set by our proud founding father Matt) has a larger scope. It's kind of the same thing as our discussion about negative observation values for summaries. You might want to monitor a measurement of a scientific experiment or something. Say you already have a system (a machine learning one, or a separate exporter binary you write for a 3rd party system with a different monitoring API) that spits out ratios of some kind. You don't want to change that existing system to export values at a lower level. You just want to export the ratios it spits out, and then you want to do some processing based on those ratios.

The possibility to simply provide exp and log as functions should be considered. exp(avg(log(metric))) is not that bad. :) If we do geomean, we probably also need geomean_over_time, which we don't need with the exp/log approach. Also, exp and in particular log might be useful in other cases, especially for ad hoc graphing. We'll take the way of least confusion.

In general, I don't think an aggregator named as arcanely as geomean will confuse those that don't understand geometric means. Also, the documentation should clearly state that geomean is for very special purposes.

Member

beorn7 commented Mar 16, 2015

#599 will add ln and exp. So once that is in, you can write exp(avg(ln(metric))) for geometric mean and exp(avg_over_time(ln(metric[10m]))) for geometric mean over time.
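The identity behind `exp(avg(ln(metric)))` can be checked outside Prometheus with a small Python sketch. The function names and sample values are illustrative. One assumption to flag: Python's `math.log(0)` raises an error, so zero samples are handled explicitly here, whereas PromQL, as far as the Prometheus function documentation describes it, returns `-Inf` for `ln(0)`.

```python
import math


def geomean_via_logs(samples):
    """Mirrors exp(avg(ln(x))): average in log space, then exponentiate."""
    if any(x == 0 for x in samples):
        # math.log(0) raises ValueError in Python; the geometric mean is 0 here.
        return 0.0
    return math.exp(sum(math.log(x) for x in samples) / len(samples))


def geomean_direct(samples):
    """n-th root of the product, for comparison."""
    return math.prod(samples) ** (1.0 / len(samples))


samples = [1.10, 0.80, 1.50]
print(geomean_via_logs(samples), geomean_direct(samples))  # the two agree up to rounding
print(geomean_via_logs([0.5, 0.0]))                        # 0.0
```

Applying the same function to the samples of one series inside a time window corresponds to the `avg_over_time` variant.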

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
