Add geometric mean as vector aggregation type #438
Typically you'd aggregate the values you want as sums and then take the ratio of those sums, so that you get the correct weights. Do you have a use case where that wouldn't work?
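A minimal sketch of why summing first gives the correct weights, assuming two hypothetical servers that export request and error counters over the same window (the numbers are made up for illustration). The ratio of the sums equals the traffic-weighted mean of the per-server ratios, while a plain mean of the ratios gives the small server as much say as the big one.

```python
# Hypothetical per-server counters over the same time window (illustrative numbers).
servers = {
    "big":   {"requests": 9000, "errors": 90},   # 1% error ratio
    "small": {"requests": 1000, "errors": 100},  # 10% error ratio
}

# Mean of ratios: every server counts equally, regardless of its traffic.
ratios = [s["errors"] / s["requests"] for s in servers.values()]
mean_of_ratios = sum(ratios) / len(ratios)

# Ratio of sums: aggregate the counters first, then divide once.
total_errors = sum(s["errors"] for s in servers.values())
total_requests = sum(s["requests"] for s in servers.values())
ratio_of_sums = total_errors / total_requests

# Equivalent view: each server's ratio weighted by its share of the traffic.
weighted = sum(
    (s["requests"] / total_requests) * (s["errors"] / s["requests"])
    for s in servers.values()
)

print(f"mean of ratios:        {mean_of_ratios:.2%}")
print(f"ratio of sums:         {ratio_of_sums:.2%}")
print(f"traffic-weighted mean: {weighted:.2%}")
```

Here the mean of ratios comes out at 5.50%, while the ratio of sums and the traffic-weighted mean agree at 1.90%.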
I have probability-like performance measures. For these, an average in log space (the geometric mean) is more appropriate (e.g., if one of the entries drops to zero, the "average" should be zero). Sure, I can also sum up the numerators and divide them by the sum of the denominators, but that would be a different measure.
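To illustrate the point about averaging in log space, here is a small sketch with made-up, probability-like scores (assumed values, not from the thread). The geometric mean is computed as the exponential of the mean of the logarithms, so a single zero drives it to zero, whereas the arithmetic mean barely moves. The helper names are hypothetical.

```python
import math

def geometric_mean(values):
    """Mean in log space: exp(mean(log(v))) for non-negative values."""
    if any(v == 0 for v in values):
        return 0.0  # log(0) is -inf, so the log-space average collapses to zero
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    return sum(values) / len(values)

healthy = [0.90, 0.95, 0.85]  # hypothetical probability-like scores
broken = [0.90, 0.95, 0.00]   # same, but one entry has dropped to zero

for name, scores in [("healthy", healthy), ("broken", broken)]:
    print(f"{name}: arithmetic={arithmetic_mean(scores):.3f} "
          f"geometric={geometric_mean(scores):.3f}")
```

For the healthy scores both means are close to 0.9; for the broken ones the arithmetic mean stays above 0.6 while the geometric mean is 0.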
I'm having difficulty visualising this. Can you explain more about what these performance measures are and how they're instrumented?
beorn7 self-assigned this Jan 7, 2015
Implementing a GEOMEAN aggregator should be straightforward, and I can imagine it will be useful for a small but relevant number of users. @BigCrunsh I guess the implementation effort will be similar to the effort you needed to invest to teach us about the uses of geometric means in your area of expertise. Feature request accepted.
Thx @beorn7. Example: Assume you have 100€. On the first day you win 10%, on the second you lose 20%, and on the third you win 50%. What is your average gain? To answer that, you can take the geometric mean (see, e.g., here for more details). A more extreme case shows that the arithmetic mean is not appropriate: assume that on one day you lose 100% of your money. The arithmetic mean then still gives you some non-zero value, whereas the geometric mean gives zero. This applies to most quantities that are ratios or probabilities (like an error probability), in my examples precision and recall. Hope that helps ;)
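The 100€ example can be worked through in a few lines. This is a sketch, assuming the returns compound daily and treating each day as a growth factor (1.1, 0.8, 1.5); it needs Python 3.8+ for `math.prod`.

```python
import math

# Daily returns from the example: +10%, -20%, +50%, starting from 100€.
returns = [0.10, -0.20, 0.50]
factors = [1 + r for r in returns]  # growth factors 1.1, 0.8, 1.5

final = 100 * math.prod(factors)  # balance after three days

# Naive answer: arithmetic mean of the daily returns.
arith = sum(returns) / len(returns)

# Geometric mean of the growth factors minus 1: the constant daily return
# that would produce the same final balance.
geo = math.prod(factors) ** (1 / len(factors)) - 1

print(f"final balance: {final:.2f}")
print(f"arithmetic mean {arith:.2%} would predict {100 * (1 + arith) ** 3:.2f}")
print(f"geometric mean {geo:.2%} predicts {100 * (1 + geo) ** 3:.2f}")

# Extreme case: losing 100% on day 2 leaves nothing. Only the geometric
# mean of the growth factors drops to zero.
wiped = [1.1, 0.0, 1.5]
arith_factor = sum(wiped) / len(wiped)             # still clearly non-zero
geo_factor = math.prod(wiped) ** (1 / len(wiped))  # exactly 0.0
print(f"total loss: arithmetic factor {arith_factor:.3f}, geometric factor {geo_factor:.3f}")
```

With these numbers the account ends at 132€. The arithmetic mean of about 13.3% per day would predict roughly 145.57€, while the geometric mean of about 9.7% per day reproduces the 132€.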
From my experience, every time I've seen someone aggregating ratios in monitoring the result was not what they wanted.
I've yet to see a case where it was correct to export a ratio. You want to export the numerator and denominator separately, and process them from there in Prometheus. If this is not what you're doing, can you detail exactly what your instrumentation is?
Here's a thought: how about adding logarithm and power scalar functions? That should cover the rare cases where this comes up and is a valid use case. My concern is that taking a mean of ratios or a mean of means is a very common mistake in the Prometheus style of monitoring, and offering a geometric mean may cause even more confusion in this space.
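The suggestion amounts to composing a logarithm, an average and an exponential. A minimal sketch of that composition per group, assuming a hypothetical list of labeled samples standing in for an instant vector. In today's PromQL, which has `ln()` and `exp()` functions, the analogous query would look something like `exp(avg by (class) (ln(some_metric)))`, where `some_metric` is a placeholder. The `ln` helper below mimics IEEE floating-point behaviour (the log of zero is negative infinity), so a zero sample drives its group's result to zero instead of raising an error.

```python
import math
from collections import defaultdict

def ln(x):
    """Natural log with IEEE-style edge case: ln(0) = -inf instead of an error."""
    return -math.inf if x == 0 else math.log(x)

# Hypothetical instant vector: (labels, value) pairs, e.g. per-instance recall.
samples = [
    ({"class": "a", "instance": "1"}, 0.90),
    ({"class": "a", "instance": "2"}, 0.80),
    ({"class": "b", "instance": "1"}, 0.95),
    ({"class": "b", "instance": "2"}, 0.00),
]

def geomean_by(samples, label):
    """exp(avg by (label) (ln(x))): log each sample, average per group, exponentiate."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[label]].append(ln(value))
    # The mean of a group containing -inf is -inf, and exp(-inf) is 0.0.
    return {key: math.exp(sum(logs) / len(logs)) for key, logs in groups.items()}

print(geomean_by(samples, "class"))
```

Group "a" yields the square root of 0.9 × 0.8 (about 0.849); group "b" yields 0 because one of its samples is zero.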
@brian-brazil: Actually, I agree. A mean of some counters is fine, and a mean of ratios typically is not, since the more appropriate equivalent is the geomean. And this is why I am asking ;)
@brian-brazil: In general, I like the idea of adding logarithm and power functions (probably only because of my math background); they would indeed give Prometheus more flexibility. However, I don't think this is a good alternative to implementing geomean; computing the geomean explicitly via AVG, EXP and LOG puts too much logic in the presentation layer for me.
@brian-brazil: Again, yes, I could export the numerator and denominator separately, but only to then compute the measures I am actually interested in on the dashboard? Even then, I would have to compute the geomean. In my context, I want to monitor the quality of a classification system with 25 different outcomes. The quality of each class is measured by precision and recall (see above). To arrive at an overall measure, people use the geomean, either to combine precision and recall or to combine them across multiple classes. Note that taking the plain arithmetic mean over the underlying counters is not sensitive enough and is therefore not appropriate in this domain.
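A sketch of the measure described here, assuming made-up counters for three classes instead of 25: precision and recall are derived from per-class true positive, false positive and false negative counts, combined per class with a geometric mean, and then combined across classes the same way. For contrast, the pooled precision sums the counters first, which is the arithmetic-style aggregation the comment calls not sensitive enough. All names and numbers are hypothetical.

```python
import math

# Hypothetical per-class counters: true positives, false positives, false negatives.
counts = {
    "class_a": {"tp": 950, "fp": 30, "fn": 20},
    "class_b": {"tp": 900, "fp": 50, "fn": 40},
    "class_c": {"tp": 0, "fp": 5, "fn": 25},  # a rare class the model never gets right
}

def geomean(values):
    """Log-space mean of non-negative values; a single zero makes the result zero."""
    if min(values) == 0:
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

per_class = {}
for name, c in counts.items():
    precision = c["tp"] / (c["tp"] + c["fp"])
    recall = c["tp"] / (c["tp"] + c["fn"])
    per_class[name] = geomean([precision, recall])  # combine precision and recall

# Pooled ("micro-averaged") precision: sum the counters first, then divide.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
pooled_precision = tp / (tp + fp)

overall = geomean(list(per_class.values()))  # combine across classes

print(per_class)
print(f"pooled precision: {pooled_precision:.3f}, geometric mean over classes: {overall:.3f}")
```

The pooled precision stays above 0.95 even though one class is never recognised, while the geometric mean over classes drops to zero.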
The geomean is still incorrect in the common use cases, as what you'd actually want is a weighted geomean to allow for servers getting different amounts of traffic. This is a big part of the reason why I'm against making geomean easy to use, as people will misapply it in such cases (and get confused about the difference between geometric and arithmetic means). It's correct and much easier to sum up the rates and then divide to calculate an arithmetic mean based on that.
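What a traffic-weighted geometric mean would look like, as a sketch: it assumes the standard definition exp(Σ wᵢ ln xᵢ / Σ wᵢ) with positive weights, which the thread does not spell out, and uses hypothetical per-server success ratios weighted by request counts. The last line shows the sum-then-divide arithmetic mean for comparison.

```python
import math

def weighted_geomean(values, weights):
    """exp(sum(w * ln(x)) / sum(w)) for positive weights; any zero value yields zero."""
    if any(v == 0 for v in values):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)

# Hypothetical per-server success ratios and the traffic each server handled.
ratios = [0.99, 0.90]
requests = [9000, 1000]

print(f"unweighted geomean:       {weighted_geomean(ratios, [1, 1]):.4f}")
print(f"traffic-weighted geomean: {weighted_geomean(ratios, requests):.4f}")
# Sum of successes divided by sum of requests: the traffic-weighted arithmetic mean.
print(f"ratio of sums:            {sum(r * n for r, n in zip(ratios, requests)) / sum(requests):.4f}")
```

With equal weights the function reduces to the plain geometric mean; weighting by traffic pulls the result toward the busy server's ratio.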
What's your presentation layer? This seems like an extremely rare use case to me (you're the first person to have it in my ~8 years of monitoring experience, and I'm not yet convinced your use case is correct), so I think it being a bit more wordy is not the end of the world.
Yes, this is the most common use case for Prometheus and how it's intended to be used.
This sounds like a good place to use labels.
That seems very odd to me; why's that the case? I'd expect that if the precision of a double is going to be an issue for the counters, it's also going to be an issue for any other way of calculating the same result. The rule of Prometheus monitoring is to do all of your aggregation and processing in Prometheus, and keep the instrumentation logic completely dumb. If you find yourself taking a rate or ratio in your binary, then you need to think about how to do it with just counters and rates.
Again, I don't think …
The possibility to simply provide …
In general, I don't think an aggregator named as arcanely as …
#599 will add …
beorn7 closed this Mar 16, 2015
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
BigCrunsh commented Jan 7, 2015
It would be nice to also have the geometric mean as a vector aggregation type, since it is more appropriate for metrics that are fractions (e.g., percentage of usage, accuracy, ...).