Add functions for histogram standard deviation #3998
How accurate would such a result be on a histogram with only 10 buckets of greatly different sizes?
@brian-brazil
[Figures omitted: data source; histogram (count of buckets: 30); result]
Conclusion: The accuracy of histogram stddev is no more than that of histogram quantiles.
Your algorithm isn't producing a correct answer; it should be +Inf for the static server, as there is at least one point in the +Inf bucket. More generally, in any sufficiently loaded system there will always be some proportion of events in the +Inf bucket, so I don't think this is workable in general.
I don't know why you think at least one point should be in the +Inf bucket. I think you confused a general histogram (my graph) with a cumulative histogram (Prometheus histogram). Because a cumulative histogram is good for calculation but not for human understanding, I plotted a general histogram. This is the same data plotted as a cumulative histogram:
[Figure omitted: cumulative histogram of the same data]
I did not misunderstand the data; my point stands.
Can you point out what's wrong? Is the input data strange?
The input looks reasonable; the output of your function is not, given that data.
I think you're talking about the +Inf bucket. The problem is how to treat +Inf. If we take the +Inf bucket into account and treat the mean of that bucket as +Inf, the stddev is +Inf, which is meaningless; if we just ignore the +Inf bucket, we can get a reasonable stddev. Observations only land in the +Inf bucket if the data is really irregular. If the +Inf bucket is big, the histogram design is wrong.
+Inf is the correct answer; you can't just throw away data because you don't like it. Histograms are designed this way for quantiles so that the user will know that their buckets are off, which means that this doesn't work for standard deviation.
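To make the disagreement above concrete, here is a minimal sketch comparing the standard deviation of a full data set with the one obtained after discarding everything above the highest finite bucket bound. The sample values, the bound of 2.0, and the helper name `stddev_with_and_without_overflow` are assumptions chosen for illustration; they are not taken from the data discussed in this thread.

```python
import statistics


def stddev_with_and_without_overflow(samples, highest_finite_bound):
    """Return (stddev of all samples, stddev after dropping overflow samples).

    Hypothetical helper for illustration only. "Overflow" samples are those
    above the highest finite bucket bound, i.e. the ones that would be
    counted only in the +Inf bucket.
    """
    kept = [s for s in samples if s <= highest_finite_bound]
    return statistics.pstdev(samples), statistics.pstdev(kept)


# Four ordinary observations plus one outlier that only the +Inf bucket sees.
samples = [0.5, 0.5, 1.5, 1.5, 100.0]
full, truncated = stddev_with_and_without_overflow(samples, 2.0)
print(full, truncated)
```

In this made-up example a single overflow observation dominates the real spread, while the truncated figure stays small, which is the kind of silent error that ignoring the +Inf bucket can hide.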
brian-brazil closed this Mar 23, 2018
brian-brazil added the component/promql label Mar 23, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Wing924 commented Mar 22, 2018
Standard deviation is as important a metric as φ-quantiles.
We can calculate quantiles from a histogram using histogram_quantile(φ float, b instant-vector), but we can't calculate the standard deviation.
I suggest adding histogram_stddev(avg float, b instant-vector) and histogram_stdvar(avg float, b instant-vector) functions.
Ref. How to get the standard deviation of a given histogram
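As a rough sketch of what the proposed functions might compute, the Python below estimates the standard deviation from cumulative buckets given as (upper bound, cumulative count) pairs. It is not Prometheus code. It assumes that every observation sits at the midpoint of its bucket, that the lowest bucket starts at 0, and that the caller supplies the average (for instance the histogram's sum divided by its count). Returning +Inf when the +Inf bucket is occupied is one possible choice, not something the proposal specifies.

```python
import math


def histogram_stddev(avg, buckets):
    """Estimate stddev from cumulative (upper_bound, cumulative_count) pairs.

    Hypothetical helper, not a Prometheus function. Assumes each observation
    lies at the midpoint of its bucket and that the lowest bucket starts at 0.
    """
    prev_bound, prev_count = 0.0, 0
    total = 0
    squared_deviation = 0.0
    for upper, cumulative in sorted(buckets):
        count = cumulative - prev_count  # cumulative -> per-bucket count
        if count > 0:
            if math.isinf(upper):
                # The overflow bucket has no finite midpoint.
                return math.inf
            midpoint = (prev_bound + upper) / 2
            squared_deviation += count * (midpoint - avg) ** 2
            total += count
        prev_bound, prev_count = upper, cumulative
    if total == 0:
        return math.nan  # no observations at all
    return math.sqrt(squared_deviation / total)


def histogram_stdvar(avg, buckets):
    """Variance counterpart of histogram_stddev (same assumptions)."""
    return histogram_stddev(avg, buckets) ** 2


# Two observations in (0, 1], two in (1, 2], none in the +Inf bucket.
buckets = [(1.0, 2), (2.0, 4), (math.inf, 4)]
print(histogram_stddev(1.0, buckets), histogram_stdvar(1.0, buckets))
```

The estimate is only as good as the midpoint assumption, so wide or unevenly sized buckets can be expected to distort it.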