Bug: stddev_over_time returns NaN for constant series; floating point inaccuracy #4527
gouthamve changed the title from "Bug: use stddev_over_time to aggregate metric series , result is NaN;" to "Bug: stddev_over_time returns NaN for constant series; floating point inaccuracy" on Aug 22, 2018
Do you have a suggestion on how to better deal with this? There's generally not much we can do about floating point inaccuracy.
@tomwilkie suggests sticking in an
Stddev doesn't include an
Stddev is the root of variance, and variance is a sum of squares, which is always non-negative.
That doesn't mean that you can add in an arbitrary mathematical operation that produces a non-negative number - the answer will still be wrong.
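http://www.volkerschatz.com/science/float.html

```go
func funcStddevOverTime(vals []Value, args Expressions, enh *EvalNodeHelper) Vector {
	return aggrOverTime(vals, enh, func(values []Point) float64 {
		// Welford's single-pass method: mean is the running mean and
		// aux is the running sum of squared deviations from it.
		var aux, count, mean float64
		for _, v := range values {
			count++
			delta := v.V - mean         // distance from the previous mean
			mean += delta / count       // update the running mean
			aux += delta * (v.V - mean) // old delta times new delta
		}
		// Population standard deviation (divide by count, not count-1).
		return math.Sqrt(aux / count)
	})
}
```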
I tested the method provided by DanCech with metrics we logged previously; result output: std dev: 0. The function funcStddevOverTime has the same problem, @brian-brazil @DanCech
Not sure I understand your post; 0 is the correct answer, isn't it?
@DanCech
That algorithm is not equation I in https://sci-hub.tw/10.1080/00401706.1962.10490022; where does it come from?
It's taken from the second example in the "Correlation and compensation" section of the page I linked. This is attributed to Welford and described in http://webmail.cs.yale.edu/publications/techreports/tr222.pdf (also linked from that page).
Equation 1.3b there is equation I from Welford, but it is not the algorithm you have suggested. Where can I find a derivation of what you're suggesting?
As far as I can tell that is the standard implementation of Welford; the code I provided is taken directly from the example on http://www.volkerschatz.com/science/float.html#corr
I updated my original comment to use
You can find the same approach used in the Python example at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
Here is another (though it uses N-1 to compute the sample stdev, where we use N for the population stdev): http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
Some further reading: https://www.johndcook.com/blog/standard_deviation/
It's not, though; the Welford implementation would be
So where does this other algorithm come from, and is it actually a standard deviation?
I'm not sure where your code came from either, but it doesn't match what's described in Knuth (above), which is the algorithm implemented in my original comment, where after each iteration
In your code you end up with
I haven't dug into West/Hanson; there's only so much math one can do in a day. Do you have the exact reference for that? Also, who wants to claim the cheque from Knuth for the incorrect reference?
The way it's derived is:
So by substitution: As outlined above,
which is the same as the code in my original comment. Going back to the calculation of
Expanding the right side:
Solving for
We know that
Again, the same as in my original comment. If you still want to argue that Knuth is incorrect, be my guest.
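For reference, a textbook derivation of the Welford/Knuth update, written in terms of the running mean M_k and the running sum of squared deviations S_k over samples x_1, ..., x_k. In the Go code from the earlier comment these correspond to mean and aux, with delta = x_k - M_{k-1} and (v.V - mean) = x_k - M_k after the mean update. This is a standard derivation, not necessarily the exact steps used in the comment above.

```latex
% Definitions
M_k = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad
S_k = \sum_{i=1}^{k} (x_i - M_k)^2 = \sum_{i=1}^{k} x_i^2 - k M_k^2

% Mean update, from k M_k = (k-1) M_{k-1} + x_k
M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}

% Change in the sum of squared deviations
S_k - S_{k-1} = x_k^2 - k M_k^2 + (k-1) M_{k-1}^2

% Substituting x_k = k M_k - (k-1) M_{k-1}:
x_k (M_k + M_{k-1}) = k M_k^2 + M_k M_{k-1} - (k-1) M_{k-1}^2

(x_k - M_{k-1})(x_k - M_k)
  = x_k^2 - x_k (M_k + M_{k-1}) + M_{k-1} M_k
  = x_k^2 - k M_k^2 + (k-1) M_{k-1}^2

% Hence the update used in the code, and the population standard deviation
S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k), \qquad
\sigma_n = \sqrt{S_n / n}
```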
Okay, that math checks out. Would you like to send a PR?
DanCech referenced this issue on Aug 24, 2018: use Welford/Knuth method to compute standard deviation and variance #4533 (Merged)
brian-brazil closed this in #4533 on Aug 26, 2018
lock bot commented on Mar 22, 2019:
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

zjwzte commented on Aug 22, 2018 (edited by gouthamve)
Bug Report
In production, we use stddev_over_time to aggregate metric series. Because the samples of the series are constant, the aggregated result can be NaN: in the function funcStddevOverTime,
squaredSum/count - avg*avg is a negative value.
What did you expect to see?
We expect the result to be 0.
We use Prometheus 2.1.
We added logs to see the result:
DEBUG::calc stddev, elems: [1.5990505637277868 @[1534906200000] 1.5990505637277868 @[1534906500000] 1.5990505637277868 @[1534906800000]]
DEBUG::calc stddev, sum: 4.797151691183361
DEBUG::calc stddev, squaredSum: 7.670888116074458
DEBUG::calc stddev, sub: -4.440892098500626e-16
DEBUG::calc stddev, stddev: NaN