
Bug: stddev_over_time returns NaN for constant series; floating point inaccuracy #4527

Closed
zjwzte opened this Issue Aug 22, 2018 · 19 comments

@zjwzte (Contributor) commented Aug 22, 2018

Bug Report

In production we use stddev_over_time to aggregate metric series. Because the samples of a series can be constant, the aggregate result may be NaN. The cause is in the function funcStddevOverTime:

func funcStddevOverTime(vals []Value, args Expressions, enh *EvalNodeHelper) Vector {
	return aggrOverTime(vals, enh, func(values []Point) float64 {
		var sum, squaredSum, count float64
		for _, v := range values {
			sum += v.V
			squaredSum += v.V * v.V
			count++
		}
		avg := sum / count
		return math.Sqrt(squaredSum/count - avg*avg)
	})
}

Due to floating-point rounding, squaredSum/count - avg*avg can be a small negative value, so math.Sqrt returns NaN.

What did you expect to see?

We expect the result to be 0.

  • Prometheus version:

    We use Prometheus 2.1.

  • Logs:

    We added debug logs to see the intermediate values:

DEBUG::calc stddev, elems: [1.5990505637277868 @[1534906200000] 1.5990505637277868 @[1534906500000] 1.5990505637277868 @[1534906800000]]
DEBUG::calc stddev, sum: 4.797151691183361
DEBUG::calc stddev, squaredSum: 7.670888116074458
DEBUG::calc stddev, sub: -4.440892098500626e-16
DEBUG::calc stddev, stddev: NaN

@gouthamve gouthamve changed the title Bug: use stddev_over_time to aggregate metric series , result is NaN; Bug: stddev_over_time returns NaN for constant series; floating point inaccuracy Aug 22, 2018

@brian-brazil (Member) commented Aug 22, 2018

Do you have a suggestion on how to better deal with this? There's generally not much we can do about floating point inaccuracy.

@gouthamve (Member) commented Aug 22, 2018

@tomwilkie suggests sticking in an Abs there. It makes sense.

@brian-brazil (Member) commented Aug 22, 2018

Stddev doesn't include an abs, so I don't see how that makes sense mathematically.

@gouthamve (Member) commented Aug 22, 2018

Stddev is the root of variance, and variance is a sum of squares, which is always positive.

@brian-brazil (Member) commented Aug 22, 2018

That doesn't mean that you can add in an arbitrary mathematical operation that produces a non-negative number; the answer will still be wrong.

@DanCech (Contributor) commented Aug 22, 2018

http://www.volkerschatz.com/science/float.html

func funcStddevOverTime(vals []Value, args Expressions, enh *EvalNodeHelper) Vector {
	return aggrOverTime(vals, enh, func(values []Point) float64 {
		var aux, count, mean float64
		for _, v := range values {
			count++
			delta := v.V - mean
			mean += delta / count
			aux += delta * (v.V - mean)
		}
		return math.Sqrt(aux / count)
	})
}

@zjwzte (Contributor, Author) commented Aug 23, 2018

I tested the method provided by DanCech with the metrics we logged previously:

func stdDevCalc() {
	values := []struct{ T, V float64 }{
		{T: 1534906200000, V: 1.5990505637277868},
		{T: 1534906500000, V: 1.5990505637277868},
		{T: 1534906800000, V: 1.5990505637277868},
	}
	var aux, count, mean float64
	for _, v := range values {
		count++
		delta := v.V - mean
		mean = mean + delta/count
		aux = aux + delta*(v.V-mean)
	}
	fmt.Println("std dev: ", math.Sqrt(aux/count))
}

result output:

std dev: 0

The function funcStddevOverTime has the same problem, @brian-brazil @DanCech.

@DanCech (Contributor) commented Aug 23, 2018

Not sure I understand your post; 0 is the correct answer, isn't it?

@zjwzte (Contributor, Author) commented Aug 23, 2018

@DanCech
Yes, because all metric samples are the same, the stddev should be 0.

@brian-brazil (Member) commented Aug 23, 2018

That algorithm is not equation I in https://sci-hub.tw/10.1080/00401706.1962.10490022; where does it come from?

@DanCech (Contributor) commented Aug 23, 2018

It's taken from the second example in the "Correlation and compensation" section of the page I linked. This is attributed to Welford and described in http://webmail.cs.yale.edu/publications/techreports/tr222.pdf (also linked from that page).

@brian-brazil (Member) commented Aug 23, 2018

Equation 1.3b there is the equation I from Welford, but it is not the algorithm you have suggested. Where can I find a derivation of what you're suggesting?

@DanCech (Contributor) commented Aug 23, 2018

As far as I can tell that is the standard implementation of Welford; the code I provided is taken directly from the example at http://www.volkerschatz.com/science/float.html#corr:

for all elements x in the number series
    count := count + 1
    delta := x - mean
    mean := mean + delta / count
    aux := aux + delta * (x - mean)
stddev := sqrt(aux / count)

I updated my original comment to use += notation, mathematically it is the same though.

You can find the same approach used in the python example at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

Here is another (though it uses N-1 to compute sample stdev where we use N for population stdev) http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/

Some further reading: https://www.johndcook.com/blog/standard_deviation/

This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth’s Art of Computer Programming, Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. Most people are probably unaware that computing sample variance can be difficult until the first time they compute a standard deviation and get an exception for taking the square root of a negative number.

@brian-brazil (Member) commented Aug 23, 2018

As far as I can tell that is the standard implementation of Welford,

It's not, though; the Welford implementation would be:

for all elements x in the number series
  aux := aux + count * (x - mean) * (x - mean)
  count := count + 1
  mean := mean + delta / count
stddev := sqrt(aux / count)

So where does this other algorithm come from, and is it actually a standard deviation?

@DanCech (Contributor) commented Aug 23, 2018

I'm not sure where your code came from either, but it doesn't correlate with what's described in Knuth (above), which is the algorithm implemented in my original comment, where after each iteration delta is x[k] - M[k-1], mean is M[k] and aux is S[k].

In your code you end up with S[k] = S[k-1] + (k - 1) * (x[k] - M[k-1]) * (x[k] - M[k-1]) which is also not the same as 1.3b (which is likely the similar West or Hanson rather than Welford).

@brian-brazil (Member) commented Aug 23, 2018

which is also not the same as 1.3b (which is likely the similar West or Hanson rather than Welford).

I haven't dug into West/Hanson, only so much math one can do in a day. Do you have the exact reference for that?

Also, who wants to claim the cheque from Knuth for the incorrect reference?

@DanCech (Contributor) commented Aug 23, 2018

The way it's derived is:

S[n] = S[n-1] + (n - 1)/n * (x[n] - M[n-1]) * (x[n] - M[n-1]) per Welford pg. 420 fig. I

x[n] - M[n] = (n - 1)/n * (x[n] - M[n-1]) per Welford pg. 419 fig. (3)

So by substitution: S[n] = S[n-1] + (x[n] - M[n]) * (x[n] - M[n-1])

As outlined above, mean is M[n] and delta is x[n] - M[n-1], so now we have:

S[n] = S[n-1] + (x[n] - mean) * delta

v.V is x[n], and before the assignment aux is S[n-1] so now we have:

S[n] = aux + (v.V - mean) * delta

which is the same as the code in my original comment.

Going back to the calculation of mean, we start with Welford pg. 419 fig. (3):

x[n] - M[n] = (n - 1)/n * (x[n] - M[n-1])

Expanding the right side:

x[n] - M[n] = (x[n] - M[n-1]) - (x[n] - M[n-1]) / n

Solving for M[n] we get:

M[n] = M[n-1] + (x[n] - M[n-1]) / n

We know that delta is x[n] - M[n-1], count is n and before the assignment mean is M[n-1], so that is:

M[n] = mean + delta / count

Again, the same as in my original comment.

If you still want to argue that Knuth is incorrect, be my guest.

@brian-brazil (Member) commented Aug 24, 2018

Okay, the math checks out. Would you like to send a PR?

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
