stat: "running" subpackage for generator-like statistics? #1037

Open
btracey opened this issue Jul 24, 2019 · 2 comments · May be fixed by #1067

Comments

@btracey
Member

btracey commented Jul 24, 2019

Just a thought -- logging as an issue so I don't forget / can solicit feedback.

At the moment we have stat.Mean(x, weights []float64) float64, which computes the mean of a data set that is already held in a slice. The thought is to also provide something like a stat/running package:

package running

type Mean struct {
    count float64
    mean float64
}

func (m *Mean) Mean() float64 {
    return m.mean
}

func (m *Mean) Count() float64 {
    return m.count 
}

func (m *Mean) Add(x float64) {
    // Weight the previous mean by its count before averaging in x.
    m.mean = (x + m.mean*m.count) / (m.count + 1)
    m.count++
}

func (m *Mean) AddWeighted(x, weight float64){ .... }

func (m *Mean) Reset() { ... }

and there could be similar types for Variance/StdDev, DiscountedMean (where the count is discounted by a constant), etc. These could be combined relatively easily with a shim:

type MeanStd struct {
    running.Mean
    running.StdDev
}

func (m *MeanStd) Add(x float64) {
    m.Mean.Add(x)
    m.StdDev.Add(x)
}

This would help both with streams that are too large to want to put in a []float64 and with data that comes from some source (say, a channel) and so cannot be put into a []float64.
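To make the channel case concrete, here is a minimal sketch of how such a type might be used. It is not an existing gonum API: the Mean type mirrors the proposal above, the AddWeighted and Reset bodies are one possible choice, ignoring non-positive weights is an assumption, and meanOf is a hypothetical helper. The update uses the incremental form mean += (w/W)(x - mean), where W is the total weight after adding x.

```go
package main

import "fmt"

// Mean is a hypothetical running (weighted) mean. The names follow the
// proposal in this issue; this is not an existing gonum type.
type Mean struct {
	count float64 // total weight seen so far
	mean  float64
}

// Add incorporates x with unit weight.
func (m *Mean) Add(x float64) { m.AddWeighted(x, 1) }

// AddWeighted incorporates x with the given weight.
// Non-positive weights are ignored (an assumption of this sketch).
func (m *Mean) AddWeighted(x, weight float64) {
	if weight <= 0 {
		return
	}
	m.count += weight
	m.mean += (weight / m.count) * (x - m.mean)
}

// Mean returns the current mean (0 if nothing has been added).
func (m *Mean) Mean() float64 { return m.mean }

// Count returns the total weight added so far.
func (m *Mean) Count() float64 { return m.count }

// Reset returns m to its zero state.
func (m *Mean) Reset() { *m = Mean{} }

// meanOf is a hypothetical helper that drains ch and returns the mean
// of the received values without ever storing them in a slice.
func meanOf(ch <-chan float64) float64 {
	var m Mean
	for x := range ch {
		m.Add(x)
	}
	return m.Mean()
}

func main() {
	ch := make(chan float64)
	go func() {
		defer close(ch)
		for _, x := range []float64{1, 2, 3, 4} {
			ch <- x
		}
	}()
	fmt.Println(meanOf(ch))
}
```

The zero value of Mean is ready to use, so callers can declare it with var and start adding values immediately.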

I don't think Add is the right verb, but I'm not sure what the right verb is (Accumulate seems too heavy, Append also seems wrong ...)

@kortschak
Member

I have often thought that this would be useful. Also note that you can calculate the variance in the same loop very cheaply.

// The Method of Provisional Means
n++
mean = oldmean + (x-oldmean)/n
sumOfSquareDiffs += (x - oldmean) * (x - mean)
oldmean = mean

Then use sumOfSquareDiffs as you would expect (for example, divide it by n-1 to get the sample variance).

@btracey
Member Author

btracey commented Jul 25, 2019

Yes, agreed, and since it's so common we'd probably want a specific MeanVariance type (or maybe just one thing that's the kitchen sink, i.e. running.Stats). I was just trying to mention that it's easy to combine different things together so the combinatorics aren't so bad.
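One way the "easy to combine" point could be realized is with a small interface and a fan-out slice, sketched below. Everything here is hypothetical: Adder, Multi, and the toy Sum and Max accumulators are illustrative names only, standing in for whatever running types the package would provide.

```go
package main

import "fmt"

// Adder is a hypothetical interface satisfied by any running statistic.
type Adder interface {
	Add(x float64)
}

// Multi forwards each value to every accumulator it holds.
type Multi []Adder

// Add passes x to each accumulator in turn.
func (m Multi) Add(x float64) {
	for _, a := range m {
		a.Add(x)
	}
}

// Sum is a toy accumulator used only for illustration.
type Sum struct{ total float64 }

// Add adds x to the running total.
func (s *Sum) Add(x float64) { s.total += x }

// Max is a toy accumulator used only for illustration.
type Max struct {
	max  float64
	seen bool
}

// Add records x if it is the largest value seen so far.
func (m *Max) Add(x float64) {
	if !m.seen || x > m.max {
		m.max, m.seen = x, true
	}
}

func main() {
	sum, mx := &Sum{}, &Max{}
	all := Multi{sum, mx}
	for _, x := range []float64{3, -1, 7, 2} {
		all.Add(x)
	}
	fmt.Println(sum.total, mx.max) // prints "11 7"
}
```

With this shape, callers pick the statistics they need at the call site, so the package would not have to ship a dedicated type for every combination.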
