
add mean/var to stats #22

Merged: ShigekiKarita merged 1 commit into master from stats-mean on Feb 17, 2018

ShigekiKarita (Member)

No description provided.

ShigekiKarita merged commit 8855ef9 into master on Feb 17, 2018
jmh530 (Contributor) commented Feb 17, 2018

@ShigekiKarita "var" could use an option for choosing between the sample variance and the population variance.

I did some work on statistics functions last year. Do you want me to push a branch of mine somewhere so that you can see what I had done for comparison? I had written at least mean, hmean, gmean, var, std, and zscore before I got side-tracked. I took a slightly different approach, treating the axis option as something that comes in through byDim, but otherwise it is mostly similar.

ShigekiKarita (Member, Author)

@jmh530 I think population and sample variance correspond to delta degrees of freedom ddof=0 and ddof=1, respectively. Do you agree? I want to confirm this because I basically want to follow numpy's conventions for the arguments.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html
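The correspondence can be made concrete: both estimators divide the sum of squared deviations from the mean, the population variance by n (ddof=0) and the sample variance by n - 1 (ddof=1). A minimal two-pass sketch in plain D using only Phobos, assuming a NumPy-style `ddof` argument; `varianceDdof` is a hypothetical name, not a mir function.

```d
import std.exception : enforce;
import std.range.primitives : isForwardRange;

/// Hypothetical NumPy-style variance: divides by (n - ddof), so
/// ddof = 0 gives the population variance and ddof = 1 the sample variance.
double varianceDdof(R)(R r, size_t ddof = 0)
    if (isForwardRange!R)
{
    // first pass: count the elements and compute the mean
    size_t n = 0;
    double total = 0;
    foreach (x; r.save)
    {
        total += x;
        ++n;
    }
    enforce(n > ddof, "need more elements than ddof");
    immutable mean = total / n;

    // second pass: sum of squared deviations from the mean
    double ss = 0;
    foreach (x; r)
    {
        immutable d = x - mean;
        ss += d * d;
    }
    return ss / (n - ddof);
}

unittest
{
    double[] data = [4.0, 7, 13, 16];
    assert(varianceDdof(data, 0) == 22.5); // population
    assert(varianceDdof(data, 1) == 30.0); // sample
}
```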

> Do you want me to push a branch of mine somewhere so that you can see what I had done for comparison?

Yes. I'd like to see your functions and merge them. PRs are always welcome.

ShigekiKarita deleted the stats-mean branch on February 17, 2018 12:19
jmh530 (Contributor) commented Feb 17, 2018

@ShigekiKarita Yes, ddof in numpy means "delta degrees of freedom", which is what the sample/population distinction comes down to. Personally, I have never liked that approach, because you then need an assert to check that the user passes only 0 or 1. In addition, for functions like skewness and kurtosis the bias corrections are not just a change of divisor, so the ddof approach doesn't generalize. I used a bool, but a flag would also be reasonable.
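One way to express the flag alternative is Phobos's `std.typecons.Flag`, which makes the choice readable at the call site (`Yes.sample` / `No.sample`) and leaves no integer to range-check. A hedged sketch; the function name `variance` and the flag name `sample` are illustrative assumptions, not an existing mir API.

```d
import std.exception : enforce;
import std.range.primitives : isForwardRange;
import std.typecons : Flag, No, Yes;

/// Hypothetical variance that picks sample (n - 1) or population (n)
/// normalisation with a Flag, so there is no ddof value to validate.
double variance(R)(R r, Flag!"sample" sample = Yes.sample)
    if (isForwardRange!R)
{
    // first pass: count the elements and compute the mean
    size_t n = 0;
    double total = 0;
    foreach (x; r.save)
    {
        total += x;
        ++n;
    }
    enforce(sample ? n > 1 : n > 0, "not enough elements");
    immutable mean = total / n;

    // second pass: sum of squared deviations from the mean
    double ss = 0;
    foreach (x; r)
    {
        immutable d = x - mean;
        ss += d * d;
    }
    return ss / (sample ? n - 1 : n);
}

unittest
{
    double[] data = [4.0, 7, 13, 16];
    assert(variance(data) == 30.0);            // sample (default)
    assert(variance(data, No.sample) == 22.5); // population
}
```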

I pushed all the stats functions I had written to a branch on my fork, so you can view them there. I can always rework it into something that can be added, or feel free to just take whatever you want from it.

EDIT: Mine still needs a little work; I don't think I added unit tests for all the seed versions of the functions.

9il (Member) commented Feb 17, 2018

Hi @ShigekiKarita and @jmh530. Moments, including var, can be calculated in a single iteration. For example, var: https://wikimedia.org/api/rest_v1/media/math/render/svg/67c38600b240e9bf9479466f5f362792e4fc4fb8
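The linked formula image is not reproduced here, and the expression below may not be exactly what it shows. It is the one-pass identity that the code in the following comments computes: accumulate the sum and the sum of squares together, then combine them once at the end.

$$
s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2\right)
$$

This needs only one pass, but it subtracts two large, nearly equal quantities when the mean is large relative to the spread, so it can lose precision that the two-pass form keeps.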

9il (Member) commented Feb 17, 2018

This is how it works in databases. A single pass is preferred because the input may be lazy, i.e. it may be expensive or impossible to iterate over it more than once.
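To illustrate why a single pass suits lazy input, the sketch below consumes an input range exactly once. It uses Welford's online algorithm, a swapped-in technique that is numerically steadier than the raw sum/sum-of-squares identity and may differ from the linked formula; `RunningVariance` and `onePassVar` are hypothetical names, and this is plain Phobos, not mir.

```d
import std.exception : enforce;
import std.range.primitives : isInputRange;

/// Running sample variance using Welford's online algorithm.
/// Each value is seen exactly once, so lazy, single-pass input works.
struct RunningVariance
{
    size_t n;
    double mean = 0;
    double m2 = 0; // sum of squared deviations from the running mean

    void put(double x)
    {
        ++n;
        immutable delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    double sampleVariance() const
    {
        enforce(n > 1, "need at least two values");
        return m2 / (n - 1);
    }
}

/// Hypothetical helper: one pass over any input range.
double onePassVar(R)(R r)
    if (isInputRange!R)
{
    RunningVariance acc;
    foreach (x; r)
        acc.put(x);
    return acc.sampleVariance;
}

unittest
{
    import std.algorithm.iteration : map;
    import std.range : iota;

    // a lazy range: 2.0, 4.0, 6.0 are generated on the fly, never stored
    assert(onePassVar(iota(1, 4).map!(i => i * 2.0)) == 4.0);
}
```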

jmh530 (Contributor) commented Feb 17, 2018 via email

jmh530 (Contributor) commented Feb 17, 2018 via email

9il (Member) commented Feb 18, 2018

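For `float` input (accumulate in `double`):

```d
Summator!(double, Summation.kbn) squares = 0;  // sum of squares, Kahan-Babuska-Neumaier
Summator!(double, Summation.naive) values = 0; // plain running sum
```

For `double` and `real`:

```d
Summator!(T, Summation.kb2) squares = 0; // generalized (second-order) Kahan-Babuska
Summator!(T, Summation.kbn) values = 0;  // Kahan-Babuska-Neumaier
```

And the common loop:

```d
// one pass accumulates both the sum of squares and the plain sum
// (for float input, `a * a` has type float before it is added)
aSlice.each!((a) { squares += a * a; values += a; });
sizediff_t n = aSlice.elementsCount;
// sum of squared deviations = sum(a^2) - (sum(a))^2 / n
squares += -(values.sum ^^ 2 / n);
return squares.sum / (n - 1); // sample variance
```

`each` will squash the inner loops for a contiguous slice.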
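For readers unfamiliar with the `Summation.kbn` accumulator used above: it refers to Kahan-Babuska-Neumaier compensated summation, which carries the low-order bits lost by each addition in a separate correction term. A minimal sketch of the technique in plain D, assuming only Phobos; `KbnSum` is a hypothetical name and this is not mir's implementation.

```d
import std.math : fabs;

/// Kahan-Babuska-Neumaier compensated summation (illustrative sketch).
struct KbnSum
{
    double sum = 0;
    double c = 0; // running compensation for lost low-order bits

    void put(double x)
    {
        immutable t = sum + x;
        if (fabs(sum) >= fabs(x))
            c += (sum - t) + x; // low-order bits of x were lost
        else
            c += (x - t) + sum; // low-order bits of sum were lost
        sum = t;
    }

    double result() const
    {
        return sum + c;
    }
}

unittest
{
    // naive left-to-right summation of these values gives 0.0
    KbnSum s;
    foreach (x; [1.0, 1e100, 1.0, -1e100])
        s.put(x);
    assert(s.result == 2.0);
}
```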

jmh530 (Contributor) commented Feb 18, 2018

Seems to work for me.

```d
auto twoPassVar(T)(T slice)
{
    import mir.math.sum : sum;
    import mir.ndslice.topology : map;
    import mir.math.common : powi;

    // first pass: mean
    size_t sliceSize = slice.elementsCount;
    auto sliceMean = slice.sum / sliceSize;

    // second pass: sum of squared deviations, divided by n - 1
    return slice
            .map!(a => a - sliceMean)
            .map!(a => a.powi(2))
            .sum
            / (sliceSize - 1);
}

auto fastVar(T)(T aSlice)
{
    import mir.ndslice.algorithm : each;
    import mir.math.sum;

    Summator!(double, Summation.kbn) squares = 0;
    Summator!(double, Summation.naive) values = 0;

    // widen to double before squaring; a float product may be rounded
    // to float precision before it reaches the double accumulator
    aSlice.each!((a) { squares += cast(double) a * a; values += a; });
    sizediff_t n = aSlice.elementsCount;
    squares += -(values.sum ^^ 2 / n);
    return squares.sum / (n - 1);
}

unittest
{
    import mir.ndslice.slice : sliced;

    // float has a 24-bit significand: near 1e14 the spacing between floats
    // is about 8.4e6, so 1e14 + 4 .. 1e14 + 16 would all collapse to one
    // value; an offset of 1e6 keeps every shifted value exact
    float adder = 1_000_000.0f;
    float[] x1_pre = [4.0f, 7.0f, 13.0f, 16.0f];
    auto x2_pre = x1_pre.dup;
    // `ref` is required: without it `e` is a copy and x2_pre is unchanged
    foreach (ref float e; x2_pre)
        e = e + adder;
    auto x1 = x1_pre.sliced;
    auto x2 = x2_pre.sliced;

    assert(twoPassVar(x1) == fastVar(x1));
    assert(twoPassVar(x2) == fastVar(x2));
}
```
