ENH: stats: Add the Irwin-Hall distribution #20481
7449412 to 5e1ac5a
Did adding this to Note that this would need to be vectorized to support array
961b8b9 to 3167ba1
@nimish please run the tests locally before pushing and add
Done.
a624402 to 72221c0
Mostly, but the vectorization needs to work for arbitrary shapes. As written, it only works for 1D arrays. Also, this needs at least a test against reference values of the CDF or PDF to show that this is indeed the Irwin-Hall distribution. It's OK to let the existing test suite confirm that the distribution is self-consistent (e.g. that the PDF is the derivative of the CDF, etc.). The references also need to be fixed; they are causing issues with the doc build. Check against other distributions. Check your spaces between What is
Lol.rst isn't supposed to be there; it's just a scratch file to test what was throwing the doc parser off. For tests: I suppose adding a test to confirm that the sum of n uniforms has the same PDF and CDF would make sense; I'll add it in. Equivalently, the MGF being the MGF of the uniform raised to the n-th power would also work, but I don't see an explicit mgf for that.
Well, that reminds me that this should override I would prefer comparison of the output of pdf (at least) against that of UniformSumDistribution.
4d913f0 to 2247132
Changed Added both a K-S GoF (basically a weak histo test...) against a sum of 10 uniforms and the exact values as computed by Wolfram Alpha. Test suite passes on my machine, at least, and I fixed up the Sphinx stuff.
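A minimal sketch of the kind of K-S goodness-of-fit check described above, using only the standard library. It is not the test from the PR: `irwin_hall_cdf` and `ks_statistic` are hypothetical helpers written for illustration, the CDF uses the textbook closed form rather than the PR's implementation, and the seed and sample size are arbitrary.

```python
import math
import random


def irwin_hall_cdf(x, n):
    """CDF of the sum of n independent U(0, 1) variables (closed form)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = 0.0
    for k in range(math.floor(x) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** n
    return total / math.factorial(n)


def ks_statistic(samples, cdf):
    """Two-sided Kolmogorov-Smirnov statistic against a continuous CDF."""
    xs = sorted(samples)
    size = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / size - f, f - i / size)
    return d


rng = random.Random(1234)  # arbitrary seed, only for reproducibility
n, size = 10, 2000
samples = [sum(rng.random() for _ in range(n)) for _ in range(size)]
d = ks_statistic(samples, lambda x: irwin_hall_cdf(x, n))
critical = 1.36 / math.sqrt(size)  # approximate 5% critical value for large samples
print(f"D = {d:.4f}, approximate 5% critical value = {critical:.4f}")
```

As noted in the comment above, this is a weak check on its own: it only shows that sampled sums are statistically compatible with the CDF, so it complements rather than replaces a comparison against exact reference values.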
Sigh, apparently not. Let me fix the doc gen.
@mdhaber Though I'd add config to only ever run a single CI run at a time for PRs and/or give the PR author the ability to cancel the run if done accidentally. Personally I'd also set it to just manual runs if it's that much of an issue/run only the shallowest suite automatically. Anything left to do before merging?
Most CI is not running automatically on this PR, I think because this is your first. It's just a good habit to get into for when it does run by default. Before merging I need to review the changes. But if those all look good and nothing major comes up, I will consider merging.
Oops, yes - thanks for your patience! All new features need an announcement at https://discuss.scientific-python.org/c/contributor/scipy/32. We used to do this on a mailing list; here is an example. One thing I imagine that people might want to comment on is the name - Besides that, I un-resolved all the comments above and re-resolved the ones that are resolved. We just need to address the ones that are still unresolved:
Thanks @nimish!
Overriding
Doing the obvious extension to non-integral 'n' by adding an extra weighted uniform breaks a lot of the nice formulas. I could see adding a UniformSum that did accept non-integer N and keeping the IH as the integral-only version. IME neither seems to be used that often over just treating the sum as a sum.
Re: implementing for non-integral
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Great. Please remember to send that announcement: #20481 (comment). I'll fix the lint errors, etc.
Re the moments/cumulants code: I'll remove it.
@nimish thanks for working on this distribution. I'm not sure what the state of the distributions infrastructure is; @mdhaber or @steppi, please chime in if I'm derailing anything here. @nimish for the general moments of this distribution you should be able to use the formula I noted here. That formula should work for all non-central non-negative integer moments as far as I can tell. The binomial coefficients are available from I'll drop a note with a backlink here in the discuss forum.
Not derailing, since the new infrastructure is a different issue. If the formula is that simple in terms of the Stirling numbers, it would be OK to include them in this PR, accompanied by the citation information. However, I'd do some informal tests of the useful domain of the formula. When do the terms begin to overflow? And in what cases do we get low precision (when is the result close to 1)? Does the generic implementation do better in these cases?
Thanks. I have the code and derivation of a general cumulant-to-moment formula but left it out for now since it's not clear if it's better than just calculating the expectation directly. @mdhaber @rlucas7 https://gist.github.com/nimish/c986c6f71863eca8b119cd12094e9e72 is a correct derivation of the moments from cumulants, and it's what I removed in It looks like the Stirling number formula for the moments lines up with the one from the cumulants, so I'll stick the simpler one from @rlucas7 in, thanks!
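The thread does not reproduce the formula that @rlucas7 linked, so the sketch below rests on an assumption about which identity is meant: that the k-th raw moment of the sum of n uniforms is S2(n + k, n) / C(n + k, n), where S2 is the Stirling number of the second kind. The helper names are hypothetical, `stirling2` here is a local recurrence and not the SciPy function, and the second function exists only to cross-check the first in exact rational arithmetic.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(m, j):
    """Stirling number of the second kind via the standard recurrence."""
    if m == j:
        return 1
    if j == 0 or j > m:
        return 0
    return j * stirling2(m - 1, j) + stirling2(m - 1, j - 1)


def moment_stirling(n, k):
    """k-th raw moment of the sum of n uniforms as S2(n + k, n) / C(n + k, n)."""
    return Fraction(stirling2(n + k, n), comb(n + k, k))


def moment_direct(n, k):
    """Same moment, built by adding one U(0, 1) at a time.

    Uses E[(S + U)^j] = sum_i C(j, i) E[S^i] E[U^(j - i)] with E[U^m] = 1 / (m + 1).
    """
    moments = [Fraction(int(j == 0)) for j in range(k + 1)]  # moments of the constant 0
    for _ in range(n):
        moments = [
            sum(comb(j, i) * moments[i] * Fraction(1, j - i + 1) for i in range(j + 1))
            for j in range(k + 1)
        ]
    return moments[k]


for n, k in [(1, 3), (2, 2), (3, 2), (5, 4)]:
    print(n, k, moment_stirling(n, k), moment_direct(n, k))
```

Working in `Fraction` sidesteps the overflow and precision questions raised earlier; a floating-point version would still need the informal domain checks that were asked for.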
Sounds good @nimish I agree with @mdhaber 's comment here:
I'd guess that at some higher I'm guessing the values for this scenario are larger than where I'd think to use but in any case we usually try to document where the numerics are less accurate. (You can see examples of this in the PR for stirling2 where I wrote the approximate branch of the code.) Note: for the moments here I'd recommend to use the
Makes sense. Worst-case scenario, I fall back to logs or other tricks.
Most recent commit addresses my comments and should fix the lint issues. I'll also run the xslow tests locally to make sure all the right ones are xslowed or skipped.
This is not a full review of the pull request, but I can see some changes that should be made.
@WarrenWeckesser I'd suggest participating in gh-17807 (which I opened to discuss these points when they were coming up frequently) if you want to impose stricter requirements on all distributions. It's not that I don't understand these things or remember to consider them; I'm pretty familiar with The suggestion about breaking the tests up is good, and I break up my tests by their intent, but there were enough higher-priority things to adjust already that I wasn't going to insist on it. These are all good suggestions and I'd encourage @nimish to consider them, but I personally won't require them here or of all distributions added with the old infrastructure. (This is the last of those that I will review, in any case. I would have just let this wait for you or others to review like gh-19145, but I happened to be interested in this distribution when the original issue was posted.)
@WarrenWeckesser good point on the numeric cancellation; that's a genuine bug I'll look at fixing. But, as @mdhaber said, catastrophic cancellation at the tails of a symmetric distribution is an infra-wide problem that likely needs regression testing and should be enforced at a higher level.
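To make the cancellation issue concrete, here is a small standard-library sketch. It is not the PR's code: `cdf_float`, `sf_naive` and `sf_reflected` are hypothetical helpers built on the textbook closed-form CDF, and the reflection relies only on the symmetry of the distribution about n / 2, so that P(S > x) = P(S < n - x).

```python
from fractions import Fraction
from math import comb, factorial, floor


def cdf_float(x, n):
    """Closed-form CDF of the sum of n uniforms, evaluated in double precision."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def sf_naive(x, n):
    """Survival function as 1 - CDF; cancels badly in the upper tail."""
    return 1.0 - cdf_float(x, n)


def sf_reflected(x, n):
    """Survival function via the symmetry about n / 2."""
    return cdf_float(n - x, n)


n, x = 10, 9.9
# Exact tail probability at x = 99/10: P(S > 9.9) = P(S < 0.1) = (1/10)**10 / 10!
reference = float(Fraction(1, 10) ** n / factorial(n))
naive_err = abs(sf_naive(x, n) - reference) / reference
reflected_err = abs(sf_reflected(x, n) - reference) / reference
print(f"reference       = {reference:.6e}")
print(f"naive 1 - cdf   = {sf_naive(x, n):.6e}  (relative error {naive_err:.2e})")
print(f"reflected value = {sf_reflected(x, n):.6e}  (relative error {reflected_err:.2e})")
```

The reflected form is not free of error either, since computing `n - x` in floating point already perturbs the argument slightly, but it keeps the result at the right order of magnitude where `1 - cdf` cannot.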
We should be within 10 ulps of the exact rational division, but this boils down to the precision of bspline evaluation more than anything
Long story short, the best we can do here is to be equal within 10-ish ulps since we're basically just testing the precision of the BSpline evaluation algorithm. Unfortunately, since the exact value of I think we're reasonably close and even then it's tough to justify any closer without heuristics that might be better placed in global infrastructure. The exact value only needs rational computation so perhaps Note that even specifically implemented and optimized spline eval algorithms have roughly the same error: https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/sf_poly/cardinal_b_splines.html and it grows with the IH parameter. Hoeffding's inequality (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality) guarantees quite rapid convergence to zero anyway.
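Since the exact value only needs rational arithmetic, reference values can be sketched with `fractions.Fraction` from the standard library. This is an illustration under assumptions, not the PR's test code: `cdf_exact`, `pdf_exact` and `cdf_float` are hypothetical helpers using the textbook closed forms, and the float path is a direct alternating sum rather than the BSpline evaluation discussed above, so its error figures are not those of the PR.

```python
import sys
from fractions import Fraction
from math import comb, factorial, floor


def cdf_exact(x, n):
    """CDF of the sum of n uniforms in exact rational arithmetic."""
    x = Fraction(x)  # a float argument is taken at its exact binary value
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def pdf_exact(x, n):
    """PDF of the sum of n uniforms in exact rational arithmetic."""
    x = Fraction(x)
    if x <= 0 or x >= n:
        return Fraction(0)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** (n - 1) for k in range(floor(x) + 1))
    return total / factorial(n - 1)


def cdf_float(x, n):
    """Same closed form as cdf_exact, evaluated in double precision."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


n = 10
for x in (0.5, 2.25, 5.0, 7.75):  # all exactly representable in binary
    ref = cdf_exact(x, n)
    approx = cdf_float(x, n)
    rel_err = abs(Fraction(approx) - ref) / ref
    # Report the relative error in units of machine epsilon.
    print(x, approx, float(rel_err) / sys.float_info.epsilon)
```

Exact values like these can serve as the baseline when deciding how many ulps of disagreement a floating-point implementation should be allowed.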
Reference issue
Closes #14806
What does this implement/fix?
Adds the Irwin-Hall distribution, i.e. the distribution of $$\sum_{k=1}^{n} U_k$$ where the $U_k \sim U(0, 1)$ are independent.
Additional information
The Bates distribution is just the IH distribution scaled by $1/n$.
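For reference, a sketch of the standard closed forms (textbook results, not taken from the PR diff). Writing $S_n = \sum_{k=1}^{n} U_k$, for $0 \le x \le n$:

$$F_{S_n}(x) = \frac{1}{n!} \sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^n, \qquad f_{S_n}(x) = \frac{1}{(n-1)!} \sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^{n-1}$$

The Bates variable is $\bar{U}_n = S_n / n$, so for $0 \le x \le 1$:

$$F_{\bar{U}_n}(x) = F_{S_n}(n x), \qquad f_{\bar{U}_n}(x) = n \, f_{S_n}(n x)$$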