Stats bugs tickets 835, 1530, 1545 #101

wants to merge 26 commits into

7 participants

SciPy member

835 is more of an enhancement, nan propagation, Thanks to Per Brodtkorb
1530 is a regression with starting values for fit, only special case of ticket is fixed
1536 is a clear bug in a formula, corrected formula by Warren

this is based on maintenance-0.10x, but I ran the testsuite against 0.10.0b2 (since I cannot compile scipy)

because it's behind master, only the last 4 commits are relevant

rgommers and others added some commits Sep 11, 2011
@rgommers rgommers REL: set version to 0.10.0b1 7483939
@rgommers rgommers REL: some minor fixes to the release scripts. 35fca4b
@rgommers rgommers DOC: arpack tutorial: fix some minor bugs. Thanks to Jake Vanderplas. 60d9f2b
@cgohlke cgohlke BUG: fix several small bugs in rbf, io.harwell_boeing and misc. 6b7b38b
@scottza scottza BUG: fix build from source distribution with existing 43edc4f
@rgommers rgommers TST: fix special.logit test failures that happen with numpy 1.5.1
Also fix test noise.
@rgommers rgommers BUG: fix 2to3 conversion, some relative imports were missing. e8123b1
@cgohlke cgohlke BUG: fixes for issue on Py3 in _logit and a syntax error in fblas. 1c5efe4
@pv pv BUG: py3tool: specify encoding in custom_mangling a861881
Warren Weckesser TST: signal: PEP8 tweaks. f77e0d6
Warren Weckesser TST: signal: A few more tests for the new discrete LTI functions. bce29ec
@rgommers rgommers REL: set version to 0.10.0b2 8edbea5
@rgommers rgommers REL: set version to 0.10.0rc1-dev dd538f4
@rgommers rgommers DOC: add to release notes. Add Pauli's script to generate list of aut…
@yarikoptic yarikoptic DOC: docstrings -- fixed number in enumeration (pdist) + few typos 34017cb
@rgommers rgommers MAINT: mark UTF-8 and Cython-generated files as binary in gitattributes. be879ff
@rgommers rgommers BUG: make sure linalg/src/fblaswrap.f is included in release tarball. 5ab12b0
@rgommers rgommers MAINT: update references svn --> git, or remove them. Closes #1516. f07eb5f
@rgommers rgommers TST: clean up test noise from floating point operations. bb86f4e
@rgommers rgommers TST: Mark medfilt bug as knownfail on 64-bit Linux.
Cherry-picked from 0.9.x branch (c3b3457).
This wasn't fixed, so marking as knownfail for 0.10 again.
@pv pv DOC: signal: fix documentation for the mode='same' versions of convol…
@pv pv TST: sparse/arpack: use less strict test tolerances for single precision
Moreover, increase tolerances further for iterative methods in single

(cherry picked from commit bd49968)
@josef-pkt josef-pkt ENH: stats ticket:835 propagate nans, changes thanks to Per Brodtkorb 9e2fa09
@josef-pkt josef-pkt BUG: ticket:835 fix incorrect changes in previous commit e3a8ca1
@josef-pkt josef-pkt BUG: ticket:1530 cauchy fit, cheap fix for starting values 69f46c0
@josef-pkt josef-pkt BUG: ticket:1545 incorrect variance in tukeylambda.stats 2a34dcd
SciPy member
pv commented Oct 31, 2011

(Just discovered this also by myself:) Note that you can tell Github the branch is against maintenance/0.10.x by clicking the "Change commits" button on the top right, when submitting the pull request.


The new formula blows up near lambda=0. It has removable discontinuity at lambda=0, where the value should by (pi**2)/3. We'll need to check for lambda being close to 0 and switch to a different formula.

SciPy member

The simplest would be to set it to (pi**2)/3 in the range (-5e-6, 5e-6) or something like this.

>>> stats.tukeylambda.stats(5e-6)[1] - np.pi**2 / 3
>>> stats.tukeylambda.stats(-5e-6)[1] - np.pi**2 / 3

The difference looks monotonic up to this range and starts to bounce and go wrong closer to zero using this

>>> np.array([[k*10**-i, stats.tukeylambda.stats(k*10**-i)[1] - np.pi**2 / 3] 
for i in range(1,8) for k in [0.25, 0.5,0.75,1][::-1]])

>>> np.array([[-k*10**-i, stats.tukeylambda.stats(-k*10**-i)[1] - np.pi**2 / 3]
for i in range(1,8) for k in [0.25, 0.5,0.75,1][::-1]])

Here's an updated version that handles the neighborhood of lambda=0 by using the Maclaurin series (thanks, Maxima!).

import numpy as np
from numpy import pi
from scipy.special import gamma, zeta

def _tlvar0(lam):
    """Third order Maclaurin series for the variance of the Tukey Lambda distributions."""
    z3 = zeta(3, 1)
    z5 = zeta(5, 1)
    pi2 = pi**2
    pi4 = pi2**2
    v = (pi2/3 - (2*pi2 + 12*z3)*lam/3 + (3*pi4 + 80*pi2 + 480*z3)*lam**2/60
            - (3*pi4 + (-20*z3 + 80)*pi2 + 480*z3 + 360*z5)*lam**3/30)
    return v

def tlvar(lam):
    """Variance of the Tukey Lambda distribution.

    Always returns a numpy ndarray.
    # FIXME: The variance is not defined for lam < -0.5.  Should probably return nan for such values. 
    lam = np.atleast_1d(lam).astype(np.float64)
    # For values of lam less than threshold, use the third order Maclaurin
    # series to evaluate the distribution.
    threshold = 1e-3
    # Play games with masks to implement the conditional evaluation of
    # the distribution.
    small_mask = np.abs(lam) < threshold
    big_mask = ~small_mask
    small = lam[small_mask]
    big = lam[big_mask]
    v = np.empty_like(lam)
    if small.size > 0:
        v[small_mask] = _tlvar0(small)
    if big.size > 0:
        v[big_mask] = ( (2.0/(big**2))
                        * (1.0 / (1 + 2*big)
                           - gamma(big + 1)**2 / gamma(2*big + 2)) )
    return v
SciPy member

Thanks Warren, I will change the pull request with this.

A np.where that only calculates the parts that are actually used would be nice.

I found and fixed one more variance bug in my fork (powerlaw), only 8 failures left, which might be mostly false alarms, but I'm bored chasing bugs in individual distributions.


Josef, I've tweaked my code a bit more since my last post, so it handles lambda <= -0.5 nicely. If you like, you can omit this commit (2a34dcd), and I'll put together a real pull request of my own for tukeylambda.

SciPy member

Thanks, you're welcome to it. You could use the test in my commit, the generic cases will be handled by the generic tests (once my corrected test is merged), but we will need keep the tests around zero and <= -0.5.

I guess whoever merges this, can just skip the last commit. Or is there anything that I should do?

I don't have commit rights in master.

SciPy member

The fix for #1530 is OK to backport, the other changes I'd rather leave till 0.11.

SciPy member

I committed the fix for #1530. So all that's left is the two commits for #835. Are those OK as is?

SciPy member

#835 are ok for 0.11. There is some change in behavior, but I don't think anyone could rely on the previous behavior for nan arguments. The zero return for nans mostly didn't make any sense.

SciPy member

OK, pushed commit for #835. That should be all except for #1545, for which Warren will submit a new pull request. So closing this one.

@rgommers rgommers closed this Nov 5, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment