
Fix variance and kurtosis functions for the Tukey Lambda distribution (ticket 1545) #106

Closed
wants to merge 2 commits into from

2 participants

Warren Weckesser Josef Perktold
Warren Weckesser
Collaborator

See ticket 1545 and the (closed) pull request #101 for more information.

Josef Perktold
Collaborator

This looks like a lot of work. I've read through the changes, though not all the details yet.

I think the generic stats functions are not vectorized; only the distribution-specific _stats methods might work (by chance) for array arguments. I had never checked this, but when I try a few examples, they all raise exceptions (unintentionally, because array input just doesn't work) in the generic wrapping code in stats.

I didn't think about this earlier, only when I saw the raise NotImplementedError. In general, vectorized functions don't raise an exception for invalid arguments; they return nan for the invalid entries and return the valid results for the rest.

However, some methods in scipy.stats.distributions that are not vectorized do raise exceptions, mostly ValueError.
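As a sketch of that nan-for-invalid convention (a hypothetical helper with a placeholder formula, not scipy code):

```python
import numpy as np

def vectorized_stat(lam):
    """Hypothetical example of the nan-for-invalid convention:
    invalid entries become nan, valid entries are still computed."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    out = np.empty_like(lam)
    invalid = lam < -0.5                 # pretend the stat is undefined here
    out[invalid] = np.nan                # no exception, just nan
    out[~invalid] = 1.0 / (1.0 + 2.0 * lam[~invalid])  # placeholder formula
    return out

print(vectorized_stat([-1.0, 0.5]))      # → [nan 0.5]
```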

Warren Weckesser
Collaborator

The docstring for the stats() method (e.g. norm.stats(), gamma.stats(), etc.) says:

Some statistics of the given RV

Parameters
----------
arg1, arg2, arg3,... : array_like
    The shape parameter(s) for the distribution (see docstring of the
    instance object for more information)

That says to me that the stats() method should handle arrays. Some do, and some don't:

In [46]: dgamma.stats([1,2])
Out[46]: (array([ 0.,  0.]), array([ 2.,  6.]))

In [47]: gamma.stats([1,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/warren/<ipython-input-47-aea54145098c> in <module>()
----> 1 gamma.stats([1,2])

/Users/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/distributions.pyc in stats(self, *args, **kwds)
   1642                         mu = self._munp(1.0,*goodargs)
   1643                     mu2 = mu2p - mu*mu
-> 1644                 if np.isinf(mu):
   1645                     #if mean is inf then var is also inf
   1646                     mu2 = np.inf

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I think that's a bug in gamma.stats(), and in any other distribution that doesn't handle an array.
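For what it's worth, the failure mode is easy to reproduce in isolation; a scalar `if` test on an array raises exactly this error, while a mask-based version stays vectorized:

```python
import numpy as np

mu = np.array([1.0, 2.0])
try:
    if np.isinf(mu):      # fine for scalars, ambiguous for arrays
        pass
except ValueError as e:
    print(e)              # "The truth value of an array ... is ambiguous ..."

# A vectorized version would branch with a mask instead of a scalar `if`:
mu2 = np.where(np.isinf(mu), np.inf, 2.0 * mu)
print(mu2)                # → [2. 4.]
```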

Maybe raising NotImplementedError isn't the right response, but the fact that the function breaks for lambda > 98 is a limitation of this implementation, not of the function itself. The mathematical function is defined for lambda > 98; this implementation just doesn't compute it. That's what I had in mind when I used NotImplementedError, but simply returning nan is fine, as long as the behavior is explained in the docstring. (Better still, of course, would be to make the function work!)

Josef Perktold
Collaborator

I think you can use gammaln for large values:

>>> reg = 200
>>> (2.0 / reg**2) * (1.0 / (1 + 2*reg) - np.exp(2*gammaln(reg + 1) - gammaln(2*reg + 2)))
1.2468827930174563e-07
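To illustrate why this helps (a sketch, expanding beta(lam + 1, lam + 1) via gamma functions): the direct gamma form overflows once gamma(lam + 1) exceeds the float64 range, while the gammaln form stays finite:

```python
import numpy as np
from scipy.special import gamma, gammaln

def var_direct(lam):
    # beta(lam+1, lam+1) written with gamma: overflows for lam beyond
    # roughly 170, where gamma(lam + 1) exceeds the float64 maximum.
    return (2.0 / lam**2) * (1.0 / (1 + 2*lam)
                             - gamma(lam + 1)**2 / gamma(2*lam + 2))

def var_gammaln(lam):
    # The same quantity computed in log space; no overflow occurs.
    return (2.0 / lam**2) * (1.0 / (1 + 2*lam)
                             - np.exp(2*gammaln(lam + 1) - gammaln(2*lam + 2)))

print(var_gammaln(200.0))   # ≈ 1.2468827930174563e-07, matching the value above
print(var_direct(200.0))    # nan, from the inf/inf produced by the overflow
```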

It looks like the distribution eventually becomes a degenerate distribution with a point mass at zero, so we could also set the variance to zero for large enough lambda. (There are other distributions where the approach to the degenerate distribution is not so well defined.)

On the vectorized stats question: I think this deserves a separate ticket. The stats method is partially (but incorrectly or incompletely) set up to handle array inputs.
However, I don't understand why dgamma.stats works; my impression is that all array inputs should eventually hit the "if np.isinf(mu):" check in the generic code. (I only tried gamma and lognorm.) One possibility is that because mu = 0 for dgamma, mu is always a scalar by the time it reaches the isinf check.

Warren Weckesser
Collaborator

I experimented a bit with the gammaln version, and it works just as well as the version I implemented over the same range, while also allowing much larger arguments--thanks!

As the Wikipedia page points out, the formulas for the variance and kurtosis can also be expressed in terms of beta functions, so I'm going to try replacing the use of the gamma (or gammaln) function with beta.
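The beta form and the gammaln form are the same quantity, since beta(a, b) = exp(gammaln(a) + gammaln(b) - gammaln(a + b)); a quick sanity check of the variance formula at a moderate lambda:

```python
import numpy as np
from scipy.special import beta, gammaln

lam = 2.5
# Variance via scipy.special.beta, as in the final patch.
v_beta = (2.0 / lam**2) * (1.0 / (1 + 2*lam) - beta(lam + 1, lam + 1))
# The same variance via gammaln, as in the earlier suggestion.
v_gln = (2.0 / lam**2) * (1.0 / (1 + 2*lam)
                          - np.exp(2*gammaln(lam + 1) - gammaln(2*lam + 2)))
print(np.isclose(v_beta, v_gln, rtol=1e-12))  # → True
```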

I'll also implement the calculation near lambda = 0 with Pade approximations. The implementation using interpolation was an interesting experiment, but it is far less efficient than Pade.

So I'm basically going to rewrite the implementation. At least the tests won't have to change. :)
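One quick check of the Pade approach: at lambda = 0 the variance is exactly pi**2/3 (the logistic case), and the [4,4] approximant, using the coefficients that appear in the patch, reproduces it:

```python
import numpy as np
from numpy import poly1d

# Pade coefficients from the patch (_tukeylambda_stats.py).
pc = [3.289868133696453, 0.7306125098871127, -0.5370742306855439,
      0.17292046290190008, -0.02371146284628187]
qc = [1.0, 3.683605511659861, 4.184152498888124,
      1.7660926747377275, 0.2643989311168465]
p = poly1d(pc[::-1])   # poly1d expects the highest-order coefficient first
q = poly1d(qc[::-1])

v0 = p(0.0) / q(0.0)
print(abs(v0 - np.pi**2 / 3) < 1e-12)  # → True
```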

Warren Weckesser
Collaborator

I just committed a rewrite of the variance and kurtosis functions. The formulas are implemented using the beta function, and values near lambda=0 are computed using Pade approximations. (I'll rebase before actually committing anything, so even if it looks good, don't hit the big green button yet.)

Josef Perktold
Collaborator

Looks nice, and thanks for including the mpmath code (I need to remember this as a recipe).

The changes are fine with me. My impression was that the variance is infinite for lam < -0.5, but half an hour of searching the internet didn't turn up any clear statement (only that the variance for lam < -0.5 is "not finite"). Without any evidence to the contrary, returning nan in this case is fine.

Warren Weckesser
Collaborator

Thanks for reviewing the request, Josef.

Rebased and committed in 0cac0b5

Warren Weckesser WarrenWeckesser closed this
202  scipy/stats/_tukeylambda_stats.py
@@ -0,0 +1,202 @@
+
+
+import numpy as np
+from numpy import array, poly1d
+from scipy.interpolate import interp1d
+from scipy.special import beta
+
+
+# The following code was used to generate the Pade coefficients for the
+# Tukey Lambda variance function.  Version 0.17 of mpmath was used.
+#---------------------------------------------------------------------------
+# import mpmath as mp
+#
+# mp.mp.dps = 60
+#
+# one   = mp.mpf(1)
+# two   = mp.mpf(2)
+# three = mp.mpf(3)
+#
+# def mpvar(lam):
+#     if lam == 0:
+#         v = mp.pi**2 / three
+#     else:
+#         v = (two / lam**2) * (one / (one + two*lam) -
+#                               mp.beta(lam + one, lam + one))
+#     return v
+#
+# t = mp.taylor(mpvar, 0, 8)
+# p, q = mp.pade(t, 4, 4)
+# print "p =", [mp.fp.mpf(c) for c in p]
+# print "q =", [mp.fp.mpf(c) for c in q]
+#---------------------------------------------------------------------------
+
+# Pade coefficients for the Tukey Lambda variance function.
+_tukeylambda_var_pc = [3.289868133696453, 0.7306125098871127,
+                       -0.5370742306855439, 0.17292046290190008,
+                       -0.02371146284628187]
+_tukeylambda_var_qc = [1.0, 3.683605511659861, 4.184152498888124,
+                       1.7660926747377275, 0.2643989311168465]
+
+# numpy.poly1d instances for the numerator and denominator of the
+# Pade approximation to the Tukey Lambda variance.
+_tukeylambda_var_p = poly1d(_tukeylambda_var_pc[::-1])
+_tukeylambda_var_q = poly1d(_tukeylambda_var_qc[::-1])
+
+
+def tukeylambda_variance(lam):
+    """Variance of the Tukey Lambda distribution.
+
+    Parameters
+    ----------
+    lam : array_like
+        The lambda values at which to compute the variance.
+
+    Returns
+    -------
+    v : ndarray
+        The variance.  For lam < -0.5, the variance is not defined, so
+        np.nan is returned.  For lam = -0.5, np.inf is returned.
+
+    Notes
+    -----
+    In an interval around lambda=0, this function uses the [4,4] Pade
+    approximation to compute the variance.  Otherwise it uses the standard
+    formula (http://en.wikipedia.org/wiki/Tukey_lambda_distribution).  The
+    Pade approximation is used because the standard formula has a removable
+    discontinuity at lambda = 0, and does not produce accurate numerical
+    results near lambda = 0.
+    """
+    lam = np.asarray(lam)
+    shp = lam.shape
+    lam = np.atleast_1d(lam).astype(np.float64)
+
+    # For absolute values of lam less than threshold, use the Pade
+    # approximation.
+    threshold = 0.075
+
+    # Play games with masks to implement the conditional evaluation of
+    # the distribution.
+    # lambda < -0.5:  var = nan
+    low_mask = lam < -0.5
+    # lambda == -0.5: var = inf
+    neghalf_mask = lam == -0.5
+    # abs(lambda) < threshold:  use Pade approximation
+    small_mask = np.abs(lam) < threshold
+    # else the "regular" case:  use the explicit formula.
+    reg_mask = ~(low_mask | neghalf_mask | small_mask)
+
+    # Get the 'lam' values for the cases where they are needed.
+    small = lam[small_mask]
+    reg = lam[reg_mask]
+
+    # Compute the function for each case.
+    v = np.empty_like(lam)
+    v[low_mask] = np.nan
+    v[neghalf_mask] = np.inf
+    if small.size > 0:
+        # Use the Pade approximation near lambda = 0.
+        v[small_mask] = _tukeylambda_var_p(small) / _tukeylambda_var_q(small)
+    if reg.size > 0:
+        v[reg_mask] = (2.0 / reg**2) * (1.0 / (1.0 + 2 * reg) -
+                                        beta(reg + 1, reg + 1))
+    v.shape = shp
+    return v
+
+
+# The following code was used to generate the Pade coefficients for the
+# Tukey Lambda kurtosis function.  Version 0.17 of mpmath was used.
+#---------------------------------------------------------------------------
+# import mpmath as mp
+#
+# mp.mp.dps = 60
+#
+# one   = mp.mpf(1)
+# two   = mp.mpf(2)
+# three = mp.mpf(3)
+# four  = mp.mpf(4)
+#
+# def mpkurt(lam):
+#     if lam == 0:
+#         k = mp.mpf(6)/5
+#     else:
+#         numer = (one/(four*lam+one) - four*mp.beta(three*lam+one, lam+one) +
+#                  three*mp.beta(two*lam+one, two*lam+one))
+#         denom = two*(one/(two*lam+one) - mp.beta(lam+one,lam+one))**2
+#         k = numer / denom - three
+#     return k
+#
+# # There is a bug in mpmath 0.17: when we use the 'method' keyword of the
+# # taylor function and we request a degree 9 Taylor polynomial, we actually
+# # get degree 8.
+# t = mp.taylor(mpkurt, 0, 9, method='quad', radius=0.01)
+# t = [mp.chop(c, tol=1e-15) for c in t]
+# p, q = mp.pade(t, 4, 4)
+# print "p =", [mp.fp.mpf(c) for c in p]
+# print "q =", [mp.fp.mpf(c) for c in q]
+#---------------------------------------------------------------------------
+
+# Pade coefficients for the Tukey Lambda kurtosis function.
+_tukeylambda_kurt_pc = [1.2, -5.853465139719495, -22.653447381131077,
+                        0.20601184383406815, 4.59796302262789]
+_tukeylambda_kurt_qc = [1.0, 7.171149192233599, 12.96663094361842,
+                        0.43075235247853005, -2.789746758009912]
+
+# numpy.poly1d instances for the numerator and denominator of the
+# Pade approximation to the Tukey Lambda kurtosis.
+_tukeylambda_kurt_p = poly1d(_tukeylambda_kurt_pc[::-1])
+_tukeylambda_kurt_q = poly1d(_tukeylambda_kurt_qc[::-1])
+
+
+def tukeylambda_kurtosis(lam):
+    """Kurtosis of the Tukey Lambda distribution.
+
+    Parameters
+    ----------
+    lam : array_like
+        The lambda values at which to compute the kurtosis.
+
+    Returns
+    -------
+    k : ndarray
+        The kurtosis.  For lam < -0.25, the kurtosis is not defined, so
+        np.nan is returned.  For lam = -0.25, np.inf is returned.
+
+    """
+    lam = np.asarray(lam)
+    shp = lam.shape
+    lam = np.atleast_1d(lam).astype(np.float64)
+
+    # For absolute values of lam less than threshold, use the Pade
+    # approximation.
+    threshold = 0.055
+
+    # Use masks to implement the conditional evaluation of the kurtosis.
+    # lambda < -0.25:  kurtosis = nan
+    low_mask = lam < -0.25
+    # lambda == -0.25: kurtosis = inf
+    negqrtr_mask = lam == -0.25
+    # lambda near 0:  use Pade approximation
+    small_mask = np.abs(lam) < threshold
+    # else the "regular" case:  use the explicit formula.
+    reg_mask = ~(low_mask | negqrtr_mask | small_mask)
+
+    # Get the 'lam' values for the cases where they are needed.
+    small = lam[small_mask]
+    reg = lam[reg_mask]
+
+    # Compute the function for each case.
+    k = np.empty_like(lam)
+    k[low_mask] = np.nan
+    k[negqrtr_mask] = np.inf
+    if small.size > 0:
+        k[small_mask] = _tukeylambda_kurt_p(small) / _tukeylambda_kurt_q(small)
+    if reg.size > 0:
+        numer = (1.0 / (4 * reg + 1) - 4 * beta(3 * reg + 1, reg + 1) +
+                 3 * beta(2 * reg + 1, 2 * reg + 1))
+        denom = 2 * (1.0/(2 * reg + 1) - beta(reg + 1, reg + 1))**2
+        k[reg_mask] = numer / denom - 3
+
+    # The return value will be a numpy array; resetting the shape ensures that
+    # if `lam` was a scalar, the return value is a 0-d array.
+    k.shape = shp
+    return k
19  scipy/stats/distributions.py
@@ -28,6 +28,8 @@
 import numpy.random as mtrand
 from numpy import flatnonzero as nonzero
 import vonmises_cython
+from _tukeylambda_stats import tukeylambda_variance as _tlvar, \
+                                tukeylambda_kurtosis as _tlkurt
 
 __all__ = [
            'rv_continuous',
@@ -5102,33 +5104,30 @@ class tukeylambda_gen(rv_continuous):
     def _argcheck(self, lam):
         # lam in RR.
         return np.ones(np.shape(lam), dtype=bool)
+
     def _pdf(self, x, lam):
         Fx = arr(special.tklmbda(x,lam))
         Px = Fx**(lam-1.0) + (arr(1-Fx))**(lam-1.0)
         Px = 1.0/arr(Px)
         return where((lam <= 0) | (abs(x) < 1.0/arr(lam)), Px, 0.0)
+
     def _cdf(self, x, lam):
         return special.tklmbda(x, lam)
+
     def _ppf(self, q, lam):
         q = q*1.0
         vals1 = (q**lam - (1-q)**lam)/lam
         vals2 = log(q/(1-q))
         return where((lam == 0)&(q==q), vals2, vals1)
+
     def _stats(self, lam):
-        mu2 = 2*gam(lam+1.5)-lam*pow(4,-lam)*sqrt(pi)*gam(lam)*(1-2*lam)
-        mu2 /= lam*lam*(1+2*lam)*gam(1+1.5)
-        mu4 = 3*gam(lam)*gam(lam+0.5)*pow(2,-2*lam) / lam**3 / gam(2*lam+1.5)
-        mu4 += 2.0/lam**4 / (1+4*lam)
-        mu4 -= 2*sqrt(3)*gam(lam)*pow(2,-6*lam)*pow(3,3*lam) * \
-               gam(lam+1.0/3)*gam(lam+2.0/3) / (lam**3.0 * gam(2*lam+1.5) * \
-                                                gam(lam+0.5))
-        g2 = mu4 / mu2 / mu2 - 3.0
-
-        return 0, mu2, 0, g2
+        return 0, _tlvar(lam), 0, _tlkurt(lam)
+
     def _entropy(self, lam):
         def integ(p):
             return log(pow(p,lam-1)+pow(1-p,lam-1))
         return integrate.quad(integ,0,1)[0]
+
 tukeylambda = tukeylambda_gen(name='tukeylambda', shapes="lam")
 
 
22  scipy/stats/tests/test_distributions.py
@@ -791,6 +791,26 @@ def test_regression_ticket_1530():
     assert_almost_equal(params, expected, decimal=1)
 
 
+def test_tukeylambda_stats_ticket_1545():
+    """Some tests for the variance and kurtosis of the Tukey Lambda distr."""
+
+    # See test_tukeylambda_stats.py for more tests.
+
+    mv = stats.tukeylambda.stats(0, moments='mvsk')
+    # Known exact values:
+    expected = [0, np.pi**2/3, 0, 1.2]
+    assert_almost_equal(mv, expected, decimal=10)
+
+    mv = stats.tukeylambda.stats(3.13, moments='mvsk')
+    # 'expected' computed with mpmath.
+    expected = [0, 0.0269220858861465102, 0, -0.898062386219224104]
+    assert_almost_equal(mv, expected, decimal=10)
+
+    mv = stats.tukeylambda.stats(0.14, moments='mvsk')
+    # 'expected' computed with mpmath.
+    expected = [0, 2.11029702221450250, 0, -0.02708377353223019456]
+    assert_almost_equal(mv, expected, decimal=10)
+
+
 if __name__ == "__main__":
     run_module_suite()
-
86  scipy/stats/tests/test_tukeylambda_stats.py
@@ -0,0 +1,86 @@
+
+import numpy as np
+from numpy.testing import assert_allclose, assert_equal
+
+from scipy.stats._tukeylambda_stats import tukeylambda_variance, \
+                                            tukeylambda_kurtosis
+
+
+def test_tukeylambda_stats_known_exact():
+    """Compare results with some known exact formulas."""
+    # Some exact values of the Tukey Lambda variance and kurtosis:
+    # lambda   var      kurtosis
+    #   0     pi**2/3     6/5     (logistic distribution)
+    #  0.5    4 - pi    (5/3 - pi/2)/(pi/4 - 1)**2 - 3
+    #   1      1/3       -6/5     (uniform distribution on (-1,1))
+    #   2      1/12      -6/5     (uniform distribution on (-1/2, 1/2))
+
+    # lambda = 0
+    var = tukeylambda_variance(0)
+    assert_allclose(var, np.pi**2 / 3, atol=1e-12)
+    kurt = tukeylambda_kurtosis(0)
+    assert_allclose(kurt, 1.2, atol=1e-10)
+
+    # lambda = 0.5
+    var = tukeylambda_variance(0.5)
+    assert_allclose(var, 4 - np.pi, atol=1e-12)
+    kurt = tukeylambda_kurtosis(0.5)
+    desired = (5./3 - np.pi/2) / (np.pi/4 - 1)**2 - 3
+    assert_allclose(kurt, desired, atol=1e-10)
+
+    # lambda = 1
+    var = tukeylambda_variance(1)
+    assert_allclose(var, 1.0 / 3, atol=1e-12)
+    kurt = tukeylambda_kurtosis(1)
+    assert_allclose(kurt, -1.2, atol=1e-10)
+
+    # lambda = 2
+    var = tukeylambda_variance(2)
+    assert_allclose(var, 1.0 / 12, atol=1e-12)
+    kurt = tukeylambda_kurtosis(2)
+    assert_allclose(kurt, -1.2, atol=1e-10)
+
+
+def test_tukeylambda_stats_mpmath():
+    """Compare results with some values that were computed using mpmath."""
+    a10 = dict(atol=1e-10, rtol=0)
+    a12 = dict(atol=1e-12, rtol=0)
+    data = [
+        # lambda        variance              kurtosis
+        [-0.1,     4.78050217874253547,  3.78559520346454510],
+        [-0.0649,  4.16428023599895777,  2.52019675947435718],
+        [-0.05,    3.93672267890775277,  2.13129793057777277],
+        [-0.001,   3.30128380390964882,  1.21452460083542988],
+        [ 0.001,   3.27850775649572176,  1.18560634779287585],
+        [ 0.03125, 2.95927803254615800,  0.804487555161819980],
+        [ 0.05,    2.78281053405464501,  0.611604043886644327],
+        [ 0.0649,  2.65282386754100551,  0.476834119532774540],
+        [ 1.2,     0.242153920578588346, -1.23428047169049726],
+        [ 10.0,  0.00095237579757703597,  2.37810697355144933],
+        [ 20.0,  0.00012195121951131043,  7.37654321002709531],
+    ]
+
+    for lam, var_expected, kurt_expected in data:
+        var = tukeylambda_variance(lam)
+        assert_allclose(var, var_expected, **a12)
+        kurt = tukeylambda_kurtosis(lam)
+        assert_allclose(kurt, kurt_expected, **a10)
+
+    # Test with vector arguments (most of the other tests are for single
+    # values).
+    lam, var_expected, kurt_expected = zip(*data)
+    var = tukeylambda_variance(lam)
+    assert_allclose(var, var_expected, **a12)
+    kurt = tukeylambda_kurtosis(lam)
+    assert_allclose(kurt, kurt_expected, **a10)
+
+
+def test_tukeylambda_stats_invalid():
+    """Test values of lambda outside the domains of the functions."""
+    lam = [-1.0, -0.5]
+    var = tukeylambda_variance(lam)
+    assert_equal(var, np.array([np.nan, np.inf]))
+
+    lam = [-1.0, -0.25]
+    kurt = tukeylambda_kurtosis(lam)
+    assert_equal(kurt, np.array([np.nan, np.inf]))