implement cdf/sf for logser distribution #3890

ev-br · 2014-08-21T20:45:50Z

The log-series distribution has a closed-form expression for its cdf in terms of an incomplete beta function; see http://en.wikipedia.org/wiki/Logarithmic_distribution.

Implementing it could address a cryptic comment,

# FIXME: Fails _cdfvec

in https://github.com/scipy/scipy/blob/master/scipy/stats/_discrete_distns.py#L360.

A well-hidden ticket, #1883, which reports a problem with stats.logser.sf(np.inf, 0.5), may be related.


chrisb83 · 2018-10-21T17:52:06Z

I just wanted to add the cdf, which uses the incomplete beta function. However, it uses the "unregularized" beta function, denoted B(x; a, b) on Wikipedia, while special.betainc is the regularized beta function (denoted I_x(a, b) on Wikipedia), which is a rescaled version. We need B(p; k+1, 0) to compute cdf(k, p), and the zero parameter leads to NaN in betainc (we cannot rescale, as Gamma(0) is not defined).

Any chance that the "unscaled" version can be added without too much trouble?
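For reference, a minimal sketch of the rescaling in question. For a > 0 and b > 0 the two functions are related by B(x; a, b) = I_x(a, b) * B(a, b), so the unregularized value can be recovered from special.betainc and special.beta. The helper name `unregularized_betainc` is made up for this illustration and is not a SciPy function.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta, betainc


def unregularized_betainc(a, b, x):
    """B(x; a, b) = I_x(a, b) * B(a, b); only meaningful for a > 0 and b > 0."""
    return betainc(a, b, x) * beta(a, b)


# Check against direct quadrature of t^(a-1) * (1-t)^(b-1) on [0, x].
a, b, x = 3.0, 2.0, 0.4
by_quadrature, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1), 0.0, x)
print(unregularized_betainc(a, b, x), by_quadrature)

# The scale factor B(a, b) grows like 1/b as b -> 0, so the rescaling
# has no finite limit to work with when a parameter is exactly zero.
print(beta(3.0, 1e-8))
```

The rescaling works for positive parameters, but the factor B(a, b) diverges as either parameter approaches zero, which is exactly the case needed here.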

Derivation of the CDF:

p_k = P(X = k) = c * p^k / k, with c = -1 / log(1 - p)
d/dp sum_{j=1}^k p^j / j = sum_{j=1}^k p^(j-1) = (1 - p^k) / (1 - p)
Integrating from 0 to p gives sum_{j=1}^k p^j / j = -log(1 - p) - Int_0^p t^k / (1 - t) dt,
and the last term can be expressed as the incomplete beta function B(p; k+1, 0).
Multiplying by c gives cdf(k, p) = 1 + B(p; k+1, 0) / log(1 - p).

Note that integration with b = 0 is not a problem here, as p < 1. But betainc with a zero parameter is NaN.
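A quick numerical sanity check of this derivation, as a sketch only: it compares the direct partial sum of the pmf with the integral form cdf(k, p) = 1 + Int_0^p t^k / (1 - t) dt / log(1 - p), using scipy.integrate.quad for the integral. Both function names are made up for this illustration.

```python
import numpy as np
from scipy.integrate import quad


def logser_cdf_direct(k, p):
    """Sum the pmf -p^j / (j * log(1 - p)) for j = 1..k."""
    j = np.arange(1, k + 1)
    return -np.sum(p**j / j) / np.log1p(-p)


def logser_cdf_integral(k, p):
    """Use cdf = 1 + B(p; k+1, 0) / log(1 - p), with B evaluated by quadrature."""
    integral, _ = quad(lambda t: t**k / (1.0 - t), 0.0, p)
    return 1.0 + integral / np.log1p(-p)


for k in range(1, 6):
    print(k, logser_cdf_direct(k, 0.93), logser_cdf_integral(k, 0.93))
```

The integrand is bounded on [0, p] for any p < 1, so plain quadrature is fine for a check like this, although it would be too slow for a production _cdf.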

WarrenWeckesser · 2020-11-30T15:57:53Z

We only need a special case of the incomplete beta function B(x; a, b), with a = k + 1 and b = 0. Plugging those values into the integral form of the beta function gives (using Int to denote the integral--I wish github had LaTeX markup!):

B(p; k + 1, 0) = Int_0^p t^k / (1 - t) dt

According to Wolfram Alpha, that integral can be expressed in terms of the hypergeometric function ₂F₁:

B(p; k + 1, 0) = Int_0^p t^k / (1 - t) dt = p^(k + 1) ₂F₁(1, k+1; k+2; p) / (k + 1)

We have ₂F₁ implemented as scipy.special.hyp2f1, so we can implement the CDF as

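In [130]: import numpy as np
     ...: from scipy.special import hyp2f1
     ...:

In [131]: def logser_cdf(k, p):
     ...:     k = np.asarray(k)
     ...:     p = np.asarray(p)
     ...:     # r = B(p; k + 1, 0), expressed through hyp2f1
     ...:     r = p**(k+1) * hyp2f1(1, k+1, k+2, p)/(k+1)
     ...:     return 1 + r/np.log(1-p)
     ...: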

Compare the calculation to the existing implementation:

In [132]: logser_cdf([1, 2, 3, 4, 5], 0.93)                                                                                
Out[132]: array([0.34972135, 0.51234177, 0.61316644, 0.68349164, 0.73581359])

In [133]: from scipy.stats import logser                                                                                   

In [134]: logser.cdf([1, 2, 3, 4, 5], 0.93)                                                                                
Out[134]: array([0.34972135, 0.51234177, 0.61316644, 0.68349164, 0.73581359])

Looks good.

~~I'll create a PR with this implementation.~~

WarrenWeckesser · 2020-11-30T16:34:56Z

It turns out I won't create a PR. hyp2f1 has problems with large arguments. So until we have a more robust implementation of ₂F₁, we probably shouldn't use it for logser._cdf.

mdhaber · 2022-03-10T22:17:50Z

@steppi I imagined (then saw) that you might be interested in hyp2f1, so I thought I'd mention this issue. Do you think hyp2f1 is in good enough shape now that it would make sense to implement cdf/sf methods based on it?

steppi · 2022-03-11T00:20:36Z

> @steppi I imagined (then saw) that you might be interested in hyp2f1, so I thought I'd mention this issue. Do you think hyp2f1 is in good enough shape now that it would make sense to implement cdf/sf methods based on it?

Not yet. So far I've only touched the implementation for complex z and even there I haven't implemented the recurrences yet that let it work well with large arguments.

That being said, the special cases of hyp2f1 appearing in the incomplete beta function are better handled with a more specialized continued fraction expansion. I have an implementation in one of my own projects that could be ported over here with little effort. However, I've learned in just the past week, from the Boost documentation for the incomplete beta functions, that there are better continued fraction expansions than the one I used.
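To make the continued fraction idea concrete, here is a minimal sketch for the b = 0 case only. It uses the classical continued fraction for the incomplete beta function (the one popularized by Numerical Recipes), evaluated with the modified Lentz method. This is an assumption chosen for illustration; it is not necessarily the expansion used in the implementation mentioned above or in Boost. The names `betainc_b0` and `logser_cdf_cf` are hypothetical.

```python
import numpy as np


def betainc_b0(a, x, tol=1e-14, max_iter=1000):
    """Unregularized B(x; a, 0) = Int_0^x t^(a-1) / (1 - t) dt, a > 0, 0 <= x < 1.

    Evaluates x^a / a * 1 / (1 + d1 / (1 + d2 / (1 + ...))) with the
    classical coefficients specialized to b = 0.
    """
    tiny = 1e-300  # guard against division by zero in the Lentz recurrences

    def guard(v):
        return tiny if abs(v) < tiny else v

    c = 1.0
    d = 1.0 / guard(1.0 - a * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even coefficient: m (b - m) x / ((a + 2m - 1)(a + 2m)) with b = 0.
        coef = -m * m * x / ((a + m2 - 1.0) * (a + m2))
        d = 1.0 / guard(1.0 + coef * d)
        c = guard(1.0 + coef / c)
        h *= d * c
        # Odd coefficient: -(a + m)(a + b + m) x / ((a + 2m)(a + 2m + 1)) with b = 0.
        coef = -(a + m) ** 2 * x / ((a + m2) * (a + m2 + 1.0))
        d = 1.0 / guard(1.0 + coef * d)
        c = guard(1.0 + coef / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            return x**a / a * h
    raise RuntimeError("continued fraction did not converge")


def logser_cdf_cf(k, p):
    """cdf(k, p) = 1 + B(p; k + 1, 0) / log(1 - p)."""
    return 1.0 + betainc_b0(k + 1.0, p) / np.log1p(-p)


print([logser_cdf_cf(k, 0.93) for k in range(1, 6)])
```

Because b = 0 rules out the usual symmetry swap to 1 - x, convergence slows as p approaches 1, so a production version would need a different strategy there.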

mdhaber · 2022-03-11T02:16:44Z

I see. Speaking of Boost, is there anything there that could be used? It's relatively easy to add boost stuff to SciPy now.

steppi · 2022-03-11T03:33:04Z

> I see. Speaking of Boost, is there anything there that could be used? It's relatively easy to add boost stuff to SciPy now.

I just checked, and yes actually. Boost has an implementation of the unregularized incomplete beta function. See here.

mdhaber · 2022-03-11T06:39:45Z

@mckib2 Are these easy to include, too?

ev-br added scipy.stats labels Aug 21, 2014

mdhaber mentioned this issue Mar 11, 2022

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

mdhaber added the enhancement A new feature or improvement label Mar 16, 2022

mdhaber assigned mckib2 Apr 30, 2022

rgommers removed the prio-low label Dec 19, 2023

ev-br mentioned this issue Feb 8, 2024

BUG: Output of logser.cdf is not larger than 0.9999999999999998 on some platforms #20048

Open

steppi mentioned this issue Mar 9, 2024

META: Streamlined Special Function Development in SciPy SDG Tracking #20223

Open

27 tasks
