Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement cdf/sf for logser distribution #3890

Open
Tracked by #20223
ev-br opened this issue Aug 21, 2014 · 8 comments
Open
Tracked by #20223

implement cdf/sf for logser distribution #3890

ev-br opened this issue Aug 21, 2014 · 8 comments
Assignees
Labels
enhancement A new feature or improvement scipy.stats

Comments

@ev-br
Copy link
Member

ev-br commented Aug 21, 2014

Log-series distribution has a closed form expression for cdf in terms of an incomplete beta function, http://en.wikipedia.org/wiki/Logarithmic_distribution.

Implementing could address a cryptic comment,

# FIXME: Fails _cdfvec

in https://github.com/scipy/scipy/blob/master/scipy/stats/_discrete_distns.py#L360.

A well hidden ticket, #1883 reporting a problem with stats.logser.sf(np.inf, 0.5), can be related.

@chrisb83
Copy link
Member

I just wanted to add the cdf which uses the incomplete beta function. however, it uses the "unregularized" beta function denoted B(x; a,b) on Wikipedia while special.betainc is the regularized beta function (denoted I_x(a, b) on Wikipedia which is a rescaled version. however, we need B(p; 0, k+1) to compute cdf(k, p) and the zero leads to NaN in betainc (we cannot rescale asGamma(0) is not defined).

Any chance that the "unscaled" version can be added without too much trouble?

Derivation of the CDF:

  • p_k = P(X = k) = c * p^k / k
  • d/dp sum_j=1^k p**j / j = sum_j=1^k p**(j-1) = (1-p**k)/(1-p)
  • Integrating gives sum_j=1^k p**j / j = log(1-p) - Integral(t**k (1-t)**(-1), t=0..p)
    and the last term can be expressed as the incomplete beta function B(p, 0, k+1).

Note that integration with a = 0 is not a problem here as p < 1. But betainc(p, 0, k) is NaN.

@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Nov 30, 2020

We only need a special case of the incomplete beta function B(x; a, b), with a = k +1 and b = 0. Plugging those values into the integral form of the beta function gives (using Int to denote the integal--I wish github had LaTeX markup!):

B(p; k + 1, 0) = Int_0^p t^k / (1 - t) dt

According to Wolfram Alpha, that integral can be expressed in terms of the hypergeometric function ₂F₁:

B(p; k + 1, 0) = Int_0^p t^k / (1 - t) dt = p^(k + 1) ₂F₁(1, k+1; k+2, p) / (k + 1)

We have ₂F₁ implemented as scipy.special.hyp2f1, so we can implement the CDF as

In [131]: def logser_cdf(k, p): 
     ...:     k = np.asarray(k) 
     ...:     p = np.asarray(p) 
     ...:     r = p**(k+1) * hyp2f1(1, k+1, k+2, p)/(k+1) 
     ...:     return 1 + r/np.log(1-p) 
     ...:                                                                                                                                                                                                                       

Compare the calculation to the existing implementation:

In [132]: logser_cdf([1, 2, 3, 4, 5], 0.93)                                                                                
Out[132]: array([0.34972135, 0.51234177, 0.61316644, 0.68349164, 0.73581359])

In [133]: from scipy.stats import logser                                                                                   

In [134]: logser.cdf([1, 2, 3, 4, 5], 0.93)                                                                                
Out[134]: array([0.34972135, 0.51234177, 0.61316644, 0.68349164, 0.73581359])

Looks good.

I'll create a PR with this implementation.

@WarrenWeckesser
Copy link
Member

It turns out I won't create a PR. hyp2f1 has problems with large arguments. So until we have a more robust implementation of ₂F₁, we probably shouldn't use it for logser._cdf.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 10, 2022

@steppi I imagined (then saw) that you might be interested in hyp2f1, so I thought I'd mention this issue. Do you think hyp2f1 is in good enough shape now that it would make sense to implement cdf/sf methods based on it?

@steppi
Copy link
Contributor

steppi commented Mar 11, 2022

@steppi I imagined (then saw) that you might be interested in hyp2f1, so I thought I'd mention this issue. Do you think hyp2f1 is in good enough shape now that it would make sense to implement cdf/sf methods based on it?

Not yet. So far I've only touched the implementation for complex z and even there I haven't implemented the recurrences yet that let it work well with large arguments.

That being said, the special cases of hyp2f1 appearing in the incomplete beta function are better handled with a more specialized continued fraction expansion. I have an implementation in one of my own projects that could be ported over here with little effort. Although I've learned in just the past week from the Boost documentation for Incomplete Beta functions that there are better continued fraction expansions than the one I used.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 11, 2022

I see. Speaking of Boost, is there anything there that could be used? It's relatively easy to add boost stuff to SciPy now.

@steppi
Copy link
Contributor

steppi commented Mar 11, 2022

I see. Speaking of Boost, is there anything there that could be used? It's relatively easy to add boost stuff to SciPy now.

I just checked, and yes actually. Boost has an implementation of the unregularized incomplete beta function. See here.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 11, 2022

@mckib2 Are these easy to include, too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

No branches or pull requests

7 participants