# Methods for pdf computation, parameter estimation, and variable sampling in stats.levy_stable. Support for different parameterizations. #8766

Open
wants to merge 69 commits into
from

## Conversation

Projects
None yet
7 participants

### rishabhkabra commented Apr 23, 2018 • edited by rgommers

 In response to #8263. Cross-ref #7374. Document and support four parameterizations for the distribution, referred to as types A, B, C, and P. Add support/fix several methods for pdf computation: Zolotarev's: #7374 had a buggy implementation. Quadrature integration. Analytical expansion. FFT + characteristic function for multiple inputs. Support the following methods for parameter estimation: McCulloch's quantile method Koutrouvelis's empiricial characteristic function method. Fix issues with random variable sampling. Add tests for the pdf computation particularly in problematic regions of the parameter space. Our plan is to benchmark and analyze these methods against the reference implementation/data at http://lpi.tel.uva.es/stable. We are writing an Arxiv paper to support our implementation. Credits: this is a continuation of gh-7374 by @bsdz; most of the rest of this work was done at the Man AHL hackathon (http://ahl.com/hackathon).

### bsdz added some commits May 3, 2017

 Add Levy Stable Parameter Estimation using McCulloch 1986 Quantiles 
method.
Add PDF and CDF calculation using FFT estimate on Continous Fourier
Integral of Characteristic Function.
 a912b73 
 Use numpy dstack instead of stack to remain backward compatible. 
Reduce tolerance on fit test (for python 2.7).
 aa20a38 
 Correct styling for pep8 and pyflakes. 
 8908593 
 Separate tests. Declare quad test as slow. 
 f1bb0ea 
 Added Stable moments. 
Improve documentation.
Choose slightly more illustrative example parameters.
 2d3f9ba 
 Recommended changes for references and array broadcasting. 
 01c5425 
 Take real part of density before interpolating (as suggested by pv). 
 317a5c9 
 Merge branch 'master' into master 
 c0a4a1b 
 do not use TestCase. remove slow decorator 
 f918b13 
 Merge branch 'master' of github.com:bsdz/scipy 
 f982322 
 fix merge break 
 a4770e0 
 move warning suppression to test file 
 d20c46f 

### an81 commented on d20c46fNov 22, 2017

 Is there some development on implementation of stable dists?

### bsdz and others added some commits Dec 3, 2017

 Add Zolotarev method for PDF calculation using S_0 parameterization. 
 257a447 
 fix flake issues + missing cr 
 7906dd5 
 fix reference formatting for doctest 
 8cea10c 
 Added Zolotarev CDF method. 
Fix return data masking bug.
Disable FFT by default for CDF.
Increase threshold to 100 for PDF FFT.
Take real part in quad to remove complexwarning.
 da5e0dc 
 remove double spacing from test script 
 1770b04 
 Merge remote-tracking branch 'upstream/master' 
Conflicts:
scipy/stats/_continuous_distns.py
 119f7e9 
 No need to override fit() with fitstart(). 
 803da12 
 Add note that FFT not recommended MLE fitting. 
 8711278 
 force CI rebuild by adding explicit pdf config 
 b12b6b8 
 Update docstring to use sphinx math notation. 
 0c9593b 
 Merge branch 'master' of github.com:scipy/scipy 
 22d1724 
 Parameterize levy_stable_gen characteristic function with optional lo… 
…cation and scale.
 806b40e 
 Fix use of scale parameter, c, in characteristic function. 
 fc5acf8 
 Add pdf smoke test for levy_stable distribution. Currently fails as a… 
…lpha->0.
 8d2be73 
 Allows for different parameterisations. 
 b9b7c2e 
 Merge branch 'master' of https://github.com/an81/scipy 
 b295752 
 Fix major bug in Zolotarev's Integral Pdf in levy_stable_gen 
 29edb82 
 Major fixes in _rvs for alpha=1 + changes in doc string in levy_stabl… 
…e_gen.
 f782385 

### chrisb83 reviewed May 2, 2018

 return (asymmetry <= min(1, 2./alpha - 1)) & (asymmetry >= -min(1, 2./alpha - 1)) elif param == "P": if alpha < 1: return (asymetry >= 0) & (asymmetry <= 1)

#### chrisb83 May 2, 2018

Member

asymetry --> asymmetry

#### rishabhkabra May 2, 2018

Fixed. Good catch!

We need to unit test these parameterization switch methods. More test coverage should help expose any other issues like this.

@chrisb83 do you think we could expose _param_switch as a public method? That would also help users convert between parameterizations.

### rishabhkabra added some commits May 2, 2018

 Change levy_stable._stats to return np.nan instead of None. 
 0236942 
 Fix minor typo in levy_stable_gen parameterization switch. 
 1023c80 
 Fix levy_stable_gen._argcheck to work with np.array inputs. 
 84ab11d 
 Test levy_stable_gen._argcheck for validity of asymmetry parameter. 
 40d6e6d 
 Remove trailing whitespaces. 
 d2e5d0a 
 Bring levy_stable tests up to date with current API. 
 d18dd41 
 Minor fixes in levy_stable: address np.lstsq warning and correct _fi… 
…tstart method name.
 af3d520 

Member

### rgommers commented Jun 10, 2018

 The test failures here are real.
Member

### rgommers commented Jun 10, 2018

 Thanks for updating my original PR and improving it. It looks good and I would say it might be the most comprehensive implementation available now. Since the tests in this PR seem to be failing, and the ones in gh-7374 were passing, I'm not sure where we are with these two PRs. @bsdz do you think the changes on top of your PR, and if so should yours remain open? @rishabhkabra are you planning to finish this (can't tell how much work is left)? Although I would disagree that the bulk of the work was done at the hackathon as my original PR was done independently with no assistance from that organisation. I have edited the description at the top of the PR to more accurately reflect the history of the code here.
Contributor

### bsdz commented Jun 10, 2018 • edited

 @rgommers Yes I noticed that this PR doesn't pass test cases. From what I see this PR should improve stability around small values of alpha although confusingly I can't see any test cases added to that effect. The test samples I generated for pdfs and cdfs are originally from Nolan's stablec.exe program and initially I chose ranges alpha: 1.25 ->1.75 step .25, beta: -1 -. 1 step .5 and x: -10 -> 10 step 1. Nolan makes clear the limitation of his public domain executable, in particular when |alpha-1| < 0.005; alpha < 0.2 or alpha=1 and beta is very small (although one could just use Cauchy here) and so I was unable to generate reasonable test samples in those ranges. If we can get the improved alpha range support properly tested then yes the PR would be on top my PR along with the introduction of empirical parameter estimations. I'm not sold on including the parameterisation switches preferring to perhaps offer that as a docstring or offer a link elsewhere. I don't feel it adds much to the internal calculations. Perhaps we could merge my original PR for 1.2 with added warnings that for some very small values of alpha there might numerical stability issues and merge this PR at a later stage once it has its test cases all covered.

Contributor

### bsdz commented Jun 11, 2018

 @an81, I'm struggling to see things from your perspective. In my original PR you were critical of two authors (Rachev and Mittnik) and here you are critical of another (Nolan). In my original PR I implemented 3 distinct methods for calculating PDFs (1) FFT, (2) CF integration and (3) Zolaterev's implentation (the last being your suggestion). All 3 methods tie out with Nolan's numbers to at least 4 decimal places; the last two methods tie out to even more decimal places. It feels unlikely that so many have it wrong along with other implentations such R, matlab and more. Perhaps you could just provide some concrete sample values of alpha, beta, x and result densities so we can see the absolute differences to those numbers I extracted from Nolan's stablec executable?

### an81 commented Jun 11, 2018 • edited

 On Mon, 11 Jun 2018 at 13:32, Blair Azzopardi ***@***.***> wrote: @an81 , I'm struggling to see things from your perspective. In my original PR you were critical of two authors (Rachev and Mittnik) and here you are critical of another (Nolan). In my original PR I implemented 3 distinct methods for calculating PDFs (1) FFT, (2) CF integration and (3) Zolaterev's implentation (the last being your suggestion). All 3 methods tie out with Nolan's numbers to at least 4 decimal places; the last two methods tie out to even more decimal places. It feels unlikely that so many have it wrong along with other implentations such R, matlab and more. Sure, so I think it is helpful to read the code and implementation in R and Matlab. We were addressing exactly these issues at hackaton: The R and Matlab implementation is pretty much wrappers around the C library, which has taken Nolan's values as a benchmark one. We have been in touch with the authors, yes, it will be corrected. Just read the code and re-derive the results in the paper. Perhaps you could just provide some concrete sample values of alpha, beta, x and result densities so we can see the absolute differences to those numbers I extracted from Nolan's stablec executable? Well, I am just working on the paper which contains all this and which has the benchmark values. So I ll provide it on ArXiv and we add link to the documentation. — … You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#8766 (comment)>, or mute the thread .
Member

### rgommers commented Jun 11, 2018

 Sure, so I think it is helpful to read the code and implementation in R and Matlab. Eh, no. R is GPL and Matlab proprietary - if you have read their source code and then written your own code, we cannot accept that code as it then has license issues. Did you already read that code?

Member

Member

### rgommers commented Jun 11, 2018

 I think the code on stable laws is pretty long. I was wondering if we can move to separate module and then just call in continous_dists.py Yes, that seems like a good idea.
Member

### rgommers commented Jun 11, 2018

 Perhaps we could merge my original PR for 1.2 with added warnings that for some very small values of alpha there might numerical stability issues and merge this PR at a later stage once it has its test cases all covered. That makes sense to me.
Member

### rgommers commented Jun 11, 2018

 The first R implementation that comes up is http://ugrad.stat.ubc.ca/R/library/rmutil/html/Levy.html, that package is GPL. Update: I've now looked at http://lpi.tel.uva.es/stable and that code (C/C++/Matlab/R) is all GPL as well. It can be used to generate test data, and the provided Precomputed replication data can be used. Without written permission from the authors of that package that incorporating their code into SciPy under a BSD license, you cannot look at that source code.

Merged

### an81 commented Jun 13, 2018

 Regarding the values for tests, there values seems to be correct: https://nvlpubs.nist.gov/nistpubs/jres/77B/jresv77Bn3-4p143_A1b.pdf They dont cover the case for alphas around 1, but there are some tabeleted values in different papers for at least 1.1 for symmetric case.

### rishabhkabra commented Jun 13, 2018

 Hi @rgommers, thanks for your comments firstly. We certainly intend to finish this work. Thanks also for ensuring we can rebase on top of gh-7374. We are aware of what is causing the tests to fail. These should be easy to fix. More importantly, we have identified mistakes in Nolan (1997)'s method and results. This makes our changes on top of @bsdz's work rather crucial, because any implementation that uses Nolan's numbers as a reference is wrong in striking ways. We are writing an Arxiv paper to describe this because it is important for the community even beyond SciPy, and will be the basis of justifying our work here. Our argument for supporting multiple parameterizations is two-fold: firstly, different parameterizations lend themselves to different numerical methods; these have different numerical guarantees in different regions of the parameter space. We don't want users to be stuck with one canonical parameterization. Secondly, supporting multiple parameterizations allows validation of scientific results as they as reported across different communities. Apologies for some of the previous comments by @an81. We have not looked at any source code to be very clear (http://www.lpi.tel.uva.es/stable or http://ugrad.stat.ubc.ca/R/library/rmutil/html/Levy.html or Matlab). In the first case, @an81 was referring to the algorithms described in the accompanying paper. Thanks for advising us about the licenses.

### bsdz reviewed Jun 13, 2018

 """ Generates a vector of random stable variables under parameterization A using Weron's formulation [WE]. It is identical with the implementation in GSL library.

#### bsdz Jun 13, 2018

Contributor

This docstring suggests the following code might have been copied from GSL. It is described as identical to GSL's implementation.

#### rishabhkabra Jun 13, 2018 • edited

Hi @bsdz, the GSL library's random number generation algorithm is described as identical to Weren's forumation in the libstable paper (Royuela-del-Val et al., 2017). See the bottom of page 8: https://www.jstatsoft.org/article/view/v078i01/v78i01.pdf . We don't need to read any code to identify the method used by a particular library. Even a look at its documentation would suffice. Otherwise we email the authors, as in the case of libstable.

The purpose of this docstring comment was to give credit to both the primary academic source and other libraries which may be using the same algorithm (as you would in academic citations to avoid conflict). But what works in academia may not work here. :) We should delete this reference to GSL to avoid giving the impression that we actually studied its implementation. Thanks for pointing that out.

#### rishabhkabra Jun 13, 2018 • edited

To allege we are copying code or lying about what we have read is an unnecessary provocation. :)

We invite you to collaborate with us as we have before. We give you all the credit for your initiative and prior work. But please do recognize that we have a lot more experience working with the mathematics of stable law distributions, and can identify when methods are wrong. A library like SciPy cannot afford to repeat and propagate mistakes in the community.

#### ev-br Jun 17, 2018

Member

So GSL documentation claims to be based on the same paper? Then probably this note should be phrased as such.

Member

### rgommers commented Jun 14, 2018

 Thanks for the context @rishabhkabra. More importantly, we have identified mistakes in Nolan (1997)'s method and results. This makes our changes on top of @bsdz's work rather crucial, because any implementation that uses Nolan's numbers as a reference is wrong in striking ways. We are writing an Arxiv paper to describe this because it is important for the community even beyond SciPy, and will be the basis of justifying our work here. Note that SciPy's default policy is to implement published/well-known algorithms, not novel research. However, in cases when it's clear that something is an improvement (accuracy/performance/...) then of course we will accept the novel algorithms. So in this case the ArXiV paper seems like a good idea (and I assume you'll submit it to a journal as well?).
Member

### rgommers commented Jun 16, 2018

 Okay, gh-7374 is now merged. Would be great if you could rebase this PR on master. As soon as the ArXiV paper is up, it may make sense to merge a simple doc PR that points out more possible issues and references the paper.

### ev-br reviewed Jun 17, 2018

 @@ -3582,56 +3583,943 @@ class levy_stable_gen(rv_continuous): Notes ----- Levy-stable distribution (only random variates available -- ignore other docs) ..math:: \alpha-stable distributions form a rich family of probability distributions

#### ev-br Jun 17, 2018

Member

For online math to render, it needs backticks: :math:backtick \alpha backtick.

Note that .. math:: typesets display math.

 which arises in the central limit theorem problems. Stable distributions and their variants are also known under different names to different scientific communities, e.g. physicicts like to call Symmetric Stable Laws as Levy Flights, whereas mathematicians calls the distributions ..math:: \alpha-stable distributions. We will refer to this class

#### ev-br Jun 17, 2018

Member

Ditto.

 calls the distributions ..math:: \alpha-stable distributions. We will refer to this class of distributions as Levy Stable laws / distributions. The densities of Levy Stable laws are well represented by Fox's H-functions [ZOL1].

#### ev-br Jun 17, 2018

Member

References need a trailing underscore: [ZOL1]_

 For ..math::\alpha = 2 the density reduces to Gaussian distributions. For symmetric case of ..math::\alpha = 1 the Levy Stable law is known as a Cauchy distribution. For ..math::\alpha = 0.5 and skewness distribution taking its extreme value, the density is of the form of Levy Distribution (see levy, levy_l).

#### ev-br Jun 17, 2018

Member

Single backticks around levy, levy_l will make them hyperlinks.

 c_P = c_C = c_B = \frac{c_A}{\cos[\beta_B K(\alpha)\pi /2} The parameterization P has rather intuitive relation to CDF of stable law. The parameter p_1 = 1 - CDF(0, \alpha, p_1, c), i.e. the parameter p_1 gives value

#### ev-br Jun 17, 2018

Member

Underscores have special meaning in reST (references), thus they need to be shielded: either wrap the expression in double backticks (typesets as code), or a :math environment.

 def _rvs(self, alpha, beta): @classmethod

#### ev-br Jun 17, 2018

Member

_argcheck should be an instance method, since distribution objects are instances

 return (asymmetry <= 1) & (asymmetry >= -1) @classmethod

#### ev-br Jun 17, 2018

Member

Should this also be an instance method?

 """ Generates a vector of random stable variables under parameterization A using Weron's formulation [WE]. It is identical with the implementation in GSL library.

#### ev-br Jun 17, 2018

Member

So GSL documentation claims to be based on the same paper? Then probably this note should be phrased as such.