New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Methods for pdf computation, parameter estimation, and variable sampling in stats.levy_stable. Support for different parameterizations. #8766

Open
wants to merge 69 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@rishabhkabra
Copy link

rishabhkabra commented Apr 23, 2018

In response to #8263.
Cross-ref #7374.

  1. Document and support four parameterizations for the distribution, referred to as types A, B, C, and P.
  2. Add support/fix several methods for pdf computation:
  • Zolotarev's: #7374 had a buggy implementation.
  • Quadrature integration.
  • Analytical expansion.
  • FFT + characteristic function for multiple inputs.
  1. Support the following methods for parameter estimation:
  • McCulloch's quantile method
  • Koutrouvelis's empiricial characteristic function method.
  1. Fix issues with random variable sampling.
  2. Add tests for the pdf computation particularly in problematic regions of the parameter space.

Our plan is to benchmark and analyze these methods against the reference implementation/data at http://lpi.tel.uva.es/stable. We are writing an Arxiv paper to support our implementation.

Credits: this is a continuation of gh-7374 by @bsdz; most of the rest of this work was done at the Man AHL hackathon (http://ahl.com/hackathon).

bsdz added some commits May 3, 2017

Add Levy Stable Parameter Estimation using McCulloch 1986 Quantiles
method.
Add PDF and CDF calculation using FFT estimate on Continous Fourier
Integral of Characteristic Function.
Use numpy dstack instead of stack to remain backward compatible.
Reduce tolerance on fit test (for python 2.7).
Added Stable moments.
Improve documentation.
Choose slightly more illustrative example parameters.
@an81

This comment has been minimized.

Copy link

an81 commented on d20c46f Nov 22, 2017

Is there some development on implementation of stable dists?

bsdz and others added some commits Dec 3, 2017

return (asymmetry <= min(1, 2./alpha - 1)) & (asymmetry >= -min(1, 2./alpha - 1))
elif param == "P":
if alpha < 1:
return (asymetry >= 0) & (asymmetry <= 1)

This comment has been minimized.

@chrisb83

chrisb83 May 2, 2018

Member

asymetry --> asymmetry

This comment has been minimized.

@rishabhkabra

rishabhkabra May 2, 2018

Fixed. Good catch!

We need to unit test these parameterization switch methods. More test coverage should help expose any other issues like this.

@chrisb83 do you think we could expose _param_switch as a public method? That would also help users convert between parameterizations.

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 10, 2018

The test failures here are real.

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 10, 2018

Thanks for updating my original PR and improving it. It looks good and I would say it might be the most comprehensive implementation available now.

Since the tests in this PR seem to be failing, and the ones in gh-7374 were passing, I'm not sure where we are with these two PRs. @bsdz do you think the changes on top of your PR, and if so should yours remain open? @rishabhkabra are you planning to finish this (can't tell how much work is left)?

Although I would disagree that the bulk of the work was done at the hackathon as my original PR was done independently with no assistance from that organisation.

I have edited the description at the top of the PR to more accurately reflect the history of the code here.

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jun 10, 2018

@rgommers Yes I noticed that this PR doesn't pass test cases. From what I see this PR should improve stability around small values of alpha although confusingly I can't see any test cases added to that effect.

The test samples I generated for pdfs and cdfs are originally from Nolan's stablec.exe program and initially I chose ranges alpha: 1.25 ->1.75 step .25, beta: -1 -. 1 step .5 and x: -10 -> 10 step 1. Nolan makes clear the limitation of his public domain executable, in particular when |alpha-1| < 0.005; alpha < 0.2 or alpha=1 and beta is very small (although one could just use Cauchy here) and so I was unable to generate reasonable test samples in those ranges.

If we can get the improved alpha range support properly tested then yes the PR would be on top my PR along with the introduction of empirical parameter estimations.

I'm not sold on including the parameterisation switches preferring to perhaps offer that as a docstring or offer a link elsewhere. I don't feel it adds much to the internal calculations.

Perhaps we could merge my original PR for 1.2 with added warnings that for some very small values of alpha there might numerical stability issues and merge this PR at a later stage once it has its test cases all covered.

@an81

This comment has been minimized.

Copy link

an81 commented Jun 11, 2018

@bsdz

This comment has been minimized.

Copy link
Contributor

bsdz commented Jun 11, 2018

@an81, I'm struggling to see things from your perspective. In my original PR you were critical of two authors (Rachev and Mittnik) and here you are critical of another (Nolan).

In my original PR I implemented 3 distinct methods for calculating PDFs (1) FFT, (2) CF integration and (3) Zolaterev's implentation (the last being your suggestion). All 3 methods tie out with Nolan's numbers to at least 4 decimal places; the last two methods tie out to even more decimal places. It feels unlikely that so many have it wrong along with other implentations such R, matlab and more.

Perhaps you could just provide some concrete sample values of alpha, beta, x and result densities so we can see the absolute differences to those numbers I extracted from Nolan's stablec executable?

@an81

This comment has been minimized.

Copy link

an81 commented Jun 11, 2018

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 11, 2018

Sure, so I think it is helpful to read the code and implementation in R and Matlab.

Eh, no. R is GPL and Matlab proprietary - if you have read their source code and then written your own code, we cannot accept that code as it then has license issues. Did you already read that code?

@an81

This comment has been minimized.

Copy link

an81 commented Jun 11, 2018

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 11, 2018

Eh, no. R is GPL and Matlab proprietary - if you have read their source
code and then written your own code, we cannot accept that code as it then
has license issues. Did you already read that code?

Hey, why dont you have a look at this first before going into quick
conclusions: https://uvadoc.uva.es/bitstream/10324/15155/1/RoyuelaJSSoft.pdf

I'm not sure why I should read that article. All I'm saying is that "reading R and Matlab source code is helpful" is very poor advice; anyone who does it can no longer write code for the same functionality and have it included in SciPy.

The first R implementation that comes up is http://ugrad.stat.ubc.ca/R/library/rmutil/html/Levy.html, that package is GPL.

I have been working with these distributions for over 10 years, I dont think I need to spend time on reading matlab wrappers if thats what you were asking.

Good. At least I read that as "no I haven't read the R or Matlab source code".

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 11, 2018

I think the code on stable laws is pretty long. I was wondering if we can move to separate module and then just call in continous_dists.py

Yes, that seems like a good idea.

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 11, 2018

Perhaps we could merge my original PR for 1.2 with added warnings that for some very small values of alpha there might numerical stability issues and merge this PR at a later stage once it has its test cases all covered.

That makes sense to me.

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 11, 2018

The first R implementation that comes up is http://ugrad.stat.ubc.ca/R/library/rmutil/html/Levy.html, that package is GPL.

Update: I've now looked at http://lpi.tel.uva.es/stable and that code (C/C++/Matlab/R) is all GPL as well. It can be used to generate test data, and the provided Precomputed replication data can be used. Without written permission from the authors of that package that incorporating their code into SciPy under a BSD license, you cannot look at that source code.

@an81

This comment has been minimized.

Copy link

an81 commented Jun 13, 2018

Regarding the values for tests, there values seems to be correct:
https://nvlpubs.nist.gov/nistpubs/jres/77B/jresv77Bn3-4p143_A1b.pdf
They dont cover the case for alphas around 1, but there are some tabeleted values in different papers for at least 1.1 for symmetric case.

@rishabhkabra

This comment has been minimized.

Copy link

rishabhkabra commented Jun 13, 2018

Hi @rgommers, thanks for your comments firstly. We certainly intend to finish this work. Thanks also for ensuring we can rebase on top of gh-7374.

We are aware of what is causing the tests to fail. These should be easy to fix.

More importantly, we have identified mistakes in Nolan (1997)'s method and results. This makes our changes on top of @bsdz's work rather crucial, because any implementation that uses Nolan's numbers as a reference is wrong in striking ways. We are writing an Arxiv paper to describe this because it is important for the community even beyond SciPy, and will be the basis of justifying our work here.

Our argument for supporting multiple parameterizations is two-fold: firstly, different parameterizations lend themselves to different numerical methods; these have different numerical guarantees in different regions of the parameter space. We don't want users to be stuck with one canonical parameterization. Secondly, supporting multiple parameterizations allows validation of scientific results as they as reported across different communities.

Apologies for some of the previous comments by @an81. We have not looked at any source code to be very clear (http://www.lpi.tel.uva.es/stable or http://ugrad.stat.ubc.ca/R/library/rmutil/html/Levy.html or Matlab). In the first case, @an81 was referring to the algorithms described in the accompanying paper. Thanks for advising us about the licenses.

"""
Generates a vector of random stable variables under parameterization A
using Weron's formulation [WE]. It is identical with the implementation
in GSL library.

This comment has been minimized.

@bsdz

bsdz Jun 13, 2018

Contributor

This docstring suggests the following code might have been copied from GSL. It is described as identical to GSL's implementation.

This comment has been minimized.

@rishabhkabra

rishabhkabra Jun 13, 2018

Hi @bsdz, the GSL library's random number generation algorithm is described as identical to Weren's forumation in the libstable paper (Royuela-del-Val et al., 2017). See the bottom of page 8: https://www.jstatsoft.org/article/view/v078i01/v78i01.pdf . We don't need to read any code to identify the method used by a particular library. Even a look at its documentation would suffice. Otherwise we email the authors, as in the case of libstable.

The purpose of this docstring comment was to give credit to both the primary academic source and other libraries which may be using the same algorithm (as you would in academic citations to avoid conflict). But what works in academia may not work here. :) We should delete this reference to GSL to avoid giving the impression that we actually studied its implementation. Thanks for pointing that out.

This comment has been minimized.

@rishabhkabra

rishabhkabra Jun 13, 2018

To allege we are copying code or lying about what we have read is an unnecessary provocation. :)

We invite you to collaborate with us as we have before. We give you all the credit for your initiative and prior work. But please do recognize that we have a lot more experience working with the mathematics of stable law distributions, and can identify when methods are wrong. A library like SciPy cannot afford to repeat and propagate mistakes in the community.

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

So GSL documentation claims to be based on the same paper? Then probably this note should be phrased as such.

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 14, 2018

Thanks for the context @rishabhkabra.

More importantly, we have identified mistakes in Nolan (1997)'s method and results. This makes our changes on top of @bsdz's work rather crucial, because any implementation that uses Nolan's numbers as a reference is wrong in striking ways. We are writing an Arxiv paper to describe this because it is important for the community even beyond SciPy, and will be the basis of justifying our work here.

Note that SciPy's default policy is to implement published/well-known algorithms, not novel research. However, in cases when it's clear that something is an improvement (accuracy/performance/...) then of course we will accept the novel algorithms. So in this case the ArXiV paper seems like a good idea (and I assume you'll submit it to a journal as well?).

@rgommers

This comment has been minimized.

Copy link
Member

rgommers commented Jun 16, 2018

Okay, gh-7374 is now merged. Would be great if you could rebase this PR on master.

As soon as the ArXiV paper is up, it may make sense to merge a simple doc PR that points out more possible issues and references the paper.

@@ -3582,56 +3583,943 @@ class levy_stable_gen(rv_continuous):
Notes
-----
Levy-stable distribution (only random variates available -- ignore other
docs)
..math:: \alpha-stable distributions form a rich family of probability distributions

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

For online math to render, it needs backticks: :math:backtick \alpha backtick.

Note that .. math:: typesets display math.

which arises in the central limit theorem problems. Stable distributions and their variants
are also known under different names to different scientific communities,
e.g. physicicts like to call Symmetric Stable Laws as Levy Flights, whereas mathematicians
calls the distributions ..math:: \alpha-stable distributions. We will refer to this class

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

Ditto.

calls the distributions ..math:: \alpha-stable distributions. We will refer to this class
of distributions as Levy Stable laws / distributions.
The densities of Levy Stable laws are well represented by Fox's H-functions [ZOL1].

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

References need a trailing underscore: [ZOL1]_

For ..math::\alpha = 2 the density reduces to Gaussian distributions. For symmetric case
of ..math::\alpha = 1 the Levy Stable law is known as a Cauchy distribution. For ..math::\alpha = 0.5
and skewness distribution taking its extreme value, the density is of the form of Levy Distribution
(see levy, levy_l).

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

Single backticks around levy, levy_l will make them hyperlinks.

c_P = c_C = c_B = \frac{c_A}{\cos[\beta_B K(\alpha)\pi /2}
The parameterization P has rather intuitive relation to CDF of stable law.
The parameter p_1 = 1 - CDF(0, \alpha, p_1, c), i.e. the parameter p_1 gives value

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

Underscores have special meaning in reST (references), thus they need to be shielded: either wrap the expression in double backticks (typesets as code), or a :math environment.


def _rvs(self, alpha, beta):
@classmethod

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

_argcheck should be an instance method, since distribution objects are instances

return (asymmetry <= 1) & (asymmetry >= -1)


@classmethod

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

Should this also be an instance method?

"""
Generates a vector of random stable variables under parameterization A
using Weron's formulation [WE]. It is identical with the implementation
in GSL library.

This comment has been minimized.

@ev-br

ev-br Jun 17, 2018

Member

So GSL documentation claims to be based on the same paper? Then probably this note should be phrased as such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment