ENH: Add PDF, CDF and parameter estimation for Stable Distributions #7374

Merged: 35 commits, merged into scipy:master on Jun 16, 2018

Conversation

@bsdz (Contributor) commented May 3, 2017

Add Levy stable parameter estimation using the McCulloch (1986) quantiles method.
Add PDF and CDF calculation using Zolotarev's method and an FFT estimate of the continuous Fourier integral of the characteristic function.
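For illustration (not the PR's actual implementation), a minimal self-contained sketch of the FFT idea: the density is the inverse Fourier transform of the characteristic function, f(x) = (1/2π) ∫ φ(t) e^{-itx} dt, and sampling φ on a regular grid turns the Riemann sum into a single FFT. The function name and the h/q parameters below are illustrative.

import numpy as np

def pdf_via_cf_fft(cf, h=0.01, q=16):
    # Invert a characteristic function cf numerically: sample it on a grid of
    # N = 2**q points with spacing h, approximate the inversion integral by a
    # Riemann sum, and evaluate that sum on the conjugate x grid with one FFT.
    N = 2 ** q
    n = np.arange(N)
    t = (n - N / 2) * h                       # grid in the Fourier (t) domain
    x = (n - N / 2) * (2 * np.pi / (N * h))   # conjugate grid in the x domain
    dft = np.fft.fft(cf(t) * (-1.0) ** n)     # phase factors recentre the grids
    f = (h / (2 * np.pi)) * np.exp(-1j * np.pi * N / 2) * (-1.0) ** n * dft
    return x, f.real

# sanity check against the standard normal, whose cf is exp(-t**2 / 2):
# the returned density should be about 0.3989 near x = 0
x, f = pdf_via_cf_fft(lambda t: np.exp(-t ** 2 / 2))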

bsdz added 5 commits May 3, 2017 23:15
Add Levy Stable Parameter Estimation using McCulloch 1986 Quantiles method.
Add PDF and CDF calculation using FFT estimate on Continuous Fourier Integral of Characteristic Function.
Reduce tolerance on fit test (for python 2.7).
Improve documentation.
Choose slightly more illustrative example parameters.
@bsdz changed the title from "Add PDF, CDF and parameter estimation for Stable Distributions" to "ENH: Add PDF, CDF and parameter estimation for Stable Distributions" May 5, 2017
@pv (Member) left a comment

Some general comments. Not reviewing the algorithm/implementation itself.

"""Use McCullock 1986 method - Simple Consistent Estimators
of Stable Distribution Parameters
"""
return self._fitstart(data, *args, **kwds)
Member

Is this correct? Isn't the result from fitstart an approximation (it uses interpolation tables)?

@bsdz (Contributor Author) May 7, 2017

You are right that fitstart is an approximation and the quantile estimate works well here. However, the MLE method breaks for other reasons (optimiser?) so I thought it best to override fit() so at least one can estimate parameters.


It is really exciting to see stable laws implemented in SciPy. I have spent a few years working with different density implementations. There are asymptotic expansions and integral representations which perform pretty well and are numerically more stable than FFT. There are also different parametrizations of the complex parameter in the exponent of the characteristic function, which I believe are more intuitive. I guess that could be the next development.

@bsdz (Contributor Author) Mar 11, 2018

@pv I removed the fit() override, so this now uses MLE again, using the interpolation tables as the initial estimate. I simply reordered the test cases and it works; this is because the tests switch between different pdf methods and fit() converges best with Zolotarev's method, not FFT. I've added a note to the docstring.
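For context, a small usage sketch of what the behaviour described above looks like from the public API (parameter order alpha, beta, loc, scale; the exact call pattern and its speed are assumptions here, not taken from the diff):

import numpy as np
from scipy.stats import levy_stable

rng = np.random.RandomState(1234)
data = levy_stable.rvs(1.7, 0.0, size=2000, random_state=rng)

# fit() runs MLE, seeded by the McCulloch quantile interpolation tables via
# _fitstart(); it can be slow because the pdf is evaluated numerically.
alpha, beta, loc, scale = levy_stable.fit(data)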

q = 16 if fft_n_points_two_power is None else fft_n_points_two_power

density_x, density = levy_stable_gen._pdf_from_cf_with_fft(lambda t: levy_stable_gen._cf(t, _alpha, _beta), h=h, q=q)
f = interpolate.InterpolatedUnivariateSpline(density_x, density)
Member

Prefer splrep + BSpline over *UnivariateSpline.

Contributor Author

This needs to match the spline as obtained by interp1d. Would the two you suggest do that?
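For reference, a hedged sketch of the suggested alternative on stand-in data; whether it matches the spline obtained by interp1d exactly is precisely the question asked above, but both calls below build cubic interpolating splines, so they should agree:

import numpy as np
from scipy import interpolate

# illustrative data standing in for (density_x, density)
density_x = np.linspace(-5, 5, 201)
density = np.exp(-density_x ** 2 / 2) / np.sqrt(2 * np.pi)

# suggested style: build knots/coefficients with splrep, evaluate via BSpline
tck = interpolate.splrep(density_x, density, s=0)   # s=0: interpolating spline
f = interpolate.BSpline(*tck)

# style currently used in the PR, for comparison
g = interpolate.InterpolatedUnivariateSpline(density_x, density)

pts = np.array([-0.5, 0.0, 0.5])
print(np.allclose(f(pts), g(pts)))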

mu2 = 2
g1 = 0. if alpha == 2. else np.NaN
g2 = 0. if alpha == 2. else np.NaN
return mu, mu2, g1, g2
Member

Is it correct to return constant results here?

Contributor Author

I believe this is correct. It is similar to other distributions, e.g.:

semicircular_gen._stats
uniform_gen._stats
vonmises_gen._stats
wald_gen._stats
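For illustration of the point above, the kind of output these constant values produce through the public stats() call; assuming the standard parametrization, alpha == 2 reduces to a normal distribution with variance 2, while for alpha < 2 the variance and higher moments are undefined and reported as nan:

from scipy.stats import levy_stable

# alpha == 2, beta == 0: variance 2, zero skew and kurtosis
print(levy_stable.stats(2.0, 0.0, moments='mvsk'))

# alpha < 2: second and higher moments are undefined (nan)
print(levy_stable.stats(1.5, 0.0, moments='mv'))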

@ev-br added the enhancement (A new feature or improvement) and scipy.stats labels Jun 14, 2017
@rgommers added this to the 1.1.0 milestone Sep 16, 2017
@bsdz (Contributor Author) commented Nov 22, 2017

Hi @an81, sorry in advance if I have misinterpreted your question. I think if all goes well, and this PR passes review, it will be included in release 1.1.

@an81 commented Nov 22, 2017 via email

@an81 commented Nov 22, 2017 via email

@bsdz (Contributor Author) commented Nov 22, 2017

Hi @an81, thanks for your suggestions. I'm not sure if you've looked at the code in detail, but it actually uses FFT or quad, depending on how many points are requested. This is covered in the documentation I provided. I don't think FFT is as bad as you seem to suggest, provided you choose a sufficient number of points. Please do follow the references provided in the code. It might also be worth reading Stable Paretian Models in Finance by Rachev/Mittnik, where they provide fairly comprehensive coverage of the topic (albeit a little outdated). I've been using a similar FFT approach in various models since 2008 in a commercial setting, and a similar approach is used in other popular financial products. Also, please do submit code changes and references, as I'm sure this PR could be improved. It's certainly not perfect.

@an81 commented Nov 22, 2017 via email

@bsdz (Contributor Author) commented Nov 22, 2017

@an81, I feel the biggest challenge in finance is more often simplicity, in the sense: can you explain it to someone who doesn't have a PhD in maths; and also speed, in implementation and calculation. FFT allows one to produce a near-complete density in one go, allowing interpolation of several points at once. Perhaps this might be possible with special functions too. To be honest, I don't know much about representing densities with Fox H or Meijer G functions. I'll be happy to take a look at these methods and implement them in this PR, as ideally it would be nice to remove or reduce the Gibbs effect. Can you point to any papers with derivation and/or implementation details?

@an81 commented Nov 23, 2017

@bsdz (Contributor Author) commented Nov 24, 2017

Hi @an81. Thank you for the 550+ page book. Please can you be a bit more specific? Some sample code goes a long way too. Also, can you perhaps test the existing code and highlight where the Gibbs effect might be more prominent? E.g. low alpha, etc.; perhaps this can just be documented with a recommendation that users use quad in these cases (already in the code), until a better implementation is available.

@an81 commented Nov 28, 2017 via email

@rgommers (Member)
Code style looks fine and the CI checkers are happy. There are a lot of too-long lines, but let's leave those in so as not to make rebasing gh-8766 on top of this one too difficult.

@rgommers (Member)
Checked the code coverage (which is good, just 2 branches missing) and that the docs render fine and all formulas given are correct. I think for merging we'll rely mostly on completeness of the tests and comparisons against the output of Nolan's stablec.exe. Further accuracy improvements are then postponed until gh-8766 is ready. I'm +1 on merging this once the few comments I just made are resolved.

bsdz added 2 commits June 13, 2018 00:47
coverage of parameters.
Correct some bugs in formulas.
Improve test coverage and test each method around parameter domains.
Add new best method that incorporates quad and zolotarev methods.
Switch off FFT by default altogether for PDF.
Add warnings for certain methods and parameters when known to lose precision.
@rgommers (Member)
This now has several test failures (https://travis-ci.org/scipy/scipy/jobs/391535771) and a number of PEP8 issues (https://travis-ci.org/scipy/scipy/jobs/391535766)

bsdz added 3 commits June 13, 2018 09:27
support numpy 1.8.2 in1d vs newer isin for travis build.
more descriptive test failure message.
@bsdz (Contributor Author) commented Jun 13, 2018

@rgommers I've fixed the code style and most test failures. However, it appears that on the OSX (darwin) build server some of the tests fail, with some calculations returning lower precision. I'm not sure how to deal with this as I don't have access to that platform. One idea is to separate the tests by platform, e.g.:

tests = [
    ...
    # zolotarev is accurate except at alpha==1
    ['zolotarev', None, 8, lambda r: (sys.platform != 'darwin') & (r['alpha'] != 1)],
    ['zolotarev', None, 6, lambda r: (sys.platform == 'darwin') & (r['alpha'] != 1)],
    ...
]

However, I'd have to test what precision will pass on OSX through trial and error, via commits to this PR and Travis build triggers. Does that sound reasonable, or perhaps there's a better way?

@rgommers (Member)
However, I'd have to test what precision will pass on OSX through trial and error, via commits to this PR and Travis build triggers. Does that sound reasonable, or perhaps there's a better way?

I have a macOS machine, I can give it a try. CI is too slow to do trial and error without it getting annoying.

@rgommers (Member)
One idea is to separate the tests by platform, e.g.:

That's only a good idea if it's clear what in a given platform causes a specific problem. In this case I don't see a reason why macOS should be worse.

The test is not fixable by changing the test precision: the function return has NaN in it. The problem seems to be in these lines:

    with np.errstate(all="ignore"):
        intg_max = optimize.minimize_scalar(lambda theta: -f(theta), bounds=[-xi, np.pi/2])
        intg = integrate.quad(f, -xi, np.pi/2, points=[intg_max.x])[0]

@ilayn (Member) commented Jun 13, 2018

Just for future reference, please create separate branches to work on new features and to send PRs, so that your main working branch does not interfere with the PRs you have submitted. Once a PR is merged you can safely delete that branch and keep working on other branches.

@bsdz (Contributor Author) commented Jun 13, 2018

@ilayn Thanks for the heads up. I'll work from a separate branch as suggested and merge once resolved.

@rgommers It appears the integration is failing rather than the minimization. I'll see if I can figure out what parameter values are problematic.

bsdz added 2 commits June 16, 2018 15:05
'best' method uses 'zolotarev' for alpha==1 and beta==0.
delicate handling of quad inputs as less flexible on windows/macos compared to linux.
improve test output on failure.
@bsdz (Contributor Author) commented Jun 16, 2018

@rgommers I've fixed the MacOS issue and all tests pass now. It seems that quad() behaviour differs slightly between Linux, MacOS and Windows, with Linux being the most forgiving. For example, on Linux, if we pass in a point outside the integration bounds it will gracefully continue, but on Windows and MacOS it will fail. This happens because minimize_scalar() doesn't guarantee returning values within the requested bounds. Linux is also happy if we pass bounds that are the same (a null point integral) and just returns zero, but this fails on other platforms. This could be down to floating point differences and in one case I use isclose() to check if the endpoints match as "==" doesn't work on MacOS.
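A minimal sketch of the kind of guarding described above, assuming a generic integrand f, bounds a and b, and a candidate interior point x0 returned by an optimizer; the names are illustrative, not the PR's actual variables:

import numpy as np
from scipy import integrate

def robust_quad(f, a, b, x0):
    # A degenerate interval: comparing endpoints with == can behave differently
    # across platforms, so use a tolerance-based check and return zero directly.
    if np.isclose(a, b, rtol=1e-15, atol=1e-15):
        return 0.0
    # minimize_scalar may return a point slightly outside [a, b]; only pass it
    # to quad as a breakpoint when it lies strictly inside the bounds.
    points = [x0] if a < x0 < b else None
    return integrate.quad(f, a, b, points=points)[0]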

@rgommers (Member)
This could be down to floating point differences and in one case I use isclose() to check if the endpoints match as "==" doesn't work on MacOS.

In general, you can never rely on exact equality for floating point numbers. You can even get things like 0 and -0 not comparing equal. You normally want to check with isclose or some such function indeed, but with a relative/absolute tolerance that's appropriate (typically 1e-15 or so for float64).
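For illustration only (not code from this PR), the kind of tolerance-based comparison being recommended:

import numpy as np

a = 0.1 + 0.2
b = 0.3
print(a == b)                                  # False: exact equality fails
print(np.isclose(a, b, rtol=1e-15, atol=0.0))  # True: within relative tolerance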

@rgommers (Member)
There are 6 merges of scipy master into this branch. Typically we want to avoid that unless absolutely necessary, because it makes the history messier. In this case let's leave it, so as not to make the rebases in the other PR harder.

@rgommers (Member)
These two runtime warnings still appear (at least on macOS):

stats/tests/test_distributions.py::TestLevyStable::()::test_pdf_nolan_samples
  /Users/rgommers/Code/scipy/scipy/stats/_continuous_distns.py:3780: RuntimeWarning: Density calculation unstable for alpha=1 and beta!=0. Use quadrature method instead.
    ' Use quadrature method instead.', RuntimeWarning)
  /Users/rgommers/Code/scipy/scipy/stats/_continuous_distns.py:3880: RuntimeWarning: Density calculations experimental for FFT method. Use combination of zolatarev and quadrature methods instead.
    ' Use combination of zolatarev and quadrature methods instead.', RuntimeWarning)

stats/tests/test_distributions.py::TestLevyStable::()::test_cdf_nolan_samples
  /Users/rgommers/Code/scipy/scipy/stats/_continuous_distns.py:3917: RuntimeWarning: Cumulative density calculations experimental for FFT method. Use zolatarev method instead.
    ' Use zolatarev method instead.', RuntimeWarning)

They should be filtered out inside those tests by using:

with suppress_warnings() as sup:
    sup.record(RuntimeWarning, "<start of the warning message>*")
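For example, a self-contained sketch of that mechanism, raising the warning by hand rather than calling levy_stable (the message prefix is abbreviated from the output above):

import warnings
from numpy.testing import suppress_warnings

with suppress_warnings() as sup:
    # record (and thereby silence) warnings whose message matches this pattern
    r = sup.record(RuntimeWarning, "Density calculations experimental.*")
    warnings.warn("Density calculations experimental for FFT method.",
                  RuntimeWarning)

assert len(r) == 1  # the warning was captured instead of being emitted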

@rgommers (Member)

almost there @bsdz :)

@rgommers merged commit d48843d into scipy:master Jun 16, 2018
@rgommers (Member)

All right, in it goes. Thanks @bsdz, nice improvement! And thanks @an81, @mikofski, @pv for the reviews.

Further improvement of these methods in gh-8766.
