ENH: Add PDF, CDF and parameter estimation for Stable Distributions #7374
method. Add PDF and CDF calculation using FFT estimate on Continuous Fourier Integral of Characteristic Function.
Reduce tolerance on fit test (for Python 2.7).
Improve documentation. Choose slightly more illustrative example parameters.
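The FFT approach in this commit message inverts the characteristic function numerically with f(x) = 1/(2π) ∫ exp(−itx) φ(t) dt. A minimal sketch, assuming the standard symmetric case φ(t) = exp(−|t|^α) and a grid of N = 2^q points with spacing h in x and dt = 2π/(N·h) in t. The names `h` and `q` mirror the arguments of the `_pdf_from_cf_with_fft` call reviewed in this thread, but `pdf_from_cf_fft` is a hypothetical helper, not the PR's implementation:

```python
import numpy as np


def pdf_from_cf_fft(cf, h=0.01, q=12):
    """Hypothetical sketch: approximate a density on a uniform grid by
    inverting its characteristic function ``cf`` with a single FFT.

    Discretises f(x) = 1/(2*pi) * integral exp(-i*t*x) * cf(t) dt with
    n = 2**q points, x-spacing h and t-spacing dt = 2*pi / (n*h).
    """
    n = 2 ** q                      # a multiple of 4 for q >= 2
    dt = 2 * np.pi / (n * h)
    idx = np.arange(n)
    t = (idx - n / 2) * dt          # frequency grid centred on 0
    x = (idx - n / 2) * h           # spatial grid centred on 0
    # The (-1)**j and (-1)**k factors recentre both grids so that the
    # standard FFT sum over exp(-2*pi*i*j*k/n) applies
    sign = (-1.0) ** idx
    raw = np.fft.fft(sign * cf(t))
    density = dt / (2 * np.pi) * sign * raw.real
    return x, density


# Example: alpha = 2, beta = 0 gives cf(t) = exp(-t**2), i.e. N(0, 2)
x, f = pdf_from_cf_fft(lambda t: np.exp(-t ** 2))
```

For α = 2 the result matches the N(0, 2) density to near machine precision. For α = 1 (Cauchy) the heavy tails wrap around the periodic grid (aliasing), so the error is noticeably larger; this is one concrete source of the tail inaccuracy debated in this thread.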
Some general comments. Not reviewing the algorithm/implementation itself.
scipy/stats/_continuous_distns.py
def fit(self, data, *args, **kwds):
    """Use McCulloch 1986 method - Simple Consistent Estimators
    of Stable Distribution Parameters
    """
    return self._fitstart(data, *args, **kwds)
Is this correct? Isn't the result from fitstart an approximation (it uses interpolation tables)?
You are right that fitstart is an approximation, and the quantile estimate works well here. However, the MLE method breaks for other reasons (the optimiser?), so I thought it best to override fit() so that at least one can estimate parameters.
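The quantile estimate referred to here is McCulloch's (1986). Its first step needs only five sample quantiles: nu_alpha = (x_0.95 − x_0.05) / (x_0.75 − x_0.25) and nu_beta = (x_0.95 + x_0.05 − 2·x_0.5) / (x_0.95 − x_0.05), which do not depend on location or scale. A minimal sketch of that step; the function name is hypothetical, and the mapping from (nu_alpha, nu_beta) to (α, β) uses McCulloch's interpolation tables, which are not reproduced here:

```python
import numpy as np


def mcculloch_statistics(data):
    """Hypothetical helper: location- and scale-free quantile statistics
    from McCulloch (1986). Converting them to (alpha, beta) requires the
    published interpolation tables (not included in this sketch)."""
    x05, x25, x50, x75, x95 = np.percentile(data, [5, 25, 50, 75, 95])
    nu_alpha = (x95 - x05) / (x75 - x25)          # tail heaviness
    nu_beta = (x95 + x05 - 2 * x50) / (x95 - x05)  # asymmetry
    return nu_alpha, nu_beta
```

For Gaussian data (α = 2), nu_alpha should be close to 2.44, the ratio of the 90% and 50% central normal ranges.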
It is really exciting to see stable laws implemented in SciPy. I have spent a few years working with different density implementations. There are asymptotic expansions and integral representations, which perform pretty well and are numerically more stable than FFT. There are also different parametrizations of the complex parameter in the exponent of the characteristic function, which I believe are more intuitive. I guess that could be the next development.
@pv I removed the fit() override, so this now uses MLE again, with the interpolation tables providing the initial estimate. I simply reordered the test cases and it works; this is because the tests switch between different pdf methods, and fit() converges best with Zolotarev's method rather than FFT. I've added a note to the docstring.
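The "quantile start, then MLE" scheme described here can be illustrated on the Cauchy distribution, the stable law with α = 1 and β = 0, where the quartiles give closed-form starting values (median for location, half the interquartile range for scale). This is a hypothetical sketch using `scipy.optimize.minimize` with Nelder-Mead, not the code in this PR:

```python
import numpy as np
from scipy import optimize, stats


def fit_cauchy_mle(data):
    """Hypothetical sketch: MLE for Cauchy loc/scale, started from quantiles."""
    data = np.asarray(data, dtype=float)
    # Quantile-based start: Cauchy quartiles sit at loc -/+ scale
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    start = np.array([q50, np.log((q75 - q25) / 2)])

    def neg_loglik(params):
        loc, log_scale = params  # log-scale keeps the scale positive
        return -np.sum(stats.cauchy.logpdf(data, loc=loc,
                                           scale=np.exp(log_scale)))

    res = optimize.minimize(neg_loglik, start, method="Nelder-Mead")
    loc, log_scale = res.x
    return loc, np.exp(log_scale)
```

A good starting point matters here because the likelihood of heavy-tailed models can be flat or multimodal far from the optimum.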
scipy/stats/_continuous_distns.py
q = 16 if fft_n_points_two_power is None else fft_n_points_two_power

density_x, density = levy_stable_gen._pdf_from_cf_with_fft(
    lambda t: levy_stable_gen._cf(t, _alpha, _beta), h=h, q=q)
f = interpolate.InterpolatedUnivariateSpline(density_x, density)
Prefer splrep + BSpline over *UnivariateSpline.
This needs to match the spline as obtained by interp1d. Would the two you suggest do that?
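One way to check is to build the interpolants on the same data. For cubic splines with no smoothing (`s=0`), I believe FITPACK (used by `splrep` and `InterpolatedUnivariateSpline`) places the interior knots at `x[2:-2]`, which is the not-a-knot condition that `make_interp_spline` uses by default; as far as I know, recent `interp1d(kind='cubic')` is built on `make_interp_spline`, though treat that as an assumption. If so, all of them should agree up to rounding. A small hypothetical check:

```python
import numpy as np
from scipy import interpolate


def max_spline_differences(x, y, x_eval):
    """Hypothetical check: compare three cubic interpolating splines."""
    # FITPACK interpolating spline (s=0), evaluated through BSpline
    tck = interpolate.splrep(x, y, k=3, s=0)
    via_bspline = interpolate.BSpline(*tck)(x_eval)
    # Same FITPACK spline through the object-oriented wrapper
    via_ius = interpolate.InterpolatedUnivariateSpline(x, y, k=3)(x_eval)
    # Not-a-knot cubic interpolant
    via_make = interpolate.make_interp_spline(x, y, k=3)(x_eval)
    return (np.max(np.abs(via_bspline - via_ius)),
            np.max(np.abs(via_bspline - via_make)))


x = np.linspace(-5, 5, 41)
y = np.exp(-x ** 2 / 4)
print(max_spline_differences(x, y, np.linspace(-5, 5, 1001)))
```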
scipy/stats/_continuous_distns.py
mu2 = 2
g1 = 0. if alpha == 2. else np.nan
g2 = 0. if alpha == 2. else np.nan
return mu, mu2, g1, g2
Is it correct to return constant results here?
I believe this is correct. It is similar to other distributions, e.g.:
semicircular_gen._stats
uniform_gen._stats
vonmises_gen._stats
wald_gen._stats
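Constant results are acceptable because `_stats` describes only the standardized distribution (loc=0, scale=1); `rv_continuous` applies `loc` and `scale` afterwards. Assuming the usual convention that the standard stable law with α = 2 has characteristic function exp(−t²), that law is N(0, 2), consistent with `mu2 = 2` in the reviewed snippet. A quick check of the loc/scale mechanism with `scipy.stats.uniform`, one of the distributions listed:

```python
import numpy as np
from scipy import stats

# Standardized moments, as returned by uniform_gen._stats: U(0, 1)
mean_std, var_std = stats.uniform.stats(moments="mv")

# rv_continuous rescales them: mean -> loc + scale*mean, var -> scale**2 * var
mean_ls, var_ls = stats.uniform.stats(loc=2, scale=4, moments="mv")

print(float(mean_std), float(var_std))  # 0.5 and 1/12
print(float(mean_ls), float(var_ls))    # 4.0 and 16/12
```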
Hi @an81, sorry in advance if I have misinterpreted your question. I think if all goes well and this PR passes review, it will be included in release 1.1.
Sure. Well, I have spent a few years working on stable laws, written a paper about them and written code for the densities ...
It's not the best idea to use FFT, because it's numerically unstable.
I can add the code and we'll see if it passes.
So, speaking for myself, I honestly hope that if all you use is a naive FFT,
the code will not pass, because it will create tons of problems if
somebody relies on the implementation
and blindly uses the function without testing it.
Hi @an81, thanks for your suggestions. I'm not sure if you've looked at the code in detail, but it actually uses either FFT or quad, depending on how many points are requested. This is included in the documentation I provided. I don't think FFT is as bad as you seem to suggest, provided you choose a sufficient number of points. Please do follow the references provided in the code. It might also be worth reading Stable Paretian Models in Finance by Rachev/Mittnik, where they provide a fairly comprehensive coverage of the topic (albeit a little outdated). I've been using a similar FFT approach in various models since 2008 in a commercial setting, and a similar approach is used in other popular financial products. Please do submit code changes and references, as I'm sure this PR could be improved. It's certainly not perfect.
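For the quad branch, the density can be written as a one-sided Fourier integral. In the symmetric case with φ(t) = exp(−|t|^α) it reduces to f(x) = (1/π) ∫₀^∞ cos(tx) exp(−t^α) dt, which `scipy.integrate.quad` can evaluate with `weight='cos'` (QUADPACK's QAWF routine when the upper limit is infinite). A hypothetical sketch restricted to β = 0, not the PR's implementation:

```python
import math

import numpy as np
from scipy import integrate


def symmetric_stable_pdf_quad(x, alpha):
    """Hypothetical sketch: standard symmetric alpha-stable density via
    f(x) = (1/pi) * integral_0^inf cos(t*x) * exp(-t**alpha) dt."""
    def cf_abs(t):
        return np.exp(-t ** alpha)  # cf(t) = exp(-|t|**alpha) for t >= 0

    x = abs(x)  # the density is symmetric
    if x == 0:
        # No oscillation at x = 0, so a plain infinite-range quad suffices
        value, _ = integrate.quad(cf_abs, 0, np.inf)
    else:
        # weight='cos' with an infinite upper limit uses QUADPACK's QAWF
        value, _ = integrate.quad(cf_abs, 0, np.inf, weight="cos", wvar=x)
    return value / math.pi
```

Each point costs a separate adaptive integration, which is why FFT is attractive when many points are needed at once.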
I don't like the work of Rachev & Mittnik.
I know their work; other people did a better job.
We also looked at stable laws for industry (finance) purposes ...
My main criticism of FFT is the Gibbs effect, i.e. the tails fluctuate.
I guess there is the work of Nolan, who essentially just rewrote Zolotarev's
papers.
It's easy to work out the integral representation, as Zolotarev did (and
which Nolan implemented at the end of the '90s?).
Anyway, I think it would be more helpful to just share the work that has been
done on this topic.
Indeed, there are expansions that work pretty well in certain regions.
As you know, you can also represent the densities as special functions (Fox
H-functions or Meijer G-functions),
so trying these could give better results and a better testing environment too.
To be fair, I really wonder why people in finance don't use stable laws
(tempered stable laws) more.
Maybe because a few of them tried FFT and realized that, because of the Gibbs
effect, it is not reliable ...
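For reference, the integral representation mentioned here can be sketched for the symmetric case. As I understand Nolan's form of Zolotarev's representation, for β = 0, α ≠ 1 and x > 0 (standard characteristic function exp(−|t|^α)) it reads f(x) = α·x^(1/(α−1)) / (π·|α−1|) · ∫₀^{π/2} V(θ)·exp(−x^(α/(α−1))·V(θ)) dθ, with V(θ) = (cos θ / sin αθ)^(α/(α−1)) · cos((α−1)θ) / cos θ. This is a hypothetical helper restricted to β = 0, not the PR's `zolotarev` method:

```python
import math

import numpy as np
from scipy import integrate


def symmetric_stable_pdf_zolotarev(x, alpha):
    """Hypothetical sketch: density of the standard symmetric alpha-stable law
    (beta = 0, cf(t) = exp(-|t|**alpha)) via the Zolotarev/Nolan integral.
    Valid for alpha != 1; no special handling for alpha close to 1."""
    if alpha == 1:
        raise ValueError("alpha == 1 needs a different representation")
    x = abs(x)  # symmetric law: f(-x) = f(x)
    if x == 0:
        # At the mode: f(0) = (1/pi) * integral_0^inf exp(-t**alpha) dt
        return math.gamma(1 + 1 / alpha) / math.pi
    expo = alpha / (alpha - 1)

    def v(theta):
        # V(theta) for beta = 0 (theta_0 = 0, zeta = 0)
        return ((np.cos(theta) / np.sin(alpha * theta)) ** expo
                * np.cos((alpha - 1) * theta) / np.cos(theta))

    def integrand(theta):
        vt = v(theta)
        return vt * np.exp(-(x ** expo) * vt)

    const = alpha * x ** (1 / (alpha - 1)) / (math.pi * abs(alpha - 1))
    value, _ = integrate.quad(integrand, 0.0, math.pi / 2)
    return const * value
```

The integrand on [0, π/2] is non-negative and does not oscillate, which is the numerical advantage of this kind of representation over oscillatory Fourier inversion.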
@an81, I feel the biggest challenge in finance is more often simplicity, in the sense of: can you explain it to someone who doesn't have a PhD in maths? And also speed, both in implementation and in calculation. FFT allows one to produce a near-complete density in one go, allowing interpolation of several points at once. Perhaps this might be possible with special functions too. Tbh I don't know much about representing densities with Fox H- or Meijer G-functions. I'll be happy to take a look at these methods and implement them in this PR, as ideally it would be nice to remove/reduce the Gibbs effect. Can you point to any papers with derivation and/or implementation details?
Hi @an81. Thank you for the 550+ page book. Could you please be a bit more specific? Some sample code goes a long way too. Also, could you perhaps test the existing code and highlight where the Gibbs effect might be more prominent, e.g. low alpha? Perhaps this can just be documented with a recommendation that users use quad in these cases (already in the code) until a better implementation is available.
Blair, is there a way to have a chat via email?
Code style looks fine and the CI checkers are happy. There are a lot of overly long lines, but let's leave those in so as not to make rebasing gh-8766 on top of this one too difficult.
I checked the code coverage (which is good, with just 2 branches missing), that the docs render fine, and that all the formulas given are correct. I think for merging we'll rely mostly on the completeness of the tests and on comparisons against the output of Nolan's
coverage of parameters. Correct some bugs in formulas. Improve test coverage and test each method around parameter domains. Add a new 'best' method that incorporates the quad and Zolotarev methods. Switch off FFT by default altogether for PDF. Add warnings for certain methods and parameters when they are known to lose precision.
This now has several test failures (https://travis-ci.org/scipy/scipy/jobs/391535771) and a number of PEP8 issues (https://travis-ci.org/scipy/scipy/jobs/391535766).
Support NumPy 1.8.2 (in1d) vs. newer isin for the Travis build. More descriptive test failure message.
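The compatibility issue behind this commit is that `np.isin` only exists in newer NumPy (1.13 and later), while `np.in1d` is available in older releases but always works on flattened input. A hypothetical shim that prefers `np.isin` and falls back to `np.in1d`, restoring the input shape:

```python
import numpy as np


def isin_compat(element, test_elements):
    """Hypothetical shim: np.isin where available, np.in1d otherwise."""
    if hasattr(np, "isin"):
        return np.isin(element, test_elements)
    # np.in1d flattens its first argument, so reshape to match np.isin
    element = np.asarray(element)
    return np.in1d(element, test_elements).reshape(element.shape)


print(isin_compat(np.array([[1, 2], [3, 4]]), [2, 3]))
```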
@rgommers I've fixed the code style and most of the test failures. However, it appears that on the OS X (darwin) build server some of the tests fail, with some calculations returning lower precision. I'm not sure how to deal with this as I don't have access to that platform. One idea is to separate the tests by platform, e.g.:

import sys

tests = [
    ...
    # zolotarev is accurate except at alpha==1
    ['zolotarev', None, 8, lambda r: (sys.platform != 'darwin') & (r['alpha'] != 1)],
    ['zolotarev', None, 6, lambda r: (sys.platform == 'darwin') & (r['alpha'] != 1)],
    ...
]

However, I'd have to find the precision that passes on OS X through trial and error, by pushing commits to this PR to trigger Travis builds. Does that sound reasonable, or is there a better way?
I have a macOS machine; I can give it a try. CI is too slow for trial and error without it getting annoying.
That's only a good idea if it's clear what about a given platform causes a specific problem. In this case I don't see a reason why macOS should be worse. The test is not fixable by changing the test precision - the function return has
Just for future reference, please make separate branches to work on new features and to send PRs from, so that your standard working repo does not interfere with the PRs you have submitted. Once a PR is merged you can safely delete that branch and keep working on other branches.
'best' method uses 'zolotarev' for alpha==1 and beta==0. Delicate handling of quad inputs, as quad is less flexible on Windows/macOS compared to Linux. Improve test output on failure.
@rgommers I've fixed the MacOS issue and all tests pass now. It seems that quad() behaviour differs slightly between Linux, MacOS and Windows, with Linux being the most forgiving. For example, on Linux if we pass in a point outside the integration bounds it will gracefully continue, but on Windows and MacOS it will fail. This happens because minimize_scalar() doesn't guarantee returning values within the requested bounds. Also, Linux will be happy if we pass bounds that are the same (null point integral) and just return zero, but this will fail on other platforms. This could be down to floating point differences, and in one case I use isclose() to check if the endpoints match, as "==" doesn't work on MacOS.
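The input sanitising described here can be sketched as a small wrapper: treat nearly equal endpoints as a null integral, clip an optimizer-supplied break point into the bounds, and only pass it to `quad(points=...)` when it lies strictly inside. `guarded_quad` and its `peak` argument are hypothetical, not the PR's code:

```python
import numpy as np
from scipy import integrate


def guarded_quad(func, a, b, peak=None):
    """Hypothetical helper: integrate func over [a, b] with sanitised inputs."""
    # A null interval (endpoints equal within np.isclose's default
    # tolerances, which include atol=1e-8) integrates to zero
    if np.isclose(a, b):
        return 0.0
    points = None
    if peak is not None:
        # An optimizer may return a point slightly outside the bounds
        peak = float(np.clip(peak, min(a, b), max(a, b)))
        # Only pass a break point that lies strictly inside the interval
        if not (np.isclose(peak, a) or np.isclose(peak, b)):
            points = [peak]
    value, _ = integrate.quad(func, a, b, points=points)
    return value
```

In practice `peak` might come from something like `optimize.minimize_scalar(...).x`, which is exactly the value that can stray outside the bounds.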
In general, you can never rely on exact equality for floating point numbers. You can even get things like
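A standard illustration (not necessarily the example the reviewer had in mind): decimal fractions like 0.1 are not exactly representable in binary, so sums that look equal on paper can differ in the last bit, while a tolerance-based comparison succeeds:

```python
import math

import numpy as np

a = 0.1 + 0.2  # stored as 0.30000000000000004
b = 0.3

exact = (a == b)                   # False: the values differ in the last bit
close_math = math.isclose(a, b)    # True: relative tolerance 1e-9 by default
close_np = bool(np.isclose(a, b))  # True: rtol=1e-5, atol=1e-8 by default
print(exact, close_math, close_np)
```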
There are 6 merges of scipy master into this branch. Typically we want to avoid that unless absolutely necessary, because it makes the history messier. In this case let's leave them, so as not to make the rebases in the other PR harder.
These two runtime warnings still appear (at least on macOS):
They should be filtered out inside those tests by using
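The suggested filtering call is not preserved here. One standard way (an assumption, not necessarily what was recommended) is to scope a filter with `warnings.catch_warnings` inside the test, so the suppression does not leak into other tests. A minimal sketch, with NumPy's divide-by-zero `RuntimeWarning` standing in for the actual warnings:

```python
import warnings

import numpy as np


def log_without_warnings(x):
    """Hypothetical example: silence RuntimeWarning only within this call."""
    with warnings.catch_warnings():
        # Filters added here are restored when the block exits
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.log(x)  # log(0) would otherwise warn "divide by zero"


print(log_without_warnings(np.array([0.0, 1.0])))
```

For NumPy floating-point errors specifically, `np.errstate(divide='ignore')` is a more targeted alternative.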
almost there @bsdz :)
Add Levy Stable Parameter Estimation using McCulloch 1986 Quantiles method.
Add PDF and CDF calculation using Zolotarev's method and FFT estimate on Continuous Fourier
Integral of Characteristic Function.