ENH: improvements to the Stable distribution #9523

bsdz · 2018-11-22T23:42:16Z

Improve performance/stability for piecewise approach especially when alpha = 1.
Replace Mitnik FFT with Wang's better Simpson approach.
Correct location / scale for alpha = 1.
Introduce multiple parameterizations including original S1 and new S0.
Correct typo in rvs for alpha = 1.
Improve testing by using sample points based on percentiles.

ilayn · 2018-11-23T02:58:03Z

Thanks for the PR @bsdz. Do you mind adding some motivation for these replacements and how have been replaced?

bsdz · 2018-11-23T07:48:52Z

Do you mind adding some motivation for these replacements and how have been replaced?

Ah yes of course, there are basically 3 main points I wanted to address.

Firstly, for the piecewise approach we always had trouble when alpha==1. The integrand is continuous and monotonically increases from zero to 1/e before falling back to zero. The function is very peaked and many of our optimizers are unable to easily locate the peak. So now we follow Nolan's advice and use a root finder to locate this peak. We still have a problem with the integrand often losing support after the peak and this can create a long zero tail that quadpack doesn't like so I've added simple bound reducing loop (it's something I hope others might offer better solutions to).

This fix makes the "best" method choice, that selected between and "piecewise" and "dni" (direct numerical integration) based on whether alpha == 1 or not, redundant and so that's been deprecated.

Secondly, for the FFT implementation (used when one needs to generate many sample points for plotting, MLE, etc), there has been a better approach for a while and I explored this in a short paper (Stable Probability Densities using FFTs and Newton-Cote Rules in Scipy). Here a general approach is also derived with accuracy/performance measured and compared. From there I felt that moving from Mitnik's 1st order to Wang's 3rd order FFT approach is very beneficial. Although I have left the machinery in the code that allows one to experiment with higher orders for an increase in accuracy that also will come with a performance hit. This change makes the FFT much more usable with improved accuracy as should be visible in the test cases.

Finally, regarding the name deprecations, after reading more around the numerical considerations around Stable distributions, I felt that we should try and keep with the general community in terms of naming. So 'quadrature' method is now 'dni' which is short for direct numerical integration (of characteristic function). I replaced 'zolotarev' with 'piecewise' which was a mistake on my part from my original PR; I hadn't realised that the original author of that method was actually Nolan who used expansions derived originally by Zolotarev. I've chosen 'piecewise' as it represents the segmenting of the parameter domain and calculates each segment separately.

I also deprecated the auto switch over to FFT when number of requested sample points reached 'pdf_fft_min_points_threshold'. While using the module I realised that one actually tends to explicitly choose what method one wishes to use and since the FFT had been considered unstable in 1.2 the attribute fell out of use.

tylerjereddy · 2019-01-30T17:48:02Z

Strange Azure failures noted & cross-referenced from a similar issue recently observed with Azure CI in NumPy. Hopefully someone from the Microsoft team can give us some advice there.

ericsciple · 2019-01-31T16:51:16Z

@tylerjereddy i'm looking right now...

ericsciple · 2019-01-31T17:11:24Z

@tylerjereddy can you try vmImage (the file contains vmIMage)

I do see in our data, the issue you are describing. You are correct it went to the wrong pool.

However I can't reproduce the issue on my account, but I might be hitting a server with different binaries version.

Here is what I tried.

jobs:
- job: Linux
  pool:
    vmIMage: ubuntu-16.04
  steps:
  - script: pwd
- job: Windows
  pool:
    vmIMage: 'VS2017-Win2016'
  variables:
      OPENBLAS_32: https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com/openblas-5f998ef_gcc7_1_0_win32.zip
      OPENBLAS_64: https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com/openblas-5f998ef_gcc7_1_0_win64.zip
  strategy:
    maxParallel: 1
    matrix:
        Python37-64bit-full:
          PYTHON_VERSION: '3.7'
          PYTHON_ARCH: 'x64'
          TEST_MODE: full
          OPENBLAS: $(OPENBLAS_64)
          BITS: 64
  steps:
  - script: cd

tylerjereddy · 2019-01-31T18:30:46Z

@ericsciple Ok, see #9736

rgommers · 2019-03-01T07:40:51Z

For reference, this is a follow-up to gh-7374. Azure issues were resolved.

rgommers · 2019-03-01T07:43:44Z

Finally, regarding the name deprecations, after reading more around the numerical considerations around Stable distributions, I felt that we should try and keep with the general community in terms of naming.

It looks to me like it would be better to keep the old names around as aliases, rather than deprecating them. The rationale for deprecation is not very strong here.

bsdz · 2019-03-01T08:31:35Z

Finally, regarding the name deprecations, after reading more around the numerical considerations around Stable distributions, I felt that we should try and keep with the general community in terms of naming.

It looks to me like it would be better to keep the old names around as aliases, rather than deprecating them. The rationale for deprecation is not very strong here.

I'm happy to remove the deprecation warnings and update the docstring. Then that should mean the new/old names simply behave as aliases. If that sounds good I'll try and update this PR in the following week.

rgommers · 2019-03-01T17:32:59Z

yes that sounds good, thanks @bsdz

bsdz · 2019-03-14T17:34:37Z

I've updated now to remove deprecations. Would it be possible to add this PR to the 1.3 milestone?

Also I see one of the checks failing - pypy3. It seems unrelated. It's showing:

FAILED scipy/io/matlab/tests/test_streams.py::TestZlibInputStream::test_seek

There seems to be a similar failure on different recently updated PR.

rgommers · 2019-03-14T18:52:09Z

milestone added. please ignore the pypy CI failure, it was already resolved in master but CircleCI runs your actual PR commit rather than the merge of that with master

tylerjereddy

I've added a few comments. This is a rather big diff that I'd like a stats reviewer to confirm "ok" on. I'm unlikely to merge before next week otherwise.

In addition, the commit history could be condensed to remove the whitespace adjustment stuff & indeed there are still whitespace / formatting issues to clean up.

scipy/stats/_continuous_distns.py

scipy/stats/tests/test_distributions.py

WarrenWeckesser · 2019-04-19T01:30:53Z

It's always nice to see proposals to improve scipy, and if I had infinite time I'd dig into this, but I don't, so I won't be able to do a real review of this in the short term. Instead, I'll just be that guy and nag about PEP-8. 😝 Please keep all lines less than 80 characters.

Re-add changes. Reformat to less than 79 chars.

tylerjereddy

Ok, so I'm going to bump the milestone in absence of a resident stats reviewer giving the "ok."

A few other notes:

there seems to be a lot of hard work here & there's even a link to an exploratory whitepaper by the PR author
the commit messages should be prefixed with i.e., ENH and describe the change(s); currently, the commit messages are not informative wrt the main changes
are there no open issues related to the stability limitations addressed here?

tylerjereddy · 2019-04-23T20:05:21Z

scipy/stats/_continuous_distns.py

-                return np.cos(alpha*xi)**(1/(alpha-1)) * \
-                                (np.cos(theta)/np.sin(alpha*(xi+theta)))**(alpha/(alpha-1)) * \
-                                (np.cos(alpha*xi+(alpha-1)*theta)/np.cos(theta))
+                return np.cos(alpha * xi)**(1 / (alpha - 1)) * (


This function is a duplicate of V(theta) from _pdf_single_value_piecewise() in the same module

The crusade against duplication is probably sometimes a bit much, but just bringing this up.

I think perhaps on a later refactor this class could go in its own module and common code could be pushed up to the top.

Regarding amending commits with ENH prefixes, I can't really help there as I'm likely to get it wrong. Sorry.

tylerjereddy · 2019-04-23T20:08:04Z

scipy/stats/_continuous_distns.py

            if beta > 0:

                def V(theta):
-                    expr_1 = np.pi/2+beta*theta
-                    return 2. * expr_1 * np.exp(expr_1*np.tan(theta)/beta) / np.cos(theta) / np.pi
+                    expr_1 = np.pi / 2 + beta * theta


This function also gets duplicated between pdf and cdf versions. Again, duplication isn't always so terrible--just bringing this up.

tylerjereddy · 2019-11-12T03:11:47Z

@WarrenWeckesser @rlucas7 @josef-pkt Should I bump the milestone again?

WarrenWeckesser · 2019-11-12T15:42:12Z

@tylerjereddy, I won't have time to look at this before 1.4 is branched.

rlucas7 · 2019-11-12T21:09:45Z

I doubt I’ll have time between now and release to look through this one. There is a non-trivial amount of stuff to be understood in the PR and will/would take me a few days/weeks to get up to speed.

…

On Nov 12, 2019, at 10:42 AM, Warren Weckesser ***@***.***> wrote: @tylerjereddy, I won't have time to look at this before 1.4 is branched. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

ragibson · 2021-12-15T15:10:55Z

Actually, this is also sufficient if you already have scipy installed from that commit

>>> from scipy.stats import levy_stable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/__init__.py", line 453, in <module>
    from ._stats_py import *
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/_stats_py.py", line 44, in <module>
    from . import distributions
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/distributions.py", line 16, in <module>
    from ._levy_stable import levy_stable
ImportError: cannot import name 'levy_stable' from 'scipy.stats._levy_stable' (unknown location)

steppi · 2021-12-15T15:11:05Z

I see. I’ve been invoking tests with the pytest command directly and didn’t try the runtests.py script. I’ll try this out later today

steppi · 2021-12-15T15:12:59Z

Actually, this is also sufficient if you already have scipy installed from that commit

>>> from scipy.stats import levy_stable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/__init__.py", line 453, in <module>
    from ._stats_py import *
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/_stats_py.py", line 44, in <module>
    from . import distributions
  File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/distributions.py", line 16, in <module>
    from ._levy_stable import levy_stable
ImportError: cannot import name 'levy_stable' from 'scipy.stats._levy_stable' (unknown location)

Oh. This actually works on my machine. I’ve been installing in a fresh virtual environment with pip install -e . on Ubuntu 20.04 with an X86_64 architecture. Using a recent Python 3.9 that was built from source.

ragibson · 2021-12-15T15:32:57Z

I'm currently on x86_64 on a distro based on Ubuntu 18.04. It looks like it does work for me in a virtual environment with pip's -e flag.

There's no error on a fresh venv using pip install -e ., but I do get one on a fresh venv using pip install .. Pip's -e flag probably leaves some extra imports available (namely, 'scipy.stats._levy_stable' (unknown location)) for development purposes since it directly links to the checkout directory.

steppi · 2021-12-15T15:39:28Z

I'm currently on x86_64 on a distro based on Ubuntu 18.04. It looks like it does work for me in a virtual environment with pip's -e flag.

There's no error on a fresh venv using pip install -e ., but I do get one on a fresh venv using pip install .. Pip's -e flag probably leaves some extra imports available (namely, 'scipy.stats._levy_stable' (unknown location)) for development purposes since it directly links to the checkout directory.

I see. I’ll make a note to always test with pip install . before pushing. I didn’t realize this could be an issue. I guess this is why the src/ layout is recommended now for new projects. https://hynek.me/articles/testing-packaging/

steppi · 2021-12-18T15:45:44Z

@ragibson . I got it working by explicitly adding the line

    config.add_data_files(join('_levy_stable', '__init__.py'))

to scipy/stats/setup.py. There must be some quirk of the scipy build system that caused the __init__.py not to be included automatically. The build system is significantly more complex than any I'd worked with before. It seems kind of like an abuse to use add_data_files for this purpose. Do you think it's acceptable? If so, is it OK if I force push to this branch?

steppi · 2021-12-18T16:00:14Z

I found .py files being included with add_data_files in scipy/special/setup.py so I think this must be acceptable. I'll change the line to

    config.add_data_files(join('_levy_stable', '*.py'))

so there won't be problems down the line if someone decides to add another .py file to scipy/stats/_levy_stable/.

ragibson · 2021-12-18T16:32:52Z

There must be some quirk of the scipy build system that caused the __init__.py not to be included automatically. The build system is significantly more complex than any I'd worked with before. It seems kind of like an abuse to use add_data_files for this purpose.

I found .py files being included with add_data_files in scipy/special/setup.py so I think this must be acceptable.

Sounds good to me. Honestly, I've not yet been curious enough to look deeply into how scipy's setup works. I think the data files list is a feature of setuptools, but I have no idea as to why it's needed for .py files sometimes here.

If there's a more standard way to do this, perhaps a maintainer will complain and suggest a better method during review.

steppi · 2021-12-18T17:32:05Z

Sounds good to me. Honestly, I've not yet been curious enough to look deeply into how scipy's setup works. I think the data files list is a feature of setuptools, but I have no idea as to why it's needed for .py files sometimes here.

If there's a more standard way to do this, perhaps a maintainer will complain and suggest a better method during review.

Cool, looks like everything is working. Thanks @ragibson for reproducing the bug and helping me figure out the problem with using pip install -e . vs pip install . before testing.

@rgommers we got the consolidation of code into scipy/stats/_levy_stable/__init__.py working. Let us know if everything looks OK. I guess we should wait to merge this one until at least after the xslow marking PR, gh-15234, goes in.

rgommers

Great! That turned out to be harder than I expected, sorry if I caused you some headaches.

This now looks good to me. Please go ahead and merge @steppi. I added one comment about why add_data_files seemed necessary. You can ignore that, it's more an FYI - I suggest to open a new issue for that when you merge and Cc me, then I'll take it along when updating the Meson build for this PR.

rgommers · 2021-12-18T19:11:03Z

scipy/stats/setup.py

@@ -96,6 +96,18 @@ def configuration(parent_package='', top_path=None):
    # Type stubs
    config.add_data_files('*.pyi')

+    # add levy stable module
+    config.add_data_files(join('_levy_stable', '*.py'))


I think the reason this didn't work otherwise is that it's a separate submodule with an __init__.py, and those are normally treated with config.add_subpackage(like right above in thissetup.py`).

steppi · 2021-12-18T19:35:13Z

Great! That turned out to be harder than I expected, sorry if I caused you some headaches.

This now looks good to me. Please go ahead and merge @steppi. I added one comment about why add_data_files seemed necessary. You can ignore that, it's more an FYI - I suggest to open a new issue for that when you merge and Cc me, then I'll take it along when updating the Meson build for this PR.

Thanks @rgommers! issue incoming later today. And thanks @bsdz and @ragibson for all of the hard work you put into this one! I'm glad I was able to help you push it to finish line.

bsdz · 2021-12-18T20:59:24Z

Fantastic work @steppi and @ragibson! Thanks for getting it through and sorry to leave you guys to it for the last leg. Also thanks @rgommers for final merge :)

tupui · 2022-01-07T14:07:49Z

@steppi can you add a note in the wiki? https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.9.0

steppi · 2022-01-07T14:58:46Z

@steppi can you add a note in the wiki? https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.9.0

Sure thing. I should have a some time tomorrow.

tupui · 2022-01-07T15:16:51Z

Thanks 🙏😃

steppi · 2022-01-09T04:06:29Z

@tupui, I've written up a draft for the wiki. I plan to come back to it later for another round of editing. @ragibson, @bsdz, if either of you have time, let me know if you think everything looks OK or if you think I've left out anything important.

bsdz · 2022-01-09T07:20:55Z

@tupui, I've written up a draft for the wiki. I plan to come back to it later for another round of editing. @ragibson, @bsdz, if either of you have time, let me know if you think everything looks OK or if you think I've left out anything important.

Reads well to me. Covers the important improvements. Thanks @steppi

ragibson · 2022-01-09T13:31:35Z

@steppi Yes, this looks great to me -- good work! 👍

steppi · 2022-01-09T13:59:14Z

Thanks for taking a look. Looking it over again, I think the only that needs to change is to not mention the mean and variance when talking about "S0" being a proper location-scale parametrization. The variance only exists when alpha = 2 (normal distribution) and the mean only exists when alpha > 1.

* Create test_levy_stable_fitstart.py Test scipy.stats.levy_stable._fitstart() with data for negative beta, negative loc (delta), and alpha==1. * Update _continuous_distns.py Fix estimate of alpha when beta < 0, and allow negative values of loc (delta) * TST: Tidy tests and resolve NameError Uses numpy.testing.assert_allclose() instead of generic assert with numpy.isclose(). Removes named constants that were only used once, for the sake of simplicity. Removes one test of dubious utility. Resolves NameError caused by incorrect module reference. * BUG: Fix levy_stable._fitstart() Calculate alpha estimate when beta < 0, in style similar to the other parameters. Adjust delta estimate from zeta, when alpha is *not* 1, fixing apparent typo in comparator. * TST: levy_stable._fitstart() new tests TestLevyStable.test_fit_beta_flip() confirms that sign of beta affects loc, not alpha or scale. TestLevyStable.test_fit_delta_shift() confirms that loc slides up and down if data shifts. TestLevyStable.test_fit_loc_extrap() confirms that loc goes out of sample for alpha close to 1. * TST: delete test_levy_stable_fitstart.py Tests improved and relocated to test_distributions.py * STY: Resolve linter complaints Replace lambda expression with def. Wrap long line. * STY: Resolve linter complaints Removed whitespace on blank lines. * STY: Resolve PEP8 violations Following #9523. Replaced lambda expressions with def. Wrapped long lines. Removed trailing whitespace. * STY: Wrap long lines * TST: Update reason for test failure This is #14476 and the bug was not fixed. Co-authored-by: Matt Haberland <mhaberla@calpoly.edu> Co-authored-by: steppi <albert_steppi@hms.harvard.edu>

rgommers added scipy.stats maintenance Items related to regular maintenance tasks labels Nov 23, 2018

bsdz changed the title ~~Stable improvements~~ ENH: Stable improvements Nov 23, 2018

tylerjereddy mentioned this pull request Jan 30, 2019

WIP, TEST: set xcode to 10 via input variable numpy/numpy#12883

Closed

tylerjereddy mentioned this pull request Jan 31, 2019

TST: fix vmImage dispatch in Azure numpy/numpy#12903

Merged

rgommers changed the title ~~ENH: Stable improvements~~ ENH: improvements to the Stable distribution Mar 1, 2019

rgommers added this to the 1.3.0 milestone Mar 14, 2019

tylerjereddy reviewed Apr 18, 2019

View reviewed changes

scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved

scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved

scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved

scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved

Revert to branch base commit for future rebasing.

bb08be3

Re-add changes. Reformat to less than 79 chars.

bsdz force-pushed the stable-improvements branch from bf1294b to bb08be3 Compare April 22, 2019 13:44

fix pycodestyle warnings

d8b4dc7

tylerjereddy reviewed Apr 23, 2019

View reviewed changes

tylerjereddy modified the milestones: 1.3.0, 1.4.0 Apr 23, 2019

mdhaber mentioned this pull request Dec 17, 2021

TST: stats: mark very slow tests as xslow #15234

Merged

steppi added 2 commits December 18, 2021 08:23

MAINT: Change conditions to avoid divide by zero

17dace8

MAINT: Consolidate pdf_from_cf_with_fft

48593b4

MAINT: Consolidate levy_stable code into __init__

b075f87

steppi force-pushed the stable-improvements branch from b07fc4f to b075f87 Compare December 18, 2021 16:09

rgommers approved these changes Dec 18, 2021

View reviewed changes

steppi merged commit e569c7a into scipy:master Dec 18, 2021

steppi mentioned this pull request Dec 18, 2021

MAINT: scipy.stats._levy_stable should be treated as subpackage in scipy/stats/setup.py #15245

Closed

ragibson mentioned this pull request Dec 19, 2021

scipy.stats.levy_stable.pdf can be inaccurate and return nan #12658

Closed

steppi mentioned this pull request Jun 1, 2023

BUG: levy_stable.pdf does not use pdf_default_method_name attribute #18432

Closed

tylerjereddy mentioned this pull request Aug 13, 2023

MAINT: NumPy 1.25.x deprecations #19034

Merged

ENH: improvements to the Stable distribution #9523

ENH: improvements to the Stable distribution #9523

Conversation

bsdz commented Nov 22, 2018 • edited

ilayn commented Nov 23, 2018

bsdz commented Nov 23, 2018

tylerjereddy commented Jan 30, 2019

ericsciple commented Jan 31, 2019

ericsciple commented Jan 31, 2019

tylerjereddy commented Jan 31, 2019

rgommers commented Mar 1, 2019

rgommers commented Mar 1, 2019

bsdz commented Mar 1, 2019

rgommers commented Mar 1, 2019

bsdz commented Mar 14, 2019

rgommers commented Mar 14, 2019

tylerjereddy left a comment

Choose a reason for hiding this comment

WarrenWeckesser commented Apr 19, 2019

tylerjereddy left a comment

Choose a reason for hiding this comment

tylerjereddy Apr 23, 2019

Choose a reason for hiding this comment

bsdz Apr 23, 2019

Choose a reason for hiding this comment

tylerjereddy Apr 23, 2019

Choose a reason for hiding this comment

tylerjereddy commented Nov 12, 2019

WarrenWeckesser commented Nov 12, 2019

rlucas7 commented Nov 12, 2019 via email

ragibson commented Dec 15, 2021

steppi commented Dec 15, 2021

steppi commented Dec 15, 2021 • edited

ragibson commented Dec 15, 2021 • edited

steppi commented Dec 15, 2021 • edited

steppi commented Dec 18, 2021

steppi commented Dec 18, 2021 • edited

ragibson commented Dec 18, 2021

steppi commented Dec 18, 2021 • edited

rgommers left a comment

Choose a reason for hiding this comment

rgommers Dec 18, 2021

Choose a reason for hiding this comment

steppi commented Dec 18, 2021 • edited

bsdz commented Dec 18, 2021

tupui commented Jan 7, 2022

steppi commented Jan 7, 2022

tupui commented Jan 7, 2022

steppi commented Jan 9, 2022

bsdz commented Jan 9, 2022

ragibson commented Jan 9, 2022

steppi commented Jan 9, 2022

bsdz commented Nov 22, 2018 •

edited

steppi commented Dec 15, 2021 •

edited

ragibson commented Dec 15, 2021 •

edited

steppi commented Dec 15, 2021 •

edited

steppi commented Dec 18, 2021 •

edited

steppi commented Dec 18, 2021 •

edited

steppi commented Dec 18, 2021 •

edited