New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: improvements to the Stable distribution #9523
Conversation
Thanks for the PR @bsdz. Do you mind adding some motivation for these replacements and how have been replaced? |
Ah yes of course, there are basically 3 main points I wanted to address. Firstly, for the piecewise approach we always had trouble when alpha==1. The integrand is continuous and monotonically increases from zero to 1/e before falling back to zero. The function is very peaked and many of our optimizers are unable to easily locate the peak. So now we follow Nolan's advice and use a root finder to locate this peak. We still have a problem with the integrand often losing support after the peak and this can create a long zero tail that quadpack doesn't like so I've added simple bound reducing loop (it's something I hope others might offer better solutions to). This fix makes the "best" method choice, that selected between and "piecewise" and "dni" (direct numerical integration) based on whether alpha == 1 or not, redundant and so that's been deprecated. Secondly, for the FFT implementation (used when one needs to generate many sample points for plotting, MLE, etc), there has been a better approach for a while and I explored this in a short paper (Stable Probability Densities using FFTs and Newton-Cote Rules in Scipy). Here a general approach is also derived with accuracy/performance measured and compared. From there I felt that moving from Mitnik's 1st order to Wang's 3rd order FFT approach is very beneficial. Although I have left the machinery in the code that allows one to experiment with higher orders for an increase in accuracy that also will come with a performance hit. This change makes the FFT much more usable with improved accuracy as should be visible in the test cases. Finally, regarding the name deprecations, after reading more around the numerical considerations around Stable distributions, I felt that we should try and keep with the general community in terms of naming. So 'quadrature' method is now 'dni' which is short for direct numerical integration (of characteristic function). I replaced 'zolotarev' with 'piecewise' which was a mistake on my part from my original PR; I hadn't realised that the original author of that method was actually Nolan who used expansions derived originally by Zolotarev. I've chosen 'piecewise' as it represents the segmenting of the parameter domain and calculates each segment separately. I also deprecated the auto switch over to FFT when number of requested sample points reached 'pdf_fft_min_points_threshold'. While using the module I realised that one actually tends to explicitly choose what method one wishes to use and since the FFT had been considered unstable in 1.2 the attribute fell out of use. |
Strange Azure failures noted & cross-referenced from a similar issue recently observed with Azure CI in NumPy. Hopefully someone from the Microsoft team can give us some advice there. |
@tylerjereddy i'm looking right now... |
@tylerjereddy can you try I do see in our data, the issue you are describing. You are correct it went to the wrong pool. However I can't reproduce the issue on my account, but I might be hitting a server with different binaries version. Here is what I tried.
|
@ericsciple Ok, see #9736 |
For reference, this is a follow-up to gh-7374. Azure issues were resolved. |
It looks to me like it would be better to keep the old names around as aliases, rather than deprecating them. The rationale for deprecation is not very strong here. |
I'm happy to remove the deprecation warnings and update the docstring. Then that should mean the new/old names simply behave as aliases. If that sounds good I'll try and update this PR in the following week. |
yes that sounds good, thanks @bsdz |
I've updated now to remove deprecations. Would it be possible to add this PR to the 1.3 milestone? Also I see one of the checks failing - pypy3. It seems unrelated. It's showing:
There seems to be a similar failure on different recently updated PR. |
milestone added. please ignore the pypy CI failure, it was already resolved in master but CircleCI runs your actual PR commit rather than the merge of that with master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments. This is a rather big diff that I'd like a stats reviewer to confirm "ok" on. I'm unlikely to merge before next week otherwise.
In addition, the commit history could be condensed to remove the whitespace adjustment stuff & indeed there are still whitespace / formatting issues to clean up.
It's always nice to see proposals to improve scipy, and if I had infinite time I'd dig into this, but I don't, so I won't be able to do a real review of this in the short term. Instead, I'll just be that guy and nag about PEP-8. 😝 Please keep all lines less than 80 characters. |
Re-add changes. Reformat to less than 79 chars.
bf1294b
to
bb08be3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I'm going to bump the milestone in absence of a resident stats reviewer giving the "ok."
A few other notes:
- there seems to be a lot of hard work here & there's even a link to an exploratory whitepaper by the PR author
- the commit messages should be prefixed with i.e., ENH and describe the change(s); currently, the commit messages are not informative wrt the main changes
- are there no open issues related to the stability limitations addressed here?
scipy/stats/_continuous_distns.py
Outdated
return np.cos(alpha*xi)**(1/(alpha-1)) * \ | ||
(np.cos(theta)/np.sin(alpha*(xi+theta)))**(alpha/(alpha-1)) * \ | ||
(np.cos(alpha*xi+(alpha-1)*theta)/np.cos(theta)) | ||
return np.cos(alpha * xi)**(1 / (alpha - 1)) * ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a duplicate of V(theta)
from _pdf_single_value_piecewise()
in the same module
The crusade against duplication is probably sometimes a bit much, but just bringing this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps on a later refactor this class could go in its own module and common code could be pushed up to the top.
Regarding amending commits with ENH prefixes, I can't really help there as I'm likely to get it wrong. Sorry.
scipy/stats/_continuous_distns.py
Outdated
if beta > 0: | ||
|
||
def V(theta): | ||
expr_1 = np.pi/2+beta*theta | ||
return 2. * expr_1 * np.exp(expr_1*np.tan(theta)/beta) / np.cos(theta) / np.pi | ||
expr_1 = np.pi / 2 + beta * theta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function also gets duplicated between pdf
and cdf
versions. Again, duplication isn't always so terrible--just bringing this up.
@WarrenWeckesser @rlucas7 @josef-pkt Should I bump the milestone again? |
@tylerjereddy, I won't have time to look at this before 1.4 is branched. |
I doubt I’ll have time between now and release to look through this one. There is a non-trivial amount of stuff to be understood in the PR and will/would take me a few days/weeks to get up to speed.
… On Nov 12, 2019, at 10:42 AM, Warren Weckesser ***@***.***> wrote:
@tylerjereddy, I won't have time to look at this before 1.4 is branched.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Actually, this is also sufficient if you already have scipy installed from that commit >>> from scipy.stats import levy_stable
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/__init__.py", line 453, in <module>
from ._stats_py import *
File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/_stats_py.py", line 44, in <module>
from . import distributions
File "/home/ryan/.local/lib/python3.9/site-packages/scipy/stats/distributions.py", line 16, in <module>
from ._levy_stable import levy_stable
ImportError: cannot import name 'levy_stable' from 'scipy.stats._levy_stable' (unknown location) |
I see. I’ve been invoking tests with the pytest command directly and didn’t try the |
Oh. This actually works on my machine. I’ve been installing in a fresh virtual environment with |
I'm currently on x86_64 on a distro based on Ubuntu 18.04. It looks like it does work for me in a virtual environment with pip's There's no error on a fresh venv using |
I see. I’ll make a note to always test with |
@ragibson . I got it working by explicitly adding the line config.add_data_files(join('_levy_stable', '__init__.py')) to |
I found .py files being included with config.add_data_files(join('_levy_stable', '*.py')) so there won't be problems down the line if someone decides to add another |
b07fc4f
to
b075f87
Compare
Sounds good to me. Honestly, I've not yet been curious enough to look deeply into how scipy's setup works. I think the data files list is a feature of setuptools, but I have no idea as to why it's needed for If there's a more standard way to do this, perhaps a maintainer will complain and suggest a better method during review. |
Cool, looks like everything is working. Thanks @ragibson for reproducing the bug and helping me figure out the problem with using @rgommers we got the consolidation of code into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! That turned out to be harder than I expected, sorry if I caused you some headaches.
This now looks good to me. Please go ahead and merge @steppi. I added one comment about why add_data_files
seemed necessary. You can ignore that, it's more an FYI - I suggest to open a new issue for that when you merge and Cc me, then I'll take it along when updating the Meson build for this PR.
@@ -96,6 +96,18 @@ def configuration(parent_package='', top_path=None): | |||
# Type stubs | |||
config.add_data_files('*.pyi') | |||
|
|||
# add levy stable module | |||
config.add_data_files(join('_levy_stable', '*.py')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason this didn't work otherwise is that it's a separate submodule with an __init__.py, and those are normally treated with
config.add_subpackage(like right above in this
setup.py`).
Thanks @rgommers! issue incoming later today. And thanks @bsdz and @ragibson for all of the hard work you put into this one! I'm glad I was able to help you push it to finish line. |
@steppi can you add a note in the wiki? https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.9.0 |
Sure thing. I should have a some time tomorrow. |
Thanks 🙏😃 |
Reads well to me. Covers the important improvements. Thanks @steppi |
@steppi Yes, this looks great to me -- good work! 👍 |
Thanks for taking a look. Looking it over again, I think the only that needs to change is to not mention the mean and variance when talking about "S0" being a proper location-scale parametrization. The variance only exists when |
* Create test_levy_stable_fitstart.py Test scipy.stats.levy_stable._fitstart() with data for negative beta, negative loc (delta), and alpha==1. * Update _continuous_distns.py Fix estimate of alpha when beta < 0, and allow negative values of loc (delta) * TST: Tidy tests and resolve NameError Uses numpy.testing.assert_allclose() instead of generic assert with numpy.isclose(). Removes named constants that were only used once, for the sake of simplicity. Removes one test of dubious utility. Resolves NameError caused by incorrect module reference. * BUG: Fix levy_stable._fitstart() Calculate alpha estimate when beta < 0, in style similar to the other parameters. Adjust delta estimate from zeta, when alpha is *not* 1, fixing apparent typo in comparator. * TST: levy_stable._fitstart() new tests TestLevyStable.test_fit_beta_flip() confirms that sign of beta affects loc, not alpha or scale. TestLevyStable.test_fit_delta_shift() confirms that loc slides up and down if data shifts. TestLevyStable.test_fit_loc_extrap() confirms that loc goes out of sample for alpha close to 1. * TST: delete test_levy_stable_fitstart.py Tests improved and relocated to test_distributions.py * STY: Resolve linter complaints Replace lambda expression with def. Wrap long line. * STY: Resolve linter complaints Removed whitespace on blank lines. * STY: Resolve PEP8 violations Following #9523. Replaced lambda expressions with def. Wrapped long lines. Removed trailing whitespace. * STY: Wrap long lines * TST: Update reason for test failure This is #14476 and the bug was not fixed. Co-authored-by: Matt Haberland <mhaberla@calpoly.edu> Co-authored-by: steppi <albert_steppi@hms.harvard.edu>
Improve performance/stability for piecewise approach especially when alpha = 1.
Replace Mitnik FFT with Wang's better Simpson approach.
Correct location / scale for alpha = 1.
Introduce multiple parameterizations including original S1 and new S0.
Correct typo in rvs for alpha = 1.
Improve testing by using sample points based on percentiles.