-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: levy_stable _fitstart fixes #14476
BUG: levy_stable _fitstart fixes #14476
Conversation
Test scipy.stats.levy_stable._fitstart() with data for negative beta, negative loc (delta), and alpha==1.
Fix estimate of alpha when beta < 0, and allow negative values of loc (delta)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to a stats
regular on the actual fix--I just added a few "mechanical" comments re: the tests.
Note that there is an ongoing overhaul of the levy_stable code at #9523. This fix is partially correct, but is handled in a somewhat incorrect way (in terms of the current implementation structure). Note that McCulloch's paper shows how to handle negative values in their interpolations. Scipy's code is simply missing the negative # table III - alpha = psi_1(nu_alpha, nu_beta)
# table IV - beta = psi_2(nu_alpha, nu_beta)
# Table V - nu_c = psi_3(alpha, beta)
# Table VII - nu_zeta = psi_5(alpha, beta) scipy/scipy/stats/_continuous_distns.py Lines 5092 to 5099 in 88d4d94
Really, I think you should add psi_1_1 similar to the existing definition of psi_2_1 and then use it in place of psi_1 atscipy/scipy/stats/_continuous_distns.py Lines 5111 to 5116 in 88d4d94
I think you are also correct that the clipping of delta to [eps, infty) is incorrect here: scipy/scipy/stats/_continuous_distns.py Lines 5117 to 5121 in 88d4d94
Of course, the location of the distribution can be negative and I see no reason why the code does this unless negative delta is handled elsewhere and I missed it while skimming.
This is an unfortunate reality of the distribution itself in most common parameterizations (e.g. Zolotarev's / Nolan's S0). The mode of the distribution inherently diverges to infinity when delta is finite and alpha is close to (but not exactly) 1.0. McCulloch's paper has a few bits where this issue is mentioned, e.g. Note: "if alpha is insignificantly different from unity, the true value of delta could lie anywhere". That said, they do mention how to handle the case where alpha happens to be precisely unity and this is handled in the scipy/scipy/stats/_continuous_distns.py Line 5119 in 88d4d94
|
Uses numpy.testing.assert_allclose() instead of generic assert with numpy.isclose(). Removes named constants that were only used once, for the sake of simplicity. Removes one test of dubious utility. Resolves NameError caused by incorrect module reference.
@lutefiskhotdish I looked into this further and here are my suggested changes to the fix itself. I'm trying to keep the changes as minimal as reasonably possible so as to minimize the merging difficulties into the diff --git a/scipy/stats/_continuous_distns.py b/scipy/stats/_continuous_distns.py
index 7d779e22a..070738737 100644
--- a/scipy/stats/_continuous_distns.py
+++ b/scipy/stats/_continuous_distns.py
@@ -5090,6 +5090,7 @@ class levy_stable_gen(rv_continuous):
[0, -0.061, -0.279, -0.659, -1.198]]
psi_1 = interpolate.interp2d(nu_beta_range, nu_alpha_range, alpha_table, kind='linear')
+ psi_1_1 = lambda nu_beta, nu_alpha: psi_1(nu_beta, nu_alpha) if nu_beta > 0 else psi_1(-nu_beta, nu_alpha)
psi_2 = interpolate.interp2d(nu_beta_range, nu_alpha_range, beta_table, kind='linear')
psi_2_1 = lambda nu_beta, nu_alpha: psi_2(nu_beta, nu_alpha) if nu_beta > 0 else -psi_2(-nu_beta, nu_alpha)
@@ -5109,14 +5110,14 @@ class levy_stable_gen(rv_continuous):
nu_beta = (p95 + p05 - 2*p50)/(p95 - p05)
if nu_alpha >= 2.439:
- alpha = np.clip(psi_1(abs(nu_beta), nu_alpha)[0], np.finfo(float).eps, 2.)
+ alpha = np.clip(psi_1_1(nu_beta, nu_alpha)[0], np.finfo(float).eps, 2.)
beta = np.clip(psi_2_1(nu_beta, nu_alpha)[0], -1., 1.)
else:
alpha = 2.0
beta = np.sign(nu_beta)
c = (p75 - p25) / phi_3_1(beta, alpha)[0]
zeta = p50 + c*phi_5_1(beta, alpha)[0]
- delta = zeta-beta*c*np.tan(np.pi*alpha/2.) if alpha == 1. else zeta
+ delta = zeta-beta*c*np.tan(np.pi*alpha/2.) if alpha != 1. else zeta
return (alpha, beta, delta, c) As mentioned in #14476 (comment), note that the previous code mixes the cases for the delta estimate. @lutefiskhotdish With the above patch, your tests still pass $ python runtests.py -v -m full -t scipy/stats/tests/test_levy_stable_fitstart.py
Build OK (0:06:09.884302 elapsed)
================================================ test session starts =================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3.9
cachedir: .pytest_cache
rootdir: /home/ryan/PycharmProjects/scipy_stable/scipy, configfile: pytest.ini
collected 3 items
scipy/stats/tests/test_levy_stable_fitstart.py::TestLevy_stable::test_neg_beta PASSED [ 33%]
scipy/stats/tests/test_levy_stable_fitstart.py::TestLevy_stable::test_neg_delta PASSED [ 66%]
scipy/stats/tests/test_levy_stable_fitstart.py::TestLevy_stable::test_delta_shift PASSED [100%]
================================================= 3 passed in 0.19s ================================================== and the estimates are much more accurate >>> from scipy.stats import levy_stable
>>>
>>> SIZE = 5 * 10 ** 7
>>> rvs = levy_stable.rvs(alpha=1.75, beta=0.5, loc=2.0, scale=5.0, size=SIZE)
>>> print(levy_stable._fitstart(rvs))
(1.7533970685117124, 0.554005532981971, 2.058881157647261, 4.990065245947207)
>>>
>>> rvs = levy_stable.rvs(alpha=1.75, beta=-0.5, loc=2.0, scale=5.0, size=SIZE)
>>> print(levy_stable._fitstart(rvs))
(1.7535113451206517, -0.5563912462386369, 1.937892365646908, 4.989910501051077)
>>>
>>> rvs = levy_stable.rvs(alpha=1.75, beta=0.5, loc=-2.0, scale=5.0, size=SIZE)
>>> print(levy_stable._fitstart(rvs))
(1.7530215468936947, 0.5526121180489312, -1.9417311451832033, 4.989026437213537)
>>>
>>> rvs = levy_stable.rvs(alpha=1.75, beta=-0.5, loc=-2.0, scale=5.0, size=SIZE)
>>> print(levy_stable._fitstart(rvs))
(1.7529422179586065, -0.5530799346384316, -2.05956703533164, 4.989271645373516) These tests estimate loc ~ 0.9 in your code and scipy's current implementation. @lutefiskhotdish You should also move your tests near the existing tests for scipy/scipy/stats/tests/test_distributions.py Lines 3028 to 3052 in affdc4d
perhaps with more specific names like test_fit_neg_beta() , test_fit_neg_delta() , and test_fit_delta_shift() .
The tests in that location set random seeds with I also think you might want to use |
Calculate alpha estimate when beta < 0, in style similar to the other parameters. Adjust delta estimate from zeta, when alpha is *not* 1, fixing apparent typo in comparator.
TestLevyStable.test_fit_beta_flip() confirms that sign of beta affects loc, not alpha or scale. TestLevyStable.test_fit_delta_shift() confirms that loc slides up and down if data shifts. TestLevyStable.test_fit_loc_extrap() confirms that loc goes out of sample for alpha close to 1.
Tests improved and relocated to test_distributions.py
@ragibson Thanks. I think I've taken your advice on board in the latest commits but let me know if I've missed anything or if you have additional suggestsions. |
@lutefiskhotdish This looks mostly good to me. The CI linter does complain about a few formatting items:
After this, it looks like all the CI tests will pass, so you might have to go ahead and format the legacy code in I wonder if the scipy maintainers would be okay with disabling the lint check for those few lines in |
Replace lambda expression with def. Wrap long line.
Removed whitespace on blank lines.
@ragibson I think the two commits I've just done make the bug fixes PEP8 compliant. Am I to understand I should fix PEP8 violations in legacy code before BUG fixes can be taken onboard? |
@lutefiskhotdish Typically you try to get the CI tests to pass before a PR is merged. Scipy has a tool that checks the diff for PEP8 violations, so editing legacy code (that is not PEP8 compliant) means you inherit those formatting errors from the perspective of the CI tests. I see a few remaining PEP8 violations from your edits:
For reference, this is from the CI failure at https://dev.azure.com/scipy-org/SciPy/_build/results?buildId=13326&view=logs&j=fdd89725-f657-5341-7d77-3c5c44c4f5b6&t=325fc03e-e0b9-5339-adaf-099a49a14600 and you can run similar checks locally with e.g. $ python runtests.py -v --pep8 which gives me the same three warnings. There are some details on setting up your local environment correctly for this, but if you're interested, see https://scipy.github.io/devdocs/dev/contributor/runtests.html and https://scipy.github.io/devdocs/dev/contributor/pep8.html. |
Following scipy#9523. Replaced lambda expressions with def. Wrapped long lines. Removed trailing whitespace.
@ragibson Showing successful checks now so that must be progress, and now I know how to follow the trail of obscure links to find failing lint output.... |
@ragibson Was this one close enough to try to resolve the merge conflicts? |
Yes, the edits to scipy/scipy/stats/_levy_stable/__init__.py Lines 488 to 629 in a574519
and the edits in scipy/stats/tests/test_distributions.py will probably have no conflicts since they can still be placed right after levy_stable 's test_fit() scipy/scipy/stats/tests/test_distributions.py Lines 3293 to 3295 in 7818647
|
Hmmm. Many of the changes in this PR were already present. You can compare by going back to this commit and comparing against I can make fixes if need be, but I'll let @steppi merge this one if it looks good. |
Thanks. I should have time to take a look tomorrow. |
This is scipy#14476 and the bug was not fixed.
This is scipy#14476 and the bug was not fixed.
fcf1ca8
to
bea64a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I pushed one commit to change the xfail
reason for test_fit_rvs
. I had also written the previous reason; it referred to this PR as a possible fix and was written before I looked into what this PR is actually doing. The test failure is an unrelated timeout. @ragibson, if you have no objections, I'll merge this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks good to me. The main fixes were addressing the missing negative sign in psi_1 and removing the clipping of delta to [eps, inf) since that parameter can definitely be negative.
The extra tests seem reasonable as well if they're passing.
Thanks @lutefiskhotdish and reviewers! |
Reference issue
See #14470
What does this implement/fix?
Adds new tests for levy_stable._fitstart(), and fixes bugs so the tests now pass.
Additional information
There is still something flaky about the calculation of the location parameter (delta) when alpha==1, as it tries to calculate tan(pi/2) -> infinity. In practice, it will be very rare that alpha==1. when estimated from data samples.