
DEV: stats: check for distribution/method keyword name collisions #13490

Merged
merged 23 commits into from
Feb 3, 2022


mdhaber (Contributor) commented Feb 2, 2021

Reference issue

gh-5982

What does this implement/fix?

There are name collisions between scipy.stats distribution shape parameters and method parameters. For example, alpha is the name of a shape parameter of the levy_stable distribution, and it is also the name of a parameter of the stats.rv_continuous.interval method, so calling levy_stable.interval with the shapes passed as keyword arguments causes a conflict.
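To make the collision concrete, here is a minimal sketch using a hypothetical stand-in class (`ToyDist` is not a SciPy class). It assumes the pre-change signature `interval(self, alpha, *args, **kwds)`, in which `alpha` is the confidence level and shape parameters travel through `*args`/`**kwds`.

```python
class ToyDist:
    """Hypothetical stand-in for a distribution with shapes 'alpha, beta'."""
    shapes = "alpha, beta"

    def interval(self, alpha, *args, **kwds):
        # `alpha` is the method's own parameter (the confidence level);
        # shape parameters are expected in *args / **kwds.
        return {"confidence": alpha, "args": args, "kwds": kwds}


dist = ToyDist()

# Confidence level positional, shape `alpha` by keyword: Python cannot bind
# the same name twice, so the call fails with a TypeError.
try:
    dist.interval(0.95, alpha=1.5, beta=0.0)
    collision_error = None
except TypeError as exc:
    collision_error = str(exc)
print(collision_error)

# Confidence level omitted: the shape value is silently taken as the
# confidence level, and the shape `alpha` never reaches the method.
silent = dist.interval(alpha=1.5, beta=0.0)
print(silent)  # {'confidence': 1.5, 'args': (), 'kwds': {'beta': 0.0}}
```

The second case is arguably worse than the first, because nothing fails loudly.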

This PR adds a check for such collisions to the refguide check.
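A minimal sketch of how such a check could work, assuming it compares a distribution's `shapes` string with the named parameters of each method via `inspect.signature`. This is not the code from refguide_check.py; `keyword_collisions` is a hypothetical helper, and the two method signatures are stand-ins modelled on the old `interval(self, alpha, ...)` and `moment(self, n, ...)` signatures.

```python
import inspect


def keyword_collisions(shapes, method):
    """Return shape names that are also named parameters of `method`.

    `shapes` is a comma-separated string such as "alpha, beta" (the format
    of the `shapes` attribute of scipy.stats distributions) or None.
    """
    shape_names = {s.strip() for s in (shapes or "").split(",") if s.strip()}
    params = inspect.signature(method).parameters.values()
    # Only parameters that can be passed by keyword can collide.
    named = {p.name for p in params
             if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)}
    return shape_names & named


# Hypothetical old-style method signatures.
def interval(self, alpha, *args, **kwds): ...
def moment(self, n, *args, **kwds): ...


print(keyword_collisions("alpha, beta", interval))  # {'alpha'}
print(keyword_collisions("n, p", moment))          # {'n'}
print(keyword_collisions("a, b", moment))          # set()
```

Looping such a function over every distribution and public method, and reporting non-empty results, would yield messages like `Distribution/method keyword collision: {'alpha'}`.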

Incidentally, it also finds distribution methods for which the parameters are not documented. This is the case for some of the overridden fit methods. For example:

>>> help(norm.fit)
Help on method wrapper in module scipy.stats._continuous_distns:

wrapper(*args, **kwds) method of scipy.stats._continuous_distns.norm_gen instance
    # if fit method is overridden only for MLE and doens't specify what to do
    # if method == 'mm', this decorator calls generic implementation

>>> help(truncnorm.fit)
Help on method fit in module scipy.stats._distn_infrastructure:

fit(data, *args, **kwds) method of scipy.stats._continuous_distns.truncnorm_gen instance
    Return estimates of shape (if applicable), location, and scale...
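The `help(norm.fit)` output above (a `wrapper(*args, **kwds)` signature with only source comments where a docstring should be) is what one would expect from a decorator whose inner function is not wrapped with `functools.wraps`. The sketch below is a hypothetical illustration of that mechanism, not SciPy's actual decorator: without `functools.wraps`, the wrapper keeps its own name, has no docstring and exposes a generic signature; with it, `__doc__` is copied and `inspect.signature` follows `__wrapped__` back to the original parameters.

```python
import functools
import inspect


def fit(data, *args, **kwds):
    """Return estimates of shape (if applicable), location, and scale."""
    return data


def plain_decorator(func):
    # No functools.wraps: the returned function has its own name, no
    # docstring, and the generic (*args, **kwds) signature.
    def wrapper(*args, **kwds):
        return func(*args, **kwds)
    return wrapper


def wrapping_decorator(func):
    # functools.wraps copies __name__, __doc__, etc. and sets __wrapped__,
    # which inspect.signature follows to report the original parameters.
    @functools.wraps(func)
    def wrapper(*args, **kwds):
        return func(*args, **kwds)
    return wrapper


plain = plain_decorator(fit)
wrapped = wrapping_decorator(fit)
print(plain.__name__, plain.__doc__, inspect.signature(plain))
# wrapper None (*args, **kwds)
print(wrapped.__name__, inspect.signature(wrapped))
# fit (data, *args, **kwds)
```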

Additional information

I've not touched refguide_check.py or any other tools before, so I don't know what I'm doing.

mdhaber (Contributor Author) commented Feb 2, 2021

The refguide_asv_check failure is exactly what this PR is intended to cause (until we address the name collisions / missing documentation):

2021-02-02T07:09:25.1188503Z ===========
2021-02-02T07:09:25.1189515Z scipy.stats
2021-02-02T07:09:25.1190507Z ===========
2021-02-02T07:09:25.1191264Z 
2021-02-02T07:09:25.1192272Z stats.uniform.fit
2021-02-02T07:09:25.1194105Z -----------------
2021-02-02T07:09:25.1194976Z 
2021-02-02T07:09:25.1196033Z Method parameters are not documented.
2021-02-02T07:09:25.1196881Z 
2021-02-02T07:09:25.1197906Z stats.levy_stable.interval
2021-02-02T07:09:25.1200465Z --------------------------
2021-02-02T07:09:25.1204013Z 
2021-02-02T07:09:25.1205476Z Distribution/method keyword collision: {'alpha'}
2021-02-02T07:09:25.1206074Z 
2021-02-02T07:09:25.1206721Z stats.gumbel_l.fit
2021-02-02T07:09:25.1207708Z ------------------
2021-02-02T07:09:25.1208173Z 
2021-02-02T07:09:25.1208871Z Method parameters are not documented.
2021-02-02T07:09:25.1209369Z 
2021-02-02T07:09:25.1209996Z stats.beta.fit
2021-02-02T07:09:25.1210922Z --------------
2021-02-02T07:09:25.1211400Z 
2021-02-02T07:09:25.1212071Z Method parameters are not documented.
2021-02-02T07:09:25.1212555Z 
2021-02-02T07:09:25.1213179Z stats.erlang.fit
2021-02-02T07:09:25.1214123Z ----------------
2021-02-02T07:09:25.1214609Z 
2021-02-02T07:09:25.1215746Z Method parameters are not documented.
2021-02-02T07:09:25.1216287Z 
2021-02-02T07:09:25.1216903Z stats.gumbel_r.fit
2021-02-02T07:09:25.1217917Z ------------------
2021-02-02T07:09:25.1218388Z 
2021-02-02T07:09:25.1219057Z Method parameters are not documented.
2021-02-02T07:09:25.1219569Z 
2021-02-02T07:09:25.1220181Z stats.invgauss.fit
2021-02-02T07:09:25.1221159Z ------------------
2021-02-02T07:09:25.1221624Z 
2021-02-02T07:09:25.1222319Z Method parameters are not documented.
2021-02-02T07:09:25.1222807Z 
2021-02-02T07:09:25.1223428Z stats.pareto.fit
2021-02-02T07:09:25.1224365Z ----------------
2021-02-02T07:09:25.1224827Z 
2021-02-02T07:09:25.1225570Z Method parameters are not documented.
2021-02-02T07:09:25.1226057Z 
2021-02-02T07:09:25.1227792Z stats.nbinom.moment
2021-02-02T07:09:25.1231165Z -------------------
2021-02-02T07:09:25.1232623Z 
2021-02-02T07:09:25.1234015Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1234593Z 
2021-02-02T07:09:25.1235222Z stats.norm.fit
2021-02-02T07:09:25.1236167Z --------------
2021-02-02T07:09:25.1236652Z 
2021-02-02T07:09:25.1237325Z Method parameters are not documented.
2021-02-02T07:09:25.1237844Z 
2021-02-02T07:09:25.1238455Z stats.ksone.moment
2021-02-02T07:09:25.1239453Z ------------------
2021-02-02T07:09:25.1239924Z 
2021-02-02T07:09:25.1242824Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1243617Z 
2021-02-02T07:09:25.1244284Z stats.zipfian.moment
2021-02-02T07:09:25.1245297Z --------------------
2021-02-02T07:09:25.1245782Z 
2021-02-02T07:09:25.1246852Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1247431Z 
2021-02-02T07:09:25.1248052Z stats.laplace.fit
2021-02-02T07:09:25.1249026Z -----------------
2021-02-02T07:09:25.1249499Z 
2021-02-02T07:09:25.1250192Z Method parameters are not documented.
2021-02-02T07:09:25.1250685Z 
2021-02-02T07:09:25.1251314Z stats.rayleigh.fit
2021-02-02T07:09:25.1252272Z ------------------
2021-02-02T07:09:25.1252745Z 
2021-02-02T07:09:25.1253480Z Method parameters are not documented.
2021-02-02T07:09:25.1253983Z 
2021-02-02T07:09:25.1254611Z stats.hypergeom.moment
2021-02-02T07:09:25.1255772Z ----------------------
2021-02-02T07:09:25.1256314Z 
2021-02-02T07:09:25.1257384Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1257971Z 
2021-02-02T07:09:25.1258583Z stats.wald.fit
2021-02-02T07:09:25.1259529Z --------------
2021-02-02T07:09:25.1259997Z 
2021-02-02T07:09:25.1260662Z Method parameters are not documented.
2021-02-02T07:09:25.1261180Z 
2021-02-02T07:09:25.1261796Z stats.betabinom.moment
2021-02-02T07:09:25.1262841Z ----------------------
2021-02-02T07:09:25.1274055Z 
2021-02-02T07:09:25.1275216Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1275573Z 
2021-02-02T07:09:25.1275941Z stats.expon.fit
2021-02-02T07:09:25.1276509Z ---------------
2021-02-02T07:09:25.1276772Z 
2021-02-02T07:09:25.1277172Z Method parameters are not documented.
2021-02-02T07:09:25.1277450Z 
2021-02-02T07:09:25.1277809Z stats.lognorm.fit
2021-02-02T07:09:25.1278357Z -----------------
2021-02-02T07:09:25.1278620Z 
2021-02-02T07:09:25.1279018Z Method parameters are not documented.
2021-02-02T07:09:25.1279291Z 
2021-02-02T07:09:25.1279652Z stats.pearson3.fit
2021-02-02T07:09:25.1280190Z ------------------
2021-02-02T07:09:25.1280477Z 
2021-02-02T07:09:25.1281074Z Method parameters are not documented.
2021-02-02T07:09:25.1281386Z 
2021-02-02T07:09:25.1281760Z stats.yulesimon.interval
2021-02-02T07:09:25.1282353Z ------------------------
2021-02-02T07:09:25.1282631Z 
2021-02-02T07:09:25.1283237Z Distribution/method keyword collision: {'alpha'}
2021-02-02T07:09:25.1283586Z 
2021-02-02T07:09:25.1283935Z stats.kstwo.moment
2021-02-02T07:09:25.1284487Z ------------------
2021-02-02T07:09:25.1284752Z 
2021-02-02T07:09:25.1285354Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1285678Z 
2021-02-02T07:09:25.1286025Z stats.binom.moment
2021-02-02T07:09:25.1286580Z ------------------
2021-02-02T07:09:25.1286840Z 
2021-02-02T07:09:25.1287442Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1287942Z 
2021-02-02T07:09:25.1288304Z stats.nhypergeom.moment
2021-02-02T07:09:25.1288871Z -----------------------
2021-02-02T07:09:25.1289146Z 
2021-02-02T07:09:25.1289735Z Distribution/method keyword collision: {'n'}
2021-02-02T07:09:25.1290056Z 
2021-02-02T07:09:25.1290412Z stats.logistic.fit
2021-02-02T07:09:25.1290944Z ------------------
2021-02-02T07:09:25.1291224Z 
2021-02-02T07:09:25.1291599Z Method parameters are not documented.
2021-02-02T07:09:25.1291876Z 
2021-02-02T07:09:25.1292104Z 
2021-02-02T07:09:25.1292479Z ERROR: refguide or doctests have errors
2021-02-02T07:09:25.3206153Z ##[error]Bash exited with code '1'.
2021-02-02T07:09:25.3219134Z ##[section]Finishing: Refguide Check
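The other failure in this log, "Method parameters are not documented.", can be approximated by comparing a method's signature with the names listed in its numpydoc "Parameters" section. The sketch below is a simplified, hypothetical check (not the actual refguide_check.py logic; all function names are illustrative). It treats `*args`/`**kwds` as parameters that need an entry, so a bare `wrapper(*args, **kwds)` with no docstring is flagged.

```python
import inspect


def documented_parameter_names(doc):
    """Collect names listed in the numpydoc 'Parameters' section of `doc`."""
    lines = doc.splitlines()
    names = set()
    in_params = False
    for i, line in enumerate(lines):
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A section header is a non-blank line underlined with dashes.
        if line.strip() and nxt and set(nxt) == {"-"}:
            in_params = line.strip() == "Parameters"
            continue
        is_underline = bool(line.strip()) and set(line.strip()) == {"-"}
        # Parameter entries start in column 0, e.g. "loc, scale : float".
        if in_params and line and not line[0].isspace() and not is_underline:
            for name in line.split(":")[0].split(","):
                names.add(name.strip().lstrip("*"))
    return names


def undocumented_parameters(func):
    """Return parameter names of `func` missing from its docstring."""
    documented = documented_parameter_names(inspect.getdoc(func) or "")
    params = inspect.signature(func).parameters
    return {name for name in params if name != "self"} - documented


def fit_partial(data, method="MLE"):
    """Fit a model.

    Parameters
    ----------
    data : array_like
        Input data.
    """
    return data


def wrapper(*args, **kwds):
    return None


print(undocumented_parameters(fit_partial))  # {'method'}
print(sorted(undocumented_parameters(wrapper)))  # ['args', 'kwds']
```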

mdhaber (Contributor Author) commented Sep 3, 2021

@rgommers I guess I didn't ping you on this before. Since you suggested this originally, I thought you might be interested. Is this what you had in mind for the test? In the original suggestion, you mentioned changing the names and documenting the backwards incompatibility in the release notes - but would there be a deprecation message for a few versions?
I understand if there are higher priority things; this is not at the top of my list.

rgommers (Member) commented Sep 3, 2021

Since you suggested this originally, I thought you might be interested. Is this what you had in mind for the test?

I have absolutely no memory of that anymore.

It looks like the "Method parameters are not documented" failures should be easy to fix. The renames are a bit harder. I agree that those require a regular deprecation.

mdhaber (Contributor Author) commented Sep 3, 2021

I have absolutely no memory of that anymore.

Or maybe there are two of you :) That would also explain how all the work gets done!

In case you like to keep tabs on what other-you is saying, here's the post I meant.

OK, maybe I'll chip away at this. Would be good to fix this sort of thing before we call something 2.0.
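The point above that the renames need a regular deprecation can be sketched for the `n` collision in `moment`. This is a minimal sketch, assuming the replacement keyword is `order` (the name that appears in the `moment(self, order=None, *args, **kwds)` signature later in this thread); `ToyDist`, `shape_names` and the warning text are hypothetical, not SciPy's code. The old keyword is only reinterpreted when the distribution has no shape called `n`.

```python
import warnings


class ToyDist:
    """Hypothetical distribution with the given shape parameter names."""

    def __init__(self, *shape_names):
        self.shape_names = shape_names

    def moment(self, order=None, *args, **kwds):
        # Accept the old keyword `n` for a deprecation period, but only when
        # `n` cannot be a shape parameter of this distribution.
        if "n" in kwds and "n" not in self.shape_names:
            n = kwds.pop("n")
            if order is not None:
                raise TypeError("moment() got both `order` and `n`")
            warnings.warn("keyword argument `n` of `moment` is deprecated; "
                          "use `order` instead", DeprecationWarning,
                          stacklevel=2)
            order = n
        if order is None:
            raise TypeError("moment() missing required argument: 'order'")
        # A real method would compute the moment here; the sketch just
        # reports how the arguments were interpreted.
        return order, args, kwds


norm_like = ToyDist()            # no shape parameters
binom_like = ToyDist("n", "p")   # has a shape parameter named `n`

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = norm_like.moment(n=2)
print(old_style, [w.category.__name__ for w in caught])
# (2, (), {}) ['DeprecationWarning']

print(binom_like.moment(order=1, n=10, p=0.5))
# (1, (), {'n': 10, 'p': 0.5})
```

Old positional calls such as `binom_like.moment(1, 10, 0.5)` keep working unchanged, because the first positional argument has always been the order.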

mdhaber (Contributor Author) commented Sep 4, 2021

Rebased on master, and here are the current failures.

===========
scipy.stats
===========

stats.ksone.moment
------------------

Distribution/method keyword collision: {'n'}

stats.pareto.fit
----------------

Method parameters are not documented.

stats.binom.moment
------------------

Distribution/method keyword collision: {'n'}

stats.nbinom.moment
-------------------

Distribution/method keyword collision: {'n'}

stats.zipfian.moment
--------------------

Distribution/method keyword collision: {'n'}

stats.hypergeom.moment
----------------------

Distribution/method keyword collision: {'n'}

stats.nchypergeom_wallenius.moment
----------------------------------

Distribution/method keyword collision: {'n'}

stats.invgauss.fit
------------------

Method parameters are not documented.

stats.gumbel_r.fit
------------------

Method parameters are not documented.

stats.levy_stable.interval
--------------------------

Distribution/method keyword collision: {'alpha'}

stats.betabinom.moment
----------------------

Distribution/method keyword collision: {'n'}

stats.logistic.fit
------------------

Method parameters are not documented.

stats.gumbel_l.fit
------------------

Method parameters are not documented.

stats.yulesimon.interval
------------------------

Distribution/method keyword collision: {'alpha'}

stats.kstwo.moment
------------------

Distribution/method keyword collision: {'n'}

stats.nchypergeom_fisher.moment
-------------------------------

Distribution/method keyword collision: {'n'}

stats.erlang.fit
----------------

Method parameters are not documented.

stats.wald.fit
--------------

Method parameters are not documented.

stats.nhypergeom.moment
-----------------------

Distribution/method keyword collision: {'n'}

mdhaber (Contributor Author) commented Sep 7, 2021

Doc build issue looks unrelated to me.

/home/circleci/repo/doc/source/tutorial/interpolate.rst:503: WARNING: Exception occurred in plotting interpolate-8
 from /home/circleci/repo/doc/source/tutorial/interpolate.rst:

I don't see anything else awry there. Let's see if it's happening in other PRs.

mdhaber marked this pull request as ready for review September 7, 2021 22:22
mdhaber closed this Dec 17, 2021
mdhaber reopened this Dec 17, 2021
mdhaber (Contributor Author) commented Jan 23, 2022

Only a few issues have come up since September, all in levy_stable.

===========
scipy.stats
===========

stats.levy_stable.cdf
---------------------

Method parameters are not documented properly.

stats.levy_stable.pdf
---------------------

Method parameters are not documented properly.

stats.levy_stable.rvs
---------------------

Method parameters are not documented properly.

I'll take a look at them. Looks like it's because the public methods were overridden.
Update: manually added docstrings. I considered setting them programmatically (e.g. levy_stable.pdf.__doc__ = rv_continuous.pdf.__doc__), which would also satisfy the refguide check, but I imagine there's something wrong with that. It probably wouldn't matter, since these docstrings don't actually render anywhere...
Second Update: changed my mind. I used the inherit_docstring_from decorator instead to keep things consistent. This is not the right PR for it, but at some point we need distribution-specific docstrings for distribution methods.
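The two options above can be compared in a small sketch. `inherit_doc_from` below is a hypothetical, minimal decorator written for illustration; SciPy's own `inherit_docstring_from` helper may behave differently. The sketch also shows one concrete problem with the programmatic route as written: `levy_stable` is an instance, so `levy_stable.pdf` is a bound method, and a bound method's `__doc__` cannot be assigned (the assignment would have to target the function on the class instead).

```python
def inherit_doc_from(base):
    """Hypothetical decorator: reuse the docstring of base.<same name>."""
    def decorator(func):
        if func.__doc__ is None:
            func.__doc__ = getattr(base, func.__name__).__doc__
        return func
    return decorator


class BaseDist:
    def pdf(self, x, *args, **kwds):
        """Probability density function at x of the given RV."""
        return 0.0


class StableDist(BaseDist):
    # Overriding a public method drops the base docstring unless it is copied.
    @inherit_doc_from(BaseDist)
    def pdf(self, x, *args, **kwds):
        return 1.0


instance = StableDist()
print(instance.pdf.__doc__)  # Probability density function at x of the given RV.

# Assigning through the bound method of an instance does not work:
# method objects expose __doc__ as a read-only attribute.
try:
    instance.pdf.__doc__ = "new docstring"
    bound_doc_assignable = True
except AttributeError:
    bound_doc_assignable = False
print(bound_doc_assignable)  # False
```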

I thought I sent an email to the mailing list in September, but I looked back and it was rejected because I sent it from the wrong address. I just resent.

mdhaber added the maintenance label Jan 24, 2022
def moment(self, order=None, *args, **kwds):
"""non-central moment of distribution of specified order.

.. deprecated:: 1.8.0
Contributor Author

Looks like this will need to change to 1.9.0 everywhere, and 1.10.0 will need to be 1.11.0

Member

In recent deprecations we put 2.0. I am not sure whether we have decided yet to have a 1.10 or go directly to 2.

Contributor Author

I'm not sure what the plan is there. We could say "two releases" instead of assigning it a number.

Member

How about "deprecated in 1.9.0 and will be removed two releases after"

Member

Fine with me 👍

Contributor Author

Sounds good. I'll go ahead and update the version numbers tomorrow.

Contributor Author

Or today. Should be all set now after 547d1bb.

mdhaber (Contributor Author) commented Feb 3, 2022

@ev-br Wanted to make sure I understood - you suggested we "merge it for 1.9.x early in the release cycle." Does that mean we are waiting for something?

Member

Quite the reverse: the suggestion is to merge it once the deprecation wording is tweaked. And since it now is, I pressed the green button.

Contributor Author

Thanks @ev-br!

ev-br (Member) left a comment

Wow. This is epic. This can be made a poster example of "what you may need to do if you really care about backcompat".

A possible minor tweak could be to craft the wording of what is 1.9.0 + two releases, but I'd suggest we merge it as is, early in the 1.9.x release cycle and craft wording iteratively.

ev-br merged commit a574519 into scipy:main Feb 3, 2022
ev-br (Member) commented Feb 3, 2022

Thanks @mdhaber, great to see this loooong-standing bug fixed!

Labels: maintenance, scipy.stats
4 participants