Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: stats: frozen distribution attributes a and b should consider parameters #13228

Closed
wants to merge 2 commits into from

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Dec 11, 2020

Reference issue

Closes gh-13225

What does this implement/fix?

The a and b attributes of frozen distributions currently hold the lower and upper bounds of the support of the unscaled, unshifted distribution. It seems that they should hold the support of the distribution considering the location, scale, and shape parameters.

@mdhaber mdhaber changed the title MAINT: stats: frozen distribution attributes a and b should consider loc and scale MAINT: stats: frozen distribution attributes a and b should consider parameters Dec 11, 2020
@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 11, 2020

Looks like it might be intentional for a and b to be those of the unscaled, unshifted distribution if all these tests fail? I'm asking the test authors in gh-11092.

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 25, 2020

@pvanmulbregt I was hoping to get your opinion on this because in this change it looks like you intended for a and b to be for the unscaled, unshifted distribution. Should a and b not reflect the location and scale of the frozen distribution?

@rlucas7 @ev-br I'd appreciate your thoughts, too, as you added/reviewed the test in gh-11092 that is changed here. Previously, check_freezing checked explicitly that loc and scale would not affect the a and b of the frozen distribution; I would propose changing it to ensure that loc and scale do afffect the a and b of the frozen distribution in the intended way.

@glemaitre You reported gh-11089 initially, so I wanted to make sure this change wouldn't throw a wrench into things for scikit-learn.

Copy link
Member

@rlucas7 rlucas7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdhaber the approach seems reasonable but I left a question/suggestion it might help to clarify in the PR.

@@ -308,15 +308,17 @@ def check_pickling(distfn, args):

def check_freezing(distfn, args):
# regression test for gh-11089: freezing a distribution fails
# if loc and/or scale are specified
# if loc and/or scale are specified. Modified to resolve gh-13225;
# loc and scale _should_ affect `a` and `b`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell from the diff in this page alone; is this effecting a change if the distribution is not frozen?
NOTE: I'm assuming this should only impact the frozen distributions.

Would it make sense to confirm with a test (either in the test in this PR or a new one, either is fine to me) the uniform case as given by the associated issue (e.g. check it impacts frozen and doesn't imact non-frozen)?

Also, the comment is a bit confusing on clarifying this, suggest to rephrase comment to make it explicit if this works on (a) only frozen, (b) frozen and non-frozen.

@@ -437,7 +437,7 @@ def __init__(self, dist, *args, **kwds):
self.dist = dist.__class__(**dist._updated_ctor_param())

shapes, _, _ = self.dist._parse_args(*args, **kwds)
self.a, self.b = self.dist._get_support(*shapes)
self.a, self.b = self.dist.support(*self.args, **self.kwds)
Copy link
Member

@ev-br ev-br Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_args call above is no longer needed, I suspect

EDIT: to clarify, it's not needed if calling .support, and is needed for _get_support

@pvanmulbregt
Copy link
Contributor

@mdhaber I'm actually not in favor of encouraging use of a and b directly, but to always use support(). Direct access only works some of the time. (see #13225 (comment))
IMHO, these attributes would be better named _a and _b to:

  • make clear that they are private, and
  • to distinguish the a , b of the support from any shape parameters named a, b (such as occurs in stats.beta(a, b)).

Copy link
Member

@ev-br ev-br left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a
bit of digging, here is where these attributes went in #4340

I think there was not much thought other than making it rv.a == rv.dist.a etc. This PR is a then a backcompat break, but probably a mild one, and only need a release note (but let's be prepared to revert if there are complaints).

Whether .a and .b should indeed coincide with the internal .dist attributes or reflect shifting and scaling, I'm not sure. Can be either, I suppose, but should be documented.

Removing them altogether, as Paul suggests is probably too much churn.

All in all, I'd leave them be and direct people to using .support, but if anyone feels like changing it, fine, I'm +0.

@pvanmulbregt
Copy link
Contributor

Upon further reflection, I don't see the value to the user, nor to the stats codebase.

The a and b attributes are currently not public attributes of the stats distribution classes. This PR would implicitly promote a and b to public.
a and b only agree with the support() for some (unshifted, unscaled) distribution families. The PR would expand this to "agreement for some distributions, possibly scaled and/or shifted, possibly frozen." But it still would not cover all families.

Encouraging an approach which works in many simple cases, but doesn't work with more complicated distributions is a source of confusion. Using support() works for all distributions, frozen or unfrozen, shifted/scaled or not.

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 27, 2020

Ok. My only goal here was to close gh-13225. Shall we just close that then, explaining to the user that a and b are essentially private and reiterating that support is the preferred way to get the support?

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 27, 2020

I just wanted to check about:

The PR would expand this to "agreement for some distributions, possibly scaled and/or shifted, possibly frozen." But it still would not cover all families.

This PR only changes the __init__ method for frozen distribution. When the distribution is frozen, a and b get set with the support method instead of _get_support. I also think the revised test checks that - even for distributions for which the shape parameters affect the support.

So if I understand correctly, this would make the a and b correct for all frozen distributions, always - and that's all it does. Is that not right? What am I missing?

Why does doing this implicitly make them public?

Or is the concern that making a and b correct for frozen distributions encourages use of a and b, so we should leave them with information that doesn't correspond with the frozen distribution to discourage their use?

@pvanmulbregt
Copy link
Contributor

Making the change in response to gh-13225 is implicitly accepting the use case there, namely referencing a and b to mean the support.

Having a meaning of a and b in the rv_frozen distribution different from that in the rv_continuous means having to think differently in the two cases. I'd rather not introduce such differences, nor support a and b publicly as it imposes maintenance constraints, without a corresponding benefit.

Shall we just close that then, explaining to the user that a and b are essentially private and reiterating that support is the preferred way to get the support?

+1.

@mdhaber
Copy link
Contributor Author

mdhaber commented Dec 27, 2020

OK. I'll close these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect values for .a and .b (min and max) for scipy.stats.uniform with non-default args
4 participants