BUG/ENH: stats: make frozen distributions hold separate instances #3245

Merged
merged 1 commit into from Feb 2, 2014

Conversation

Projects
None yet
4 participants
Member

ev-br commented Jan 26, 2014

This PR fixes two bugs related to frozen distributions with parameter-dependent support.

Previously, rv_frozen instances held a reference to a global distribution instance.
For some distributions, support is parameter-dependent, and thus some spooky action on a distance:

>>> import scipy.stats as stats
>>> rv = stats.genpareto(c=0.1)
>>> rv.dist.a, rv.dist.b
(0.0, inf)                            # correct
>>> 
>>> stats.genpareto.pdf(0, -0.1)       #  _argcheck sets  self.b
1.0
>>> rv.dist.a, rv.dist.b
(0.0, array(10.0))                 # oops
>>> 

A separate bug is that freezing did not call _argcheck, so that self.dist.a and self.dist.b were always the defaults:

>>> rv = stats.genpareto(c=-0.1)
>>> rv.dist.a, rv.dist.b 
(0.0, inf)                             #: should be (0, 10)
>>> 
BUG/ENH: stats: make frozen distributions separate instances
Previously, rv_frozen instances held a reference to a global distribution
instance.
Member

josef-pkt commented Jan 26, 2014

just to understand this:

The problem is that if we access .a and .b, then they can show the wrong numbers (if they have been changed by a different call).

However, all methods pdf, cdf, stats, ... always give the correct answers because .a and .b are always reset to the correct values.

Coverage Status

Coverage remained the same when pulling 83c77af on ev-br:frozen_distr into 8ad48cd on scipy:master.

Member

ev-br commented Jan 26, 2014

In a first approximation, yes --- so, lowest priority overall.

Public methods pdf, cdf etc hopefully all use _argcheck, thus resetting .a and .b at each call. Here's one: #2849, hopefully, that was the last one :-).

However, having separate instances still looks to me like a way to go for e.g. storing temporaries (#2823), and other improvements for the frozen distributions (which, I think, can be useful, at least in some use cases, eg sampling)

Member

josef-pkt commented Jan 27, 2014

Yes, I agree, the less we have to rely on the global instances, the better.

Member

josef-pkt commented Jan 27, 2014

looks good to me
I don't see anything that might cause problems, but I didn't really go through the details.

+
+ # a, b may be set in _argcheck, depending on *args, **kwds. Ouch.
+ shapes, _, _ = self.dist._parse_args(*args, **kwds)
+ self.dist._argcheck(*shapes)
@rgommers

rgommers Feb 1, 2014

Owner

The above 2 lines shouldn't be needed, because the rv_frozen instances have public methods that all contain _argcheck right?

@josef-pkt

josef-pkt Feb 1, 2014

Member

No, they are not needed for the methods, but we get correct .a, .b if we call it once.
(see my initial comment)

@rgommers

rgommers Feb 2, 2014

Owner

Why does that matter? In rv_continuous.__init__ we also don't call _argcheck. Or do you propose to change that as well? Making rv_frozen "more correct" than the regular distributions doesn't help much imho.

@ev-br

ev-br Feb 2, 2014

Member

Consider:

>>> rv = stats.genpareto(c=-0.1)
>>> rv.dist.a, rv.dist.b 

in master this yields (0.0, inf).

Or, consider the generic example (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genpareto.html#scipy.stats.genpareto):
rv.dist.b is used before pdf.
(This thing tripped me over at least once in gh-3207.)

For a regular distribution __init__, we don't have the shape values, so .a and .b may or may not be correct just after importing. For frozen ones, we do know the shapes, hence we can fix the support -- not saying that we must, just that we can.
Basically, it's not a super-big deal; IMO it just makes reasoning about frozen distributions a bit easier.

@rgommers

rgommers Feb 2, 2014

Owner

Ah, that's a good point. If there's nothing to fix for regular dists, then these lines now make sense to me.

Owner

rgommers commented Feb 1, 2014

Other than my one comment I don't know how to reason about the logic; it seems to make sense, so if the tests pass then it can go in I think.

rgommers added a commit that referenced this pull request Feb 2, 2014

Merge pull request #3245 from ev-br/frozen_distr
BUG/ENH: stats: make frozen distributions hold separate instances

@rgommers rgommers merged commit 50d122e into scipy:master Feb 2, 2014

1 check passed

default The Travis CI build passed
Details
Owner

rgommers commented Feb 2, 2014

OK merging. Thanks Evgeni, Josef.

Owner

rgommers commented Feb 2, 2014

The new github labels/milestones interface is great by the way:)

@ev-br ev-br deleted the ev-br:frozen_distr branch Apr 3, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment