-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats broadcasting in rvs (Trac #1544) #2069
Comments
@WarrenWeckesser wrote on 2011-10-31 It does broadcast; you just have to give arguments with compatible shapes to get the broadcasting that you want:
In input [36], loc has shape (3,) and scale has shape (2,1), which broadast to (2,3). |
@WarrenWeckesser wrote on 2011-10-31 Sorry, my previous post missed the key issue, which is the following surprising behavior:
Moreover, the "random" output shown in Out [36] of my previous post looks decidedly nonrandom. It appears that the given value of the 'size' keyword determines the number of random values generated; after generating this many samples, the result is broadcast with the other arguments. Smells like a bug to me. Is there a good use case for the current behavior? A work-around is to give the size explicitly. For example:
|
@WarrenWeckesser wrote on 2011-10-31 It gets worse: it seems broadcasting of additional shape parameters does not work. For example, the gamma distribution has one additional parameter. In the following, line [103] works as expected. I expect line [104] to produce an array with shape (2,2), but instead it raises an exception. And I would expect line [105] to complain about incompatible shapes.
|
Similar broadcasting issue described at http://thread.gmane.org/gmane.comp.python.scientific.user/33953 Looks like there are no generic broadcasting tests. One more gap that needs to be closed. |
This was recently reported in stackoverflow: http://stackoverflow.com/questions/33810477/unexpected-scipy-stats-uniform-behaviour From the link: "This bug or feature has cost me a couple of days of confusion and debugging in my larger program." Ouch. I'm upping the priority of this issue, and setting the milestone to 0.17. Generating nonrandom rvs is really bad. |
Ugh. Fixing this so the API is nice and numpythonic (i.e. fully broadcasting) will almost certainly break backwards compatibility for anyone who has noticed the problem and used what looks like a reasonable work-around. The way I wish
The shape of the return value:
Some examples, with input parameters
The problem is that, as was noted above (and I even suggested it as a work-around in the recent question on stackoverflow), currently a call such as
generates an array with shape (5,). That is, for each combination of parameters, it generates one sample. With the "fully broadcasting" API, that call should create an array with shape (5, 5). |
Proposal 1:
|
I would have suggested to add a new I got pretty used to |
Yes, I think a new keyword could work, too. But then you've got an ugly API, with |
Peanut gallery here: I'm +1 for a new method. I think it'll much easier to explain to users, without them needing to fully understand the specifics of the bug. |
On the other hand, after looking at the code, I suspect the current
Note that You can get a 2-d array, but the length of the second dimension must be 3:
Less trivial broadcasting of the arguments:
The
But you can do
So, another approach to this issue is to fix scipy's broken implementation of numpy's API for random variate generation. |
Although I like my original proposal for the interpretation of |
The |
This looks like a nontrivial change to make though; may be a bit late for 0.17.0. |
I'll do what I can in the next day or so, but in my experience, changes to the distributions code always ramify far more than I expect. If it doesn't make it into 0.17, so be it. |
+1 for interpreting the inconsistency with numpy.random as a bug. |
If the fix is not consistent with the current workaround of giving full size, then it still might break quite a bit of user code, and I would not do without regular deprecation policy. I don't remember if statsmodels has any old code for this, we use almost exclusively numpy because it has almost all the distributions that we support. Some of the Bayesian packages might use this more, at least based on some questions that were in the past. pymc has or had a wrapper around scipy.stats.distributions (I haven't checked in years). |
Proof-of-concept, work-in-progress pull request: #5526 |
The class rv_generic was updated to handle broadcasting of the arguments to the rvs() method. The following distributions required additional custom changes to support broadcasting in rvs(): levy_stable, pearson3, rice, truncnorm, hypergeom, planck, randint. The function `numpy.broadcast_to` is used in several places. This function was added to numpy in version 1.10, so a copy was added to _lib/_numpy_compat.py to support older version of numpy. Closes scipygh-2069.
The class rv_generic was updated to handle broadcasting of the arguments to the rvs() method. The following distributions required additional custom changes to support broadcasting in rvs(): levy_stable, pearson3, rice, truncnorm, hypergeom, planck, randint. The function `numpy.broadcast_to` is used in several places. This function was added to numpy in version 1.10, so a copy was added to _lib/_numpy_compat.py to support older version of numpy. Closes scipygh-2069.
A proposed fix is in #5526. It makes |
I think I ran into the same bug as #2069 (comment). I'd appreciate some help on this. Here's my traceback:
This is a confusing bug that happens seemingly randomly with low probability in a larger program, and I'm having difficulty getting a minimum working example. I have verified that |
From the traceback you can see that threads are being used in your code. Do you know if multiple threads are calling
That error can be generated by calling
Two threads calling |
Thanks for the quick response @WarrenWeckesser. I think you've resolved my problem, thank you.
Yup, my code uses |
Original ticket http://projects.scipy.org/scipy/ticket/1544 on 2011-10-30 by @josef-pkt, assigned to unknown.
I thought that rvs would create a broadcasted random array for different loc
however it just draws one rvs and broadcasts that, so we get all identical rvs
The text was updated successfully, but these errors were encountered: