-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: sparse: smarter random index selection #3650
Conversation
This would be the right thing to do, except that
where pop_size will be
So a temporary array with size |
Should this not be changed in numpy?
|
I agree that it's a numpy bug, and there's even an issue for it: numpy/numpy#2764 Commit f375852 uses the workaround from that issue. It doesn't use the numpy random state (because it uses the Python stdlib's |
It's a problem for Travis, at least. The failing tests are due to not using numpy's random seed for index selection, which makes the tests nondeterministic. I'm not sure how to fix this. Any ideas? |
Before using Python's |
A quick check shows that the states aren't easily transferable. I've rewritten the patch (again) to use The heuristic used in the actual |
gk *= 1.05 | ||
ind = _gen_unique_rand(random_state, gk) | ||
# Use the algorithm from python's random.sample for k < n/3. | ||
if n < 3*k: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be mn < 3*k
(and mn
in the comment above).
The tests passed with numpy 1.5.1? Looks like we need a unit test for the case |
Instead of using
|
|
I've been putting these into |
Any further comments before merging? |
I'll take another look this week. I'm curious how the performance of the new method compares to the (debugged) old method. |
ind = _gen_unique_rand(random_state, gk) | ||
# Use the algorithm from python's random.sample for k < mn/3. | ||
if mn < 3*k: | ||
# ind = random_state.choice(mn, size=k, replace=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally don't leave in code that has been commented out. This line should be removed, or additional comments should be added to explain that this is what we would use, but choice
is only available in numpy version 1.7 or later.
I don't know what went wrong with the python 2.7 build on Travis. How about rebasing and squashing your commits into a single commit? That will clean up the history and rerun Travis. If the tests pass, I think this is ready to go. |
By the way, since this is a bug fix, the prefix for the commit message should be "BUG: sparse: ...". |
BUG: sparse: smarter random index selection
Merged. Thanks! |
Fixes #3648.
Caveat: The
np.random.choice
function was added in numpy 1.7.0, so we probably need a shim for older versions. I'm not sure how best to accomplish that, so suggestions are welcome.