Fixes for np.random.choice #2727

seberg · 2012-11-13T00:56:10Z

As mentioned on the mailinglist this changes the default for the size argument to None and allows tuples as well, just as other random functions do. This part would need to be in the 1.7. released to avoid changing it lateron.

The second commit fixes a bug that I noticed while doing the first thing. I know that tests are missing (and if someone feels like adding them please do so), but I think it would be better to raise an Exception then to keep the current behaviour (unless I am seeing things wrong). However np.unique returns new sorted, but it must be kept exactly in the original order as far as I can see.

It does not adress a possible axis argument (see also #2724) as it should not matter for 1.7.

The size argument to random.choice should work like it does for all other functions in random as well.

Random choice used np.unique to find new indices when replace was False and p given. This is wrong since unique will sort the indices. This solves the bug, but likely not ideal.

njsmith · 2012-11-13T08:53:33Z

Looks reasonable, but needs tests like noted.

You need to regenerate the .c file from the .pyx, that doesn't happen automatically. (Another reason for tests, I think this means the current patch may actually not be tested...?)

seberg · 2012-11-13T16:37:03Z

Yes, had not really run the tests (at least not for all changes), and lots of small things I overlooked as I did not have the new things tested. Forgot to note in the commit, but thanks to @alan-isaac for pointing out the 0-d vs. scalar thing. The code to do this right is a bit awkward to be honest, maybe someone knows a better way, but the different branches need different special cases to give an array or scalar consistent here.

I have added tests for this too. I have run the tests locally (I think the ternary operator in the cython file should not matter for 2.4 as its translated to C). I did not commit mtrand.c since I am currently using cython 1.6, so it would be nice if someone else could do that after looking over this (though if that is inconvenient I guess I could also install a newer version).

Especially for the replace=False with p given case, someone should have a very careful look if it is correct, since the test for this will only make sure that behaviour for the test case does not change.

@alan-isaac

Thanks to @alan-isaac for pointing out the 0-d vs. scalar issue.

seberg · 2012-11-13T16:52:10Z

Sorry, did a rebase if anyone noticed. mtrand.c with a wrong version sneaked in there... Note that the Travis tests have to fail since mtrand.c is recreated.

seberg · 2012-11-17T11:53:56Z

Regenerated mtrand.c using cython 0.17.1. @certik just so its not forgotten, these changes are meant for 1.7.

njsmith · 2012-11-17T13:27:00Z

@josef-pkt: I know you keep an eye on a lot of the RNG stuff, want to review this? (Note @seberg's comment above: "Especially for the replace=False with p given case, someone should have a very careful look if it is correct, since the test for this will only make sure that behaviour for the test case does not change.")

certik · 2012-12-06T22:31:19Z

@seberg, see #2793.

rgommers · 2012-12-07T07:30:02Z

See also http://projects.scipy.org/numpy/ticket/2246, maybe you could take that along?

seberg · 2012-12-07T08:44:31Z

@rgommers thats not hard, but I don't see much of a reason to try hard to squeeze it into 1.7.

rgommers · 2012-12-07T20:46:20Z

Agreed, that's what I commented on the ticket too.

This is PR #2727, but rebased

seberg · 2012-12-13T23:09:20Z

Closing here, since it was pushed into master, but I would feel better if someone had at least a look at the stuff and think its ok... I cannot see a flaw in the bug fix (without the sort the results are obviously wrong) and I looked at the results and they seem reasonable, but still...

seberg · 2012-12-14T16:48:00Z

@njsmith, @josef-pkt maybe it is not a huge issue, but I just wondered if the function as is is actually 100% right. The thing is it uses things like np.random.permutation to call other RNG functions. Is this correct, or could this mean that the wrong random state is used for some funnier use cases? I have really no idea if this can or cannot be a problem and self.permutation must be used instead?

Acutally only one case in there and that is with np.random.random(shape)

njsmith · 2012-12-14T23:22:39Z

!!

If a method on RandomState is calling into the shared global RandomState via np.random.foo, then that is a huge bug!

Really we should have tests that scramble the global random seed, and then test that RandomState(<fixed seed>).<method> always returns the same value. It's critical that these methods be deterministic (given the seed) in order to allow people to, e.g., reproduce scientific results across numpy releases.

certik · 2012-12-15T19:59:37Z

Ok, so it looks like I merged in a huge bug. My apologies for that. I have created a high priority issue for this: #2826.
@seberg, would you have time to fix this? That would be a huge help. If not, I'll just revert these patches.

Backwards compatible by adding an `axis` keyword to the end of the signature for choice. Default is 0, which selects by row. The default keyword arguments make it functionally equivalent to random.choice from Python's standard library. Note, however, that a list of tuples will have the tuples returned as lists. Issue numpy#2724 was originally opened with that request, but numpy#2727 seems to have closed it accidentally.

seberg added 2 commits November 13, 2012 01:43

ENH: make random.choice size argument default to None and allow tuple

029f7dc

The size argument to random.choice should work like it does for all other functions in random as well.

BUG: Do not sort new indices in random.choice.

b9a9000

Random choice used np.unique to find new indices when replace was False and p given. This is wrong since unique will sort the indices. This solves the bug, but likely not ideal.

seberg added 2 commits November 13, 2012 17:46

BUG: Fix various problems in the previous commits.

21e701d

Thanks to @alan-isaac for pointing out the 0-d vs. scalar issue.

TST: Add tests for new feature and fix in random.choice

a36ae28

MAINT: Regenerate mtrand.c

106e2ce

seberg mentioned this pull request Nov 23, 2012

np.random.choice without replacement or weights is less efficient than random.sample? #2764

Closed

certik added a commit that referenced this pull request Dec 13, 2012

Merge pull request #2793 from certik/pull-request-2727

cdeb48b

This is PR #2727, but rebased

seberg closed this Dec 13, 2012

certik mentioned this pull request Dec 15, 2012

Fix the RandomState bug #2826

Closed

masasin mentioned this pull request Jul 7, 2016

ENH: Add multidimensional array support to numpy.random.choice #7810

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes for np.random.choice #2727

Fixes for np.random.choice #2727

seberg commented Nov 13, 2012

njsmith commented Nov 13, 2012

seberg commented Nov 13, 2012

seberg commented Nov 13, 2012

seberg commented Nov 17, 2012

njsmith commented Nov 17, 2012

certik commented Dec 6, 2012

rgommers commented Dec 7, 2012

seberg commented Dec 7, 2012

rgommers commented Dec 7, 2012

seberg commented Dec 13, 2012

seberg commented Dec 14, 2012

njsmith commented Dec 14, 2012

certik commented Dec 15, 2012

Fixes for np.random.choice #2727

Fixes for np.random.choice #2727

Conversation

seberg commented Nov 13, 2012

njsmith commented Nov 13, 2012

seberg commented Nov 13, 2012

seberg commented Nov 13, 2012

seberg commented Nov 17, 2012

njsmith commented Nov 17, 2012

certik commented Dec 6, 2012

rgommers commented Dec 7, 2012

seberg commented Dec 7, 2012

rgommers commented Dec 7, 2012

seberg commented Dec 13, 2012

seberg commented Dec 14, 2012

njsmith commented Dec 14, 2012

certik commented Dec 15, 2012