BUG: sparse: smarter random index selection #3650

perimosocordiae · 2014-05-13T19:09:06Z

Caveat: The np.random.choice function was added in numpy 1.7.0, so we probably need a shim for older versions. I'm not sure how best to accomplish that, so suggestions are welcome.

coveralls · 2014-05-13T20:03:04Z

Coverage remained the same when pulling ee4f0eb on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

WarrenWeckesser · 2014-05-13T20:06:29Z

This would be the right thing to do, except that np.random.choice ultimately does this:

    idx = self.permutation(pop_size)[:size]

where pop_size will be m*n and size will be k. permutation is implemented like this:

    def permutation(self, object x):
        if isinstance(x, (int, long, np.integer)):
            arr = np.arange(x)
        else:
            arr = np.array(x)
        self.shuffle(arr)
        return arr

So a temporary array with size m*n is created behind the scenes. That's bad if m*n is large.
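
To put a rough number on that, the sketch below estimates the size of the temporary index array. It assumes 8-byte integers (what np.arange typically produces on 64-bit Linux); the helper name `permutation_temp_bytes` is made up for illustration and is not part of numpy or scipy.

```python
import numpy as np


def permutation_temp_bytes(m, n, itemsize=np.dtype(np.int64).itemsize):
    """Estimate the bytes held by the np.arange(m*n) temporary.

    Assumption: the temporary uses 8-byte integers by default.
    """
    return m * n * itemsize


# For a 10**6 x 10**6 matrix the temporary is about 8 TB,
# no matter how few indices (k) are actually requested.
print(permutation_temp_bytes(10**6, 10**6) / 1e12, "TB")
```

The cost depends only on m*n, not on k, which is why even `size=2` can fail.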

argriffing · 2014-05-13T20:34:56Z

> This would be the right thing to do, except that np.random.choice ultimately does this:

Should this not be changed in numpy?

    >>> np.random.choice(1000000000000, size=2)
    array([586913473276, 824603097730])
    >>> np.random.choice(1000000000000, size=2, replace=False)
    Traceback (most recent call last):
      File "mtrand.pyx", line 4490, in mtrand.RandomState.permutation (numpy/random/mtrand/mtrand.c:20666)
    MemoryError

perimosocordiae · 2014-05-13T20:40:43Z

I agree that it's a numpy bug, and there's even an issue for it: numpy/numpy#2764

Commit f375852 uses the workaround from that issue. It doesn't use the numpy random state (because it uses the Python stdlib's random.sample), but I'm not sure if that's a problem.

perimosocordiae · 2014-05-13T21:14:11Z

It's a problem for Travis, at least. The failing tests are due to not using numpy's random seed for index selection, which makes the tests nondeterministic.
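
A minimal illustration of why the seed does not carry over: numpy's global generator and the stdlib `random` module keep separate states, so seeding one leaves the other untouched. The seed value and variable names are arbitrary choices for this sketch.

```python
import random

import numpy as np

# Seeding numpy makes numpy draws reproducible...
np.random.seed(1234)
a = np.random.randint(0, 10**6, size=5)

py_state_before = random.getstate()
np.random.seed(1234)
py_state_after = random.getstate()
b = np.random.randint(0, 10**6, size=5)

# ...but it has no effect on the stdlib generator's state.
numpy_reproducible = bool((a == b).all())
stdlib_untouched = py_state_before == py_state_after
print(numpy_reproducible, stdlib_untouched)
```

So a test that only calls `np.random.seed` cannot pin down indices drawn with `random.sample`.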

I'm not sure how to fix this. Any ideas?

WarrenWeckesser · 2014-05-13T21:22:03Z

Before using Python's random.sample, the internal state of the Python RNG can be set using random.setstate, giving it numpy's state retrieved using np.random.get_state(). I haven't thought too much about this, so I don't know what problems that will cause, if any.
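
For reference, a quick look at what the two state objects contain. This sketch only inspects their layout and does not attempt a transfer; the seeds are arbitrary.

```python
import random

import numpy as np

np_state = np.random.RandomState(0).get_state()
py_state = random.Random(0).getstate()

# numpy: ('MT19937', 624 uint32 key words, position, has_gauss, cached_gaussian)
np_name, np_keys, np_pos = np_state[0], np_state[1], np_state[2]

# stdlib: (version, tuple of 624 key words plus the position, gauss_next)
py_version, py_internal = py_state[0], py_state[1]

print(np_name, np_keys.shape, len(py_internal))
```

Both are Mersenne Twister states, but they are packed differently, so one cannot be passed to the other's setter unchanged.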

perimosocordiae · 2014-05-13T21:51:42Z

A quick check shows that the states aren't easily transferable. I've rewritten the patch (again) to use np.random.choice when n < 3*k, and to use the simple set-based algorithm from the python stdlib otherwise.

The heuristic used in the actual random.sample is more sophisticated, but I'm not sure that it transfers correctly to the numpy case.
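
A sketch of the hybrid approach described above, under a few assumptions: `population` stands for the total number of candidate indices (m*n for an m-by-n matrix), the dense branch uses `permutation` rather than `choice` so it also runs on older numpy, and the function name `sample_indices` is hypothetical. It is not the code from the patch.

```python
import numpy as np


def sample_indices(random_state, population, k):
    """Draw k distinct integers from range(population)."""
    if k > population:
        raise ValueError("k must not exceed population")
    if population < 3 * k:
        # Dense case: shuffling the whole range is affordable.
        return random_state.permutation(population)[:k]
    # Sparse case: set-based rejection sampling, in the spirit of
    # the simple branch of the stdlib's random.sample.
    ind = np.empty(k, dtype=np.int64)
    selected = set()
    for i in range(k):
        j = random_state.randint(population)
        while j in selected:
            j = random_state.randint(population)
        selected.add(j)
        ind[i] = j
    return ind


rs = np.random.RandomState(0)
sparse_draw = sample_indices(rs, 10**9, 100)  # no 10**9-element temporary
dense_draw = sample_indices(rs, 10, 5)
```

Because every draw goes through `random_state`, seeding it makes the result reproducible, and memory use in the sparse case scales with k rather than with the population size.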

WarrenWeckesser · 2014-05-13T22:11:51Z

scipy/sparse/construct.py

    -        gk *= 1.05
    -        ind = _gen_unique_rand(random_state, gk)
    +    # Use the algorithm from python's random.sample for k < n/3.
    +    if n < 3*k:

Should be mn < 3*k (and mn in the comment above).

coveralls · 2014-05-13T22:40:20Z

Coverage remained the same when pulling 02842a6 on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

coveralls · 2014-05-14T14:09:08Z

Coverage remained the same when pulling 7dd6169 on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

WarrenWeckesser · 2014-05-14T14:20:22Z

The tests passed with numpy 1.5.1? Looks like we need a unit test for the case mn < 3*k.

WarrenWeckesser · 2014-05-14T14:56:29Z

> Caveat: The np.random.choice function was added in numpy 1.7.0...

Instead of using choice, just roll your own version, with something like

    # ind = random_state.choice(mn, size=k, replace=False)
    r = np.arange(mn)
    random_state.shuffle(r)
    ind = r[:k]

WarrenWeckesser · 2014-05-14T15:19:13Z

permutation is in numpy 1.5.1, so a shorter replacement is

    # ind = random_state.choice(mn, size=k, replace=False)
    ind = random_state.permutation(mn)[:k]

argriffing · 2014-05-14T15:21:57Z

> Instead of using np.random.choice, just roll your own version

I've been putting these into scipy/lib/_numpy_compat.py.
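
A compatibility shim along these lines might look like the following. The function name is hypothetical and this is not the contents of `_numpy_compat.py`; it simply falls back to the `permutation` replacement suggested above when `choice` is missing.

```python
import numpy as np


def choice_without_replacement(random_state, population, k):
    """Pick k distinct integers from range(population).

    Uses RandomState.choice when it exists (numpy >= 1.7) and falls
    back to permutation otherwise. Both paths allocate an array of
    length `population`.
    """
    if hasattr(random_state, "choice"):
        return random_state.choice(population, size=k, replace=False)
    return random_state.permutation(population)[:k]


ind = choice_without_replacement(np.random.RandomState(0), 20, 5)
```

Note that the shim only addresses availability; it does not avoid the large temporary discussed earlier in the thread.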

coveralls · 2014-05-14T16:26:54Z

Coverage remained the same when pulling 69555fd on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

perimosocordiae · 2014-05-19T20:28:30Z

Any further comments before merging?

WarrenWeckesser · 2014-05-19T20:49:46Z

I'll take another look this week.

I'm curious how the performance of the new method compares to the (debugged) old method.

WarrenWeckesser · 2014-05-29T17:57:02Z

scipy/sparse/construct.py

    -        ind = _gen_unique_rand(random_state, gk)
    +    # Use the algorithm from python's random.sample for k < mn/3.
    +    if mn < 3*k:
    +        # ind = random_state.choice(mn, size=k, replace=False)

We generally don't leave in code that has been commented out. This line should be removed, or additional comments should be added to explain that this is what we would use, but choice is only available in numpy version 1.7 or later.

coveralls · 2014-05-29T18:50:55Z

Coverage increased (+0.2%) when pulling 918cb7f on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

coveralls · 2014-05-29T19:08:58Z

Coverage increased (+0.2%) when pulling 918cb7f on perimosocordiae:patch-5 into 7ff4e90 on scipy:master.

WarrenWeckesser · 2014-05-29T19:31:53Z

I don't know what went wrong with the python 2.7 build on Travis.

How about rebasing and squashing your commits into a single commit? That will clean up the history and rerun Travis. If the tests pass, I think this is ready to go.

WarrenWeckesser · 2014-05-29T19:34:18Z

By the way, since this is a bug fix, the prefix for the commit message should be "BUG: sparse: ...".

Fixes scipygh-3648.

coveralls · 2014-05-29T20:40:00Z

Coverage increased (+0.0%) when pulling 589c372 on perimosocordiae:patch-5 into acc7c94 on scipy:master.

BUG: sparse: smarter random index selection

WarrenWeckesser · 2014-05-29T20:57:43Z

Merged. Thanks!

WarrenWeckesser reviewed May 13, 2014

WarrenWeckesser added scipy.sparse labels May 14, 2014

WarrenWeckesser reviewed May 29, 2014

perimosocordiae changed the title ~~ENH: smarter random index selection~~ BUG: smarter random index selection May 29, 2014

perimosocordiae changed the title ~~BUG: smarter random index selection~~ BUG: sparse: smarter random index selection May 29, 2014

BUG: sparse: smarter random index selection

589c372

Fixes scipygh-3648.

WarrenWeckesser added a commit that referenced this pull request May 29, 2014

Merge pull request #3650 from perimosocordiae/patch-5

93656a8

BUG: sparse: smarter random index selection

WarrenWeckesser merged commit 93656a8 into scipy:master May 29, 2014

perimosocordiae deleted the patch-5 branch May 29, 2014 20:58

pv added this to the 0.15.0 milestone Jun 10, 2014

pv mentioned this pull request Feb 24, 2015

Cannot create large sparse matrices with random entries #4552

Open

yoavram mentioned this pull request Oct 2, 2015

np.random.choice without replacement or weights is less efficient than random.sample? numpy/numpy#2764

Closed
