Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bizarre times for `scipy.sparse.rand` function with 'low' density values. #9036

Closed
rwolst opened this issue Jul 15, 2018 · 2 comments
Closed

Bizarre times for `scipy.sparse.rand` function with 'low' density values. #9036

rwolst opened this issue Jul 15, 2018 · 2 comments
Milestone

Comments

@rwolst
Copy link
Contributor

@rwolst rwolst commented Jul 15, 2018

The scipy.sparse.rand function is strangely slow for some densities. It seems like it can be made much faster without much effort. The sparse_random function below seems to be about 10x faster on the worst case of the scipy.sparse.rand function. The timing plots are equally bizarre, showing terrible performance up until around density=0.4 and then pretty much equivalent timings between both functions.

Judging by the code (https://github.com/scipy/scipy/blob/v1.1.0/scipy/sparse/construct.py#L775), it looks to be caused by this conditional

    # Use the algorithm from python's random.sample for k < mn/3.
    if mn < 3*k:
        ind = random_state.choice(mn, size=k, replace=False)
    else:
        ind = np.empty(k, dtype=tp)
        selected = set()
        for i in xrange(k):
            j = random_state.randint(mn)
            while j in selected:
                j = random_state.randint(mn)
            selected.add(j)

I'm not sure why this is even there and it is not simply

ind = random_state.choice(mn, size=k, replace=False)

Reproducing code example:

import scipy as sp
import scipy.sparse
import numpy as np
from contexttimer import Timer
import matplotlib.pyplot as plt

def sparse_random(n_rows, n_cols, density):
    """A faster implementation of scipy.sparse.rand?"""
    if density == 0:
        return sp.sparse.coo_matrix((n_rows, n_cols))
    N = n_rows*n_cols
    nnz = int(N*density)
    idx = np.random.choice(N, nnz, replace=False)
    rows, cols = np.divmod(idx, n_cols)
    data = np.random.rand(nnz)
    return sp.sparse.coo_matrix((data, (rows, cols)), shape=(n_rows,n_cols))

def time_test(n_rows, n_cols):
    densities = np.arange(0, 1.1, 0.1)
    times_sp = np.empty(densities.size)
    times = np.empty(densities.size)
    for i, density in enumerate(densities):
        with Timer() as t:
            sp.sparse.rand(n_rows,n_cols,density)
        times_sp[i] = t.elapsed
        
        with Timer() as t:
            sparse_random(n_rows,n_cols,density)
        times[i] = t.elapsed
    plt.plot(densities, times_sp)
    plt.plot(densities, times)
    plt.legend(['sp.sparse.rand', 'sparse_random'])
    plt.show()

time_test(120, 80)

My timings are below:

bizarre_sparse_rand

Scipy/Numpy/Python version information:

In [1]: import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.1.0 1.14.5 sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
@perimosocordiae

This comment has been minimized.

Copy link
Member

@perimosocordiae perimosocordiae commented Jul 16, 2018

It turns out this is mostly a historical artifact. Here's the commit that added the code in question: 589c372

  1. We couldn't use random_state.choice until numpy 1.7 became the minimum required version.
  2. The workaround using random_state.permutation was inefficient for small densities, so a special case was added.
  3. When the numpy version bump happened in commit aa6bc5b, we didn't notice that the special case was no longer needed (and apparently slower, as well).

Feel free to send a PR with the fix, if you'd like!

@rwolst

This comment has been minimized.

Copy link
Contributor Author

@rwolst rwolst commented Jul 17, 2018

Ok, I'll fix this.

rwolst added a commit to rwolst/scipy that referenced this issue Jul 17, 2018
@ilayn ilayn added this to the 1.2.0 milestone Jul 17, 2018
rgommers added a commit to rwolst/scipy that referenced this issue Aug 31, 2018
rgommers added a commit that referenced this issue Nov 7, 2018
MAINT: Fix slow sparse.rand for k < mn/3 (#9036).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.