Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bizarre times for scipy.sparse.rand function with 'low' density values. #9036

Closed
rwolst opened this issue Jul 15, 2018 · 2 comments
Closed
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse
Milestone

Comments

@rwolst
Copy link
Contributor

rwolst commented Jul 15, 2018

The scipy.sparse.rand function is strangely slow for some densities. It seems like it can be made much faster without much effort. The sparse_random function below seems to be about 10x faster on the worst case of the scipy.sparse.rand function. The timing plots are equally bizarre, showing terrible performance up until around density=0.4 and then pretty much equivalent timings between both functions.

Judging by the code (https://github.com/scipy/scipy/blob/v1.1.0/scipy/sparse/construct.py#L775), it looks to be caused by this conditional

    # Use the algorithm from python's random.sample for k < mn/3.
    if mn < 3*k:
        ind = random_state.choice(mn, size=k, replace=False)
    else:
        ind = np.empty(k, dtype=tp)
        selected = set()
        for i in xrange(k):
            j = random_state.randint(mn)
            while j in selected:
                j = random_state.randint(mn)
            selected.add(j)

I'm not sure why this is even there and it is not simply

ind = random_state.choice(mn, size=k, replace=False)

Reproducing code example:

import scipy as sp
import scipy.sparse
import numpy as np
from contexttimer import Timer
import matplotlib.pyplot as plt

def sparse_random(n_rows, n_cols, density):
    """A faster implementation of scipy.sparse.rand?"""
    if density == 0:
        return sp.sparse.coo_matrix((n_rows, n_cols))
    N = n_rows*n_cols
    nnz = int(N*density)
    idx = np.random.choice(N, nnz, replace=False)
    rows, cols = np.divmod(idx, n_cols)
    data = np.random.rand(nnz)
    return sp.sparse.coo_matrix((data, (rows, cols)), shape=(n_rows,n_cols))

def time_test(n_rows, n_cols):
    densities = np.arange(0, 1.1, 0.1)
    times_sp = np.empty(densities.size)
    times = np.empty(densities.size)
    for i, density in enumerate(densities):
        with Timer() as t:
            sp.sparse.rand(n_rows,n_cols,density)
        times_sp[i] = t.elapsed
        
        with Timer() as t:
            sparse_random(n_rows,n_cols,density)
        times[i] = t.elapsed
    plt.plot(densities, times_sp)
    plt.plot(densities, times)
    plt.legend(['sp.sparse.rand', 'sparse_random'])
    plt.show()

time_test(120, 80)

My timings are below:

bizarre_sparse_rand

Scipy/Numpy/Python version information:

In [1]: import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.1.0 1.14.5 sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
@perimosocordiae
Copy link
Member

It turns out this is mostly a historical artifact. Here's the commit that added the code in question: 589c372

  1. We couldn't use random_state.choice until numpy 1.7 became the minimum required version.
  2. The workaround using random_state.permutation was inefficient for small densities, so a special case was added.
  3. When the numpy version bump happened in commit aa6bc5b, we didn't notice that the special case was no longer needed (and apparently slower, as well).

Feel free to send a PR with the fix, if you'd like!

@rwolst
Copy link
Contributor Author

rwolst commented Jul 17, 2018

Ok, I'll fix this.

rwolst added a commit to rwolst/scipy that referenced this issue Jul 17, 2018
@ilayn ilayn added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse labels Jul 17, 2018
@ilayn ilayn added this to the 1.2.0 milestone Jul 17, 2018
rgommers pushed a commit to rwolst/scipy that referenced this issue Aug 31, 2018
tylerjereddy pushed a commit to rwolst/scipy that referenced this issue Nov 7, 2018
rgommers added a commit that referenced this issue Nov 7, 2018
MAINT: Fix slow sparse.rand for k < mn/3 (#9036).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse
Projects
None yet
Development

No branches or pull requests

3 participants