ENH: sparse: speedup LIL matrix slicing #3338

Merged
merged 5 commits into from Feb 1, 2016


@pv
Member
pv commented Feb 16, 2014

The current implementation of LIL matrix slicing is very pessimal. This replaces it with a much faster implementation.

It gives a speedup of roughly a factor of 1/density:

In []: A = rand(3000, 3000, density=1e-3, format="lil")

# Before (9ccc68475)
In [4]: %timeit A[::2,::-2]
10 loops, best of 3: 87.5 ms per loop

# After (d47e27b0)
In [4]: %timeit A[::2,::-2]
1000 loops, best of 3: 603 µs per loop
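
A rough way to see where a ~1/density factor can come from: a slicing routine that looks at every (row, column) position of the result does work proportional to the slice's dense size, while one that walks only the stored entries of the selected rows does work proportional to their nonzeros. The pure-Python sketch below illustrates the second approach. It is not the PR's (Cython) code; the function name `slice_lil` and the `rows`/`data` list-of-lists layout (sorted column indices per row, values alongside) are assumptions made for illustration.

```python
def slice_lil(rows, data, shape, row_slice, col_slice):
    """Slice a LIL-style matrix stored as per-row lists (hypothetical helper).

    rows[i]: sorted column indices of the stored entries in row i
    data[i]: the matching values
    Only the stored entries of the selected rows are visited, so the cost
    scales with their nnz rather than with the dense size of the result.
    """
    nrows, ncols = shape
    row_range = range(nrows)[row_slice]  # selected rows, in output order
    col_range = range(ncols)[col_slice]  # selected columns, in output order
    new_rows, new_data = [], []
    for i in row_range:
        out_cols, out_vals = [], []
        for j, v in zip(rows[i], data[i]):
            # Membership test and index lookup are O(1) for range objects.
            if j in col_range:
                out_cols.append(col_range.index(j))
                out_vals.append(v)
        if col_range.step < 0:
            # A negative column step reverses the order; keep indices sorted.
            out_cols.reverse()
            out_vals.reverse()
        new_rows.append(out_cols)
        new_data.append(out_vals)
    return new_rows, new_data, (len(row_range), len(col_range))
```

For example, `slice_lil(rows, data, (3000, 3000), slice(None, None, 2), slice(None, None, -2))` corresponds to `A[::2,::-2]` in the benchmark above.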

It also speeds up smaller slices:

# Before
In [5]: %timeit A[3,:]
1000 loops, best of 3: 213 µs per loop
In [6]: %timeit A[:,3]
1000 loops, best of 3: 901 µs per loop
In [7]: %timeit A[1:5,1:5]
10000 loops, best of 3: 83.3 µs per loop

# After
In [5]: %timeit A[3,:]
10000 loops, best of 3: 38.8 µs per loop
In [6]: %timeit A[:,3]
1000 loops, best of 3: 702 µs per loop
In [7]: %timeit A[1:5,1:5]
10000 loops, best of 3: 40.3 µs per loop

(Column slices in LIL are still slow, but that's inherent to the format: each row stores a sorted list of column indices, so extracting a column means searching every row.)
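
To make the column-slice cost concrete, here is a pure-Python sketch of extracting one column from a row-oriented LIL-style layout. The function name `get_lil_column` and the `rows`/`data` lists are assumptions for illustration, not SciPy's implementation. Each row can be searched with `bisect` in O(log nnz_row), but every row still has to be visited, so column access scales with the number of rows even when the column is nearly empty.

```python
import bisect


def get_lil_column(rows, data, j):
    """Return column j of a LIL-style matrix as (row_indices, values).

    rows[i] holds the sorted column indices of row i and data[i] the
    matching values (hypothetical layout mirroring LIL's per-row lists).
    """
    out_rows, out_vals = [], []
    for i, (cols, vals) in enumerate(zip(rows, data)):
        # Binary search within the row; the outer loop still touches every row.
        pos = bisect.bisect_left(cols, j)
        if pos < len(cols) and cols[pos] == j:
            out_rows.append(i)
            out_vals.append(vals[pos])
    return out_rows, out_vals
```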

EDIT: updated benchmarks

@pv
Member
pv commented Feb 16, 2014

TBH, the LIL/DOK matrices should be reimplemented, e.g. in Cython, with appropriate low-level data storage.

@coveralls


Coverage remained the same when pulling 63988b9 on pv:lil-speed into 2df405a on scipy:master.

@pv pv added the PR label Feb 19, 2014
@coveralls


Coverage remained the same when pulling 77929e5 on pv:lil-speed into 2df405a on scipy:master.

@rgommers
Member

Guess this can be closed?

@pv
Member
pv commented Feb 24, 2014

No, this is a different optimization.
Probably needs re-benchmarking and reimplementation in Cython.

@pv
Member
pv commented Mar 2, 2014

Rebased and cythonized.

@coveralls


Changes Unknown when pulling d47e27b on pv:lil-speed into scipy:master.

@jnothman jnothman commented on the diff Mar 5, 2014
scipy/sparse/lil.py
@@ -220,11 +220,30 @@ def getrowview(self, i):
     def getrow(self, i):
         """Returns a copy of the 'i'th row.
         """
+        if i < 0:
@jnothman
jnothman Mar 5, 2014 Contributor

Is there a reason this is inlined rather than using _check_row_bounds?
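
For reference, a sketch of what factoring the inline check out into a helper could look like. The standalone function `check_row_bounds(i, nrows)` is hypothetical; the signature of the `_check_row_bounds` method mentioned above is not shown in this thread, so this only mirrors the logic of the diff.

```python
def check_row_bounds(i, nrows):
    """Normalize a possibly negative row index and validate it (hypothetical).

    Same logic as the inline check in the diff: negative indices count from
    the end, and anything still outside [0, nrows) raises IndexError.
    """
    if i < 0:
        i += nrows
    if i < 0 or i >= nrows:
        raise IndexError('row index out of bounds')
    return i
```

`getrow` could then start with `i = check_row_bounds(i, self.shape[0])` instead of repeating the two `if` statements.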

@jnothman jnothman commented on the diff Mar 5, 2014
scipy/sparse/lil.py
@@ -220,11 +220,30 @@ def getrowview(self, i):
     def getrow(self, i):
         """Returns a copy of the 'i'th row.
         """
+        if i < 0:
+            i += self.shape[0]
+        if i < 0 or i >= self.shape[0]:
+            raise IndexError('row index out of bounds')
@jnothman
jnothman Mar 5, 2014 Contributor

There's no test coverage for this line.
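
A minimal sketch of a test that would cover the `raise IndexError` line. `RowStore` is a hypothetical stand-in holding only row lists and a shape, with `getrow` copying the bounds logic from the diff; a real test would exercise `lil_matrix.getrow` itself.

```python
import unittest


class RowStore:
    """Hypothetical stand-in for a LIL matrix: per-row lists plus a shape."""

    def __init__(self, rows, ncols):
        self.rows = rows
        self.shape = (len(rows), ncols)

    def getrow(self, i):
        # Same bounds logic as the diff under review.
        if i < 0:
            i += self.shape[0]
        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')
        return list(self.rows[i])


class TestGetrowBounds(unittest.TestCase):
    def setUp(self):
        self.m = RowStore([[0], [1, 2], []], ncols=3)

    def test_negative_index_wraps(self):
        self.assertEqual(self.m.getrow(-2), [1, 2])

    def test_out_of_bounds_raises(self):
        # Hits the raise line flagged as uncovered, from both ends.
        for bad in (3, -4):
            with self.assertRaises(IndexError):
                self.m.getrow(bad)
```

Run it with `python -m unittest` or pytest.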

@jnothman
Contributor
jnothman commented Mar 5, 2014

Otherwise, this LGTM

@pv pv removed the PR label Aug 13, 2014
@rgommers
Member
rgommers commented Jan 1, 2015

It looks like this needs only a rebase and a very minor tweak. Time to get it in?

@perimosocordiae
Member

Rebased in #5789, closing.

@larsmans
Contributor
larsmans commented Feb 1, 2016

Reopening so that the status of this will be "merged" when I merge #5789.

@larsmans larsmans reopened this Feb 1, 2016
@larsmans larsmans merged commit d47e27b into scipy:master Feb 1, 2016

1 check passed

default The Travis CI build passed