Sparse Matrices: thoughts and comments (Trac #1004) #1531

Open
scipy-gitbot opened this Issue Apr 25, 2013 · 9 comments

Comments

Projects
None yet
6 participants

Original ticket http://projects.scipy.org/scipy/ticket/1004 on 2009-09-27 by trac user kyleabeauchamp, assigned to @wnbell.

I have several comments.

The documentation on sparse matrices should be improved / centralized. In particular, the documentation page doesn't include examples of matrix matrix products or matrix vector products. The matrix vector product is perhaps the most important thing one can do with a sparse matrix (sparse iterative solvers).

Second, I think a useful feature might be to include a hidden flag AUTO_RECAST in each sparse matrix. If this flag is set, when a given operation is called (solve equation, multiply matrices, etc), scipy would try to convert the sparse matrix to the best format (DOK, LIL, CSR, etc) for the job. Certain operations take forever for certain types of matrices, but can be done quickly for another type. Even including the time spent converting to a different sparse format, I often see a speedup of 100x.

Finally, I don't think we have an implementation of the Jacobi method for solving linear equations. It converges horribly slowly, but it has numerical stability advantages over other methods. For my problems, I find that many of the sparse solvers in scipy explode, while the Jacobi solver I wrote does just fine. This could indicate one of two things: bugs in the existing sparse solvers/wrappers or my using an incorrect solver for a given problem (I've tried several. In particular, I think bicg should work fine on my problem...).

Finally, here's the code I wrote for a jacobi solver. I can clean it up and try to get it into scipy if any wants...

def SolveEqn(A,b,X,nIter):
n=A.shape[0]

D=scipy.sparse.lil_diags([A.diagonal()],[0],(n,n))

R=A-D

DI=scipy.sparse.lil_diags([1/A.diagonal()],[0],(n,n))

DI=DI.tocsr()

R=R.tocsr()

for i in range(nIter):

    X=DI.matvec((b-R.matvec(X)))

return(X)
Contributor

Juanlu001 commented Aug 17, 2013

+1 for the matrix vector product examples, precisely I was trying to figure out what is the best way to do it and in the first pass I have found no information. I have read that "the CSR format is good for fast matrix vector products" but there are no examples in the docs.

Owner

pv commented Aug 17, 2013

This probably won't get fixed unless someone (not me) gets around and writes the documentation. The correct place for it would be at https://github.com/scipy/scipy/blob/master/scipy/sparse/__init__.py#L83

Contributor

Juanlu001 commented Aug 18, 2013

After doing some informal tests, I found that

  • A.dot(v) is way faster than np.dot(A, v) when A is DIA,
  • np.dot(A, v) raises "ValueError: setting an array element with a sequence." when A is CSR,
  • A.dot(v) is notieable fast when A is CSR.

Is this the case? I'm pinging the maintainers of the module, @stefanv and @jakevdp for directions to write a piece of documentation on this. I also saw sparse.linalg but I don't know how to use it.

Owner

stefanv commented Aug 18, 2013

Numpy's np.dot does not know anything about sparse matrices at the moment,
in contrast to the A.dot method of the sparse matrix object. This type of
thing can be documented at the module level docstring or in the user guide.

Member

jakevdp commented Aug 18, 2013

In scikit-learn we have a utility function called safe_sparse_dot which performs either A.dot or np.dot automatically, depending on the input type (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/extmath.py#L74). I wonder if it would be worth exposing a function similar to this as scipy.sparse.dot? It would certainly make a lot of code much cleaner.

+1 on updating the documentation, if you feel like taking that on @Juanlu001. There are two ways it could happen: you could change the module level docstring in scipy.sparse (as mentioned by @pv above). This docstring gets automatically imported by Sphinx into the website build, and ends up here: http://docs.scipy.org/doc/scipy/reference/sparse.html

Another option is, if you wanted to write some documentation with a more narrative style, you could add a new sparse.rst file in the tutorials directory (see https://github.com/scipy/scipy/blob/master/doc/source/tutorial/arpack.rst for an example). In the website build, that ends up here: http://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html

Contributor

cowlicks commented Aug 18, 2013

@jakevdp There is a PR in NumPy which will allow numpy.dot to be "aware" of sparse matrices. Basically sparse matrices will have a method (__numpy_ufunc__) that will take over when a ufunc or dot is called on them. Once this gets accepted safe_sparse_dot shouldn't be necessary.

About the docs, I was looking into this but I'm not very familiar with how the doc system works. I'll try to cook something up like the arpack.rst example.

Contributor

Juanlu001 commented Aug 18, 2013

@cowlicks If I understood correctly that awareness feature won't be available until NumPy 1.9 (or will be experimental on 1.8), so it can be ignored while writing some documentation for sparse. Today was the very first time I used sparse matrices for anything serious so if you feel like writing it yourself go ahead, otherwise I can give it a shot myself with some review :)

Contributor

cowlicks commented Aug 18, 2013

@Juanlu001 There are a few other things on my task list, so if you would take over this that would be great.

Juanlu001 added a commit to Juanlu001/scipy that referenced this issue Aug 19, 2013

Contributor

Juanlu001 commented Aug 19, 2013

Ignore the first commit, I created a branch sparse-docs. As I say in the PR, I'm gonna wait for some more narrative docs in the tutorial as @jakevdp suggests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment