# Sparse matrices

Author: Alexandre Gramfort

Sparse matrices are often useful in numerical simulations dealing with large systems, when the problem can be described in matrix form and the matrices or vectors mostly contain zeros. SciPy has good support for sparse matrices, including basic linear algebra operations (such as equation solving, eigenvalue calculations, etc.).

There are many possible strategies for storing sparse matrices efficiently. Some of the most common are the coordinate format (COO), the list of lists format (LIL), and the compressed sparse column (CSC) and compressed sparse row (CSR) formats. Each format has advantages and disadvantages. Most computational algorithms (equation solving, matrix-matrix multiplication, etc.) can be implemented efficiently using the CSR or CSC formats, but these formats are less intuitive and harder to initialize. So a sparse matrix is often first created in COO or LIL format (where elements can be added efficiently) and then converted to CSC or CSR before being used in actual calculations.

For more information about these sparse formats, see e.g. http://en.wikipedia.org/wiki/Sparse_matrix
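To make the storage schemes concrete, the sketch below prints the internal arrays that SciPy exposes for the COO, CSR and CSC formats, using the small 3x3 matrix from this notebook. It is only an illustration; the variable names `dense`, `coo`, `csr` and `csc` are chosen here for the example.

```python
import numpy as np
from scipy import sparse

# Small 3x3 example matrix with 5 non-zero entries
dense = np.array([[1, 2, 0],
                  [0, 0, 3],
                  [4, 0, 5]])

coo = sparse.coo_matrix(dense)
csr = sparse.csr_matrix(dense)
csc = sparse.csc_matrix(dense)

# COO: one (row, col, value) triplet per stored entry
print("COO row    :", coo.row)
print("COO col    :", coo.col)
print("COO data   :", coo.data)

# CSR: row i owns data[indptr[i]:indptr[i + 1]];
# `indices` holds the column index of each stored value
print("CSR indptr :", csr.indptr)
print("CSR indices:", csr.indices)
print("CSR data   :", csr.data)

# CSC: same layout, but organized per column;
# `indices` holds the row index of each stored value
print("CSC indptr :", csc.indptr)
print("CSC indices:", csc.indices)
print("CSC data   :", csc.data)
```

In CSR and CSC, `indptr` has one more element than the number of rows (resp. columns), which is why slicing a whole row (resp. column) is cheap in these formats.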

When we create a sparse matrix we have to choose which format it should be stored in for maximal computational efficiency. One objective of this notebook is to allow you to answer this question.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 2, 0],
                       [0, 0, 3],
                       [4, 0, 5]])
print(A)

In [None]:
A  # A is a CSR matrix so in "Compressed Sparse Row" format

In [None]:
plt.spy(A, markersize=74)

# Why sparse matrices?

In [None]:
%load_ext memory_profiler

In [None]:
%memit C = np.random.randn(3000, 3000)

In [None]:
%memit C = sparse.rand(3000, 3000, density=0.01)
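If `memory_profiler` is not installed, the footprint can also be estimated directly from the underlying arrays. The sketch below assumes a CSR matrix (so the stored arrays are `data`, `indices` and `indptr`) and compares it with a dense array of the same shape; the names `dense_bytes` and `sparse_bytes` are introduced for this example only.

```python
import numpy as np
from scipy import sparse

n = 3000
dense = np.random.randn(n, n)
sp = sparse.random(n, n, density=0.01, format="csr")

# A dense float64 array stores every entry: n * n * 8 bytes
dense_bytes = dense.nbytes

# A CSR matrix stores only the non-zero values plus two index arrays
sparse_bytes = sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes

print("dense : %.1f MB" % (dense_bytes / 1e6))
print("sparse: %.1f MB" % (sparse_bytes / 1e6))
print("ratio : %.1fx" % (dense_bytes / sparse_bytes))
```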

# Conversion to dense numpy array

In [None]:
A.toarray()

In [None]:
type(A)

In [None]:
type(A.toarray())

A more efficient way to create sparse matrices: create an empty matrix and populate it using matrix indexing (this avoids creating a potentially large dense matrix).

In [None]:
A = sparse.lil_matrix((4, 4)) # empty 4x4 sparse matrix
A[0, 0] = 1  # standard insertion
A[1, 1] = 3
A[2, 2] = A[2, 1] = 1
A[3, 3] = A[3, 0] = 1
A

In [None]:
plt.spy(A, markersize=56)
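When all the non-zero entries are known up front, another option is to pass them as (value, (row, column)) triplets to the COO constructor. The sketch below rebuilds the same 4x4 matrix that way; the array names `rows`, `cols` and `vals` are illustrative.

```python
import numpy as np
from scipy import sparse

# Positions and values of the 6 non-zero entries of the 4x4 matrix
rows = np.array([0, 1, 2, 2, 3, 3])
cols = np.array([0, 1, 1, 2, 0, 3])
vals = np.array([1.0, 3.0, 1.0, 1.0, 1.0, 1.0])

# Build the matrix in one call, without any element-by-element insertion
B = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4))
print(B.toarray())
```

Note that if the same (row, column) pair appears several times, the COO format sums the corresponding values when it is converted to another format or to a dense array.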

## Converting between different sparse matrix formats:

In [None]:
A

In [None]:
A = sparse.csr_matrix(A)
A

In [None]:
A = sparse.csc_matrix(A)
A

In [None]:
A = sparse.coo_matrix(A)
A

In [None]:
print(A.data, A.row, A.col)
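Besides calling the constructors, each sparse matrix also provides conversion methods such as `tocsr()`, `tocsc()` and `tocoo()`, and a `format` attribute that reports the current storage scheme. A minimal sketch, rebuilding the same 4x4 matrix so that the example is self-contained:

```python
import numpy as np
from scipy import sparse

A_lil = sparse.lil_matrix((4, 4))
A_lil[0, 0] = 1
A_lil[1, 1] = 3
A_lil[2, 2] = A_lil[2, 1] = 1
A_lil[3, 3] = A_lil[3, 0] = 1

# Chain of conversions: LIL -> CSR -> CSC -> COO
A_csr = A_lil.tocsr()
A_csc = A_csr.tocsc()
A_coo = A_csc.tocoo()

for M in (A_lil, A_csr, A_csc, A_coo):
    # The stored values are the same; only the layout in memory changes
    print(M.format, M.shape, M.nnz)
```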

# Arithmetic

In [None]:
A = sparse.csr_matrix(A)

In [None]:
A.toarray()

In [None]:
A.T.toarray()  # transpose

In [None]:
(A + A).toarray()

In [None]:
(2 * A).toarray()

In [None]:
v = np.array([1, 2, 3, 4])
# Dot product
print(A.dot(v))  # recommended
print(A * v)  # same result: for sparse matrices, `*` is a matrix product

In [None]:
(A * A).toarray()  # Warning: this is a true matrix x matrix product (same as A @ A), not an elementwise product
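Since `*` on sparse matrices means a matrix product, an elementwise product needs the `multiply` method. The sketch below contrasts the two on the 4x4 matrix used above and checks both against the equivalent dense NumPy operations.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 0, 0],
                  [0, 3, 0, 0],
                  [0, 1, 1, 0],
                  [1, 0, 0, 1]])
A = sparse.csr_matrix(dense)

matrix_product = (A @ A).toarray()         # rows times columns
elementwise = A.multiply(A).toarray()      # entry by entry

print(matrix_product)
print(elementwise)

# Compare with the equivalent dense operations
print(np.array_equal(matrix_product, dense @ dense))
print(np.array_equal(elementwise, dense * dense))
```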

## Why 3 formats?

### Insertion

In [None]:
nnz = 10000
ii = np.random.randint(10000, size=nnz)
jj = np.random.randint(10000, size=nnz)
vv = np.random.randn(nnz)

In [None]:
A = sparse.csr_matrix((10000, 10000))
A[ii, jj] = vv  # changes the sparsity structure of a CSR matrix: expect a SparseEfficiencyWarning

In [None]:
A = sparse.csc_matrix((10000, 10000))
A[ii, jj] = vv  # same issue for CSC: expect a SparseEfficiencyWarning

In [None]:
A = sparse.lil_matrix((10000, 10000))
A[ii, jj] = vv
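The three insertions above can be timed side by side. The sketch below is one possible way to do it, with a few assumptions that differ from the cells above: the matrix is smaller so it runs quickly, the positions are drawn without duplicates so that all approaches produce the same matrix, and a hypothetical helper `timed_fill` wraps the timing. It also adds construction from COO triplets for comparison. Actual timings depend on the machine and the SciPy version.

```python
import time
import warnings

import numpy as np
from scipy import sparse

n, nnz = 2000, 2000
rng = np.random.default_rng(0)

# Draw distinct flat positions, then convert them to (row, col) pairs
flat = rng.choice(n * n, size=nnz, replace=False)
ii, jj = np.divmod(flat, n)
vv = rng.standard_normal(nnz)


def timed_fill(make_empty):
    """Create an empty (n, n) matrix, insert all values, return (matrix, seconds)."""
    A = make_empty((n, n))
    start = time.perf_counter()
    with warnings.catch_warnings():
        # CSR and CSC warn that changing the sparsity structure is expensive
        warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
        A[ii, jj] = vv
    return A, time.perf_counter() - start


A_csr, t_csr = timed_fill(sparse.csr_matrix)
A_csc, t_csc = timed_fill(sparse.csc_matrix)
A_lil, t_lil = timed_fill(sparse.lil_matrix)

# Alternative: build directly from triplets, then convert once
start = time.perf_counter()
A_coo = sparse.coo_matrix((vv, (ii, jj)), shape=(n, n)).tocsr()
t_coo = time.perf_counter() - start

for name, t in [("csr", t_csr), ("csc", t_csc), ("lil", t_lil), ("coo->csr", t_coo)]:
    print("%-8s %.4f s" % (name, t))
```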

### Multiplication

In [None]:
A

In [None]:
v = np.random.randn(A.shape[0])
%timeit A.dot(v)

In [None]:
v = np.random.randn(A.shape[0])
A_csr = sparse.csr_matrix(A)
%timeit A_csr.dot(v)

In [None]:
v = np.random.randn(A.shape[0])
A_csc = sparse.csc_matrix(A)
%timeit A_csc.dot(v)

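**Remark:** CSR matrices are typically faster for right multiplication (`A.dot(v)`), and CSC matrices for left multiplication (a row vector times `A`, which is the same as `A.T.dot(v)`).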

In [None]:
A_csc.T  # Transposing a CSC matrix makes a CSR matrix
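The link between the transpose and left multiplication can be checked numerically. The sketch below uses a random CSC matrix created for this example (not the `A_csc` from the cells above) and verifies that multiplying the transposed matrix by `v` gives the same numbers as the row vector `v` times the original matrix.

```python
import numpy as np
from scipy import sparse

A_csc = sparse.random(1000, 1000, density=0.01, format="csc")
v = np.random.randn(A_csc.shape[0])

# The transpose of a CSC matrix is stored in CSR format
A_t = A_csc.T
print(A_csc.format, "->", A_t.format)

# Left multiplication v^T A, computed as A^T v
left = A_t.dot(v)

# Dense reference computation for comparison
reference = v @ A_csc.toarray()
print(np.allclose(left, reference))
```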

# Reference

* [Official documentation](http://docs.scipy.org/doc/scipy/reference/sparse.html)