# [NTDS'18] tutorial 5: Sparse Arrays on Scipy
[ntds'18]: https://github.com/mdeff/ntds_2018

[Eda Bayram](http://lts4.epfl.ch/bayram), [EPFL LTS4](http://lts4.epfl.ch)

## Objective
This tutorial gives a short introduction to the ``scipy.sparse`` module. We will cover:

1) What is sparsity?

2) Sparse Matrix Storage Schemes

3) Linear Operations on Sparse Matrices

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy import linalg
from sys import getsizeof

%matplotlib inline

## 1. Sparsity

Why do we need sparse representation?

* Lower memory usage: only the nonzero entries are stored
* Faster computation: operations can skip the zero entries

Most graph representations are sparse!

Let us create a random sparse matrix and analyze the sparsity.

In [None]:
N = 250
dummy = sparse.random(N, N, density=0.01)
print('Number of nonzeros:', dummy.getnnz(), ', density:', dummy.getnnz() / dummy.shape[0]**2)

In [None]:
plt.spy(dummy,markersize=1);

In [None]:
print(dummy)

Let us convert the sparse matrix to dense formats and look at the memory they occupy.

In [None]:
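# `.A` is shorthand for `.toarray()`; it is deprecated in recent SciPy releases, so prefer `.toarray()`
dummyA = dummy.A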
dummyArr = dummy.toarray()
dummyDense = dummy.todense()

print('Type: ', type(dummy), ',size: ',getsizeof(dummy))
print('Type: ', type(dummyA), ',size: ',getsizeof(dummyA))
print('Type: ', type(dummyArr), ',size: ',getsizeof(dummyArr))
print('Type: ', type(dummyDense), ',size: ',getsizeof(dummyDense))
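Note that `getsizeof` applied to a sparse matrix typically reports only the size of the Python wrapper object, not the arrays holding the nonzeros. A rough way to compare the actual storage is to add up the `nbytes` of the underlying arrays. The sketch below assumes the COO format (attributes `data`, `row`, `col`); the names `m_coo`, `sparse_bytes` and `dense_bytes` and the seed are illustrative choices.

```python
import numpy as np
from scipy import sparse

# Same shape and density as the example above; the seed is an arbitrary choice.
n = 250
m_coo = sparse.random(n, n, density=0.01, format='coo', random_state=0)

# A COO matrix stores three arrays: values, row indices and column indices.
sparse_bytes = m_coo.data.nbytes + m_coo.row.nbytes + m_coo.col.nbytes
# A dense array stores every entry, zero or not.
dense_bytes = m_coo.toarray().nbytes

print('sparse (COO) bytes:', sparse_bytes)
print('dense bytes:', dense_bytes)
```

The dense size grows with the number of entries, whereas the sparse size grows with the number of nonzeros.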

## 2. Sparse Matrix Storage Schemes

The `scipy.sparse` module provides several sparse data structures. Each format has its own strengths, so the best choice depends on the task: matrix construction, indexing, or linear operations.

### 2.1 List of Lists Format (LIL)

* Supports flexible indexing and slicing, including item assignment (COO, for example, does not support indexing)
* Changing the sparsity structure is efficient, e.g., when reading a sparse matrix from a text file

In [None]:
# initiate an empty lil matrix
mtx = sparse.lil_matrix((4, 5))

In [None]:
# Assign some entries, i.e., change the sparsity structure
mtx[:2,[1,3]] = np.array([[1,2],[3,4]])

mtx.toarray()

In [None]:
# Read some of the indices
mtx[:2].toarray()
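As a minimal sketch of the incremental construction mentioned above, the following fills a LIL matrix entry by entry and converts it to CSR once the structure is final. The `triples` list is hypothetical and stands in for lines parsed from a text file.

```python
from scipy import sparse

# Hypothetical (row, column, value) triples, e.g. parsed line by line from a text file.
triples = [(0, 1, 1.0), (0, 3, 2.0), (1, 1, 3.0), (3, 4, 5.0)]

m_lil = sparse.lil_matrix((4, 5))
for i, j, value in triples:
    m_lil[i, j] = value  # item assignment is cheap in LIL

# Convert once the structure is final, before doing arithmetic.
m_csr = m_lil.tocsr()
print(m_csr.toarray())
```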

### 2.2 Coordinate Format (COO)

Construct the matrix from a `(data, (row, col))` tuple, where `row` and `col` hold the coordinates of the non-zero values in `data`:

In [None]:
row = np.array([0, 3, 1, 0]) # row coordinates
col = np.array([0, 3, 1, 2]) # column coordinates
data = np.array([4, 5, 7, 9]) # non-zero elements

mtx = sparse.coo_matrix((data, (row, col)), shape=(4, 4))

In [None]:
mtx.toarray()

Advantages:
* Fast element-wise operations
* Fast conversion to other sparse formats

In [None]:
# Element-wise power
mtx.power(0.5).toarray()

In [None]:
mtx_csr = mtx.tocsr()

Disadvantages:
* Indexing is not possible (Use LIL instead!)
* Slow at arithmetic operations (Use CSR, CSC instead!)
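One further property of COO that matters during construction: the same coordinate may appear more than once, and the repeated entries are summed when the matrix is converted to CSR/CSC. A small sketch, with illustrative names (`dup`, `dup_csr`):

```python
import numpy as np
from scipy import sparse

# Entry (0, 0) appears twice on purpose.
dup_row = np.array([0, 0, 1])
dup_col = np.array([0, 0, 2])
dup_data = np.array([4, 6, 7])

dup = sparse.coo_matrix((dup_data, (dup_row, dup_col)), shape=(2, 3))
print(dup.nnz)         # stored entries; duplicates are counted separately

dup_csr = dup.tocsr()  # the conversion sums the duplicates
print(dup_csr.toarray())
```

This is convenient when accumulating contributions, for example edge weights of a multigraph.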

### 2.3 Compressed Sparse Row & Column Formats (CSR & CSC)

In [None]:
# Get the data array
mtx_csr.data

`CSR` is row oriented:
* efficient row slicing
* fast matrix-vector products, i.e., multiplication from the right: `CSR * v`

In [None]:
# Get array of column indices for CSR
mtx_csr.indices
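To see how `data` and `indices` fit together, it helps to look at the third CSR array, `indptr`: row `i` occupies the slice `indptr[i]:indptr[i+1]` of both `indices` and `data`. The sketch below rebuilds the same 4x4 example from a dense array (the names `dense` and `m` are illustrative) and walks through its rows.

```python
import numpy as np
from scipy import sparse

dense = np.array([[4, 0, 9, 0],
                  [0, 7, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 5]])
m = sparse.csr_matrix(dense)

# Row i occupies the slice indptr[i]:indptr[i+1] of both `indices` and `data`.
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    print('row', i, 'cols', m.indices[start:end], 'values', m.data[start:end])
```

An empty row (row 2 here) simply has `indptr[i] == indptr[i+1]`, which is why row slicing is cheap in CSR.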

In [None]:
# Matrix-vector product from the right
v = np.array([1, 1, 1, 1])
mtx_csr.dot(v)

`CSC` is column oriented:
* efficient column slicing
* fast vector-matrix products, i.e., multiplication from the left: `v * CSC`

In [None]:
mtx_csc = mtx.tocsc()
# Get array of row indices for CSC
mtx_csc.indices

In [None]:
# Vector-matrix product from the left
v * mtx_csc

Both formats support efficient arithmetic operations such as `CSC + CSC`, `CSR * CSR`, etc.

In [None]:
# Matrix-matrix product (for sparse matrices `*` is the matrix product, whereas for NumPy arrays it is element-wise!)
prod = mtx_csr*mtx_csc
prod.toarray()

In [None]:
prod = mtx_csr @ mtx_csc  # `@` is the matrix product in both NumPy and SciPy
prod.toarray()
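Because `*` means different things for sparse matrices and NumPy arrays, it can be safer to spell out the intent: `.multiply()` for the element-wise product and `@` for the matrix product. A small sketch with two illustrative 2x2 matrices `P` and `Q`:

```python
import numpy as np
from scipy import sparse

P = sparse.csr_matrix(np.array([[1, 2], [0, 3]]))
Q = sparse.csr_matrix(np.array([[0, 1], [4, 5]]))

elementwise = P.multiply(Q)  # entry-by-entry product
matmul = P @ Q               # matrix product

print(elementwise.toarray())
print(matmul.toarray())
```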

You can read more about the sparse matrix storage schemes [here](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_.28CSR.2C_CRS_or_Yale_format.29).

## 3. Linear Algebra on Sparse Matrices

In [None]:
import scipy.sparse.linalg as sparseLA

### 3.1 Some Basic Operations

In [None]:
# sparse matrix from diagonals
A = sparse.spdiags(np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]]),[-1,0,2],4,4)
A.toarray()

**Inversion of a sparse matrix**

In [None]:
A = A.tocsc()  # convert to CSC for efficiency
Ainv = sparseLA.inv(A)
Ainv.toarray()

In [None]:
sparseLA.norm(A)  # Frobenius norm by default

**Solve A x = b**

In [None]:
b = np.array([1,1,1,1])
x = sparseLA.spsolve(A,b)
x
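The inverse of a sparse matrix is generally dense, so when the same system has to be solved for several right-hand sides it is usually preferable to factorize once and reuse the factorization. The sketch below uses `scipy.sparse.linalg.splu` on an illustrative tridiagonal matrix `T` (not the matrix `A` above) and checks the residuals.

```python
import numpy as np
from scipy import sparse
import scipy.sparse.linalg as sparseLA

# A small tridiagonal, diagonally dominant (hence invertible) matrix, chosen for illustration.
T = sparse.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(5, 5), format='csc')
b1 = np.ones(5)
b2 = np.arange(5, dtype=float)

lu = sparseLA.splu(T)  # factorize once (expects CSC)
x1 = lu.solve(b1)      # reuse the factorization for each right-hand side
x2 = lu.solve(b2)

# Residual norms should be close to zero.
print(np.linalg.norm(T @ x1 - b1))
print(np.linalg.norm(T @ x2 - b2))
```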

### 3.2 Eigenvalue Decomposition

For the full eigendecomposition of an array, you can use the functions provided by Numpy:
* `numpy.linalg.eig`
* `numpy.linalg.eigvals`
* `numpy.linalg.eigh`
* `numpy.linalg.eigvalsh`


SciPy offers more functionality (read [here](https://www.scipy.org/scipylib/faq.html#why-both-numpy-linalg-and-scipy-linalg-what-s-the-difference)), such as solving generalized eigenvalue problems. The corresponding SciPy functions are:
* `scipy.linalg.eig`
* `scipy.linalg.eigvals`
* `scipy.linalg.eigh`
* `scipy.linalg.eigvalsh`
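As a minimal sketch of the generalized problem $A v = \lambda B v$, `scipy.linalg.eigh` accepts a second matrix `b`. The matrices `A_sym` (symmetric) and `B_spd` (symmetric positive definite) below are illustrative and are not used elsewhere in this tutorial.

```python
import numpy as np
from scipy import linalg

# Illustrative matrices: A_sym is symmetric, B_spd is symmetric positive definite.
A_sym = np.array([[2.0, 1.0], [1.0, 3.0]])
B_spd = np.array([[2.0, 0.0], [0.0, 1.0]])

w, V = linalg.eigh(A_sym, B_spd)  # solves A_sym v = w B_spd v

# Column j of V pairs with eigenvalue w[j]; the residual should vanish.
residual = A_sym @ V - (B_spd @ V) * w
print(w)
print(np.abs(residual).max())
```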

In [None]:
linalg.eigvals(A.toarray())

Eigenvalues of a Hermitian matrix:

In [None]:
A = np.array([[1, -2j], [2j, 5]])
linalg.eigvalsh(A)

However, for quickly finding a few eigenvalues of a large sparse matrix, you should use the corresponding functions from the [sparse module](https://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html):

* `scipy.sparse.linalg.eigs`
* `scipy.sparse.linalg.eigsh`

In [None]:
dummy = sparse.random(30, 30, density=0.01)
evals, evecs = sparseLA.eigs(dummy, k=5, which='SM')
evals
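For symmetric (or Hermitian) sparse matrices, `eigsh` is the appropriate variant. The sketch below assumes an illustrative test matrix, the 1-D discrete Laplacian `lap`, computes its three largest eigenvalues with `which='LA'`, and compares them against the full dense decomposition.

```python
import numpy as np
from scipy import sparse
import scipy.sparse.linalg as sparseLA

# 1-D discrete Laplacian: symmetric and tridiagonal (chosen for illustration).
n = 50
lap = sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csr')

# Three largest (algebraic) eigenvalues via ARPACK.
top_vals, top_vecs = sparseLA.eigsh(lap, k=3, which='LA')

# Reference: all eigenvalues from the dense solver, in ascending order.
dense_vals = np.linalg.eigvalsh(lap.toarray())

print(np.sort(top_vals))
print(dense_vals[-3:])
```

For a matrix this small the dense solver is perfectly adequate; the sparse solver pays off when the matrix is large and only a few eigenpairs are needed.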