# Sparse Arrays (**`scipy.sparse`**)

Documentation Link: https://docs.scipy.org/doc/scipy/tutorial/sparse.html

```python
import numpy as np
from scipy import sparse
from IPython.display import display  # built in inside Jupyter; imported so the cells also run in plain IPython
```

```python
dense_arr = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
sparse_arr = sparse.coo_array(dense_arr)
display('dense', dense_arr, 'sparse', sparse_arr)
# These methods behave as on dense arrays: argmax() returns a flat (row-major) index,
# and mean() divides by all 12 entries, implicit zeros included
display('same methods as dense arrays', sparse_arr.max(), sparse_arr.argmax(),
        sparse_arr.mean(), 'no. of stored values', sparse_arr.nnz)
display('reduction op', sparse_arr.mean(axis=1))
```

```
'dense'

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]])

'sparse'

<3x4 sparse array of type '<class 'numpy.int32'>'
	with 5 stored elements in COOrdinate format>

'same methods as dense arrays'

5

10

1.0833333333333333

'no. of stored values'

5

'reduction op'

array([0.75, 1.25, 1.25])
```
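Implicit zeros are treated as real zeros by these methods, so a result can differ from what the stored values alone would suggest (the minimum here is 0 even though no stored value is 0). A minimal sketch, rebuilding the same `dense_arr` so it runs on its own, that checks a few reductions against their dense NumPy counterparts:

```python
import numpy as np
from scipy import sparse

# Same 3x4 example as above, rebuilt so this snippet is self-contained
dense_arr = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
sparse_arr = sparse.coo_array(dense_arr)

# The unstored entries count as 0, so min() is 0 even though every stored value is positive
print(sparse_arr.min())
# Column sums come back as a dense NumPy array
print(sparse_arr.sum(axis=0))
# Flat row-major index of the maximum: row 2, col 2 -> 2 * 4 + 2
print(sparse_arr.argmax())

# Each result matches the dense equivalent
assert sparse_arr.min() == dense_arr.min()
assert np.array_equal(sparse_arr.sum(axis=0), dense_arr.sum(axis=0))
assert sparse_arr.argmax() == dense_arr.argmax()
```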

## [Sparse array formats](https://docs.scipy.org/doc/scipy/tutorial/sparse.html#understanding-sparse-array-formats)

```python
display(dense_arr[2, 2])
# sparse_arr[2, 2] raises a TypeError in the SciPy version used here:
display("'coo_array' object is not subscriptable")
display(sparse_arr.tocsr()[2, 2],
        'Compressed Sparse Row (CSR) csr_array supports slicing and element indexing')
display(sparse_arr @ sparse_arr.T,
        'the matrix product of two sparse arrays in COO format is returned as a CSR array')
```

```
5

"'coo_array' object is not subscriptable"

5

'Compressed Sparse Row (CSR) csr_array supports slicing and element indexing'

<3x3 sparse array of type '<class 'numpy.intc'>'
	with 5 stored elements in Compressed Sparse Row format>

'the matrix product of two sparse arrays in COO format is returned as a CSR array'
```
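Because CSR supports slicing, here is a small sketch of row and column slices on the same data. It uses `1:2`-style slices instead of a bare integer so the result stays 2-D. How a bare integer row index behaves on `csr_array` has changed across SciPy versions (newer releases may return a 1-D sparse array), so check that against your installed version.

```python
import numpy as np
from scipy import sparse

dense_arr = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
csr = sparse.csr_array(dense_arr)

# Element indexing returns a scalar
elem = csr[2, 2]

# Slices stay sparse; 1:2 keeps a 2-D shape regardless of SciPy version
row1 = csr[1:2, :]   # shape (1, 4)
col2 = csr[:, 2:3]   # shape (3, 1)

print(elem)
print(row1.toarray())
print(col2.toarray())
```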

The [scipy.sparse](https://docs.scipy.org/doc/scipy/reference/sparse.html#module-scipy.sparse) module contains the following formats, each with its own advantages and disadvantages:

- **Block Sparse Row (BSR) arrays** [scipy.sparse.bsr_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.bsr_array.html#scipy.sparse.bsr_array): Most appropriate when the parts of the array with data occur in contiguous blocks.

- **Coordinate (COO) arrays** [scipy.sparse.coo_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_array.html#scipy.sparse.coo_array): Provide a simple way to construct sparse arrays and modify them in place. COO can also be quickly converted into other formats such as CSR, CSC, or BSR.

- **Compressed Sparse Row (CSR) arrays** [scipy.sparse.csr_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_array.html#scipy.sparse.csr_array): Most useful for fast arithmetic, vector products, and slicing by row.

- **Compressed Sparse Column (CSC) arrays** [scipy.sparse.csc_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_array.html#scipy.sparse.csc_array): Most useful for fast arithmetic, vector products, and slicing by column.

- **Diagonal (DIA) arrays** [scipy.sparse.dia_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.dia_array.html#scipy.sparse.dia_array): Useful for efficient storage and fast arithmetic as long as the data primarily occurs along diagonals of the array.

- **Dictionary of Keys (DOK) arrays** [scipy.sparse.dok_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.dok_array.html#scipy.sparse.dok_array): Useful for fast construction and single-element access.

- **List of Lists (LIL) arrays** [scipy.sparse.lil_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_array.html#scipy.sparse.lil_array): Useful for fast construction and modification of sparse arrays.
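Any format can be reached from any other through the `to<format>()` methods (`tobsr`, `tocoo`, `tocsc`, `tocsr`, `todia`, `todok`, `tolil`), and the `.format` attribute names the current storage scheme. A minimal sketch of converting on demand, plus the common pattern of building incrementally in LIL and converting once to CSR for arithmetic:

```python
import numpy as np
from scipy import sparse

dense_arr = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
coo = sparse.coo_array(dense_arr)

# Convert the same data into each of the other formats
converted = {
    "bsr": coo.tobsr(),
    "csr": coo.tocsr(),
    "csc": coo.tocsc(),
    "dia": coo.todia(),
    "dok": coo.todok(),
    "lil": coo.tolil(),
}
for name, arr in converted.items():
    # The storage scheme changes; the represented values do not
    assert arr.format == name
    assert np.array_equal(arr.toarray(), dense_arr)
    print(name, type(arr).__name__)

# Incremental construction is cheap in LIL (or DOK); convert once when done
builder = sparse.lil_array((3, 4), dtype=np.int64)
builder[0, 0] = 1
builder[0, 3] = 2
builder[1, 1] = 4
builder[1, 2] = 1
builder[2, 2] = 5
final = builder.tocsr()
```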


```python
display(dense_arr)
row = [0, 0, 1, 1, 2]
col = [0, 3, 1, 2, 2]
data = [1, 2, 4, 1, 5]
# (data, (row, col)): data[k] is stored at position (row[k], col[k]);
# the shape (3, 4) is inferred from the largest row and column indices
csr_arr = sparse.csr_array((data, (row, col)))
display(csr_arr)
```

```
array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]])

<3x4 sparse array of type '<class 'numpy.intc'>'
	with 5 stored elements in Compressed Sparse Row format>
```

The [scipy.sparse.csr_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_array.html#scipy.sparse.csr_array), [scipy.sparse.csc_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_array.html#scipy.sparse.csc_array), and [scipy.sparse.coo_array](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_array.html#scipy.sparse.coo_array) constructors accept this `(data, (row, col))` style of input.
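CSR also has a native constructor, `(data, indices, indptr)`: `indices` holds the column index of each stored value, and `indptr[i]:indptr[i+1]` marks the slice of `indices`/`data` belonging to row `i`. A sketch for the same 3x4 array; passing `shape=(3, 4)` explicitly avoids relying on inference from the largest indices:

```python
import numpy as np
from scipy import sparse

dense_arr = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])

# Row 0 -> cols [0, 3], row 1 -> cols [1, 2], row 2 -> col [2]
data = [1, 2, 4, 1, 5]
indices = [0, 3, 1, 2, 2]
indptr = [0, 2, 4, 5]
csr_arr = sparse.csr_array((data, indices, indptr), shape=(3, 4))
assert np.array_equal(csr_arr.toarray(), dense_arr)

# The coordinate-style input ends up in the same internal arrays
row = [0, 0, 1, 1, 2]
col = [0, 3, 1, 2, 2]
from_coords = sparse.csr_array((data, (row, col)), shape=(3, 4))
print(from_coords.indptr, from_coords.indices, from_coords.data)
```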


## [Sparse arrays, implicit zeros, and duplicates](https://docs.scipy.org/doc/scipy/tutorial/sparse.html#sparse-arrays-implicit-zeros-and-duplicates)

```python
row = [0, 0, 1, 1, 2, 2]
col = [0, 3, 1, 2, 2, 3]
data = [1, 2, 4, 1, 5, 0]  # the last entry is an explicitly stored zero at (2, 3)
csr_arr = sparse.csr_array((data, (row, col)))
display(csr_arr, '6 stored, not 5!')
display('sparse to dense', csr_arr.todense(), dense_arr)
csr_arr.eliminate_zeros()  # removes explicitly stored zeros in place
display(csr_arr, csr_arr.todense())
```

```
<3x4 sparse array of type '<class 'numpy.intc'>'
	with 6 stored elements in Compressed Sparse Row format>

'6 stored, not 5!'

'sparse to dense'

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]], dtype=int32)

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]])

<3x4 sparse array of type '<class 'numpy.intc'>'
	with 5 stored elements in Compressed Sparse Row format>

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]], dtype=int32)
```
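`nnz` counts stored entries, so explicit zeros inflate it, while `count_nonzero()` looks at the values themselves. A short sketch with the same inputs that also shows overwriting an already-stored CSR entry with `0`: the entry stays stored as an explicit zero until `eliminate_zeros()` is called.

```python
from scipy import sparse

row = [0, 0, 1, 1, 2, 2]
col = [0, 3, 1, 2, 2, 3]
data = [1, 2, 4, 1, 5, 0]  # the last entry is an explicit zero at (2, 3)
csr_arr = sparse.csr_array((data, (row, col)), shape=(3, 4))

# nnz counts stored entries; count_nonzero() counts entries whose value is nonzero
initial = (csr_arr.nnz, csr_arr.count_nonzero())

# Overwriting an existing entry with 0 keeps it in storage
csr_arr[0, 0] = 0
after_assign = (csr_arr.nnz, csr_arr.count_nonzero())

# eliminate_zeros() drops every stored zero in place
csr_arr.eliminate_zeros()
after_cleanup = (csr_arr.nnz, csr_arr.count_nonzero())

print(initial, after_assign, after_cleanup)
```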

```python
row = [0, 0, 1, 1, 1, 2]
col = [0, 3, 1, 1, 2, 2]
data = [1, 2, 1, 3, 1, 5]  # (1, 1) appears twice, with values 1 and 3
dupes = sparse.coo_array((data, (row, col)))
display(dupes, dupes.todense(), 'at (1,1) it\'s 1+3')
dupes.sum_duplicates()  # merges duplicate entries in place by summing them
display(dupes, dupes.todense())
```

```
<3x4 sparse array of type '<class 'numpy.int32'>'
	with 6 stored elements in COOrdinate format>

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]])

"at (1,1) it's 1+3"

<3x4 sparse array of type '<class 'numpy.intc'>'
	with 5 stored elements in COOrdinate format>

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]], dtype=int32)
```
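Duplicates are summed whenever the data is densified or converted to CSR, but the COO array itself keeps both copies until `sum_duplicates()` is called on it. A minimal sketch with the same inputs:

```python
from scipy import sparse

row = [0, 0, 1, 1, 1, 2]
col = [0, 3, 1, 1, 2, 2]
data = [1, 2, 1, 3, 1, 5]  # (1, 1) appears twice: 1 and 3
dupes = sparse.coo_array((data, (row, col)), shape=(3, 4))

print(dupes.nnz)              # both copies are stored
print(dupes.toarray()[1, 1])  # densifying adds them: 1 + 3

# Conversion to CSR sums the duplicates in the new array
as_csr = dupes.tocsr()
print(as_csr.nnz)

# The original COO array is left untouched by the conversion
print(dupes.nnz)
```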

## [Canonical formats](https://docs.scipy.org/doc/scipy/tutorial/sparse.html#canonical-formats)

```python
dupes = sparse.coo_array((data, (row, col)))  # rebuild from the duplicated inputs above
display(dupes.has_canonical_format)
dupes.sum_duplicates()
display(dupes.has_canonical_format)
```

```
False

True
```
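For CSR (and likewise CSC), canonical format means the column indices within each row are sorted and contain no duplicates. A sketch that builds a deliberately non-canonical one-row array from hand-written `(data, indices, indptr)` arrays (the values are made up for illustration) and repairs it with `sum_duplicates()`, which sorts and merges in place:

```python
import numpy as np
from scipy import sparse

# One row whose column indices are unsorted and repeat column 3
data = np.array([2, 1, 5])
indices = np.array([3, 0, 3])
indptr = np.array([0, 3])
messy = sparse.csr_array((data, indices, indptr), shape=(1, 4))

before = (messy.has_sorted_indices, messy.has_canonical_format)
print(before)

# Sort indices within each row and add up repeated entries (2 + 5 at column 3)
messy.sum_duplicates()
print(messy.indices, messy.data, messy.has_canonical_format)
```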