# What Is Sparse Data?

Sparse data is data in which most elements are unused (they carry no information).

It can be an array like this one:

[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]

<b>Sparse Data:</b> a data set where most of the item values are zero.

<b>Dense Array:</b> the opposite of a sparse array: most of the values are not zero.
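
As a rough illustration, the share of zeros in the example array above can be measured with NumPy. The variable name `sparsity` is only a label chosen for this sketch.

```python
import numpy as np

arr = np.array([1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0])

# Fraction of elements that are zero (0.0 = no zeros, 1.0 = all zeros)
sparsity = 1 - np.count_nonzero(arr) / arr.size
print(sparsity)
```

Here 9 of the 12 elements are zero, so the array is 75% sparse.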

# How to Work With Sparse Data

SciPy has a module, <b>scipy.sparse</b>, that provides functions for working with sparse data.

There are primarily two types of sparse matrices that we use:

<b>CSC</b> - Compressed Sparse Column. For efficient arithmetic and fast column slicing.

<b>CSR</b> - Compressed Sparse Row. For fast row slicing and faster matrix-vector products.

We will use the CSR matrix in this tutorial.
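
To make the difference concrete, the sketch below stores the same small array in both formats and takes a row slice from the CSR version and a column slice from the CSC version. The array and the names `row_major` and `col_major` are illustrative choices, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

row_major = csr_matrix(arr)  # compressed by row
col_major = csc_matrix(arr)  # compressed by column

# Both formats hold the same values; only the storage layout differs.
last_row = row_major[2].toarray()     # row slice, the access pattern CSR favors
last_col = col_major[:, 2].toarray()  # column slice, the access pattern CSC favors
print(last_row)
print(last_col)
```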

# CSR Matrix

We can create a CSR matrix by passing an array into the function <b>scipy.sparse.csr_matrix()</b>:

In [1]:
import numpy as np
from scipy.sparse import csr_matrix

In [2]:
arr = np.array([0,0,0,0,0,1,1,0,2])

In [4]:
print(csr_matrix(arr))

  (0, 5)	1
  (0, 6)	1
  (0, 8)	2




From the result we can see that there are three stored (nonzero) items.

The first item is in row 0, position 5, and has the value 1.

The second item is in row 0, position 6, and has the value 1.

The third item is in row 0, position 8, and has the value 2.
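
The same (row, position, value) triples can also be read programmatically. One possible approach, sketched here, converts the matrix to coordinate (COO) format with `tocoo()` and zips its `row`, `col`, and `data` arrays. The name `entries` is just a label for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])

# COO format exposes parallel arrays of row indices, column indices, and values
coo = csr_matrix(arr).tocoo()

entries = [(int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]
print(entries)
```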


# Sparse Matrix Methods

Viewing stored data (not the zero items) with the <b>data</b> property:

In [8]:
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])

In [9]:
print(csr_matrix(arr).data)

[1 1 2]


Counting nonzeros with the <b>count_nonzero()</b> method:

In [10]:
print(csr_matrix(arr).count_nonzero())

3


Removing zero-entries from the matrix with the <b>eliminate_zeros()</b> method:

In [25]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

In [26]:
mat = csr_matrix(arr)
mat.eliminate_zeros()
print(mat)

  (1, 2)	1
  (2, 0)	1
  (2, 2)	2
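
In the example above the output is unchanged because a matrix built from a dense array stores no zeros to begin with. The sketch below creates an explicit zero by overwriting a stored value, which is one assumed way such entries can arise, and then shows the effect of `eliminate_zeros()`.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

mat.data[0] = 0                # overwrite a stored value with an explicit zero
stored_before = mat.nnz        # stored entries, explicit zero included
nonzero = mat.count_nonzero()  # counts only values that are actually nonzero

mat.eliminate_zeros()          # drop the explicit zero from storage
stored_after = mat.nnz
print(stored_before, nonzero, stored_after)
```

Before the call, the matrix stores more entries than it has nonzero values; afterwards the two counts agree.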


Eliminating duplicate entries by adding them together with the <b>sum_duplicates()</b> method:

In [28]:
mat.sum_duplicates()
print(mat)

  (1, 2)	1
  (2, 0)	1
  (2, 2)	2
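
The matrix above has no duplicates, so nothing changes. As a hedged sketch of a case where the call does something, the example below builds a CSR matrix directly from `(data, indices, indptr)` arrays in which the same column index appears twice in one row. The specific values are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])  # column 0 appears twice in the only row
indptr = np.array([0, 3])      # row 0 spans data[0:3]
mat = csr_matrix((data, indices, indptr), shape=(1, 3))

stored_before = mat.nnz
mat.sum_duplicates()           # merge entries that share a position by adding them
stored_after = mat.nnz
print(stored_before, stored_after)
print(mat.toarray())
```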


Converting from CSR to CSC with the <b>tocsc()</b> method:

In [None]:
newarr = csr_m