# Linear Algebra Algorithms for Data Science

**Purpose:** This workbook helps you get comfortable with the topics outlined below.

**Prereqs**
* Python Fundamentals Workbook or a good grasp of basic Python
* Numpy Workbook or a good grasp of creating and manipulating numpy arrays
* Matplotlib Workbook or a good grasp of plotting using matplotlib
    
**Recommended Usage**
* Run each of the cells (Shift+Enter) and edit them as necessary to solidify your understanding
* Do any of the exercises that are relevant to helping you understand the material

**Topics Covered**
* Representing Problems and Data with Matrices
* Linear Algebra Fundamentals (linearity, matrix math, transpose, identity, determinant, etc.)
* Computational Linear Algebra Considerations
* Practical Applications in Data Science (dimensionality reduction, ...)

# Workbook Setup

In [2]:
# Reload all modules before executing a new line
%load_ext autoreload
%autoreload 2

# Abide by PEP8 code style
# %load_ext pycodestyle_magic
# %pycodestyle_on

# Render all matplotlib plots inline in the output cell
%matplotlib inline

In [3]:
import numpy as np

import matplotlib as mpl
import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

from IPython.display import YouTubeVideo

## Helper Functions

In [3]:
def print_a(a):
    print('Dim: {}\nShape: {}\n{}'.format(a.ndim, a.shape, a))

# Dimensionality Reduction

Frequently in data science disciplines we will need to **reduce the dimensionality of a matrix**. This means finding an equivalent, or nearly equivalent, representation that is smaller (i.e. one that takes less memory and less time to process).

There are several methods for dimensionality reduction:

* **Factorization / Decomposition:** We could factor the matrix, or break it down into the product of lower dimensional matrices.
* PCA ...
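As a rough illustration of the factorization idea, the sketch below builds a hypothetical 100 x 50 matrix `X` that has rank 3 by construction, then uses a truncated SVD to split it into two much smaller factors `W` and `H`. The data, the names `W` and `H`, and the choice `k = 3` are assumptions made for this example only.

```python
import numpy as np

# Hypothetical example data: a 100 x 50 matrix that is rank 3 by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 50))

# Factor X with SVD and keep only the k largest singular values.
k = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :k] * s[:k]  # shape (100, k): left factor scaled by singular values
H = Vt[:k, :]         # shape (k, 50): right factor

# The product of the two small factors approximates (here: recovers) X.
X_approx = W @ H

print('Factor shapes: {} and {}'.format(W.shape, H.shape))
print('Values stored: {} instead of {}'.format(W.size + H.size, X.size))
print('Reconstruction matches X: {}'.format(np.allclose(X, X_approx)))
```

Because `X` really is low rank here, the two factors reproduce it up to floating point error. On real data the reconstruction is usually only approximate, and `k` trades accuracy against size.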

## Sparse Matrices

Data scientists frequently work with **sparse matrices** (matrices in which most entries are zero). To store a sparse matrix more efficiently, we can store just the non-zero values along with their positions.
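To get a feel for the potential savings, here is a small sketch that compares the memory used by a dense array with the memory used by the arrays inside a scipy `csr_matrix`. The 1000 x 1000 diagonal matrix is a made-up example, and the exact byte counts depend on the dtypes scipy picks on your machine.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: a 1000 x 1000 matrix with non-zeros only on the diagonal.
dense = np.zeros((1000, 1000))
dense[np.arange(1000), np.arange(1000)] = 1.0

sparse = csr_matrix(dense)

# Sparsity = fraction of entries that are zero.
sparsity = 1.0 - np.count_nonzero(dense) / dense.size

# The dense array stores every entry; the sparse one stores only the
# non-zero values plus the index arrays needed to locate them.
dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print('Sparsity: {:.3f}'.format(sparsity))
print('Dense bytes: {}'.format(dense_bytes))
print('Sparse bytes: {}'.format(sparse_bytes))
```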

*I highly recommend checking out the Principal Component Analysis Workbook for more details on dimensionality reduction if it's relevant to you.*

Some data structures better suited for efficiently working with sparse matrices:

* Dictionary of Keys (DOK). A dictionary maps each (row index, column index) pair to its value.

* List of Lists (LIL). Each row of the matrix is stored as a list, with each entry in that list holding a column index and the value.

* Coordinate List (COO). A list of tuples is stored, with each tuple containing the row index, column index, and the value.

* Compressed Sparse Row (CSR). The sparse matrix is represented using three one-dimensional arrays: the non-zero values, the extents of the rows, and the column indices.

* Compressed Sparse Column (CSC). The same as Compressed Sparse Row, except that the values are stored column by column: the column extents are compressed and the row indices are stored for each value.
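Each of the structures listed above has a counterpart in `scipy.sparse`. The sketch below builds the same small matrix in all five formats and checks that each one converts back to the original dense array. It only aims to show that the formats are interchangeable views of the same data, not to recommend one over another.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 0, 0, 1, 0, 0],
              [0, 0, 2, 0, 0, 1],
              [0, 0, 0, 2, 0, 0]])

# One scipy class per data structure in the list above.
formats = {
    'Dictionary of Keys': sparse.dok_matrix(A),
    'List of Lists': sparse.lil_matrix(A),
    'Coordinate List': sparse.coo_matrix(A),
    'Compressed Sparse Row': sparse.csr_matrix(A),
    'Compressed Sparse Column': sparse.csc_matrix(A),
}

for name, m in formats.items():
    same = np.array_equal(m.toarray(), A)
    print('{}: format={}, non-zeros={}, round-trips={}'.format(
        name, m.format, m.nnz, same))

# The coordinate list is the easiest to read: (row, column, value) triples.
coo = formats['Coordinate List']
triples = sorted((int(r), int(c), int(v))
                 for r, c, v in zip(coo.row, coo.col, coo.data))
print(triples)
```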

In [29]:
from scipy.sparse import csr_matrix

In [30]:
# create dense matrix
A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
print(A)

[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]


In [31]:
# convert to sparse matrix (CSR method)
S = csr_matrix(A)
print(S)

  (0, 0)	1
  (0, 3)	1
  (1, 2)	2
  (1, 5)	1
  (2, 3)	2
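The printout above lists coordinates, but internally CSR keeps the three one-dimensional arrays described earlier. As a sketch, they can be inspected through the `data`, `indices`, and `indptr` attributes of the matrix; the loop at the end is just one illustrative way of walking the rows.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
S = csr_matrix(A)

print(S.data)     # non-zero values, stored row by row
print(S.indices)  # column index of each non-zero value
print(S.indptr)   # row i occupies positions indptr[i]:indptr[i + 1]

# Walk the rows using the row extents stored in indptr.
for i in range(S.shape[0]):
    start, end = S.indptr[i], S.indptr[i + 1]
    print('row {}: columns {} values {}'.format(
        i, S.indices[start:end].tolist(), S.data[start:end].tolist()))
```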


In [32]:
# reconstruct dense matrix
B = S.toarray()  # returns a regular ndarray (todense() returns np.matrix)
print(B)

[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]


# Clustering

# <span style="color:red;">Singular Value Decomposition (SVD)</span>

In [74]:
M = np.array([[3, 0, 2], [2, 0, -2], [0, 1, 1]])
M

array([[ 3,  0,  2],
       [ 2,  0, -2],
       [ 0,  1,  1]])

In [75]:
U, S, Vtranspose = np.linalg.svd(M)
print(U)
print(S)
print(Vtranspose)

[[-0.95123459  0.23048583 -0.20500982]
 [-0.28736244 -0.90373717  0.31730421]
 [-0.11214087  0.36074286  0.92589903]]
[3.72021075 2.87893436 0.93368567]
[[-0.9215684  -0.03014369 -0.38704398]
 [-0.38764928  0.1253043   0.91325071]
 [ 0.02096953  0.99166032 -0.12716166]]
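As a quick sanity check on these three outputs, the sketch below multiplies the factors back together. It assumes the usual convention that `np.linalg.svd` returns the singular values as a one-dimensional array, so they are placed on a diagonal with `np.diag` before multiplying.

```python
import numpy as np

M = np.array([[3, 0, 2], [2, 0, -2], [0, 1, 1]])
U, S, Vtranspose = np.linalg.svd(M)

# Rebuild M from its factors: M = U * diag(S) * V^T
M_rebuilt = U @ np.diag(S) @ Vtranspose
print(np.allclose(M, M_rebuilt))

# U and V are orthogonal: multiplying by their transpose gives the identity.
print(np.allclose(U.T @ U, np.eye(3)))
print(np.allclose(Vtranspose @ Vtranspose.T, np.eye(3)))

# Singular values are non-negative and sorted from largest to smallest.
print(np.all(S >= 0) and np.all(np.diff(S) <= 0))
```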


# Appendix

## Additional Resources

[Linear Algebra Playlist (by 3Blue1Brown)](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)

[Understanding Linearity and Linear Transformations (by Khan Academy)](https://www.khanacademy.org/math/linear-algebra/matrix-transformations/linear-transformations/a/visualizing-linear-transformations)

[Numerical Linear Algebra (by Rachel Thomas, fast.ai)](https://nbviewer.jupyter.org/github/fastai/numerical-linear-algebra/tree/master/)