$\def\*#1{\mathbf{#1}}$
$\DeclareMathOperator*{\argmax}{arg\,max}$
# Linear Algebra

## Information $\rightarrow$ Matrix

* Each row is a **record** (instance, object, point,...).
* Each column is a **feature** (attribute, dimension,...).

$$
D = 
\left(
\begin{array}{c|cccc}
        & X_1 & X_2 & \cdots & X_d\\
        \hline
  \*x_1 & x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\
  \*x_2 & x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\
  \vdots & \vdots  & \vdots  & \ddots & \vdots  \\
  \*x_n & x_{n,1} & x_{n,2} & \cdots & x_{n,d} 
\end{array}
\right)
$$

## Linear algebra

* Linear algebra is *the mathematics of matrices*, and is therefore of crucial importance in data science.
* It is needed to understand many *machine learning algorithms*.

An $m \times n$ matrix can be used to represent objects such as:

* Data
* Geometric point sets
* Systems of equations
* Graphs and networks (see adjacency or incidence matrices)
* Rearrangement operations

Matrix operations are implemented in linear algebra libraries (such as `numpy.linalg`), which are optimized to take advantage of the CPU architecture (see, for example, [Intel MKL](https://software.intel.com/en-us/mkl)). The idea is therefore to formulate our problems using linear algebra in order to benefit from these libraries.
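A minimal sketch of that idea, assuming only NumPy: it computes all pairwise dot products of the rows of a small random matrix, first with explicit Python loops and then as the single matrix product `data @ data.T`, which NumPy delegates to its compiled linear algebra backend. The function name `gram_loop` and the matrix sizes are illustrative choices, and the printed timings will vary from machine to machine.

```python
import time

import numpy as np


def gram_loop(data):
    """All pairwise dot products of the rows of `data`, using plain Python loops."""
    n = data.shape[0]
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = sum(a * b for a, b in zip(data[i], data[j]))
    return G


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 50))   # 100 records, 50 features

start = time.perf_counter()
G_loop = gram_loop(data)
t_loop = time.perf_counter() - start

start = time.perf_counter()
G_mat = data @ data.T                   # the same computation as one matrix product
t_mat = time.perf_counter() - start

print(f"loops: {t_loop:.4f} s, matrix product: {t_mat:.6f} s")
print("same result:", np.allclose(G_loop, G_mat))
```

Both versions compute the same numbers; only the second one lets the optimized library do the work.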

## Geometry and vectors

Each point (a row of `X` below) can be represented by a unit vector, giving its direction, together with its magnitude (Euclidean norm):

In [None]:
import numpy as np

X = np.array([[1, 12, 4, 13, 3, 10],
              [10, 1, 40, 1, 30, 1],
              [0, 6, 2, 6, 1, 5],
              [0.5, 6, 2, 6.5, 1.5, 5]])

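# Euclidean norm (magnitude) of each row, reshaped into a column for broadcasting
norms = np.linalg.norm(X, axis=1)
norms = norms[:, np.newaxis]

# Dividing each row by its norm gives the unit vectors (directions)
print('Unit vectors:', X/norms, sep='\n')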

print('Magnitudes :', norms, sep='\n')
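
The decomposition can be checked directly: every unit vector should have norm 1, and multiplying each one back by its magnitude should recover the original rows. This sketch redefines the same `X` so that it runs on its own, and uses `keepdims=True` as an alternative to indexing with `np.newaxis`.

```python
import numpy as np

X = np.array([[1, 12, 4, 13, 3, 10],
              [10, 1, 40, 1, 30, 1],
              [0, 6, 2, 6, 1, 5],
              [0.5, 6, 2, 6.5, 1.5, 5]])

# keepdims=True returns a (4, 1) column, so the division broadcasts row by row
magnitudes = np.linalg.norm(X, axis=1, keepdims=True)
units = X / magnitudes

print(np.linalg.norm(units, axis=1))        # each unit vector has length 1
print(np.allclose(units * magnitudes, X))   # direction times magnitude gives back X
```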

### Distance

Let's compute the dot product between every pair of unit vectors:

In [None]:
# Entry (i, j) is the dot product of unit vectors i and j
Xn = X/norms
np.dot(Xn, Xn.T)

We can measure the distance between points by the angle between their vectors:

$$
\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}
$$

$$
\cos(0) = 1, \quad \cos(\pi/2) = 0, \quad \cos(\pi) = -1
$$
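The formula translates directly into a small function. This is a sketch (the helper name `cosine` is ours) that checks the three reference values above on simple 2-D vectors and then recovers an angle with `np.arccos`.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (||a|| ||b||)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(cosine(e1, 3 * e1))   # same direction, theta = 0
print(cosine(e1, e2))       # orthogonal, theta = pi/2
print(cosine(e1, -e1))      # opposite directions, theta = pi

# The angle itself is recovered with arccos (clipped to guard against rounding)
theta = np.arccos(np.clip(cosine(e1, e1 + e2), -1.0, 1.0))
print(np.degrees(theta))
```

Note that scaling a vector (here `3 * e1`) does not change the cosine: only directions matter.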

Observations:

* For unit vectors, the *dot product* is equal to the cosine of the angle between them.
* For two mean-zero variables, the cosine is equal to their (Pearson) correlation coefficient:

$$
\begin{aligned} 
r &= \frac{\sum_{i=1}^n(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n(x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n(y_i - \bar{y})^2}}\\
&= \frac{\sum_{i=1}^nx_i y_i}{\sqrt{\sum_{i=1}^nx_i^2} \sqrt{\sum_{i=1}^ny_i^2}}\\
&= \frac{\*x \cdot \*y}{||\*x||\ ||\*y||}\\
\end{aligned}
$$

* The dot product measures how "in sync" the two vectors are.
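A quick numerical check of the correlation identity, assuming synthetic data: after centering two random variables, the cosine of the angle between them matches the Pearson coefficient returned by `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 0.5 * x + rng.standard_normal(100)   # y is partly correlated with x

# Center both variables so that they have mean zero
xc = x - x.mean()
yc = y - y.mean()

cos_centered = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
pearson = np.corrcoef(x, y)[0, 1]

print(cos_centered, pearson)
print(np.isclose(cos_centered, pearson))
```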



## Matrix Operations

In [None]:
# Matrix addition, linear combinations and transpose on pictures

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib notebook

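img = mpimg.imread('../datasets/stinkbug.png')
# Keep a square crop of the first channel so that img and img.T have the same shape
height = img.shape[0]
img = img[:, :height, 0]

fig, axs = plt.subplots(nrows=2, ncols=2)

axs[0, 0].imshow(img)                        # original
axs[0, 1].imshow(img.T)                      # transpose
axs[1, 0].imshow(img + img.T)                # sum of the image and its transpose
axs[1, 1].imshow(img + 0.5 * img[:, ::-1])   # linear combination with the mirrored image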

for ax in axs.flatten():
    ax.set_xticks([])
    ax.set_yticks([])

### Matrix Product

The matrix product $C = AB$ is defined as follows:

$$
C_{ij} = \sum_{l=1}^{k} A_{il} \cdot B_{lj},
$$

where the *inner dimensions* of $A$ and $B$ have to be the same, *i.e.* if $A$ is an $n \times k$ matrix, then $B$ has to be a $k \times m$ matrix. 

The product $C$ is then an $n \times m$ matrix, and each element $C_{ij}$ is the dot product of the $i$-th row of $A$ with the $j$-th column of $B$.
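The definition can be written out as three nested loops. This textbook sketch (the function name `matmul` is ours, and it is far slower than NumPy) computes each $C_{ij}$ as a dot product and checks the result against NumPy's `@` operator.

```python
import numpy as np


def matmul(A, B):
    """Textbook matrix product: C[i, j] = sum over l of A[i, l] * B[l, j]."""
    n, k = A.shape
    k2, m = B.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} != {k2}")
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            # dot product of row i of A with column j of B
            for l in range(k):
                C[i, j] += A[i, l] * B[l, j]
    return C


A = np.arange(6).reshape(2, 3)    # 2 x 3
B = np.arange(12).reshape(3, 4)   # 3 x 4
print(matmul(A, B))               # 2 x 4
print(np.allclose(matmul(A, B), A @ B))
```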

Matrix multiplication *does not commute*: in general, $AB \neq BA$.

In [None]:
# Plain arrays are preferred over the np.matrix class
A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 1], [1, 0]])
print(A.dot(B))   # AB
print(B.dot(A))   # BA, which differs from AB

For non-square matrices, the product in the other order may not even be defined:

In [None]:
A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
print(A.dot(B))  # (3 x 4)(4 x 2) -> 3 x 2
# print(B.dot(A))  would raise a ValueError: the inner dimensions (2 and 3) differ

Matrix multiplication is *associative*:

$$
(AB)C = A(BC)
$$

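Let us compare the computation time of $(AB)(CD)$ and $(A(BC))D$ when $A \in \mathbb{R}^{1 \times n}$, $B, C \in \mathbb{R}^{n \times n}$, and $D \in \mathbb{R}^{n \times 1}$. Both orderings give the same result, but the first only involves vector-matrix products, whereas the second requires the full $n \times n$ matrix product $BC$. (See http://www.scipy-lectures.org/advanced/optimizing/ for more on optimizing code.)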

In [None]:
n = 1000
# A and D are 1-D arrays: np.dot treats A as a 1 x n row vector and D as an n x 1 column vector
A = np.random.randn(n)
B = np.random.randn(n, n)
C = np.random.randn(n, n)
D = np.random.randn(n)

In [None]:
%timeit -n 30 -r 1 np.dot(np.dot(A, B), np.dot(C,D))

In [None]:
%timeit -n 30 -r 1 np.dot(np.dot(A, np.dot(B, C)), D)

Formulating problems as matrix products lets well-optimized linear algebra libraries do the heavy lifting; on large arrays, this (together with a sensible evaluation order) leads to very large performance gains in practice.
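A back-of-the-envelope count explains the gap. Multiplying a $p \times q$ matrix by a $q \times r$ matrix with the textbook algorithm takes $pqr$ scalar multiplications. The helper `mult_cost` below (a name chosen for this sketch) applies that rule to both orderings, treating `A` as $1 \times n$ and `D` as $n \times 1$.

```python
def mult_cost(p, q, r):
    """Scalar multiplications for a (p x q) @ (q x r) product."""
    return p * q * r


n = 1000

# (AB)(CD): two vector-matrix products, then a dot product of two length-n vectors
cost_ab_cd = mult_cost(1, n, n) + mult_cost(n, n, 1) + mult_cost(1, n, 1)

# (A(BC))D: the full n x n matrix-matrix product BC dominates
cost_a_bc_d = mult_cost(n, n, n) + mult_cost(1, n, n) + mult_cost(1, n, 1)

print(f"(AB)(CD):  {cost_ab_cd:,} multiplications")
print(f"(A(BC))D: {cost_a_bc_d:,} multiplications")
print(f"ratio: {cost_a_bc_d / cost_ab_cd:.0f}x")
```

That is $2n^2 + n$ versus $n^3 + n^2 + n$ operations, roughly a 500-fold difference for $n = 1000$. Measured times also depend on memory traffic and on the underlying BLAS implementation, so the timing ratio above need not match this count exactly.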

### Applications of Matrix multiplication

Consider an $n \times d$ data matrix $A$:

* $A \cdot A^T$ is an $n \times n$ matrix of dot products, which measure the "in sync-ness" among the points;
* $A^T \cdot A$ is a $d \times d$ matrix of dot products, which measure the "in sync-ness" among the features.

When the columns of $A$ have mean zero, $A^T A$ is, up to a factor $1/(n-1)$, the **covariance matrix** of the features; likewise, $A A^T$ is (up to scaling) the covariance matrix of the points when the rows have mean zero.
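As a sanity check on synthetic data (the mixing matrix below is an arbitrary choice that makes the features correlated), the sketch centers each column and compares $A^T A / (n-1)$ with NumPy's sample covariance `np.cov(A, rowvar=False)`, which treats columns as variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
A = rng.standard_normal((n, d)) @ mixing   # n records, d correlated features

# Center each column (feature) so that it has mean zero
Ac = A - A.mean(axis=0)

cov_from_product = Ac.T @ Ac / (n - 1)   # d x d
cov_numpy = np.cov(A, rowvar=False)      # sample covariance, divides by n - 1

print(np.allclose(cov_from_product, cov_numpy))
print((A @ A.T).shape, (A.T @ A).shape)
```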

In [None]:
fig, axs = plt.subplots(ncols=3)

M = mpimg.imread('../datasets/lincoln_memorial.png')[:,:,0]

axs[0].imshow(M)
axs[0].set_title('$M$')

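# Normalize each row of M to unit length, so that the product holds cosines between rows
A = M / np.linalg.norm(M, axis=1, keepdims=True)

axs[1].imshow(np.dot(A, A.T))
axs[1].set_title(r'$M \cdot M^T$')

# Normalize each column of M (starting again from M, not from the row-normalized A)
A = M / np.linalg.norm(M, axis=0, keepdims=True)

axs[2].imshow(np.dot(A.T, A))
axs[2].set_title(r'$M^T \cdot M$')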

for ax in axs:
    ax.axis('off')
    
plt.tight_layout()

* For the **adjacency matrix** $A$ of a graph, the entry $(A^k)_{ij}$ of the $k$th power gives the number of walks (paths that may revisit vertices) of length $k$ from vertex $i$ to vertex $j$.
* Permuting rows or columns:

$$
P_{(2431)} = 
\left[
\begin{array}{cccc}
  0 & 0 & 0 & 1 \\
  1 & 0 & 0 & 0 \\
  0 & 0 & 1 & 0 \\
  0 & 1 & 0 & 0
\end{array}
\right]
$$

* Rotating points in the plane (counter-clockwise by an angle $\theta$):

$$
R_{\theta} = 
\left[
\begin{array}{cc}
\cos(\theta) & -\sin(\theta)\\
\sin(\theta) & \cos(\theta)
\end{array}
\right]
$$
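The three applications above can be checked on tiny examples. In this sketch the directed graph is a hypothetical example, the permutation matrix is $P_{(2431)}$ (column $j$ holds a single 1 in row $\pi(j)$ for $\pi = (2, 4, 3, 1)$), and `rotation` is a helper name chosen here.

```python
import numpy as np

# Hypothetical directed graph with edges 0->1, 0->2, 1->2, 2->0
G = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
G2 = np.linalg.matrix_power(G, 2)
print(G2)   # G2[i, j] = number of walks of length 2 from i to j

# Permutation matrix P_(2431): entry v[j] is moved to position pi(j)
P = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0]])
v = np.array([10, 20, 30, 40])
print(P @ v)
print(np.array_equal(P.T @ P, np.eye(4)))   # permutation matrices are orthogonal


def rotation(theta):
    """2-D rotation matrix R_theta (counter-clockwise by theta radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


R = rotation(np.pi / 2)
print(R @ np.array([1.0, 0.0]))   # (1, 0) rotated by 90 degrees, up to rounding
```

For instance, the two walks of length 2 starting at vertex 0 are 0→1→2 and 0→2→0, which is why row 0 of `G2` has a 1 in columns 0 and 2.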


In [None]:
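fig, axs = plt.subplots(nrows=2, ncols=2)

# Row permutation: the identity with its top and bottom halves swapped
n = M.shape[0]
P = np.identity(n)
P = np.concatenate((P[n//2:, :], P[:n//2, :]))

# Column permutation built the same way; it needs M.shape[1] rows, which differs from n if M is not square
m = M.shape[1]
Q = np.identity(m)
Q = np.concatenate((Q[m//2:, :], Q[:m//2, :]))

axs[0, 0].imshow(M)
axs[0, 0].set_title('$M$')

axs[0, 1].imshow(P)
axs[0, 1].set_title('$P$')

axs[1, 0].imshow(np.dot(P, M))   # a permutation on the left permutes the rows of M
axs[1, 0].set_title(r'$P \cdot M$')

axs[1, 1].imshow(np.dot(M, Q))   # a permutation on the right permutes the columns of M
axs[1, 1].set_title(r'$M \cdot Q$')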

for ax in axs.flatten():
    ax.axis('off')
    
plt.tight_layout()

## Identity Matrices and Inversion

* $IA = AI = A$
* $A \cdot A^{-1} = A^{-1} \cdot A = I$, when the inverse $A^{-1}$ exists

For example, for a $2 \times 2$ matrix with $ad - bc \neq 0$:

$$
A^{-1} = \left[
\begin{array}{cc}
a & b\\
c & d
\end{array}
\right]^{-1}
= \frac{1}{ad - bc} \left[
\begin{array}{cc}
d & -b\\
-c & a
\end{array}
\right]
$$
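The closed form above can be coded directly. This sketch (the name `inv2x2` is ours) compares it with `numpy.linalg.inv` and checks both identities $AA^{-1} = A^{-1}A = I$. The exact `det == 0` test is a simplification; in floating point a tolerance is usually safer.

```python
import numpy as np


def inv2x2(M):
    """Inverse of a 2 x 2 matrix via the closed form (1 / (ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return np.array([[d, -b], [-c, a]]) / det


A = np.array([[1., 2.], [3., 4.]])
A_inv = inv2x2(A)
print(A_inv)
print(np.allclose(A_inv, np.linalg.inv(A)))
print(np.allclose(A @ A_inv, np.eye(2)), np.allclose(A_inv @ A, np.eye(2)))
```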

In [None]:
from numpy.linalg import inv
a = np.array([[1., 2.], [3., 4.]])
ainv = inv(a)
ainv

In [None]:
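a = np.array([[3., 2.], [3., 2.]])
ainv = inv(a)  # raises LinAlgError: the two rows are identical

In [None]:
a = np.array([[3., 2.], [6., 4.]])
ainv = inv(a)  # raises LinAlgError: the second row is twice the first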

* Matrices that are not invertible are called *singular* (otherwise, *non-singular*).
* A square matrix is *non-singular* if and only if its *determinant* is not zero.
* Only square matrices can be invertible.

## Matrix Rank

## Factoring Matrices

## Eigenvalues and Eigenvectors

## Eigenvalue Decomposition

## References

* **The Data Science Design Manual**, by Steven Skiena, 2017, Springer;
* Python notebooks available at [http://data-manual.com/data](http://data-manual.com/data);
* Lectures slides available at [http://www3.cs.stonybrook.edu/~skiena/data-manual/lectures/](http://www3.cs.stonybrook.edu/~skiena/data-manual/lectures/);
* [Matplotlib image tutorial](https://matplotlib.org/users/image_tutorial.html).