# Mathematics for machine learning - linear algebra

- $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left[\dfrac{-(x-\mu)^{2}}{2\sigma^{2}}\right]$
- Consider the parameters $\begin{bmatrix} \mu \\ \sigma \\ \end{bmatrix}$ as a vector

## Vectors

For two vectors $r = \begin{bmatrix} r_{i} \\ r_{j} \\ \end{bmatrix}$ and $s = \begin{bmatrix} s_{i} \\ s_{j} \\ \end{bmatrix}$ 
- $|r| = \sqrt{r_{i}^{2}+r_{j}^{2}}$
- $r \cdot s = r_{i}s_{i} + r_{j}s_{j}$
- $r \cdot s = s \cdot r$ (commutative)
- $r \cdot (s+t) = r \cdot s + r \cdot t$ (distributive)
- $r \cdot (as) = a(r \cdot s)$ (associative over scalar multiplication)
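
The properties above can be checked numerically. This is a minimal sketch in plain Python; the helper names `dot` and `norm` and the sample vectors are assumptions chosen for illustration.

```python
import math

def dot(r, s):
    """Dot product of two equal-length vectors."""
    return sum(ri * si for ri, si in zip(r, s))

def norm(r):
    """Length of a vector, |r| = sqrt(r . r)."""
    return math.sqrt(dot(r, r))

# Sample vectors and scalar (illustrative values only)
r, s, t = [3, 4], [-1, 2], [2, 5]
a = 3

size_r = norm(r)  # sqrt(3^2 + 4^2)
commutative = dot(r, s) == dot(s, r)
distributive = dot(r, [si + ti for si, ti in zip(s, t)]) == dot(r, s) + dot(r, t)
scalar_assoc = dot(r, [a * si for si in s]) == a * dot(r, s)
```

Integer inputs keep the comparisons exact; with floats a tolerance such as `math.isclose` would be the safer check.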

## Cosine rule

- $c^{2} = a^{2} + b^{2} - 2ab\cos\theta$
- $|r-s|^{2} = |r|^{2} + |s|^{2} - 2|r||s|\cos\theta$
- $(r-s) \cdot (r-s) = |r|^{2} - 2s \cdot r + |s|^{2}$
- Thus, $s \cdot r = |r||s|\cos\theta$ and $\cos\theta = \dfrac{s \cdot r}{|r||s|}$
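
As a quick check of the result, the angle between two vectors can be recovered from the dot product. A minimal sketch, assuming the hypothetical helper `angle_between` and two sample vectors that are 45 degrees apart:

```python
import math

def dot(r, s):
    return sum(ri * si for ri, si in zip(r, s))

def norm(r):
    return math.sqrt(dot(r, r))

def angle_between(r, s):
    """Angle in radians from cos(theta) = (r . s) / (|r||s|)."""
    return math.acos(dot(r, s) / (norm(r) * norm(s)))

r, s = [1, 0], [1, 1]          # illustrative vectors
theta = angle_between(r, s)

# Both sides of the cosine rule for the triangle formed by r, s and r - s
diff = [ri - si for ri, si in zip(r, s)]
lhs = norm(diff) ** 2
rhs = norm(r) ** 2 + norm(s) ** 2 - 2 * norm(r) * norm(s) * math.cos(theta)
```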

## Projection

- $\dfrac{r \cdot s}{|r|} = |s|\cos\theta$ (scalar projection of $s$ onto $r$)
- $\hat{r}|s|\cos\theta = \dfrac{r \cdot s}{|r|^{2}}r$ (vector projection = scalar projection multiplied by the unit vector $\hat{r} = r/|r|$)
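
Both projections can be sketched in a few lines. The function names and the sample vectors below are assumptions for illustration; `s` is projected onto `r`.

```python
import math

def dot(r, s):
    return sum(ri * si for ri, si in zip(r, s))

def scalar_projection(s, r):
    """Length of the shadow of s on r: (r . s) / |r|."""
    return dot(r, s) / math.sqrt(dot(r, r))

def vector_projection(s, r):
    """Shadow of s on r as a vector: ((r . s) / |r|^2) r."""
    scale = dot(r, s) / dot(r, r)
    return [scale * ri for ri in r]

r, s = [4, 0], [2, 2]                      # illustrative vectors
length_on_r = scalar_projection(s, r)      # 2.0
shadow_on_r = vector_projection(s, r)      # [2.0, 0.0]
```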

## Changing basis

- $\hat{e}_{1} = \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix}, \hat{e}_{2} = \begin{bmatrix} 0 \\ 1 \\ \end{bmatrix}, b_{1}= \begin{bmatrix} 2 \\ 1 \\ \end{bmatrix}, b_{2}= \begin{bmatrix} -2 \\ 4 \\ \end{bmatrix}$
- $r_{e} = 3\hat{e}_{1} + 4\hat{e}_{2} = \begin{bmatrix} 3 \\ 4 \\ \end{bmatrix}$
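- $r_{b} = \begin{bmatrix} 2 \\ 1/2 \\ \end{bmatrix} \left(\dfrac{r_{e} \cdot b_{1}}{|b_{1}|^{2}} = 2, \dfrac{r_{e} \cdot b_{2}}{|b_{2}|^{2}} = \dfrac{1}{2}\right)$; projecting like this works because $b_{1} \cdot b_{2} = 0$ (the new basis vectors are orthogonal)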
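
The numbers above can be reproduced with the projection formula. A minimal sketch, assuming a hypothetical helper `to_orthogonal_basis`; it only applies when the new basis vectors are mutually orthogonal, which is checked explicitly.

```python
def dot(r, s):
    return sum(ri * si for ri, si in zip(r, s))

def to_orthogonal_basis(r, basis):
    """Coordinates of r in an orthogonal basis: (r . b) / |b|^2 for each b."""
    return [dot(r, b) / dot(b, b) for b in basis]

b1, b2 = [2, 1], [-2, 4]
r_e = [3, 4]

basis_is_orthogonal = dot(b1, b2) == 0       # required for the projection method
r_b = to_orthogonal_basis(r_e, [b1, b2])     # [2.0, 0.5]

# Rebuild r in the original basis as a sanity check: 2*b1 + 0.5*b2
rebuilt = [r_b[0] * b1[i] + r_b[1] * b2[i] for i in range(2)]
```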

## Basis

- Basis vectors are not linear combinations of each other (linearly independent)
- They span the space

## Matrices

Rotating 90 degrees anti-clockwise sends the basis vectors (the columns of the identity) to the columns of the rotation matrix:
- $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \rightarrow \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ \end{bmatrix}$

Matrix multiplication is not commutative
- $A_{2}A_{1} \ne A_{1}A_{2}$ (in general)

Matrix multiplication is associative
- $A_{3}(A_{2}A_{1}) = (A_{3}A_{2})A_{1}$
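
These matrix facts can be verified with a small hand-written `matmul`. This is a sketch only: the rotation matrix is the one from these notes, while the shear and mirror matrices are assumptions added to have something to compose with.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

rotate90 = [[0, -1], [1, 0]]   # 90 degrees anti-clockwise
shear = [[1, 1], [0, 1]]       # illustrative second transformation
mirror = [[-1, 0], [0, 1]]     # illustrative third transformation

# The first basis vector (as a column) lands on the first column of rotate90
e1_rotated = matmul(rotate90, [[1], [0]])

# Order matters: the two products differ
rotate_after_shear = matmul(rotate90, shear)
shear_after_rotate = matmul(shear, rotate90)

# Grouping does not matter
left_grouping = matmul(mirror, matmul(shear, rotate90))
right_grouping = matmul(matmul(mirror, shear), rotate90)
```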

## Determinant

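- For $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $\det(A) = ad-bc$
- $A^{-1} = \dfrac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ (if $ad-bc = 0$, the columns are linearly dependent and no inverse exists)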
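
A minimal sketch of the 2x2 case, using the standard formula for the inverse of a 2x2 matrix. The names `det2` and `inverse2` and the sample matrices are assumptions for illustration.

```python
def det2(m):
    """Determinant ad - bc of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def inverse2(m):
    """Inverse of a 2x2 matrix; fails when the determinant is zero."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular: columns are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]          # independent columns
S = [[1, 2], [2, 4]]          # second column is twice the first

det_A = det2(A)               # -2
det_S = det2(S)               # 0
A_inv = inverse2(A)           # [[-2.0, 1.0], [1.5, -0.5]]
```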

## Changing basis of a transformation

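- $B^{-1}RB = R_{B}$ (the columns of $B$ are the new basis vectors written in the original basis; $R$ is the transformation in the original basis, $R_{B}$ the same transformation in the new basis)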
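
A small numeric sketch of this formula. The basis matrix `B` below is an assumption chosen to keep the arithmetic simple, and `R` is the 90 degree rotation used earlier in these notes.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[2, 0], [0, 1]]     # columns: new basis vectors in the original basis (illustrative)
R = [[0, -1], [1, 0]]    # rotation by 90 degrees in the original basis

# Read right to left: into the original basis, rotate, back into the new basis
R_B = matmul(inverse2(B), matmul(R, B))

# Check on one vector given in the new basis
v_new = [[1], [0]]
via_R_B = matmul(R_B, v_new)
step_by_step = matmul(inverse2(B), matmul(R, matmul(B, v_new)))
```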

## Orthogonal matrices

Orthonormal
- $a_{i} \cdot a_{j} = 0$ if $i \ne j$ 
- $a_{i} \cdot a_{j} = 1$ if $i = j$ 

Also
- $A^{T}A = I$ $(A^{T} = A^{-1})$
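
The condition $A^{T}A = I$ gives a direct test for orthogonality. A minimal sketch, assuming a hypothetical helper `is_orthogonal` with a small floating-point tolerance; the sample matrices are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def is_orthogonal(A, tol=1e-12):
    """True when A^T A equals the identity up to the tolerance."""
    product = matmul(transpose(A), A)
    n = len(A)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

rotation = [[0.6, -0.8], [0.8, 0.6]]   # columns are orthonormal
shear = [[1, 1], [0, 1]]               # columns are neither unit length nor perpendicular

rotation_ok = is_orthogonal(rotation)
shear_ok = is_orthogonal(shear)
```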

## Gram-Schmidt process 

How to construct an orthonormal basis

- let $v = \{v_{1}, v_{2} \dots v_{n}\}$ be vectors spanning a space
- let $e = \{e_{1}, e_{2} \dots e_{n}\}$ be the orthonormalized versions of the $v$'s
- keep the direction of $v_{1}$ and only normalize it: $e_{1} = \dfrac{v_{1}}{|v_{1}|}$
- $v_{2} = (v_{2} \cdot e_{1})e_{1} + u_{2}$, rearranging $u_{2} = v_{2} - (v_{2} \cdot e_{1})e_{1}$
- then $e_{2} = \dfrac{u_{2}}{|u_{2}|}$
- $u_{3} = v_{3} - (v_{3} \cdot e_{1})e_{1} - (v_{3} \cdot e_{2})e_{2}$
- then $e_{3} = \dfrac{u_{3}}{|u_{3}|}$
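
The procedure generalizes to any number of vectors. Below is a minimal sketch of classical Gram-Schmidt in plain Python; the function name, the dependence threshold `1e-12`, and the 2D sample input are assumptions for illustration.

```python
import math

def dot(r, s):
    return sum(ri * si for ri, si in zip(r, s))

def gram_schmidt(vectors):
    """Return an orthonormal basis built from linearly independent vectors."""
    basis = []
    for v in vectors:
        u = list(v)
        # Subtract the component of v along every basis vector found so far
        for e in basis:
            coeff = dot(v, e)
            u = [ui - coeff * ei for ui, ei in zip(u, e)]
        length = math.sqrt(dot(u, u))
        if length < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([ui / length for ui in u])
    return basis

e1, e2 = gram_schmidt([[3, 4], [1, 0]])   # illustrative input
```

For long lists of nearly parallel vectors, the modified variant (projecting out of the running `u` instead of the original `v`) is usually preferred for numerical stability.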

## Reflecting in a plane

Example
- $v_{1} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, v_{2} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, v_{3} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}$
- $e_{1} = \dfrac{v_{1}}{|v_{1}|} = \dfrac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
- $u_{2} = v_{2} - (v_{2} \cdot e_{1})e_{1} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$
- $e_{2} = \dfrac{u_{2}}{|u_{2}|} = \dfrac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$
- $u_{3} = v_{3} - (v_{3} \cdot e_{1})e_{1} - (v_{3} \cdot e_{2})e_{2} = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$
- $e_{3} = \dfrac{u_{3}}{|u_{3}|} = \dfrac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$

Transformation matrix
- $E = \begin{bmatrix} e_{1} & e_{2} & e_{3} \end{bmatrix}$ (note $E^{T} = E^{-1}$)

Reflection matrix in the basis $E$ (flips the $e_{3}$ component)
- $T_{E} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$
- Going from $r$ to $r^{'}$ directly is hard
- So we do $r \xrightarrow{E^{-1}} r_{E} \xrightarrow{T_{E}} r_{E}^{'} \xrightarrow{E} r^{'}$
- $ET_{E}E^{T}r = r^{'}$ (using $E^{-1} = E^{T}$)
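
The full reflection matrix for this example can be assembled numerically. This sketch hard-codes $e_{1}, e_{2}, e_{3}$ from the example above; the helper names are assumptions, and $E^{T}$ is used in place of $E^{-1}$ because $E$ is orthogonal.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(M, v):
    """Matrix times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s3, s2, s6 = math.sqrt(3), math.sqrt(2), math.sqrt(6)
e1 = [1 / s3, 1 / s3, 1 / s3]
e2 = [1 / s2, -1 / s2, 0]
e3 = [1 / s6, 1 / s6, -2 / s6]

E = transpose([e1, e2, e3])                  # columns are e1, e2, e3
T_E = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]     # flip the e3 component
T = matmul(matmul(E, T_E), transpose(E))     # reflection in the original basis

in_plane = apply(T, [1, 1, 1])       # v1 lies in the plane, so it is unchanged
normal = apply(T, [1, 1, -2])        # u3 is normal to the plane, so it flips
```

Working this out by hand gives $T = \dfrac{1}{3}\begin{bmatrix} 2 & -1 & 2 \\ -1 & 2 & 2 \\ 2 & 2 & -1 \end{bmatrix}$, which is what the code should reproduce up to rounding.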

## Eigenvalues and eigenvectors

- eigenvectors stay on their own span when the transformation is applied; they are only scaled by the eigenvalue
- for eigenvalues $\lambda$, eigenvectors $x$, and transformation matrix $A$
    - $Ax = \lambda x$
    - $(A-\lambda I)x = 0$
    - $\det(A-\lambda I) = 0$ (needed for a nonzero solution $x$)
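
For a 2x2 matrix, $\det(A-\lambda I) = 0$ expands to the quadratic $\lambda^{2} - (a+d)\lambda + (ad-bc) = 0$. The sketch below solves it for real eigenvalues only; the function name and the sample matrix are assumptions for illustration.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of [[a, b], [c, d]] from the characteristic quadratic."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace ** 2 - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

A = [[2, 1], [1, 2]]                 # illustrative symmetric matrix
lam1, lam2 = eigenvalues_2x2(A)      # 3.0 and 1.0

# [1, 1] is an eigenvector for lam1: A x should equal lam1 * x
x = [1, 1]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
scaled = [lam1 * xi for xi in x]
```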
    
## Changing to eigenbasis

- $v \xrightarrow{C^{-1}} [v]_{E} \xrightarrow{D^{n}} [T^{n}v]_{E} \xrightarrow{C} T^{n}v$
- $T^{n} = CD^{n}C^{-1}$ (columns of $C$ are the eigenvectors of $T$; $D$ is diagonal with the eigenvalues)
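
The identity can be checked against repeated multiplication. A minimal sketch, assuming the illustrative matrix $T = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ with eigenvalues 3 and 1 and eigenvectors $(1, 1)$ and $(1, -1)$; the helper names are also assumptions.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

T = [[2, 1], [1, 2]]
C = [[1, 1], [1, -1]]        # columns are the eigenvectors (1, 1) and (1, -1)
eigenvalues = [3, 1]         # in the same order as the columns of C
n = 3

# Powers of a diagonal matrix only need powers of its diagonal entries
D_n = [[eigenvalues[0] ** n, 0], [0, eigenvalues[1] ** n]]
via_eigenbasis = matmul(matmul(C, D_n), inverse2(C))

# Direct computation for comparison: T * T * T
direct = matmul(T, matmul(T, T))
```

The saving grows with `n`: the eigenbasis route costs two matrix products regardless of the power, while direct multiplication needs `n - 1` of them.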