# Linear Algebra
---
Table of Contents   
[**Chapter 1:** Introduction to Linear Algebra and to Mathematics for Machine Learning](#chapter-1-introduction-to-linear-algebra-and-to-mathematics-for-machine-learning)      
[**Chapter 2:** Vectors are objects that move around space](#chapter-2-vectors-are-objects-that-move-around-space)      
[**Chapter 3:** Matrices in Linear Algebra: Objects that operate on Vectors](#chapter-3-matrices-in-linear-algebra-objects-that-operate-on-vectors)     
[**Chapter 4:** Matrices make linear mappings](#chapter-4-matrices-make-linear-mappings)        
[**Chapter 5:** Eigenvalues and Eigenvectors: Application to Data Problems](#chapter-5-eigenvalues-and-eigenvectors-application-to-data-problems)       

## Chapter 1: Introduction to Linear Algebra and to Mathematics for Machine Learning 

### Motivation

#### Solving simultaneous equations
- **a**: apple
- **b**: banana
$$
\begin{cases}
2a + 3b = 8 \\
10a + 1b = 13
\end{cases}
$$
We can rewrite this system using *matrices* and *vectors*:
$$
\begin{pmatrix}
2 & 3 \\
10 & 1
\end{pmatrix}
\begin{bmatrix}
a \\
b
\end{bmatrix}
=
\begin{bmatrix}
8\\
13
\end{bmatrix}
$$
in which $\begin{bmatrix} a \\ b \end{bmatrix}$ and $\begin{bmatrix} 8 \\ 13 \end{bmatrix}$ are vectors, $\begin{pmatrix} 2 & 3 \\ 10 & 1 \end{pmatrix}$ is a matrix.
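The system above is small enough to solve directly. The following is a minimal sketch in plain Python, assuming Cramer's rule as the solution method (the notes themselves do not prescribe one); the helper name `solve_2x2` is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def solve_2x2(m, v):
    """Solve m * [a, b] = v for a 2x2 matrix m using Cramer's rule."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular; no unique solution")
    a = Fraction(v[0] * s - q * v[1], det)
    b = Fraction(p * v[1] - v[0] * r, det)
    return a, b

matrix = [[2, 3], [10, 1]]  # coefficients of a and b
rhs = [8, 13]               # right-hand side vector

a, b = solve_2x2(matrix, rhs)
print(a, b)  # 31/28 27/14
```

Substituting back confirms the result: $2 \cdot \frac{31}{28} + 3 \cdot \frac{27}{14} = 8$ and $10 \cdot \frac{31}{28} + \frac{27}{14} = 13$.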

#### Find the **optimal value** of the parameters in the equation describing this line.
![image.png](attachment:image.png)
- **Optimal values**: the parameter values that best fit the data in the histogram.

*Example:* the distribution of people's heights, modelled with a **Normal** (or **Gaussian**) distribution. We need to find its parameters $\mu$ and $\sigma$.
![image-3.png](attachment:image-3.png)
We can plot how well each choice of $\mu$ and $\sigma$ fits the data as a contour plot, as below:
![image-4.png](attachment:image-4.png)
A change in $\mu$ and a change in $\sigma$ can then be written as a vector that takes us from $\begin{bmatrix} \mu \\ \sigma \end{bmatrix}$ to $\begin{bmatrix} \mu' \\ \sigma' \end{bmatrix}$.
> What we are trying to do then is find the location in that space where the badness is minimised, the goodness is maximised, and the function fits the data best. (If the badness surface here were like a contour map of a landscape, we would be trying to find the bottom of the hill, the lowest possible point in the landscape.)
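The idea of moving through parameter space to reduce "badness" can be sketched in a few lines. This is only an illustration under stated assumptions: the heights are made-up numbers, and "badness" is taken to be the negative log-likelihood of a Normal distribution with constants dropped, which is one common choice and not necessarily the one used for the plots above.

```python
import math

# Made-up sample of heights in cm, purely for illustration.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]

def badness(params, data):
    """Negative log-likelihood of data under Normal(mu, sigma), constants dropped."""
    mu, sigma = params
    return sum(math.log(sigma) + (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

start = [150.0, 20.0]  # initial guess [mu, sigma]
step = [20.0, -12.0]   # a change [d_mu, d_sigma] in parameter space
moved = [p + d for p, d in zip(start, step)]  # [mu', sigma'] = [170.0, 8.0]

# The step moves downhill on the badness surface.
print(badness(start, heights) > badness(moved, heights))  # True
```

Here the step vector plays the role of the move from $\begin{bmatrix} \mu \\ \sigma \end{bmatrix}$ to $\begin{bmatrix} \mu' \\ \sigma' \end{bmatrix}$; an optimiser would choose such steps automatically.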

#### Operations with Vectors
Two basic operations on vectors:
1. Addition
2. Multiplication by a scalar number

**Addition** <br>
![image.png](attachment:image.png)      
We have $r + s = s + r$

**Scalar multiplication** <br>
![image-3.png](attachment:image-3.png)      

**Vector** <br>
![image-2.png](attachment:image-2.png)
where $i$ and $j$ are the unit vectors for height and length.
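Both operations act component by component, which a short sketch in plain Python makes concrete. The function names `add` and `scale` and the example vectors are illustrative choices, not taken from the figures above.

```python
def add(r, s):
    """Add two vectors component by component."""
    return [ri + si for ri, si in zip(r, s)]

def scale(a, r):
    """Multiply every component of vector r by the scalar a."""
    return [a * ri for ri in r]

r = [3, 2]   # r = 3i + 2j
s = [-1, 2]  # s = -1i + 2j

print(add(r, s))     # [2, 4]
print(add(s, r))     # [2, 4], so r + s = s + r
print(scale(2, r))   # [6, 4]
print(scale(-1, r))  # [-3, -2], the same length pointing the opposite way
```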

## Chapter 2: Vectors are objects that move around space

### Modulus & inner product
1. **Modulus** (or "norm")
![image.png](attachment:image.png)
We have:
- **vector r**: $ r = \begin{bmatrix} a \\ b \end{bmatrix} $
- **size of vector r**: $ |r| = \sqrt{a^2 + b^2} $

2. **Inner product** (or "dot product")
![image-2.png](attachment:image-2.png)
We have: 
$$
\begin{align*}
r\!\cdot\! s & = r_i\,s_i + r_j\,s_j           \\
             & = 3(-1) + 2\cdot 2              \\
             & = 1                             \\
             & = s \cdot r \quad \text{(commutative)}
\end{align*}
$$
The dot product has the following properties:
- **distributive over addition**: $ r \cdot (s + t) = r \cdot s + r \cdot t $ <br>
To prove it, write $ r = \begin{bmatrix} r_1 \\ r_2 \\ ... \\ r_n \end{bmatrix} $, $s = \begin{bmatrix} s_1 \\ s_2 \\ ... \\ s_n \end{bmatrix} $, and $t = \begin{bmatrix} t_1 \\ t_2 \\ ... \\ t_n \end{bmatrix} $

therefore:
$$ 
\begin{align*} 
r \cdot (s+t) 
    &= r_1 \cdot (s_1+t_1) + r_2 \cdot (s_2+t_2) + ... + r_n \cdot (s_n+t_n) \\
    &= r_1s_1 + r_1t_1 + r_2s_2 + r_2t_2 + ... + r_ns_n + r_nt_n   \\
    &= r \cdot s + r \cdot t
\end{align*}
$$

- **associative over scalar multiplication**: $ r\cdot (as) = a(r\cdot s) $ <br>
proved here for two components:
$$ 
\begin{align*}
r\cdot (as) &= r_1(as_1) + r_2(as_2) \\
            &= a(r_1s_1 + r_2s_2) \\
            &= a(r \cdot s)
\end{align*}
$$

- **dot product with itself**:
$$
\begin{align*}
r \cdot r   &= r_1.r_1 + r_2.r_2 + ... + r_n.r_n \\
            &= r_1^2 + r_2^2 + ... + r_n^2  \\
            &= \left(\sqrt{r_1^2 + r_2^2 + ... + r_n^2} \right)^2   \\
            & = |r|^2
\end{align*}
$$
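The properties above can be checked numerically. The sketch below uses the vectors $r$ and $s$ from the dot product example; the third vector `t` and the scalar `a` are arbitrary values chosen for illustration, and the helper names `dot` and `modulus` are hypothetical.

```python
import math

def dot(r, s):
    """Inner (dot) product: sum of component-wise products."""
    return sum(ri * si for ri, si in zip(r, s))

def modulus(r):
    """Size of a vector: the square root of its dot product with itself."""
    return math.sqrt(dot(r, r))

r, s, t = [3, 2], [-1, 2], [4, -5]  # t is an arbitrary extra vector
a = 3                               # arbitrary scalar

print(dot(r, s))               # 1
print(dot(r, s) == dot(s, r))  # True: commutative

s_plus_t = [si + ti for si, ti in zip(s, t)]
print(dot(r, s_plus_t) == dot(r, s) + dot(r, t))  # True: distributive

a_times_s = [a * si for si in s]
print(dot(r, a_times_s) == a * dot(r, s))  # True: associative over scalars

# r . r equals |r|^2, up to floating-point rounding from the square root.
print(math.isclose(modulus(r) ** 2, dot(r, r)))  # True
```

A numerical check on one set of vectors is not a proof, but it is a quick way to catch a sign or index error in the algebra.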

### Cosine & dot product
![image.png](attachment:image.png) <br>
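- Cosine rule: $c^2 = a^2 + b^2 -2ab\cos\theta$

Applying it to the triangle with sides $r$, $s$ and $r-s$, where $\theta$ is the angle between $r$ and $s$:
$$
|r-s|^2 = |r|^2 + |s|^2 - 2|r||s|\cos\theta \tag{1}
$$
The left-hand side can also be expanded with the dot product: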
$$
\begin{align*}
|r-s|^2 &= (r-s).(r-s)   \\
        &= r.r - s.r - s.r + s.s \\
        &= |r|^2 - 2s.r + |s|^2 \tag{2}  \\
\end{align*}
$$
Setting $(1) = (2)$ and cancelling $|r|^2 + |s|^2$ from both sides, we have:
$$
2r.s = 2|r||s|\cos\theta
$$
$$
\implies \boxed{r.s= |r||s|\cos\theta}
$$
Special cases:
1. $\theta = 90^{\circ}$ then $\cos\theta = 0 \implies r.s = |r||s| \times 0 = 0$ (the vectors are called **orthogonal** or **perpendicular**)
2. $\theta = 0^{\circ}$ then $\cos\theta = 1 \implies r.s = |r||s|$ (the vectors point in the same direction)
3. $\theta = 180^{\circ}$ then $\cos\theta = -1 \implies r.s = -|r||s|$ (the vectors point in opposite directions)
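Rearranging the boxed formula gives $\cos\theta = \dfrac{r.s}{|r||s|}$, which lets us compute the angle between two vectors. The sketch below is one way to do this in plain Python; the function name `angle_degrees` is hypothetical, and the clamping step is an added safeguard against floating-point rounding, not part of the derivation.

```python
import math

def dot(r, s):
    """Inner (dot) product of two vectors."""
    return sum(ri * si for ri, si in zip(r, s))

def modulus(r):
    """Size of a vector."""
    return math.sqrt(dot(r, r))

def angle_degrees(r, s):
    """Angle between r and s in degrees; both must be non-zero vectors."""
    cos_theta = dot(r, s) / (modulus(r) * modulus(s))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

print(angle_degrees([1, 0], [0, 1]))   # 90.0  (orthogonal)
print(angle_degrees([1, 0], [2, 0]))   # 0.0   (same direction)
print(angle_degrees([1, 0], [-3, 0]))  # 180.0 (opposite directions)
```

The three calls reproduce the three special cases listed above.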

## Chapter 3: Matrices in Linear Algebra: Objects that operate on Vectors

## Chapter 4: Matrices make linear mappings

## Chapter 5: Eigenvalues and Eigenvectors: Application to Data Problems