# Linear Algebra
- Linear algebra focuses on linear systems of equations and determinants
- Geometrically, solving a linear system means finding where lines (in 2D) or planes (in 3D) intersect

- Useful for:
    - Describing large data sets efficiently using feature vectors
    - Constructing low-dimensional approximations of the available data through projection and least squares

- Under the hood of many ML algorithms:
    - Regression (minimizing squared errors)
    - Principal Component Analysis (finding principal axes)
    - Neural networks (matrix multiplication)

```python
# Scalar, Vector, Matrix (a matrix is a 2D grid of numbers, like an Excel sheet of samples x features)
import math

# A vector is just a list of numbers that represents a point or direction; 2D point (x=3, y=4)
v = [3, 4]

# Magnitude of a vector ||v|| = sqrt(x^2 + y^2)
magnitude = math.sqrt(v[0]**2 + v[1]**2)
print("The magnitude of vector v is", magnitude)

# Distance between two vectors: the magnitude of their difference
u = [1, 2]
v = [4, 6]

diff = [u[0] - v[0], u[1] - v[1]]
distance = math.sqrt(diff[0]**2 + diff[1]**2)
print(distance)
```

```
The magnitude of vector v is 5.0
5.0
```


#### Data Structures of ML
- Scalar: a single number
- Vector: an ordered list of numbers; the fundamental unit for representing a quantity with magnitude and direction (in CS, an array or list of attributes)
- Matrix: a 2D grid of numbers
- Tensor: a generalization of matrices to N dimensions
    - Rank-3 tensor: time series data (Rows x Columns x Time)
    - Rank-4 tensor: a batch of images (Batch Size x Height x Width x Color Channels)


- Data point: represented as a vector, or as a collection of vectors (a matrix)
- A collection of vectors forms a matrix: matrix M has m rows (samples, data points) and n columns (features/attributes)
- Basic operations: element-wise addition and subtraction (for operands of the same shape) and scalar multiplication
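The data structures and element-wise operations above can be sketched with plain Python nested lists (no NumPy is assumed). `shape`, `add`, `subtract`, and `scale` are hypothetical helpers written for illustration; they assume regular (non-ragged) nested lists and matrices of identical shape, since `zip` silently truncates mismatched inputs.

```python
def shape(t):
    """Return the shape of a regular (non-ragged) nested list; () for a scalar."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None  # assume every sub-list has the same length
    return tuple(dims)

def add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def subtract(A, B):
    """Element-wise difference of two matrices of the same shape."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

scalar = 7.0
vector = [3, 4]
X = [[1, 2],
     [3, 4],
     [5, 6]]        # hypothetical dataset: 3 samples (rows) x 2 features (columns)
# Hypothetical image batch: 2 images of 2x2 pixels with 3 color channels (all zeros)
images = [[[[0, 0, 0] for _ in range(2)] for _ in range(2)] for _ in range(2)]

print(shape(scalar), shape(vector), shape(X), shape(images))
# () (2,) (3, 2) (2, 2, 2, 3)

ones = [[1, 1], [1, 1], [1, 1]]
print(add(X, ones))       # [[2, 3], [4, 5], [6, 7]]
print(subtract(X, ones))  # [[0, 1], [2, 3], [4, 5]]
print(scale(2, X))        # [[2, 4], [6, 8], [10, 12]]
```

The length of the shape tuple is the rank: 0 for a scalar, 1 for a vector, 2 for a matrix, and 4 for the image batch.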

#### Dot Product - results in a scalar
- Algebraic definition:
  $\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$

- Geometric definition, where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$:
$$
\mathbf{a} \cdot \mathbf{b}
= \|\mathbf{a}\|\,\|\mathbf{b}\| \cos\theta
$$
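A small check that the algebraic and geometric definitions agree, assuming 2D vectors so that the angle $\theta$ can be recovered from each vector's direction with `math.atan2`; `dot` is a hypothetical helper, not a library function.

```python
import math

def dot(a, b):
    """Algebraic dot product: the sum of element-wise products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

a = [3, 4]
b = [4, 3]

algebraic = dot(a, b)  # 3*4 + 4*3 = 24

# Geometric form ||a|| ||b|| cos(theta); theta is the difference of the two direction angles (2D only)
theta = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
geometric = math.hypot(*a) * math.hypot(*b) * math.cos(theta)

print(algebraic, geometric)  # 24, and 24 up to floating-point rounding
```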

#### Cosine Similarity - ignores magnitude and focuses on direction
- Use it to compare the directions of two vectors; range: -1 (opposite directions) to 1 (same direction)
- If 0, the vectors are orthogonal (perpendicular)

- Geometric definition:
$$
\cos(\theta)
= \frac{\mathbf{a} \cdot \mathbf{b}}
{\|\mathbf{a}\|\,\|\mathbf{b}\|}
$$
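A minimal sketch of cosine similarity in plain Python; `cosine_similarity` is a hypothetical helper. It raises on a zero vector because the denominator $\|\mathbf{a}\|\,\|\mathbf{b}\|$ would be 0. The three calls cover the three reference cases listed above.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2], [2, 4]))    # same direction, ~1.0
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal, 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite directions, ~-1.0
```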

#### Vector Norm
- A function that measures the **size** or **magnitude** of a vector, i.e. its length (distance from the origin)
- Notation: $\|x\|_p$ denotes the $\ell_p$ norm of $x$
- For $p \geq 1$, the vector norm is defined as:
$$
\|x\|_p \triangleq \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}
$$
- Where $|x_i|$ is the absolute value of $x_i$.

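- Properties (for vectors $x, y$ and a scalar $\alpha$):
    - Non-negativity: $\|x\| \geq 0$, and $\|x\| > 0$ when $x \neq 0$
    - Definiteness: $\|x\| = 0$ <u>**if and only if**</u> $x = 0$
    - Absolute homogeneity: $\|\alpha x\| = |\alpha|\,\|x\|$
    - Triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$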

- For a vector $x = [x_1, x_2, \dots, x_n] \in \mathbb{R}^n$:
1. $\ell_0$ "norm" - counts the number of non-zero entries (not a true norm, since $\|\alpha x\|_0 = \|x\|_0$ violates absolute homogeneity):
$$
\|x\|_0 = \text{number of non-zero elements in } x
$$

2. $\ell_1$ norm - sum of absolute values:
$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|
$$

3. $\ell_2$ norm (Euclidean norm) - square root of the sum of squares:
$$
\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}
$$

4. $\ell_p$ norm (general) - for $p \geq 1$:
$$
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} = \left( |x_1|^p + |x_2|^p + \dots + |x_n|^p \right)^{1/p}
$$

5. $\ell_\infty$ norm (max norm) - the largest absolute value among the entries:
$$
\|x\|_\infty = \max_{i} |x_i| 
$$
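The norms above, computed for an example vector in plain Python. `lp_norm`, `l0`, and `linf` are hypothetical helper names; the large-$p$ call illustrates numerically (not as a proof) that $\|x\|_p$ approaches $\|x\|_\infty$ as $p$ grows.

```python
def lp_norm(x, p):
    """General l_p norm (sum |x_i|^p)^(1/p), defined here for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for a true norm")
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

def l0(x):
    """Number of non-zero entries (the l_0 "norm")."""
    return sum(1 for xi in x if xi != 0)

def linf(x):
    """Largest absolute value among the entries (the max norm)."""
    return max(abs(xi) for xi in x)

x = [3, 0, -4]
print(l0(x))            # 2
print(lp_norm(x, 1))    # 7.0
print(lp_norm(x, 2))    # 5.0
print(linf(x))          # 4
print(lp_norm(x, 50))   # close to 4, the l_inf norm
```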

#### Transformation
- $y = Ax$
- Input: vector $x$
- Function: matrix $A$ (transformation matrix)
- Output: vector $y$
- A matrix can only represent a linear transformation, such as rotation, scaling, shearing, or reflection
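A sketch of $y = Ax$ with two hand-built 2x2 transformation matrices: a 90° counter-clockwise rotation and an axis-aligned scaling. `apply` is a hypothetical helper; the rotation output is rounded because `math.cos(math.pi / 2)` is only approximately 0 in floating point.

```python
import math

def apply(A, x):
    """Compute y = Ax for a matrix A (a list of rows) and a vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

theta = math.pi / 2  # rotate 90 degrees counter-clockwise
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
S = [[2, 0],
     [0, 3]]         # scale x by 2 and y by 3

print([round(v, 10) for v in apply(R, [1, 0])])  # [0.0, 1.0]
print(apply(S, [1, 1]))                          # [2, 3]
```

Because the map is linear, applying it to a sum of vectors gives the sum of the individual outputs.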

#### Matrix Multiplication
- Inner dimensions must match: $(m \times n) \cdot (n \times p) = (m \times p)$
- Application:
    - Neural networks: a chain of matrix multiplications
    - Ignoring biases and activation functions, $\text{Layer}_2(\text{Layer}_1(x)) = W_2 (W_1 x)$

- Identity matrix ($I$)
    - Square matrix with 1s on the diagonal and 0s elsewhere
    - $AI = IA = A$ (with $I$ of the matching size on each side)
- Inverse matrix ($A^{-1}$)
    - $AA^{-1} = A^{-1}A = I$
    - A square matrix that has no inverse is a **singular matrix** (equivalently, its determinant is 0)
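A plain-Python sketch of the shape rule, a two-layer chain $W_2 (W_1 x)$, the identity, and a 2x2 inverse. All weights are made-up values, biases and activations are omitted, and `matmul`, `identity`, and `inverse_2x2` are hypothetical helpers (the inverse uses the closed-form 2x2 adjugate formula, which does not extend to larger matrices as written).

```python
def matmul(A, B):
    """Multiply an (m x n) matrix A by an (n x p) matrix B, giving an (m x p) matrix."""
    n = len(A[0])
    if n != len(B):
        raise ValueError(f"inner dimensions differ: {n} vs {len(B)}")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def identity(n):
    """n x n identity matrix: 1s on the diagonal, 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula; raises if A is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    return [[d / det, -b / det],
            [-c / det, a / det]]

# Hypothetical 2-layer network weights, with no biases or activation functions
W1 = [[1, 0],
      [0, 1],
      [1, 1]]          # (3 x 2): 2 inputs -> 3 hidden units
W2 = [[1, -1, 2]]      # (1 x 3): 3 hidden units -> 1 output
x = [[2],
     [3]]              # (2 x 1) column vector

y = matmul(W2, matmul(W1, x))
print(y)                               # [[9]]
print(matmul(matmul(W2, W1), x) == y)  # True

A = [[4, 7],
     [2, 6]]
print(matmul(A, identity(2)) == A)     # True
print(matmul(A, inverse_2x2(A)))       # the identity, up to floating-point rounding
```

The second check illustrates why real networks need nonlinear activations: without them, $W_2 (W_1 x) = (W_2 W_1) x$, so the stacked layers are equivalent to a single matrix.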



#### Solving Systems of Equations
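- The problem: $Ax = b$
- $A$: data/features
- $x$: weights (unknowns)
- $b$: targets/labels
- Solution (when $A$ is square and invertible): $x = A^{-1}b$
- Explicitly inverting large matrices is slow and can be numerically unstable
- In practice, approximation algorithms such as gradient descent are used instead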

- Transposition ($A^T$) flips a matrix over its main diagonal, switching rows and columns: $(A^T)_{ij} = A_{ji}$
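A sketch tying transposition to solving $Ax = b$ without an explicit inverse: plain gradient descent on the squared error $\tfrac{1}{2}\|Ax - b\|_2^2$, whose gradient is $A^T(Ax - b)$. The learning rate (`lr=0.1`) and step count (`steps=500`) are assumptions picked for this small, well-conditioned example, and `transpose`, `matvec`, and `gradient_descent_solve` are hypothetical helpers; numerical libraries typically solve such systems with matrix factorizations rather than this loop.

```python
def transpose(A):
    """Swap rows and columns, so that transpose(A)[i][j] == A[j][i]."""
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def gradient_descent_solve(A, b, lr=0.1, steps=500):
    """Approximate x in Ax = b by minimizing the squared error 0.5 * ||Ax - b||^2."""
    At = transpose(A)
    x = [0.0] * len(At)  # one unknown per column of A
    for _ in range(steps):
        residual = [r - b_i for r, b_i in zip(matvec(A, x), b)]  # Ax - b
        grad = matvec(At, residual)                              # A^T (Ax - b)
        x = [x_j - lr * g_j for x_j, g_j in zip(x, grad)]
    return x

print(transpose([[1, 2, 3],
                 [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]

A = [[2, 1],
     [1, 3]]
b = [3, 5]
x = gradient_descent_solve(A, b)
print([round(v, 6) for v in x])  # [0.8, 1.4], the exact solution
```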