# 2. Mathematical Foundations for Machine Learning

This notebook covers the minimum math you'll use in almost every ML model: linear algebra, calculus and optimization, probability, and basic statistics.

## Linear Algebra essentials

- Linear Algebra provides a compact and efficient way to represent data, model parameters, and computations such as transformations, projections, and optimizations.
- Almost every ML algorithm is based on operations involving vectors, matrices, or tensors.

## Vectors and Matrices

- A **vector** is an ordered list of numbers that represents a quantity with both _magnitude_ and _direction_. In Machine Learning, a vector usually represents a **data point** or a set of **model parameters**. For example:
    - A single house with features `[size, number_of_rooms, price]` can be written as a vector:
    
        \begin{aligned}
        \mathbf{x} = [120, 3, 240000]^T \in \mathbb{R}^3
        \end{aligned}

    - A linear model's weights (e.g., regression coefficients) can be written as another vector:
    
        \begin{aligned}
        \mathbf{w} = [w_1, w_2, w_3]^T
        \end{aligned}
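
As a minimal sketch, both vectors above can be stored as 1-D NumPy arrays. The weight values are made up for illustration (the text keeps $w_1, w_2, w_3$ symbolic), and `np.linalg.norm` is used here to compute the Euclidean length, which is one common way to measure magnitude.

```python
import numpy as np

# House vector from the example above: [size, number_of_rooms, price]
x = np.array([120.0, 3.0, 240000.0])

# Hypothetical weight values, chosen only for illustration
w = np.array([0.5, 10.0, 0.001])

# Both are vectors in R^3, stored as 1-D arrays of shape (3,)
print("x shape:", x.shape)
print("w shape:", w.shape)

# Magnitude (Euclidean norm) and direction (unit vector) of x
magnitude = np.linalg.norm(x)
direction = x / magnitude
print("magnitude of x:", magnitude)
print("direction of x:", direction)
```

Dividing a vector by its magnitude leaves a unit-length vector, which separates "how large" from "which way".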

---

- A **matrix** is a 2D array of numbers (rows and columns) used to represent multiple vectors together or a **linear transformation**. For example, in machine learning:
    - A **dataset** with $m$ samples and $n$ features is represented as  
        
        \begin{aligned}
        \mathbf{X} = 
        \begin{bmatrix}
        x_{11} & x_{12} & \dots & x_{1n} \\
        x_{21} & x_{22} & \dots & x_{2n} \\
        \vdots & \vdots & \ddots & \vdots \\
        x_{m1} & x_{m2} & \dots & x_{mn}
        \end{bmatrix}
        \in \mathbb{R}^{m \times n}
        \end{aligned}
    
        Each **row** represents a data sample, and each **column** represents a feature.
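
The row and column convention can be checked directly in NumPy. The sketch below assumes a tiny, made-up dataset of three houses with two features each; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical dataset: m = 3 houses (rows), n = 2 features (columns)
# Columns: [size, number_of_rooms]
X = np.array([
    [120.0, 3.0],
    [85.0, 2.0],
    [200.0, 5.0],
])

m, n = X.shape
print("m (samples):", m)
print("n (features):", n)

first_sample = X[0]      # row 0: one data sample, shape (n,)
size_feature = X[:, 0]   # column 0: one feature across all samples, shape (m,)
print("first sample:", first_sample)
print("size feature:", size_feature)
```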

---

- **Matrix–Vector Product**: When multiplying a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ by a vector $\mathbf{x} \in \mathbb{R}^{n}$:

\begin{aligned}
\mathbf{y} = \mathbf{A}\mathbf{x}
\end{aligned}

The result $\mathbf{y} \in \mathbb{R}^{m}$ is another vector, where each element $y_i$ is the **dot product** of the $i^{th}$ row of $\mathbf{A}$ with the vector $\mathbf{x}$:

\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
&\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n
\end{aligned}

or in compact notation:

\begin{aligned}
y_i = \sum_{j=1}^{n} a_{ij} x_j, \quad \text{for } i = 1, 2, \dots, m
\end{aligned}

This means each output element $y_i$ is computed as the **weighted sum** of the elements of $\mathbf{x}$, using the $i^{th}$ row of $\mathbf{A}$ as weights.

For example:

\begin{aligned}
\mathbf{A} = 
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix},
\quad
\mathbf{x} = 
\begin{bmatrix}
5 \\
6
\end{bmatrix}
\end{aligned}

then

\begin{aligned}
\mathbf{y} = \mathbf{A}\mathbf{x} =
\begin{bmatrix}
1\times5 + 2\times6 \\
3\times5 + 4\times6
\end{bmatrix}
=
\begin{bmatrix}
17 \\
39
\end{bmatrix}
\end{aligned}
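
The summation formula $y_i = \sum_{j} a_{ij} x_j$ can also be spelled out with explicit loops. The sketch below reuses the $\mathbf{A}$ and $\mathbf{x}$ from the worked example; the helper name `matvec_loops` is hypothetical and exists only to mirror the formula. In practice, NumPy's `@` operator computes the same result without Python-level loops.

```python
import numpy as np

def matvec_loops(A, x):
    """Compute y = A x with explicit loops, following y_i = sum_j a_ij * x_j."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):          # one output element per row of A
        for j in range(n):      # weighted sum over the elements of x
            y[i] += A[i, j] * x[j]
    return y

# Same values as the worked example above
A = np.array([[1, 2], [3, 4]])
x = np.array([5, 6])

y_loop = matvec_loops(A, x)
y_numpy = A @ x
print("loop result: ", y_loop)
print("numpy result:", y_numpy)
print("match:", np.allclose(y_loop, y_numpy))
```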

import numpy as np

# Vector with 2 elements, shape (2,)
x = np.array([5, 6])

# Matrix 2x2
A = np.array([[1, 2], [3, 4]])

print(f"x\n{x}")
print(f"x shape: {x.shape}")
print(f"\nA\n{A}")
print(f"A shape: {A.shape}")
# x already has 2 elements, so no slicing is needed before the product
print("\nMatrix-vector product A @ x:", A @ x)

x
[5 6]
x shape: (2,)

A
[[1 2]
 [3 4]]
A shape: (2, 2)

Matrix-vector product A @ x: [17 39]
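
The same product connects the dataset matrix and the weight vector introduced earlier: when each row of $\mathbf{X}$ is a sample and $\mathbf{w}$ holds one weight per feature, $\mathbf{X}\mathbf{w}$ gives one weighted sum per sample. The sketch below assumes a linear model with no bias term, and both the data and the weights are made up for illustration.

```python
import numpy as np

# Hypothetical dataset: 3 houses, features [size, number_of_rooms]
X = np.array([[120.0, 3.0],
              [85.0, 2.0],
              [200.0, 5.0]])

# Hypothetical weights, one per feature (not fitted to any data)
w = np.array([2000.0, 5000.0])

# One matrix-vector product yields a weighted sum for every sample at once
y_hat = X @ w
print("X shape:", X.shape)
print("w shape:", w.shape)
print("y_hat shape:", y_hat.shape)
print("y_hat:", y_hat)
```

The shapes follow the rule stated above: a matrix in $\mathbb{R}^{m \times n}$ times a vector in $\mathbb{R}^{n}$ gives a vector in $\mathbb{R}^{m}$, here with $m = 3$ and $n = 2$.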
