'''
 * Copyright (c) 2016 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

![image.png](attachment:image.png)

## Foundations and Four Pillars of Machine Learning

### Figure 1.1: The Foundations and Four Pillars of Machine Learning

Machine learning is built upon a solid mathematical foundation, often represented by four key pillars:

- **Dimensionality Reduction**
- **Regression**
- **Classification**
- **Density Estimation**

These pillars rely on several mathematical disciplines, including:

- **Vector Calculus**
- **Probability & Distributions**
- **Optimization**
- **Linear Algebra**
- **Analytic Geometry**
- **Matrix Decomposition**

The structure of this book bridges mathematical concepts with machine learning algorithms. Readers can approach the material in multiple ways, such as:

1. **Bottom-up**: Building foundational mathematical skills before tackling complex concepts.
2. **Top-down**: Selecting topics based on machine learning applications.

Most readers blend these approaches for effective learning.

---

## Part I: Mathematics

The four pillars of machine learning (illustrated in Figure 1.1) require a robust mathematical foundation, which we establish in Part I.

### Linear Algebra
We represent numerical data as vectors and organize tables of such data as matrices. The study of vectors and matrices is known as **linear algebra**, introduced in Chapter 2. A vector can be expressed as:

$$
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
$$

A matrix, representing a collection of vectors, is written as:

$$
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
$$

### Similarity and Analytic Geometry
Given two vectors, say $\mathbf{u}$ and $\mathbf{v}$, representing real-world objects, we aim to quantify their similarity. Similar vectors should yield similar outputs from a machine learning predictor. To formalize this, we define operations like the dot product:

$$
\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n
$$

This concept of similarity, along with distances, is central to **analytic geometry**, covered in Chapter 3.
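
As a minimal sketch of the dot product formula above, the snippet below evaluates it with NumPy. The values of `u` and `v` are illustrative assumptions, not data from the text. It also computes the Euclidean distance, since distances are mentioned alongside similarity.

```python
import numpy as np

# Two illustrative vectors in R^3 (values chosen only for this example)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -5.0, 6.0])

# Dot product written out term by term: u_1 v_1 + u_2 v_2 + ... + u_n v_n
dot_manual = sum(u_i * v_i for u_i, v_i in zip(u, v))

# The same quantity computed by NumPy
dot_numpy = np.dot(u, v)

# Euclidean distance between u and v
distance = np.linalg.norm(u - v)

print("dot product:", dot_numpy)
print("distance:", distance)
```

For these particular vectors the dot product is $4 - 10 + 18 = 12$ and the distance is $\sqrt{67}$.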

### Matrix Decomposition
In Chapter 4, we explore matrices and **matrix decomposition**. Operations on matrices, such as eigenvalue decomposition or singular value decomposition (SVD), enhance machine learning by providing intuitive data interpretations and improving computational efficiency. For a matrix $\mathbf{A}$, SVD is expressed as:

$$
\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T
$$

where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices, and $\mathbf{\Sigma}$ has the same shape as $\mathbf{A}$, with the singular values on its diagonal and zeros elsewhere.
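
The factorization can be checked numerically. The sketch below uses `np.linalg.svd` on a small matrix whose entries are an illustrative assumption, rebuilds $\mathbf{\Sigma}$ from the returned singular values, and confirms that the product recovers the original matrix.

```python
import numpy as np

# Illustrative 3x2 matrix (entries are an assumption for this example)
A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# With full_matrices=True, U is 3x3 and Vt (the transpose of V) is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# s is a 1-D array of singular values; embed it in a 3x2 Sigma
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Rebuild A from its three factors
A_rebuilt = U @ Sigma @ Vt

print("singular values:", s)
print("reconstruction matches A:", np.allclose(A, A_rebuilt))
```

Note that NumPy returns $\mathbf{V}^T$ directly rather than $\mathbf{V}$, which is why no extra transpose appears in the product.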

### Noise and Probability
Data often consists of noisy observations masking an underlying signal. Machine learning seeks to extract this signal from the noise. To quantify "noise," we rely on **probability and distributions**, providing a language to model uncertainty and variability in data.



## Linear Algebra

When formalizing intuitive concepts, a common approach is to construct a set of objects (symbols) and a set of rules to manipulate these objects. This is known as an **algebra**. **Linear algebra** is the study of vectors and specific rules to manipulate them. The vectors familiar from school are often called "geometric vectors," typically denoted with a small arrow above the letter, e.g., $\overrightarrow{x}$ and $\overrightarrow{y}$. In this book, we explore more general vector concepts and denote them with bold letters, e.g., $\mathbf{x}$ and $\mathbf{y}$.

In general, vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind. Mathematically, any object satisfying these two properties can be considered a vector. Here are some examples:



## Examples of Vector Objects

1. **Geometric Vectors**  
   Geometric vectors, familiar from high school mathematics and physics, are directed line segments (see Figure 2.1(a)). They can be drawn in two or three dimensions. Two geometric vectors $\overrightarrow{x}$ and $\overrightarrow{y}$ can be added:

   $$
   \overrightarrow{x} + \overrightarrow{y} = \overrightarrow{z}
   $$

   where $\overrightarrow{z}$ is another geometric vector. Multiplication by a scalar $\lambda \in \mathbb{R}$, e.g., $\lambda \overrightarrow{x}$, yields a scaled version of the original vector, also a geometric vector. This satisfies the vector properties introduced earlier. Interpreting vectors as geometric vectors leverages our intuition about direction and magnitude to reason about mathematical operations.

2. **Polynomials**  
   Polynomials are also vectors (see Figure 2.1(b)). Two polynomials can be added together, resulting in another polynomial, and multiplication by a scalar $\lambda \in \mathbb{R}$ yields another polynomial. For example, consider two polynomials:

   $$
   p(x) = 2x^2 + 3x + 1 \quad \text{and} \quad q(x) = x^2 - 2x + 4
   $$

   Their sum $p(x) + q(x) = 3x^2 + x + 5$ is a polynomial, and scaling $p(x)$ by $\lambda$ gives $\lambda p(x) = 2\lambda x^2 + 3\lambda x + \lambda$, also a polynomial. Thus, polynomials are unusual but valid instances of vectors. Unlike geometric vectors (concrete "drawings"), polynomials are abstract, yet they share the same vector properties.

3. **Audio Signals**  
   Audio signals, represented as sequences of numbers, are vectors. Adding two audio signals produces a new audio signal, and scaling an audio signal by a scalar $\lambda \in \mathbb{R}$ results in another audio signal. This satisfies the vector definition.

4. **Elements of $\mathbb{R}^n$**  
   Tuples of $n$ real numbers, denoted as elements of $\mathbb{R}^n$, are vectors. For example:

   $$
   \mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in \mathbb{R}^3
   $$

   Adding two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ component-wise yields another vector:

   $$
   \mathbf{a} + \mathbf{b} = \mathbf{c} \in \mathbb{R}^n
   $$

   Multiplying $\mathbf{a}$ by a scalar $\lambda \in \mathbb{R}$ gives:

   $$
   \lambda \mathbf{a} \in \mathbb{R}^n
   $$

   This representation aligns with arrays of real numbers in programming languages, facilitating algorithm implementation.
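
The correspondence with arrays can be sketched in NumPy. Here `a` is the vector from the example above, while `b` and `lam` are illustrative assumptions introduced only for this snippet.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # the vector a from the text
b = np.array([0.5, -1.0, 4.0])  # illustrative second vector (assumed values)
lam = 2.0                       # illustrative scalar

c = a + b          # component-wise addition, result is again in R^3
scaled = lam * a   # multiplication by a scalar, result is again in R^3

print("a + b =", c)
print("lam * a =", scaled)
```

Both results are arrays of the same length as the inputs, which mirrors the statement that addition and scaling stay inside $\mathbb{R}^n$.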

---

## Focus of Linear Algebra

Linear algebra emphasizes the similarities across these vector concepts: they can be added and scaled. We primarily focus on vectors in $\mathbb{R}^n$, as most linear algebra algorithms are formulated in this space. In Chapter 8, we will see that data is often represented as vectors in $\mathbb{R}^n$. This book concentrates on finite-dimensional vector spaces, where there is a one-to-one correspondence between any vector type and $\mathbb{R}^n$. When useful, we draw on geometric vector intuitions and array-based algorithms.
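
One way to make this correspondence concrete: a polynomial of degree at most 2 is determined by its three coefficients, so it can be stored as a vector in $\mathbb{R}^3$. The sketch below uses NumPy's `numpy.polynomial.Polynomial` class with $p$ and $q$ from the polynomial example; listing the constant term first is that class's convention, not something the text prescribes.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients are listed from the constant term upward
p = Polynomial([1, 3, 2])    # p(x) = 2x^2 + 3x + 1
q = Polynomial([4, -2, 1])   # q(x) = x^2 - 2x + 4

# Adding the polynomials ...
poly_sum = p + q

# ... gives the same numbers as adding their coefficient vectors in R^3
coef_sum = np.array([1, 3, 2]) + np.array([4, -2, 1])

print("coefficients of p + q:", poly_sum.coef)
print("sum of coefficient vectors:", coef_sum)
```

The coefficients of the sum correspond to $3x^2 + x + 5$, in agreement with the sum computed by hand in the polynomial example.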

---

## Closure and Vector Spaces

A key mathematical idea is **closure**: What is the set of all objects resulting from proposed operations? For vectors, this question becomes: What is the set of vectors generated by starting with a small set, adding them, and scaling them? This leads to a **vector space** (Section 2.4). For example, starting with vectors $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^n$, all linear combinations:

$$
\mathbf{w} = \alpha \mathbf{v}_1 + \beta \mathbf{v}_2, \quad \alpha, \beta \in \mathbb{R}
$$

form a vector space. This concept underpins much of machine learning.
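
A small numerical sketch of closure, under the assumption of two made-up vectors in $\mathbb{R}^3$: a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ never adds a new direction, which shows up as an unchanged matrix rank when the combination is stacked next to the original vectors.

```python
import numpy as np

# Two illustrative vectors in R^3 (assumed values)
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, -1.0])

# A linear combination w = alpha * v1 + beta * v2
alpha, beta = 3.0, -2.0
w = alpha * v1 + beta * v2

# Rank of [v1 v2] versus rank of [v1 v2 w]
rank_without_w = np.linalg.matrix_rank(np.column_stack([v1, v2]))
rank_with_w = np.linalg.matrix_rank(np.column_stack([v1, v2, w]))

# Combining elements of the set again stays inside the same set
w2 = 0.5 * w + 4.0 * v1
rank_with_w2 = np.linalg.matrix_rank(np.column_stack([v1, v2, w2]))

print(rank_without_w, rank_with_w, rank_with_w2)
```

All three ranks are equal here because `w` and `w2` lie in the plane spanned by `v1` and `v2`.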

---

## Figures
![image.png](attachment:image.png)

Different types of vectors. Vectors can be surprising objects, including (a) geometric vectors and (b) polynomials.

### Figure 2.1: Different Types of Vectors
- **(a) Geometric Vectors**: Directed segments in 2D or 3D space.  
- **(b) Polynomials**: Abstract functions as vectors.

---

## Summary
The concepts introduced here are summarized in Figure 2.2. This chapter draws from lecture notes and books by Drumm and Weil (2001), Strang (2003), Hogben (2013), Liesen and Mehrmann (2015), and Pavel Grinfeld’s Linear Algebra series.

*This material is published by Cambridge University Press as Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). This version is free to view and download for personal use only. Not for re-distribution, re-sale, or use in derivative works.*

![image-2.png](attachment:image-2.png)

A mind map of the concepts introduced in this chapter, along with where they are used in other parts of the book.

# Vector Concepts and Linear Algebra

## Figure 2.2: Mind Map of Concepts

Figure 2.2 provides a mind map of the concepts introduced in this chapter and their connections to other parts of the book:

- **Closure Property**: Leads to the concept of a vector space.
- **Vector Space**: Includes properties like linear independence and basis.
  - **Linear Independence**: No vector in the set can be written as a linear combination of the others.
  - **Basis**: A maximal set of linearly independent vectors.
- **System of Linear Equations**: Solved by methods like Gaussian elimination and matrix inverse.
- **Linear/Affine Mapping**: Connects to matrix operations.
- **Abelian Group with +**: A property of vector addition.
- **Matrix Representation**: Used across chapters.

### Connections to Other Chapters:
- **Chapter 3**: Analytic geometry.
- **Chapter 5**: Vector calculus.
- **Chapter 10**: Dimensionality reduction (e.g., PCA).
- **Chapter 12**: Classification.

Recommended resources include Gilbert Strang’s Linear Algebra course at MIT and the Linear Algebra Series by 3Blue1Brown.

Linear algebra is fundamental to machine learning and mathematics. This chapter’s concepts extend to:
- Geometry (Chapter 3).
- Vector calculus (Chapter 5), requiring matrix operations.
- Dimensionality reduction (Chapter 10) via projections (Section 3.8).
- Linear regression (Chapter 9), solving least-squares problems.

---

## 2.1 Systems of Linear Equations

Systems of linear equations are central to linear algebra, providing tools to solve many problems.

### Example 2.1: Production Planning
A company produces products $N_1, \ldots, N_n$, which require resources $R_1, \ldots, R_m$. Producing one unit of product $N_j$ requires $a_{ij}$ units of resource $R_i$, where $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Given $b_i$ available units of each resource $R_i$, the goal is to find an optimal production plan $(x_1, \ldots, x_n) \in \mathbb{R}^n$, where $x_j$ is the number of units of product $N_j$ to produce, such that no resources are left over.

If we produce $x_1, \ldots, x_n$ units of the corresponding products, the total number of units of resource $R_i$ needed is:

$$
a_{i1} x_1 + \cdots + a_{in} x_n
$$

The system of equations is:

$$
\begin{align}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
&\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m
\end{align}
$$

where $a_{ij}, b_i \in \mathbb{R}$, and $x_1, \ldots, x_n$ are unknowns. Every $n$-tuple $(x_1, \ldots, x_n) \in \mathbb{R}^n$ satisfying this system is a solution.
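
To make the production-planning setup tangible, here is a minimal sketch with two products and two resources. All numbers in `A` and `b` are hypothetical and chosen only so that the system has a unique solution; they do not come from the text.

```python
import numpy as np

# Hypothetical data: A[i, j] = units of resource R_i needed per unit of product N_j
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Hypothetical available units of resources R_1 and R_2
b = np.array([8.0, 9.0])

# Resource usage of a candidate plan x is A @ x
# (row i evaluates a_i1 * x_1 + ... + a_in * x_n)
x_candidate = np.array([1.0, 1.0])
usage = A @ x_candidate

# A plan with no leftover resources solves A x = b
# (np.linalg.solve applies because this A is square and invertible)
x_plan = np.linalg.solve(A, b)

print("usage of candidate plan:", usage)
print("plan using all resources:", x_plan)
```

With these numbers, producing 3 units of $N_1$ and 2 units of $N_2$ consumes exactly the available resources.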

### Example 2.2: Solving Systems
Consider:

$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \quad (1) \\
x_1 - x_2 + 2x_3 &= 2 \quad (2) \\
2x_1 + 3x_3 &= 1 \quad (3)
\end{align}
$$

Adding (1) and (2) gives $2x_1 + 3x_3 = 5$, contradicting (3). Thus, **no solution** exists.

Now consider:

$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \quad (1) \\
x_1 - x_2 + 2x_3 &= 2 \quad (2) \\
x_2 + x_3 &= 2 \quad (3)
\end{align}
$$

Subtracting (3) from (1) gives $x_1 = 1$. Adding (1) and (2) gives $2x_1 + 3x_3 = 5$, so $x_3 = 1$. Substituting into (3) then gives $x_2 = 1$. The unique solution is $(1, 1, 1)$.

Finally, consider:

$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \quad (1) \\
x_1 - x_2 + 2x_3 &= 2 \quad (2) \\
2x_1 + 3x_3 &= 5 \quad (3)
\end{align}
$$

Since (1) + (2) = (3), equation (3) is redundant and can be omitted. Adding and subtracting (1) and (2) gives:
- $2x_1 = 5 - 3x_3$ (from (1) + (2))
- $2x_2 = 1 + x_3$ (from (1) − (2))

Let $x_3 = a \in \mathbb{R}$ (free variable). The solution set is:

$$
\left( \frac{5 - 3a}{2}, \frac{1 + a}{2}, a \right), \quad a \in \mathbb{R}
$$

This has **infinitely many solutions**.
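
The general solution can be sanity-checked numerically. The sketch below plugs a few arbitrarily chosen values of the free parameter $a$ into the formula and verifies that each resulting vector satisfies all three equations; the helper name `general_solution` is introduced here only for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the third system
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 2.0],
              [2.0, 0.0, 3.0]])
b = np.array([3.0, 2.0, 5.0])

def general_solution(a):
    """Return ((5 - 3a)/2, (1 + a)/2, a) for the free parameter a."""
    return np.array([(5 - 3 * a) / 2, (1 + a) / 2, a])

# Check a handful of arbitrary parameter values
checks = [np.allclose(A @ general_solution(a), b) for a in (-2.0, 0.0, 1.0, 3.5)]
print(checks)
```

Choosing $a = 1$ recovers the point $(1, 1, 1)$, which also solves the second system.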

In general, real-valued systems yield **no, one, or infinitely many solutions**. Linear regression (Chapter 9) addresses cases like Example 2.1 when exact solutions are impossible.
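
The three cases can also be told apart mechanically. The sketch below uses a standard rank criterion that the text does not state explicitly (it is brought in here as an assumption): a system is consistent exactly when the coefficient matrix and the augmented matrix have the same rank, and a consistent system has a unique solution exactly when that rank equals the number of unknowns. The function name `classify_system` is hypothetical.

```python
import numpy as np

def classify_system(A, b):
    """Classify A x = b as having 'none', 'unique', or 'infinite' solutions."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_Ab:
        return "none"       # b is not a combination of the columns of A
    if rank_A == A.shape[1]:
        return "unique"     # consistent, no free variables
    return "infinite"       # consistent, at least one free variable

# Equations (1) and (2) are shared by all three systems of Example 2.2
rows_12 = [[1, 1, 1], [1, -1, 2]]

print(classify_system(rows_12 + [[2, 0, 3]], [3, 2, 1]))
print(classify_system(rows_12 + [[0, 1, 1]], [3, 2, 2]))
print(classify_system(rows_12 + [[2, 0, 3]], [3, 2, 5]))
```

Because `np.linalg.matrix_rank` relies on a numerical tolerance, this check is reliable for small, well-scaled systems like these but should be treated with care for ill-conditioned matrices.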

---

## Geometric Interpretation

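For two variables $x_1, x_2$, each equation defines a line in the $x_1 x_2$-plane. The solution set of a system of two such equations is the intersection of the two lines, which is one of the following:
- A point (unique solution).
- A line (both equations describe the same line, so one is redundant).
- Empty (the lines are parallel and distinct).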

### Figure 2.3: Example
Consider:

$$
\begin{align}
4x_1 + 4x_2 &= 5 \\
2x_1 - 4x_2 &= 1
\end{align}
$$

The solution is the point $(x_1, x_2) = (1, \frac{1}{4})$.

For three variables, each equation defines a plane in 3D space. The intersection can be a plane, line, point, or empty set.

---

## Matrix Notation

For a systematic approach, we use matrices. The system from Example 2.1:

$$
\begin{align}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
&\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m
\end{align}
$$

can be written as:

$$
\begin{bmatrix}
a_{11} \\
\vdots \\
a_{m1}
\end{bmatrix} x_1 + \cdots + \begin{bmatrix}
a_{1n} \\
\vdots \\
a_{mn}
\end{bmatrix} x_n = \begin{bmatrix}
b_1 \\
\vdots \\
b_m
\end{bmatrix}
$$

or compactly:

$$
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
\vdots \\
b_m
\end{bmatrix}
$$

We will explore matrix operations and solving such systems in Section 2.3.
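
The two ways of writing the system, as a weighted sum of columns and as a matrix-vector product, describe the same computation. A quick sketch with an illustrative $2 \times 3$ matrix (the entries of `A` and `x` are assumptions made for this example):

```python
import numpy as np

# Illustrative 2x3 coefficient matrix and vector of unknowns (assumed values)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([2.0, -1.0, 0.5])

# Column form: x_1 * (column 1) + ... + x_n * (column n)
column_form = sum(x[j] * A[:, j] for j in range(A.shape[1]))

# Compact form: matrix-vector product
compact_form = A @ x

print("column form: ", column_form)
print("compact form:", compact_form)
```

Both expressions evaluate to the same vector in $\mathbb{R}^2$, one entry per equation.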


## Python Implementation of Linear Algebra Concepts

This notebook implements the systems of linear equations from Section 2.1 using Python and NumPy. We solve the examples from Example 2.2 and visualize the geometric interpretation from Figure 2.3.

## Setup: Import Libraries

```python
import numpy as np
import matplotlib.pyplot as plt
```

---

## Example 2.2: Solving Systems of Linear Equations

### System 1: No Solution
The system:
$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \\
x_1 - x_2 + 2x_3 &= 2 \\
2x_1 + 3x_3 &= 1
\end{align}
$$

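```python
# Coefficient matrix A and constant vector b
A1 = np.array([[1, 1, 1],
               [1, -1, 2],
               [2, 0, 3]])
b1 = np.array([3, 2, 1])

# np.linalg.solve only reports that A1 is singular; it cannot distinguish
# "no solution" from "infinitely many solutions". Comparing the rank of A1
# with the rank of the augmented matrix [A1 | b1] settles the question.
try:
    x1 = np.linalg.solve(A1, b1)
    print("Solution:", x1)
except np.linalg.LinAlgError:
    rank_A = np.linalg.matrix_rank(A1)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A1, b1]))
    if rank_A < rank_Ab:
        print("System 1 has no solution (inconsistent system).")
    else:
        print("System 1 has infinitely many solutions.")
```

**Output**:  
```
System 1 has no solution (inconsistent system).
```

**Explanation**: As noted in the text, adding the first two equations gives $2x_1 + 3x_3 = 5$, which contradicts the third equation ($2x_1 + 3x_3 = 1$). `np.linalg.solve` by itself only reports that the coefficient matrix is singular. The rank comparison (rank 2 for $\mathbf{A}$, rank 3 for the augmented matrix) is what shows that the system is inconsistent.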

---

### System 2: Unique Solution
The system:
$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \\
x_1 - x_2 + 2x_3 &= 2 \\
x_2 + x_3 &= 2
\end{align}
$$

```python
# Coefficient matrix A and constant vector b
A2 = np.array([[1, 1, 1],
               [1, -1, 2],
               [0, 1, 1]])
b2 = np.array([3, 2, 2])

# Solve the system
x2 = np.linalg.solve(A2, b2)
print("Solution to System 2:", x2)

# Verify the solution
print("Verification (A2 @ x2):", A2 @ x2)
print("Expected (b2):", b2)
```

**Output**:  
```
Solution to System 2: [1. 1. 1.]
Verification (A2 @ x2): [3. 2. 2.]
Expected (b2): [3 2 2]
```

**Explanation**: The solution $(1, 1, 1)$ matches the text’s derivation: $x_1 = 1$, $x_3 = 1$, $x_2 = 1$.

---

### System 3: Infinitely Many Solutions
The system:
$$
\begin{align}
x_1 + x_2 + x_3 &= 3 \\
x_1 - x_2 + 2x_3 &= 2 \\
2x_1 + 3x_3 &= 5
\end{align}
$$

Since the third equation is redundant ($(1) + (2) = (3)$), we solve the first two with $x_3$ as a free variable.

```python
# Reduced system: equations (1) and (2) with the x3 terms moved to the right-hand side
A3 = np.array([[1, 1],
               [1, -1]])

def b3(a):
    """Right-hand side for x3 = a: (3 - a, 2 - 2a)."""
    return np.array([3 - a, 2 - 2*a])

# Solve for a specific value of x3 (e.g., a = 1)
a = 1
x3_partial = np.linalg.solve(A3, b3(a))
print(f"Solution for x3 = {a}: x1 = {x3_partial[0]}, x2 = {x3_partial[1]}")

# General solution
print("General solution: x1 = (5 - 3a)/2, x2 = (1 + a)/2, x3 = a, where a ∈ ℝ")
```

**Output**:  
```
Solution for x3 = 1: x1 = 1.0, x2 = 1.0
General solution: x1 = (5 - 3a)/2, x2 = (1 + a)/2, x3 = a, where a ∈ ℝ
```

**Explanation**: The general solution matches the text: $\left( \frac{5 - 3a}{2}, \frac{1 + a}{2}, a \right)$.

---

## Geometric Interpretation: Figure 2.3
The system:
$$
\begin{align}
4x_1 + 4x_2 &= 5 \\
2x_1 - 4x_2 &= 1
\end{align}
$$

### Solve the System
```python
# Coefficient matrix A and constant vector b
A4 = np.array([[4, 4],
               [2, -4]])
b4 = np.array([5, 1])

# Solve
x4 = np.linalg.solve(A4, b4)
print("Solution to Figure 2.3 system:", x4)
```

**Output**:  
```
Solution to Figure 2.3 system: [1.   0.25]
```

### Visualize the Lines
```python
# Define the lines
x1 = np.linspace(0, 2, 100)
x2_line1 = (5 - 4*x1) / 4  # 4x1 + 4x2 = 5
x2_line2 = (2*x1 - 1) / 4  # 2x1 - 4x2 = 1

# Plot
plt.figure(figsize=(8, 6))
plt.plot(x1, x2_line1, label=r'$4x_1 + 4x_2 = 5$')
plt.plot(x1, x2_line2, label=r'$2x_1 - 4x_2 = 1$')
plt.plot(x4[0], x4[1], 'ro', label=f'Solution ({x4[0]}, {x4[1]})')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid(True)
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.legend()
plt.title('Geometric Interpretation: Intersection of Lines')
plt.show()
```

**Output**:  
A plot showing two lines intersecting at $(1, 0.25)$, confirming the solution.

---

## Matrix Notation
Rewrite the system from Example 2.1 in matrix form:
$$
\mathbf{A} \mathbf{x} = \mathbf{b}
$$

```python
# Example matrix form (using System 2 as an example)
A_example = A2
x_example = x2
b_example = b2

# Verify matrix multiplication
print("A @ x:", A_example @ x_example)
print("b:", b_example)
```

**Output**:  
```
A @ x: [3. 2. 2.]
b: [3 2 2]
```

**Explanation**: The product `A @ x` reproduces `b`, which illustrates the compact notation $\mathbf{A} \mathbf{x} = \mathbf{b}$.

---

## Conclusion
This implementation covers:
- Solving systems with no solution, a unique solution, or infinitely many solutions.
- Geometric visualization of a two-variable system.
- Matrix representation of linear equations.

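NumPy’s `linalg.solve` requires a square, non-singular coefficient matrix, so it only handles systems with a unique solution and raises `LinAlgError` when it detects a singular matrix. Systems with free variables require a separate parameterization of the solution set.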
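
For systems with free variables, one possible alternative to parameterizing by hand is to combine a particular solution with a basis of the null space of $\mathbf{A}$. The sketch below does this for the third system of Example 2.2 using `np.linalg.lstsq` and `np.linalg.svd`. This is an approach brought in for illustration, not the method used in the text, and it assumes the system is consistent.

```python
import numpy as np

# Third system of Example 2.2 (infinitely many solutions)
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 2.0],
              [2.0, 0.0, 3.0]])
b = np.array([3.0, 2.0, 5.0])

# Particular solution: lstsq returns the minimum-norm least-squares solution,
# which solves A x = b exactly when the system is consistent
x_particular, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# Null space: the rows of Vt beyond the rank span the solutions of A x = 0
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T   # one column per free variable

# Any x_particular + t * (null-space vector) is again a solution
t = 2.0  # arbitrary parameter value
x_other = x_particular + t * null_basis[:, 0]

print("rank of A:", rank)
print("residual of particular solution:", A @ x_particular - b)
print("residual of shifted solution:   ", A @ x_other - b)
```

The null-space vector found this way is a scaled version of the direction $\left(-\tfrac{3}{2}, \tfrac{1}{2}, 1\right)$ that appears in the general solution, so both descriptions give the same line of solutions.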