# Introduction

**Vectors as Geometric Objects in Analytic Geometry**

Understanding vectors in linear algebra goes beyond their coordinate representation. Vectors can be seen as arrows in space, having both magnitude (length) and direction. Here are some key ideas and examples:

1. **Vectors as Arrows:**  
   Think of a vector as an arrow with a certain length and direction. For example, the vector $(3,4)$ in $\mathbb{R}^2$ is visualized as an arrow starting at the origin and ending at the point $(3,4)$. Its length is calculated as $\sqrt{3^2+4^2} = 5$.

2. **Free Vectors vs. Bound Vectors:**  
   Vectors are considered "free," meaning that the vector $(3,4)$ represents the same displacement regardless of where it is placed in space. Only its magnitude and direction matter, not its exact position. A "bound" vector, by contrast, is tied to a fixed starting point.

3. **Operations on Vectors:**  
   - **Addition:** You add vectors by placing the tail of one arrow at the head of another. For instance, adding $(3,4)$ and $(1,2)$ results in $(4,6)$, which is the diagonal of the parallelogram formed by the two arrows.  
   - **Scalar Multiplication:** Multiplying a vector by a scalar changes its magnitude but not its direction (unless the scalar is negative, which reverses the direction). For example, $2\cdot(3,4)$ gives $(6,8)$.

4. **Geometric Interpretations in Analytic Geometry:**  
   - A **line** can be represented as all points of the form $P = P_0 + t\,v$, where $P_0$ is a fixed point and $v$ is a direction vector.  
   - A **plane** in $\mathbb{R}^3$ can be described as $P = P_0 + s\,v + t\,w$, where $v$ and $w$ are independent direction vectors.

5. **Changing Basis and Invariant Properties:**  
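   When you change the basis of a vector space, the coordinates of a vector change, but the vector itself, and therefore its length and direction, does not. For example, the arrow from the origin to $(3,4)$ has length 5 whichever basis is used to describe it. Note that the formula $\sqrt{x_1^2+x_2^2}$ applied to the new coordinates returns 5 only when the new basis is orthonormal.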

6. **Example – Rotation:**  
   If you rotate the vector $(3,4)$ by $90^\circ$ counterclockwise, its new coordinates become $(-4,3)$ while the magnitude remains $5$. A rotation changes the direction of the vector but preserves its length, and it also preserves the angle between any two vectors that are rotated together.
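
The rotation example can be checked numerically. The following is a minimal sketch; the helper name `rotate_2d` is an illustrative choice and is not defined elsewhere in this notebook.

```python
import numpy as np

def rotate_2d(v, theta):
    """Rotate a 2D vector counterclockwise by theta radians (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])
    return R @ np.asarray(v, dtype=float)

v = np.array([3.0, 4.0])
v_rot = rotate_2d(v, np.pi / 2)

print(np.round(v_rot, 10))                        # coordinates after the rotation
print(np.linalg.norm(v), np.linalg.norm(v_rot))   # lengths before and after
```

Rounding is applied only for display, since `np.cos(np.pi / 2)` is a tiny nonzero number in floating point.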

**Summary:**  
- Vectors are visualized as arrows with a specific length and direction.
- Their geometric properties (magnitude, direction, and relationships like parallelism) remain invariant under a change of basis, even though their coordinates may change.
- Analytic geometry uses these ideas to solve problems by translating geometric operations (like rotation, translation, and scaling) into algebraic operations.


# Norms

A norm on a vector space is a function that assigns a non-negative real number to each vector, representing its "length" or "magnitude." Norms capture the idea of distance and size in the space.

**Properties of Norms:**  
A function $\|\cdot\|: V \to \mathbb{R}$ is a norm if, for all vectors $x,y\in V$ and any scalar $\alpha$, it satisfies:
1. **Positivity:** $\|x\|\ge 0$, and $\|x\|=0$ if and only if $x=0$.
2. **Homogeneity:** $\|\alpha x\| = |\alpha|\,\|x\|$.
3. **Triangle Inequality:** $\|x+y\| \le \|x\|+\|y\|$.

---

**Examples of Norms:**

1. **Euclidean Norm (2-norm):**  
   For $x=(x_1,x_2,\dots,x_n)\in\mathbb{R}^n$,  
   $$
   \|x\|_2 = \sqrt{x_1^2+x_2^2+\cdots+x_n^2}.
   $$
   *Example:* For $x=(3,4)$, $\|x\|_2 = \sqrt{3^2+4^2} = 5$.

2. **1-Norm (Manhattan Norm):**  
   $$
   \|x\|_1 = |x_1|+|x_2|+\cdots+|x_n|.
   $$
   *Example:* For $x=(3,-4)$, $\|x\|_1 = |3|+|-4| = 3+4 = 7$.

3. **Infinity Norm (Max Norm):**  
   $$
   \|x\|_\infty = \max\{|x_1|, |x_2|, \dots, |x_n|\}.
   $$
   *Example:* For $x=(3,-4,2)$, $\|x\|_\infty = 4$.

4. **Frobenius Norm (for matrices):**  
   For a matrix $A=[a_{ij}]$,  
   $$
   \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}.
   $$
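
As a small illustration of the Frobenius norm, the sketch below compares the definition with NumPy's built-in `np.linalg.norm(..., ord="fro")`. The matrix `A` is an arbitrary example chosen here, not one taken from the text.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # arbitrary example matrix

fro_manual = np.sqrt(np.sum(np.abs(A) ** 2))   # definition: sqrt of the sum of squared entries
fro_numpy = np.linalg.norm(A, ord="fro")       # NumPy's built-in Frobenius norm

print(fro_manual, fro_numpy)
```

Both values should agree, since the Frobenius norm is the Euclidean norm of the matrix entries read as one long vector.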

---

**Exercises:**

1. **Exercise 1:**  
   Compute the 2-norm and 1-norm for the vector $x=(1,-2,2)$.

2. **Exercise 2:**  
   Prove that for any vector $x\in\mathbb{R}^n$,  
   $$
   \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty.
   $$

3. **Exercise 3:**  
   Let $x=(3,4)$ in $\mathbb{R}^2$. Verify the triangle inequality for the 2-norm by checking that  
   $$
   \|(3,4)+(1,-1)\|_2 \le \|(3,4)\|_2 + \|(1,-1)\|_2.
   $$

---

**Summary:**  
Norms provide a measure of vector "size" and are essential in various applications, including error analysis, optimization, and machine learning. They let us quantify distances, and the norm axioms ensure that every way of measuring vector length stays consistent with geometric intuition.

***Problem : Implement a Python function to compute the Lp norm of a vector.***

In [1]:
import numpy as np

def lp_norm(x, p):
    """
    Compute the Lp norm of a vector x.

    Parameters:
    x (list or numpy array): The input vector.
    p (int or float): The order of the norm.

    Returns:
    float: The Lp norm of x.
    """
    x = np.array(x)  # Convert input to numpy array
    if p == np.inf:
        return np.max(np.abs(x))  # L∞ norm
    else:
        return np.sum(np.abs(x) ** p) ** (1 / p)  # Lp norm

# Example usage
x = [3, -4, 5]
print("L1 norm:", lp_norm(x, 1))          # L1 norm
print("L2 norm:", lp_norm(x, 2))          # L2 norm
print("L∞ norm:", lp_norm(x, np.inf))     # L∞ norm

L1 norm: 12.0
L2 norm: 7.0710678118654755
L∞ norm: 5


# Inner Products


The **inner product** of two vectors $ \mathbf{u} $ and $ \mathbf{v} $ in an $ n $-dimensional real space is defined as:

$$
\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i v_i
$$

For example, if:

$$
\mathbf{u} =
\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}, \quad
\mathbf{v} =
\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}
$$

Then, the inner product is:

$$
\langle \mathbf{u}, \mathbf{v} \rangle = (2 \times 1) + (3 \times -1) + (4 \times 2) = 2 - 3 + 8 = 7
$$

---

**Inner Product Properties**
1. **Commutativity:** $ \langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle $
2. **Linearity:** $ \langle a\mathbf{u} + b\mathbf{v}, \mathbf{w} \rangle = a\langle \mathbf{u}, \mathbf{w} \rangle + b\langle \mathbf{v}, \mathbf{w} \rangle $
3. **Positivity:** $ \langle \mathbf{u}, \mathbf{u} \rangle \geq 0 $ and $ \langle \mathbf{u}, \mathbf{u} \rangle = 0 $ if and only if $ \mathbf{u} = 0 $

## Dot Product

The **dot product** of two vectors $ \mathbf{u} $ and $ \mathbf{v} $ in $ \mathbb{R}^n $ is defined as:

$$
\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i
$$

For example, if:

$$
\mathbf{u} =
\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \quad
\mathbf{v} =
\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}
$$

Then, the dot product is:

$$
\mathbf{u} \cdot \mathbf{v} = (3 \times 4) + (2 \times -1) + (1 \times 2) = 12 - 2 + 2 = 12
$$

---

**Properties of the Dot Product:**
1. **Commutative:** $ \mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u} $
2. **Distributive:** $ \mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w} $
3. **Scalar Multiplication:** $ (c \mathbf{u}) \cdot \mathbf{v} = c (\mathbf{u} \cdot \mathbf{v}) $
4. **Orthogonality:** If $ \mathbf{u} \cdot \mathbf{v} = 0 $, then $ \mathbf{u} $ and $ \mathbf{v} $ are **orthogonal** (perpendicular).


In [2]:
a = np.array([3, 2, 1])
b = np.array([4, -1, 2])

np.dot(a, b)

np.int64(12)

## General Inner Products


An **inner product** is a function that takes two vectors and produces a scalar, satisfying certain properties:

1. **Conjugate Symmetry**:  
   $$ \langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle} $$  
   (For real-valued inner products, this reduces to **symmetry**)

2. **Linearity in the First Argument**:  
   $$ \langle c\mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = c\langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle $$  

3. **Positive-Definiteness**:  
   $$ \langle \mathbf{v}, \mathbf{v} \rangle \geq 0 $$  
   and  
   $$ \langle \mathbf{v}, \mathbf{v} \rangle = 0 \text{ if and only if } \mathbf{v} = 0 $$  

---

**Examples of Inner Products**  

1. **Standard Dot Product in $ \mathbb{R}^n $**  
   $$ \langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i v_i $$  
   This is the standard Euclidean inner product.

2. **Complex Inner Product in $ \mathbb{C}^n $**  
   $$ \langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i \overline{v_i} $$  
   This includes **complex conjugation** for vectors in complex space.

3. **Integral Inner Product (Functional Analysis)**  
   $$ \langle f, g \rangle = \int_a^b f(x) g(x) \,dx $$  
   Used in function spaces (e.g., Fourier series, Hilbert spaces).
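
The complex inner product is easy to get wrong in code because of where the conjugate goes. The sketch below uses the convention stated above (conjugate on the second argument) with two arbitrary example vectors. Note that `np.vdot` conjugates its first argument, so $\langle \mathbf{u}, \mathbf{v} \rangle$ in this convention corresponds to `np.vdot(v, u)`.

```python
import numpy as np

u = np.array([1 + 2j, 3 - 1j])   # arbitrary example vectors
v = np.array([2 - 1j, 1j])

# Convention used above: <u, v> = sum_i u_i * conj(v_i)
ip_uv = np.sum(u * np.conj(v))
ip_vu = np.sum(v * np.conj(u))

# np.vdot conjugates its FIRST argument, so <u, v> matches np.vdot(v, u)
print(ip_uv, np.vdot(v, u))
print(np.isclose(ip_uv, np.conj(ip_vu)))   # conjugate symmetry
print(np.sum(u * np.conj(u)).real)         # <u, u> is real and non-negative
```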


## Symmetric, Positive Definite Matrices


A **symmetric matrix** is a square matrix that is equal to its transpose:  
$$ A = A^T $$  

A **positive definite matrix** is a symmetric matrix whose eigenvalues are all positive; equivalently:  
$$ \mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all nonzero } \mathbf{x} \in \mathbb{R}^n $$  

---

**Properties of Symmetric Matrices**  
1. **Real Eigenvalues**: All eigenvalues of a symmetric matrix are real.  
2. **Orthogonal Eigenvectors**: Eigenvectors corresponding to distinct eigenvalues are orthogonal.  
3. **Spectral Theorem**: Any symmetric matrix can be diagonalized by an orthogonal matrix $ Q $:  
   $$ A = Q \Lambda Q^T $$  
   where $ \Lambda $ is a diagonal matrix of eigenvalues.

---

**Properties of Positive Definite Matrices**  
1. **Positive Leading Principal Minors (Sylvester's Criterion)**: All leading principal minors (determinants of the upper-left $ k \times k $ submatrices) are positive.  
2. **Cholesky Decomposition**: Any positive definite matrix can be decomposed as:  
   $$ A = LL^T $$  
   where $ L $ is a lower triangular matrix.  
3. **Positive Eigenvalues**: All eigenvalues of a positive definite matrix are strictly greater than zero.  

---

**Example of a Symmetric, Positive Definite Matrix**  
Consider:  
$$ A = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix} $$  

- It is symmetric since $ A^T = A $.  
- Compute eigenvalues by solving:  
  $$ \det(A - \lambda I) = 0 $$  
  $$ \begin{vmatrix} 4-\lambda & 2 \\ 2 & 3-\lambda \end{vmatrix} = 0 $$  
  Expanding:  
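  $ (4 - \lambda)(3 - \lambda) - (2)(2) = 0 $  
  $ 12 - 4\lambda - 3\lambda + \lambda^2 - 4 = 0 $  
  $ \lambda^2 - 7\lambda + 8 = 0 $  
  This quadratic does not factor over the integers, so apply the quadratic formula:  
  $ \lambda = \frac{7 \pm \sqrt{49 - 32}}{2} = \frac{7 \pm \sqrt{17}}{2} $  
  The eigenvalues are $ \lambda_1 = \frac{7 + \sqrt{17}}{2} \approx 5.56 $ and $ \lambda_2 = \frac{7 - \sqrt{17}}{2} \approx 1.44 $, which are both positive, confirming that $ A $ is positive definite.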
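
The same check can be done numerically. This is a minimal sketch using `np.linalg.eigvalsh` (eigenvalues of a symmetric matrix) and `np.linalg.cholesky`; the eigenvalues it prints are the roots of the characteristic polynomial $\lambda^2 - 7\lambda + 8$.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

is_symmetric = np.allclose(A, A.T)
eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, in ascending order
L = np.linalg.cholesky(A)             # raises LinAlgError if A is not positive definite

print(is_symmetric)
print(eigenvalues)
print(np.allclose(L @ L.T, A))        # Cholesky factor reproduces A
```

A successful Cholesky factorization is often the cheaper test in practice, since it avoids computing eigenvalues at all.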


# Lengths and Distances
 

The **length (or norm)** of a vector measures its magnitude. The **distance** between two vectors quantifies how far apart they are in a vector space.

---

**Vector Norm (Length of a Vector)**  
The **norm** (or length) of a vector $ \mathbf{v} \in \mathbb{R}^n $ is given by:  
$$ \|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{\sum_{i=1}^{n} v_i^2} $$  

For a vector $ \mathbf{v} = (v_1, v_2, ..., v_n) $, the norm is:  
$$ \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + ... + v_n^2} $$  

**Example:**  
For $ \mathbf{v} = (3, 4) $ in $ \mathbb{R}^2 $:  
$$ \|\mathbf{v}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5 $$  

---

**Distance Between Two Vectors**  
The **Euclidean distance** between two vectors $ \mathbf{u} $ and $ \mathbf{v} $ in $ \mathbb{R}^n $ is given by:  
$$ d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} $$  

**Example:**  
Let $ \mathbf{u} = (1,2) $ and $ \mathbf{v} = (4,6) $ in $ \mathbb{R}^2 $. The distance is:  
$$ d(\mathbf{u}, \mathbf{v}) = \sqrt{(4-1)^2 + (6-2)^2} = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = 5 $$  

---

**Norm Properties:**  
1. **Non-negativity:** $ \|\mathbf{v}\| \geq 0 $, and $ \|\mathbf{v}\| = 0 $ if and only if $ \mathbf{v} = \mathbf{0} $.  
2. **Scaling:** $ \|\alpha \mathbf{v}\| = |\alpha| \|\mathbf{v}\| $ for any scalar $ \alpha $.  
3. **Triangle Inequality:** $ \|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\| $.  
4. **Pythagorean Theorem:** If $ \mathbf{u} $ and $ \mathbf{v} $ are orthogonal, then:  
   $$ \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 $$  

---

**Other Norms (Different Ways to Measure Length):**  
1. **$ p $-Norm (Generalized Norm):**  
   $$ \|\mathbf{v}\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{\frac{1}{p}} $$  
   - **Manhattan Norm ($ p = 1 $):** $ \|\mathbf{v}\|_1 = \sum_{i=1}^{n} |v_i| $  
   - **Euclidean Norm ($ p = 2 $):** $ \|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2} $  
   - **Infinity Norm ($ p \to \infty $):** $ \|\mathbf{v}\|_{\infty} = \max |v_i| $  

**Example:**  
For $ \mathbf{v} = (3, -4) $:  
- **Manhattan norm:** $ \|\mathbf{v}\|_1 = |3| + |-4| = 7 $  
- **Euclidean norm:** $ \|\mathbf{v}\|_2 = \sqrt{3^2 + (-4)^2} = 5 $  
- **Infinity norm:** $ \|\mathbf{v}\|_{\infty} = \max(|3|, |-4|) = 4 $  


In [3]:
import numpy as np
v = np.array([3, 4])

n = np.linalg.norm(v)
print(f'The Euclidean norm of vector v is: {n}')

The Euclidean norm of vector v is: 5.0


In [4]:
import numpy as np
v = np.array([1, 2])
u = np.array([4, 6])

de = np.linalg.norm(u-v)
print(f'The Euclidean distance between u and v is: {de}')

The Euclidean distance between u and v is: 5.0


In [5]:
# Manhattan distance
import numpy as np
v = np.array([1, 2])
u = np.array([4, 6])

dm = np.sum(np.abs(u-v))
print(f'The Manhattan distance between u and v is: {dm}')

The Manhattan distance between u and v is: 7


# Angles and Orthogonality


Understanding angles and orthogonality is crucial in linear algebra, geometry, and machine learning. The angle between vectors helps determine their directional similarity, while orthogonality plays a key role in projections, feature engineering, and optimization.

---

**Angle Between Two Vectors**    

The angle $ \theta $ between two nonzero vectors $ \mathbf{u}, \mathbf{v} \in \mathbb{R}^n $ is given by the **cosine similarity formula**:  

$$ \cos \theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|} $$  

where:
- $ \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i $ (dot product)
- $ \|\mathbf{u}\| $ and $ \|\mathbf{v}\| $ are the Euclidean norms (magnitudes) of the vectors.

**Example:**  
Let $ \mathbf{u} = (1, 2) $ and $ \mathbf{v} = (3, 4) $, then:  

1. Compute the dot product:  
   $$ \mathbf{u} \cdot \mathbf{v} = (1)(3) + (2)(4) = 3 + 8 = 11 $$  

2. Compute the magnitudes:  
   $$ \|\mathbf{u}\| = \sqrt{1^2 + 2^2} = \sqrt{1 + 4} = \sqrt{5} $$  
   $$ \|\mathbf{v}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5 $$  

3. Compute $ \cos \theta $:  
   $$ \cos \theta = \frac{11}{\sqrt{5} \times 5} = \frac{11}{5\sqrt{5}} = \frac{11\sqrt{5}}{25} $$  

4. Find $ \theta $ using inverse cosine:  
   $$ \theta = \cos^{-1} \left( \frac{11\sqrt{5}}{25} \right) \approx 10.3^\circ $$  
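
The four steps above can be sketched in NumPy as follows. The `np.clip` call is an added safeguard, not part of the formula: round-off can push the cosine slightly outside $[-1, 1]$, where `np.arccos` would return `nan`.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
# Clip guards against round-off pushing the value slightly outside [-1, 1]
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(cos_theta)           # cosine of the angle
print(np.degrees(theta))   # angle in degrees
```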

**Geometric Interpretation**    
- If $ \theta = 0^\circ $, the vectors point in the same direction.  
- If $ \theta = 90^\circ $, the vectors are perpendicular (orthogonal).  
- If $ \theta = 180^\circ $, the vectors point in opposite directions.  

**Machine Learning Application:**  
- **Cosine Similarity:** Used in text analysis, recommendation systems, and clustering to measure the similarity between feature vectors.  
- **Feature Engineering:** For mean-centered feature vectors, the cosine of the angle equals the Pearson correlation, so it indicates whether two features are highly correlated.  

---

**Orthogonality (Perpendicular Vectors)**    

Two vectors $ \mathbf{u} $ and $ \mathbf{v} $ are **orthogonal** if their dot product is zero:  

$$ \mathbf{u} \cdot \mathbf{v} = 0 $$  

**Example:**  
Consider $ \mathbf{u} = (1, -2) $ and $ \mathbf{v} = (2, 1) $:  
$$ \mathbf{u} \cdot \mathbf{v} = (1)(2) + (-2)(1) = 2 - 2 = 0 $$  
Since the dot product is zero, the vectors are orthogonal.

**Geometric Interpretation**    
- Orthogonal vectors form a **right angle** (90°).  
- In higher dimensions, orthogonal vectors define perpendicular directions in a vector space.  

**Machine Learning Application:**  
- **Principal Component Analysis (PCA):** Finds orthogonal directions of maximum variance to reduce dimensionality.  
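- **Regularization (Ridge and Lasso Regression):** These methods penalize the norm of the coefficient vector rather than enforcing orthogonality; ridge in particular is commonly used when features are far from orthogonal (multicollinear).  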
- **Neural Networks:** Orthogonal weight initialization helps stabilize training.  

---

**Orthonormal Vectors**    

Vectors are **orthonormal** if they are both **orthogonal** and have unit length:  

$$ \|\mathbf{u}\| = 1, \quad \|\mathbf{v}\| = 1, \quad \text{and} \quad \mathbf{u} \cdot \mathbf{v} = 0 $$  

**Example:**  
The standard basis vectors in $ \mathbb{R}^2 $ are:  
$$ \mathbf{e_1} = (1, 0), \quad \mathbf{e_2} = (0, 1) $$  
They satisfy:  
$$ \mathbf{e_1} \cdot \mathbf{e_2} = (1)(0) + (0)(1) = 0 $$  
$$ \|\mathbf{e_1}\| = \sqrt{1^2 + 0^2} = 1, \quad \|\mathbf{e_2}\| = \sqrt{0^2 + 1^2} = 1 $$  
Thus, they are **orthonormal**.

**Machine Learning Application:**    
- **Gram-Schmidt Process:** Converts a set of vectors into an orthonormal basis.  
- **Eigenvectors in PCA:** The principal components form an orthonormal set.  

---

**Orthogonal Matrices and Their Use in Linear Transformations**  

An **orthogonal matrix** is a square matrix $ Q \in \mathbb{R}^{n \times n} $ whose columns (and, equivalently, whose rows) form an **orthonormal basis**. This means the columns are mutually **orthogonal** and have unit length.

Mathematically, a matrix $ Q $ is **orthogonal** if:

$$ Q^T Q = Q Q^T = I $$  

where $ I $ is the identity matrix.

---

**Properties of Orthogonal Matrices**    

1. **Preserve Inner Products**  
   If $ Q $ is an orthogonal matrix and $ \mathbf{u}, \mathbf{v} $ are vectors, then:  

   $$ (Q\mathbf{u}) \cdot (Q\mathbf{v}) = \mathbf{u} \cdot \mathbf{v} $$  

   This means that applying $ Q $ does not change angles or lengths.

2. **Preserve Norms (Lengths)**  
   For any vector $ \mathbf{x} $:  

   $$ \| Q \mathbf{x} \| = \| \mathbf{x} \| $$  

3. **Determinant is ±1**  
   $$ \det(Q) = \pm 1 $$  
   - If $ \det(Q) = 1 $, $ Q $ represents a **rotation**.  
   - If $ \det(Q) = -1 $, $ Q $ represents a **reflection**, possibly combined with a rotation.  

4. **Inverse is Transpose**  
   $$ Q^{-1} = Q^T $$  
   This simplifies matrix computations, especially in solving linear equations.
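
The four properties can be verified numerically on a concrete orthogonal matrix. In this sketch the matrix is obtained from the QR factorization of a random matrix; the seed and the dimension 3 are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
M = rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(M)           # the Q factor of a square matrix is orthogonal

u = rng.standard_normal(3)
v = rng.standard_normal(3)

print(np.allclose(Q.T @ Q, np.eye(3)))                        # Q^T Q = I
print(np.isclose((Q @ u) @ (Q @ v), u @ v))                   # inner products preserved
print(np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))   # norms preserved
print(np.isclose(abs(np.linalg.det(Q)), 1.0))                 # det(Q) is +1 or -1
print(np.allclose(np.linalg.inv(Q), Q.T))                     # inverse equals transpose
```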

---

**Example: 2D Rotation Matrix**    

The standard **rotation matrix** in $ \mathbb{R}^2 $:

$$ Q = 
\begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta
\end{bmatrix} $$  

- Rotates a vector counterclockwise by an angle $ \theta $.  
- Since $ Q^T Q = I $, it is **orthogonal**.

**Applying to a vector**  
Let $ \mathbf{x} = (1,0) $ and rotate it by $ 90^\circ $ ($ \theta = \frac{\pi}{2} $):

$$ Q = 
\begin{bmatrix}
\cos \frac{\pi}{2} & -\sin \frac{\pi}{2} \\
\sin \frac{\pi}{2} & \cos \frac{\pi}{2}
\end{bmatrix} =
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix} $$  

Multiplying:

$$ Q \mathbf{x} =
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix} 
\begin{bmatrix}
1 \\
0
\end{bmatrix} =
\begin{bmatrix}
0 \\
1
\end{bmatrix} $$  

Thus, $ \mathbf{x} = (1,0) $ is rotated to $ (0,1) $.

---

**Example: Reflection Matrix**    

A reflection over the **x-axis** is given by:

$$ Q =
\begin{bmatrix}
1 & 0 \\
0 & -1
\end{bmatrix} $$  

Applying to $ \mathbf{x} = (3, 4) $:

$$ Q \mathbf{x} =
\begin{bmatrix}
1 & 0 \\
0 & -1
\end{bmatrix} 
\begin{bmatrix}
3 \\
4
\end{bmatrix} =
\begin{bmatrix}
3 \\
-4
\end{bmatrix} $$  

Thus, the point is reflected over the x-axis.

---

**Machine Learning Applications of Orthogonal Matrices**    

1. **Principal Component Analysis (PCA)**  
   - Finds an **orthogonal basis** that maximizes variance in data.  
   - Uses an **orthogonal transformation** to reduce dimensionality.  

2. **Neural Networks (Weight Initialization)**  
   - Orthogonal weight matrices help stabilize training in deep learning.  

3. **QR Decomposition**  
   - Factorizes a matrix into an **orthogonal matrix** ($ Q $) and an **upper triangular matrix** ($ R $).  
   - Used in least squares regression and solving linear equations.  

4. **Eigenvalue Decomposition**  
   - **Symmetric matrices** have **orthogonal eigenvectors**, making them easier to diagonalize.  



In [6]:
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])

A = np.array([[2, 1],
             [1, 2]])

env = np.linalg.norm(v)
mnu = np.sum(np.abs(u))

ed = np.linalg.norm(u-v)
md = np.sum(np.abs(u-v))

dp = u @ v
wp = u.T @ A @ v

wnv = np.sqrt(u.T @ A @ u)
wd = np.sqrt((u - v).T @ A @ (u - v))

print(f"The Euclidean norm of v: {env}")
print(f"The Manhattan norm of u: {mnu}")

print(f"\nThe Euclidean distance between u and v: {ed}")
print(f"The Manhattan distance between u and v: {md}")

print(f"\nThe dot product of u and v: {dp}")
print(f"The weighted inner product of u and v: {wp}")

print(f"\nThe weighted Euclidean norm of u is: {wnv}")
print(f"The weighted Euclidean distance between u and v is: {wd}")

The Euclidean norm of v: 5.0
The Manhattan norm of u: 3

The Euclidean distance between u and v: 2.8284271247461903
The Manhattan distance between u and v: 4

The dot product of u and v: 11
The weighted inner product of u and v: 32

The weighted Euclidean norm of u is: 3.7416573867739413
The weighted Euclidean distance between u and v is: 4.898979485566356


# Orthonormal Basis

An **orthonormal basis** of a vector space is a set of **orthonormal vectors** that spans the space.  

A set of vectors $ \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\} $ in $ \mathbb{R}^n $ is **orthonormal** if:  

1. Each vector has **unit length** (norm = 1):  
   $$ \|\mathbf{v}_i\| = 1 \quad \forall i $$  
2. Each pair of vectors is **orthogonal**:  
   $$ \mathbf{v}_i \cdot \mathbf{v}_j = 0 \quad \text{for } i \neq j $$  

If an orthonormal set of vectors spans a vector space, it forms an **orthonormal basis**.

---
**Example: Standard Basis in $ \mathbb{R}^3 $**  
The standard basis vectors in $ \mathbb{R}^3 $ are:

$$ \mathbf{e}_1 = (1,0,0), \quad \mathbf{e}_2 = (0,1,0), \quad \mathbf{e}_3 = (0,0,1) $$  

These satisfy:

- **Unit length**:  
  $$ \|\mathbf{e}_1\| = \|\mathbf{e}_2\| = \|\mathbf{e}_3\| = 1 $$  
- **Orthogonality**:  
  $$ \mathbf{e}_1 \cdot \mathbf{e}_2 = 0, \quad \mathbf{e}_1 \cdot \mathbf{e}_3 = 0, \quad \mathbf{e}_2 \cdot \mathbf{e}_3 = 0 $$  

Thus, $ \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\} $ is an **orthonormal basis** for $ \mathbb{R}^3 $.

---

**Example: Orthonormal Basis from Any Basis (Gram-Schmidt Process)**  
If given a set of **linearly independent vectors**, we can convert them into an **orthonormal basis** using the **Gram-Schmidt process**.

Given two linearly independent vectors:

$$ \mathbf{v}_1 = (3,1), \quad \mathbf{v}_2 = (2,2) $$  

Step 1: Normalize the first vector:  

$$ \mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \frac{(3,1)}{\sqrt{3^2 + 1^2}} = \frac{(3,1)}{\sqrt{10}} $$  

Step 2: Remove the projection of $ \mathbf{v}_2 $ onto $ \mathbf{u}_1 $:  

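$$ \mathbf{v}_2' = \mathbf{v}_2 - \text{proj}_{\mathbf{u}_1} \mathbf{v}_2 = \mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{u}_1)\, \mathbf{u}_1 $$  
$$ \mathbf{v}_2' = (2,2) - \frac{(2,2) \cdot (3,1)}{\|(3,1)\|^2} (3,1) = (2,2) - \frac{8}{10}(3,1) = (-0.4,\ 1.2) $$  

Step 3: Normalize $ \mathbf{v}_2' $ to get $ \mathbf{u}_2 $:  

$$ \mathbf{u}_2 = \frac{\mathbf{v}_2'}{\|\mathbf{v}_2'\|} = \frac{(-0.4,\ 1.2)}{\sqrt{1.6}} = \frac{(-1,3)}{\sqrt{10}} $$  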

This process ensures the new vectors are **orthogonal and unit-length**, forming an **orthonormal basis**.
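
The steps above generalize to any number of linearly independent vectors. The following is a minimal sketch of classical Gram-Schmidt; the function name `gram_schmidt` and the dependence check are illustrative choices, and for ill-conditioned inputs a QR factorization such as `np.linalg.qr` is usually the more robust option.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal list built from linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        w = v.copy()
        for q in basis:
            w -= (v @ q) * q          # remove the component of v along each earlier direction
        norm = np.linalg.norm(w)
        if np.isclose(norm, 0.0):
            raise ValueError("input vectors are linearly dependent")
        basis.append(w / norm)        # normalize to unit length
    return basis

u1, u2 = gram_schmidt([(3, 1), (2, 2)])
print(u1, u2)
print(u1 @ u2)                        # should be numerically zero
```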

---

**Machine Learning Applications of Orthonormal Bases**  
1. **Principal Component Analysis (PCA)**  
   - Uses an orthonormal basis to represent data in lower dimensions.
  
2. **Fourier Transforms & Signal Processing**  
   - Uses an **orthonormal basis** (Fourier basis) to represent signals efficiently.

3. **Quantum Computing**  
   - States are represented in an **orthonormal basis** of a Hilbert space.

4. **Orthogonal Weight Initialization in Neural Networks**  
   - Helps stabilize training and prevent gradient issues.

Using an **orthonormal basis** simplifies calculations, improves numerical stability, and provides efficient ways to represent data in various domains.


# Orthogonal Complement
 

The **orthogonal complement** of a subspace $ W $ in a vector space $ V $ (with an inner product) is the set of all vectors in $ V $ that are **orthogonal** to every vector in $ W $.  

Mathematically, the **orthogonal complement** of $ W $, denoted as $ W^\perp $, is defined as:  

$$ W^\perp = \{ \mathbf{v} \in V \ | \ \mathbf{v} \cdot \mathbf{w} = 0, \quad \forall \mathbf{w} \in W \} $$  

This means that every vector in $ W^\perp $ is **perpendicular** to every vector in $ W $.

---

**Example 1: Orthogonal Complement in $ \mathbb{R}^3 $**
Consider the subspace $ W $ of $ \mathbb{R}^3 $ spanned by the vector:

$$ \mathbf{w} = (1,2,3) $$  

To find the **orthogonal complement** $ W^\perp $, we look for all vectors $ \mathbf{v} = (x,y,z) $ that satisfy:

$$ \mathbf{w} \cdot \mathbf{v} = 0 $$  
$$ (1,2,3) \cdot (x,y,z) = 0 $$  
$$ 1x + 2y + 3z = 0 $$  

This defines a plane in $ \mathbb{R}^3 $, meaning the **orthogonal complement** of a **1D subspace** is a **2D plane**.
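
One way to obtain an explicit basis for this plane is through the singular value decomposition: the right singular vectors beyond the rank of the spanning matrix span its null space, which is $ W^\perp $. This is a sketch under that approach; the tolerance `1e-12` is an arbitrary choice.

```python
import numpy as np

w = np.array([[1.0, 2.0, 3.0]])    # the rows of this matrix span W

# Full SVD: the rows of Vt beyond rank(w) span the null space of w, i.e. W-perp
_, s, Vt = np.linalg.svd(w)
rank = int(np.sum(s > 1e-12))
W_perp = Vt[rank:]                 # shape (2, 3): an orthonormal basis of the plane

print(W_perp)
print(W_perp @ w.ravel())          # each entry should be numerically zero
```

The basis is not unique; any rotation of these two vectors within the plane would serve equally well.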

---

**Example 2: Orthogonal Complement in $ \mathbb{R}^2 $**
Let $ W $ be the **x-axis** in $ \mathbb{R}^2 $, i.e.,  

$$ W = \text{span} \{(1,0)\} $$  

A vector $ (x,y) $ is in the orthogonal complement $ W^\perp $ if:

$$ (1,0) \cdot (x,y) = 0 $$  
$$ 1x + 0y = 0 \Rightarrow x = 0 $$  

Thus, the **orthogonal complement** is the **y-axis**, i.e.,  

$$ W^\perp = \text{span} \{(0,1)\} $$  

---

**Properties of the Orthogonal Complement**
1. **Dimensional Relationship**  
   If $ W $ is a subspace of $ V $ of dimension $ k $, then:  
   $$ \dim(W) + \dim(W^\perp) = \dim(V) $$  

   For example, in $ \mathbb{R}^3 $, if $ W $ is a **line (1D)**, then $ W^\perp $ is a **plane (2D)**.

2. **Double Complement Property**  
   $$ (W^\perp)^\perp = W $$  

   This means taking the orthogonal complement twice brings us back to the original subspace.

3. **Orthogonal Decomposition**  
   Every vector in $ V $ can be **uniquely** written as:  
   $$ \mathbf{v} = \mathbf{w} + \mathbf{w}^\perp $$  
   where $ \mathbf{w} \in W $ and $ \mathbf{w}^\perp \in W^\perp $.  
   This is crucial in **least squares regression**.
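
The decomposition can be illustrated for the line $ W = \text{span}\{(1,2,3)\} $ from Example 1. In this sketch the component in $ W $ is computed with the standard formula for projecting onto a line, $ \frac{\mathbf{v} \cdot \mathbf{w}}{\mathbf{w} \cdot \mathbf{w}} \mathbf{w} $; the vector `v` is an arbitrary choice.

```python
import numpy as np

w_dir = np.array([1.0, 2.0, 3.0])   # W = span{(1, 2, 3)}, as in Example 1
v = np.array([4.0, 0.0, -1.0])      # arbitrary vector chosen for illustration

w_part = (v @ w_dir) / (w_dir @ w_dir) * w_dir   # component in W
perp_part = v - w_part                           # component in W-perp

print(w_part, perp_part)
print(np.isclose(perp_part @ w_dir, 0.0))        # perp_part is orthogonal to W
print(np.allclose(w_part + perp_part, v))        # the two parts add back to v
```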

---

**Machine Learning Applications**
1. **Least Squares Regression**  
   - The residual vector (error) is **orthogonal** to the column space of the design matrix.
   - This helps minimize errors in **linear regression**.

2. **Principal Component Analysis (PCA)**  
   - PCA keeps the lower-dimensional subspace spanned by the leading principal components; the discarded components span its **orthogonal complement**.

3. **Fourier Transforms**  
   - Decomposes a signal into mutually orthogonal basis functions; whatever a chosen set of basis functions fails to capture lies in the orthogonal complement of their span.

The **orthogonal complement** provides insights into projections, decompositions, and dimensionality reduction in various applications.


# Inner Product of Functions

In functional analysis and machine learning, an **inner product of functions** extends the concept of the dot product to infinite-dimensional vector spaces. The inner product defines a measure of **similarity** between two functions, much like how the dot product measures similarity between vectors.

---

**Definition**    
The inner product of two functions $ f(x) $ and $ g(x) $ over an interval $ [a, b] $ with respect to a **weight function** $ w(x) $ is defined as:

$$ \langle f, g \rangle = \int_a^b f(x) g(x) w(x) \, dx $$

where:
- $ f(x), g(x) $ are real-valued functions (for complex-valued functions, $ g(x) $ is replaced by its complex conjugate $ \overline{g(x)} $),
- $ w(x) $ is a positive **weight function** (default is $ w(x) = 1 $),
- The integral computes a measure of similarity between $ f(x) $ and $ g(x) $.

If $ \langle f, g \rangle = 0 $, the functions are **orthogonal** over $ [a, b] $.

---

**Example 1: Standard Inner Product (No Weight Function)**  
Consider the functions:

$$ f(x) = x, \quad g(x) = x^2 $$  

on the interval $ [0,1] $. Their inner product is:

$$ \langle f, g \rangle = \int_0^1 x \cdot x^2 \, dx $$  

$$ = \int_0^1 x^3 \, dx $$  

$$ = \frac{1}{4} $$  

Since the inner product is **nonzero**, $ f(x) $ and $ g(x) $ are **not orthogonal**.

---

**Example 2: Checking Orthogonality**    
Let $ f(x) = \sin(x) $ and $ g(x) = \cos(x) $ over $ [0, \pi] $. The inner product is:

$$ \langle f, g \rangle = \int_0^\pi \sin(x) \cos(x) \, dx $$  

Using the identity:

$$ \sin(x) \cos(x) = \frac{1}{2} \sin(2x) $$  

We compute:

$$ \langle f, g \rangle = \int_0^\pi \frac{1}{2} \sin(2x) \, dx $$  

Evaluating:

$$ = \frac{1}{2} \left[ -\frac{\cos(2x)}{2} \right]_0^\pi $$  
$$ = \frac{1}{4} [ -\cos(2\pi) + \cos(0) ] $$  
$$ = \frac{1}{4} [ -1 + 1 ] = 0 $$  

Since the inner product is **zero**, $ \sin(x) $ and $ \cos(x) $ are **orthogonal**.

---

**Example 3: Inner Product with a Weight Function**  
Let:

$$ f(x) = x, \quad g(x) = 1 $$  

over $ [0,1] $ with weight function $ w(x) = x $. The inner product is:

$$ \langle f, g \rangle = \int_0^1 x \cdot 1 \cdot x \, dx $$  

$$ = \int_0^1 x^2 \, dx $$  

$$ = \frac{1}{3} $$  

This demonstrates how different **weight functions** impact the inner product.
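
The three examples can be approximated numerically. The sketch below uses a simple midpoint rule; the function name `inner_product`, its signature, and the number of subintervals `n` are illustrative choices rather than a standard API.

```python
import numpy as np

def inner_product(f, g, a, b, w=lambda x: np.ones_like(x), n=100_000):
    """Approximate the integral of f(x) g(x) w(x) over [a, b] with a midpoint rule."""
    edges = np.linspace(a, b, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])     # midpoints of n equal subintervals
    dx = (b - a) / n
    return np.sum(f(mid) * g(mid) * w(mid)) * dx

ex1 = inner_product(lambda x: x, lambda x: x**2, 0.0, 1.0)        # Example 1
ex2 = inner_product(np.sin, np.cos, 0.0, np.pi)                   # Example 2
ex3 = inner_product(lambda x: x, lambda x: np.ones_like(x),
                    0.0, 1.0, w=lambda x: x)                      # Example 3

print(ex1, ex2, ex3)
```

The printed values should be close to $\frac{1}{4}$, $0$, and $\frac{1}{3}$, up to the discretization error of the midpoint rule.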

---

**Properties of the Inner Product**  
1. **Linearity**:  
   $$ \langle af + bg, h \rangle = a \langle f, h \rangle + b \langle g, h \rangle $$  

2. **Symmetry**:  
   $$ \langle f, g \rangle = \langle g, f \rangle $$  

3. **Positive Definiteness**:  
   $$ \langle f, f \rangle \geq 0 $$  
   and  
   $$ \langle f, f \rangle = 0 \iff f = 0 $$  

4. **Orthogonality**:  
   $$ \langle f, g \rangle = 0 \Rightarrow f \perp g $$  

---

**Applications in Machine Learning and Data Science**  
- **Fourier Series & Signal Processing**:  
  Functions are decomposed into orthogonal basis functions (e.g., sine and cosine waves).  

- **Principal Component Analysis (PCA)**:  
  PCA finds orthogonal directions (principal components) in data.  

- **Kernel Methods in Machine Learning**:  
  Inner products define similarity in **feature spaces** for algorithms like **SVMs** and **Gaussian Processes**.  

- **Least Squares Regression**:  
  The residual is **orthogonal** to the column space of the design matrix.  

The **inner product of functions** is fundamental in functional analysis, physics, and machine learning.


# Orthogonal Projections


**General Concept of Projection**    
A **projection** is a linear map that sends a vector onto another vector or onto a subspace. When the projection is orthogonal, the result is the closest approximation of the vector within that target. Projection is widely used in **computer graphics, physics, statistics, and machine learning**.

If a vector **$ v $** is projected onto another vector **$ u $**, the resulting vector lies **along** $ u $. More generally, a vector can be projected onto a **subspace**, producing a shadow-like representation of that vector within the subspace.

---

**Orthogonal Projection**    
The **orthogonal projection** of a vector **$ v $** onto a subspace is the closest point to $ v $ in that subspace. This means the difference between $ v $ and its projection is **perpendicular (orthogonal)** to the subspace.

**Key Characteristics of Orthogonal Projection**  
1. The difference between a vector and its projection is always **perpendicular** to the subspace.
2. It minimizes the **distance** between the original vector and the subspace.
3. The projection represents the **best approximation** of the vector within the given subspace.

**Geometric Interpretation**  
If we project **$ v $** onto a **line** (1D subspace), the projection lies on that line. If we project onto a **plane** (2D subspace), the projection lies in that plane.

- If **$ v $** is already in the subspace, its projection is **itself**.
- If **$ v $** is not in the subspace, it gets **mapped** to the closest point in that subspace.

**Applications of Projection in Machine Learning & Data Science**  
- **Principal Component Analysis (PCA)**: Projects high-dimensional data onto a lower-dimensional subspace while preserving variance.
- **Linear Regression**: The least squares solution is the **projection** of the observed response vector onto the column space of the design matrix.
- **Fourier Analysis**: Projects functions onto basis functions like sine and cosine waves.
- **Computer Graphics**: Projection transformations are used for rendering 3D objects in 2D.


## Projection onto One-Dimensional Subspaces (Lines)

 

**Concept of Projection onto a Line**    
A **one-dimensional subspace** is simply a **line** passing through the origin. If we have a vector **$ v $**, its projection onto a line spanned by a vector **$ u $** gives the closest approximation of $ v $ that lies along $ u $.

Mathematically, the **projection of $ v $ onto $ u $** is given by:

$$ \text{Proj}_u v = \frac{\langle v, u \rangle}{\langle u, u \rangle} u $$

where:
- **$ \langle v, u \rangle $** is the **dot product** of $ v $ and $ u $.
- **$ \langle u, u \rangle $** is the **dot product** of $ u $ with itself (equivalent to $ ||u||^2 $, the squared norm of $ u $).
- The scalar **$ \frac{\langle v, u \rangle}{\langle u, u \rangle} $** is the projection **coefficient**, determining how much of $ u $ is needed to best approximate $ v $.

---

**Geometric Interpretation**  
The projection of $ v $ onto the line spanned by $ u $ is the **shadow** of $ v $ when a perpendicular is dropped from $ v $ onto the line. The vector **$ v - \text{Proj}_u v $** is **orthogonal** to the line.

This means:
- **The projection lies on the line.**
- **The error (difference) between $ v $ and its projection is perpendicular to the line.**
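
The two properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `project_onto_line` and the sample vectors are illustrative choices, not part of the original text.

```python
import numpy as np

def project_onto_line(v, u):
    """Orthogonal projection of v onto the line spanned by a nonzero vector u."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    return (v @ u) / (u @ u) * u

# Sample vectors chosen for illustration.
v = np.array([3.0, 4.0])
u = np.array([1.0, 2.0])

p = project_onto_line(v, u)
residual = v - p

# The residual is orthogonal to the line, so its dot product with u is ~0.
print(p, residual @ u)

# Every sampled point t*u on the line is at least as far from v as the projection.
ts = np.linspace(-5.0, 5.0, 101)
distances = np.linalg.norm(v - ts[:, None] * u, axis=1)
print(distances.min() >= np.linalg.norm(residual) - 1e-12)
```

The grid search over `ts` is only a sanity check of the "closest point" claim; it is not how the projection is computed.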

---

**Example Calculation**    
Let’s say we have vectors:

$$ v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad u = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$

1. Compute the dot product **$ \langle v, u \rangle $**:

   $$ \langle v, u \rangle = (3)(1) + (4)(2) = 3 + 8 = 11 $$

2. Compute the squared norm of **$ u $**:

   $$ ||u||^2 = \langle u, u \rangle = (1)^2 + (2)^2 = 1 + 4 = 5 $$

3. Compute the projection coefficient:

   $$ \frac{\langle v, u \rangle}{\langle u, u \rangle} = \frac{11}{5} = 2.2 $$

4. Multiply by **$ u $** to get the projection:

   $$ \text{Proj}_u v = 2.2 \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2.2 \\ 4.4 \end{bmatrix} $$

Thus, **$ v $** is projected onto the line as **$ [2.2, 4.4] $**.

**Transformation Matrix for Projection**    

**Projection Matrix Concept**    
A **projection matrix** is a square matrix that, when applied to a vector, projects it onto a certain subspace. For **projection onto a line** (one-dimensional subspace), we can represent the projection operation as a matrix multiplication. The result of multiplying a vector by this matrix is the projection of the vector onto the subspace.

For a line spanned by a nonzero vector **$ u $**, the projection matrix **$ P $** that projects any vector **$ v $** onto that line is given by:

$$ P = \frac{u u^T}{u^T u} $$

where:
- **$ u u^T $** is the outer product of the vector **$ u $** with itself.
- **$ u^T u $** is the squared norm of **$ u $**.

If **$ u $** is a unit vector, then **$ u^T u = 1 $**, and the projection matrix simplifies to:

$$ P = u u^T $$

---

**Example Calculation of Projection Matrix**  
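Consider the vector **$ u = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $**. We want the projection matrix onto the line spanned by **$ u $**. Because **$ u $** is not a unit vector, the outer product must be divided by **$ u^T u $**.

1. Compute the outer product **$ u u^T $**:

   $$ u u^T = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} $$

2. Compute the squared norm **$ u^T u $**:

   $$ u^T u = 1^2 + 2^2 = 5 $$

3. The projection matrix **$ P $** is:

   $$ P = \frac{1}{5} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 0.2 & 0.4 \\ 0.4 & 0.8 \end{bmatrix} $$

As a check, $ P \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 2.2 \\ 4.4 \end{bmatrix} $, which matches the projection of $ v $ onto $ u $ computed earlier.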

---

**Geometric Interpretation**  
The transformation matrix **$ P $** represents a linear operation that **maps any vector to its projection onto the subspace spanned by $ u $**. When this matrix is applied to a vector, the result is the **closest point** on the line defined by **$ u $**.
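
As a hedged illustration, the sketch below builds the projection matrix for a non-unit vector and checks two standard properties of orthogonal projection matrices: symmetry and idempotence ($P^2 = P$). The function name `line_projection_matrix` is a hypothetical helper introduced here.

```python
import numpy as np

def line_projection_matrix(u):
    """Projection matrix onto the line spanned by a nonzero vector u."""
    u = np.asarray(u, dtype=float).reshape(-1, 1)  # column vector
    # Divide the outer product by the squared norm so non-unit u works too.
    return (u @ u.T) / (u.T @ u).item()

P = line_projection_matrix([1, 2])
v = np.array([3.0, 4.0])

print(P)      # approximately [[0.2, 0.4], [0.4, 0.8]]
print(P @ v)  # approximately [2.2, 4.4]

# Projecting twice changes nothing, and P is symmetric.
print(np.allclose(P @ P, P), np.allclose(P, P.T))
```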

---

**Projection Matrix for Higher Dimensions**  
For projection onto a **higher-dimensional subspace** (e.g., a plane or hyperplane), the process is analogous. The matrix would be constructed using the **basis vectors** of the subspace. The general form for the projection matrix onto a subspace spanned by multiple vectors is:

$$ P = A (A^T A)^{-1} A^T $$

where:
- **$ A $** is the matrix whose columns are the basis vectors of the subspace.
- **$ A^T $** is the transpose of **$ A $**.
- **$ (A^T A)^{-1} $** is the inverse of **$ A^T A $**, assuming it is invertible.

---

**Machine Learning & Data Science Applications**  
- **Linear Regression**: The least squares solution finds the **projection** of observed data onto a model's feature space.
- **PCA (Principal Component Analysis)**: Reduces dimensionality by projecting data onto **principal components** (lines).
- **Fourier Analysis**: Projects functions onto sinusoidal basis functions.
- **Error Minimization**: Projections are used to **minimize reconstruction error** in various optimization tasks.



```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])

# Projection coefficient <v, u> / <u, u>
c = np.dot(u, v) / np.dot(u, u)
proj_v_onto_u = c * u

print(proj_v_onto_u)  # [2.2 4.4]
print(c)              # 2.2
```

## Projection onto General Subspaces


**General Projection onto Subspaces**  
In general, projecting a vector onto a subspace involves finding the closest vector in that subspace to the given vector. If you are projecting onto a subspace spanned by multiple vectors, the projection is determined by finding a linear combination of these vectors that minimizes the distance between the original vector and the subspace.

Given a vector **$ v $** and a subspace **$ S $** spanned by multiple vectors **$ u_1, u_2, ..., u_k $**, the projection of **$ v $** onto **$ S $** is the vector in **$ S $** that is closest to **$ v $**. Mathematically, this can be achieved by solving the system of equations derived from the least squares problem.

The projection matrix onto the subspace spanned by linearly independent vectors **$ u_1, u_2, ..., u_k $** is:

$$ P = A (A^T A)^{-1} A^T $$

where:
- **$ A $** is the matrix whose columns are the basis vectors of the subspace **$ S $**.
- **$ A^T $** is the transpose of **$ A $**.
- **$ (A^T A)^{-1} $** is the inverse of **$ A^T A $**, assuming it is invertible.

Applying this matrix to **$ v $** gives the projection onto **$ S $**: $ \text{proj}_S(v) = P v $.

---

**Example: Projection onto a Subspace**  
Consider a vector **$ v = \begin{bmatrix} 3 \\ 4 \end{bmatrix} $** and a subspace **$ S $** spanned by the vectors **$ u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $** and **$ u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $** (the standard basis of **$ \mathbb{R}^2 $**). We want to project **$ v $** onto **$ S $**.

Since **$ u_1 $** and **$ u_2 $** form an orthonormal basis for **$ \mathbb{R}^2 $**, the projection matrix is simply the identity matrix:

$$ P = I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $$

This projection is straightforward because **$ v $** is already in the subspace spanned by **$ u_1 $** and **$ u_2 $**.

For a more complex subspace, such as a plane spanned by arbitrary vectors, we can use the projection formula outlined above.
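
A less trivial case can be sketched in code. The example below is an illustration under stated assumptions: the matrix `A`, the vector `v`, and the helper name `subspace_projection_matrix` are all made up for this sketch. It projects a vector in $\mathbb{R}^3$ onto the plane spanned by the columns of `A`, solving a linear system rather than forming the inverse explicitly.

```python
import numpy as np

def subspace_projection_matrix(A):
    """P = A (A^T A)^{-1} A^T for a matrix A with linearly independent columns."""
    A = np.asarray(A, dtype=float)
    # Solve (A^T A) X = A^T instead of computing the inverse directly.
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical data: two basis vectors of a plane in R^3, stored as columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 2.0, 2.0])

P = subspace_projection_matrix(A)
proj = P @ v
print(proj)                 # approximately [1.1667, 1.6667, 2.1667]

# The residual is orthogonal to every basis vector of the subspace.
print(A.T @ (v - proj))     # approximately [0, 0]
```

With this choice of `A` (a column of ones and a column of inputs), the projection is also the vector of fitted values of a least squares line through the three data points.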

---

**Geometric Interpretation**  
The projection of **$ v $** onto **$ S $** is the **closest vector in the subspace** to **$ v $**. In higher-dimensional spaces, the projection is not always visually intuitive, but it can be thought of as the orthogonal "shadow" of the vector **$ v $** onto the subspace **$ S $**.

---

**Machine Learning Applications**  
- **Linear Regression**: The projection onto a subspace is used to find the **best-fit line** or **hyperplane** in linear regression. The projection matrix projects the data points onto the regression line or hyperplane that minimizes the error.
- **PCA**: In Principal Component Analysis (PCA), the projection onto the principal components is the key operation. The data points are projected onto a lower-dimensional subspace defined by the eigenvectors of the covariance matrix.
- **Dimensionality Reduction**: Techniques like PCA use projections to reduce the number of features in a dataset while retaining as much variance as possible.
- **Signal Processing**: Projections are used in filtering and noise reduction, where the goal is to project the noisy signal onto a cleaner subspace.


## Gram-Schmidt Orthogonalization

**Introduction**  
The **Gram-Schmidt process** is a method for converting a set of **linearly independent vectors** into an **orthonormal** set of vectors. This is useful in many applications, such as:
- Constructing an **orthonormal basis** for a vector space.
- Finding an **orthogonal projection** of a vector onto a subspace.
- Decomposing a matrix in **QR factorization**, which is widely used in solving linear systems and optimization problems.

If we have a set of **$ n $** linearly independent vectors **$ \{v_1, v_2, \dots, v_n\} $**, the Gram-Schmidt process produces an orthonormal set **$ \{q_1, q_2, \dots, q_n\} $** such that:
1. The **$ q_i $** vectors are orthogonal (**$ q_i \cdot q_j = 0 $** for **$ i \neq j $**).
2. Each **$ q_i $** has unit length (**$ ||q_i|| = 1 $**).

---

**Step-by-Step Gram-Schmidt Process**  
Given linearly independent vectors **$ \{v_1, v_2, ..., v_n\} $**, we construct the orthonormal set **$ \{q_1, q_2, ..., q_n\} $** using the following procedure:

1. **First Vector Normalization:**
   - Define the first orthogonal vector **$ u_1 $** as:
     $$ u_1 = v_1 $$
   - Normalize it to obtain the first orthonormal vector:
     $$ q_1 = \frac{u_1}{||u_1||} $$

2. **Second Vector Orthogonalization:**
   - Remove the component of **$ v_2 $** along **$ q_1 $**:
     $$ u_2 = v_2 - \text{proj}_{q_1}(v_2) $$
   - Compute the projection:
     $$ \text{proj}_{q_1}(v_2) = \frac{q_1 \cdot v_2}{q_1 \cdot q_1} q_1 $$
   - Normalize **$ u_2 $** to obtain:
     $$ q_2 = \frac{u_2}{||u_2||} $$

3. **General Case for Any Vector $ v_k $:**
   - Orthogonalize **$ v_k $** against all previous **$ q_i $**:
     $$ u_k = v_k - \sum_{i=1}^{k-1} \text{proj}_{q_i}(v_k) $$
   - Normalize **$ u_k $** to obtain:
     $$ q_k = \frac{u_k}{||u_k||} $$

This process is repeated for all **$ n $** vectors.
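
The procedure above can be sketched as follows. This is a minimal illustration of the classical variant as written in the steps, assuming NumPy; the function name `gram_schmidt` and the tolerance `tol` are choices made for this sketch.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt: return orthonormal vectors as the rows of an array."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        u = v.copy()
        for q in basis:
            # Subtract the projection of v onto each earlier q (q has unit length).
            u -= (q @ v) * q
        norm = np.linalg.norm(u)
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append(u / norm)
    return np.array(basis)

Q = gram_schmidt([[3, 1], [2, 2]])
print(Q)

# Rows are orthonormal, so Q Q^T is the identity.
print(np.allclose(Q @ Q.T, np.eye(2)))
```

Projecting against the running remainder `u` instead of the original `v` gives the modified Gram-Schmidt variant, which usually behaves better in floating-point arithmetic.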

---

**Example**  
Given two linearly independent vectors in **$ \mathbb{R}^2 $**:

$$ v_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 2 \end{bmatrix} $$

1. **Compute the first orthonormal vector**:
   - **$ u_1 = v_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} $**
   - Normalize:
     $$ q_1 = \frac{u_1}{||u_1||} = \frac{1}{\sqrt{10}} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{3}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} \end{bmatrix} $$

2. **Compute the second orthonormal vector**:
   - Projection of **$ v_2 $** onto **$ q_1 $**:
     $$ \text{proj}_{q_1}(v_2) = \left( \frac{q_1 \cdot v_2}{q_1 \cdot q_1} \right) q_1 $$
     $$ = \left( \frac{\left(\frac{3}{\sqrt{10}}, \frac{1}{\sqrt{10}}\right) \cdot (2,2)}{1} \right) q_1 $$
     $$ = \left( \frac{6 + 2}{\sqrt{10}} \right) q_1 $$
     $$ = \left( \frac{8}{\sqrt{10}} \right) q_1 $$
     $$ = \begin{bmatrix} \frac{24}{10} \\ \frac{8}{10} \end{bmatrix} = \begin{bmatrix} 2.4 \\ 0.8 \end{bmatrix} $$

   - Compute **$ u_2 $**:
     $$ u_2 = v_2 - \text{proj}_{q_1}(v_2) = \begin{bmatrix} 2 \\ 2 \end{bmatrix} - \begin{bmatrix} 2.4 \\ 0.8 \end{bmatrix} = \begin{bmatrix} -0.4 \\ 1.2 \end{bmatrix} $$

   - Normalize **$ u_2 $** to get **$ q_2 $**:
     $$ ||u_2|| = \sqrt{(-0.4)^2 + (1.2)^2} = \sqrt{0.16 + 1.44} = \sqrt{1.6} $$
     $$ q_2 = \frac{u_2}{||u_2||} = \frac{1}{\sqrt{1.6}} \begin{bmatrix} -0.4 \\ 1.2 \end{bmatrix} = \begin{bmatrix} -\frac{0.4}{\sqrt{1.6}} \\ \frac{1.2}{\sqrt{1.6}} \end{bmatrix} $$

Thus, the orthonormal basis is:

$$ q_1 = \begin{bmatrix} \frac{3}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} \end{bmatrix}, \quad q_2 = \begin{bmatrix} -\frac{0.4}{\sqrt{1.6}} \\ \frac{1.2}{\sqrt{1.6}} \end{bmatrix} $$

---

**Geometric Interpretation**  
The Gram-Schmidt process **transforms a set of non-orthogonal basis vectors into an orthonormal set** by iteratively removing components in directions already accounted for. Geometrically:
- The first vector keeps its direction and is only rescaled to unit length.
- The second vector is modified by removing its projection onto the first.
- The third vector is modified by removing its projections onto the first two, and so on.

After normalization, the resulting vectors are mutually perpendicular and have unit length.

---

**Machine Learning Applications**  
1. **QR Decomposition**: The Gram-Schmidt process is used to decompose a matrix into an **orthogonal matrix Q** and an **upper triangular matrix R**.
2. **Principal Component Analysis (PCA)**: PCA involves finding **orthonormal basis vectors** (principal components) that maximize variance.
3. **Orthogonal Projections**: The process is fundamental in least-squares regression, where projections are used to minimize residual errors.
4. **Stability in Numerical Computation**: Many algorithms in optimization and statistics require an orthonormal basis to avoid numerical instability.

---

**Summary**  
- **Gram-Schmidt orthogonalizes a set of vectors**, turning them into an **orthonormal basis**.
- The process removes components of one vector in the direction of others to maintain orthogonality.
- It is widely used in **QR decomposition, PCA, and numerical optimization**.



## Projection onto Affine Subspaces


**Introduction**  
An **affine subspace** in **$ \mathbb{R}^n $** is a translation of a linear subspace. Unlike a linear subspace, which passes through the origin, an affine subspace may be **shifted** away from the origin.

If we want to project a vector **$ x $** onto an affine subspace, we need to:
1. **Identify a point on the subspace** (say **$ p $**).
2. **Find the closest point to $ x $** that belongs to the affine subspace.

---

**Mathematical Definition**  
Let:
- **$ S $** be an affine subspace of **$ \mathbb{R}^n $**, defined as:
  $$ S = \{ p + y \mid y \in V \} $$
  where **$ p $** is a fixed point in the subspace and **$ V $** is a linear subspace.
- **$ x \in \mathbb{R}^n $** be a vector we want to project onto **$ S $**.

The **projection of $ x $ onto $ S $**, denoted as **$ \text{proj}_S(x) $**, is given by:

$$ \text{proj}_S(x) = p + \text{proj}_V(x - p) $$

where **$ \text{proj}_V(x - p) $** is the **orthogonal projection** of **$ x - p $** onto the subspace **$ V $**.

---

**Computing the Projection**  
To compute **$ \text{proj}_S(x) $**, follow these steps:

1. **Find an orthonormal basis for $ V $** (using Gram-Schmidt if needed).
2. **Compute the projection of $ x - p $** onto **$ V $** using the standard projection formula:
   $$ \text{proj}_V(x - p) = VV^T (x - p) $$
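   where the matrix **$ V $** (written with the same symbol as the subspace) has the **orthonormal** basis vectors from step 1 as its columns. If the columns are linearly independent but not orthonormal, use $ V (V^T V)^{-1} V^T $ in place of $ VV^T $.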
3. **Add back $ p $** to get the projection onto **$ S $**:
   $$ \text{proj}_S(x) = p + VV^T (x - p) $$
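
The steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `project_onto_affine` and the sample inputs are hypothetical, and a reduced QR factorization is swapped in for Gram-Schmidt to obtain the orthonormal basis.

```python
import numpy as np

def project_onto_affine(x, p, basis):
    """Project x onto the affine subspace p + span(columns of basis)."""
    B = np.asarray(basis, dtype=float)
    p = np.asarray(p, dtype=float)
    # Reduced QR: the columns of Q are orthonormal and span the same
    # subspace as the columns of B (assuming B has full column rank).
    Q, _ = np.linalg.qr(B)
    d = np.asarray(x, dtype=float) - p   # shift so the subspace passes through 0
    return p + Q @ (Q.T @ d)             # project, then shift back

# Illustrative inputs; the basis columns are deliberately not orthonormal.
p = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([4.0, 5.0, 6.0])

result = project_onto_affine(x, p, B)
print(result)                 # approximately [3, 4, 7]
print(B.T @ (x - result))     # approximately [0, 0]: residual is orthogonal to V
```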

---

**Example**  
Consider **$ \mathbb{R}^3 $** with an affine subspace **$ S $** defined by:

- **A point on the subspace**: $ p = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} $
- **A basis for the subspace's direction**:
  $$ V = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\} $$

We want to **project** the point:

$$ x = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} $$

onto **$ S $**.

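1. **Construct matrix $ V $** from basis vectors:
   $$ V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} $$

   The columns of $ V $ are linearly independent but **not orthonormal** (their dot product is $ 1 $ and each has norm $ \sqrt{2} $), so $ VV^T $ alone is not a projection matrix. We therefore use the general formula $ V (V^T V)^{-1} V^T $.

2. **Compute $ V^T V $ and its inverse**:
   $$ V^T V = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad (V^T V)^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} $$

3. **Compute $ \text{proj}_V(x - p) $**:
   $$ x - p = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} $$

   $$ V^T (x - p) = \begin{bmatrix} (1)(3) + (0)(3) + (1)(3) \\ (0)(3) + (1)(3) + (1)(3) \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \end{bmatrix} $$

   $$ (V^T V)^{-1} V^T (x - p) = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 6 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} $$

   $$ \text{proj}_V(x - p) = V \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix} $$

4. **Compute $ \text{proj}_S(x) $**:
   $$ \text{proj}_S(x) = p + \text{proj}_V(x - p) $$

   $$ = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix} $$

   $$ = \begin{bmatrix} 3 \\ 4 \\ 7 \end{bmatrix} $$

As a check, the residual $ x - \text{proj}_S(x) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} $ is orthogonal to both basis vectors.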

---

**Geometric Interpretation**  
- The vector **$ x $** is **projected** onto the affine subspace **$ S $**, meaning we find the **closest point** to **$ x $** in **$ S $**.
- The process involves:
  - **Shifting** the problem so that the subspace passes through the origin.
  - **Projecting onto the linear subspace**.
  - **Shifting back** to the affine subspace.

---

**Applications in Machine Learning**  
1. **Least Squares Regression**: The **solution to a least-squares problem is a projection** onto an affine subspace.
2. **Dimensionality Reduction**: PCA projects data onto a lower-dimensional **affine subspace** that captures the most variance.
3. **Optimization and Control**: Many optimization problems require projecting onto affine constraints.

---

**Summary**  
- **Affine subspaces are linear subspaces shifted away from the origin**.
- **Projection onto an affine subspace** is done by:
  1. **Shifting** to a linear subspace.
  2. **Projecting** onto that subspace.
  3. **Shifting back**.
- When the columns of the matrix $ V $ form an orthonormal basis of the direction subspace, the projection is computed using:
  $$ \text{proj}_S(x) = p + VV^T (x - p) $$
- This concept is essential in **regression, PCA, and optimization**.



# Rotations

## Rotation in 2D Space 

**Introduction**  
A rotation in **$\mathbb{R}^2$** is a transformation that rotates a vector **counterclockwise** by an angle **$\theta$** around the origin. Rotations are **linear transformations** and can be represented using a **rotation matrix**.

---

**Rotation Matrix in $\mathbb{R}^2$**  
The standard **rotation matrix** for a **counterclockwise** rotation by an angle **$\theta$** in **$\mathbb{R}^2$** is:

$$
R(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
$$

Given a vector **$ v = \begin{bmatrix} x \\ y \end{bmatrix} $**, its rotated version **$ v' $** is given by:

$$
v' = R(\theta) v
$$

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

which simplifies to:

$$
x' = x\cos\theta - y\sin\theta
$$

$$
y' = x\sin\theta + y\cos\theta
$$

---

**Example Calculation**  
Let’s rotate the point **$(2, 3)$** by **$45^\circ$** (**$\theta = \frac{\pi}{4}$**):

1. Compute **$\cos 45^\circ = \frac{\sqrt{2}}{2}$** and **$\sin 45^\circ = \frac{\sqrt{2}}{2}$**.
2. Apply the rotation matrix:

$$
R(45^\circ) =
\begin{bmatrix}
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}
\end{bmatrix}
$$

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix}
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}
\end{bmatrix}
\begin{bmatrix} 2 \\ 3 \end{bmatrix}
$$

$$
=
\begin{bmatrix}
(2)(\frac{\sqrt{2}}{2}) + (-3)(\frac{\sqrt{2}}{2}) \\
(2)(\frac{\sqrt{2}}{2}) + (3)(\frac{\sqrt{2}}{2})
\end{bmatrix}
$$

$$
=
\begin{bmatrix}
\frac{2\sqrt{2}}{2} - \frac{3\sqrt{2}}{2} \\
\frac{2\sqrt{2}}{2} + \frac{3\sqrt{2}}{2}
\end{bmatrix}
=
\begin{bmatrix}
-\frac{\sqrt{2}}{2} \\
\frac{5\sqrt{2}}{2}
\end{bmatrix}
$$

Thus, the rotated point is approximately **(-0.71, 3.54)**.
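
The calculation above can be reproduced with a short sketch, assuming NumPy; `rotation_2d` is a hypothetical helper name.

```python
import numpy as np

def rotation_2d(theta):
    """Counterclockwise rotation matrix for an angle theta given in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

point = np.array([2.0, 3.0])
rotated = rotation_2d(np.pi / 4) @ point
print(rotated)  # approximately [-0.71, 3.54]

# The length of the vector is unchanged by the rotation.
print(np.isclose(np.linalg.norm(rotated), np.linalg.norm(point)))
```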

---

**Geometric Interpretation**  
- Rotation **preserves the length** of vectors (i.e., it is an **orthogonal transformation**).
- Rotation **does not change the origin**.
- The axes remain **perpendicular** to each other.

---

**Machine Learning Applications**  
1. **Data Augmentation**: Rotations are commonly used in image processing to artificially expand datasets.
2. **Computer Vision**: Used in **object tracking** and **pose estimation**.
3. **Feature Engineering**: PCA (Principal Component Analysis) often involves rotation of axes to align data along principal components.

---

**Summary**  
- Rotation in **$\mathbb{R}^2$** is given by:
  
  $$
  R(\theta) =
  \begin{bmatrix}
  \cos\theta & -\sin\theta \\
  \sin\theta & \cos\theta
  \end{bmatrix}
  $$

- Rotating a vector **$ (x, y) $** by **$ \theta $** results in:

  $$
  x' = x\cos\theta - y\sin\theta
  $$

  $$
  y' = x\sin\theta + y\cos\theta
  $$

- **Applications** include **computer vision, image processing, and data transformations**.


## Rotation in 3D Space

**Introduction**  
Rotation in **$\mathbb{R}^3$** extends the concept of **2D rotation** by introducing rotations about the **x, y, and z axes**. Unlike **$\mathbb{R}^2$**, where there is only one plane of rotation, **3D space** allows for rotations about multiple axes.

A **rotation in 3D** is still a **linear transformation**, meaning it can be represented using **rotation matrices**.

---

**Rotation Matrices in $\mathbb{R}^3$**  
In **3D space**, we define three fundamental rotation matrices:
1. **Rotation about the x-axis** by an angle **$\theta$**:
   
   $$
   R_x(\theta) =
   \begin{bmatrix}
   1 & 0 & 0 \\
   0 & \cos\theta & -\sin\theta \\
   0 & \sin\theta & \cos\theta
   \end{bmatrix}
   $$

2. **Rotation about the y-axis** by an angle **$\theta$**:

   $$
   R_y(\theta) =
   \begin{bmatrix}
   \cos\theta & 0 & \sin\theta \\
   0 & 1 & 0 \\
   -\sin\theta & 0 & \cos\theta
   \end{bmatrix}
   $$

3. **Rotation about the z-axis** by an angle **$\theta$**:

   $$
   R_z(\theta) =
   \begin{bmatrix}
   \cos\theta & -\sin\theta & 0 \\
   \sin\theta & \cos\theta & 0 \\
   0 & 0 & 1
   \end{bmatrix}
   $$

Each of these matrices **rotates** a **point** or **vector** around a specific axis while keeping that axis fixed.

---

**Example: Rotating a Point Around the z-Axis**  
Suppose we want to rotate the point **$(1, 2, 3)$** by **$90^\circ$** (or **$\frac{\pi}{2}$ radians**) around the **z-axis**.

1. Compute **$\cos 90^\circ = 0$** and **$\sin 90^\circ = 1$**.
2. Use the **z-axis rotation matrix**:

   $$
   R_z(90^\circ) =
   \begin{bmatrix}
   0 & -1 & 0 \\
   1 & 0 & 0 \\
   0 & 0 & 1
   \end{bmatrix}
   $$

3. Apply the transformation:

   $$
   \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} =
   R_z(90^\circ) \cdot
   \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
   $$

   $$
   =
   \begin{bmatrix}
   0 & -1 & 0 \\
   1 & 0 & 0 \\
   0 & 0 & 1
   \end{bmatrix}
   \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
   $$

   $$=
   \begin{bmatrix}
   (0)(1) + (-1)(2) + (0)(3) \\
   (1)(1) + (0)(2) + (0)(3) \\
   (0)(1) + (0)(2) + (1)(3)
   \end{bmatrix}
   =
   \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}
   $$

Thus, the rotated point is **$(-2, 1, 3)$**.

---

**Geometric Interpretation**  
- Rotations in **3D space** preserve the **length** of vectors (**orthogonal transformation**).
- They **preserve angles** between vectors.
- They **do not change the origin**.
- Composing rotations about different axes in sequence **does not commute** in general.
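
A brief sketch, assuming NumPy, that reproduces the z-axis example and illustrates that rotations about different axes do not commute. The helper names `rot_x` and `rot_z` are introduced here for illustration only.

```python
import numpy as np

def rot_x(theta):
    """Rotation about the x-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s,  c]])

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

p = np.array([1.0, 2.0, 3.0])
rotated = rot_z(np.pi / 2) @ p
print(np.round(rotated, 10))   # approximately [-2, 1, 3]

# Order matters: rotating about z then x differs from x then z.
a = rot_x(np.pi / 2) @ rot_z(np.pi / 2)
b = rot_z(np.pi / 2) @ rot_x(np.pi / 2)
commute = np.allclose(a, b)
print(commute)                 # False
```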

---

**Rotation About an Arbitrary Axis**  
In **3D**, we can also **rotate about an arbitrary axis** defined by a unit vector **$\hat{u} = (u_x, u_y, u_z)$** using Rodrigues’ rotation formula:

   $$
   R(\theta) = I + (\sin\theta)K + (1 - \cos\theta)K^2
   $$

where **$K$** is the **skew-symmetric matrix** of **$\hat{u}$**:

   $$
   K =
   \begin{bmatrix}
   0 & -u_z & u_y \\
   u_z & 0 & -u_x \\
   -u_y & u_x & 0
   \end{bmatrix}
   $$

This allows for **free rotation around any vector** in space.
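
Rodrigues' formula translates directly into code. The sketch below assumes NumPy; the function name `rodrigues` is hypothetical, and the axis is normalized inside the function so that a non-unit axis can be passed in.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for angle theta (radians) about the given 3D axis."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)          # the formula requires a unit axis
    ux, uy, uz = u
    K = np.array([[0.0, -uz,  uy],
                  [uz,  0.0, -ux],
                  [-uy, ux,  0.0]])    # skew-symmetric matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Rotating about the z-axis by 90 degrees should match R_z(90 degrees).
R = rodrigues([0, 0, 1], np.pi / 2)
print(np.round(R @ np.array([1.0, 2.0, 3.0]), 10))  # approximately [-2, 1, 3]

# A rotation about a general axis leaves that axis fixed.
axis = np.array([1.0, 1.0, 1.0])
R_general = rodrigues(axis, 0.7)
print(np.allclose(R_general @ axis, axis))
```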

---

**Machine Learning Applications**  
1. **3D Data Augmentation**: Rotations are used in **3D computer vision** (e.g., **point clouds, LiDAR scans**).
2. **Robotics & Kinematics**: Used for **robot arm movement** and **drone orientation**.
3. **Graphics & Simulations**: Game engines use 3D rotation matrices for **character movement and animations**.

---

**Summary**  
- **Rotations in $\mathbb{R}^3$** are represented by **rotation matrices** about the **x, y, and z** axes.
- The **rotation matrix** depends on the axis and **angle $\theta$**.
- Rotations **preserve distances and angles**.
- **More complex rotations** can be handled using **Rodrigues' formula** or **quaternions**.



## Rotation in nD space

**Introduction**  
Rotation in **$\mathbb{R}^n$** generalizes the concept of **2D and 3D rotations** to **n-dimensional space**. A **rotation** is a **linear transformation** that preserves the **lengths of vectors** and the **angles** between them. 

In **$\mathbb{R}^n$**, a rotation can be defined using an **orthogonal matrix** $R$ such that:

$$
R^T R = I
$$

where $I$ is the **identity matrix**, ensuring that the transformation **preserves vector norms**.

---

**Rotation Matrices in $\mathbb{R}^n$**  
A general **rotation matrix** in **n-dimensional space** is an **orthogonal matrix** with **determinant $1$**:

$$
R \in \mathbb{R}^{n \times n}, \quad R^T R = I, \quad \det(R) = 1
$$

For any vector **$x \in \mathbb{R}^n$**, its rotated version is given by:

$$
x' = R x
$$

where **$R$** is an **$n \times n$ rotation matrix**.

---

**Rotation in Higher Dimensions**  
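In **$\mathbb{R}^n$**, the simplest rotations act within a single **plane**. An elementary rotation is defined by selecting two coordinate axes and rotating within that **2D subspace** while leaving the remaining dimensions unchanged. For $n \geq 4$, a general rotation may act in several mutually orthogonal planes at once, but it can always be written as a composition of such elementary rotations.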

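A **Givens rotation matrix** $G(i, j, \theta)$ is a simple rotation within a **2D coordinate plane inside $\mathbb{R}^n$**. It equals the identity matrix except for four entries:

$$
G_{ii} = \cos\theta, \quad G_{ij} = -\sin\theta, \quad G_{ji} = \sin\theta, \quad G_{jj} = \cos\theta
$$

In other words, the **2D rotation matrix** is applied to dimensions $i$ and $j$, while the other dimensions remain unchanged.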

---

**Example: Rotation in $\mathbb{R}^4$**  
Consider a **4D vector**:

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
$$

To rotate it in the **$x_1-x_2$ plane** by an angle **$\theta$**, we use:

$$
R_{12}(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Applying this to **$x$** results in a rotation **only within the $x_1-x_2$ plane**, leaving $x_3$ and $x_4$ unchanged.
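
A plane rotation of this kind can be sketched as follows, assuming NumPy. The function name `givens` and its zero-based indices `i`, `j` are conventions chosen for this illustration.

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n rotation by theta in the plane of coordinates i and j (zero-based)."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

x = np.array([1.0, 2.0, 3.0, 4.0])
G = givens(4, 0, 1, np.pi / 2)   # rotate in the x1-x2 plane by 90 degrees
y = G @ x
print(np.round(y, 10))           # approximately [-2, 1, 3, 4]
```

Only the first two coordinates change; the third and fourth pass through untouched.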

---

**General Rotation in $\mathbb{R}^n$**  
A general rotation matrix can be constructed from at most **$n(n-1)/2$ elementary 2D rotations**, each acting within a **plane spanned by two coordinate axes**.

A **higher-dimensional rotation** can be represented as a **composition** of these **elementary rotations**.

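For any two **unit vectors** $u, v \in \mathbb{R}^n$ with $u \neq -v$, a **rotation matrix** that rotates **$u$ onto $v$** within the plane they span is:

$$
R = I + K + \frac{1}{1 + u^T v} K^2, \qquad K = v u^T - u v^T
$$

This transformation aligns the vector **$u$** with **$v$** (that is, $R u = v$) and leaves every vector orthogonal to both $u$ and $v$ unchanged.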
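
One way to build and check such an aligning rotation numerically is sketched below. The construction assumed here is $R = I + K + K^2 / (1 + u^T v)$ with $K = v u^T - u v^T$, which generalizes Rodrigues' formula to $\mathbb{R}^n$; the function name `rotation_between` is hypothetical, and the inputs are normalized inside the function.

```python
import numpy as np

def rotation_between(u, v):
    """Rotation in the plane of u and v that maps the direction of u onto v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    c = u @ v                                  # cosine of the angle between u and v
    if np.isclose(c, -1.0):
        raise ValueError("u and v are antiparallel; the rotation plane is not unique")
    K = np.outer(v, u) - np.outer(u, v)        # skew-symmetric generator
    return np.eye(len(u)) + K + (K @ K) / (1.0 + c)

u = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 1.0, 1.0])
R = rotation_between(u, v)

v_hat = v / np.linalg.norm(v)
print(np.allclose(R @ u, v_hat))               # u is mapped onto the direction of v
print(np.allclose(R.T @ R, np.eye(4)))         # R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))       # and has determinant 1

# A vector orthogonal to both u and v is left unchanged.
w = np.array([0.0, 1.0, -1.0, 0.0])
print(np.allclose(R @ w, w))
```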

---

**Applications in Machine Learning**  
1. **Data Augmentation**: Rotations in **higher dimensions** are used for **image transformations, 3D object recognition**, and **augmented data generation**.
2. **Principal Component Analysis (PCA)**: PCA uses **rotation matrices** to **align** the data along its **principal axes**.
3. **Neural Networks & Optimization**: Weight transformations often involve **rotations in high-dimensional space**.
4. **Quantum Computing**: Rotation matrices play a role in **unitary transformations** in quantum mechanics.

---

**Summary**  
- **Rotations in $\mathbb{R}^n$** are **orthogonal transformations** preserving **distances and angles**.
- An **elementary rotation** acts within a **2D subspace** of **$\mathbb{R}^n$**; general rotations are compositions of elementary rotations.
- **Givens rotations** allow for **efficient** computation of rotations in high-dimensional spaces.
- Rotations are fundamental in **machine learning**, **dimensionality reduction**, and **computer graphics**.



## Properties of Rotations


**1. Definition**  
A **rotation** in **$\mathbb{R}^n$** is a **linear transformation** that **preserves distances** and **angles** between vectors. The transformation is represented by an **orthogonal matrix** $R$ such that:

$$
R^T R = I
$$

where **$I$** is the **identity matrix**. Additionally, for a proper rotation:

$$
\det(R) = 1
$$

---

**2. Properties of Rotation Matrices**    
*(a) Rotations Preserve Length (Norm Preservation)*  
For any vector $x \in \mathbb{R}^n$, its length (Euclidean norm) remains unchanged after rotation:

$$
\| R x \| = \| x \|
$$

**Proof:**
Using the definition of rotation matrices:

$$
\| R x \|^2 = (R x)^T (R x) = x^T R^T R x = x^T I x = \| x \|^2
$$

Thus, rotations do not change vector magnitudes.

---

*(b) Rotations Preserve Angles*  
For any two vectors $x, y \in \mathbb{R}^n$, the **angle between them** remains unchanged after applying a rotation matrix:

$$
\cos\theta = \frac{x^T y}{\|x\| \|y\|}
$$

Since rotation preserves the dot product:

$$
(Rx)^T (Ry) = x^T R^T R y = x^T y
$$

it follows that the angle $\theta$ between vectors is unchanged.

---

*(c) Composition of Rotations is a Rotation*  
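If **$R_1$** and **$R_2$** are two rotation matrices, their product **$R_3 = R_1 R_2$** is also a rotation matrix. The product is orthogonal:

$$
(R_1 R_2)^T (R_1 R_2) = R_2^T R_1^T R_1 R_2 = R_2^T I R_2 = R_2^T R_2 = I
$$

and its determinant is $\det(R_1 R_2) = \det(R_1)\det(R_2) = 1$.

Together with the identity matrix and the inverses $R^{-1} = R^T$, this means that the set of all rotation matrices forms a **group** under matrix multiplication.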

---

*(d) Inverse of a Rotation is Its Transpose*  
For any rotation matrix $R$, its **inverse** is equal to its **transpose**:

$$
R^{-1} = R^T
$$

**Proof:**
From the definition:

$$
R^T R = I
$$

Since $\det(R) = 1 \neq 0$, **$R$** is invertible. Multiplying both sides on the right by **$R^{-1}$**, we get:

$$
R^{-1} = R^T
$$

Thus, rotating by **$R$** and then by **$R^T$** returns the vector to its original position.

---

*(e) Determinant of a Rotation Matrix*  
A **proper rotation** matrix has a determinant of **1**:

$$
\det(R) = 1
$$

This ensures that the transformation **preserves orientation** (i.e., no reflection occurs).

---

*(f) Rotations Do Not Change Volume*  
For any **region** in $\mathbb{R}^n$, the volume remains unchanged after rotation:

$$
\text{Volume}(R S) = \text{Volume}(S)
$$

where $S$ is a geometric object, such as a parallelepiped or hypersphere.

**Proof:**
A linear map scales volumes by the factor $|\det(R)|$. Since **rotation matrices have determinant 1**, they preserve **volume**.
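
Properties (a) through (f) can be spot-checked numerically. The sketch below assumes NumPy; `random_rotation` is a hypothetical helper that builds a rotation from the QR factorization of a random matrix, and the fixed seed is only for reproducibility.

```python
import numpy as np

def random_rotation(n, rng):
    """Random n x n orthogonal matrix with determinant +1."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]   # flip one column to turn a reflection into a rotation
    return Q

rng = np.random.default_rng(0)
R1 = random_rotation(4, rng)
R2 = random_rotation(4, rng)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

checks = {
    "norm preserved": np.isclose(np.linalg.norm(R1 @ x), np.linalg.norm(x)),
    "dot product preserved": np.isclose((R1 @ x) @ (R1 @ y), x @ y),
    "product is orthogonal": np.allclose((R1 @ R2).T @ (R1 @ R2), np.eye(4)),
    "product has det 1": np.isclose(np.linalg.det(R1 @ R2), 1.0),
    "inverse equals transpose": np.allclose(np.linalg.inv(R1), R1.T),
    "determinant is 1": np.isclose(np.linalg.det(R1), 1.0),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

A numerical check like this is not a proof, but it is a quick way to catch a sign error or a matrix that is orthogonal without being a proper rotation.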

---

**3. Special Properties in $\mathbb{R}^2$ and $\mathbb{R}^3$**  
*(a) 2D Rotation Matrix*  
A **rotation matrix** in **$\mathbb{R}^2$** for an angle **$\theta$** is:

$$
R(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
$$

*(b) 3D Rotation Matrices*  
In **$\mathbb{R}^3$**, rotation matrices are defined about coordinate axes:

- **Rotation about the $x$-axis:**
  $$
  R_x(\theta) =
  \begin{bmatrix}
  1 & 0 & 0 \\
  0 & \cos\theta & -\sin\theta \\
  0 & \sin\theta & \cos\theta
  \end{bmatrix}
  $$

- **Rotation about the $y$-axis:**
  $$
  R_y(\theta) =
  \begin{bmatrix}
  \cos\theta & 0 & \sin\theta \\
  0 & 1 & 0 \\
  -\sin\theta & 0 & \cos\theta
  \end{bmatrix}
  $$

- **Rotation about the $z$-axis:**
  $$
  R_z(\theta) =
  \begin{bmatrix}
  \cos\theta & -\sin\theta & 0 \\
  \sin\theta & \cos\theta & 0 \\
  0 & 0 & 1
  \end{bmatrix}
  $$

---

**4. Applications in Machine Learning & Data Science**  
**(a) Principal Component Analysis (PCA)**  
- PCA uses **rotation matrices** to **align** data with its **principal components**.
- This allows for **dimensionality reduction** and better visualization.

**(b) Data Augmentation in Computer Vision**  
- **Rotation transformations** are used to **augment datasets** by generating rotated versions of images.
- Helps train **neural networks** to be more robust to rotated inputs.

**(c) Robotics & Computer Graphics**  
- **3D object transformations** use rotation matrices for **camera movements** and **robot arm movements**.

**(d) Optimization in High-Dimensional Spaces**  
- In **gradient-based optimizations**, rotation matrices are used to **reorient** coordinate systems.

---

**5. Summary**  
- Rotations **preserve length and angles**.
- They are represented by **orthogonal matrices** with **determinant 1**.
- The **inverse** of a rotation is **its transpose**.
- They **preserve volume** and do not introduce distortions.
- **Essential** in **machine learning, data science, and engineering**.

