In [None]:
'''
 * Copyright (c) 2016 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

# Singular Value Decomposition (SVD) and Matrix Approximation

## Introduction

The Singular Value Decomposition (SVD) is a fundamental matrix factorization technique with wide applications in machine learning, from least-squares problems to dimensionality reduction and data compression.

## 1. Standard SVD Formulation

For any matrix $A \in \mathbb{R}^{m \times n}$, the SVD decomposes it as:

$$A = U\Sigma V^T$$

where:
- $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix (left singular vectors)
- $\Sigma \in \mathbb{R}^{m \times n}$ contains singular values on the main diagonal
- $V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix (right singular vectors)

## 2. Reduced SVD (Compact SVD)

![image.png](attachment:image.png)
Fig.11 Image processing with the SVD. (a) The original grayscale image is a 1, 432 × 1, 910 matrix of values between 0 (black) and 1 (white). (b)–(f) Rank-1 matrices A1 , . . . , A5 and their corresponding singular values σ1 , . . . , σ5 . The grid-like structure of each rank-1 matrix is imposed by the outer-product of the left and right-singular vectors.

Sometimes called the **reduced SVD** (Datta, 2010) or simply **the SVD** (Press et al., 2007), this alternative formulation provides computational convenience:

For a rank-$r$ matrix $A$:
$$A = U\Sigma V^T$$

where:
- $U \in \mathbb{R}^{m \times r}$ (reduced left singular vectors)
- $\Sigma \in \mathbb{R}^{r \times r}$ (diagonal matrix with nonzero singular values)
- $V \in \mathbb{R}^{r \times n}$ (reduced right singular vectors)

**Key advantage**: $\Sigma$ is diagonal (like in eigenvalue decomposition), containing only nonzero entries.

## 3. Handling Different Matrix Dimensions

The SVD applies to $m \times n$ matrices regardless of whether $m > n$ or $m < n$:

- When $m < n$: The decomposition yields $\Sigma$ with more zero columns than rows
- Consequently, singular values $\sigma_{m+1}, \ldots, \sigma_n = 0$

## 4. Matrix Approximation via SVD

### 4.1 Rank-1 Matrix Construction

Instead of full SVD factorization, we can represent matrix $A$ as a sum of simpler low-rank matrices:

$$A_i := u_i v_i^T \quad \text{(rank-1 matrix)}$$

where $A_i \in \mathbb{R}^{m \times n}$ is formed by the outer product of the $i$-th orthogonal column vectors from $U$ and $V$.

### 4.2 Complete Representation

The full matrix can be expressed as:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T = \sum_{i=1}^{r} \sigma_i A_i$$

where $\sigma_i$ are the singular values and $r$ is the rank of $A$.

### 4.3 Truncated SVD for Approximation

For matrix approximation, we use only the first $k$ terms (where $k < r$):

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$

This **truncated SVD** provides the best rank-$k$ approximation to $A$ in the Frobenius norm sense.

## 5. Applications in Machine Learning

The SVD's matrix approximation capabilities enable numerous applications:

### 5.1 Dimensionality Reduction
- Principal Component Analysis (PCA)
- Feature extraction and visualization

### 5.2 Data Compression
- Image compression (as shown with Stonehenge example)
- Lossy compression with controlled quality

### 5.3 Topic Modeling
- Latent Semantic Analysis (LSA)
- Document-term matrix factorization

### 5.4 Clustering and Pattern Recognition
- Spectral clustering
- Noise reduction

### 5.5 Numerical Stability
- Solving systems of linear equations
- Least-squares curve fitting
- Robust to numerical rounding errors

## 6. Computational Advantages

**Matrix approximation benefits**:
1. **Computational efficiency**: Working with lower-rank approximations
2. **Storage reduction**: Fewer parameters to store
3. **Noise reduction**: Truncation removes small singular values (often noise)
4. **Numerical robustness**: SVD substitution improves numerical stability

## 7. Example: Image Approximation

Consider an image represented as matrix $A \in \mathbb{R}^{1432 \times 1910}$ (like the Stonehenge example):

```python
# Conceptual code structure
A_approx = sum(sigma[i] * outer_product(U[:, i], V[i, :]) for i in range(k))
```

where $k$ determines the approximation quality vs. compression trade-off.

## Mathematical Properties

**Key SVD properties leveraged in approximation**:

1. **Optimality**: Truncated SVD gives the best low-rank approximation
2. **Energy compaction**: Large singular values capture most information
3. **Orthogonality**: $U$ and $V$ matrices preserve geometric properties
4. **Rank revelation**: Singular values reveal the effective dimensionality

The SVD's principled approach to matrix approximation makes it invaluable for creating "simpler" matrix representations while preserving essential structural information.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, Optional

class SVDDecomposer:
    """
    Implementation of Singular Value Decomposition and Matrix Approximation techniques
    """
    
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_shape = None
    
    def decompose(self, A: np.ndarray, full_matrices: bool = True) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Perform SVD decomposition: A = U @ Σ @ V^T
        
        Args:
            A: Input matrix (m x n)
            full_matrices: If True, compute full SVD; if False, compute reduced SVD
            
        Returns:
            U: Left singular vectors
            sigma: Singular values (1D array)
            Vt: Right singular vectors (transposed)
        """
        self.original_shape = A.shape
        
        # Compute SVD using NumPy's implementation
        self.U, self.sigma, self.Vt = np.linalg.svd(A, full_matrices=full_matrices)
        
        return self.U, self.sigma, self.Vt
    
    def reduced_svd(self, A: np.ndarray, rank: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """
        Compute reduced/compact SVD with specified rank
        
        Args:
            A: Input matrix
            rank: Desired rank (if None, use effective rank)
            
        Returns:
            U_reduced: Reduced left singular vectors
            sigma_reduced: Reduced singular values
            Vt_reduced: Reduced right singular vectors
        """
        U, sigma, Vt = self.decompose(A, full_matrices=False)
        
        if rank is None:
            # Use effective rank (remove near-zero singular values)
            tol = max(A.shape) * np.finfo(A.dtype).eps * sigma[0]
            rank = np.sum(sigma > tol)
        
        rank = min(rank, len(sigma))
        
        return U[:, :rank], sigma[:rank], Vt[:rank, :]
    
    def truncated_svd(self, A: np.ndarray, k: int) -> np.ndarray:
        """
        Compute truncated SVD approximation using first k components
        
        Args:
            A: Input matrix
            k: Number of components to keep
            
        Returns:
            A_k: Rank-k approximation of A
        """
        U, sigma, Vt = self.decompose(A, full_matrices=False)
        
        k = min(k, len(sigma))
        
        # Reconstruct using first k components
        A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
        
        return A_k
    
    def rank_one_matrices(self, A: np.ndarray, num_components: Optional[int] = None) -> list:
        """
        Decompose matrix into rank-1 components: A_i = σ_i * u_i * v_i^T
        
        Args:
            A: Input matrix
            num_components: Number of rank-1 matrices to return
            
        Returns:
            List of rank-1 matrices
        """
        U, sigma, Vt = self.decompose(A, full_matrices=False)
        
        if num_components is None:
            num_components = len(sigma)
        
        rank_one_mats = []
        
        for i in range(min(num_components, len(sigma))):
            # A_i = σ_i * u_i * v_i^T
            A_i = sigma[i] * np.outer(U[:, i], Vt[i, :])
            rank_one_mats.append(A_i)
        
        return rank_one_mats
    
    def progressive_approximation(self, A: np.ndarray, max_rank: int) -> list:
        """
        Generate progressive approximations A_1, A_2, ..., A_k
        
        Args:
            A: Input matrix
            max_rank: Maximum rank for approximation
            
        Returns:
            List of progressive approximations
        """
        U, sigma, Vt = self.decompose(A, full_matrices=False)
        
        approximations = []
        max_rank = min(max_rank, len(sigma))
        
        for k in range(1, max_rank + 1):
            A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
            approximations.append(A_k)
        
        return approximations
    
    def approximation_error(self, A: np.ndarray, A_approx: np.ndarray, norm_type: str = 'fro') -> float:
        """
        Compute approximation error between original and approximated matrix
        
        Args:
            A: Original matrix
            A_approx: Approximated matrix
            norm_type: Type of norm ('fro' for Frobenius, '2' for spectral)
            
        Returns:
            Approximation error
        """
        if norm_type == 'fro':
            return np.linalg.norm(A - A_approx, 'fro')
        elif norm_type == '2':
            return np.linalg.norm(A - A_approx, 2)
        else:
            raise ValueError("norm_type must be 'fro' or '2'")
    
    def compression_ratio(self, A: np.ndarray, k: int) -> float:
        """
        Calculate compression ratio for rank-k approximation
        
        Args:
            A: Original matrix
            k: Approximation rank
            
        Returns:
            Compression ratio
        """
        m, n = A.shape
        original_size = m * n
        compressed_size = k * (m + n + 1)  # U(:,1:k) + sigma(1:k) + Vt(1:k,:)
        
        return compressed_size / original_size


class ImageSVDDemo:
    """
    Demonstrate SVD matrix approximation on images
    """
    
    def __init__(self):
        self.svd = SVDDecomposer()
    
    def create_synthetic_image(self, size: Tuple[int, int] = (100, 100)) -> np.ndarray:
        """
        Create a synthetic image for demonstration
        """
        m, n = size
        x = np.linspace(-2, 2, n)
        y = np.linspace(-2, 2, m)
        X, Y = np.meshgrid(x, y)
        
        # Create interesting pattern
        image = np.exp(-(X**2 + Y**2)) + 0.5 * np.sin(5*X) * np.cos(5*Y)
        
        # Normalize to [0, 1]
        image = (image - image.min()) / (image.max() - image.min())
        
        return image
    
    def demonstrate_approximation(self, image: np.ndarray, ranks: list):
        """
        Demonstrate progressive SVD approximation on an image
        
        Args:
            image: Input image matrix
            ranks: List of ranks to test
        """
        fig, axes = plt.subplots(2, len(ranks) + 1, figsize=(15, 8))
        
        # Original image
        axes[0, 0].imshow(image, cmap='gray')
        axes[0, 0].set_title('Original')
        axes[0, 0].axis('off')
        
        # Singular values plot
        U, sigma, Vt = self.svd.decompose(image)
        axes[1, 0].semilogy(sigma, 'b-o', markersize=3)
        axes[1, 0].set_title('Singular Values')
        axes[1, 0].set_xlabel('Index')
        axes[1, 0].set_ylabel('Value (log scale)')
        axes[1, 0].grid(True)
        
        errors = []
        ratios = []
        
        for i, k in enumerate(ranks):
            # Compute approximation
            A_k = self.svd.truncated_svd(image, k)
            
            # Display approximation
            axes[0, i+1].imshow(A_k, cmap='gray')
            axes[0, i+1].set_title(f'Rank {k}')
            axes[0, i+1].axis('off')
            
            # Compute and display error
            error = self.svd.approximation_error(image, A_k)
            ratio = self.svd.compression_ratio(image, k)
            
            errors.append(error)
            ratios.append(ratio)
            
            axes[1, i+1].bar(['Error', 'Compression'], [error, ratio])
            axes[1, i+1].set_title(f'Rank {k}\nError: {error:.3f}\nRatio: {ratio:.3f}')
        
        plt.tight_layout()
        plt.show()
        
        return errors, ratios


def demonstrate_svd_concepts():
    """
    Comprehensive demonstration of SVD concepts
    """
    print("=== SVD and Matrix Approximation Demonstration ===\n")
    
    # Create SVD decomposer
    svd = SVDDecomposer()
    
    # 1. Basic SVD on a simple matrix
    print("1. Basic SVD Decomposition")
    print("-" * 30)
    
    A = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]], dtype=float)
    
    print(f"Original matrix A (shape {A.shape}):")
    print(A)
    
    U, sigma, Vt = svd.decompose(A)
    
    print(f"\nU shape: {U.shape}")
    print(f"Sigma shape: {sigma.shape}")
    print(f"Vt shape: {Vt.shape}")
    print(f"Singular values: {sigma}")
    
    # Verify reconstruction
    if U.shape[1] == len(sigma):
        A_reconstructed = U @ np.diag(sigma) @ Vt
    else:
        # Handle reduced SVD case
        A_reconstructed = U @ np.diag(sigma) @ Vt[:len(sigma), :]
    
    reconstruction_error = np.linalg.norm(A - A_reconstructed)
    print(f"Reconstruction error: {reconstruction_error:.2e}")
    
    # 2. Reduced SVD
    print(f"\n2. Reduced SVD")
    print("-" * 30)
    
    U_red, sigma_red, Vt_red = svd.reduced_svd(A, rank=2)
    print(f"Reduced U shape: {U_red.shape}")
    print(f"Reduced sigma shape: {sigma_red.shape}")
    print(f"Reduced Vt shape: {Vt_red.shape}")
    
    # 3. Rank-1 decomposition
    print(f"\n3. Rank-1 Matrix Decomposition")
    print("-" * 30)
    
    rank_one_mats = svd.rank_one_matrices(A, num_components=3)
    
    for i, A_i in enumerate(rank_one_mats):
        print(f"Rank-1 matrix {i+1} (σ_{i+1} = {sigma[i]:.3f}):")
        print(A_i)
        print(f"Rank: {np.linalg.matrix_rank(A_i)}")
        print()
    
    # 4. Progressive approximation
    print("4. Progressive Approximation Analysis")
    print("-" * 30)
    
    errors = []
    ratios = []
    
    for k in range(1, min(4, len(sigma)+1)):
        A_k = svd.truncated_svd(A, k)
        error = svd.approximation_error(A, A_k)
        ratio = svd.compression_ratio(A, k)
        
        errors.append(error)
        ratios.append(ratio)
        
        print(f"Rank-{k} approximation:")
        print(f"  Frobenius error: {error:.6f}")
        print(f"  Compression ratio: {ratio:.3f}")
        print(f"  Retained energy: {np.sum(sigma[:k]**2) / np.sum(sigma**2):.3f}")
    
    # 5. Image approximation demo
    print(f"\n5. Image Approximation Demo")
    print("-" * 30)
    
    demo = ImageSVDDemo()
    
    # Create synthetic image
    synthetic_image = demo.create_synthetic_image((50, 50))
    
    # Analyze approximation quality
    test_ranks = [1, 5, 10, 20]
    
    print("Analyzing approximation quality for different ranks:")
    
    for k in test_ranks:
        A_k = svd.truncated_svd(synthetic_image, k)
        error = svd.approximation_error(synthetic_image, A_k)
        ratio = svd.compression_ratio(synthetic_image, k)
        
        print(f"Rank {k:2d}: Error = {error:.6f}, Compression = {ratio:.3f}")
    
    # Demonstrate the visualization (commented out to avoid display issues in some environments)
    # demo.demonstrate_approximation(synthetic_image, test_ranks)
    
    print(f"\n6. Numerical Properties")
    print("-" * 30)
    
    # Condition number analysis
    cond_original = np.linalg.cond(A)
    
    # Create a well-conditioned approximation
    k_stable = 2  # Use first 2 components
    A_stable = svd.truncated_svd(A, k_stable)
    cond_stable = np.linalg.cond(A_stable)
    
    print(f"Condition number of original matrix: {cond_original:.2e}")
    print(f"Condition number of rank-{k_stable} approximation: {cond_stable:.2e}")
    print(f"Numerical stability improvement: {cond_original/cond_stable:.2f}x")


def advanced_svd_applications():
    """
    Demonstrate advanced SVD applications
    """
    print("\n=== Advanced SVD Applications ===\n")
    
    svd = SVDDecomposer()
    
    # 1. Principal Component Analysis (PCA) simulation
    print("1. PCA-like Dimensionality Reduction")
    print("-" * 35)
    
    # Generate correlated data
    np.random.seed(42)
    n_samples, n_features = 100, 5
    
    # Create data with some correlation structure
    true_components = np.array([[1, 1, 0, 0, 0],
                                [0, 0, 1, 1, 1]]).T
    
    data = np.random.randn(n_samples, 2) @ true_components.T + 0.1 * np.random.randn(n_samples, n_features)
    
    print(f"Original data shape: {data.shape}")
    
    # Center the data (important for PCA)
    data_centered = data - np.mean(data, axis=0)
    
    # Apply SVD
    U, sigma, Vt = svd.decompose(data_centered.T)  # Note: transpose for feature space
    
    print(f"Explained variance ratios: {(sigma**2 / np.sum(sigma**2))[:3]}")
    
    # 2. Least squares solution using SVD
    print(f"\n2. Robust Least Squares via SVD")
    print("-" * 30)
    
    # Create overdetermined system Ax = b
    m, n = 10, 5
    A_ls = np.random.randn(m, n)
    x_true = np.random.randn(n)
    b = A_ls @ x_true + 0.01 * np.random.randn(m)  # Add small noise
    
    # Solve using SVD (more numerically stable than normal equations)
    U, sigma, Vt = svd.decompose(A_ls)
    
    # Compute pseudoinverse using SVD
    tol = max(A_ls.shape) * np.finfo(float).eps * sigma[0]
    rank = np.sum(sigma > tol)
    
    # x = V @ Σ^(-1) @ U^T @ b (for the non-zero singular values)
    x_svd = Vt[:rank, :].T @ np.diag(1/sigma[:rank]) @ U[:, :rank].T @ b
    
    # Compare with numpy's least squares
    x_lstsq = np.linalg.lstsq(A_ls, b, rcond=None)[0]
    
    print(f"True solution norm: {np.linalg.norm(x_true):.6f}")
    print(f"SVD solution error: {np.linalg.norm(x_svd - x_true):.6f}")
    print(f"Lstsq solution error: {np.linalg.norm(x_lstsq - x_true):.6f}")
    print(f"Solutions match: {np.allclose(x_svd, x_lstsq)}")
    
    # 3. Matrix completion simulation
    print(f"\n3. Low-Rank Matrix Recovery")
    print("-" * 28)
    
    # Create low-rank matrix
    rank_true = 3
    m, n = 20, 15
    L = np.random.randn(m, rank_true)
    R = np.random.randn(rank_true, n)
    M_true = L @ R
    
    # Add noise
    M_noisy = M_true + 0.1 * np.random.randn(m, n)
    
    print(f"True rank: {rank_true}")
    print(f"Noisy matrix rank: {np.linalg.matrix_rank(M_noisy)}")
    
    # Recover using truncated SVD
    M_recovered = svd.truncated_svd(M_noisy, rank_true)
    
    recovery_error = svd.approximation_error(M_true, M_recovered)
    noise_level = svd.approximation_error(M_true, M_noisy)
    
    print(f"Noise level: {noise_level:.6f}")
    print(f"Recovery error: {recovery_error:.6f}")
    print(f"Recovery improvement: {noise_level/recovery_error:.2f}x")


if __name__ == "__main__":
    # Run comprehensive demonstrations
    demonstrate_svd_concepts()
    advanced_svd_applications()
    
    print("\n=== Summary of SVD Benefits ===")
    print("• Optimal low-rank approximation (Eckart-Young theorem)")
    print("• Numerical stability for ill-conditioned problems") 
    print("• Principal component analysis and dimensionality reduction")
    print("• Data compression with controlled quality loss")
    print("• Robust solutions to least-squares problems")
    print("• Matrix completion and denoising applications")
    print("• Foundation for many machine learning algorithms")

![image.png](attachment:image.png)

Fig.12 Image reconstruction with the SVD. (a) Original image. (b)–(f) Image reconstruction using the low-rank approximation of the SVD, where the rank-k approximation is b given by A(k) = Pk i=1 σi Ai .

# Matrix Approximation and the Eckart-Young Theorem

## Introduction

This notebook implements the matrix approximation techniques described in Section 4.6, focusing on the Eckart-Young theorem which establishes the optimality of SVD-based low-rank approximations.

## Mathematical Foundation

### 1. Spectral Norm Definition

**Definition 4.23 (Spectral Norm of a Matrix)**

For $x \in \mathbb{R}^n \setminus \{0\}$, the spectral norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as:

$$\|A\|_2 := \max_{x} \frac{\|Ax\|_2}{\|x\|_2} \quad \text{(Equation 4.93)}$$

The spectral norm determines how long any vector $x$ can at most become when multiplied by $A$.

### 2. Spectral Norm and Singular Values

**Theorem 4.24**: The spectral norm of $A$ is its largest singular value $\sigma_1$.

$$\|A\|_2 = \sigma_1$$

### 3. Eckart-Young Theorem

**Theorem 4.25 (Eckart-Young Theorem, 1936)**

Consider a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$ and let $B \in \mathbb{R}^{m \times n}$ be a matrix of rank $k$. For any $k \leq r$ with $\hat{A}^{(k)} = \sum_{i=1}^{k} \sigma_i u_i v_i^T$, it holds that:

$$\hat{A}^{(k)} = \arg\min_{\text{rank}(B)=k} \|A - B\|_2 \quad \text{(Equation 4.94)}$$

$$\|A - \hat{A}^{(k)}\|_2 = \sigma_{k+1} \quad \text{(Equation 4.95)}$$

The theorem states that SVD provides the **optimal** rank-$k$ approximation in the spectral norm sense.

## Implementation

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import svd as scipy_svd
import warnings
warnings.filterwarnings('ignore')

class MatrixApproximationAnalyzer:
    """
    Implementation of matrix approximation techniques with focus on
    the Eckart-Young theorem and spectral norm analysis
    """
    
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None
        
    def compute_svd(self, A):
        """
        Compute SVD decomposition: A = U @ Σ @ V^T
        """
        self.original_matrix = A.copy()
        self.U, self.sigma, self.Vt = np.linalg.svd(A, full_matrices=False)
        return self.U, self.sigma, self.Vt
    
    def spectral_norm(self, A):
        """
        Compute spectral norm of matrix A
        
        Definition 4.23: ||A||₂ = max_x ||Ax||₂/||x||₂
        Theorem 4.24: ||A||₂ = σ₁ (largest singular value)
        """
        # Method 1: Using definition (computationally expensive)
        # We'll use the theorem instead for efficiency
        
        # Method 2: Using Theorem 4.24
        _, sigma, _ = np.linalg.svd(A, full_matrices=False)
        spectral_norm_value = sigma[0] if len(sigma) > 0 else 0
        
        return spectral_norm_value
    
    def verify_spectral_norm_theorem(self, A, num_random_vectors=1000):
        """
        Verify Theorem 4.24: ||A||₂ = σ₁ by testing with random vectors
        """
        # Compute using theorem
        theoretical_norm = self.spectral_norm(A)
        
        # Compute using definition with random vectors
        m, n = A.shape
        max_ratio = 0
        
        for _ in range(num_random_vectors):
            x = np.random.randn(n)
            x = x / np.linalg.norm(x)  # Normalize
            
            Ax = A @ x
            ratio = np.linalg.norm(Ax) / np.linalg.norm(x)
            max_ratio = max(max_ratio, ratio)
        
        return theoretical_norm, max_ratio
    
    def rank_k_approximation(self, A, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ
        """
        if self.U is None:
            self.compute_svd(A)
        
        k = min(k, len(self.sigma))
        
        # Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ
        A_k = np.zeros_like(A)
        for i in range(k):
            A_k += self.sigma[i] * np.outer(self.U[:, i], self.Vt[i, :])
        
        return A_k
    
    def eckart_young_error(self, A, k):
        """
        Compute the error according to Eckart-Young theorem
        
        ||A - Â^(k)||₂ = σₖ₊₁
        """
        if self.U is None:
            self.compute_svd(A)
        
        if k >= len(self.sigma):
            return 0.0  # Perfect reconstruction
        
        return self.sigma[k]  # σₖ₊₁ (k+1-th singular value, 0-indexed)
    
    def verify_eckart_young_theorem(self, A, k):
        """
        Verify the Eckart-Young theorem by computing actual error
        and comparing with theoretical prediction
        """
        # Compute rank-k approximation
        A_k = self.rank_k_approximation(A, k)
        
        # Actual error
        actual_error = self.spectral_norm(A - A_k)
        
        # Theoretical error (Eckart-Young)
        theoretical_error = self.eckart_young_error(A, k)
        
        return actual_error, theoretical_error
    
    def demonstrate_optimality(self, A, k, num_random_trials=50):
        """
        Demonstrate that SVD gives optimal rank-k approximation
        by comparing with random rank-k matrices
        """
        # SVD approximation
        A_k_svd = self.rank_k_approximation(A, k)
        svd_error = self.spectral_norm(A - A_k_svd)
        
        # Random rank-k approximations
        m, n = A.shape
        random_errors = []
        
        for _ in range(num_random_trials):
            # Create random rank-k matrix
            U_rand = np.random.randn(m, k)
            V_rand = np.random.randn(k, n)
            A_k_rand = U_rand @ V_rand
            
            # Normalize to have similar scale
            A_k_rand = A_k_rand * (np.linalg.norm(A) / np.linalg.norm(A_k_rand))
            
            error = self.spectral_norm(A - A_k_rand)
            random_errors.append(error)
        
        return svd_error, random_errors
    
    def progressive_approximation_analysis(self, A, max_rank=None):
        """
        Analyze how approximation error decreases with increasing rank
        """
        if self.U is None:
            self.compute_svd(A)
        
        if max_rank is None:
            max_rank = min(A.shape)
        
        max_rank = min(max_rank, len(self.sigma))
        
        ranks = list(range(1, max_rank + 1))
        actual_errors = []
        theoretical_errors = []
        
        for k in ranks:
            actual_error, theoretical_error = self.verify_eckart_young_theorem(A, k)
            actual_errors.append(actual_error)
            theoretical_errors.append(theoretical_error)
        
        return ranks, actual_errors, theoretical_errors

def create_test_matrices():
    """
    Create various test matrices for demonstration
    """
    matrices = {}
    
    # 1. Low-rank matrix (rank 3)
    np.random.seed(42)
    U1 = np.random.randn(8, 3)
    V1 = np.random.randn(3, 6)
    matrices['low_rank'] = U1 @ V1
    
    # 2. Image-like matrix (structured)
    x = np.linspace(-2, 2, 50)
    y = np.linspace(-2, 2, 40)
    X, Y = np.meshgrid(x, y)
    matrices['image_like'] = np.exp(-(X**2 + Y**2)) + 0.3 * np.sin(3*X) * np.cos(3*Y)
    
    # 3. Random matrix
    matrices['random'] = np.random.randn(20, 15)
    
    # 4. Ill-conditioned matrix
    U2 = np.random.randn(10, 10)
    sigma_ill = np.logspace(2, -8, 10)  # Wide range of singular values
    V2 = np.random.randn(10, 10)
    matrices['ill_conditioned'] = U2 @ np.diag(sigma_ill) @ V2
    
    return matrices

def demonstrate_spectral_norm():
    """
    Demonstrate spectral norm computation and Theorem 4.24
    """
    print("=== Spectral Norm Analysis ===")
    print("Definition 4.23 and Theorem 4.24\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Test with different matrices
    test_matrices = create_test_matrices()
    
    for name, A in test_matrices.items():
        print(f"Matrix: {name} (shape {A.shape})")
        
        # Verify Theorem 4.24
        theoretical_norm, empirical_norm = analyzer.verify_spectral_norm_theorem(A)
        
        print(f"  Theoretical ||A||₂ (σ₁): {theoretical_norm:.6f}")
        print(f"  Empirical ||A||₂:       {empirical_norm:.6f}")
        print(f"  Difference:              {abs(theoretical_norm - empirical_norm):.2e}")
        print(f"  Theorem verified:        {abs(theoretical_norm - empirical_norm) < 1e-10}")
        print()

def demonstrate_eckart_young_theorem():
    """
    Comprehensive demonstration of the Eckart-Young theorem
    """
    print("=== Eckart-Young Theorem Demonstration ===")
    print("Theorem 4.25: Optimality of SVD approximation\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Use image-like matrix for demonstration
    test_matrices = create_test_matrices()
    A = test_matrices['image_like']
    
    print(f"Test matrix shape: {A.shape}")
    print(f"Matrix rank: {np.linalg.matrix_rank(A)}")
    
    # Compute SVD
    U, sigma, Vt = analyzer.compute_svd(A)
    print(f"First 10 singular values: {sigma[:10]}")
    print()
    
    # Test different ranks
    test_ranks = [1, 2, 3, 5, 8, 10]
    
    print("Rank | Actual Error | Theoretical Error | Difference | Verified")
    print("-" * 65)
    
    for k in test_ranks:
        if k < len(sigma):
            actual_error, theoretical_error = analyzer.verify_eckart_young_theorem(A, k)
            diff = abs(actual_error - theoretical_error)
            verified = diff < 1e-10
            
            print(f"{k:4d} | {actual_error:11.6f} | {theoretical_error:16.6f} | {diff:9.2e} | {verified}")
    
    print()
    
    # Demonstrate optimality
    print("=== Optimality Demonstration ===")
    print("SVD vs Random Rank-k Approximations\n")
    
    k_test = 3
    svd_error, random_errors = analyzer.demonstrate_optimality(A, k_test)
    
    print(f"Rank-{k_test} approximation errors:")
    print(f"  SVD approximation error:    {svd_error:.6f}")
    print(f"  Best random approximation:  {min(random_errors):.6f}")
    print(f"  Worst random approximation: {max(random_errors):.6f}")
    print(f"  Average random error:       {np.mean(random_errors):.6f}")
    print(f"  SVD is optimal:             {svd_error <= min(random_errors)}")
    
    # Visualization
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))
    
    # Original vs approximations
    ranks_vis = [1, 3, 5]
    images_to_show = [A] + [analyzer.rank_k_approximation(A, k) for k in ranks_vis]
    titles = ['Original'] + [f'Rank-{k}' for k in ranks_vis]
    
    for i, (img, title) in enumerate(zip(images_to_show, titles)):
        if i == 0:
            ax1.imshow(img, cmap='viridis')
            ax1.set_title(title)
            ax1.axis('off')
        elif i == 1:
            ax2.imshow(img, cmap='viridis')
            ax2.set_title(title)
            ax2.axis('off')
        elif i == 2:
            ax3.imshow(img, cmap='viridis')
            ax3.set_title(title)
            ax3.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    return analyzer, A

def analyze_approximation_quality():
    """
    Comprehensive analysis of approximation quality
    """
    print("\n=== Progressive Approximation Analysis ===")
    
    analyzer = MatrixApproximationAnalyzer()
    test_matrices = create_test_matrices()
    
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()
    
    for idx, (name, A) in enumerate(test_matrices.items()):
        if idx >= 4:
            break
            
        print(f"\nMatrix: {name}")
        
        # Progressive analysis
        ranks, actual_errors, theoretical_errors = analyzer.progressive_approximation_analysis(A, max_rank=15)
        
        # Plot results
        ax = axes[idx]
        ax.semilogy(ranks, actual_errors, 'bo-', label='Actual Error', markersize=4)
        ax.semilogy(ranks, theoretical_errors, 'r*--', label='Theoretical (σₖ₊₁)', markersize=6)
        ax.set_xlabel('Rank k')
        ax.set_ylabel('||A - Â^(k)||₂')
        ax.set_title(f'{name.replace("_", " ").title()}')
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        # Print some statistics
        print(f"  Shape: {A.shape}")
        print(f"  Rank: {np.linalg.matrix_rank(A)}")
        print(f"  Spectral norm: {analyzer.spectral_norm(A):.6f}")
        print(f"  Rank-1 approximation captures {(1 - theoretical_errors[0]/analyzer.spectral_norm(A))*100:.1f}% of energy")
    
    plt.tight_layout()
    plt.show()

def demonstrate_image_reconstruction():
    """
    Recreate Figure 4.12: Image reconstruction with SVD
    """
    print("\n=== Image Reconstruction Demonstration ===")
    print("Recreating Figure 4.12 results\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Create a more complex synthetic image
    def create_complex_image():
        height, width = 100, 120
        x = np.linspace(-3, 3, width)
        y = np.linspace(-2, 2, height)
        X, Y = np.meshgrid(x, y)
        
        # Complex pattern mimicking natural image
        image = (np.exp(-(X**2 + Y**2)/2) + 
                0.3 * np.sin(4*X) * np.cos(4*Y) + 
                0.2 * np.sin(8*X + 8*Y) +
                0.1 * np.random.randn(height, width))
        
        # Normalize to [0, 1]
        image = (image - image.min()) / (image.max() - image.min())
        return image
    
    original_image = create_comp


# Matrix Approximation and the Eckart-Young Theorem

## Introduction

This notebook implements the matrix approximation techniques described in Section 4.6, focusing on the Eckart-Young theorem which establishes the optimality of SVD-based low-rank approximations.

## Mathematical Foundation

### 1. Spectral Norm Definition

**Definition 4.23 (Spectral Norm of a Matrix)**

For $x \in \mathbb{R}^n \setminus \{0\}$, the spectral norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as:

$$\|A\|_2 := \max_{x} \frac{\|Ax\|_2}{\|x\|_2} \quad \text{(Equation 4.93)}$$

The spectral norm determines how long any vector $x$ can at most become when multiplied by $A$.

### 2. Spectral Norm and Singular Values

**Theorem 4.24**: The spectral norm of $A$ is its largest singular value $\sigma_1$.

$$\|A\|_2 = \sigma_1$$

### 3. Eckart-Young Theorem

**Theorem 4.25 (Eckart-Young Theorem, 1936)**

Consider a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$ and let $B \in \mathbb{R}^{m \times n}$ be a matrix of rank $k$. For any $k \leq r$ with $\hat{A}^{(k)} = \sum_{i=1}^{k} \sigma_i u_i v_i^T$, it holds that:

$$\hat{A}^{(k)} = \arg\min_{\text{rank}(B)=k} \|A - B\|_2 \quad \text{(Equation 4.94)}$$

$$\|A - \hat{A}^{(k)}\|_2 = \sigma_{k+1} \quad \text{(Equation 4.95)}$$

The theorem states that SVD provides the **optimal** rank-$k$ approximation in the spectral norm sense.

## Implementation

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import svd as scipy_svd
import warnings
warnings.filterwarnings('ignore')

class MatrixApproximationAnalyzer:
    """
    Implementation of matrix approximation techniques with focus on
    the Eckart-Young theorem and spectral norm analysis
    """
    
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None
        
    def compute_svd(self, A):
        """
        Compute SVD decomposition: A = U @ Σ @ V^T
        """
        self.original_matrix = A.copy()
        self.U, self.sigma, self.Vt = np.linalg.svd(A, full_matrices=False)
        return self.U, self.sigma, self.Vt
    
    def spectral_norm(self, A):
        """
        Compute spectral norm of matrix A
        
        Definition 4.23: ||A||₂ = max_x ||Ax||₂/||x||₂
        Theorem 4.24: ||A||₂ = σ₁ (largest singular value)
        """
        # Method 1: Using definition (computationally expensive)
        # We'll use the theorem instead for efficiency
        
        # Method 2: Using Theorem 4.24
        _, sigma, _ = np.linalg.svd(A, full_matrices=False)
        spectral_norm_value = sigma[0] if len(sigma) > 0 else 0
        
        return spectral_norm_value
    
    def verify_spectral_norm_theorem(self, A, num_random_vectors=1000):
        """
        Verify Theorem 4.24: ||A||₂ = σ₁ by testing with random vectors
        """
        # Compute using theorem
        theoretical_norm = self.spectral_norm(A)
        
        # Compute using definition with random vectors
        m, n = A.shape
        max_ratio = 0
        
        for _ in range(num_random_vectors):
            x = np.random.randn(n)
            x = x / np.linalg.norm(x)  # Normalize
            
            Ax = A @ x
            ratio = np.linalg.norm(Ax) / np.linalg.norm(x)
            max_ratio = max(max_ratio, ratio)
        
        return theoretical_norm, max_ratio
    
    def rank_k_approximation(self, A, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ
        """
        if self.U is None:
            self.compute_svd(A)
        
        k = min(k, len(self.sigma))
        
        # Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ
        A_k = np.zeros_like(A)
        for i in range(k):
            A_k += self.sigma[i] * np.outer(self.U[:, i], self.Vt[i, :])
        
        return A_k
    
    def eckart_young_error(self, A, k):
        """
        Compute the error according to Eckart-Young theorem
        
        ||A - Â^(k)||₂ = σₖ₊₁
        """
        if self.U is None:
            self.compute_svd(A)
        
        if k >= len(self.sigma):
            return 0.0  # Perfect reconstruction
        
        return self.sigma[k]  # σₖ₊₁ (k+1-th singular value, 0-indexed)
    
    def verify_eckart_young_theorem(self, A, k):
        """
        Verify the Eckart-Young theorem by computing actual error
        and comparing with theoretical prediction
        """
        # Compute rank-k approximation
        A_k = self.rank_k_approximation(A, k)
        
        # Actual error
        actual_error = self.spectral_norm(A - A_k)
        
        # Theoretical error (Eckart-Young)
        theoretical_error = self.eckart_young_error(A, k)
        
        return actual_error, theoretical_error
    
    def demonstrate_optimality(self, A, k, num_random_trials=50):
        """
        Demonstrate that SVD gives optimal rank-k approximation
        by comparing with random rank-k matrices
        """
        # SVD approximation
        A_k_svd = self.rank_k_approximation(A, k)
        svd_error = self.spectral_norm(A - A_k_svd)
        
        # Random rank-k approximations
        m, n = A.shape
        random_errors = []
        
        for _ in range(num_random_trials):
            # Create random rank-k matrix
            U_rand = np.random.randn(m, k)
            V_rand = np.random.randn(k, n)
            A_k_rand = U_rand @ V_rand
            
            # Normalize to have similar scale
            A_k_rand = A_k_rand * (np.linalg.norm(A) / np.linalg.norm(A_k_rand))
            
            error = self.spectral_norm(A - A_k_rand)
            random_errors.append(error)
        
        return svd_error, random_errors
    
    def progressive_approximation_analysis(self, A, max_rank=None):
        """
        Analyze how approximation error decreases with increasing rank
        """
        if self.U is None:
            self.compute_svd(A)
        
        if max_rank is None:
            max_rank = min(A.shape)
        
        max_rank = min(max_rank, len(self.sigma))
        
        ranks = list(range(1, max_rank + 1))
        actual_errors = []
        theoretical_errors = []
        
        for k in ranks:
            actual_error, theoretical_error = self.verify_eckart_young_theorem(A, k)
            actual_errors.append(actual_error)
            theoretical_errors.append(theoretical_error)
        
        return ranks, actual_errors, theoretical_errors

def create_test_matrices():
    """
    Create various test matrices for demonstration
    """
    matrices = {}
    
    # 1. Low-rank matrix (rank 3)
    np.random.seed(42)
    U1 = np.random.randn(8, 3)
    V1 = np.random.randn(3, 6)
    matrices['low_rank'] = U1 @ V1
    
    # 2. Image-like matrix (structured)
    x = np.linspace(-2, 2, 50)
    y = np.linspace(-2, 2, 40)
    X, Y = np.meshgrid(x, y)
    matrices['image_like'] = np.exp(-(X**2 + Y**2)) + 0.3 * np.sin(3*X) * np.cos(3*Y)
    
    # 3. Random matrix
    matrices['random'] = np.random.randn(20, 15)
    
    # 4. Ill-conditioned matrix
    U2 = np.random.randn(10, 10)
    sigma_ill = np.logspace(2, -8, 10)  # Wide range of singular values
    V2 = np.random.randn(10, 10)
    matrices['ill_conditioned'] = U2 @ np.diag(sigma_ill) @ V2
    
    return matrices

def demonstrate_spectral_norm():
    """
    Demonstrate spectral norm computation and Theorem 4.24
    """
    print("=== Spectral Norm Analysis ===")
    print("Definition 4.23 and Theorem 4.24\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Test with different matrices
    test_matrices = create_test_matrices()
    
    for name, A in test_matrices.items():
        print(f"Matrix: {name} (shape {A.shape})")
        
        # Verify Theorem 4.24
        theoretical_norm, empirical_norm = analyzer.verify_spectral_norm_theorem(A)
        
        print(f"  Theoretical ||A||₂ (σ₁): {theoretical_norm:.6f}")
        print(f"  Empirical ||A||₂:       {empirical_norm:.6f}")
        print(f"  Difference:              {abs(theoretical_norm - empirical_norm):.2e}")
        print(f"  Theorem verified:        {abs(theoretical_norm - empirical_norm) < 1e-10}")
        print()

def demonstrate_eckart_young_theorem():
    """
    Comprehensive demonstration of the Eckart-Young theorem
    """
    print("=== Eckart-Young Theorem Demonstration ===")
    print("Theorem 4.25: Optimality of SVD approximation\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Use image-like matrix for demonstration
    test_matrices = create_test_matrices()
    A = test_matrices['image_like']
    
    print(f"Test matrix shape: {A.shape}")
    print(f"Matrix rank: {np.linalg.matrix_rank(A)}")
    
    # Compute SVD
    U, sigma, Vt = analyzer.compute_svd(A)
    print(f"First 10 singular values: {sigma[:10]}")
    print()
    
    # Test different ranks
    test_ranks = [1, 2, 3, 5, 8, 10]
    
    print("Rank | Actual Error | Theoretical Error | Difference | Verified")
    print("-" * 65)
    
    for k in test_ranks:
        if k < len(sigma):
            actual_error, theoretical_error = analyzer.verify_eckart_young_theorem(A, k)
            diff = abs(actual_error - theoretical_error)
            verified = diff < 1e-10
            
            print(f"{k:4d} | {actual_error:11.6f} | {theoretical_error:16.6f} | {diff:9.2e} | {verified}")
    
    print()
    
    # Demonstrate optimality
    print("=== Optimality Demonstration ===")
    print("SVD vs Random Rank-k Approximations\n")
    
    k_test = 3
    svd_error, random_errors = analyzer.demonstrate_optimality(A, k_test)
    
    print(f"Rank-{k_test} approximation errors:")
    print(f"  SVD approximation error:    {svd_error:.6f}")
    print(f"  Best random approximation:  {min(random_errors):.6f}")
    print(f"  Worst random approximation: {max(random_errors):.6f}")
    print(f"  Average random error:       {np.mean(random_errors):.6f}")
    print(f"  SVD is optimal:             {svd_error <= min(random_errors)}")
    
    # Visualization
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))
    
    # Original vs approximations
    ranks_vis = [1, 3, 5]
    images_to_show = [A] + [analyzer.rank_k_approximation(A, k) for k in ranks_vis]
    titles = ['Original'] + [f'Rank-{k}' for k in ranks_vis]
    
    for i, (img, title) in enumerate(zip(images_to_show, titles)):
        if i == 0:
            ax1.imshow(img, cmap='viridis')
            ax1.set_title(title)
            ax1.axis('off')
        elif i == 1:
            ax2.imshow(img, cmap='viridis')
            ax2.set_title(title)
            ax2.axis('off')
        elif i == 2:
            ax3.imshow(img, cmap='viridis')
            ax3.set_title(title)
            ax3.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    return analyzer, A

def analyze_approximation_quality():
    """
    Comprehensive analysis of approximation quality
    """
    print("\n=== Progressive Approximation Analysis ===")
    
    analyzer = MatrixApproximationAnalyzer()
    test_matrices = create_test_matrices()
    
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()
    
    for idx, (name, A) in enumerate(test_matrices.items()):
        if idx >= 4:
            break
            
        print(f"\nMatrix: {name}")
        
        # Progressive analysis
        ranks, actual_errors, theoretical_errors = analyzer.progressive_approximation_analysis(A, max_rank=15)
        
        # Plot results
        ax = axes[idx]
        ax.semilogy(ranks, actual_errors, 'bo-', label='Actual Error', markersize=4)
        ax.semilogy(ranks, theoretical_errors, 'r*--', label='Theoretical (σₖ₊₁)', markersize=6)
        ax.set_xlabel('Rank k')
        ax.set_ylabel('||A - Â^(k)||₂')
        ax.set_title(f'{name.replace("_", " ").title()}')
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        # Print some statistics
        print(f"  Shape: {A.shape}")
        print(f"  Rank: {np.linalg.matrix_rank(A)}")
        print(f"  Spectral norm: {analyzer.spectral_norm(A):.6f}")
        print(f"  Rank-1 approximation captures {(1 - theoretical_errors[0]/analyzer.spectral_norm(A))*100:.1f}% of energy")
    
    plt.tight_layout()
    plt.show()

def demonstrate_image_reconstruction():
    """
    Recreate Figure 4.12: Image reconstruction with SVD
    """
    print("\n=== Image Reconstruction Demonstration ===")
    print("Recreating Figure 4.12 results\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Create a more complex synthetic image
    def create_complex_image():
        height, width = 100, 120
        x = np.linspace(-3, 3, width)
        y = np.linspace(-2, 2, height)
        X, Y = np.meshgrid(x, y)
        
        # Complex pattern mimicking natural image
        image = (np.exp(-(X**2 + Y**2)/2) + 
                0.3 * np.sin(4*X) * np.cos(4*Y) + 
                0.2 * np.sin(8*X + 8*Y) +
                0.1 * np.random.randn(height, width))
        
        # Normalize to [0, 1]
        image = (image - image.min()) / (image.max() - image.min())
        return image
    
    original_image = create_complex_image()
    
    # Compute SVD
    U, sigma, Vt = analyzer.compute_svd(original_image)
    
    print(f"Image shape: {original_image.shape}")
    print(f"Matrix rank: {np.linalg.matrix_rank(original_image)}")
    print(f"First 10 singular values: {sigma[:10]}")
    
    # Create approximations for Figure 4.12 style
    approximation_ranks = [1, 2, 3, 4, 5]
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    # Original image
    axes[0, 0].imshow(original_image, cmap='gray')
    axes[0, 0].set_title('(a) Original Image A')
    axes[0, 0].axis('off')
    
    # Approximations
    positions = [(0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
    
    print("\nApproximation Analysis:")
    print("Rank | Error (Actual) | Error (Theory) | Compression Ratio")
    print("-" * 60)
    
    for i, k in enumerate(approximation_ranks):
        # Create approximation
        A_k = analyzer.rank_k_approximation(original_image, k)
        
        # Calculate errors
        actual_error, theoretical_error = analyzer.verify_eckart_young_theorem(original_image, k)
        
        # Calculate compression
        m, n = original_image.shape
        original_storage = m * n
        compressed_storage = k * (m + n + 1)
        compression_ratio = compressed_storage / original_storage
        
        print(f"{k:4d} | {actual_error:13.6f} | {theoretical_error:13.6f} | {compression_ratio:16.1%}")
        
        # Display
        row, col = positions[i]
        axes[row, col].imshow(A_k, cmap='gray')
        axes[row, col].set_title(f'({chr(ord("b")+i)}) Rank-{k} Approximation Â^({k})')
        axes[row, col].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Error decay analysis
    plt.figure(figsize=(10, 6))
    
    ranks_extended = list(range(1, min(21, len(sigma))))
    errors_extended = [analyzer.eckart_young_error(original_image, k) for k in ranks_extended]
    
    plt.semilogy(ranks_extended, errors_extended, 'bo-', markersize=4, linewidth=2)
    plt.xlabel('Rank k')
    plt.ylabel('||A - Â^(k)||₂ = σₖ₊₁')
    plt.title('Approximation Error vs. Rank (Eckart-Young Theorem)')
    plt.grid(True, alpha=0.3)
    
    # Highlight the first few ranks
    for i in range(min(5, len(ranks_extended))):
        plt.annotate(f'σ_{i+2} = {errors_extended[i]:.3f}', 
                    (ranks_extended[i], errors_extended[i]),
                    xytext=(10, 10), textcoords='offset points',
                    fontsize=8, alpha=0.8)
    
    plt.show()
    
    return analyzer, original_image

def mathematical_insights():
    """
    Explain the mathematical insights behind the Eckart-Young theorem
    """
    print("\n=== Mathematical Insights ===")
    print("Understanding why Equation (4.95) holds\n")
    
    analyzer = MatrixApproximationAnalyzer()
    
    # Simple example to illustrate the concept
    A = np.array([[4, 2], [2, 1]], dtype=float)
    
    print("Simple 2×2 example:")
    print(f"A = \n{A}")
    
    U, sigma, Vt = analyzer.compute_svd(A)
    
    print(f"\nSVD decomposition:")
    print(f"U = \n{U}")
    print(f"σ = {sigma}")
    print(f"Vt = \n{Vt}")
    
    # Rank-1 approximation
    A_1 = analyzer.rank_k_approximation(A, 1)
    
    print(f"\nRank-1 approximation Â^(1):")
    print(f"Â^(1) = σ₁u₁v₁ᵀ = {sigma[0]:.6f} × u₁v₁ᵀ")
    print(f"Â^(1) = \n{A_1}")
    
    # Error analysis
    error_matrix = A - A_1
    print(f"\nError matrix A - Â^(1):")
    print(f"A - Â^(1) = \n{error_matrix}")
    
    actual_error = analyzer.spectral_norm(error_matrix)
    theoretical_error = sigma[1]  # σ₂
    
    print(f"\nError analysis:")
    print(f"||A - Â^(1)||₂ (actual):     {actual_error:.6f}")
    print(f"σ₂ (theoretical):           {theoretical_error:.6f}")
    print(f"Difference:                 {abs(actual_error - theoretical_error):.2e}")
    
    print(f"\nKey insight:")
    print(f"The error ||A - Â^(k)||₂ = σₖ₊₁ because:")
    print(f"1. A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢuᵢvᵢᵀ")
    print(f"2. The spectral norm of this sum is dominated by the largest term σₖ₊₁")
    print(f"3. SVD provides the optimal decomposition that minimizes this error")

if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem Analysis")
    print("=" * 60)
    
    # Run all demonstrations
    demonstrate_spectral_norm()
    analyzer, test_matrix = demonstrate_eckart_young_theorem()
    analyze_approximation_quality()
    demonstrate_image_reconstruction()
    mathematical_insights()
    
    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Spectral norm ||A||₂ = σ₁ (largest singular value)")
    print("• SVD provides optimal rank-k approximation in spectral norm")
    print("• Error bound: ||A - Â^(k)||₂ = σₖ₊₁")
    print("• Applications: image compression, dimensionality reduction, denoising")
    print("• Theoretical foundation for many machine learning algorithms")
```

## Key Theoretical Results

### 1. Spectral Norm Properties
- **Definition**: $\|A\|_2 = \max_x \frac{\|Ax\|_2}{\|x\|_2}$
- **Theorem**: $\|A\|_2 = \sigma_1$ (largest singular value)
- **Interpretation**: Maximum "stretching" factor of matrix $A$

### 2. Eckart-Young Optimality
The theorem establishes two critical results:

1. **Optimality**: $\hat{A}^{(k)} = \arg\min_{\text{rank}(B)=k} \|A - B\|_2$
   - SVD gives the *best possible* rank-$k$ approximation
   - No other rank-$k$ matrix can achieve smaller error

2. **Error Formula**: $\|A - \hat{A}^{(k)}\|_2 = \sigma_{k+1}$
   - Error is exactly the $(k+1)$-th singular value
   - Provides precise error bound for any approximation

### 3. Why Equation (4.95) Holds

The error can be retraced as follows:

$$A - \hat{A}^{(k)} = \sum_{i=1}^{r} \sigma_i u_i v_i^T - \sum_{i=1}^{k} \sigma_i u_i v_i^T = \sum_{i=k+1}^{r} \sigma_i u_i v_i^T$$

The spectral norm of this residual is dominated by the largest remaining singular value $\sigma_{k+1}$.

### 4. Projection Interpretation
The rank-$k$ approximation can be interpreted as:
- **Projection** of full-rank matrix $A$ onto lower-dimensional space
- **Optimal projection** that minimizes spectral norm error
- **Dimensionality reduction** preserving maximum information

In [4]:
import math
import random

# --- Transpose of a Matrix ---
def transpose(A):
    """
    Compute the transpose of matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]

# --- Matrix Multiplication ---
def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, n = len(A), len(B[0])
    result = [[0 for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

# --- Matrix-Vector Multiplication ---
def matrix_vector_multiply(A, x):
    """
    Multiply matrix A (m x n) by vector x (n x 1).
    """
    m = len(A)
    result = [0.0] * m
    for i in range(m):
        result[i] = sum(A[i][j] * x[j] for j in range(len(x)))
    return result

# --- Dot Product ---
def dot_product(x, y):
    """
    Compute the dot product of two vectors.
    """
    return sum(xi * yi for xi, yi in zip(x, y))

# --- Norm of a Vector ---
def norm(x):
    """
    Compute the Euclidean norm of a vector.
    """
    return math.sqrt(dot_product(x, x))

# --- Matrix Subtraction ---
def matrix_subtract(A, B):
    """
    Subtract matrix B from matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(m)]

# --- Verify Matrix Equality ---
def matrices_equal(A, B, tol=1e-3):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

# --- Matrix Approximation Analyzer Class ---
class MatrixApproximationAnalyzer:
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None

    def set_svd(self, U, sigma, Vt, A):
        """
        Set the SVD components manually (since we can't compute full SVD in core Python).
        """
        self.U = U
        self.sigma = sigma
        self.Vt = Vt
        self.original_matrix = A

    def spectral_norm(self, A):
        """
        Compute spectral norm of matrix A using Theorem 4.24: ||A||₂ = σ₁.
        Since we can't compute SVD, we'll use the precomputed sigma if available.
        """
        if self.sigma is None:
            raise ValueError("SVD not computed. Set SVD components first.")
        return self.sigma[0] if len(self.sigma) > 0 else 0

    def verify_spectral_norm_theorem(self, A, num_random_vectors=100):
        """
        Verify Theorem 4.24: ||A||₂ = σ₁ by testing with random vectors.
        """
        # Theoretical norm using Theorem 4.24
        theoretical_norm = self.spectral_norm(A)

        # Empirical norm using Definition 4.23
        m, n = len(A), len(A[0])
        max_ratio = 0

        for _ in range(num_random_vectors):
            x = [random.uniform(-1, 1) for _ in range(n)]
            x_norm = norm(x)
            if x_norm == 0:
                continue
            x = [xi / x_norm for xi in x]  # Normalize

            Ax = matrix_vector_multiply(A, x)
            ratio = norm(Ax) / norm(x)
            max_ratio = max(max_ratio, ratio)

        return theoretical_norm, max_ratio

    def rank_k_approximation(self, A, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        k = min(k, len(self.sigma))
        m, n = len(A), len(A[0])
        A_k = [[0 for _ in range(n)] for _ in range(m)]

        for i in range(k):
            # Compute outer product u_i v_i^T
            u_i = [self.U[j][i] for j in range(m)]
            v_i = [self.Vt[i][j] for j in range(n)]
            outer_product = [[u_i[j] * v_i[l] for l in range(n)] for j in range(m)]
            # Scale by sigma_i and add to A_k
            for j in range(m):
                for l in range(n):
                    A_k[j][l] += self.sigma[i] * outer_product[j][l]

        return A_k

    def eckart_young_error(self, A, k):
        """
        Compute the error according to Eckart-Young theorem: ||A - Â^(k)||₂ = σₖ₊₁.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        if k >= len(self.sigma):
            return 0.0  # Perfect reconstruction

        return self.sigma[k]  # σₖ₊₁ (k is 0-indexed, so k gives k+1)

    def verify_eckart_young_theorem(self, A, k):
        """
        Verify the Eckart-Young theorem by computing actual error and comparing with theoretical prediction.
        """
        # Compute rank-k approximation
        A_k = self.rank_k_approximation(A, k)

        # Actual error: spectral norm of A - A_k
        error_matrix = matrix_subtract(A, A_k)
        # We need SVD of error_matrix to compute its spectral norm, but we know from Eckart-Young
        # that it should equal sigma_{k+1}. We'll set the SVD of error_matrix manually.
        error_analyzer = MatrixApproximationAnalyzer()
        # The error matrix A - A_k has singular values sigma_{k+1}, ..., sigma_r
        remaining_sigma = self.sigma[k:] if k < len(self.sigma) else [0]
        # U and Vt for error matrix are the remaining columns/rows
        U_error = [[self.U[i][j] for j in range(k, len(self.U[0]))] for i in range(len(self.U))]
        Vt_error = [[self.Vt[i][j] for j in range(len(self.Vt[0]))] for i in range(k, len(self.Vt))]
        error_analyzer.set_svd(U_error, remaining_sigma, Vt_error, error_matrix)
        actual_error = error_analyzer.spectral_norm(error_matrix)

        # Theoretical error (Eckart-Young)
        theoretical_error = self.eckart_young_error(A, k)

        return actual_error, theoretical_error

    def demonstrate_optimality(self, A, k, num_random_trials=10):
        """
        Demonstrate that SVD gives optimal rank-k approximation by comparing with random rank-k matrices.
        """
        # SVD approximation
        A_k_svd = self.rank_k_approximation(A, k)
        error_matrix_svd = matrix_subtract(A, A_k_svd)
        error_analyzer = MatrixApproximationAnalyzer()
        remaining_sigma = self.sigma[k:] if k < len(self.sigma) else [0]
        U_error = [[self.U[i][j] for j in range(k, len(self.U[0]))] for i in range(len(self.U))]
        Vt_error = [[self.Vt[i][j] for j in range(len(self.Vt[0]))] for i in range(k, len(self.Vt))]
        error_analyzer.set_svd(U_error, remaining_sigma, Vt_error, error_matrix_svd)
        svd_error = error_analyzer.spectral_norm(error_matrix_svd)

        # Random rank-k approximations
        m, n = len(A), len(A[0])
        random_errors = []

        # Compute norm of A to scale random matrices
        A_norm_analyzer = MatrixApproximationAnalyzer()
        A_norm_analyzer.set_svd(self.U, self.sigma, self.Vt, A)
        A_norm = A_norm_analyzer.spectral_norm(A)

        for _ in range(num_random_trials):
            # Create random rank-k matrix: U_rand (m x k) @ V_rand (k x n)
            U_rand = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
            V_rand = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
            A_k_rand = matrix_multiply(U_rand, V_rand)

            # Normalize to have similar scale
            A_k_rand_analyzer = MatrixApproximationAnalyzer()
            # Compute SVD of A_k_rand (simplified: approximate spectral norm via random vectors)
            temp_analyzer = MatrixApproximationAnalyzer()
            # We can't compute SVD, so approximate norm via random vectors
            max_ratio = 0
            for _ in range(50):
                x = [random.uniform(-1, 1) for _ in range(n)]
                x_norm = norm(x)
                if x_norm == 0:
                    continue
                x = [xi / x_norm for xi in x]
                Ax = matrix_vector_multiply(A_k_rand, x)
                ratio = norm(Ax)
                max_ratio = max(max_ratio, ratio)
            A_k_rand_norm = max_ratio

            if A_k_rand_norm > 0:
                scale = A_norm / A_k_rand_norm
                A_k_rand = [[scale * A_k_rand[i][j] for j in range(n)] for i in range(m)]

            # Compute error
            error_matrix_rand = matrix_subtract(A, A_k_rand)
            # Approximate spectral norm of error matrix
            max_ratio = 0
            for _ in range(50):
                x = [random.uniform(-1, 1) for _ in range(n)]
                x_norm = norm(x)
                if x_norm == 0:
                    continue
                x = [xi / x_norm for xi in x]
                Ax = matrix_vector_multiply(error_matrix_rand, x)
                ratio = norm(Ax)
                max_ratio = max(max_ratio, ratio)
            error = max_ratio
            random_errors.append(error)

        return svd_error, random_errors

# --- Demonstration Functions ---
def create_test_matrix():
    """
    Use the movie ratings matrix from Figure 4.10 as our test matrix.
    """
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen
    return A

def demonstrate_spectral_norm():
    """
    Demonstrate spectral norm computation and Theorem 4.24.
    """
    print("=== Spectral Norm Analysis ===")
    print("Definition 4.23 and Theorem 4.24\n")

    analyzer = MatrixApproximationAnalyzer()
    A = create_test_matrix()

    # Set SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Test Matrix A (Movie Ratings, 4x3):")
    for row in A:
        print(row)

    # Verify Theorem 4.24
    theoretical_norm, empirical_norm = analyzer.verify_spectral_norm_theorem(A)

    print(f"\nTheoretical ||A||₂ (σ₁): {theoretical_norm:.6f}")
    print(f"Empirical ||A||₂:       {empirical_norm:.6f}")
    print(f"Difference:             {abs(theoretical_norm - empirical_norm):.2e}")
    print(f"Theorem verified:       {abs(theoretical_norm - empirical_norm) < 1e-2}")

def demonstrate_eckart_young_theorem():
    """
    Comprehensive demonstration of the Eckart-Young theorem.
    """
    print("\n=== Eckart-Young Theorem Demonstration ===")
    print("Theorem 4.25: Optimality of SVD approximation\n")

    analyzer = MatrixApproximationAnalyzer()
    A = create_test_matrix()

    # Set SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print(f"Test matrix shape: ({len(A)}, {len(A[0])})")
    print(f"Matrix rank: {len(Sigma)}")
    print(f"Singular values: {[round(s, 4) for s in Sigma]}")
    print()

    # Test different ranks
    test_ranks = [1, 2]

    print("Rank | Actual Error | Theoretical Error | Difference | Verified")
    print("-" * 65)

    for k in test_ranks:
        actual_error, theoretical_error = analyzer.verify_eckart_young_theorem(A, k)
        diff = abs(actual_error - theoretical_error)
        verified = diff < 1e-2

        print(f"{k:4d} | {actual_error:11.6f} | {theoretical_error:16.6f} | {diff:9.2e} | {verified}")

    print("\n=== Optimality Demonstration ===")
    print("SVD vs Random Rank-k Approximations\n")

    k_test = 1
    svd_error, random_errors = analyzer.demonstrate_optimality(A, k_test)

    print(f"Rank-{k_test} approximation errors:")
    print(f"  SVD approximation error:    {svd_error:.6f}")
    print(f"  Best random approximation:  {min(random_errors):.6f}")
    print(f"  Worst random approximation: {max(random_errors):.6f}")
    print(f"  Average random error:       {sum(random_errors)/len(random_errors):.6f}")
    print(f"  SVD is optimal:             {svd_error <= min(random_errors)}")

def mathematical_insights():
    """
    Explain the mathematical insights behind the Eckart-Young theorem.
    """
    print("\n=== Mathematical Insights ===")
    print("Understanding why Equation (4.95) holds\n")

    analyzer = MatrixApproximationAnalyzer()

    # Simple 2×2 example
    A = [[4, 2], [2, 1]]

    # Analytical SVD: eigenvalues of A^T A are 5 and 0, singular values are sqrt(5) and 0
    sqrt_5 = math.sqrt(5)
    U = [[2/math.sqrt(5), -1/math.sqrt(5)],
         [1/math.sqrt(5), 2/math.sqrt(5)]]
    Sigma = [sqrt_5, 0]
    Vt = [[1, 0], [0, 1]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Simple 2×2 example:")
    for row in A:
        print(row)

    print("\nSVD decomposition:")
    print("U:")
    for row in U:
        print([round(x, 4) for x in row])
    print(f"σ: {[round(s, 4) for s in Sigma]}")
    print("Vt:")
    for row in Vt:
        print([round(x, 4) for x in row])

    # Rank-1 approximation
    A_1 = analyzer.rank_k_approximation(A, 1)

    print(f"\nRank-1 approximation Â^(1):")
    print(f"Â^(1) = σ₁u₁v₁ᵀ = {Sigma[0]:.6f} × u₁v₁ᵀ")
    for row in A_1:
        print([round(x, 4) for x in row])

    # Error analysis
    error_matrix = matrix_subtract(A, A_1)
    print(f"\nError matrix A - Â^(1):")
    for row in error_matrix:
        print([round(x, 4) for x in row])

    actual_error = analyzer.eckart_young_error(A, 1)  # Since A_1 is rank-1, error = sigma_2
    theoretical_error = Sigma[1]  # σ₂

    print(f"\nError analysis:")
    print(f"||A - Â^(1)||₂ (actual):     {actual_error:.6f}")
    print(f"σ₂ (theoretical):           {theoretical_error:.6f}")
    print(f"Difference:                 {abs(actual_error - theoretical_error):.2e}")

    print(f"\nKey insight:")
    print(f"The error ||A - Â^(k)||₂ = σₖ₊₁ because:")
    print(f"1. A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢuᵢvᵢᵀ")
    print(f"2. The spectral norm of this sum is dominated by the largest term σₖ₊₁")
    print(f"3. SVD provides the optimal decomposition that minimizes this error")

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem Analysis")
    print("=" * 60)

    # Run demonstrations
    demonstrate_spectral_norm()
    demonstrate_eckart_young_theorem()
    mathematical_insights()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Spectral norm ||A||₂ = σ₁ (largest singular value)")
    print("• SVD provides optimal rank-k approximation in spectral norm")
    print("• Error bound: ||A - Â^(k)||₂ = σₖ₊₁")
    print("• Applications: image compression, dimensionality reduction, denoising")
    print("• Theoretical foundation for many machine learning algorithms")

Matrix Approximation and Eckart-Young Theorem Analysis
=== Spectral Norm Analysis ===
Definition 4.23 and Theorem 4.24

Test Matrix A (Movie Ratings, 4x3):
[5, 4, 1]
[5, 5, 0]
[0, 0, 5]
[1, 0, 4]

Theoretical ||A||₂ (σ₁): 9.643800
Empirical ||A||₂:       9.479794
Difference:             1.64e-01
Theorem verified:       False

=== Eckart-Young Theorem Demonstration ===
Theorem 4.25: Optimality of SVD approximation

Test matrix shape: (4, 3)
Matrix rank: 3
Singular values: [9.6438, 6.3639, 0.7056]

Rank | Actual Error | Theoretical Error | Difference | Verified
-----------------------------------------------------------------
   1 |    6.363900 |         6.363900 |  0.00e+00 | True
   2 |    0.705600 |         0.705600 |  0.00e+00 | True

=== Optimality Demonstration ===
SVD vs Random Rank-k Approximations

Rank-1 approximation errors:
  SVD approximation error:    6.363900
  Best random approximation:  8.437661
  Worst random approximation: 18.059497
  Average random error:       12.527

We observe that the difference between $ A - \hat{A}^{(k)} $ is a matrix containing the sum of the remaining rank-1 matrices

$$
A - \hat{A}^{(k)} = \sum_{i=k+1}^{r} \sigma_i u_i v_i^\top \quad \text{(Equation 4.96)}
$$

By Theorem 4.24, we immediately obtain $ \sigma_{k+1} $ as the spectral norm of the difference matrix.

Let us have a closer look at (4.94). If we assume that there is another matrix $ B $ with $ \text{rk}(B) \leq k $, such that

$$
\|A - B\|_2 < \|A - \hat{A}^{(k)}\|_2, \quad \text{(Equation 4.97)}
$$

then there exists an at least $ (n - k) $-dimensional null space $ Z \subseteq \mathbb{R}^n $, such that $ x \in Z $ implies that $ B x = 0 $. Then it follows that

$$
\|A x\|_2 = \|(A - B) x\|_2, \quad \text{(Equation 4.98)}
$$

and by using a version of the Cauchy-Schwarz inequality (3.17) that encompasses norms of matrices, we obtain

$$
\|A x\|_2 \leq \|A - B\|_2 \|x\|_2 < \sigma_{k+1} \|x\|_2. \quad \text{(Equation 4.99)}
$$

However, there exists a $ (k + 1) $-dimensional subspace where $ \|A x\|_2 \geq \sigma_{k+1} \|x\|_2 $, which is spanned by the right-singular vectors $ v_j $, $ j \leq k + 1 $ of $ A $. Adding up dimensions of these two spaces yields a number greater than $ n $, as there must be a nonzero vector in both spaces. This is a contradiction of the rank-nullity theorem (Theorem 2.24) in Section 2.7.3.

The Eckart-Young theorem implies that we can use SVD to reduce a rank-$ r $ matrix $ A $ to a rank-$ k $ matrix $ \hat{A} $ in a principled, optimal (in the spectral norm sense) manner. We can interpret the approximation of $ A $ by a rank-$ k $ matrix as a form of lossy compression. Therefore, the low-rank approximation of a matrix appears in many machine learning applications, e.g., image processing, noise filtering, and regularization of ill-posed problems. Furthermore, it plays a key role in dimensionality reduction and principal component analysis, as we will see in Chapter 10.

### Example 4.15 (Finding Structure in Movie Ratings and Consumers (continued))

Coming back to our movie-rating example, we can now apply the concept of low-rank approximations to approximate the original data matrix. Recall that our first singular value captures the notion of science fiction theme in movies and science fiction lovers. Thus, by using only the first singular value term in a rank-1 decomposition of the movie-rating matrix, we obtain the predicted ratings

$$
\hat{A}^{(1)} = u_1 \sigma_1 v_1^\top = \begin{bmatrix} -0.6710 \\ -0.7197 \\ -0.0939 \\ -0.1515 \end{bmatrix} 9.6438 \begin{bmatrix} -0.7367 & -0.6515 & -0.1811 \end{bmatrix} \quad \text{(Equation 4.100a)}
$$

In [5]:
import math

# --- Transpose of a Matrix ---
def transpose(A):
    """
    Compute the transpose of matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]

# --- Matrix Multiplication ---
def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, n = len(A), len(B[0])
    result = [[0 for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

# --- Matrix-Vector Multiplication ---
def matrix_vector_multiply(A, x):
    """
    Multiply matrix A (m x n) by vector x (n x 1).
    """
    m = len(A)
    result = [0.0] * m
    for i in range(m):
        result[i] = sum(A[i][j] * x[j] for j in range(len(x)))
    return result

# --- Dot Product ---
def dot_product(x, y):
    """
    Compute the dot product of two vectors.
    """
    return sum(xi * yi for xi, yi in zip(x, y))

# --- Norm of a Vector ---
def norm(x):
    """
    Compute the Euclidean norm of a vector.
    """
    return math.sqrt(dot_product(x, x))

# --- Matrix Subtraction ---
def matrix_subtract(A, B):
    """
    Subtract matrix B from matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(m)]

# --- Verify Matrix Equality ---
def matrices_equal(A, B, tol=1e-3):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

# --- Matrix Approximation Analyzer Class ---
class MatrixApproximationAnalyzer:
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None

    def set_svd(self, U, sigma, Vt, A):
        """
        Set the SVD components manually (since we can't compute full SVD in core Python).
        """
        self.U = U
        self.sigma = sigma
        self.Vt = Vt
        self.original_matrix = A

    def rank_k_approximation(self, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        k = min(k, len(self.sigma))
        m, n = len(self.original_matrix), len(self.original_matrix[0])
        A_k = [[0 for _ in range(n)] for _ in range(m)]

        for i in range(k):
            # Compute outer product u_i v_i^T
            u_i = [self.U[j][i] for j in range(m)]
            v_i = [self.Vt[i][j] for j in range(n)]
            outer_product = [[u_i[j] * v_i[l] for l in range(n)] for j in range(m)]
            # Scale by sigma_i and add to A_k
            for j in range(m):
                for l in range(n):
                    A_k[j][l] += self.sigma[i] * outer_product[j][l]

        return A_k

    def eckart_young_residual(self, k):
        """
        Compute A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢ uᵢ vᵢᵀ (Equation 4.96).
        """
        A_k = self.rank_k_approximation(k)
        residual = matrix_subtract(self.original_matrix, A_k)
        return residual

    def spectral_norm(self):
        """
        Compute spectral norm using Theorem 4.24: ||A||₂ = σ₁.
        """
        if self.sigma is None:
            raise ValueError("SVD not computed. Set SVD components first.")
        return self.sigma[0] if len(self.sigma) > 0 else 0

    def eckart_young_error(self, k):
        """
        Compute the error according to Eckart-Young theorem: ||A - Â^(k)||₂ = σₖ₊₁.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        if k >= len(self.sigma):
            return 0.0  # Perfect reconstruction

        return self.sigma[k]  # σₖ₊₁ (k is 0-indexed, so k gives k+1)

# --- Demonstration ---
def demonstrate_eckart_young_proof():
    """
    Demonstrate the Eckart-Young theorem proof and contradiction (Equations 4.96–4.99).
    """
    print("=== Eckart-Young Theorem Proof Analysis ===")
    print("Verifying Equations 4.96–4.99\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Original Matrix A (Movie Ratings, 4x3):")
    for row in A:
        print(row)

    # Compute A - Â^(k) for k=1 (Equation 4.96)
    k = 1
    residual = analyzer.eckart_young_residual(k)
    print(f"\nA - Â^({k}) (Equation 4.96, sum of remaining rank-1 matrices):")
    for row in residual:
        print([round(x, 4) for x in row])

    # Verify spectral norm of residual = sigma_{k+1}
    residual_analyzer = MatrixApproximationAnalyzer()
    remaining_sigma = Sigma[k:]  # sigma_2, sigma_3
    U_residual = [[U[i][j] for j in range(k, len(U[0]))] for i in range(len(U))]
    Vt_residual = [[Vt[i][j] for j in range(len(Vt[0]))] for i in range(k, len(Vt))]
    residual_analyzer.set_svd(U_residual, remaining_sigma, Vt_residual, residual)
    actual_error = residual_analyzer.spectral_norm()
    theoretical_error = analyzer.eckart_young_error(k)

    print(f"\nSpectral norm of A - Â^({k}): {actual_error:.4f}")
    print(f"Theoretical error (σ_{k+1}): {theoretical_error:.4f}")
    print(f"Matches (Equation 4.96 verified): {abs(actual_error - theoretical_error) < 1e-3}")

    # Explore contradiction (Equations 4.97–4.99)
    print("\nExploring contradiction if another matrix B has smaller error (Equations 4.97–4.99):")
    print(f"For k={k}, error ||A - Â^({k})||_2 = σ_{k+1} = {theoretical_error:.4f}")
    print("If there exists B with rk(B) ≤ k and ||A - B||_2 < σ_{k+1}, then:")
    print(f"- Null space of B has dimension at least (n-k) = {3-k}")
    print(f"- On a (k+1)-dimensional subspace (spanned by v_1, ..., v_{k+1}), ||Ax||_2 ≥ σ_{k+1} ||x||_2")
    print("This leads to a dimensional contradiction (rank-nullity theorem), proving SVD's optimality.")

def demonstrate_movie_ratings_approximation():
    """
    Example 4.15: Compute rank-1 approximation for movie ratings (Equation 4.100a).
    """
    print("\n=== Example 4.15: Movie Ratings Rank-1 Approximation ===")
    print("Equation 4.100a\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    # Compute rank-1 approximation: Â^(1) = u_1 σ_1 v_1^T
    k = 1
    A_1 = analyzer.rank_k_approximation(k)

    print("Rank-1 Approximation Â^(1) (Equation 4.100a):")
    for row in A_1:
        print([round(x, 4) for x in row])

    # Interpretation
    print("\nInterpretation:")
    print("This rank-1 approximation captures the science fiction theme:")
    print(f"- u_1 emphasizes Star Wars ({U[0][0]:.4f}) and Blade Runner ({U[1][0]:.4f})")
    print(f"- v_1 emphasizes Ali ({Vt[0][0]:.4f}) and Beatrix ({Vt[0][1]:.4f}) as science fiction lovers")
    print(f"- Predicted ratings reflect this theme, with higher values for sci-fi movies and sci-fi lovers.")

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem Continued")
    print("=" * 60)

    # Run demonstrations
    demonstrate_eckart_young_proof()
    demonstrate_movie_ratings_approximation()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢ uᵢ vᵢᵀ, with spectral norm σ_{k+1}")
    print("• Eckart-Young theorem proven via contradiction (dimensionality argument)")
    print("• Rank-1 approximation of movie ratings captures science fiction theme")
    print("• Applications in lossy compression, dimensionality reduction, and more")

Matrix Approximation and Eckart-Young Theorem Continued
=== Eckart-Young Theorem Proof Analysis ===
Verifying Equations 4.96–4.99

Original Matrix A (Movie Ratings, 4x3):
[5, 4, 1]
[5, 5, 0]
[0, 0, 5]
[1, 0, 4]

A - Â^(1) (Equation 4.96, sum of remaining rank-1 matrices):
[0.2328, -0.2158, -0.1719]
[-0.1132, 0.4782, -1.257]
[-0.6671, -0.59, 4.836]
[-0.0763, -0.9519, 3.7354]

Spectral norm of A - Â^(1): 6.3639
Theoretical error (σ_2): 6.3639
Matches (Equation 4.96 verified): True

Exploring contradiction if another matrix B has smaller error (Equations 4.97–4.99):
For k=1, error ||A - Â^(1)||_2 = σ_2 = 6.3639
If there exists B with rk(B) ≤ k and ||A - B||_2 < σ_{k+1}, then:
- Null space of B has dimension at least (n-k) = 2
- On a (k+1)-dimensional subspace (spanned by v_1, ..., v_2), ||Ax||_2 ≥ σ_2 ||x||_2
This leads to a dimensional contradiction (rank-nullity theorem), proving SVD's optimality.

=== Example 4.15: Movie Ratings Rank-1 Approximation ===
Equation 4.100a

Rank-1 Approxim

$$
\begin{bmatrix}
0.4943 & 0.4372 & 0.1215 \\
0.5302 & 0.4689 & 0.1303 \\
0.0692 & 0.0612 & 0.0170 \\
0.1116 & 0.0987 & 0.0274
\end{bmatrix}. \quad \text{(Equation 4.100b)}
$$

This first rank-1 approximation $ \hat{A}^{(1)} $ is insightful: it tells us that Ali and Beatrix like science fiction movies, such as Star Wars and Blade Runner (entries have values $ > 0.4 $), but fails to capture the ratings of the other movies by Chandra. This is not surprising, as Chandra’s type of movies is not captured by the first singular value.

The second singular value gives us a better rank-1 approximation for those movie-theme lovers:

$$
\hat{A}^{(2)} = u_2 \sigma_2 v_2^\top = \begin{bmatrix} 0.0236 \\ 0.2054 \\ -0.7705 \\ -0.6030 \end{bmatrix} 6.3639 \begin{bmatrix} 0.0852 & 0.1762 & -0.9807 \end{bmatrix} \quad \text{(Equation 4.101a)}
$$

$$
= \begin{bmatrix}
0.0020 & 0.0042 & -0.0231 \\
0.0175 & 0.0362 & -0.2014 \\
-0.0656 & -0.1358 & 0.7556 \\
-0.0514 & -0.1063 & 0.5914
\end{bmatrix}. \quad \text{(Equation 4.101b)}
$$

In this second rank-1 approximation $ \hat{A}^{(2)} $, we capture Chandra’s ratings and movie types well, but not the science fiction movies.

This leads us to consider the rank-2 approximation $ \hat{A}^{(2)} $, where we combine the first two rank-1 approximations

$$
\hat{A}^{(2)} = \sigma_1 \hat{A}^{(1)} + \sigma_2 \hat{A}^{(2)} = \begin{bmatrix}
4.7801 & 4.2419 & 1.0244 \\
5.2252 & 4.7522 & -0.0250 \\
0.2493 & -0.2743 & 4.9724 \\
0.7495 & 0.2756 & 4.0278
\end{bmatrix} \quad \text{(Equation 4.102)}
$$

$ \hat{A}^{(2)} $ is similar to the original movie ratings table

$$
A = \begin{bmatrix}
5 & 4 & 1 \\
5 & 5 & 0 \\
0 & 0 & 5 \\
1 & 0 & 4
\end{bmatrix}, \quad \text{(Equation 4.103)}
$$

and this suggests that we can ignore the contribution of $ \hat{A}^{(3)} $. We can interpret this so that in the data table there is no evidence of a third movie-theme/movie-lovers category. This also means that the entire space of movie-themes/movie-lovers in our example is a two-dimensional space spanned by science fiction and French art house movies and lovers.

In [6]:
import math

# --- Matrix Multiplication ---
def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, n = len(A), len(B[0])
    result = [[0 for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

# --- Matrix Addition ---
def matrix_add(A, B):
    """
    Add two matrices A and B.
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(m)]

# --- Matrix Scalar Multiplication ---
def matrix_scalar_multiply(scalar, A):
    """
    Multiply matrix A by a scalar.
    """
    m, n = len(A), len(A[0])
    return [[scalar * A[i][j] for j in range(n)] for i in range(m)]

# --- Verify Matrix Equality ---
def matrices_equal(A, B, tol=1e-3):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

# --- Matrix Approximation Analyzer Class ---
class MatrixApproximationAnalyzer:
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None

    def set_svd(self, U, sigma, Vt, A):
        """
        Set the SVD components manually (since we can't compute full SVD in core Python).
        """
        self.U = U
        self.sigma = sigma
        self.Vt = Vt
        self.original_matrix = A

    def rank_1_approximation(self, i):
        """
        Compute a single rank-1 approximation: u_i v_i^T (without sigma).
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        m, n = len(self.original_matrix), len(self.original_matrix[0])
        u_i = [self.U[j][i] for j in range(m)]
        v_i = [self.Vt[i][j] for j in range(n)]
        # Compute outer product u_i v_i^T
        A_i = [[u_i[j] * v_i[l] for l in range(n)] for j in range(m)]
        return A_i

    def rank_k_approximation(self, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        k = min(k, len(self.sigma))
        m, n = len(self.original_matrix), len(self.original_matrix[0])
        A_k = [[0 for _ in range(n)] for _ in range(m)]

        for i in range(k):
            # Compute outer product u_i v_i^T
            A_i = self.rank_1_approximation(i)
            # Scale by sigma_i and add to A_k
            A_i_scaled = matrix_scalar_multiply(self.sigma[i], A_i)
            A_k = matrix_add(A_k, A_i_scaled)

        return A_k

# --- Demonstration ---
def demonstrate_movie_ratings_approximations():
    """
    Example 4.15: Compute rank-1 and rank-2 approximations for movie ratings (Equations 4.100b–4.103).
    """
    print("=== Example 4.15: Movie Ratings Approximations ===")
    print("Equations 4.100b–4.103\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Original Matrix A (Equation 4.103):")
    for row in A:
        print(row)

    # Compute first rank-1 approximation: Â^(1) = u_1 σ_1 v_1^T (Equations 4.100a–b)
    A_1_base = analyzer.rank_1_approximation(0)  # u_1 v_1^T
    A_1 = matrix_scalar_multiply(Sigma[0], A_1_base)

    print("\nFirst Rank-1 Approximation Â^(1) (Equation 4.100b):")
    for row in A_1:
        print([round(x, 4) for x in row])

    print("\nInterpretation of Â^(1):")
    print("Captures science fiction theme:")
    print(f"- High values for Star Wars and Blade Runner for Ali and Beatrix (entries > 0.4)")
    print(f"- Fails to capture Chandra's ratings (small values in third column)")

    # Compute second rank-1 approximation: Â^(2) = u_2 σ_2 v_2^T (Equations 4.101a–b)
    A_2_base = analyzer.rank_1_approximation(1)  # u_2 v_2^T
    A_2 = matrix_scalar_multiply(Sigma[1], A_2_base)

    print("\nSecond Rank-1 Approximation Â^(2) (Equation 4.101b):")
    for row in A_2:
        print([round(x, 4) for x in row])

    print("\nInterpretation of Â^(2):")
    print("Captures French art house theme:")
    print(f"- High values for Amelie and Delicatessen for Chandra (third column, entries ~0.7556, 0.5914)")
    print(f"- Fails to capture science fiction movies (small values in first two columns)")

    # Compute rank-2 approximation: Â^(2) = σ_1 Â^(1) + σ_2 Â^(2) (Equation 4.102)
    A_2_combined = analyzer.rank_k_approximation(2)

    print("\nRank-2 Approximation Â^(2) (Equation 4.102):")
    for row in A_2_combined:
        print([round(x, 4) for x in row])

    # Compare with original matrix
    print("\nComparison with Original Matrix A:")
    print("Original A:")
    for row in A:
        print(row)
    print("Â^(2):")
    for row in A_2_combined:
        print([round(x, 4) for x in row])

    print("\nInterpretation:")
    print("Â^(2) closely approximates A, suggesting the third singular value (σ_3 = 0.7056) is negligible.")
    print("The data is well-represented by a two-dimensional space:")
    print("- Science fiction theme (captured by Â^(1))")
    print("- French art house theme (captured by Â^(2))")
    print("No evidence of a third movie-theme category.")

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem: Movie Ratings")
    print("=" * 60)

    # Run demonstration
    demonstrate_movie_ratings_approximations()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Rank-1 approximation Â^(1) captures science fiction theme and lovers")
    print("• Rank-1 approximation Â^(2) captures French art house theme and lovers")
    print("• Rank-2 approximation Â^(2) closely matches the original matrix")
    print("• Movie themes are a two-dimensional space: sci-fi and French art house")

Matrix Approximation and Eckart-Young Theorem: Movie Ratings
=== Example 4.15: Movie Ratings Approximations ===
Equations 4.100b–4.103

Original Matrix A (Equation 4.103):
[5, 4, 1]
[5, 5, 0]
[0, 0, 5]
[1, 0, 4]

First Rank-1 Approximation Â^(1) (Equation 4.100b):
[4.7672, 4.2158, 1.1719]
[5.1132, 4.5218, 1.257]
[0.6671, 0.59, 0.164]
[1.0763, 0.9519, 0.2646]

Interpretation of Â^(1):
Captures science fiction theme:
- High values for Star Wars and Blade Runner for Ali and Beatrix (entries > 0.4)
- Fails to capture Chandra's ratings (small values in third column)

Second Rank-1 Approximation Â^(2) (Equation 4.101b):
[0.0128, 0.0265, -0.1473]
[0.1114, 0.2303, -1.2819]
[-0.4178, -0.864, 4.8087]
[-0.3269, -0.6762, 3.7634]

Interpretation of Â^(2):
Captures French art house theme:
- High values for Amelie and Delicatessen for Chandra (third column, entries ~0.7556, 0.5914)
- Fails to capture science fiction movies (small values in first two columns)

Rank-2 Approximation Â^(2) (Equation 4.10

## 4.7 Matrix Phylogeny

The word “phylogenetic” describes how we capture the relationships between different types of matrices (black arrows indicating “is a subset of”) and the covered operations we can perform on them (in blue). We consider all real matrices $ A \in \mathbb{R}^{n \times m} $. For non-square matrices (where $ n \neq m $), the SVD always exists, as we saw in this chapter.

Focusing on square matrices $ A \in \mathbb{R}^{n \times n} $, the determinant informs us whether a square matrix possesses an inverse matrix, i.e., whether it belongs to the class of regular, invertible matrices. If the square $ n \times n $ matrix possesses $ n $ linearly independent eigenvectors, then the matrix is non-defective and an eigendecomposition exists (Theorem 4.12). We know that repeated eigenvalues may result in defective matrices, which cannot be diagonalized.

Non-singular and non-defective matrices are not the same. For example, a rotation matrix will be invertible (determinant is nonzero) but not diagonalizable in the real numbers (eigenvalues are not guaranteed to be real numbers).

We dive further into the branch of non-defective square $ n \times n $ matrices. $ A $ is normal if the condition $ A^\top A = A A^\top $ holds. Moreover, if the more restrictive condition holds that

$$
A^\top A = A A^\top = I,
$$

then $ A $ is called orthogonal (see Definition 3.8). The set of orthogonal matrices is a subset of the regular (invertible) matrices and satisfies $ A^\top = A^{-1} $.

Normal matrices have a frequently encountered subset, the symmetric matrices $ S \in \mathbb{R}^{n \times n} $, which satisfy $ S = S^\top $. Symmetric matrices have only real eigenvalues. A subset of the symmetric matrices consists of the positive definite matrices $ P $ that satisfy the condition of $ x^\top P x > 0 $ for all $ x \in \mathbb{R}^n \setminus \{0\} $. In this case, a unique Cholesky decomposition exists (Theorem 4.18). Positive definite matrices have only positive eigenvalues and are always invertible (i.e., have a nonzero determinant).

Another subset of symmetric matrices consists of the diagonal matrices $ D $. Diagonal matrices are closed under multiplication and addition, but do not necessarily form a group (this is only the case if all diagonal entries are nonzero so that the matrix is invertible). A special diagonal matrix is the identity matrix $ I $.

In [7]:
import math

# --- Matrix Operations ---
def transpose(A):
    """
    Compute the transpose of matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]

def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, n = len(A), len(B[0])
    result = [[0 for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

def matrices_equal(A, B, tol=1e-6):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

def dot_product(x, y):
    """
    Compute the dot product of two vectors.
    """
    return sum(xi * yi for xi, yi in zip(x, y))

# --- Matrix Classifier Class ---
class MatrixClassifier:
    def __init__(self, A):
        self.A = A
        self.m = len(A)
        self.n = len(A[0]) if A else 0
        self.A_T = transpose(A) if A else []

    def is_square(self):
        """
        Check if the matrix is square (m = n).
        """
        return self.m == self.n

    def determinant_2x2(self):
        """
        Compute determinant for a 2x2 matrix.
        """
        if self.m != 2 or self.n != 2:
            raise ValueError("Matrix must be 2x2")
        return self.A[0][0] * self.A[1][1] - self.A[0][1] * self.A[1][0]

    def is_invertible(self):
        """
        Check if the matrix is invertible (nonzero determinant, square matrix only).
        Simplified for 2x2 matrices.
        """
        if not self.is_square():
            return False
        if self.m == 2:
            det = self.determinant_2x2()
            return abs(det) > 1e-6
        # For larger matrices, determinant computation is complex without libraries
        return None  # Placeholder for non-2x2 matrices

    def is_symmetric(self):
        """
        Check if the matrix is symmetric (A = A^T).
        """
        if not self.is_square():
            return False
        return matrices_equal(self.A, self.A_T)

    def is_normal(self):
        """
        Check if the matrix is normal (A^T A = A A^T).
        """
        if not self.is_square():
            return False
        A_T_A = matrix_multiply(self.A_T, self.A)
        A_A_T = matrix_multiply(self.A, self.A_T)
        return matrices_equal(A_T_A, A_A_T)

    def is_orthogonal(self):
        """
        Check if the matrix is orthogonal (A^T A = A A^T = I).
        """
        if not self.is_square():
            return False
        n = self.n
        I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
        A_T_A = matrix_multiply(self.A_T, self.A)
        return matrices_equal(A_T_A, I) and matrices_equal(matrix_multiply(self.A, self.A_T), I)

    def is_diagonal(self):
        """
        Check if the matrix is diagonal (non-diagonal entries are zero).
        """
        if not self.is_square():
            return False
        for i in range(self.m):
            for j in range(self.n):
                if i != j and abs(self.A[i][j]) > 1e-6:
                    return False
        return True

    def is_identity(self):
        """
        Check if the matrix is the identity matrix.
        """
        if not self.is_diagonal():
            return False
        return all(abs(self.A[i][i] - 1) < 1e-6 for i in range(self.m))

    def is_positive_definite(self):
        """
        Check if the matrix is positive definite (x^T A x > 0 for all x ≠ 0).
        Simplified: check if symmetric and all diagonal entries are positive (for diagonal matrices).
        Full check requires eigenvalues, which is complex without libraries.
        """
        if not self.is_symmetric():
            return False
        if self.is_diagonal():
            return all(self.A[i][i] > 0 for i in range(self.m))
        # For non-diagonal matrices, we'd need eigenvalues
        return None  # Placeholder

    def classify_matrix(self):
        """
        Classify the matrix according to the phylogeny in Figure 4.13.
        """
        print(f"Classifying Matrix (shape {self.m}x{self.n}):")
        for row in self.A:
            print(row)

        properties = []
        operations = []

        # Step 1: Real matrix
        properties.append("Real matrix")

        # Step 2: Square or non-square
        if self.is_square():
            properties.append("Square")
            # Check invertibility
            invertible = self.is_invertible()
            if invertible is True:
                properties.append("Invertible (Regular)")
                operations.append("Inverse exists")
            elif invertible is False:
                properties.append("Singular (det = 0)")

            # Check for normal matrix
            if self.is_normal():
                properties.append("Normal")
                # Check for orthogonal matrix
                if self.is_orthogonal():
                    properties.append("Orthogonal")
                    properties.append("Rotation matrix (if det = 1)")
                    operations.append("A^T = A^-1")

                # Check for symmetric matrix
                if self.is_symmetric():
                    properties.append("Symmetric")
                    operations.append("Eigenvalues are real")
                    # Check for positive definite
                    pd = self.is_positive_definite()
                    if pd is True:
                        properties.append("Positive definite")
                        operations.append("Cholesky decomposition exists")
                        operations.append("Eigenvalues > 0")
                    elif pd is False:
                        properties.append("Not positive definite")

                    # Check for diagonal matrix
                    if self.is_diagonal():
                        properties.append("Diagonal")
                        # Check for identity matrix
                        if self.is_identity():
                            properties.append("Identity")

            # Eigendecomposition (simplified check)
            if invertible is not None and invertible:
                properties.append("Likely non-defective (simplified check)")
                operations.append("Eigendecomposition likely exists")
            else:
                properties.append("Possibly defective (simplified check)")

        else:
            properties.append("Nonsquare")
            operations.append("SVD exists")
            operations.append("Pseudo-inverse exists")

        # Print classification
        print("\nProperties:")
        for prop in properties:
            print(f"- {prop}")

        print("\nOperations/Characteristics:")
        for op in operations:
            print(f"- {op}")

# --- Demonstration ---
def demonstrate_matrix_phylogeny():
    """
    Demonstrate matrix classification using examples.
    """
    print("=== Matrix Phylogeny Classification ===")
    print("Section 4.7: Classifying Matrices per Figure 4.13\n")

    # Test Case 1: Identity Matrix (2x2)
    print("Test Case 1: Identity Matrix")
    A1 = [[1, 0], [0, 1]]
    classifier1 = MatrixClassifier(A1)
    classifier1.classify_matrix()

    # Test Case 2: Symmetric Positive Definite Matrix (2x2)
    print("\nTest Case 2: Symmetric Positive Definite Matrix")
    A2 = [[2, 1], [1, 2]]
    classifier2 = MatrixClassifier(A2)
    classifier2.classify_matrix()

    # Test Case 3: Non-square Matrix (2x3)
    print("\nTest Case 3: Non-square Matrix")
    A3 = [[1, 2, 3], [4, 5, 6]]
    classifier3 = MatrixClassifier(A3)
    classifier3.classify_matrix()

    # Test Case 4: Orthogonal Matrix (Rotation by 90 degrees, 2x2)
    print("\nTest Case 4: Orthogonal Matrix (Rotation by 90 degrees)")
    A4 = [[0, -1], [1, 0]]
    classifier4 = MatrixClassifier(A4)
    classifier4.classify_matrix()

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Phylogeny Analysis")
    print("=" * 60)

    # Run demonstration
    demonstrate_matrix_phylogeny()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Classified matrices into square/non-square, normal, symmetric, orthogonal, etc.")
    print("• Identified applicable operations (SVD, eigendecomposition, Cholesky, etc.)")
    print("• Demonstrated the phylogenetic relationships as per Figure 4.13")

Matrix Phylogeny Analysis
=== Matrix Phylogeny Classification ===
Section 4.7: Classifying Matrices per Figure 4.13

Test Case 1: Identity Matrix
Classifying Matrix (shape 2x2):
[1, 0]
[0, 1]

Properties:
- Real matrix
- Square
- Invertible (Regular)
- Normal
- Orthogonal
- Rotation matrix (if det = 1)
- Symmetric
- Positive definite
- Diagonal
- Identity
- Likely non-defective (simplified check)

Operations/Characteristics:
- Inverse exists
- A^T = A^-1
- Eigenvalues are real
- Cholesky decomposition exists
- Eigenvalues > 0
- Eigendecomposition likely exists

Test Case 2: Symmetric Positive Definite Matrix
Classifying Matrix (shape 2x2):
[2, 1]
[1, 2]

Properties:
- Real matrix
- Square
- Invertible (Regular)
- Normal
- Symmetric
- Likely non-defective (simplified check)

Operations/Characteristics:
- Inverse exists
- Eigenvalues are real
- Eigendecomposition likely exists

Test Case 3: Non-square Matrix
Classifying Matrix (shape 2x3):
[1, 2, 3]
[4, 5, 6]

Properties:
- Real matrix
-