In [None]:
'''
 * Copyright (c) 2016 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

We observe that the difference $ A - \hat{A}^{(k)} $ is a matrix containing the sum of the remaining rank-1 matrices

$$
A - \hat{A}^{(k)} = \sum_{i=k+1}^{r} \sigma_i u_i v_i^\top \quad \text{(Equation 4.96)}
$$

By Theorem 4.24, we immediately obtain $ \sigma_{k+1} $ as the spectral norm of the difference matrix.
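As a numerical sanity check of this statement, the following sketch estimates the spectral norm of $ A - \hat{A}^{(1)} $ for the movie-rating matrix by power iteration on $ R^\top R $, instead of reading it off the SVD. The helper name `spectral_norm_power_iteration`, the start vector, and the iteration count are assumptions made for illustration. Because it uses the four-decimal SVD values from Figure 4.10, the estimate should agree with $ \sigma_2 $ only up to rounding.

```python
import math

def spectral_norm_power_iteration(M, iters=200):
    """Estimate ||M||_2 (the largest singular value) by power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    x = [1.0] * cols  # arbitrary nonzero start vector (an assumption of this sketch)
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(cols)) for i in range(rows)]  # y = M x
        z = [sum(M[i][j] * y[i] for i in range(rows)) for j in range(cols)]  # z = M^T y
        z_norm = math.sqrt(sum(v * v for v in z))
        if z_norm == 0.0:
            return 0.0
        x = [v / z_norm for v in z]
    # x has unit length, so ||M x||_2 approximates the largest singular value
    y = [sum(M[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(v * v for v in y))

# Movie-rating matrix and the leading singular triple (rounded to four decimals)
A = [[5, 4, 1], [5, 5, 0], [0, 0, 5], [1, 0, 4]]
sigma = [9.6438, 6.3639, 0.7056]
u1 = [-0.6710, -0.7197, -0.0939, -0.1515]
v1 = [-0.7367, -0.6515, -0.1811]

# Residual A - sigma_1 u_1 v_1^T, i.e. A minus its rank-1 approximation
residual = [[A[i][j] - sigma[0] * u1[i] * v1[j] for j in range(3)] for i in range(4)]
estimate = spectral_norm_power_iteration(residual)
print(f"estimated ||A - A_hat^(1)||_2 = {estimate:.3f}, sigma_2 = {sigma[1]:.3f}")
```

Power iteration is used here only because it needs nothing beyond matrix-vector products; any routine that returns the largest singular value would serve the same purpose.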

Let us have a closer look at (4.94). If we assume that there is another matrix $ B $ with $ \text{rk}(B) \leq k $, such that

$$
\|A - B\|_2 < \|A - \hat{A}^{(k)}\|_2, \quad \text{(Equation 4.97)}
$$

then there exists an at least $ (n - k) $-dimensional null space $ Z \subseteq \mathbb{R}^n $, such that $ x \in Z $ implies that $ B x = 0 $. Then it follows that

$$
\|A x\|_2 = \|(A - B) x\|_2, \quad \text{(Equation 4.98)}
$$

and by using a version of the Cauchy-Schwarz inequality (3.17) that encompasses norms of matrices, we obtain

$$
\|A x\|_2 \leq \|A - B\|_2 \|x\|_2 < \sigma_{k+1} \|x\|_2. \quad \text{(Equation 4.99)}
$$

However, there exists a $ (k + 1) $-dimensional subspace where $ \|A x\|_2 \geq \sigma_{k+1} \|x\|_2 $, which is spanned by the right-singular vectors $ v_j $, $ j \leq k + 1 $, of $ A $. The dimensions of these two spaces add up to $ (n - k) + (k + 1) = n + 1 > n $, so there must be a nonzero vector that lies in both spaces. For this vector, the strict upper bound (4.99) and the lower bound $ \|A x\|_2 \geq \sigma_{k+1} \|x\|_2 $ would hold simultaneously, which is impossible. This is a contradiction of the rank-nullity theorem (Theorem 2.24) in Section 2.7.3.
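The lower bound used in this argument can be checked numerically for the movie-rating matrix with $ k = 1 $. The sketch below sweeps over directions in the span of $ v_1 $ and $ v_2 $ and records the ratio $ \|A x\|_2 / \|x\|_2 $. The step count and the helper names are assumptions for illustration, and the four-decimal singular vectors mean the bound holds only up to rounding.

```python
import math

A = [[5, 4, 1], [5, 5, 0], [0, 0, 5], [1, 0, 4]]
sigma_2 = 6.3639
v1 = [-0.7367, -0.6515, -0.1811]
v2 = [0.0852, 0.1762, -0.9807]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

def ratio(x):
    """Return ||A x||_2 / ||x||_2 for a nonzero vector x."""
    Ax = [sum(row[j] * x[j] for j in range(len(x))) for row in A]
    return norm(Ax) / norm(x)

# Sweep angles in [0, pi): x = cos(t) v1 + sin(t) v2 covers every direction
# in span{v1, v2} up to sign, and the ratio does not depend on the sign.
steps = 360
ratios = []
for s in range(steps):
    t = math.pi * s / steps
    x = [math.cos(t) * a + math.sin(t) * b for a, b in zip(v1, v2)]
    ratios.append(ratio(x))

min_ratio = min(ratios)
print(f"min ||Ax||/||x|| on span(v1, v2): {min_ratio:.3f} (sigma_2 = {sigma_2})")

# Dimension count from the proof: (n - k) + (k + 1) exceeds n.
n, k = 3, 1
print(f"(n - k) + (k + 1) = {(n - k) + (k + 1)} > n = {n}")
```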

The Eckart-Young theorem implies that we can use SVD to reduce a rank-$ r $ matrix $ A $ to a rank-$ k $ matrix $ \hat{A} $ in a principled, optimal (in the spectral norm sense) manner. We can interpret the approximation of $ A $ by a rank-$ k $ matrix as a form of lossy compression. Therefore, the low-rank approximation of a matrix appears in many machine learning applications, e.g., image processing, noise filtering, and regularization of ill-posed problems. Furthermore, it plays a key role in dimensionality reduction and principal component analysis, as we will see in Chapter 10.
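To make the compression reading concrete, one can count stored numbers: a rank-$ k $ approximation of an $ m \times n $ matrix needs $ k $ singular values, $ k $ left-singular vectors of length $ m $, and $ k $ right-singular vectors of length $ n $. The sketch below compares this with the $ mn $ entries of the full matrix. The function names and the $ 1000 \times 800 $ size are hypothetical and chosen only for illustration.

```python
def full_storage(m, n):
    """Number of entries in a dense m x n matrix."""
    return m * n

def rank_k_storage(m, n, k):
    """Numbers needed to store k triples (sigma_i, u_i, v_i)."""
    return k * (m + n + 1)

m, n = 1000, 800  # hypothetical matrix size
for k in (1, 10, 50):
    stored = rank_k_storage(m, n, k)
    share = stored / full_storage(m, n)
    print(f"k={k:>2}: {stored} numbers, {share:.2%} of the full matrix")
```

The saving only materializes when $ k(m + n + 1) < mn $, so for a small matrix such as the $ 4 \times 3 $ movie-rating table the benefit is interpretive rather than a matter of storage.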

### Example 4.15 (Finding Structure in Movie Ratings and Consumers (continued))

Coming back to our movie-rating example, we can now apply the concept of low-rank approximations to approximate the original data matrix. Recall that our first singular value captures the notion of a science fiction theme in movies and of science fiction lovers. Thus, by using only the first singular value term in a rank-1 decomposition of the movie-rating matrix, we obtain the predicted ratings

$$
\hat{A}^{(1)} = u_1 \sigma_1 v_1^\top = \begin{bmatrix} -0.6710 \\ -0.7197 \\ -0.0939 \\ -0.1515 \end{bmatrix} 9.6438 \begin{bmatrix} -0.7367 & -0.6515 & -0.1811 \end{bmatrix} \quad \text{(Equation 4.100a)}
$$

In [5]:
import math

# --- Transpose of a Matrix ---
def transpose(A):
    """
    Compute the transpose of matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]

# --- Matrix Multiplication ---
def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, p = len(A), len(B[0])  # result has shape m x p
    result = [[0 for _ in range(p)] for _ in range(m)]
    for i in range(m):
        for j in range(p):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

# --- Matrix-Vector Multiplication ---
def matrix_vector_multiply(A, x):
    """
    Multiply matrix A (m x n) by vector x (n x 1).
    """
    m = len(A)
    result = [0.0] * m
    for i in range(m):
        result[i] = sum(A[i][j] * x[j] for j in range(len(x)))
    return result

# --- Dot Product ---
def dot_product(x, y):
    """
    Compute the dot product of two vectors.
    """
    return sum(xi * yi for xi, yi in zip(x, y))

# --- Norm of a Vector ---
def norm(x):
    """
    Compute the Euclidean norm of a vector.
    """
    return math.sqrt(dot_product(x, x))

# --- Matrix Subtraction ---
def matrix_subtract(A, B):
    """
    Subtract matrix B from matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(m)]

# --- Verify Matrix Equality ---
def matrices_equal(A, B, tol=1e-3):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

# --- Matrix Approximation Analyzer Class ---
class MatrixApproximationAnalyzer:
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None

    def set_svd(self, U, sigma, Vt, A):
        """
        Set the SVD components manually (since we can't compute full SVD in core Python).
        """
        self.U = U
        self.sigma = sigma
        self.Vt = Vt
        self.original_matrix = A

    def rank_k_approximation(self, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        k = min(k, len(self.sigma))
        m, n = len(self.original_matrix), len(self.original_matrix[0])
        A_k = [[0 for _ in range(n)] for _ in range(m)]

        for i in range(k):
            # Compute outer product u_i v_i^T
            u_i = [self.U[j][i] for j in range(m)]
            v_i = [self.Vt[i][j] for j in range(n)]
            outer_product = [[u_i[j] * v_i[l] for l in range(n)] for j in range(m)]
            # Scale by sigma_i and add to A_k
            for j in range(m):
                for l in range(n):
                    A_k[j][l] += self.sigma[i] * outer_product[j][l]

        return A_k

    def eckart_young_residual(self, k):
        """
        Compute A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢ uᵢ vᵢᵀ (Equation 4.96).
        """
        A_k = self.rank_k_approximation(k)
        residual = matrix_subtract(self.original_matrix, A_k)
        return residual

    def spectral_norm(self):
        """
        Compute spectral norm using Theorem 4.24: ||A||₂ = σ₁.
        """
        if self.sigma is None:
            raise ValueError("SVD not computed. Set SVD components first.")
        return self.sigma[0] if len(self.sigma) > 0 else 0

    def eckart_young_error(self, k):
        """
        Compute the error according to Eckart-Young theorem: ||A - Â^(k)||₂ = σₖ₊₁.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        if k >= len(self.sigma):
            return 0.0  # Perfect reconstruction

        return self.sigma[k]  # σₖ₊₁ (k is 0-indexed, so k gives k+1)

# --- Demonstration ---
def demonstrate_eckart_young_proof():
    """
    Demonstrate the Eckart-Young theorem proof and contradiction (Equations 4.96–4.99).
    """
    print("=== Eckart-Young Theorem Proof Analysis ===")
    print("Verifying Equations 4.96–4.99\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Original Matrix A (Movie Ratings, 4x3):")
    for row in A:
        print(row)

    # Compute A - Â^(k) for k=1 (Equation 4.96)
    k = 1
    residual = analyzer.eckart_young_residual(k)
    print(f"\nA - Â^({k}) (Equation 4.96, sum of remaining rank-1 matrices):")
    for row in residual:
        print([round(x, 4) for x in row])

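    # Read the spectral norm of the residual off its SVD (Theorem 4.24): the residual's
    # singular values are sigma_{k+1}, ..., sigma_r, so the largest one is sigma_{k+1}.
    # Note: this relies on the supplied SVD; the norm is not computed numerically here.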
    residual_analyzer = MatrixApproximationAnalyzer()
    remaining_sigma = Sigma[k:]  # sigma_2, sigma_3
    U_residual = [[U[i][j] for j in range(k, len(U[0]))] for i in range(len(U))]
    Vt_residual = [[Vt[i][j] for j in range(len(Vt[0]))] for i in range(k, len(Vt))]
    residual_analyzer.set_svd(U_residual, remaining_sigma, Vt_residual, residual)
    actual_error = residual_analyzer.spectral_norm()
    theoretical_error = analyzer.eckart_young_error(k)

    print(f"\nSpectral norm of A - Â^({k}): {actual_error:.4f}")
    print(f"Theoretical error (σ_{k+1}): {theoretical_error:.4f}")
    print(f"Matches (Equation 4.96 verified): {abs(actual_error - theoretical_error) < 1e-3}")

    # Explore contradiction (Equations 4.97–4.99)
    print("\nExploring contradiction if another matrix B has smaller error (Equations 4.97–4.99):")
    print(f"For k={k}, error ||A - Â^({k})||_2 = σ_{k+1} = {theoretical_error:.4f}")
    print("If there exists B with rk(B) ≤ k and ||A - B||_2 < σ_{k+1}, then:")
    print(f"- Null space of B has dimension at least (n-k) = {len(A[0]) - k}")
    print(f"- On a (k+1)-dimensional subspace (spanned by v_1, ..., v_{k+1}), ||Ax||_2 ≥ σ_{k+1} ||x||_2")
    print("This leads to a dimensional contradiction (rank-nullity theorem), proving SVD's optimality.")

def demonstrate_movie_ratings_approximation():
    """
    Example 4.15: Compute rank-1 approximation for movie ratings (Equation 4.100a).
    """
    print("\n=== Example 4.15: Movie Ratings Rank-1 Approximation ===")
    print("Equation 4.100a\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    # Compute rank-1 approximation: Â^(1) = u_1 σ_1 v_1^T
    k = 1
    A_1 = analyzer.rank_k_approximation(k)

    print("Rank-1 Approximation Â^(1) (Equation 4.100a):")
    for row in A_1:
        print([round(x, 4) for x in row])

    # Interpretation
    print("\nInterpretation:")
    print("This rank-1 approximation captures the science fiction theme:")
    print(f"- u_1 emphasizes Star Wars ({U[0][0]:.4f}) and Blade Runner ({U[1][0]:.4f})")
    print(f"- v_1 emphasizes Ali ({Vt[0][0]:.4f}) and Beatrix ({Vt[0][1]:.4f}) as science fiction lovers")
    print("- Predicted ratings reflect this theme, with higher values for sci-fi movies and sci-fi lovers.")

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem Continued")
    print("=" * 60)

    # Run demonstrations
    demonstrate_eckart_young_proof()
    demonstrate_movie_ratings_approximation()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• A - Â^(k) = Σᵢ₌ₖ₊₁ʳ σᵢ uᵢ vᵢᵀ, with spectral norm σ_{k+1}")
    print("• Eckart-Young theorem proven via contradiction (dimensionality argument)")
    print("• Rank-1 approximation of movie ratings captures science fiction theme")
    print("• Applications in lossy compression, dimensionality reduction, and more")

Matrix Approximation and Eckart-Young Theorem Continued
=== Eckart-Young Theorem Proof Analysis ===
Verifying Equations 4.96–4.99

Original Matrix A (Movie Ratings, 4x3):
[5, 4, 1]
[5, 5, 0]
[0, 0, 5]
[1, 0, 4]

A - Â^(1) (Equation 4.96, sum of remaining rank-1 matrices):
[0.2328, -0.2158, -0.1719]
[-0.1132, 0.4782, -1.257]
[-0.6671, -0.59, 4.836]
[-0.0763, -0.9519, 3.7354]

Spectral norm of A - Â^(1): 6.3639
Theoretical error (σ_2): 6.3639
Matches (Equation 4.96 verified): True

Exploring contradiction if another matrix B has smaller error (Equations 4.97–4.99):
For k=1, error ||A - Â^(1)||_2 = σ_2 = 6.3639
If there exists B with rk(B) ≤ k and ||A - B||_2 < σ_{k+1}, then:
- Null space of B has dimension at least (n-k) = 2
- On a (k+1)-dimensional subspace (spanned by v_1, ..., v_2), ||Ax||_2 ≥ σ_2 ||x||_2
This leads to a dimensional contradiction (rank-nullity theorem), proving SVD's optimality.

=== Example 4.15: Movie Ratings Rank-1 Approximation ===
Equation 4.100a

Rank-1 Approxim

$$
= \begin{bmatrix}
4.7672 & 4.2158 & 1.1719 \\
5.1132 & 4.5218 & 1.2570 \\
0.6671 & 0.5900 & 0.1640 \\
1.0763 & 0.9519 & 0.2646
\end{bmatrix}. \quad \text{(Equation 4.100b)}
$$

This first rank-1 approximation $ \hat{A}^{(1)} $ is insightful: it tells us that Ali and Beatrix like science fiction movies, such as Star Wars and Blade Runner (entries have values $ > 4 $), but fails to capture Chandra’s ratings of the other movies. This is not surprising, as Chandra’s type of movies is not captured by the first singular value.

The second singular value gives us a better rank-1 approximation for those movie-theme lovers:

$$
\hat{A}^{(2)} = u_2 \sigma_2 v_2^\top = \begin{bmatrix} 0.0236 \\ 0.2054 \\ -0.7705 \\ -0.6030 \end{bmatrix} 6.3639 \begin{bmatrix} 0.0852 & 0.1762 & -0.9807 \end{bmatrix} \quad \text{(Equation 4.101a)}
$$

$$
= \begin{bmatrix}
0.0128 & 0.0265 & -0.1473 \\
0.1114 & 0.2303 & -1.2819 \\
-0.4178 & -0.8640 & 4.8087 \\
-0.3269 & -0.6762 & 3.7634
\end{bmatrix}. \quad \text{(Equation 4.101b)}
$$

In this second rank-1 approximation $ \hat{A}^{(2)} $, we capture Chandra’s ratings and movie types well, but not the science fiction movies.

This leads us to consider the rank-2 approximation, which sums the first two rank-1 terms. Note that the symbol $ \hat{A}^{(2)} $ now denotes this rank-2 matrix rather than the second rank-1 term in (4.101a):

$$
\hat{A}^{(2)} = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top = \begin{bmatrix}
4.7801 & 4.2419 & 1.0244 \\
5.2252 & 4.7522 & -0.0250 \\
0.2493 & -0.2743 & 4.9724 \\
0.7495 & 0.2756 & 4.0278
\end{bmatrix} \quad \text{(Equation 4.102)}
$$

$ \hat{A}^{(2)} $ is similar to the original movie ratings table

$$
A = \begin{bmatrix}
5 & 4 & 1 \\
5 & 5 & 0 \\
0 & 0 & 5 \\
1 & 0 & 4
\end{bmatrix}, \quad \text{(Equation 4.103)}
$$

and this suggests that we can ignore the contribution of the third rank-1 term $ \sigma_3 u_3 v_3^\top $, written $ \hat{A}^{(3)} $ in the notation above. We can interpret this as follows: the data table contains no evidence of a third movie-theme/movie-lovers category. This also means that the entire space of movie-themes/movie-lovers in our example is a two-dimensional space spanned by science fiction and French art house movies and lovers.
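One way to quantify how little the third term contributes is to compare squared singular values. The sketch below relies on the standard identity $ \|A\|_F^2 = \sum_i \sigma_i^2 $, which is not stated in this section and is introduced here as an assumption. The variable names are illustrative, and the singular values are the rounded ones from Figure 4.10.

```python
sigma = [9.6438, 6.3639, 0.7056]  # singular values of the movie-rating matrix

# Sum of squared singular values; equals the squared Frobenius norm of A up to rounding
total = sum(s * s for s in sigma)

# Fraction of that total retained by the rank-k approximation, for k = 1, 2, 3
captured = []
running = 0.0
for s in sigma:
    running += s * s
    captured.append(running / total)

for k, fraction in enumerate(captured, start=1):
    print(f"rank-{k} approximation retains {fraction:.2%} of the squared Frobenius norm")
```

For the movie-rating matrix the entries satisfy $ \sum_{i,j} a_{ij}^2 = 134 $, which the sum of squared singular values reproduces up to rounding.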

In [6]:
import math

# --- Matrix Multiplication ---
def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, p = len(A), len(B[0])  # result has shape m x p
    result = [[0 for _ in range(p)] for _ in range(m)]
    for i in range(m):
        for j in range(p):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

# --- Matrix Addition ---
def matrix_add(A, B):
    """
    Add two matrices A and B.
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(m)]

# --- Matrix Scalar Multiplication ---
def matrix_scalar_multiply(scalar, A):
    """
    Multiply matrix A by a scalar.
    """
    m, n = len(A), len(A[0])
    return [[scalar * A[i][j] for j in range(n)] for i in range(m)]

# --- Verify Matrix Equality ---
def matrices_equal(A, B, tol=1e-3):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

# --- Matrix Approximation Analyzer Class ---
class MatrixApproximationAnalyzer:
    def __init__(self):
        self.U = None
        self.sigma = None
        self.Vt = None
        self.original_matrix = None

    def set_svd(self, U, sigma, Vt, A):
        """
        Set the SVD components manually (since we can't compute full SVD in core Python).
        """
        self.U = U
        self.sigma = sigma
        self.Vt = Vt
        self.original_matrix = A

    def rank_1_approximation(self, i):
        """
        Compute a single rank-1 approximation: u_i v_i^T (without sigma).
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        m, n = len(self.original_matrix), len(self.original_matrix[0])
        u_i = [self.U[j][i] for j in range(m)]
        v_i = [self.Vt[i][j] for j in range(n)]
        # Compute outer product u_i v_i^T
        A_i = [[u_i[j] * v_i[l] for l in range(n)] for j in range(m)]
        return A_i

    def rank_k_approximation(self, k):
        """
        Create rank-k approximation: Â^(k) = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ.
        """
        if self.U is None:
            raise ValueError("SVD not computed. Set SVD components first.")

        k = min(k, len(self.sigma))
        m, n = len(self.original_matrix), len(self.original_matrix[0])
        A_k = [[0 for _ in range(n)] for _ in range(m)]

        for i in range(k):
            # Compute outer product u_i v_i^T
            A_i = self.rank_1_approximation(i)
            # Scale by sigma_i and add to A_k
            A_i_scaled = matrix_scalar_multiply(self.sigma[i], A_i)
            A_k = matrix_add(A_k, A_i_scaled)

        return A_k

# --- Demonstration ---
def demonstrate_movie_ratings_approximations():
    """
    Example 4.15: Compute rank-1 and rank-2 approximations for movie ratings (Equations 4.100b–4.103).
    """
    print("=== Example 4.15: Movie Ratings Approximations ===")
    print("Equations 4.100b–4.103\n")

    analyzer = MatrixApproximationAnalyzer()
    A = [[5, 4, 1],  # Star Wars
         [5, 5, 0],  # Blade Runner
         [0, 0, 5],  # Amelie
         [1, 0, 4]]  # Delicatessen

    # SVD components from Figure 4.10
    U = [[-0.6710, 0.0236, 0.4647, -0.5774],
         [-0.7197, 0.2054, -0.4759, 0.4619],
         [-0.0939, -0.7705, -0.5268, -0.3464],
         [-0.1515, -0.6030, 0.5293, -0.5774]]
    Sigma = [9.6438, 6.3639, 0.7056]  # Diagonal elements
    Vt = [[-0.7367, -0.6515, -0.1811],
          [0.0852, 0.1762, -0.9807],
          [0.6708, -0.7379, -0.0743]]
    analyzer.set_svd(U, Sigma, Vt, A)

    print("Original Matrix A (Equation 4.103):")
    for row in A:
        print(row)

    # Compute first rank-1 approximation: Â^(1) = u_1 σ_1 v_1^T (Equations 4.100a–b)
    A_1_base = analyzer.rank_1_approximation(0)  # u_1 v_1^T
    A_1 = matrix_scalar_multiply(Sigma[0], A_1_base)

    print("\nFirst Rank-1 Approximation Â^(1) (Equation 4.100b):")
    for row in A_1:
        print([round(x, 4) for x in row])

    print("\nInterpretation of Â^(1):")
    print("Captures science fiction theme:")
    print("- High values for Star Wars and Blade Runner for Ali and Beatrix (entries > 4)")
    print("- Fails to capture Chandra's ratings (small values in third column)")

    # Compute second rank-1 approximation: Â^(2) = u_2 σ_2 v_2^T (Equations 4.101a–b)
    A_2_base = analyzer.rank_1_approximation(1)  # u_2 v_2^T
    A_2 = matrix_scalar_multiply(Sigma[1], A_2_base)

    print("\nSecond Rank-1 Approximation Â^(2) (Equation 4.101b):")
    for row in A_2:
        print([round(x, 4) for x in row])

    print("\nInterpretation of Â^(2):")
    print("Captures French art house theme:")
    print("- High values for Amelie and Delicatessen for Chandra (third column, entries ~4.8087, 3.7634)")
    print("- Fails to capture science fiction movies (small values in first two columns)")

    # Compute rank-2 approximation: Â^(2) = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T (Equation 4.102)
    A_2_combined = analyzer.rank_k_approximation(2)

    print("\nRank-2 Approximation Â^(2) (Equation 4.102):")
    for row in A_2_combined:
        print([round(x, 4) for x in row])

    # Compare with original matrix
    print("\nComparison with Original Matrix A:")
    print("Original A:")
    for row in A:
        print(row)
    print("Â^(2):")
    for row in A_2_combined:
        print([round(x, 4) for x in row])

    print("\nInterpretation:")
    print("Â^(2) closely approximates A, suggesting the third singular value (σ_3 = 0.7056) is negligible.")
    print("The data is well-represented by a two-dimensional space:")
    print("- Science fiction theme (captured by Â^(1))")
    print("- French art house theme (captured by Â^(2))")
    print("No evidence of a third movie-theme category.")

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Approximation and Eckart-Young Theorem: Movie Ratings")
    print("=" * 60)

    # Run demonstration
    demonstrate_movie_ratings_approximations()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Rank-1 approximation Â^(1) captures science fiction theme and lovers")
    print("• Rank-1 approximation Â^(2) captures French art house theme and lovers")
    print("• Rank-2 approximation Â^(2) closely matches the original matrix")
    print("• Movie themes are a two-dimensional space: sci-fi and French art house")

Matrix Approximation and Eckart-Young Theorem: Movie Ratings
=== Example 4.15: Movie Ratings Approximations ===
Equations 4.100b–4.103

Original Matrix A (Equation 4.103):
[5, 4, 1]
[5, 5, 0]
[0, 0, 5]
[1, 0, 4]

First Rank-1 Approximation Â^(1) (Equation 4.100b):
[4.7672, 4.2158, 1.1719]
[5.1132, 4.5218, 1.257]
[0.6671, 0.59, 0.164]
[1.0763, 0.9519, 0.2646]

Interpretation of Â^(1):
Captures science fiction theme:
- High values for Star Wars and Blade Runner for Ali and Beatrix (entries > 4)
- Fails to capture Chandra's ratings (small values in third column)

Second Rank-1 Approximation Â^(2) (Equation 4.101b):
[0.0128, 0.0265, -0.1473]
[0.1114, 0.2303, -1.2819]
[-0.4178, -0.864, 4.8087]
[-0.3269, -0.6762, 3.7634]

Interpretation of Â^(2):
Captures French art house theme:
- High values for Amelie and Delicatessen for Chandra (third column, entries ~4.8087, 3.7634)
- Fails to capture science fiction movies (small values in first two columns)

Rank-2 Approximation Â^(2) (Equation 4.10

## 4.7 Matrix Phylogeny

The phylogenetic tree in Figure 4.13 captures the relationships between different types of matrices (black arrows indicating “is a subset of”) and the covered operations we can perform on them (in blue). We consider all real matrices $ A \in \mathbb{R}^{n \times m} $. For non-square matrices (where $ n \neq m $), the SVD always exists, as we saw in this chapter.

Focusing on square matrices $ A \in \mathbb{R}^{n \times n} $, the determinant informs us whether a square matrix possesses an inverse matrix, i.e., whether it belongs to the class of regular, invertible matrices. If the square $ n \times n $ matrix possesses $ n $ linearly independent eigenvectors, then the matrix is non-defective and an eigendecomposition exists (Theorem 4.12). We know that repeated eigenvalues may result in defective matrices, which cannot be diagonalized.
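A small example may help separate "repeated eigenvalue" from "defective". The sketch below, restricted to $ 2 \times 2 $ matrices, computes the geometric multiplicity of an eigenvalue $ \lambda $ as $ 2 - \text{rk}(A - \lambda I) $. The helper names and the two test matrices are assumptions chosen for illustration. Both matrices have the eigenvalue 1 repeated twice, but only one of them has two linearly independent eigenvectors.

```python
def rank_2x2(M, tol=1e-12):
    """Rank of a 2x2 matrix, determined from its entries and determinant."""
    if all(abs(M[i][j]) < tol for i in range(2) for j in range(2)):
        return 0
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return 1 if abs(det) < tol else 2

def geometric_multiplicity(A, lam):
    """Dimension of the eigenspace of a 2x2 matrix A for the eigenvalue lam."""
    shifted = [[A[0][0] - lam, A[0][1]],
               [A[1][0], A[1][1] - lam]]
    return 2 - rank_2x2(shifted)

shear = [[1, 1], [0, 1]]     # eigenvalue 1 with algebraic multiplicity 2
identity = [[1, 0], [0, 1]]  # eigenvalue 1 with algebraic multiplicity 2

print(geometric_multiplicity(shear, 1))     # 1: a single eigenvector direction, so defective
print(geometric_multiplicity(identity, 1))  # 2: two independent eigenvectors, so non-defective
```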

Non-singular and non-defective matrices are not the same. For example, a rotation matrix will be invertible (determinant is nonzero) but not diagonalizable in the real numbers (eigenvalues are not guaranteed to be real numbers).
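The rotation example can be checked directly for a $ 2 \times 2 $ matrix: the determinant is nonzero, while the discriminant of the characteristic polynomial $ \lambda^2 - \text{tr}(A)\lambda + \det(A) $ is negative, so no real eigenvalues exist. The function names and the choice of a 90-degree rotation are assumptions for illustration.

```python
import math

def rotation(theta):
    """2x2 rotation matrix for the angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def det_and_discriminant(M):
    """Return det(M) and the discriminant tr(M)^2 - 4 det(M) of the
    characteristic polynomial of a 2x2 matrix M."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    trace = M[0][0] + M[1][1]
    return det, trace * trace - 4 * det

R = rotation(math.pi / 2)
det, disc = det_and_discriminant(R)
print(f"det = {det:.3f} (nonzero, so invertible)")
print(f"discriminant = {disc:.3f} (negative, so no real eigenvalues)")
```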

We dive further into the branch of non-defective square $ n \times n $ matrices. $ A $ is normal if the condition $ A^\top A = A A^\top $ holds. Moreover, if the more restrictive condition holds that

$$
A^\top A = A A^\top = I,
$$

then $ A $ is called orthogonal (see Definition 3.8). The set of orthogonal matrices is a subset of the regular (invertible) matrices and satisfies $ A^\top = A^{-1} $.

Normal matrices have a frequently encountered subset, the symmetric matrices $ S \in \mathbb{R}^{n \times n} $, which satisfy $ S = S^\top $. Symmetric matrices have only real eigenvalues. A subset of the symmetric matrices consists of the positive definite matrices $ P $ that satisfy the condition of $ x^\top P x > 0 $ for all $ x \in \mathbb{R}^n \setminus \{0\} $. In this case, a unique Cholesky decomposition exists (Theorem 4.18). Positive definite matrices have only positive eigenvalues and are always invertible (i.e., have a nonzero determinant).
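The link between positive definiteness and the Cholesky decomposition can be sketched in core Python. The function below attempts the factorization $ S = L L^\top $ and gives up as soon as a pivot is not positive. It assumes its input is symmetric and does not check this. The name `cholesky` and the two test matrices are illustrative choices rather than part of the text above.

```python
import math

def cholesky(S):
    """Return a lower-triangular L with S = L L^T, or None if a non-positive
    pivot appears (a symmetric S is then not positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = S[i][i] - partial
                if pivot <= 0:
                    return None
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (S[i][j] - partial) / L[j][j]
    return L

print(cholesky([[2, 1], [1, 2]]))  # a lower-triangular factor exists
print(cholesky([[1, 2], [2, 1]]))  # None: symmetric but not positive definite
```

Attempting the factorization is a common practical test for positive definiteness, since it avoids computing eigenvalues.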

Another subset of symmetric matrices consists of the diagonal matrices $ D $. Diagonal matrices are closed under multiplication and addition, but do not necessarily form a group (this is only the case if all diagonal entries are nonzero so that the matrix is invertible). A special diagonal matrix is the identity matrix $ I $.
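The closure and invertibility statements for diagonal matrices can be illustrated with a few lines of core Python. The helper names `diag`, `matmul`, `matadd`, and `diag_inverse` are assumptions introduced for this sketch.

```python
def diag(entries):
    """Build a square diagonal matrix from a list of diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def diag_inverse(entries):
    """Inverse of diag(entries), or None if any diagonal entry is zero."""
    if any(d == 0 for d in entries):
        return None
    return diag([1 / d for d in entries])

D1, D2 = diag([2, 3]), diag([5, 0])
print(matmul(D1, D2))        # [[10, 0], [0, 0]]: still diagonal
print(matadd(D1, D2))        # [[7, 0], [0, 3]]: still diagonal
print(diag_inverse([2, 4]))  # [[0.5, 0], [0, 0.25]]
print(diag_inverse([5, 0]))  # None: a zero on the diagonal rules out an inverse
```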

In [7]:
import math

# --- Matrix Operations ---
def transpose(A):
    """
    Compute the transpose of matrix A.
    """
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]

def matrix_multiply(A, B):
    """
    Multiply two matrices A (m x n) and B (n x p).
    """
    m, p = len(A), len(B[0])  # result has shape m x p
    result = [[0 for _ in range(p)] for _ in range(m)]
    for i in range(m):
        for j in range(p):
            result[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return result

def matrices_equal(A, B, tol=1e-6):
    """
    Check if two matrices are equal within a tolerance.
    """
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))

def dot_product(x, y):
    """
    Compute the dot product of two vectors.
    """
    return sum(xi * yi for xi, yi in zip(x, y))

# --- Matrix Classifier Class ---
class MatrixClassifier:
    def __init__(self, A):
        self.A = A
        self.m = len(A)
        self.n = len(A[0]) if A else 0
        self.A_T = transpose(A) if A else []

    def is_square(self):
        """
        Check if the matrix is square (m = n).
        """
        return self.m == self.n

    def determinant_2x2(self):
        """
        Compute determinant for a 2x2 matrix.
        """
        if self.m != 2 or self.n != 2:
            raise ValueError("Matrix must be 2x2")
        return self.A[0][0] * self.A[1][1] - self.A[0][1] * self.A[1][0]

    def is_invertible(self):
        """
        Check if the matrix is invertible (nonzero determinant, square matrix only).
        Simplified for 2x2 matrices.
        """
        if not self.is_square():
            return False
        if self.m == 2:
            det = self.determinant_2x2()
            return abs(det) > 1e-6
        # For larger matrices, determinant computation is complex without libraries
        return None  # Placeholder for non-2x2 matrices

    def is_symmetric(self):
        """
        Check if the matrix is symmetric (A = A^T).
        """
        if not self.is_square():
            return False
        return matrices_equal(self.A, self.A_T)

    def is_normal(self):
        """
        Check if the matrix is normal (A^T A = A A^T).
        """
        if not self.is_square():
            return False
        A_T_A = matrix_multiply(self.A_T, self.A)
        A_A_T = matrix_multiply(self.A, self.A_T)
        return matrices_equal(A_T_A, A_A_T)

    def is_orthogonal(self):
        """
        Check if the matrix is orthogonal (A^T A = A A^T = I).
        """
        if not self.is_square():
            return False
        n = self.n
        I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
        A_T_A = matrix_multiply(self.A_T, self.A)
        return matrices_equal(A_T_A, I) and matrices_equal(matrix_multiply(self.A, self.A_T), I)

    def is_diagonal(self):
        """
        Check if the matrix is diagonal (non-diagonal entries are zero).
        """
        if not self.is_square():
            return False
        for i in range(self.m):
            for j in range(self.n):
                if i != j and abs(self.A[i][j]) > 1e-6:
                    return False
        return True

    def is_identity(self):
        """
        Check if the matrix is the identity matrix.
        """
        if not self.is_diagonal():
            return False
        return all(abs(self.A[i][i] - 1) < 1e-6 for i in range(self.m))

    def is_positive_definite(self):
        """
        Check if the matrix is positive definite (x^T A x > 0 for all x ≠ 0).
        Simplified: handles diagonal matrices (all diagonal entries positive) and
        symmetric 2x2 matrices (Sylvester's criterion: a11 > 0 and det > 0).
        The general case requires eigenvalues, which is complex without libraries.
        """
        if not self.is_symmetric():
            return False
        if self.is_diagonal():
            return all(self.A[i][i] > 0 for i in range(self.m))
        if self.m == 2:
            return self.A[0][0] > 0 and self.determinant_2x2() > 0
        # For larger non-diagonal matrices, we'd need eigenvalues
        return None  # Placeholder

    def classify_matrix(self):
        """
        Classify the matrix according to the phylogeny in Figure 4.13.
        """
        print(f"Classifying Matrix (shape {self.m}x{self.n}):")
        for row in self.A:
            print(row)

        properties = []
        operations = []

        # Step 1: Real matrix
        properties.append("Real matrix")

        # Step 2: Square or non-square
        if self.is_square():
            properties.append("Square")
            # Check invertibility
            invertible = self.is_invertible()
            if invertible is True:
                properties.append("Invertible (Regular)")
                operations.append("Inverse exists")
            elif invertible is False:
                properties.append("Singular (det = 0)")

            # Check for normal matrix
            if self.is_normal():
                properties.append("Normal")
                # Check for orthogonal matrix
                if self.is_orthogonal():
                    properties.append("Orthogonal")
                    properties.append("Rotation matrix (if det = 1)")
                    operations.append("A^T = A^-1")

                # Check for symmetric matrix
                if self.is_symmetric():
                    properties.append("Symmetric")
                    operations.append("Eigenvalues are real")
                    # Check for positive definite
                    pd = self.is_positive_definite()
                    if pd is True:
                        properties.append("Positive definite")
                        operations.append("Cholesky decomposition exists")
                        operations.append("Eigenvalues > 0")
                    elif pd is False:
                        properties.append("Not positive definite")

                    # Check for diagonal matrix
                    if self.is_diagonal():
                        properties.append("Diagonal")
                        # Check for identity matrix
                        if self.is_identity():
                            properties.append("Identity")

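            # Eigendecomposition (simplified check). Invertibility does not imply
            # non-defectiveness, so symmetry is used instead: a real symmetric
            # matrix always has n linearly independent eigenvectors.
            if self.is_symmetric():
                properties.append("Likely non-defective (simplified check)")
                operations.append("Eigendecomposition likely exists")
            else:
                properties.append("Possibly defective (simplified check)")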

        else:
            properties.append("Nonsquare")
            operations.append("SVD exists")
            operations.append("Pseudo-inverse exists")

        # Print classification
        print("\nProperties:")
        for prop in properties:
            print(f"- {prop}")

        print("\nOperations/Characteristics:")
        for op in operations:
            print(f"- {op}")

# --- Demonstration ---
def demonstrate_matrix_phylogeny():
    """
    Demonstrate matrix classification using examples.
    """
    print("=== Matrix Phylogeny Classification ===")
    print("Section 4.7: Classifying Matrices per Figure 4.13\n")

    # Test Case 1: Identity Matrix (2x2)
    print("Test Case 1: Identity Matrix")
    A1 = [[1, 0], [0, 1]]
    classifier1 = MatrixClassifier(A1)
    classifier1.classify_matrix()

    # Test Case 2: Symmetric Positive Definite Matrix (2x2)
    print("\nTest Case 2: Symmetric Positive Definite Matrix")
    A2 = [[2, 1], [1, 2]]
    classifier2 = MatrixClassifier(A2)
    classifier2.classify_matrix()

    # Test Case 3: Non-square Matrix (2x3)
    print("\nTest Case 3: Non-square Matrix")
    A3 = [[1, 2, 3], [4, 5, 6]]
    classifier3 = MatrixClassifier(A3)
    classifier3.classify_matrix()

    # Test Case 4: Orthogonal Matrix (Rotation by 90 degrees, 2x2)
    print("\nTest Case 4: Orthogonal Matrix (Rotation by 90 degrees)")
    A4 = [[0, -1], [1, 0]]
    classifier4 = MatrixClassifier(A4)
    classifier4.classify_matrix()

# --- Main Execution ---
if __name__ == "__main__":
    print("Matrix Phylogeny Analysis")
    print("=" * 60)

    # Run demonstration
    demonstrate_matrix_phylogeny()

    print("\n" + "=" * 60)
    print("Summary of Key Results:")
    print("• Classified matrices into square/non-square, normal, symmetric, orthogonal, etc.")
    print("• Identified applicable operations (SVD, eigendecomposition, Cholesky, etc.)")
    print("• Demonstrated the phylogenetic relationships as per Figure 4.13")

Matrix Phylogeny Analysis
=== Matrix Phylogeny Classification ===
Section 4.7: Classifying Matrices per Figure 4.13

Test Case 1: Identity Matrix
Classifying Matrix (shape 2x2):
[1, 0]
[0, 1]

Properties:
- Real matrix
- Square
- Invertible (Regular)
- Normal
- Orthogonal
- Rotation matrix (if det = 1)
- Symmetric
- Positive definite
- Diagonal
- Identity
- Likely non-defective (simplified check)

Operations/Characteristics:
- Inverse exists
- A^T = A^-1
- Eigenvalues are real
- Cholesky decomposition exists
- Eigenvalues > 0
- Eigendecomposition likely exists

Test Case 2: Symmetric Positive Definite Matrix
Classifying Matrix (shape 2x2):
[2, 1]
[1, 2]

Properties:
- Real matrix
- Square
- Invertible (Regular)
- Normal
- Symmetric
- Positive definite
- Likely non-defective (simplified check)

Operations/Characteristics:
- Inverse exists
- Eigenvalues are real
- Cholesky decomposition exists
- Eigenvalues > 0
- Eigendecomposition likely exists

Test Case 3: Non-square Matrix
Classifying Matrix (shape 2x3):
[1, 2, 3]
[4, 5, 6]

Properties:
- Real matrix
-