# Eigenvalue Problems

## 04.01 Eigenvalue Problems

Given an $n \times n$ matrix $A$, find scalar eigenvalue $\lambda$ and nonzero eigenvector $x$ such that:
$$
Ax = \lambda x
$$

* Note: $\lambda$ might be complex, even when $A$ is real
* $x$ is sometimes referred to as the *right eigenvector*
* Interpretation: Eigenvalues and eigenvectors decompose complex linear behavior into simpler actions.

Alternatively, there is the *left eigenvector*, which satisfies:
$$
y^T A = \lambda y^T
$$
where
* $y$ is the left eigenvector
* Theoretically interesting, but not important for computation

**Spectrum** $\lambda(A)$ is the set of all eigenvalues of $A$.

**Spectral Radius** $\rho(A)$ is the largest absolute value of any eigenvalue: $\rho(A) = \max \{ |\lambda| : \lambda \in \lambda(A) \}$.
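
As a small illustration of the definitions above, the sketch below computes left eigenvectors as the right eigenvectors of $A^T$ (since $y^T A = \lambda y^T$ is equivalent to $A^T y = \lambda y$) and then the spectrum and spectral radius. The example matrix is an arbitrary choice, not one taken from the text.

```python
import numpy as np

# Arbitrary example matrix (an assumption for illustration).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Right eigenvectors: columns of X satisfy A x = lambda x.
w, X = np.linalg.eig(A)

# Left eigenvectors: y^T A = lambda y^T is the same as A^T y = lambda y,
# so they are the right eigenvectors of A^T.
w_left, Y = np.linalg.eig(A.T)

# Check y^T A = lambda y^T for every left eigenpair.
for lam, y in zip(w_left, Y.T):
    assert np.allclose(y @ A, lam * y)

# Spectrum (sorted) and spectral radius.
spectrum = np.sort(w)
spectral_radius = np.max(np.abs(w))
print("spectrum:", spectrum)
print("spectral radius:", spectral_radius)
```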

Use numpy to compute eigenvalues and eigenvectors of some simple matrices.

In [1]:
import numpy as np

# Diagonal matrix.
A = np.array([1,0,0,2]).reshape(2,2)
w, v = np.linalg.eig(A)
# Confirm that Av = wv.
np.testing.assert_almost_equal(np.matmul(A, v), w * v)
# Confirm eigenvalues are equal to diagonal.
np.testing.assert_almost_equal(w, np.diag(A))

# Triangular matrix.
A = np.array([1,1,0,2]).reshape(2,2)
w, v = np.linalg.eig(A)
np.testing.assert_almost_equal(np.matmul(A, v), w * v)
# Confirm eigenvalues are equal to diagonal.
np.testing.assert_almost_equal(w, np.diag(A))

# Symmetric matrix.
A = np.array([3,-1,-1,3]).reshape(2,2)
w, v = np.linalg.eig(A)
np.testing.assert_almost_equal(np.matmul(A, v), w * v)

# Symmetric matrix.
A = np.array([1.5,0.5,0.5,1.5]).reshape(2,2)
w, v = np.linalg.eig(A)
np.testing.assert_almost_equal(np.matmul(A, v), w * v)

# Real matrix with complex eigenvalues (+i and -i).
A = np.array([0,1,-1,0]).reshape(2,2)
w, v = np.linalg.eig(A)
np.testing.assert_almost_equal(np.matmul(A, v), w * v)

## 04.02 Characteristic Polynomial and Multiplicity

The eigenvalue problem can be rewritten as a homogeneous linear system by subtracting $\lambda$ from the diagonal of $A$.
$$
(A - \lambda I) x = 0
$$

For this system to have a nonzero solution $x$, the matrix $(A - \lambda I)$ must be singular. The determinant of $(A - \lambda I)$ is a polynomial in $\lambda$ known as the **characteristic polynomial** of $A$, and setting it to zero gives the **characteristic equation**.
$$
\text{det}(A - \lambda I) = 0
$$

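For a $2 \times 2$ matrix with entries $a, b, c, d$, the determinant expands to:
$$
\begin{aligned}
\text{det}(A - \lambda I)
&= \text{det}
\begin{bmatrix}
a - \lambda & b \\
c & d - \lambda \\
\end{bmatrix} \\
&= (a - \lambda) (d - \lambda) - bc \\
&= \lambda^2 - (a + d) \lambda + (ad - bc)
\end{aligned}
$$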
where
* $(a + d)$ is the trace of the matrix $A$
* $(ad - bc)$ is the determinant of the matrix $A$

In other words, the eigenvalues of $A$ are the roots of the **characteristic equation**.
* Useful theoretical tool, but not useful numerically, because the roots of a polynomial can be very sensitive to small changes in its coefficients.
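
To make the $2 \times 2$ formula concrete, the sketch below builds the polynomial $\lambda^2 - (a + d)\lambda + (ad - bc)$ from the trace and determinant and compares its roots with the eigenvalues numpy computes directly. The matrix is one of the symmetric examples used earlier; this is for illustration only and not a recommended numerical method.

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [-1.0, 3.0]])

# Coefficients of lambda^2 - (a + d) lambda + (ad - bc), highest power first.
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]

# Roots of the characteristic polynomial versus eigenvalues computed directly.
roots = np.sort(np.roots(coeffs))
eigenvalues = np.sort(np.linalg.eigvals(A))
print("roots:      ", roots)
print("eigenvalues:", eigenvalues)
```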

#### Companion Matrix
For any monic polynomial, there is an associated matrix whose eigenvalues are the roots of the polynomial.
* A polynomial is **monic** if its leading coefficient is 1.
* The matrix is referred to as the **companion matrix**.
* The companion matrix has value 1 on the subdiagonal and the negated coefficients of the polynomial (lowest power first) in the last column of the matrix.
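
A minimal sketch of this construction, assuming the polynomial is given by its non-leading coefficients ordered from the constant term upward. The helper name `companion` and the example cubic are choices made here for illustration.

```python
import numpy as np

def companion(c):
    """Companion matrix of the monic polynomial
    p(t) = c[0] + c[1] t + ... + c[n-1] t^(n-1) + t^n."""
    c = np.asarray(c, dtype=float)
    n = c.size
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)   # ones on the subdiagonal
    C[:, -1] = -c                # negated coefficients in the last column
    return C

# p(t) = t^3 - 6 t^2 + 11 t - 6 = (t - 1)(t - 2)(t - 3)
C = companion([-6.0, 11.0, -6.0])
roots = np.sort(np.linalg.eigvals(C).real)
print(roots)
```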

#### Multiplicity
The (algebraic) multiplicity of an eigenvalue is the number of times the root appears when the characteristic polynomial is written as a product of linear factors.
* A defective matrix has an eigenvalue of multiplicity $k \gt 1$ with fewer than $k$ linearly independent corresponding eigenvectors.
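
The sketch below checks this definition on a small example chosen here for illustration: a matrix whose eigenvalue 2 has multiplicity 2 but only one independent eigenvector. The number of independent eigenvectors is computed as the dimension of the null space of $A - \lambda I$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# Algebraic multiplicity: how many times lam appears among the eigenvalues.
algebraic = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

# Number of independent eigenvectors: dimension of the null space of (A - lam I).
geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n))

is_defective = geometric < algebraic
print(algebraic, geometric, is_defective)
```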

#### Diagonalizable
A nondefective matrix $A$ has $n$ linearly independent eigenvectors and can be diagonalized by the similarity transformation:
$$
\begin{aligned}
AX &= XD \\
X^{-1}AX &= D
\end{aligned}
$$
where
* $D$ is a diagonal matrix with $\lambda_1, \ldots, \lambda_n$ along the diagonal
* $X$ is the matrix whose columns are the linearly independent eigenvectors $x_1, \ldots, x_n$
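
A quick numerical check of $X^{-1} A X = D$, using an arbitrary nonsymmetric matrix with distinct eigenvalues (an assumption made for this example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # eigenvalues 5 and 2

w, X = np.linalg.eig(A)             # columns of X are eigenvectors
X_inv = np.linalg.inv(X)

# Similarity transformation to diagonal form.
D = X_inv @ A @ X

# The factorization can be reversed: A = X D X^{-1}.
A_rebuilt = X @ np.diag(w) @ X_inv
print(np.round(D, 12))
```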

#### Uniqueness
Eigenvectors are not unique since they can be scaled by any nonzero scalar.
* Eigenvectors are typically normalized to unit length.
* Even after normalization the sign is arbitrary: multiplying by $-1$ flips the sign of every component.

Eigenspace, $S_{\lambda}$, is the set of all vectors $x$ such that $Ax = \lambda x$, that is, the eigenvectors for $\lambda$ together with the zero vector.
* $S_{\lambda}$ is a subspace of $\mathbb{R}^n$ or $\mathbb{C}^n$

## 04.03 Relevant Properties of Matrices

Properties relevant to eigenvalue problems:
* diagonal
  * eigenvalues are diagonal entries
* triangular
  * eigenvalues are diagonal entries
* tridiagonal
  * elements above the first superdiagonal and below the first subdiagonal are 0
* Hessenberg
  * like a triangular matrix, except one additional adjacent diagonal may be nonzero: the first subdiagonal (upper Hessenberg) or the first superdiagonal (lower Hessenberg)
* orthogonal
  * $A^T A = A A^T = I$ where columns are orthonormal
* unitary
  * $A^H A = A A^H = I$ where $H$ is the conjugate transpose
  * complex analogue to orthogonal
* symmetric
  * $A = A^T$
  * eigenvalues of a real symmetric matrix are real
* Hermitian
  * $A = A^H$
  * complex analogue of symmetric
  * eigenvalues are real
* normal
  * $A^H A = A A^H$
  * has a full set of orthonormal eigenvectors

## 04.04 Conditioning of Eigenvalue Problems

Conditioning refers to the sensitivity of the eigenvalues and eigenvectors to small changes in the matrix.
* Previously we looked at conditioning as a property of the matrix itself, but for eigenvalue problems the sensitivity depends on the conditioning of the matrix of eigenvectors, not on $\text{cond}(A)$.

For a normal matrix ($A^H A = A A^H$), the eigenvectors are orthogonal and the eigenvalues are well-conditioned.

Check properties and conditioning of an example matrix with numpy.

In [2]:
import numpy as np

A = np.array([-149,-50,-154,537,180,546,-27,-9,-25]).reshape(3,3)

# Compute the eigenvalues and right eigenvector.
w, v = np.linalg.eig(A)
print("eigenvalues: ", w)

# Since eigenvalues are distinct, A is diagonalizable.
# A X = X D where D is the diagonal matrix formed from eigenvalues.
np.testing.assert_almost_equal(np.matmul(A, v), np.matmul(v, np.diag(w)))

# Check whether A is normal.
is_normal = np.equal(np.matmul(A.T, A), np.matmul(A, A.T))
print("is_normal: ", np.all(is_normal))

# Check the condition number of the matrix of right eigenvectors.
# If value is large, then eigenvalues may be sensitive to perturbations in A.
print("cond(v): ", np.linalg.cond(v))

eigenvalues:  [1. 2. 3.]
is_normal:  False
cond(v):  1288.943965937134
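
To see what this condition number means in practice, the sketch below perturbs the same matrix slightly and compares how far the eigenvalues move with the Bauer-Fike bound $\text{cond}(V) \, \|E\|_2$. The bound is a standard result brought in here for illustration rather than something stated above, and the perturbation $E$ and its size are arbitrary choices.

```python
import numpy as np

A = np.array([-149, -50, -154, 537, 180, 546, -27, -9, -25],
             dtype=float).reshape(3, 3)
w, V = np.linalg.eig(A)

# Small perturbation of every entry (size chosen arbitrarily for illustration).
E = 1e-6 * np.ones((3, 3))
w_perturbed = np.linalg.eigvals(A + E)

# Distance from each perturbed eigenvalue to the nearest original eigenvalue.
shifts = np.array([np.min(np.abs(w - mu)) for mu in w_perturbed])

# Bauer-Fike bound for a diagonalizable matrix: shift <= cond(V) * ||E||_2.
bound = np.linalg.cond(V) * np.linalg.norm(E, 2)

print("||E||_2:       ", np.linalg.norm(E, 2))
print("largest shift: ", shifts.max())
print("bound:         ", bound)
```

The bound is often pessimistic, but it shows why a large `cond(v)` warns that eigenvalues can move by much more than the size of the perturbation.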


## 04.05 Computing Eigenvalues and Eigenvectors

#### Problem Transformations
* Shift
  * If $Ax = \lambda x$ and $\sigma$ is a scalar, then $(A - \sigma I) x = (\lambda - \sigma) x$.
  * Eigenvalues of shifted matrix are shifted eigenvalues of original matrix.
  * Eigenvectors are unchanged.
* Inversion
  * If $A$ is nonsingular and $Ax = \lambda x$ with $x \neq 0$, then $A^{-1}x = \frac{1}{\lambda}x$.
  * Eigenvectors are unchanged.
* Powers
  * If $Ax = \lambda x$, then $A^k x = \lambda^k x$
  * Raising a matrix to a power $k$ also raises eigenvalues to power $k$.
  * Eigenvectors are unchanged.
* Polynomial
  * If $Ax = \lambda x$ and $p(t)$ is a polynomial, then $p(A)x = p(\lambda)x$
* Similarity
  * $B$ is similar to $A$ if there is a nonsingular matrix $T$ such that $B = T^{-1} A T$.
  * $A$ and $B$ will have the same eigenvalues.
  * If $y$ is an eigenvector of $B$, then $x = Ty$ is an eigenvector of $A$.
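
The transformations above can be checked numerically. The sketch below uses a small symmetric matrix, with the shift $\sigma$ and the matrix $T$ chosen arbitrarily for this example.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])          # eigenvalues 2 and 4
I = np.eye(2)
lam = np.linalg.eigvalsh(A)         # sorted ascending (A is symmetric)

# Shift, inversion, and powers each act directly on the eigenvalues.
sigma = 1.0
shifted = np.linalg.eigvalsh(A - sigma * I)
inverted = np.linalg.eigvalsh(np.linalg.inv(A))
squared = np.linalg.eigvalsh(A @ A)

# Similarity with an arbitrary nonsingular T preserves the eigenvalues.
T = np.array([[1.0, 2.0],
              [0.0, 1.0]])
similar = np.sort(np.linalg.eigvals(np.linalg.inv(T) @ A @ T).real)

print(lam, shifted, inverted, squared, similar)
```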

Why are similarity transforms useful?
* $X^{-1} A X = D$ yields diagonal matrix $D$
* $D$ has eigenvalues along diagonal
* $X$ has the eigenvectors of $A$ as its columns
* Not all matrices are diagonalizable by similarity transformation.

#### Forms Attainable By Similarity
$B$ is similar to $A$ if there is a nonsingular matrix $T$ such that $B = T^{-1} A T$.

| A | T | B |
|---|---|---|
| distinct eigenvalues | nonsingular | diagonal |
| real symmetric | orthogonal | real diagonal |
| complex Hermitian | unitary | real diagonal |
| normal | unitary | diagonal |
| arbitrary real | orthogonal | real block triangular |
| arbitrary | unitary | upper triangular |
| arbitrary | nonsingular | almost diagonal (Jordan) |

## 04.06 Power Iteration

Compute the dominant eigenvalue and corresponding eigenvector of the $n \times n$ matrix $A$; the iterates $\lambda_k$ and $x_k$ approximate them.
1. Start with some nonzero vector $x_0$.
2. Compute $y_k = A x_{k-1}$, normalize $x_k = y_k / \|y_k\|_2$, and estimate $\lambda_k = x_k^T A x_k$.
  * Normalizing after each iteration avoids overflow (or underflow if $|\lambda| \lt 1$) and makes $x_k^T A x_k$ the Rayleigh quotient of $x_k$.
3. Repeat the previous step until $| \lambda_k - \lambda_{k-1} | \lt \epsilon$.

After convergence, the value $\lambda_k$ is the dominant eigenvalue with eigenvector $x_k$.
* The dominant eigenvalue is the eigenvalue having maximum absolute value.

The convergence rate of power iteration depends on the ratio $|\lambda_2 / \lambda_1|$, where $\lambda_1$ is the dominant eigenvalue and $\lambda_2$ is the eigenvalue having second-largest absolute value.
* The smaller the ratio, the faster the convergence.
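
The effect of the ratio can be observed directly. In the sketch below the matrix has eigenvalues 4 and 2, so the error in the eigenvector should shrink by a factor of roughly $|\lambda_2 / \lambda_1| = 0.5$ per iteration. The starting vector and iteration count are arbitrary choices.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])               # eigenvalues 4 and 2, ratio 0.5
v1 = np.array([1.0, 1.0]) / np.sqrt(2)   # dominant eigenvector (unit length)

x = np.array([1.0, 0.0])
errors = []
for _ in range(20):
    x = A @ x
    x = x / np.linalg.norm(x)
    errors.append(np.linalg.norm(x - v1))

# Ratio of successive errors; this should approach |lambda_2 / lambda_1|.
errors = np.array(errors)
ratios = errors[1:] / errors[:-1]
print(ratios[-5:])
```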

Find the dominant eigenvalue and eigenvector of the $n \times n$ matrix $A$ using power iteration.

In [3]:
import numpy as np

def power_iteration(A):
    """
    Use power iteration to compute the dominant eigenvalue of A.
    """
    n = A.shape[0]
    x_k = np.random.random((n,))  # Any nonzero vector.
    lambda_k, lambda_prev = 1., 0. 

    while abs(lambda_k - lambda_prev) > np.finfo('d').eps:
        x_k = np.matmul(A, x_k)
        x_k = x_k / np.linalg.norm(x_k)

        # Use lambda as the termination condition.
        lambda_prev = lambda_k
        lambda_k = np.dot(x_k.T, np.matmul(A, x_k))

    return lambda_k, x_k

A = np.array([3,1,1,3]).reshape(2,2)

# Compute the dominant eigenvalue and corresponding eigenvector.
w_max, v_max = power_iteration(A)

# Compute the eigenvalues and right eigenvector using numpy.
w, v = np.linalg.eig(A)

# Compare the dominant eigenvalue to numpy.
i_max = np.argmax(np.absolute(w))
np.testing.assert_almost_equal(w_max, w[i_max])

# Compare the dominant eigenvector to numpy.
# Eigenvectors are only unique up to sign, so compare absolute values.
np.testing.assert_almost_equal(np.abs(v_max), np.abs(v[:, i_max]))

## 04.07 Inverse and Rayleigh Quotient Iterations

To compute the smallest eigenvalue (in absolute value) of matrix $A$, use the fact that the eigenvalues of $A^{-1}$ are the reciprocals of the eigenvalues of $A$. As a result, the smallest eigenvalue of $A$ is the reciprocal of the largest eigenvalue of $A^{-1}$.
* In practice, we factorize $A$ once and solve for $y_k$ at each step rather than explicitly computing $A^{-1}$.

#### Inverse Iteration
Compute the eigenvalue of smallest absolute value and the corresponding eigenvector of the $n \times n$ matrix $A$; the iterates $\lambda_k$ and $x_k$ approximate them.
1. Start with some nonzero vector $x_0$.
2. Solve $A y_k = x_{k-1}$ for $y_k$.
3. Compute $x_k = y_k / \|y_k\|_2$ and $\lambda_k = x_k^T A x_k$.
4. Repeat steps 2 and 3 until $| \lambda_k - \lambda_{k-1} | \lt \epsilon$.

#### Rayleigh Quotient
Given an approximate eigenvector $x$ for a real matrix $A$, the best estimate of the eigenvalue $\lambda$ is the least-squares solution of $x \lambda \approxeq A x$, which is the **Rayleigh Quotient**:
$$
\lambda = \frac{x^T A x}{x^T x}
$$

The Rayleigh Quotient can be combined with the inverse iteration method by using the most recent estimate as a shift and solving $(A - \lambda_{k-1} I) y_k = x_{k-1}$ for $y_k$.
* Since the shifted matrix changes with each iteration, the factorization must be repeated each time.
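
A minimal sketch of Rayleigh quotient iteration for a real symmetric matrix follows. The function name, tolerance, iteration cap, and starting vector are assumptions made for this example. The loop stops once the residual is small, which also avoids solving with a shift that has converged to an exact eigenvalue (a singular system).

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, tol=1e-12, max_iter=50):
    """Sketch of Rayleigh quotient iteration for a real symmetric A."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    lam = x @ A @ x                       # Rayleigh quotient (x has unit length)
    for _ in range(max_iter):
        # Stop once (lam, x) is an eigenpair to within tol.
        if np.linalg.norm(A @ x - lam * x) < tol:
            break
        try:
            # The shifted matrix changes every iteration, so the system is
            # factorized and solved from scratch each time.
            y = np.linalg.solve(A - lam * np.eye(n), x)
        except np.linalg.LinAlgError:
            break                         # shift is an exact eigenvalue
        x = y / np.linalg.norm(y)
        lam = x @ A @ x
    return lam, x

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
lam, x = rayleigh_quotient_iteration(A, np.array([1.0, 0.5]))
print(lam, x)
```

Unlike plain inverse iteration, which eigenpair is found depends on the starting vector, since the initial Rayleigh quotient determines the first shift.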

Find the smallest eigenvalue and eigenvector of the $n \times n$ matrix $A$ using inverse iteration.

In [4]:
import numpy as np
import scipy.linalg as la

def inverse_iteration(A):
    """
    Use inverse iteration to compute the smallest eigenvalue of A.
    """
    n = A.shape[0]
    x_k = np.random.random((n,))  # Any nonzero vector.
    lambda_k, lambda_prev = 1., 0. 

    # Factorize A = P L U once; each iteration then needs only two triangular solves.
    # NOTE(mmorais): Use P^T to reorder the right-hand side.
    P, L, U = la.lu(A)

    while abs(lambda_k - lambda_prev) > np.finfo('d').eps:
        # Solve A y_k = x_k for y_k.
        # 1. Solve L yy = x_k for yy.
        yy = la.solve_triangular(L, np.matmul(P.T, x_k), lower=True)
        # 2. Solve U y_k = yy for y_k.
        y_k = la.solve_triangular(U, yy, lower=False)
        x_k = y_k / np.linalg.norm(y_k)

        # Use lambda as the termination condition.
        lambda_prev = lambda_k
        lambda_k = np.dot(x_k.T, np.matmul(A, x_k))

    return lambda_k, x_k

A = np.array([3,1,1,3]).reshape(2,2)

# Compute the smallest eigenvalue and corresponding eigenvector.
w_min, v_min = inverse_iteration(A)

# Compute the eigenvalues and right eigenvector using numpy.
w, v = np.linalg.eig(A)

# Compare the smallest eigenvalue to numpy.
i_min = np.argmin(np.absolute(w))
np.testing.assert_almost_equal(w_min, w[i_min])

# Compare the smallest eigenvector to numpy.
# NOTE(mmorais): Use absolute value of components of eigenvector.
# Since eigenvectors are not unique, they can be scaled arbitrarily
# (eg multiplied by -1).
np.testing.assert_almost_equal(np.abs(v_min), np.abs(v[:, i_min]))

## 04.08 Deflation

Power iteration and inverse iteration can be used to find the maximum and minimum eigenvalues respectively, but another technique is needed to find all the eigenvalues of a matrix.

**Deflation** is the process of removing known eigenvalues in order to compute the remaining eigenvalues of a matrix.

Let $H$ be a nonsingular matrix (for example, a Householder transformation) such that $H x_1 = \alpha e_1$, where $x_1$ is the eigenvector corresponding to $\lambda_1$ and $e_1$ is the first column of the identity matrix. Then the similarity transformation below reduces the problem from the $n \times n$ matrix $A$ to the $(n-1) \times (n-1)$ matrix $B$, which has eigenvalues $\lambda_2, \ldots, \lambda_n$.
$$
H A H^{-1} = 
\begin{bmatrix}
\lambda_1 & b^T \\
0 & B
\end{bmatrix}
$$

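After finding the eigenvalue $\lambda_2$ and eigenvector $y_2$ of $B$, the eigenvector $x_2$ of the original matrix $A$ is recovered as follows (here $\alpha$ is a new scalar, unrelated to the one in $H x_1 = \alpha e_1$, and $\lambda_1 \neq \lambda_2$ is assumed).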
$$
x_2 = H^{-1}
\begin{bmatrix}
\alpha \\
y_2
\end{bmatrix},
\qquad
\alpha = \frac{b^T y_2}{\lambda_2 - \lambda_1}
$$
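
The deflation step can be sketched as follows, assuming $H$ is a Householder matrix (so that $H^{-1} = H$). The example matrix is chosen here for illustration, and the known eigenpair is taken from numpy rather than from power iteration to keep the example short. The variable `a` plays the role of the scalar $\alpha$ in the recovery formula.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])        # eigenvalues 5, 2, 1
n = A.shape[0]

# A known eigenpair (taken from numpy here; power iteration would also do).
w, V = np.linalg.eig(A)
i = np.argmax(np.abs(w))
lam1, x1 = w[i], V[:, i]

# Householder matrix H with H x1 = alpha e1.
# H is symmetric and orthogonal, so H^{-1} = H.
alpha = -np.copysign(np.linalg.norm(x1), x1[0])
v = x1.copy()
v[0] -= alpha
H = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)

# Similarity transformation: the first column becomes [lam1, 0, ..., 0]^T.
M = H @ A @ H
b, B = M[0, 1:], M[1:, 1:]

# Next eigenpair, computed from the smaller deflated matrix B.
wB, Y = np.linalg.eig(B)
j = np.argmax(np.abs(wB))
lam2, y2 = wB[j], Y[:, j]

# Map y2 back to an eigenvector of the original matrix A.
a = (b @ y2) / (lam2 - lam1)
x2 = H @ np.concatenate(([a], y2))
print(lam1, lam2, x2)
```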

## 04.09 QR Iteration

#### Simultaneous Iteration
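Rather than finding eigenvalue and eigenvector pairs one-by-one, simultaneous iteration applies power iteration to several starting vectors at once, stored as the columns of a **matrix**.
$$
X_k = A X_{k-1}
$$
where
* $X_0$ is an $n \times p$ matrix of linearly independent starting vectors (for example, random values)
* the columns of $X_k$ span a subspace that converges to the one spanned by the eigenvectors of the $p$ dominant eigenvalues, provided $|\lambda_p| \gt |\lambda_{p+1}|$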

#### Orthogonal Iteration
Similar to power iteration, normalization is required. Use the QR factorization of $X_{k-1}$ at each step of the iteration.
$$
\begin{aligned}
\hat{Q}_k R_k &= X_{k-1} \\
X_k &= A \hat{Q}_k
\end{aligned}
$$

Why normalize?
1. Avoid overflow and underflow.
2. The columns of $X_k$ become increasingly ill-conditioned as they all converge toward the dominant eigenvector; orthonormalization keeps them well-conditioned.
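
A minimal sketch of orthogonal iteration for the two dominant eigenvalues of a small symmetric matrix. The matrix, the number of vectors `p`, the random seed, and the fixed iteration count are all assumptions made for this example. The eigenvalue estimates are read from the projected matrix $\hat{Q}^T A \hat{Q}$.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])        # eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3)

p = 2                                  # number of eigenpairs wanted
rng = np.random.default_rng(0)
X = rng.random((A.shape[0], p))        # arbitrary starting vectors

for _ in range(200):
    Q, R = np.linalg.qr(X)             # orthonormalize the current basis
    X = A @ Q                          # advance the subspace

# Eigenvalue estimates from the projected p x p matrix Q^T A Q.
Q, _ = np.linalg.qr(X)
estimates = np.sort(np.linalg.eigvalsh(Q.T @ A @ Q))
print(estimates)
```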

#### QR Iteration
Rather than explicitly computing the eigenvectors, use QR factorization to make $A_k$ converge to triangular form (block triangular when a real matrix has complex eigenvalues), so that the eigenvalues can be read from the diagonal of $A_k$.
1. Initialize $A_0$ to $A$.
2. Compute QR factorization $Q_k R_k$ of $A_{k-1}$.
3. Update $A_k = R_k Q_k$
4. Repeat starting from step 2.

When $A$ is symmetric, $A_k$ converges to diagonal form and the eigenvectors of $A$ are obtained from the product of the $Q_k$ matrices generated during the iteration.
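
The unshifted iteration can be sketched in a few lines. The symmetric example matrix and the fixed iteration count are choices made here; a practical implementation would test for convergence instead.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

A_k = A.copy()
V = np.eye(A.shape[0])                 # accumulated product Q_1 Q_2 ... Q_k
for _ in range(200):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q                        # R Q = Q^T A_{k-1} Q, a similarity transform
    V = V @ Q

eigenvalues = np.diag(A_k)
print(eigenvalues)
```

Because this example is symmetric, the columns of `V` approximate the eigenvectors of `A`.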

#### QR Iteration With Shifts
Shift the matrix $A_{k-1}$ by $\sigma_k$ prior to computing the QR factorization.
1. Initialize $A_0$ to $A$.
2. Compute QR factorization $Q_k R_k$ of $A_{k-1} - \sigma_k I$.
3. Update $A_k = R_k Q_k + \sigma_k I$
4. Repeat starting from step 2.

The shift $\sigma_k$ need only approximate an eigenvalue.
* The lower right entry of $A_{k-1}$ is a good approximation.
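
A sketch of the shifted iteration using the lower right entry as the shift. The function name, tolerance, and stopping rule (the off-diagonal part of the last row becoming negligible) are assumptions made for this example. It isolates a single eigenvalue in the lower right corner; finding the rest would require deflating to the leading submatrix, which is not shown.

```python
import numpy as np

def shifted_qr(A, tol=1e-12, max_iter=500):
    """Shifted QR iteration with the lower right entry as the shift.
    Returns the final iterate and the number of iterations used."""
    A_k = np.array(A, dtype=float)
    I = np.eye(A_k.shape[0])
    for k in range(max_iter):
        # Converged once the last row is zero apart from its diagonal entry.
        if np.linalg.norm(A_k[-1, :-1]) < tol:
            return A_k, k
        sigma = A_k[-1, -1]
        Q, R = np.linalg.qr(A_k - sigma * I)
        A_k = R @ Q + sigma * I
    return A_k, max_iter

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
A_k, iterations = shifted_qr(A)
print("iterations:", iterations)
print("eigenvalue estimate:", A_k[-1, -1])
```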

**Initial Reduction** Transform $A$ to Hessenberg form prior to QR iteration using an orthogonal similarity transformation such as a sequence of Householder transformations.
* Since a Hessenberg matrix is already nearly triangular, the work per QR iteration is reduced and fewer iterations are required.

#### Cost of QR Iteration

| Matrix | Eigenvalues Only | Eigenvalues & Eigenvectors |
|--------|------------------|----------------------------|
| Symmetric | $\frac{4}{3} n^3$ | $9 n^3$ |
| General | $10 n^3$ | $25 n^3$ |

Interpretation: Cost of obtaining eigenvectors by accumulating orthogonal transformations is high. 

## 04.10 Krylov Subspace Methods

Unlike simultaneous methods such as QR iteration, Krylov methods build up the subspace incrementally.
* Requires only matrix-vector multiplication.
* Useful for large sparse matrices.

**Arnoldi iteration** is a recurrence that produces a unitary matrix $Q_k$ and an upper Hessenberg matrix $H_k$ column-by-column using only matrix-vector multiplication.
* The matrix $H_k$ only provides an approximation to eigenvalues and eigenvectors of $A$.
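
A minimal sketch of the Arnoldi recurrence using modified Gram-Schmidt. The function name, the breakdown threshold, and the small dense test matrix are assumptions made for this example; the method is normally applied to large sparse matrices. The eigenvalues of the leading $k \times k$ block of $H$ (Ritz values) serve as the eigenvalue approximations.

```python
import numpy as np

def arnoldi(A, b, k):
    """Return Q (n x (k+1), orthonormal columns) and upper Hessenberg
    H ((k+1) x k) such that A Q[:, :k] = Q H, assuming no breakdown."""
    n = A.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        u = A @ Q[:, j]                    # the only place A is used
        for i in range(j + 1):             # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ u
            u = u - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(u)
        if H[j + 1, j] < 1e-12:            # breakdown: invariant subspace found
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = u / H[j + 1, j]
    return Q, H

# Small symmetric test matrix (illustrative only).
A = np.diag(np.arange(1.0, 7.0)) + 0.1 * np.ones((6, 6))
k = 4
Q, H = arnoldi(A, np.ones(6), k)

# Ritz values: eigenvalues of the leading k x k block of H.
ritz = np.sort(np.linalg.eigvals(H[:k, :k]).real)
print("Ritz values:      ", ritz)
print("exact eigenvalues:", np.linalg.eigvalsh(A))
```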

## 04.11 Jacobi Method

The Jacobi method computes the eigenvalues of a real symmetric matrix by repeatedly applying plane rotations.
$$
A_{k+1} = J_k^T A_k J_k
$$
where
* $J_k$ is a plane rotation used to annihilate pairs of symmetric entries in $A_k$

Eventually $A_k$ converges to diagonal form.
* Eigenvalues are diagonal entries.
* Eigenvectors obtained from the product of plane rotations.
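
A sketch of the cyclic variant of the method, which sweeps over every off-diagonal pair in turn. The function name, tolerance, and sweep limit are assumptions made for this example. Each rotation is formed as a full matrix for clarity; an efficient implementation would update only the two affected rows and columns.

```python
import numpy as np

def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Sketch of the cyclic Jacobi method for a real symmetric matrix."""
    A_k = np.array(A, dtype=float)
    n = A_k.shape[0]
    V = np.eye(n)                                    # product of rotations
    for _ in range(max_sweeps):
        # Stop when the off-diagonal part is negligible.
        if np.linalg.norm(A_k - np.diag(np.diag(A_k))) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A_k[p, q] == 0.0:
                    continue
                # Choose the rotation that zeroes A_k[p, q] and A_k[q, p].
                tau = (A_k[q, q] - A_k[p, p]) / (2.0 * A_k[p, q])
                sign = 1.0 if tau >= 0.0 else -1.0
                t = sign / (abs(tau) + np.sqrt(1.0 + tau * tau))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = t * c
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A_k = J.T @ A_k @ J
                V = V @ J
    return np.diag(A_k), V

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
w, V = jacobi_eigen(A)
print(w)
```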

## 04.12 Other Methods for Symmetric Matrices

**Bisection** or **Spectrum Slicing** is used to determine how many eigenvalues are less than $\sigma$.
* By systematically choosing values of $\sigma$, any eigenvalue can be isolated using a bisection technique.
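
For a symmetric tridiagonal matrix, one common way to count the eigenvalues below $\sigma$ is to count the negative pivots in the factorization of $T - \sigma I$. The sketch below assumes that approach, with the diagonal `d` and off-diagonal `e` stored as lists; both function names and the tolerance are chosen here for illustration.

```python
def count_less_than(d, e, sigma):
    """Number of eigenvalues less than sigma for the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e (count of negative pivots)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - sigma - off
        if q == 0.0:
            q = 1e-300          # avoid dividing by an exact zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(d, e, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) located by bisection."""
    n = len(d)
    # Gershgorin discs give an interval containing every eigenvalue.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) +
              (abs(e[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_less_than(d, e, mid) > k:
            hi = mid            # more than k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

d = [2.0, 3.0, 4.0]             # diagonal entries
e = [1.0, 1.0]                  # off-diagonal entries
eigenvalues = [kth_eigenvalue(d, e, k) for k in range(3)]
print(eigenvalues)
```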

**Divide and Conquer** used to find eigenvalues and eigenvectors of symmetric tridiagonal matrices.
* Recursive algorithm which expresses the original matrix as the sum of a block diagonal matrix (two smaller tridiagonal problems) and a rank-one correction.

## 04.13 Generalized Eigenvalue Problems and SVD

#### Generalized Eigenvalue Problem
Given $n \times n$ matrices $A$ and $B$, find the eigenvalues $\lambda$ and nonzero eigenvectors $x$ such that:
$$
A x = \lambda B x
$$

Application to structural vibration problems:
* $A$ represents stiffness
* $B$ represents mass
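
When $B$ is nonsingular, the generalized problem is equivalent to the standard problem $(B^{-1} A) x = \lambda x$. The sketch below uses that reduction on a made-up two-degree-of-freedom stiffness and mass pair; the values are illustrative only. Forming $B^{-1} A$ loses symmetry and can be inaccurate when $B$ is ill-conditioned, so this is a demonstration rather than a recommended method.

```python
import numpy as np

K = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])     # stiffness (plays the role of A)
M = np.array([[1.0, 0.0],
              [0.0, 2.0]])       # mass (plays the role of B)

# With M nonsingular, K x = lambda M x is equivalent to (M^{-1} K) x = lambda x.
w, X = np.linalg.eig(np.linalg.solve(M, K))

# Residual of the generalized problem for all eigenpairs at once.
residual = K @ X - M @ X @ np.diag(w)
print("eigenvalues:", np.sort(w))
print("max residual:", np.max(np.abs(residual)))
```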

#### SVD


## Summary: Eigenvalue Problems

* Algorithms for computing eigenvalues and eigenvectors are iterative.
* Power and inverse iteration find one eigenvalue-eigenvector pair.
* QR iteration simultaneously finds all eigenvalues and eigenvectors by transforming the matrix to triangular form using orthogonal similarity transformations.
  * An initial reduction of the matrix to Hessenberg form is often used in practice in order to improve efficiency.
* Krylov methods are used for large sparse matrices.
* More specialized methods exist for symmetric matrices.
  * Jacobi
  * Spectrum Slicing
  * Divide and Conquer
* SVD can be computed by a variant of QR iteration.