#### Schur decomposition

According to the `Schur theorem`, if $A\in \mathbf{R}^{n \times n}$ is a square real matrix with (for simplicity) `distinct real` eigenvalues, then there exist an orthogonal matrix $Q$ and an upper triangular matrix $T$ such that

$$A=QTQ^T$$

We can show why this is true

##### Setup

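Recall that an orthogonal matrix $Q_1$ satisfies $Q_1^T=Q_1^{-1}$, that is, $Q_1^TQ_1=I$. Let $Q_1=\begin{bmatrix}q_1 & q_2 & \cdots & q_n\end{bmatrix}$ be orthogonal, with $q_1$ a normalized `eigenvector` of $A$, so that $Aq_1=\lambda_1q_1$ and $q_1^Tq_1=1$. Writing $q_{2:n}$ for the remaining columns, we have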

$$\begin{align*}
Q_1^TAQ_1&=\begin{bmatrix}q_1^T \\ q_{2:n}^T\end{bmatrix}A\begin{bmatrix}q_1 & q_{2:n}\end{bmatrix} \\
& =\begin{bmatrix}q_1^TAq_1 & q_1^TAq_{2:n} \\ q_{2:n}^TAq_1 & q_{2:n}^TAq_{2:n}\end{bmatrix}
\end{align*}$$

By `construction`, the upper-left block is the corresponding `eigenvalue` $\lambda_1$, since

$$q_1^TAq_1 = q_1^T(\lambda_1q_1)=\lambda_1q_1^Tq_1=\lambda_1$$

We denote the upper-right block $B$

$$B=q_1^TAq_{2:n}=\begin{bmatrix}q_1^TAq_2 & q_1^TAq_3 & \cdots & q_1^TAq_n \end{bmatrix}$$

The lower-left block is zero, since $q_2, \cdots, q_n$ are all orthogonal to $q_1$

$$q_{2:n}^TAq_1=\begin{bmatrix}q_2^T \\q_3^T \\ \vdots \\q_n^T \end{bmatrix}Aq_1=\lambda_1\begin{bmatrix}q_2^T \\q_3^T \\ \vdots \\q_n^T \end{bmatrix}q_1=\begin{bmatrix}0 \\0 \\ \vdots \\0 \end{bmatrix}$$

We denote the lower-right block $A_2$ and we have

$$\begin{align*}
Q_1^TAQ_1&=\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}
\end{align*}$$

or

$$\begin{align*}
AQ_1&=Q_1\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}
\end{align*}$$

In addition, the eigenvalues of $A_2$ are the `remaining eigenvalues` $\lambda_2 , \cdots, \lambda_n$ of $A$

To see this, from the properties of similarity transformation, we know that $A$ and $Q_1^TAQ_1$ have the same eigenvalues

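Therefore

$$\begin{align*}
\det (A-\lambda I)&=\det (Q_1^TAQ_1-\lambda I) \\
&=\det \begin{bmatrix}\lambda_1-\lambda & B \\ 0 & A_2-\lambda I\end{bmatrix} \\
& \text{(determinant of a block upper triangular matrix)}\\
&= (\lambda_1-\lambda) \det (A_2-\lambda I)
\end{align*}$$

Since the roots of $\det (A-\lambda I)$ are $\lambda_1, \cdots, \lambda_n$, the roots of $\det (A_2-\lambda I)$ must be the remaining $\lambda_2, \cdots, \lambda_n$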
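The setup can be checked numerically. The following is a minimal sketch, assuming a small hypothetical test matrix built as $PDP^{-1}$ with known eigenvalues 5, 2 and -1 (the matrices `P` and `D` are made up for illustration). It completes $q_1$ to an orthogonal $Q_1$ using a QR factorization of $[q_1 \mid I]$, which is one convenient choice among many:

```python
import numpy as np

# Hypothetical test matrix with known real, distinct eigenvalues 5, 2, -1
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
D = np.diag([5.0, 2.0, -1.0])
A = P @ D @ np.linalg.inv(P)

# q1: normalized eigenvector for lambda_1 = 5 (first column of P)
lam1 = 5.0
q1 = P[:, 0] / np.linalg.norm(P[:, 0])

# Complete q1 to an orthonormal basis: the first column of Q1 is +/- q1
Q1, _ = np.linalg.qr(np.column_stack([q1, np.eye(3)]))

M = Q1.T @ A @ Q1          # block form [[lambda_1, B], [0, A_2]]
A2 = M[1:, 1:]             # lower-right block

print(M[0, 0])                                   # upper-left block, close to lambda_1
print(M[1:, 0])                                  # lower-left block, close to zero
print(np.sort(np.real(np.linalg.eigvals(A2))))   # remaining eigenvalues of A
```

The sign ambiguity of the first column of `Q1` does not matter here, because $q_1$ appears on both sides of every block.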

##### Proof by induction

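For a matrix of size one, the Schur decomposition obviously exists (take $Q=1$ and $T=A$)

For $n\geq 2$, assume as the induction hypothesis that a Schur decomposition exists for every matrix of size $n-1$ with real eigenvalues. Since $A_2$ has size $n-1$ and its eigenvalues $\lambda_2, \cdots, \lambda_n$ are real, it has one: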

$$Q_2T_2Q_2^T=A_2$$

or

$$Q_2T_2=A_2Q_2$$

We can proceed with `induction`

If we write

$$Q=Q_1\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix}$$

(since both $Q_1$ and $Q_2$ are orthogonal, $Q$ is orthogonal)

we have for matrix $A$ of size $n$

$$\begin{align*}AQ &= A Q_1\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix} \\
& = Q_1\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & A_2Q_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & Q_2T_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}1& 0 \\ 0 & Q_2\end{bmatrix}\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix} \\
& = Q\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix}
\end{align*}$$

Since $Q$ is orthogonal, this gives $A=QTQ^T$, which shows that a Schur decomposition exists for matrix $A$ of size $n$, with the upper triangular matrix

$$T=\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix}
$$
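The induction argument translates almost line by line into a recursive procedure. The sketch below is illustrative only: the function name `schur_by_deflation` and the test matrix are hypothetical, and it borrows `np.linalg.eig` to supply one eigenvector per level, so it mirrors the proof rather than offering a practical algorithm. It assumes all eigenvalues are real:

```python
import numpy as np

def schur_by_deflation(A):
    """Recursive sketch of the induction proof; assumes real eigenvalues."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1), A.copy()      # base case: Q = 1, T = A

    # Any eigenvector works as q1; here np.linalg.eig supplies one
    _, V = np.linalg.eig(A)
    q1 = np.real(V[:, 0])
    q1 = q1 / np.linalg.norm(q1)

    # Complete q1 to an orthogonal Q1, then form Q1^T A Q1
    Q1, _ = np.linalg.qr(np.column_stack([q1, np.eye(n)]))
    M = Q1.T @ A @ Q1

    # Induction hypothesis: Schur decomposition of the lower-right block A_2
    Q2, T2 = schur_by_deflation(M[1:, 1:])

    # Q = Q1 * blockdiag(1, Q2)
    Q = Q1.copy()
    Q[:, 1:] = Q1[:, 1:] @ Q2

    # T = [[lambda_1, B Q2], [0, T2]]
    T = np.zeros((n, n))
    T[0, 0] = M[0, 0]
    T[0, 1:] = M[0, 1:] @ Q2
    T[1:, 1:] = T2
    return Q, T

# Hypothetical test matrix with eigenvalues 5, 2, -1
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, -1.0]) @ np.linalg.inv(P)

Q, T = schur_by_deflation(A)
print(np.allclose(Q @ T @ Q.T, A))   # A is recovered from Q and T
print(np.diag(T))                    # eigenvalues of A, in some order
```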

##### Upper triangular matrix has eigenvalues on its diagonal

With similarity transformation $A=QTQ^T$, we know that $A$ and $T$ have the same eigenvalues

Let $t_{ii}, i=1, \cdots, n$ be diagonal elements of $T$, then by definition of eigenvalues and the fact that determinant of upper triangular matrix is product of its diagonal elements, we have

$$\det(T-\lambda I)=\prod_{i=1}^n (t_{ii}-\lambda)=0$$

indicating that the eigenvalues of $T$, and thus of $A$, are the `diagonal elements` of $T$

This is useful as we can obtain eigenvalues of $A$ by transforming $A$ into `Schur form`

If $A$ is `symmetric`, then $T=Q^TAQ$ satisfies $T^T=Q^TA^TQ=T$, so $T$ is upper triangular and symmetric at the same time. Therefore, $T$ must be `diagonal`, meaning that symmetric matrices are diagonalizable, as we already know
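As a quick sanity check of the diagonal property, the sketch below compares the diagonal of a small, hypothetical upper triangular matrix with the eigenvalues NumPy computes for it:

```python
import numpy as np

# Hypothetical upper triangular matrix with distinct diagonal entries
T = np.array([[3.0, 1.0, 4.0],
              [0.0, -2.0, 5.0],
              [0.0, 0.0, 0.5]])

eig_T = np.sort(np.real(np.linalg.eigvals(T)))
diag_T = np.sort(np.diag(T))
print(np.allclose(eig_T, diag_T))

# det(T - t_ii I) vanishes for each diagonal entry t_ii
dets = [np.linalg.det(T - t * np.eye(3)) for t in np.diag(T)]
print(np.allclose(dets, 0.0))
```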

#### Orthogonal iterations

Recall the power iteration for finding the dominant eigenvalue of $A$: starting from an arbitrary vector $x^0$, we alternate between computing $y^k=Ax^k$ and normalizing to obtain the next iterate, $x^{k+1}=\frac{y^k}{\|y^k\|}$
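As a reminder of what that loop looks like, here is a minimal sketch. The function name `power_iteration`, the fixed starting vector, the iteration count and the test matrix are all assumptions made for illustration, and the eigenvalue estimate uses the Rayleigh quotient $x^TAx$ of the final unit vector:

```python
import numpy as np

def power_iteration(A, num_iter=200):
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)      # arbitrary unit starting vector (assumption)
    for _ in range(num_iter):
        y = A @ x                    # multiply by A
        x = y / np.linalg.norm(y)    # normalize to get the next iterate
    return x @ A @ x, x              # Rayleigh quotient and final vector

# Hypothetical test matrix with eigenvalues 5, 2, -1
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, -1.0]) @ np.linalg.inv(P)

lam, x = power_iteration(A)
print(lam)                           # close to the dominant eigenvalue 5
```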

To get to Schur decomposition, we would like to do the iterations not just on a single vector $x$, but on $n$ orthonormal vectors that form an `orthogonal` matrix $Q_0$

After each multiplication by $A$, we do a `QR decomposition` so that we again have a set of orthonormal vectors for the next iteration

$$AQ_i=Q_{i+1}R_{i+1}$$

We can see that, if $Q_i \rightarrow Q$ as $i\rightarrow \infty$, then $R_i$ must also converge to some upper triangular $R$, and

$$AQ=QR \Rightarrow A=QRQ^T$$

which is exactly the Schur decomposition we need where $T=R$
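The iteration itself is only a few lines. The sketch below is a minimal version under some assumptions: the name `orthogonal_iteration`, the iteration count and the test matrix are hypothetical, and it uses NumPy's built-in `np.linalg.qr` for the factorization step. The check at the end looks at $Q^TAQ$ rather than at $R$ directly, since a library QR routine makes its own sign choices for the columns of $Q$:

```python
import numpy as np

def orthogonal_iteration(A, num_iter=200):
    Q = np.eye(A.shape[0])               # start from Q_0 = I
    for _ in range(num_iter):
        Q, R = np.linalg.qr(A @ Q)       # A Q_i = Q_{i+1} R_{i+1}
    return Q, R

# Hypothetical test matrix with eigenvalues 5, 2, -1
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, -1.0]) @ np.linalg.inv(P)

Q, R = orthogonal_iteration(A)
T = Q.T @ A @ Q                          # approximately upper triangular
print(np.diag(T))                        # approximate eigenvalues of A
print(np.abs(np.tril(T, -1)).max())      # size of the part below the diagonal
```

For this matrix the diagonal of `T` lists the eigenvalues in order of decreasing magnitude.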

#### Compute eigenvectors from Schur form

Once we have the Schur form $AQ=QT$, we can plug in the `eigendecomposition` of the upper triangular $T$, that is, $TV=V\Lambda$ (since $A$ and $T$ are similar, $T$ also has distinct real eigenvalues and is diagonalizable), which gives

$$AQV=QTV=Q(TV)=QV\Lambda$$

We see that, by definition, the columns of $QV$ must be the `eigenvectors` of $A$
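The identity can be verified on a hand-built Schur form. In the sketch below, the matrices `T` and `Q` are hypothetical: `T` is an upper triangular matrix with distinct diagonal entries, `Q` is an orthogonal matrix obtained from a QR factorization of an arbitrary matrix, and $A$ is defined as $QTQ^T$:

```python
import numpy as np

# Hypothetical Schur form: upper triangular T and orthogonal Q
T = np.array([[4.0, 1.0, -2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, -3.0]])
Q, _ = np.linalg.qr(np.array([[2.0, 1.0, 0.0],
                              [1.0, 3.0, 1.0],
                              [0.0, 1.0, 4.0]]))
A = Q @ T @ Q.T

lam, V = np.linalg.eig(T)    # T V = V Lambda
X = Q @ V                    # candidate eigenvectors of A

# A X = X Lambda, checked column by column via broadcasting
print(np.allclose(A @ X, X * lam))
```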

This provides a way to compute eigenvectors of $A$ from orthogonal iterations

Assume we want to compute `ith` eigenvector of $T$, then we can let it be of the form

$$v_i=\begin{bmatrix}\times \\ \times \\ \vdots \\ \times \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$

That is, the leading $(i-1)$ entries are not necessarily 0, the `ith` entry is 1, and the remaining are all zero

Plugging this into the definition of an eigenvalue/eigenvector pair, with eigenvalue $t_{ii}$, we have

$$Tv_i=t_{ii}v_i \Longrightarrow (T-t_{ii}I)v_i=0$$

This is a system of $(i-1)$ equations in the $(i-1)$ unknown leading entries of $v_i$ that is already in upper triangular form, so it can be solved using back substitution
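To make the system explicit, here is a sketch of the partition. The names $T_{11}$, $t$ and $u$ are notation introduced here for illustration: $T_{11}$ is the leading $(i-1)\times(i-1)$ block of $T$, $t$ holds the first $(i-1)$ entries of the $i$th column of $T$, $u$ holds the unknown leading entries of $v_i$, and $\ast$ marks blocks that are multiplied by the zero part of $v_i$

$$\begin{bmatrix}T_{11}-t_{ii}I & t & \ast \\ 0 & 0 & \ast \\ 0 & 0 & \ast\end{bmatrix}\begin{bmatrix}u \\ 1 \\ 0\end{bmatrix}=0 \Longrightarrow (T_{11}-t_{ii}I)u=-t$$

With distinct eigenvalues, the diagonal entries $t_{jj}-t_{ii}$, $j<i$, of $T_{11}-t_{ii}I$ are all nonzero, so back substitution never divides by zero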

#### Example

We can first run orthogonal iterations on a square real matrix to get a feel for how they behave

We start with $Q_0=I$

We use `Householder` reflector for QR factorization

In [1]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [2]:
def householder(A):
    m, n = A.shape
    R = A.copy()
    Q = np.identity(m)

    # For fat matrices, we only process up to the mth column
    for i in range(min(m, n)):
        x = R[i:, i]
        if np.allclose(x, 0):
            continue  # Skip if x is a zero vector

        v = x.copy()
        sng = np.sign(x[0]) if x[0] != 0 else 1.0 # Per convention in NLA
        v[0] += sng * np.linalg.norm(x)
        v /= np.linalg.norm(v)

        # Since all entries in R[i:, :i] are zero from previous iteration
        # applying transformation to R[i:, i:] would suffice
        R[i:, i:] -= 2 * np.outer(v, v) @ R[i:, i:]

        # If Q is needed explicitly
        Q[i:, :] -= 2 * np.outer(v, v) @ Q[i:, :]

    return Q.T, R

def back_substitution(R, b):
    m, n = R.shape
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x

def diagonalizable_mat(n):
    # Create diagonal matrix D with eigenvalues
    D = np.diag(np.concatenate((200*np.random.rand(n//2)-100, 0.1*np.random.rand(n-n//2))))

    # Generate a random invertible matrix
    P = np.random.rand(n, n)
    while np.linalg.cond(P) > 1e8:  # Check conditioning
        P = np.random.rand(n, n)

    # Use similarity transformation to create diagonalizable, but nonsymmetric matrix
    return P @ D @ np.linalg.inv(P)

In [3]:
np.random.seed(50)
symmetric = False

A_size = 8
A = diagonalizable_mat(A_size)

if symmetric:
    A = (A+A.T)/2

A_original = A.copy()

# Orthogonal iterations
Q = np.eye(A.shape[0])
Q_0 = Q.copy()

num_iter = 201
for i in range(num_iter):
    Q, R = householder(A @ Q)
    # Diagonal elements of R are approximation of eigenvalues
    if i % 40 == 0:
        print(R.diagonal())

v = []
n = R.shape[0]
for i in range(n):
    R_sub = R[:i, :i] - R[i, i] * np.eye(i) # Subportion of R
    b = -R[:i, i] # - Truncated ith column of R as b
    if i > 0:
        v_upper = back_substitution(R_sub, b)
    else:
        v_upper = np.array([])  # Empty array when i == 0

    v_i = np.zeros(n) # Initialize zero vector
    v_i[:i] = v_upper # Fill in portion before ith entry
    v_i[i] = 1 # ith entry is always one
    v_i /= np.linalg.norm(v_i)
    v.append(v_i)

v = np.column_stack(v)
# r, v = np.linalg.eig(R)
print(f'\nEigenvectors from orthogonal iterations: \n{Q @ v}')

# Compare to NumPy
eigenvalues, eigenvectors = np.linalg.eig(A_original)
print(f'\nEigenvalues from NumPy: \n{eigenvalues}')
print(f'\nEigenvectors from NumPy: \n{eigenvectors}')

[ 124.4855 -31.3821 -16.4464 -1.9810  0.1356 -0.0717 -0.0621 -0.0092]
[-54.3955 -48.8944 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0378]
[-54.3836 -48.9051 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0377]
[-54.3834 -48.9052 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0377]
[-54.3834 -48.9052 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0377]
[-54.3834 -48.9052 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0377]

Eigenvectors from orthogonal iterations: 
[[-0.1853  0.2079  0.2574  0.3834  0.5393 -0.3564  0.4128 -0.0885]
 [-0.5309  0.4047  0.2861  0.1581  0.2906 -0.3637  0.4213 -0.3085]
 [-0.0312  0.1962  0.0413  0.3369  0.5134 -0.4545  0.4259 -0.1094]
 [-0.2760  0.5588  0.2974  0.3694  0.3603 -0.1413  0.2719 -0.4155]
 [-0.5249  0.4780  0.3161  0.3440  0.4335 -0.0795  0.3295 -0.5583]
 [-0.2318  0.1769  0.4512  0.4619  0.1416 -0.1137  0.3809 -0.2846]
 [-0.3945  0.2875  0.0537  0.1145  0.0623 -0.3439  0.2169 -0.0845]
 [-0.3490  0.3160  0.6748  0.4830  0.1397 -0.6137  0.3099 -0.5591]]

Eige