#### Schur decomposition

According to the `Schur theorem`, if $A\in \mathbf{R}^{n \times n}$ is a square real matrix with real eigenvalues, then there exist an orthogonal matrix $Q$ and an upper triangular matrix $T$ such that

$$A=QTQ^T$$

We can show why this is true

##### Setup

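Let $Q_1=\begin{bmatrix}q_1 & q_2 & \cdots & q_n\end{bmatrix}$ be an orthogonal matrix, so that $Q_1^T=Q_1^{-1}$ and $Q_1^TQ_1=I$, whose first column $q_1$ is a normalized `eigenvector` of $A$ with eigenvalue $\lambda_1$ (the remaining columns complete $q_1$ to an orthonormal basis). Then we can write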

$$\begin{align*}
Q_1^TAQ_1&=\begin{bmatrix}q_1^T \\ q_{2:n}^T\end{bmatrix}A\begin{bmatrix}q_1 & q_{2:n}\end{bmatrix} \\
& =\begin{bmatrix}q_1^TAq_1 & q_1^TAq_{2:n} \\ q_{2:n}^TAq_1 & q_{2:n}^TAq_{2:n}\end{bmatrix}
\end{align*}$$

By `construction`, the upper-left block is the corresponding `eigenvalue` $\lambda_1$, since

$$q_1^TAq_1 = q_1^T(\lambda_1q_1)=\lambda_1q_1^Tq_1=\lambda_1$$

We denote the upper-right block by $B$

$$B=q_1^TAq_{2:n}=\begin{bmatrix}q_1^TAq_2 & q_1^TAq_3 & \cdots & q_1^TAq_n \end{bmatrix}$$

The lower-left block is zero, since

$$q_{2:n}^TAq_1=\begin{bmatrix}q_2^T \\q_3^T \\ \vdots \\q_n^T \end{bmatrix}Aq_1=\lambda_1\begin{bmatrix}q_2^T \\q_3^T \\ \vdots \\q_n^T \end{bmatrix}q_1=\begin{bmatrix}0 \\0 \\ \vdots \\0 \end{bmatrix}$$

We denote the lower-right block $A_2$ and we have

$$\begin{align*}
Q_1^TAQ_1&=\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}
\end{align*}$$

or

$$\begin{align*}
AQ_1&=Q_1\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}
\end{align*}$$

In addition, $A_2$ contains the `remaining eigenvalues` $\lambda_2 , \cdots, \lambda_n$ of $A$

To see this, from the properties of similarity transformation, we know that $A$ and $Q_1^TAQ_1$ have the same eigenvalues

Therefore

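$$\begin{align*}
\det (A-\lambda I)&=\det (Q_1^TAQ_1-\lambda I) \\
&=\det \begin{bmatrix}\lambda_1-\lambda & B \\ 0 & A_2-\lambda I\end{bmatrix} \\
&= (\lambda_1-\lambda) \det (A_2-\lambda I) && \text{(determinant of a block upper triangular matrix)}
\end{align*}$$

Since the roots of $\det (A-\lambda I)$ are $\lambda_1, \cdots, \lambda_n$, the roots of $\det (A_2-\lambda I)$ are $\lambda_2, \cdots, \lambda_n$. In particular, $A_2$ also has only real eigenvalues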
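The setup can be checked numerically. The sketch below is only an illustration under assumptions: the $3\times 3$ matrix $A=PDP^{-1}$ with eigenvalues $5, 2, 1$ is a made-up example, and completing $q_1$ to an orthogonal $Q_1$ through `np.linalg.qr` is one convenient choice among many

```python
import numpy as np

# Hypothetical example: A = P D P^{-1} with real eigenvalues 5, 2, 1
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, 1.0]) @ np.linalg.inv(P)

# The first column of P is an eigenvector for lambda_1 = 5; normalize it
q1 = P[:, 0] / np.linalg.norm(P[:, 0])

# Complete q1 to an orthonormal basis: the QR factorization of [q1 | I]
# gives an orthogonal Q1 whose first column is +q1 or -q1
Q1, _ = np.linalg.qr(np.column_stack([q1, np.eye(3)]))

M = Q1.T @ A @ Q1
lam1, B, A2 = M[0, 0], M[0, 1:], M[1:, 1:]

print(lam1)                                  # upper-left block, close to 5
print(M[1:, 0])                              # lower-left block, close to 0
print(np.sort(np.linalg.eigvals(A2).real))   # close to [1, 2]
```

The sign of the first column of $Q_1$ does not matter here, since $q_1^TAq_1$ and the zero block are unchanged when $q_1$ is replaced by $-q_1$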

##### Proof by induction

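For a matrix of size one, the Schur decomposition trivially exists, with $Q=1$ and $T=A$

As the induction hypothesis, assume that a Schur decomposition exists for every real matrix of size $n-1$ with real eigenvalues. Since $A_2$ is such a matrix, there exist an orthogonal $Q_2$ and an upper triangular $T_2$ such that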

$$Q_2T_2Q_2^T=A_2$$

or

$$Q_2T_2=A_2Q_2$$

We can proceed with `induction`

If we write

$$Q=Q_1\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix}$$

(since both $Q_1$ and $Q_2$ are orthogonal, $Q$ is orthogonal)

we have for matrix $A$ of size $n$

$$\begin{align*}AQ &= A Q_1\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix} \\
& = Q_1\begin{bmatrix}\lambda_1 & B \\ 0 & A_2\end{bmatrix}\begin{bmatrix}1 & 0 \\ 0 & Q_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & A_2Q_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & Q_2T_2\end{bmatrix} \\
&=Q_1 \begin{bmatrix}1& 0 \\ 0 & Q_2\end{bmatrix}\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix} \\
& = Q\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix}
\end{align*}$$

Hence $AQ=QT$, that is $A=QTQ^T$, so the Schur decomposition exists for matrix $A$ of size $n$, with

$$T=\begin{bmatrix}\lambda_1 & BQ_2 \\ 0 & T_2\end{bmatrix}$$

which is upper triangular because $T_2$ is
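The induction argument translates almost directly into a recursive procedure. The following is a minimal sketch rather than a practical algorithm: the function name `schur_by_deflation` is made up for illustration, `np.linalg.eig` merely stands in for "pick some eigenvector", and the input is assumed to be real with only real eigenvalues

```python
import numpy as np

def schur_by_deflation(A):
    """Return (Q, T) with A = Q T Q^T, mirroring the inductive proof.

    Assumes A is real, square, and has only real eigenvalues.
    """
    n = A.shape[0]
    if n == 1:
        return np.eye(1), A.copy()   # base case: Q = 1, T = A

    # Any eigenvector works as q1; np.linalg.eig is just a convenient source
    _, eigvecs = np.linalg.eig(A)
    q1 = eigvecs[:, 0].real.copy()
    q1 /= np.linalg.norm(q1)

    # Orthogonal Q1 whose first column is +q1 or -q1
    Q1, _ = np.linalg.qr(np.column_stack([q1, np.eye(n)]))
    M = Q1.T @ A @ Q1
    lam1, B, A2 = M[0, 0], M[0, 1:], M[1:, 1:]

    # Induction hypothesis: A2 = Q2 T2 Q2^T
    Q2, T2 = schur_by_deflation(A2)

    # Q = Q1 * blockdiag(1, Q2)
    E = np.eye(n)
    E[1:, 1:] = Q2
    Q = Q1 @ E

    # T = [[lam1, B Q2], [0, T2]]
    T = np.zeros((n, n))
    T[0, 0] = lam1
    T[0, 1:] = B @ Q2
    T[1:, 1:] = T2
    return Q, T

# Hypothetical test matrix with eigenvalues 5, 2, 1
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, 1.0]) @ np.linalg.inv(P)

Q, T = schur_by_deflation(A)
print(np.linalg.norm(Q @ T @ Q.T - A))   # residual, close to 0
print(np.diag(T))                        # eigenvalues of A in some order
```

Computing an eigenvector at every level is of course circular as a way to find eigenvalues; the sketch only illustrates how $Q$ and $T$ are assembled in the proof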

##### Upper triangular matrix has eigenvalues on its diagonal

With similarity transformation $A=QTQ^T$, we know that $A$ and $T$ have the same eigenvalues

Let $t_{ii}, i=1, \cdots, n$ be diagonal elements of $T$, then by definition of eigenvalues and the fact that determinant of upper triangular matrix is product of its diagonal elements, we have

$$\det(T-\lambda I)=\prod_{i=1}^n (t_{ii}-\lambda)=0$$

indicating that the eigenvalues of $T$, and thus of $A$, are the `diagonal elements` of $T$

This is useful as we can obtain eigenvalues of $A$ by transforming $A$ into `Schur form`
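As a quick numerical check of this property, the sketch below builds a small upper triangular $T$ and an orthogonally similar $A=QTQ^T$, then compares the diagonal of $T$ with the eigenvalues NumPy reports. The particular entries of $T$ and the matrix used to generate $Q$ are arbitrary choices made for illustration

```python
import numpy as np

# Arbitrary upper triangular matrix with diagonal 3, -1, 0.5
T = np.array([[3.0,  1.0, -2.0],
              [0.0, -1.0,  4.0],
              [0.0,  0.0,  0.5]])

# Any orthogonal Q will do; here it comes from the QR factorization
# of a fixed nonsingular matrix
Q, _ = np.linalg.qr(np.array([[2.0, 1.0, 0.0],
                              [1.0, 3.0, 1.0],
                              [0.0, 1.0, 4.0]]))
A = Q @ T @ Q.T

diag_T = np.sort(np.diag(T))
eig_T = np.sort(np.linalg.eigvals(T).real)
eig_A = np.sort(np.linalg.eigvals(A).real)
print(diag_T, eig_T, eig_A)   # the three arrays agree up to rounding
```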

If $A$ is `symmetric`, then $T=Q^TAQ$ is symmetric as well, since $T^T=Q^TA^TQ=T$. A matrix that is both upper triangular and symmetric must be `diagonal`, meaning that symmetric matrices are (orthogonally) diagonalizable, as we already know

#### `Orthogonal` iterations

Recall our power iterations to find the dominant eigenvalue of $A$: starting from an arbitrary vector $x^0$, we alternate between computing $y^k=Ax^k$ and normalizing to get the next iterate $x^{k+1}=\frac{y^k}{\|y^k\|}$
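For reference, power iteration can be sketched as follows. The function name `power_iteration`, the starting vector, the iteration count, and the test matrix (eigenvalues $5, 2, 1$) are all assumptions made for this illustration; the eigenvalue estimate uses the Rayleigh quotient $x^TAx$ of the final unit vector

```python
import numpy as np

def power_iteration(A, num_iter=200):
    """Estimate the dominant eigenpair of A by repeated multiplication."""
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)        # arbitrary unit starting vector
    for _ in range(num_iter):
        y = A @ x                      # multiply by A
        x = y / np.linalg.norm(y)      # normalize for the next iteration
    return x @ A @ x, x                # Rayleigh quotient and unit vector

# Hypothetical test matrix with eigenvalues 5, 2, 1
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, 1.0]) @ np.linalg.inv(P)

lam, x = power_iteration(A)
print(lam)                             # close to the dominant eigenvalue 5
print(np.linalg.norm(A @ x - lam * x)) # eigenpair residual, close to 0
```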

To get to Schur decomposition, we would like to do the iterations not just on a single vector $x$, but on $n$ orthonormal vectors that form an `orthogonal` matrix $Q_0$

After each multiplication by $A$, we do a `QR decomposition` so that we still have a set of orthonormal vectors for the next iteration

$$AQ_i=Q_{i+1}R_{i+1}$$

We can see that, if $Q_i \rightarrow Q$ as $i\rightarrow \infty$, then $R_{i+1}=Q_{i+1}^TAQ_i$ must also converge to a certain upper triangular $R$, and

$$AQ=QR \Rightarrow A=QRQ^T$$

which is exactly the Schur decomposition we need where $T=R$
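A compact sketch of this iteration is given here, using NumPy's built-in `np.linalg.qr` in place of a hand-written QR routine and a made-up $3\times 3$ test matrix with eigenvalues $5, 2, 1$. It reads the result from $Q^TAQ$ rather than from $R$, because $Q^TAQ$ has the same diagonal whichever sign convention the QR routine uses for the columns of $Q$

```python
import numpy as np

# Hypothetical test matrix with eigenvalues 5, 2, 1
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([5.0, 2.0, 1.0]) @ np.linalg.inv(P)

Q = np.eye(3)                       # Q_0 = I
for _ in range(100):
    Q, R = np.linalg.qr(A @ Q)      # A Q_i = Q_{i+1} R_{i+1}

T = Q.T @ A @ Q                     # approximately upper triangular
print(np.diag(T))                   # close to [5, 2, 1]
print(np.linalg.norm(np.tril(T, -1)))    # below-diagonal part, close to 0
print(np.linalg.norm(Q @ T @ Q.T - A))   # A = Q T Q^T up to rounding
```

The eigenvalues appear on the diagonal ordered by decreasing magnitude, which is the typical behavior when the magnitudes are distinct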

We can first evaluate orthogonal iterations on a square real matrix to get some feeling

We start with $Q_0=I$

We use `Householder` reflector for QR factorization

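According to Numerical Linear Algebra (Trefethen and Bau) and other references, often `only the eigenvalues` from orthogonal iterations are of interest, while computation of eigenvectors requires more careful treatment (unless the matrix is `symmetric` and certain conditions are met). In general, the columns of the limit $Q$ are Schur vectors rather than eigenvectors, and only the first column is an eigenvector of $A$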

In [1]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [2]:
def householder(A):
    m, n = A.shape
    R = A.copy()
    Q = np.identity(m)

    for i in range(n):
        x = R[i:, i]
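        # copysign (unlike np.sign, which returns 0 for x[0] == 0) always gives a valid reflector onto e_1
        v = np.copysign(np.linalg.norm(x), x[0]) * np.eye(x.shape[0])[:,0] + x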
        v /= np.linalg.norm(v)

        # Since all entries in R[i:, :i] are zero from previous iteration
        # applying transformation to R[i:, i:] would suffice
        R[i:, i:] -= 2 * np.outer(v, v) @ R[i:, i:]

        # If Q is needed explicitly
        Q[i:, :] -= 2 * np.outer(v, v) @ Q[i:, :]

    return Q.T, R

def diagonalizable_mat(n):
    # Create diagonal matrix D with eigenvalues
    D = np.diag(np.concatenate((200*np.random.rand(n//2)-100, 0.1*np.random.rand(n-n//2))))

    # Generate a random invertible matrix
    P = np.random.rand(n, n)
    while np.linalg.cond(P) > 1e8:  # Check conditioning
        P = np.random.rand(n, n)

    # Use similarity transformation to create diagonalizable, but nonsymmetric matrix
    return P @ D @ np.linalg.inv(P)

##### `Nonsymmetric` case, with incorrect eigenvectors

In [3]:
np.random.seed(50)
symmetric = False

A_size = 8
A = diagonalizable_mat(A_size)

if symmetric:
    A = (A+A.T)/2

A_original = A.copy()

# Orthogonal iterations
Q = np.eye(A.shape[0])
Q_0 = Q.copy()

num_iter = 51
for i in range(num_iter):
    Q, R = householder(A @ Q)
    # Diagonal elements of R approximate the eigenvalues
    if i % 10 == 0:
        print(R.diagonal())

# Compare to NumPy
eigenvalues, eigenvectors = np.linalg.eig(A_original)
print(f'\nEigenvalues from NumPy: \n{eigenvalues}')

# if symmetric:
print(f'\nEigenvectors from orthogonal iterations: \n{Q}')
print(f'\nEigenvectors from NumPy: \n{eigenvectors}')

[ 124.4855 -31.3821 -16.4464 -1.9810  0.1356 -0.0717 -0.0621 -0.0092]
[-54.6859 -48.6211 -20.7398 -1.0797  0.0996  0.0772  0.0403  0.0383]
[-54.4858 -48.8133 -20.7340 -1.0797  0.0997  0.0772  0.0406  0.0380]
[-54.4184 -48.8737 -20.7340 -1.0797  0.0997  0.0772  0.0407  0.0378]
[-54.3955 -48.8944 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0378]
[-54.3876 -48.9015 -20.7340 -1.0797  0.0997  0.0772  0.0408  0.0377]

Eigenvalues from NumPy: 
[-54.3834 -48.9052 -20.7340 -1.0797  0.0997  0.0377  0.0408  0.0772]

Eigenvectors from orthogonal iteration: 
[[-0.1853  0.0967  0.1650  0.4408  0.5211  0.0093 -0.1815  0.6559]
 [-0.5310 -0.2401 -0.1952 -0.2592  0.4173 -0.1730  0.5897 -0.0572]
 [-0.0311  0.4528 -0.0801  0.5991  0.1011 -0.3927  0.1589 -0.4888]
 [-0.2758  0.8186 -0.0661 -0.3594 -0.1931  0.1242  0.1014  0.2393]
 [-0.5249 -0.0264 -0.1851  0.0963  0.1231  0.5326 -0.4718 -0.3985]
 [-0.2318 -0.1040  0.4869  0.3661 -0.4044  0.4110  0.4806  0.0298]
 [-0.3945 -0.2142 -0.4151  0.2182 -0.5715 -0.3

##### `Symmetric` case, with correct eigenvectors

In [4]:
np.random.seed(50)
symmetric = True

A_size = 8
A = diagonalizable_mat(A_size)

if symmetric:
    A = (A+A.T)/2

A_original = A.copy()

# Orthogonal iterations
Q = np.eye(A.shape[0])
Q_0 = Q.copy()

num_iter = 51
for i in range(num_iter):
    Q, R = householder(A @ Q)
    # Diagonal elements of R approximate the eigenvalues
    if i % 10 == 0:
        print(R.diagonal())

# Compare to NumPy
eigenvalues, eigenvectors = np.linalg.eig(A_original)
print(f'\nEigenvalues from NumPy: \n{eigenvalues}')

# if symmetric:
print(f'\nEigenvectors from orthogonal iterations: \n{Q}')
print(f'\nEigenvectors from NumPy: \n{eigenvectors}')

[ 72.1914 -47.2004 -29.1390  50.0952  25.1993 -8.8881 -0.5009  0.2263]
[-150.1752  105.0129 -68.4672 -36.7843  18.0442  7.6540 -0.2236  0.1030]
[-150.1813  105.0086 -68.4672 -36.7845  18.0440  7.6540 -0.2236  0.1030]
[-150.1813  105.0086 -68.4672 -36.7845  18.0440  7.6540 -0.2236  0.1030]
[-150.1813  105.0086 -68.4672 -36.7845  18.0440  7.6540 -0.2236  0.1030]
[-150.1813  105.0086 -68.4672 -36.7845  18.0440  7.6540 -0.2236  0.1030]

Eigenvalues from NumPy: 
[-150.1813  105.0086 -68.4672 -36.7845  18.0440  7.6540 -0.2236  0.1030]

Eigenvectors from orthogonal iteration: 
[[-0.4485  0.2023  0.0313  0.3642 -0.2758  0.5190 -0.4489 -0.2783]
 [-0.2939 -0.3019  0.0366  0.1027  0.8789  0.1609 -0.0489 -0.0993]
 [ 0.3458 -0.5910 -0.2489 -0.3344 -0.1380  0.1736 -0.3456 -0.4344]
 [-0.4978 -0.0305 -0.7843 -0.2587 -0.0814 -0.0070  0.1274  0.2154]
 [-0.2388 -0.5041  0.4143 -0.1771 -0.2598  0.4032  0.2499  0.4397]
 [-0.0878 -0.3085  0.0108  0.2771 -0.0640 -0.4777 -0.6192  0.4523]
 [-0.4891  0.0778  0.