## Methods for eigenvalues

**Problem of interest**

Given $m$-by-$m$ matrix $A$, find eigenvalues and eigenvectors.

**Settings/Notation**

| symbol | meaning |
|---|---|
| $A$ | matrix whose eigenvalues are sought ($m$-by-$m$ or $n$-by-$n$)  |
| $\lambda_i$ | eigenvalues in decreasing order in modulus $\vert \lambda_1 \vert \ge \vert \lambda_2 \vert \ge \cdots$ |
| $v_i$ | eigenvectors associated to $\lambda_i$ |
| $\sigma(A)$ | spectrum of $A$, i.e., the set of eigenvalues |
| $M_n(\mathbb{R})$, $M_n(\mathbb{C})$ | set of all $n$-by-$n$ real and complex matrices, respectively |

**Remark**

- The characteristic polynomial is not a good approach because polynomial root-finding is ill-conditioned.
  - For $A=\mathrm{diag}(1,2,\cdots,20)$, the characteristic polynomial is $\mathrm{det}(xI - A)=(x-1)(x-2)\cdots(x-20)$ (Wilkinson polynomial).
  - Once expanded into monomial coefficients, this polynomial is so ill-conditioned that no root-finding method recovers the roots satisfactorily. (Try it!)
- There is no direct method for computing eigenvalues ([Sauer (2017) p. 556]), so we resort to iterative methods.
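
The "Try it!" above can be carried out with a few lines of NumPy. This is only a sketch: `np.poly` and `np.roots` are one possible choice of expansion and root-finder (not necessarily the one intended in the lecture), and the variable names are illustrative.

```python
import numpy as np

# Exact roots 1, 2, ..., 20 (floats, to avoid integer overflow in the coefficients)
true_roots = np.arange(1.0, 21.0)

# Expand (x-1)(x-2)...(x-20) into monomial coefficients, then find the roots again
coeffs = np.poly(true_roots)
computed = np.roots(coeffs)

# Distance from each true root to the nearest computed root; keep the worst case
poly_error = max(np.min(np.abs(computed - r)) for r in true_roots)

# For comparison: an eigenvalue solver applied directly to the matrix
eigs = np.sort(np.linalg.eigvals(np.diag(true_roots)))
eig_error = np.max(np.abs(eigs - true_roots))

print(f"largest error via polynomial roots:  {poly_error:.2e}")
print(f"largest error via eigenvalue solver: {eig_error:.2e}")
```

The error of the polynomial route is typically many orders of magnitude larger than that of the eigenvalue solver, even though both start from the same exact data.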

### Preliminary

**Definition** (Gershgorin disks; Salgado and Wise (2022) p. 200)

Let $n \geq 2$ and $\mathrm{A} \in \mathbb{C}^{n \times n}$. The Gershgorin disks $D_i$ of $\mathrm{A}$ are
$$
D_i=\left\{z \in \mathbb{C} : \left|z-a_{i, i}\right| \leq R_i\right\}, \quad R_i=\sum_{\substack{j=1 \\ j \neq i}}^n\left|a_{i, j}\right|, \quad i=1, \ldots, n.
$$


**Theorem** (Gershgorin Circle Theorem; Salgado and Wise (2022) p. 200) 

Let $n \geq 2$ and $\mathrm{A}=\left[a_{i, j}\right] \in \mathbb{C}^{n \times n}$. Then
$$
\sigma(\mathrm{A}) \subset \bigcup_{i=1}^n D_i
$$

![Gershgorin disks](https://www.wolfram.com/language/12/complex-visualization/assets.en/gerschgorin-disks/O_47.png)

Figure: Wolfram (Gershgorin disks)

###### proof

Suppose that $(\lambda, \boldsymbol{w})$ is an eigenpair of $A$. Then
$$
\sum_{j=1}^n a_{i, j} w_j=\lambda w_i, \quad i=1,2, \ldots, n .
$$

Suppose that
$$
\left|w_k\right|=\|\boldsymbol{w}\|_{\infty}=\max _{i=1}^n\left|w_i\right| .
$$

Observe that $w_k \neq 0$, since $\boldsymbol{w} \neq \mathbf{0}$. Then
$$
\begin{aligned}
\left|\lambda-a_{k, k}\right| \cdot\left|w_k\right| & =\left|\lambda w_k-a_{k, k} w_k\right| \\
& =\left|\sum_{j=1}^n a_{k, j} w_j-a_{k, k} w_k\right| \\
& =\left|\sum_{\substack{j=1 \\
j \neq k}}^n a_{k, j} w_j\right| \\
& \leq \sum_{\substack{j=1 \\
j \neq k}}^n\left|a_{k, j} w_j\right| \\
& \leq \sum_{\substack{j=1 \\
j \neq k}}^n\left|a_{k, j}\right| \cdot\left|w_k\right| \\
& =R_k\left|w_k\right| .
\end{aligned}
$$

Thus, $\lambda \in D_k$.
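
As a numerical sanity check of the theorem, the following sketch computes the disk centers and radii of a small matrix and verifies that every eigenvalue lies in at least one disk. The helper name `gershgorin_disks` and the test matrix are illustrative choices, not part of the cited text.

```python
import numpy as np

def gershgorin_disks(A):
    """Return centers a_ii and radii R_i (off-diagonal absolute row sums)."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[1, 2, 2, 4],
              [2, 5, 6, 2],
              [2, 6, 5, 0],
              [4, 2, 0, 0]], dtype=np.float64)

centers, radii = gershgorin_disks(A)
eigvals = np.linalg.eigvals(A)

# For each eigenvalue, test membership in the union of the disks
in_union = [bool(np.any(np.abs(lam - centers) <= radii + 1e-12)) for lam in eigvals]

print("centers:", centers)
print("radii:  ", radii)
print("every eigenvalue inside some disk?", all(in_union))
```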

**Remark**

- The theorem does not claim a one-to-one correspondence between disks and eigenvalues.
  - $k$ is simply the index of the largest entry in modulus of $\boldsymbol{w}$, and two different eigenvectors may attain their largest entry at the same index.
- The next theorem, however, does give such a correspondence when the disks separate, which shows the subtlety of the problem.

**Theorem** (Gershgorin Second Theorem; Salgado and Wise (2022) p. 201). 

Let $n \geq 2$ and $\mathrm{A} \in \mathbb{C}^{n \times n}$. Suppose that $1 \leq p \leq n-1$ and that the Gershgorin disks of the matrix A can be divided into disjoint subsets $D^{(p)}$ and $D^{(q)}$ containing $p$ and $q=n-p$ disks, respectively. Then the union of the disks in $D^{(p)}$ contains $p$ eigenvalues, and the union of the disks in $D^{(q)}$ contains $q$ eigenvalues, counting multiplicities. In particular, if one disk is disjoint from all the others, it contains exactly one eigenvalue. And, if all of the disks are disjoint, then each contains exactly one eigenvalue.

### Power iteration

**Terminology**

- Dominant eigenvalue: $\lambda_1$ (eigenvalue of the largest modulus)
- Dominant eigenvector: $v_1$ (eigenvector associated to the dominant eigenvalue)

**Idea**

Repeated application of $A$ turns a vector toward the dominant eigenvector.


![Power iteration](https://www.gastonsanchez.com/matrix4sl/matrix4sl_files/figure-html/power-method-example-1.png)

Figure: Gaston Sanchez (Illustration of Power iteration)

**Example**

Let $A=\left[\begin{array}{ll}1 & 3 \\ 2 & 2\end{array}\right]$, and $x_0=\left[\begin{array}{r}-5 \\ 5\end{array}\right]$. Then,

$$
\begin{aligned} & x_1=A x_0=\left[\begin{array}{ll}1 & 3 \\ 2 & 2\end{array}\right]\left[\begin{array}{r}-5 \\ 5\end{array}\right]=\left[\begin{array}{r}10 \\ 0\end{array}\right] \\ & x_2=A^2 x_0=\left[\begin{array}{ll}1 & 3 \\ 2 & 2\end{array}\right]\left[\begin{array}{r}10 \\ 0\end{array}\right]=\left[\begin{array}{l}10 \\ 20\end{array}\right] \\ & x_3=A^3 x_0=\left[\begin{array}{ll}1 & 3 \\ 2 & 2\end{array}\right]\left[\begin{array}{l}10 \\ 20\end{array}\right]=\left[\begin{array}{l}70 \\ 60\end{array}\right] \\ & x_4=A^4 x_0=\left[\begin{array}{ll}1 & 3 \\ 2 & 2\end{array}\right]\left[\begin{array}{l}70 \\ 60\end{array}\right]=\left[\begin{array}{l}250 \\ 260\end{array}\right]=260\left[\begin{array}{r}\frac{25}{26} \\ 1\end{array}\right] .\end{aligned}
$$



More vividly, use the knowledge of eigenpairs

$$
(\lambda_1, v_1) = (4, [1,1]^T), \qquad (\lambda_2, v_2) = (-1, [-3,2]^T),
$$

and expand the initial vector

$$
x_0=1\left[\begin{array}{l}
1 \\
1
\end{array}\right]+2\left[\begin{array}{r}
-3 \\
2
\end{array}\right]
$$
Then, the above calculations read:
$$
\begin{aligned}
x_1=A x_0 & =4\left[\begin{array}{l}
1 \\
1
\end{array}\right]-2\left[\begin{array}{r}
-3 \\
2
\end{array}\right] \\
x_2=A^2 x_0 & =4^2\left[\begin{array}{l}
1 \\
1
\end{array}\right]+2\left[\begin{array}{r}
-3 \\
2
\end{array}\right] \\
x_3=A^3 x_0 & =4^3\left[\begin{array}{l}
1 \\
1
\end{array}\right]-2\left[\begin{array}{r}
-3 \\
2
\end{array}\right] \\
x_4=A^4 x_0 & =4^4\left[\begin{array}{l}
1 \\
1
\end{array}\right]+2\left[\begin{array}{r}
-3 \\
2
\end{array}\right] \\
& =256\left[\begin{array}{l}
1 \\
1
\end{array}\right]+2\left[\begin{array}{r}
-3 \\
2
\end{array}\right] .
\end{aligned}
$$

If we introduce normalization, we can focus on the eigen-direction itself. (See the algorithm below)
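
The worked example can be reproduced numerically. The sketch below (variable names are illustrative) applies $A$ four times to $x_0$, compares each iterate with its expansion $4^k v_1 + 2(-1)^k v_2$, and prints the normalized direction, which approaches $[1,1]^T/\sqrt{2}$.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 2.0]])
x = np.array([-5.0, 5.0])      # x_0 = 1*v1 + 2*v2

v1 = np.array([1.0, 1.0])      # eigenvector for lambda_1 = 4
v2 = np.array([-3.0, 2.0])     # eigenvector for lambda_2 = -1

for k in range(1, 5):
    x = A @ x
    # the same vector written in the eigenvector basis
    expansion = 4.0**k * v1 + 2.0 * (-1.0)**k * v2
    direction = x / np.linalg.norm(x)
    print(k, x, expansion, direction)

x4 = x
```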


![Power iteration](https://www.gastonsanchez.com/matrix4sl/matrix4sl_files/figure-html/power-method-rescale-1.png)

Figure: Gaston Sanchez (Illustration of Power iteration with normalization)

**Algorithm** (Power iteration; Sauer (2017) p. 559)

- Given
  - $A$: matrix
  - $x_0$: initial guess vector
- **For** $j=1,2,\cdots$
  - $u_{j-1}=x_{j-1} /\left\|x_{j-1}\right\|_2$
  - $x_j=A u_{j-1}$
  - $\lambda_j=u_{j-1}^T x_j$
- Return
  - $u_{j}=x_{j} /\left\|x_{j}\right\|_2$
  - $\lambda_j$

**Example** 

Implement Power Iteration with a 4-by-4 asymmetric matrix, $A$, and a symmetric one, $B$.

$$
A = \begin{bmatrix}  
                1 & 2 & 3 & 4 \\
                4 & 5 & 6 & 7 \\
                2 & 1 & 5 & 0 \\
                4 & 2 & 1 & 0 
    \end{bmatrix}
    \quad
B = \begin{bmatrix} 
                1 & 2 & 2 & 4 \\
                2 & 5 & 6 & 2 \\
                2 & 6 & 5 & 0 \\
                4 & 2 & 0 & 0 
    \end{bmatrix}
$$


In [1]:
import numpy as np

# DATA
#   1. asymmetric matrix
A = np.array([  [1 , 2 , 3 , 4],
                [4 , 5 , 6 , 7],
                [2 , 1 , 5 , 0],
                [4 , 2 , 1 , 0]], dtype=np.float64)

#   2. symmetric matrix
B = np.array([  [1 , 2 , 2 , 4],
                [2 , 5 , 6 , 2],
                [2 , 6 , 5 , 0],
                [4 , 2 , 0 , 0]], dtype=np.float64)


In [2]:
def power_iter(A, x0=None, max_iter=20):
    """
    Return approximations of the dominant eigenvalue (largest in
    modulus) and an associated unit eigenvector.

    Input:
        A (2D array): matrix of interest
        x0 (1D array): initial guess
        max_iter (int): number of steps of power iteration (at least 1)
    Output:
        lamb (float): Rayleigh quotient u^T A u from the last step
        u (1D array): normalized approximate eigenvector
    """
    # initial guess
    if x0 is None:
        x = np.zeros(A.shape[1])
        x[0] = 1.               # x = [1., 0, 0, ..., 0]
    else:
        x = x0
    
    # main loop of Power iteration
    for k in range(max_iter):
        u = x / np.linalg.norm(x)
        x = A @ u
        lamb = np.dot(u, x)    # same as u^T A u: Rayleigh quotient
    
    u = x / np.linalg.norm(x)

    return lamb, u

Clicker question

In [18]:
# Power iteration with A and B
lamb, u = power_iter(A, max_iter=20)

print(A@u - lamb*u)

[-9.97490979e-09 -1.42785606e-08 -6.35508535e-10  5.48972601e-09]


**Definition** (Rayleigh quotient)

Given an $m$-by-$m$ matrix $A$ and a nonzero vector $x$ of length $m$,

$$
R(A, x)=\frac{x^T A x}{x^T x}
$$

is called the *Rayleigh quotient*.

**Remark** 

- $R(A, x)$ is the best approximation (in the least squares sense) of an eigenvalue when $x$ points in a direction close to that of an eigenvector.
  - If $x$ is an eigenvector associated to an eigenvalue $\lambda$, the Rayleigh quotient recovers $\lambda$ exactly:

$$
Ax=\lambda x \implies x^T Ax=\lambda x^T x \implies \lambda =\frac{x^T A x}{x^T x}.
$$

- Least squares interpretation (Sauer (2017) p. 559)
  - Consider the eigenvalue equation $x\lambda = Ax$, where $x$ is an approximate eigenvector and $\lambda$ is unknown. Looked at this way, the coefficient matrix is the $n \times 1$ matrix $x$. The normal equations say that the least squares answer is the solution of $x^T x\lambda = x^T Ax$, or $\lambda=(x^T Ax)/(x^Tx)$.

**Theorem** (Convergence of Power iteration; Sauer (2017) p. 560)

Let $A$ be an $m \times m$ matrix with real eigenvalues $\lambda_1, \ldots, \lambda_m$ satisfying $\left|\lambda_1\right|>\left|\lambda_2\right| \geq \left|\lambda_3\right| \geq \cdots \geq\left|\lambda_m\right|$. Assume that the eigenvectors of $A$ span $\mathbb{R}^m$. For almost every initial vector, Power Iteration converges linearly to an eigenvector associated to $\lambda_1$ with convergence rate constant $S=\left|\lambda_2 / \lambda_1\right|$.

Notice the first inequality is strict.
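
The linear rate can be observed on the earlier $2\times 2$ example, where $\lambda_1 = 4$ and $\lambda_2=-1$, so $S = 1/4$. The following sketch (illustrative names; the error is measured as the distance to the unit dominant eigenvector) prints the ratios of successive errors.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 2.0]])                # eigenvalues 4 and -1, so S = 1/4
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)  # unit dominant eigenvector

x = np.array([-5.0, 5.0])
errors = []
for _ in range(10):
    u = x / np.linalg.norm(x)
    x = A @ u
    errors.append(np.linalg.norm(x / np.linalg.norm(x) - v1))

# error_{j+1} / error_j should approach S = |lambda_2 / lambda_1|
ratios = [e_next / e_prev for e_prev, e_next in zip(errors, errors[1:])]
print(np.round(ratios, 4))
```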

Proof: 

##### Spectral theorem

There are several versions of spectral theorems. 

**Definition** (Hermitian matrix)

A complex matrix $A$ is called *Hermitian* if $A^H=A$, where $A^H = \overline{A}^T$ (conjugate transpose). In particular, if $A$ is a real matrix, Hermitian and symmetric are equivalent.

**Theorem** (Horn and Johnson (2013) Matrix analysis 2ed. Theorem 4.1.5. p. 229) 

A matrix $A \in M_n$ is Hermitian if and only if there is a unitary $U \in M_n$ and a real diagonal $\Lambda \in M_n$ such that $A=U \Lambda U^*$, where $M_n$ is the set of $n$-by- $n$ complex matrices. Moreover, $A$ is real and Hermitian (that is, real symmetric) if and only if there is a real orthogonal $P \in M_n$ and a real diagonal $\Lambda \in M_n$ such that $A=P \Lambda P^T$.

**Remark**

- Observe the subtlety of the statement: If $A$ is symmetric as a complex matrix, then the conclusion is different. (See e.g., [Wikipedia - Complex symmetric matrices](https://en.wikipedia.org/wiki/Symmetric_matrix#Complex_symmetric_matrices))

**Theorem** (Spectral theorem for symmetric matrix; Lay, Lay, McDonald (2014) Linear Algebra and its applications. p. 399)

An $m \times m$ symmetric matrix $A$ has the following properties:
- $A$ has $m$ real eigenvalues, counting multiplicities.
- The dimension of the eigenspace for each eigenvalue $\lambda$ equals the multiplicity of $\lambda$ as a root of the characteristic equation.
- The eigenspaces are mutually orthogonal, in the sense that eigenvectors corresponding to different eigenvalues are orthogonal.
- $A$ is orthogonally diagonalizable.
  - Hence, there is an orthonormal basis for $\mathbb{R}^m$ consisting of eigenvectors of $A$.

#### Inverse power iteration and shifted Inverse power iteration

**Lemma** (HW00 #3 of Math 104B 2024 Winter)

Let $A\in \mathbb{R}^{m\times m}$ be nonsingular. Then, $\lambda$ is an eigenvalue of $A$ if and only if $\lambda^{-1}$ is an eigenvalue of $A^{-1}$. Also, the corresponding eigenvectors are the same.

**Lemma** (HW07 #2 of Math 104B 2024 Winter)

$\lambda$ is an eigenvalue of $A\in \mathbb{R}^{m\times m}$ if and only if $\kappa\lambda+\mu$ is an eigenvalue of $\kappa A+\mu I$, where $\kappa(\neq0), \mu\in \mathbb{R}$. Also, the eigenvectors of $A$ and $\kappa A+\mu I$ associated with $\lambda$ and $\kappa\lambda + \mu$, respectively, are the same.
		

**Observation 1**

- Replace $A$ with $A^{-1}$, 
  - $x_j=A u_{j-1}$ $\xrightarrow{\text{replace}}$ $x_j=A^{-1} u_{j-1}$ $\Longleftrightarrow$ solve $Ax_j=u_{j-1}$ for $x_j$
  - This will result in the largest eigenvalue of $A^{-1}$ in modulus, i.e., the smallest eigenvalue of $A$ in modulus.

**Observation 2** 

- Further replace $A^{-1}$ with $(A-sI)^{-1}$,
  - solve $Ax_j=u_{j-1}$ for $x_j$ $\xrightarrow{\text{replace}}$ solve $(A-sI)x_j=u_{j-1}$ for $x_j$
  - This yields the eigenvalue of $A-sI$ that is smallest in modulus, i.e., the smallest among $|\lambda_1 - s|, |\lambda_2 - s|, \cdots, |\lambda_m - s|$. In other words, it finds the eigenvalue of $A$ *closest* to $s$.
  - If we have good candidates of eigenvalues, we can take them as $s$, and find the eigenvalues nearby.

**Algorithm** ((shifted) Inverse Power iteration; Sauer (2017) p. 561)

- Given
  - $A$: matrix
  - $x_0$: initial guess vector
  - $s$: shift
- **For** $j=1,2,\cdots$
  - $u_{j-1}=x_{j-1} /\left\|x_{j-1}\right\|_2$
  - Solve $(A-sI)x_j=u_{j-1}$
  - $\lambda_j=u_{j-1}^T x_j$
- Return
  - $u_{j}=x_{j} /\left\|x_{j}\right\|_2$
  - $s+\lambda_j^{-1}$
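
A minimal NumPy sketch of the algorithm above follows. The function name `inverse_power_iter`, the shift `s = 3.0`, and the seeded random initial guess are illustrative assumptions. For simplicity it calls `np.linalg.solve` at every step instead of reusing a single LU factorization of $A-sI$.

```python
import numpy as np

def inverse_power_iter(A, s, x0, max_iter=30):
    """Shifted inverse power iteration (sketch): eigenvalue of A nearest to s."""
    M = A - s * np.eye(A.shape[0])
    x = np.asarray(x0, dtype=np.float64)
    for _ in range(max_iter):
        u = x / np.linalg.norm(x)
        x = np.linalg.solve(M, u)    # x_j = (A - sI)^{-1} u_{j-1}
        mu = u @ x                   # approximates an eigenvalue of (A - sI)^{-1}
    return s + 1.0 / mu, x / np.linalg.norm(x)

B = np.array([[1, 2, 2, 4],
              [2, 5, 6, 2],
              [2, 6, 5, 0],
              [4, 2, 0, 0]], dtype=np.float64)

s = 3.0
x0 = np.random.default_rng(0).standard_normal(4)
lam, u = inverse_power_iter(B, s, x0)

# Reference: eigenvalue of B closest to the shift
eigs = np.linalg.eigvalsh(B)
closest = eigs[np.argmin(np.abs(eigs - s))]
print(lam, closest, np.linalg.norm(B @ u - lam * u))
```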

Clicker Question

**Remark**

- Once an eigenvalue $\mu$ of $(A-sI)^{-1}$ is found, the eigenvalue of $A$ found is $s+\mu^{-1}$.
- This technique depends on the fact that the eigenvectors are all the same for shift ($A-sI$) and inversion ($A^{-1}$).

#### Rayleigh quotient iteration

**Idea**

- Inverse Power Iteration will perform better if good shifts are provided.
- Use Rayleigh quotient for the shifts.

**Algorithm** (Rayleigh quotient iteration; Sauer (2017) p. 561)

- Given
  - $A$: matrix
  - $x_0$: initial guess vector
- **For** $j=1,2,\cdots$
  - $u_{j-1}=x_{j-1} /\left\|x_{j-1}\right\|_2$
  - $\lambda_{j-1}=u_{j-1}^T Au_{j-1}$
  - Solve $(A-\lambda_{j-1}I)x_j=u_{j-1}$
- Return
  - $u_{j}=x_{j} /\left\|x_{j}\right\|_2$
  - $\lambda_{j}=u_{j}^T Au_{j}$

**Remark** (Convergence of Rayleigh Quotient Iteration)

- (Convergence)
  - While Inverse Power Iteration converges linearly, Rayleigh Quotient Iteration is quadratically convergent for simple (nonrepeated) eigenvalues and will converge cubically if the matrix is symmetric. [Sauer (2017) p. 562]
  - After convergence, the matrix $A - \lambda_{j-1} I$ is singular and no more steps can be performed. As a result, trial and error should be used to stop the iteration just before this occurs. [Sauer (2017) p. 562]
- (Complexity) 
  - Inverse Power Iteration requires only one LU factorization for $A-sI$; but for Rayleigh Quotient Iteration, each step requires a new factorization for $A - \lambda_{j-1} I$ since the shift has changed. [Sauer (2017) p. 563]
  - Even so, Rayleigh Quotient Iteration is the fastest converging method among what we have presented in this section on finding one eigenvalue at a time. [Sauer (2017) p. 563]
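
A possible implementation of Rayleigh Quotient Iteration is sketched below. Instead of trial and error, this sketch stops when the residual $\Vert Au - \lambda u\Vert_2$ falls below a tolerance, and it also catches an exactly singular solve. The stopping rule, the tolerance `tol`, and the function name are assumptions of this sketch, not part of the cited algorithm.

```python
import numpy as np

def rayleigh_quotient_iter(A, x0, max_iter=20, tol=1e-10):
    """Rayleigh quotient iteration (sketch) with a residual-based stop."""
    I = np.eye(A.shape[0])
    u = np.asarray(x0, dtype=np.float64)
    u = u / np.linalg.norm(u)
    lam = u @ A @ u
    for _ in range(max_iter):
        if np.linalg.norm(A @ u - lam * u) < tol:
            break                    # converged: stop before A - lam*I is singular
        try:
            x = np.linalg.solve(A - lam * I, u)
        except np.linalg.LinAlgError:
            break                    # exactly singular shift: lam is an eigenvalue
        u = x / np.linalg.norm(x)
        lam = u @ A @ u              # new shift = Rayleigh quotient
    return lam, u

B = np.array([[1, 2, 2, 4],
              [2, 5, 6, 2],
              [2, 6, 5, 0],
              [4, 2, 0, 0]], dtype=np.float64)

x0 = np.random.default_rng(0).standard_normal(4)
lam, u = rayleigh_quotient_iter(B, x0)
residual = np.linalg.norm(B @ u - lam * u)
print(lam, residual)
```

Which eigenvalue is found depends on the initial vector, since the shift is updated at every step.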

### QR algorithm

![QR algorithm outline](../images/fig_QRalgorithmOutline.png)

**Remark** 

- QR algorithm finds all eigenvalues at once. 
- This algorithm relies on ideas that are far from natural as it stands. So, we approach QR algorithm trying to understand what's already known rather than trying to derive it.
<!-- - The performance of QR algorithm is sensitive to how eigenvalues are located. We discuss favorable settings to more general settings. -->

#### Preliminary

**Theorem** (Schur triangularization)

Every square matrix is unitarily similar to an upper triangular matrix. That is, given $m$-by-$m$ matrix $A$, there exists a square matrix $Q$ with $Q^H Q = QQ^H=I$ such that $A = Q T Q^H$, where $T$ is an $m$-by-$m$ upper triangular matrix. This factorization is called *Schur decomposition, factorization, or triangularization*.

**Definition** (Real Schur form)

A matrix $T$ has real Schur form if it is upper triangular, except possibly for $2 \times 2$ blocks on the main diagonal.

$$
\left[\begin{array}{lllll}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
& \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
& & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
& & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
& & & & \mathrm{*}
\end{array}\right]
$$

**Theorem** (Eigenvalues of real Schur form)

- The determinant of a matrix in real Schur form is the product of the determinants of the $1 \times 1$ and $2 \times 2$ blocks on the main diagonal. 
- The eigenvalues of a matrix in real Schur form are the eigenvalues of the $1 \times 1$ and $2 \times 2$ blocks on the main diagonal.

Proof: HW problem
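
Before attempting the proof, the statement can be checked numerically on a small example. The matrix `T` below is made up for illustration: it has diagonal blocks $[1]$, $\begin{bmatrix}2 & -1\\ 1 & 2\end{bmatrix}$, and $[-3]$.

```python
import numpy as np

# Real Schur form with diagonal blocks of size 1, 2, 1
T = np.array([[1.0, 2.0,  3.0,  4.0],
              [0.0, 2.0, -1.0,  5.0],
              [0.0, 1.0,  2.0,  6.0],
              [0.0, 0.0,  0.0, -3.0]])

blocks = [T[0:1, 0:1], T[1:3, 1:3], T[3:4, 3:4]]

block_eigs = np.concatenate([np.linalg.eigvals(b) for b in blocks])
full_eigs = np.linalg.eigvals(T)

# Compare as sets: every block eigenvalue is (numerically) an eigenvalue of T, and vice versa
same_spectrum = (all(np.min(np.abs(full_eigs - b)) < 1e-10 for b in block_eigs)
                 and all(np.min(np.abs(block_eigs - f)) < 1e-10 for f in full_eigs))

det_blocks = np.prod([np.linalg.det(b) for b in blocks])
print(block_eigs, same_spectrum, det_blocks, np.linalg.det(T))
```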

**Theorem** (Existence of Real Schur form; Horn and Johnson (2013) Matrix Analysis p. 103)

Let $A$ be a real $n$-by-$n$ matrix. Then, there is an $n$-by-$n$ real orthogonal matrix $Q$ such that $Q^T A Q$ is a real upper block-triangular matrix (real Schur form) with the following properties: (i) its 1-by-1 diagonal blocks display the real eigenvalues of $A$; (ii) each of its 2-by-2 diagonal blocks has a conjugate pair of non-real eigenvalues (but no special form); (iii) the ordering of its diagonal blocks may be prescribed in the following sense: If the real eigenvalues and conjugate pairs of non-real eigenvalues of A are listed in a prescribed order, then the real eigenvalues and conjugate pairs of non-real eigenvalues of the respective diagonal blocks $A_{(1)}, \ldots, A_{(p)}$ of $Q^T A Q$ are in the same order.

$$
\left[\begin{array}{llll}
A_{(1)} & & & \star \\
& A_{(2)}& & \\
& & \ddots & \\
0 & & & A_{(p)}
\end{array}\right]
$$

**Algorithm** (Unshifted QR algorithm)

- Given
  - $A$: a symmetric matrix
- $A_0 = A$
- **For** $j=0,1,2,\cdots$
  - $Q_{j}R_{j}=A_{j}$ (QR factorization)
  - $A_{j+1}=R_{j}Q_{j}$

**Remark**

- Loosely speaking, the full QR algorithm iteratively moves an arbitrary matrix A toward its real Schur factorization by a series of similarity transformations. [Sauer (2017) p. 568]

**Theorem** (Convergence of QR algorithm)

Assume that $A$ is a symmetric $m \times m$ matrix with eigenvalues $\lambda_i$ satisfying $\left|\lambda_1\right|>\left|\lambda_2\right|>\cdots>\left|\lambda_m\right|$. The unshifted QR algorithm converges linearly to the eigenvectors and eigenvalues of $A$. As $j \rightarrow \infty$, $A_j$ converges to a diagonal matrix containing the eigenvalues on the main diagonal and $\bar{Q}_j=Q_0 Q_1 \cdots Q_j$ converges to an orthogonal matrix whose columns are the eigenvectors.

We don't prove the whole claim, but we verify a few facts that help us understand the algorithm.

**Fact**

- $A_j$'s in the QR algorithm are all similar. 
  - Therefore, they have the same eigenvalues.
$$
A_{j+1} = R_j Q_j = (Q_j^T Q_j) R_j Q_j =Q_j^T (Q_j R_j) Q_j = Q_j^T A_j Q_j,
$$
where we used the orthogonality of $Q_j$.
- If we repeat this over $j$, we have, for $j=0,1,2,\cdots$,
$$
A_{j+1} = \bar Q_j^T A \bar Q_j, \text{ where } \bar Q_j = Q_0 Q_1 Q_{2}\cdots Q_j 
$$
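
These facts are easy to confirm numerically. The sketch below uses `np.linalg.qr` as a stand-in for whatever QR routine is at hand, runs a few unshifted steps, and checks both the accumulated similarity and the preservation of eigenvalues.

```python
import numpy as np

A = np.array([[1, 2, 2, 4],
              [2, 5, 6, 2],
              [2, 6, 5, 0],
              [4, 2, 0, 0]], dtype=np.float64)

A_j = A.copy()
Q_bar = np.eye(4)
for _ in range(5):
    Q, R = np.linalg.qr(A_j)   # A_j = Q_j R_j
    A_j = R @ Q                # A_{j+1} = R_j Q_j
    Q_bar = Q_bar @ Q          # accumulate Q_0 Q_1 ... Q_j

similar = np.allclose(A_j, Q_bar.T @ A @ Q_bar)
same_eigs = np.allclose(np.sort(np.linalg.eigvals(A_j).real),
                        np.sort(np.linalg.eigvalsh(A)))
print(similar, same_eigs)
```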

**Remark**

- The above theorem does **not** say that we can apply unshifted QR algorithm only to symmetric matrices. It only says that we can say something for certain when the situations are favorable.

**Example** (Modification of Kincaid and Cheney (2002) p. 301)

Carry out QR algorithm for 10 iterations to the following symmetric matrix.
$$
A=\left[\begin{array}{cccc}
1 & 2 & 2 & 4\\
2 & 5 & 6 & 2\\
2 & 6 & 5 & 0\\
4 & 2 & 0 & 0
\end{array}\right]
$$

<!-- $$
A=\left[\begin{array}{cccc}
1 & 2 & 3 & 4 \\
4 & 5 & 6 & 7 \\
2 & 1 & 5 & 0 \\
4 & 2 & 1 & 0
\end{array}\right]
$$ -->

<!-- $$
A=\left[\begin{array}{cc}
{[1]} & {\left[\begin{array}{lll}
2 & 3 & 4
\end{array}\right]} \\
{\left[\begin{array}{l}
4 \\
2 \\
4
\end{array}\right]} & {\left[\begin{array}{lll}
5 & 6 & 7 \\
1 & 5 & 0 \\
2 & 1 & 0
\end{array}\right]}
\end{array}\right]
$$ -->

In [55]:
import numpy as np
from internallib import qr

def qr_alg_unshift(A, max_iter=10):
    """
    Return approximate Schur factorization using QR algorithm.

    Input:
        A (array): A square matrix
    Output:
        T (array): approximate Schur form of A
        U (array): The unitary matrix involved in the similarity T=U^H*A*U
    Note: 
        - When A is a real matrix, T is an approximate real Schur form.
        In this case, U is an orthogonal matrix.
    """
    m = A.shape[0]
    T = A.copy()
    U = np.eye(m)
    for _ in range(max_iter):
        Q, R = qr(T)
        T = R @ Q
        U = U @ Q
    
    return T, U


**Remark**

- The decision of right-multiplication by `Q` for `U` is based on the property of the QR algorithm.
  - For $j=0,1,2,\cdots$,
$$
A_{j+1} = \bar Q_j^T A \bar Q_j, \text{ where } \bar Q_j = Q_0 Q_1 Q_{2}\cdots Q_j 
$$

In [74]:
"""
Suggested implementation
- max_iter = 10, 100, 1000
"""

A = np.array([  [1 , 2 , 2 , 4],
                [2 , 5 , 6 , 2],
                [2 , 6 , 5 , 0],
                [4 , 2 , 0 , 0]], dtype=np.float64)

T, U = qr_alg_unshift(A, max_iter=100)

with np.printoptions(precision=5, suppress=True):
    print(A)
    print(T)
    print(np.allclose(A, U @ T @ U.T))
    print(np.allclose(A @ U, np.diag(T)*U)) 

[[1. 2. 2. 4.]
 [2. 5. 6. 2.]
 [2. 6. 5. 0.]
 [4. 2. 0. 0.]]
[[12.25824  0.       0.       0.     ]
 [-0.      -3.95885 -0.00008  0.     ]
 [-0.      -0.00008  3.52016  0.     ]
 [ 0.       0.      -0.      -0.81954]]
True
False


**Plan** 

We will improve (unshifted) QR algorithm as follows.

1. Convert a matrix into upper Hessenberg form.
    -  Introduce as many zeros as possible.
2. Apply shifting to QR algorithm.
    -  Accelerate finding an eigenvalue.
3. Deflate into smaller matrices.
   - "Harvest" an eigenvalue and narrow down to smaller problem.

___

#### Hessenberg form

- Efficiency of the QR algorithm increases considerably if we first put $A$ into upper Hessenberg form. [Sauer (2017) p. 570]
  - Hessenberg form introduces as many zeros into $A$ as possible while preserving all eigenvalues. 
- Upper Hessenberg form eliminates the final difficulty, convergence to multiple complex eigenvalues, and the QR iteration will always proceed to $1 \times 1$ or $2 \times 2$ blocks. [Sauer (2017) p. 570]

**Definition** (Hessenberg form)

The $m \times n$ matrix $A$ is in upper Hessenberg form if $a_{i j}=0$ for $i>j+1$.

$$
\left[\begin{array}{lllll}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
& \times & \times & \times & \times \\
& & \times & \times & \times \\
& & & \times & \times
\end{array}\right]
$$


**Theorem** (Hessenberg form)

Let $A$ be a real square matrix. There exists an orthogonal matrix $Q$ such that $A=Q B Q^T$ and $B$ is in upper Hessenberg form.

Proof: See construction below.

##### Construction of Hessenberg form

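**Idea**: A procedure similar to QR factorization via Householder reflectors yields the Hessenberg form. The difference is that we are less aggressive: we zero out only the entries below the subdiagonal, as opposed to everything below the diagonal. This way the zeros survive the right-multiplication that is needed to obtain a similarity transformation.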

1.a. Zero out 3rd entry and below of the first column.

$$
H_1 A=\left[\begin{array}{c:cccc}
1 & 0 & 0 & 0 & 0 \\
\hdashline 0 & & & \\
0 & & \hat{H}_1 & & \\
0 & & & & \\
0 & & & &
\end{array}\right]\left[\begin{array}{c:cccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right]=\left[\begin{array}{c:cccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right] .
$$

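Here, $\hat H_1$ is the $(m-1)$-by-$(m-1)$ Householder reflector that sends $x=A[1:, 0]$ (in NumPy slicing notation; the lower $(m-1)$-vector of the 1st column of $A$) to $(-\mathrm{sign}(x_0) \Vert x \Vert, 0, 0, \cdots, 0)$. This is the sign choice used in the implementation below; it avoids cancellation when forming $v = w - x$.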

1.b. Also right-multiply by $H_1$.

$$
H_1 A H_1=\left[\begin{array}{c:cccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right]\left[\begin{array}{c:cccc}
1 & 0 & 0 & 0 & 0 \\
\hdashline 0 & & & \\
0 & & \hat{H}_1 & & \\
0 & & & & \\
0 & & & &
\end{array}\right]=\left[\begin{array}{c:cccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right]
$$

A block multiplication shows that the zeros introduced by left-multiplication by $H_1$ are kept after right-multiplication by $H_1$.

The resulting product is similar to $A$ since $H_1=H_1^{-1}(=H_1^T)$ (symmetric, orthogonal). 

2. Repeat the same to bottom right $(m-1)$-by-$(m-1)$ submatrix of the result.

$$
H_2\left(H_1 A H_1\right)=\left[\begin{array}{cc:ccc}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
\hdashline 0 & 0 & & & \\
0 & 0 & & \hat{H}_2 & \\
0 & 0 & & &
\end{array}\right]\left[\begin{array}{cc:ccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline 0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right]=\left[\begin{array}{cc:ccc}
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
\hdashline 0 & \mathrm{*} & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & 0 & \mathrm{*} & \mathrm{*} & \mathrm{*} \\
0 & 0 & \mathrm{*} & \mathrm{*} & \mathrm{*}
\end{array}\right]
$$

3. We end up getting an upper Hessenberg form.

$$
H_3 H_2 H_1 A H_1^T H_2^T H_3^T=H_3 H_2 H_1 A\left(H_3 H_2 H_1\right)^T=Q A Q^T
$$

**Remark**

- If the column to reflect is pathological, i.e., the zero vector, we can take the Householder reflector $H=I$ and move on to the next submatrix. Therefore, the theorem holds in full generality even when the reflector construction above breaks down.

**Remark**

- If we tried to zero out the 1st column from the 2nd entry down (as in QR factorization), right-multiplication by $H_1$ would destroy the zeros just introduced.
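
The contrast between the two choices can be seen on a random $5\times 5$ matrix. In this sketch the helper `reflector` and the seeded random matrix are illustrative assumptions. `H1` reflects only the lower $(m-1)$ entries of the first column, while `G` reflects the whole column as in QR factorization.

```python
import numpy as np

def reflector(x):
    """Householder reflector H with H @ x = (-sign(x_0) * ||x||, 0, ..., 0)."""
    w = np.zeros_like(x)
    w[0] = -np.copysign(np.linalg.norm(x), x[0])
    v = w - x
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

rng = np.random.default_rng(0)
m = 5
A = rng.random((m, m))

# Hessenberg-style step: the reflector leaves the first row and column index alone
H1 = np.eye(m)
H1[1:, 1:] = reflector(A[1:, 0])
hess_step = H1 @ A @ H1

# QR-style step: reflect the entire first column
G = reflector(A[:, 0])
left_only = G @ A
qr_step = G @ A @ G

with np.printoptions(precision=3, suppress=True):
    print("H1 A H1, first column:", hess_step[:, 0])   # zeros from the 3rd entry on
    print("G A,     first column:", left_only[:, 0])   # zeros from the 2nd entry on
    print("G A G,   first column:", qr_step[:, 0])     # zeros are lost
```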

**Remark**


- There is a finite algorithm for putting matrices in upper Hessenberg form by similarity transformations. See the construction above.


**Remark** (Practicality of Hessenberg form)

- Since Hessenberg form is similar to $A$, it has the same eigenvalues as $A$. 
  - We can find eigenvalues of the Hessenberg form instead, which has many more zeros.
- Hessenberg structure is preserved by QR algorithm.
  - Computational efficiency is gained throughout the iterations of QR algorithm.

**Lemma** (Hessenberg and triangular)

Let $B$ be an upper Hessenberg matrix and $R$ be an upper triangular matrix. Then, $BR$ and $RB$ are in upper Hessenberg form.

We discuss only an intuitive argument.

1. For $BR$, 
   - view the product as a horizontal stack of columns, each of which is $BR_i$, where $R_i$ is $i$-th column of $R$.
   - Also, view each $BR_i$ as linear combination of columns of $B$ with entries of $R_i$ being the coefficients. Then, the following structure is evident.

$$
BR=\left[\begin{array}{lllll}
\mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
\mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& & & \mathbf{*} & \mathbf{*}
\end{array}\right]
\left[\begin{array}{lllll}
\mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
 & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
&  & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
& &  & \mathbf{\times} & \mathbf{\times} \\
& & &  & \mathbf{\times}
\end{array}\right]
=
\left[\begin{array}{lllll}
\mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
\mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& & & \mathbf{+} & \mathbf{+}
\end{array}\right]
$$

2. The same argument applies to $RB$.


$$
RB=
\left[\begin{array}{lllll}
\mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
 & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
&  & \mathbf{\times} & \mathbf{\times} & \mathbf{\times} \\
& &  & \mathbf{\times} & \mathbf{\times} \\
& & &  & \mathbf{\times}
\end{array}\right]
\left[\begin{array}{lllll}
\mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
\mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& \mathbf{*} & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& & \mathbf{*} & \mathbf{*} & \mathbf{*} \\
& & & \mathbf{*} & \mathbf{*}
\end{array}\right]
=
\left[\begin{array}{lllll}
\mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
\mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& \mathbf{+} & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& & \mathbf{+} & \mathbf{+} & \mathbf{+} \\
& & & \mathbf{+} & \mathbf{+}
\end{array}\right]
$$
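
The lemma can also be checked numerically. The following sketch builds a random upper Hessenberg matrix and a random upper triangular matrix (both illustrative) and tests the sparsity pattern of the products. The last line shows that the product of two Hessenberg matrices generally is not Hessenberg, so the triangular factor matters.

```python
import numpy as np

def is_upper_hessenberg(M):
    """True if every entry below the first subdiagonal is (numerically) zero."""
    return bool(np.allclose(np.tril(M, -2), 0.0))

rng = np.random.default_rng(0)
m = 5
B = np.triu(rng.random((m, m)), -1)   # upper Hessenberg
R = np.triu(rng.random((m, m)))       # upper triangular

print("BR Hessenberg?", is_upper_hessenberg(B @ R))
print("RB Hessenberg?", is_upper_hessenberg(R @ B))
print("BB Hessenberg?", is_upper_hessenberg(B @ B))
```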

**Lemma** (QR algorithm preserves Hessenberg form)

The QR algorithm preserves upper Hessenberg form.

Proof:

Let $B$ be an upper Hessenberg matrix, assumed nonsingular so that $R_1$ below is invertible. By the construction of the QR algorithm, we have

$$
B_1 = B, \qquad B_1 = Q_1 R_1, \qquad B_2 = R_1 Q_1.
$$

We want to show that $B_2$ is in upper Hessenberg form. 

Then, we can repeat the argument to conclude that $B_i$ ($i=3,4,\cdots$) are all in upper Hessenberg form.

Now, from the second equality above, we have $Q_1 = B_1 R_1^{-1}$. The inverse of an upper triangular matrix is again upper triangular. (See `na05linsystem1LU.ipynb` Theorem (Triangular matrices and their algebraic structure)). Therefore, by the previous lemma, $Q_1$ is in upper Hessenberg form. 

Plugging this into the third equality above, $B_2 = R_1 Q_1$ is the product of an upper triangular matrix and an upper Hessenberg matrix. By the previous lemma again, $B_2$ is also upper Hessenberg.

QED
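
A quick numerical confirmation of the lemma, using `np.linalg.qr` as an assumed stand-in for the QR routine and a seeded random Hessenberg matrix as test data:

```python
import numpy as np

def max_below_subdiagonal(M):
    """Largest absolute entry strictly below the first subdiagonal."""
    return float(np.max(np.abs(np.tril(M, -2))))

rng = np.random.default_rng(1)
m = 6
B1 = np.triu(rng.random((m, m)), -1)   # upper Hessenberg (generically nonsingular)

Q1, R1 = np.linalg.qr(B1)              # B_1 = Q_1 R_1
B2 = R1 @ Q1                           # B_2 = R_1 Q_1

print("Q1:", max_below_subdiagonal(Q1))
print("B2:", max_below_subdiagonal(B2))
```

Both numbers are zero up to rounding, in agreement with the two steps of the proof: first $Q_1$, then $B_2$, inherits the Hessenberg structure.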

<!-- **Remark** 

- In the limit, $B_i$ will converges to real Schur form. (**To be double-checked**.) -->

##### Computation of Hessenberg form

Computing upper Hessenberg form

- Basically similar to QR factorization via Householder reflectors.
  - Use the same reflection operator once $x$ is determined. (But $x$ is different.)
  - The slicing for the matrix multiplications becomes apparent when the procedure is written out in block form.
- Differences of Hessenberg form compared to QR factorization
  - Only square matrices are allowed, since similarity is defined only for square matrices.
  - Loop over the first $m-2$ columns.
  - $x$ (reflected vector) is sliced from one entry lower in row index than in QR factorization.
    - $k$ (length of reflected vector) is smaller than in QR by 1.
  - $\hat H$ is also multiplied on the right after on the left.
  - If $B=Q A Q^T$ is desired, multiply $Q$ by $H$ on the left. 
    - If $A=Q B Q^T$ is wanted, multiply $Q$ by $H$ on the right.


In [20]:
import numpy as np

def hessen_v1(A):
    """
    Return an upper Hessenberg form that is similar to given matrix.

    Input:
        A (array): a square matrix.
    Output:
        B, Q (array): Hessenberg matrix and an orthogonal matrix 
            such that B = Q A Q^T
    """
    m, n = A.shape
    assert m == n
    B = A.copy()
    Q = np.eye(m)
    
    for i in range(m-2):
        k = m - i - 1

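        x = B[(i+1):, i]
        if not np.any(x):
            continue    # zero column: take H = I and move on
        w = np.zeros_like(x)
        # copysign (unlike np.sign) never returns 0, so w != 0 even when x[0] == 0
        w[0] = -np.copysign(np.linalg.norm(x), x[0])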

        v = w - x
        H = np.eye(k) - 2.*(np.outer(v, v))/(np.dot(v, v))
        
        B[(i+1):, :] = H @ B[(i+1):, :]
        B[:, (i+1):] = B[:, (i+1):] @ H

        # Multiply by H on the left: e.g. Q = H_3 H_2 H_1 
        Q[(i+1):, :] = H @ Q[(i+1):, :]

    return B, Q
    


In [21]:
m = 5
A = np.random.rand(m,m)
B, Q = hessen_v1(A)

with np.printoptions(precision=2, suppress=True):
    print(B)
    print(f"upper Hessenberg form? --> {np.allclose(np.tril(B, -2), np.zeros((m,m)))}")
    print(Q)
    print(Q.T @ Q)
    print(f"Q orthogonal? --> {np.allclose(Q.T @ Q, np.eye(m))}")
    print(f"B = QAQ^T? --> {np.allclose(B, Q @ A @ Q.T)}")



[[ 0.04 -0.67 -1.17 -0.17 -0.28]
 [-0.65  1.65  0.78 -0.47 -0.18]
 [-0.    1.11  0.49  0.24  0.16]
 [-0.    0.   -1.03 -0.08 -0.08]
 [-0.   -0.    0.    0.53 -0.54]]
upper Hessenberg form? --> True
[[ 1.    0.    0.    0.    0.  ]
 [ 0.   -0.37 -0.37 -0.85 -0.1 ]
 [ 0.   -0.14 -0.4   0.34 -0.84]
 [ 0.    0.86  0.1  -0.37 -0.34]
 [ 0.    0.33 -0.83  0.17  0.41]]
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0. -0. -0.]
 [ 0.  0.  1. -0.  0.]
 [ 0. -0. -0.  1. -0.]
 [ 0. -0.  0. -0.  1.]]
Q orthogonal? --> True
B = QAQ^T? --> True


**Remark**

- The code can be made more efficient, as was done for QR factorization: instead of forming $H$ explicitly, each reflector is applied as a rank-one update.
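
The saving comes from never forming the reflector as a dense matrix. As a sketch, write $v_* = 2v/(v^T v)$ (a symbol introduced here; it plays the role of a scaled copy of $v$ in a vectorized implementation). Then, for any conformable matrix $M$,

$$
H M = (I - v_* v^T) M = M - v_* (v^T M), \qquad
M H = M (I - v_* v^T) = M - (M v_*) v^T .
$$

For a $k$-by-$k$ reflector and a $k$-by-$m$ slice $M$, each update costs $O(km)$ operations instead of the $O(k^2 m)$ needed for a dense matrix product.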

In [22]:
import numpy as np

def hessen(A):
    """
    Return an upper Hessenberg form that is similar to the given matrix.

    Input:
        A (array): a square matrix.
    Output:
        B, Q (array): Hessenberg matrix and an orthogonal matrix 
            such that A = Q^T B Q
    """
    m, n = A.shape
    assert m == n
    B = A.copy()
    Q = np.eye(m)
    
    for i in range(m-2):
        k = m - i - 1

        x = B[(i+1):, i].reshape(-1, 1)
        w = np.zeros_like(x).reshape(-1, 1)
        # copysign avoids w = 0 when x[0] == 0 (np.sign(0) is 0)
        w[0] = -np.copysign(np.linalg.norm(x), x[0])

        v = w - x
        v_ = ((2./(v.T @ v))*v)
        
        B[(i+1):, :] = B[(i+1):, :] - v_ @ (v.T @ B[(i+1):, :])
        B[:, (i+1):] = B[:, (i+1):] - (B[:, (i+1):] @ v_) @ v.T

        # Multiply by H on the left: e.g. Q = H_3 H_2 H_1 
        Q[(i+1):, :] = Q[(i+1):, :] - v_ @ (v.T @ Q[(i+1):, :])
        
    return B, Q


In [24]:

m = 5
A = np.random.rand(m,m)
B, Q = hessen(A)

with np.printoptions(precision=2, suppress=True):
    print(B)
    print(f"upper Hessenberg form? --> {np.allclose(np.tril(B, -2), np.zeros((m,m)))}")
    print(Q)
    print(Q.T @ Q)
    print(f"Q orthogonal? --> {np.allclose(Q.T @ Q, np.eye(m))}")
    print(f" B= QAQ^T? --> {np.allclose(B, Q @ A @ Q.T)}")



[[ 0.7  -0.19 -0.14 -0.03 -0.12]
 [-0.93  0.79  1.32 -0.   -0.18]
 [ 0.    1.05  0.31 -0.09 -0.21]
 [ 0.   -0.    0.33  0.09  0.09]
 [ 0.   -0.    0.    0.09  0.44]]
upper Hessenberg form? --> True
[[ 1.    0.    0.    0.    0.  ]
 [ 0.   -0.38 -0.22 -0.9  -0.02]
 [ 0.   -0.24 -0.72  0.29 -0.59]
 [ 0.    0.8   0.01 -0.33 -0.49]
 [ 0.    0.39 -0.66 -0.02  0.64]]
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0. -0.]
 [ 0.  0.  0.  1. -0.]
 [ 0.  0. -0. -0.  1.]]
Q orthogonal? --> True
 B= QAQ^T? --> True


**Example** (Kincaid and Cheney (2002) p. 301)

Reduce the following (asymmetric) matrix to upper Hessenberg form by means of unitary similarity transformations. Then carry out the QR algorithm for 10 iterations.

$$
A=\left[\begin{array}{cccc}
1 & 2 & 3 & 4 \\
4 & 5 & 6 & 7 \\
2 & 1 & 5 & 0 \\
4 & 2 & 1 & 0
\end{array}\right]
$$


In [75]:
"""
Symmetric case:
    max_iter = 105
Asymmetric case:
    max_iter = 6
"""

# A = np.array([  [1 , 2 , 3 , 4],
#                 [4 , 5 , 6 , 7],
#                 [2 , 1 , 5 , 0],
#                 [4 , 2 , 1 , 0]], dtype=np.float64)

A = np.array([  [1 , 2 , 2 , 4],
                [2 , 5 , 6 , 2],
                [2 , 6 , 5 , 0],
                [4 , 2 , 0 , 0]], dtype=np.float64)

B, Q = hessen(A)
T, U = qr_alg_unshift(B, max_iter=100)

with np.printoptions(precision=5, suppress=True):
    print(B)
    print(T)
    print(np.allclose(T, U.T @ B @ U))
    print(np.allclose(B @ U, np.diag(T)*U)) 

[[ 1.      -4.89898 -0.      -0.     ]
 [-4.89898  5.       5.7735   0.     ]
 [-0.       5.7735   5.4     -2.12289]
 [-0.       0.      -2.12289 -0.4    ]]
[[12.25824  0.      -0.      -0.     ]
 [-0.      -3.95885  0.00003  0.     ]
 [ 0.       0.00003  3.52016 -0.     ]
 [-0.       0.      -0.      -0.81954]]
True
False


#### Shifting and deflating

**Motivating Example** (continued)

In the previous example, the process is still sluggish (in terms of the number of iterations) even after converting to Hessenberg form. Note that the matrix $A$ has distinct (approximate) eigenvalues $\{11.106, -3.8556, 3.5736, 0.17645\}$. Yet as few as 6 iterations already reveal the last eigenvalue quite accurately.

**Idea** 

- Deflation
  - "Harvest" the eigenvalue when it is certain.
- Shifting
  - Shift the matrix by $-sI$ for some $s$ to converge to that eigenvalue even faster.
  - This is the same idea as in Inverse Power Iteration.

##### Deflation

**Lemma** (Spectrum of block triangular matrix; Kincaid and Cheney (2002) p. 303)

Let $A$ be a matrix in partitioned form
$$
A=\left[\begin{array}{ll}
B & C \\
0 & E
\end{array}\right]
$$
in which $B$ and $E$ are square matrices. Then the spectrum of $A$ (i.e., the set of its eigenvalues) is the union of the spectra of $B$ and $E$.

Proof: (Kincaid and Cheney (2002) p. 303)

The equation $A x=\lambda x$, in partitioned form, is
$$\tag{Eq1}
\left[\begin{array}{ll}
B & C \\
0 & E
\end{array}\right]\left[\begin{array}{l}
u \\
v
\end{array}\right]=\lambda\left[\begin{array}{l}
u \\
v
\end{array}\right]
$$
or, equivalently,

$$
\left\{\begin{aligned}
B u+C v & =\lambda u \\
E v & =\lambda v
\end{aligned}\right.
$$

If $\lambda$ is an eigenvalue of $A$, then (Eq1) has a nontrivial solution $(u, v)^T$. If $v \neq 0$, then $\lambda$ is an eigenvalue of $E$. If $v=0$, then $u \neq 0$ and the first equation reduces to $B u=\lambda u$, so $\lambda$ is an eigenvalue of $B$. This proves that $\sigma(A) \subseteq \sigma(B) \cup \sigma(E)$.

Conversely, if $\lambda$ is an eigenvalue of $B$, and if $u$ is a corresponding (nonzero) eigenvector, then $(u, 0)^T$ will solve (Eq1). If $\lambda$ is an eigenvalue of $E$ but not an eigenvalue of $B$, then let $v$ be a nonzero vector satisfying $E v=\lambda v$. Next, solve the equation $(B-\lambda I) u=-C v$. This can be done since $\lambda$ is not an eigenvalue of $B$, i.e., $B-\lambda I$ is invertible. Then the vector $(u, v)^T$ solves (Eq1). This proves that $\sigma(B) \cup \sigma(E) \subseteq \sigma(A)$.
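
The lemma is easy to check on a small example. The sketch below uses hand-picked blocks with real eigenvalues (the particular entries of `B`, `C`, and `E` are chosen here for illustration) and compares the spectrum of the block triangular matrix with the union of the spectra of its diagonal blocks.

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # eigenvalues 1 and 3
E = np.array([[5.0, 2.0],
              [0.0, 7.0]])          # eigenvalues 5 and 7
C = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # arbitrary coupling block

A = np.block([[B, C],
              [np.zeros((2, 2)), E]])

# All eigenvalues are real in this example, so sorting the real parts is safe.
eig_A = np.sort(np.linalg.eigvals(A).real)
eig_BE = np.sort(np.concatenate([np.linalg.eigvals(B),
                                 np.linalg.eigvals(E)]).real)

print(eig_A)
print(np.allclose(eig_A, eig_BE))
```

Note that the coupling block `C` has no influence on the eigenvalues, only on the eigenvectors.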

**Observation** (Deflation)

If every entry of the last row except the last one is close to 0, we treat the last diagonal entry as the 1-by-1 block $E$ in the previous lemma and accept it as an eigenvalue. The remaining eigenvalues are then found from $B$.
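
The observation translates into a short test. The following is a minimal sketch: `try_deflate` is a hypothetical helper, and the absolute tolerance `tol` is an assumption (a criterion relative to the size of the neighboring diagonal entries is often preferred in practice).

```python
import numpy as np

def try_deflate(T, tol=1e-8):
    """
    Hypothetical helper: if the last row of T has the form [~0, ..., ~0, r],
    return (r, leading block); otherwise return None.
    """
    if np.all(np.abs(T[-1, :-1]) <= tol):
        return T[-1, -1], T[:-1, :-1]
    return None

T = np.array([[4.0, 1.0,   2.0],
              [0.5, 3.0,   1.0],
              [0.0, 1e-12, 7.0]])

result = try_deflate(T)
if result is not None:
    lam, T_rest = result
    print(lam)            # harvested eigenvalue
    print(T_rest.shape)   # size of the remaining problem
```

Returning `None` instead of raising lets the caller decide whether to iterate further or to fall back to a different strategy.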

##### Shifting

**Observation**: Shifting preserves similarity.

- At each step, apply the shift, compute a QR factorization, and then undo the shift.
$$
\begin{aligned}
A_0-s I & =Q_1 R_1 \\
A_1 & =R_1 Q_1+s I .
\end{aligned}
$$

Note that
$$
\begin{aligned}
A_1-s I & =R_1 Q_1 \\
& =Q_1^T\left(A_0-s I\right) Q_1 \\
& =Q_1^T A_0 Q_1-s I
\end{aligned}
$$

Therefore, we have

$$
A_1=Q_1^T A_0 Q_1.
$$
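
The identity can be confirmed numerically for a single step. This is a sketch under stated assumptions: it uses NumPy's `np.linalg.qr` in place of the QR routine of these notes, and the shift `s = 0.2` is an arbitrary value chosen for illustration.

```python
import numpy as np

A0 = np.array([[1.0, 2.0, 3.0, 4.0],
               [4.0, 5.0, 6.0, 7.0],
               [2.0, 1.0, 5.0, 0.0],
               [4.0, 2.0, 1.0, 0.0]])
s = 0.2                                  # arbitrary shift (assumption)
I = np.eye(A0.shape[0])

Q1, R1 = np.linalg.qr(A0 - s * I)        # A0 - sI = Q1 R1
A1 = R1 @ Q1 + s * I                     # undo the shift

print(np.allclose(A1, Q1.T @ A0 @ Q1))           # A1 = Q1^T A0 Q1
print(np.isclose(np.trace(A1), np.trace(A0)))    # trace is a similarity invariant
```

The check does not depend on the value of `s`; any shift gives an orthogonally similar matrix.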

**Remark** 

- The $Q$'s appearing in the shifted version differ from those in the unshifted version. The iterates nevertheless stay similar to $A_0$, and the shifted iteration converges faster to the eigenvalue nearest to $s$.

In [76]:
import numpy as np
from internallib import qr

def qr_alg(A, max_iter=10, shift=False):
    """
    Return approximate Schur factorization using QR algorithm.

    Input:
        A (array): A square matrix
    Output:
        T (array): approximate Schur form of A
        U (array): The unitary matrix involved in the similarity T=U^H*A*U
    Note: 
        - When A is a real matrix, T is an approximate real Schur form.
        In this case, U is an orthogonal matrix.
        - When shift is True, the bottom-right entry of T is used as the shift.
    """
    m = A.shape[0]
    T = A.copy()
    U = np.eye(m)
    for _ in range(max_iter):
        
        # Shift by the bottom-right entry of T; s = 0 gives the unshifted step
        s = T[-1, -1] if shift else 0.0
        sI = s * np.eye(m)

        Q, R = qr(T - sI)

        T = R @ Q + sI
        
        U = U @ Q
    
    return T, U


In [77]:
"""
Suggested experiments
1. Effect of only deflation
    - shift=False
    - max_iter = 10, 100, 1000
2. Effect of shifting + deflation
    - shift=True
    - max_iter = 5
"""

A = np.array([  [1 , 2 , 3 , 4],
                [4 , 5 , 6 , 7],
                [2 , 1 , 5 , 0],
                [4 , 2 , 1 , 0]], dtype=np.float64)

m = A.shape[0]
B, Q = hessen(A)
eig = np.full(m, np.nan)

shift = True
max_iter = 5

for i in range(m):
    print(B.shape)
    if B.shape[0] == 1:
        eig[i] = B[-1, -1]
        break

    T, U = qr_alg(B, max_iter=max_iter, shift=shift)
    with np.printoptions(precision=5, suppress=True):
        print(T)
    
    if np.allclose(T[-1, :-1], 0):
        eig[i] = T[-1, -1]
        B = T[:-1, :-1]
    else:
        print("No deflation carried out: the last row is not of the form [0, 0, ..., r]")
    
print(f"eigenvalues = [{eig}]")
    

(4, 4)
[[ 1.97781  5.33406  3.25384  2.41068]
 [ 9.9802   5.21467  3.16599  2.92323]
 [-0.       0.09477  0.2339   0.19199]
 [-0.       0.       0.       3.57362]]
(3, 3)
[[11.06754 -4.84049  4.19047]
 [-0.1171  -3.8176  -1.31762]
 [ 0.       0.       0.17645]]
(2, 2)
[[11.10552 -4.72338]
 [ 0.      -3.85559]]
(1, 1)
eigenvalues = [[ 3.57361662  0.17645187 -3.85558822 11.10551973]]


#### 2-by-2 blocks

**Remark** (Sauer (2017) p. 569)

- For complex eigenvalues, we must allow for 2 × 2 blocks on the diagonal of the real Schur form.
- If deflating a 1 × 1 diagonal block in the bottom right corner fails (after a user-specified number of tries), we declare a 2 × 2 block. 
  - Find the pair of eigenvalues of that block, and then deflate by 2.
- This will make the algorithm converge to real Schur form for most, but not all, input matrices. 
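
Once a 2-by-2 block is declared, its eigenvalues follow from the quadratic formula applied to the characteristic polynomial $\lambda^2 - \mathrm{tr}\,\lambda + \det = 0$. The sketch below is illustrative only: `eig_2x2` is a hypothetical helper, the test matrix is made up, and a production code would use a formulation that is more careful about cancellation.

```python
import numpy as np

def eig_2x2(block):
    """Eigenvalues of a real 2-by-2 block from its trace and determinant."""
    tr = block[0, 0] + block[1, 1]
    det = block[0, 0] * block[1, 1] - block[0, 1] * block[1, 0]
    disc = np.sqrt(complex(tr * tr - 4.0 * det))   # complex sqrt handles a negative discriminant
    return (tr + disc) / 2.0, (tr - disc) / 2.0

# Quasi-triangular matrix whose trailing block carries the complex pair 1 +/- 2i
T = np.array([[3.0, 1.0,  4.0],
              [0.0, 1.0, -2.0],
              [0.0, 2.0,  1.0]])

lam1, lam2 = eig_2x2(T[-2:, -2:])
print(lam1, lam2)

T_rest = T[:-2, :-2]      # deflate by 2 and continue with the leading block
print(T_rest.shape)
```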

### Appendix

#### Miscellaneous

##### Spectral theorem on normal matrix

**Theorem** (Trefethen and Bau, Numerical Linear Algebra, p. 187; Horn and Johnson (2013) Matrix Analysis p. 133)

A matrix is unitarily diagonalizable iff it is normal: $A^H A= A A^H$.

**Remark**

- The eigenvalues of a normal matrix may be complex, whereas Hermitian matrices have real eigenvalues.
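
A rotation by 90 degrees illustrates both the theorem and the remark: it is normal but not Hermitian, and its eigenvalues are purely imaginary. The sketch below assumes NumPy's `np.linalg.eig`; the eigenvector matrix it returns is unitary here because the eigenvalues are distinct, which is not guaranteed for a normal matrix with repeated eigenvalues.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])       # rotation by 90 degrees: normal, not symmetric

is_normal = np.allclose(A.T @ A, A @ A.T)

lam, V = np.linalg.eig(A)         # columns of V are unit-norm eigenvectors
is_unitary = np.allclose(V.conj().T @ V, np.eye(2))
is_diagonalized = np.allclose(V.conj().T @ A @ V, np.diag(lam))

print(is_normal, is_unitary, is_diagonalized)
print(lam)                        # a complex-conjugate pair on the imaginary axis
```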