#### Power iterations

Assume $A\in \mathbf{R}^{n \times n}$ is `diagonalizable`, so that its (normalized) `eigenvectors` $v_1, \cdots, v_n$ form a `basis` for $\mathbf{R}^n$

For a vector $x\in \mathbf{R}^n$, we can express it using the eigenvectors of $A$ as

$$x=c_1v_1 + c_2v_2 + \cdots + c_nv_n$$

If we compute $Ax$ and assume $|\lambda_1|>|\lambda_2|\geq|\lambda_3| \cdots \geq |\lambda_n|$, we can write

$$\begin{align*}
Ax&=A(c_1v_1 + c_2v_2 + \cdots + c_nv_n) \\
&=c_1\lambda_1v_1 + c_2\lambda_2v_2+\cdots + c_n\lambda_nv_n
\end{align*}$$

If we keep multiplying by $A$ on the left, we get

$$\begin{align*}
A^kx&=A^k(c_1v_1 + c_2v_2 + \cdots + c_nv_n) \\
&=c_1\lambda_1^kv_1 + c_2\lambda_2^kv_2+\cdots + c_n\lambda_n^kv_n \\
&=\sum_{j=1}^n c_j\lambda_j^kv_j \\
&=\lambda_1^k\left(c_1v_1+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^kv_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^kv_n\right) \\
&\approx\lambda_1^kc_1v_1 \quad \text{as } k \rightarrow \infty, \text{ since } \left(\frac{\lambda_i}{\lambda_1}\right)^k\rightarrow 0 \text{ for } i\neq 1
\end{align*}$$

Therefore, provided $c_1\neq 0$, the direction of $A^kx$ `converges to the eigenvector` $v_1$, with error of order $\left|\frac{\lambda_2}{\lambda_1}\right|^k$
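
The decay rate can be checked numerically. The following is a minimal sketch, assuming the $2\times 2$ matrix with eigenvalues 10 and 5 that is used as an example later in this section; the names `errors` and `ratios` are only illustrative. The iterate is rescaled at each step, which does not change its direction.

```python
import numpy as np

A = np.array([[8.0, 3.0], [2.0, 7.0]])    # eigenvalues 10 and 5
v1 = np.array([3.0, 2.0]) / np.sqrt(13)   # unit eigenvector for lambda_1 = 10

x = np.array([1.0, 0.0])
errors = []
for k in range(1, 11):
    x = A @ x
    x /= np.linalg.norm(x)                # rescale; the direction is unchanged
    errors.append(np.linalg.norm(x - v1))

# Ratio of successive errors; it should approach |lambda_2 / lambda_1| = 0.5
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)
```

Each step roughly halves the distance to $v_1$, matching the ratio $|\lambda_2/\lambda_1|=0.5$ for this matrix.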

#### Rayleigh quotient

After we get $x^k\rightarrow cv_1$, the corresponding eigenvalue $\lambda_1$ is obtained by noticing $Ax^k \rightarrow \lambda_1cv_1$ and computing `Rayleigh quotient`

$$ \frac{(x^k)^T A x^k}{(x^k)^Tx^k}$$

To see this, we can plug $x^k=A^kx=\sum_{j=1}^n c_j\lambda_j^kv_j$ into the numerator and denominator

$$(x^k)^TAx^k=(x^k)^Tx^{k+1}=\sum_{j=1}^n\sum_{l=1}^n c_jc_l\lambda_j^k\lambda_l^{k+1}v_j^Tv_l$$

and

$$(x^k)^Tx^k=\sum_{j=1}^n\sum_{l=1}^n c_jc_l\lambda_j^k\lambda_l^kv_j^Tv_l$$

Since the eigenvectors are normalized, that is, $v_j^Tv_j=1$, we have

$$(x^k)^TAx^k=\sum_{j=1}^n |c_j|^2\lambda_j^{2k+1}+\sum_{l \neq j}c_j c_l \lambda_j^k \lambda_l^{k+1} v_j^T v_l$$

and

$$(x^k)^Tx^k=\sum_{j=1}^n|c_j|^2\lambda_j^{2k}+\sum_{l \neq j}c_j c_l \lambda_j^k \lambda_l^k v_j^T v_l$$

Assuming $c_1\neq 0$, we factor $|c_1|^2\lambda_1^{2k+1}$ out of the numerator and $|c_1|^2\lambda_1^{2k}$ out of the denominator, and write

$$(x^k)^TAx^k=|c_1|^2\lambda_1^{2k+1}\left(1+\sum_{j\neq 1}\left|\frac{c_j}{c_1}\right|^2\left(\frac{\lambda_j}{\lambda_1}\right)^{2k+1}+\sum_{l\neq j}\frac{c_jc_l}{|c_1|^2}\left(\frac{\lambda_j}{\lambda_1}\right)^k\left(\frac{\lambda_l}{\lambda_1}\right)^{k+1}v_j^Tv_l\right)$$

and

$$(x^k)^Tx^k=|c_1|^2\lambda_1^{2k}\left(1+\sum_{j\neq 1}\left|\frac{c_j}{c_1}\right|^2\left(\frac{\lambda_j}{\lambda_1}\right)^{2k}+\sum_{l\neq j}\frac{c_jc_l}{|c_1|^2}\left(\frac{\lambda_j}{\lambda_1}\right)^k\left(\frac{\lambda_l}{\lambda_1}\right)^{k}v_j^Tv_l\right)$$

Since $\left|\frac{\lambda_i}{\lambda_1}\right|<1,\forall i\neq 1$, every term inside the parentheses other than the leading 1 decays as $k$ grows. The slowest decaying terms are the cross-terms where one of $j, l$ is 1 and the other is 2, which decay like $\left(\frac{\lambda_2}{\lambda_1}\right)^k$

Using Taylor expansion, for small $\epsilon, \epsilon'$

$$\lambda_1\frac{1+\epsilon'}{1+\epsilon}=\lambda_1(1+\epsilon')(1-\epsilon+\epsilon^2-\epsilon^3+\cdots)\approx\lambda_1(1+\epsilon')(1-\epsilon)\approx \lambda_1(1+(\epsilon'-\epsilon))$$

and plug in $\epsilon'\sim\left(\frac{\lambda_2}{\lambda_1}\right)^{k+1}$ and $\epsilon\sim\left(\frac{\lambda_2}{\lambda_1}\right)^{k}$ (neglecting quadratic terms as they decay faster), we have

$$\frac{(x^k)^T A x^k}{(x^k)^Tx^k}\approx \lambda_1\left(1+O\left(\left(\frac{\lambda_2}{\lambda_1}\right)^{k}\right)\right)$$

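In addition, for symmetric matrices, which have orthonormal eigenvectors, the cross-terms vanish, and the error of the Rayleigh quotient is of order $\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}$, the square of the eigenvector error. This is often loosely called `quadratic convergence` of the Rayleigh quotient to $\lambda_1$, although the error still shrinks by a constant factor per iteration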
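
The faster convergence for symmetric matrices can be illustrated with a small experiment. This is only a sketch on an assumed $2\times 2$ symmetric matrix `S` with eigenvalues 3 and 1 (not taken from the text above); for it, the Rayleigh quotient error should shrink by about $(\lambda_2/\lambda_1)^2 = 1/9$ per iteration.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, eigenvalues 3 and 1
x = np.array([1.0, 0.0])

rq_errors = []
for k in range(1, 7):
    x = S @ x
    x /= np.linalg.norm(x)
    rq = x @ S @ x                        # Rayleigh quotient (x has unit norm)
    rq_errors.append(abs(rq - 3.0))

# Ratio of successive errors; it should approach (1/3)**2
ratios = [rq_errors[i + 1] / rq_errors[i] for i in range(len(rq_errors) - 1)]
print(ratios)
```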

#### Normalization at each iteration



In practice, we `normalize` the outcome of $A^kx$ along the way to keep the iterates from growing to infinity or shrinking to zero

* starting from $x^0$
* compute $y^k=Ax^{k-1}$
* get new $x^{k}$ by normalizing $y^k$ ($l_2$ norm, infinity norm, etc)
$$x^{k}=\frac{y^k}{\|y^k\|}$$
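
To see why the normalization matters, the sketch below runs 400 steps with and without it on an assumed matrix whose dominant eigenvalue is 10. The variable names `x_raw` and `x_unit` are illustrative only. Without rescaling, the entries grow like $10^k$ and exceed the double-precision range.

```python
import numpy as np

A = np.array([[8.0, 3.0], [2.0, 7.0]])   # dominant eigenvalue 10
x_raw = np.array([1.0, 0.0])             # iterated without normalization
x_unit = np.array([1.0, 0.0])            # iterated with l2 normalization

# Silence the overflow warnings that the unnormalized iteration triggers
with np.errstate(over="ignore", invalid="ignore"):
    for _ in range(400):
        x_raw = A @ x_raw                # grows like 10**k and overflows
        y = A @ x_unit
        x_unit = y / np.linalg.norm(y)   # stays on the unit sphere

print(np.isfinite(x_raw).all())          # False: the raw iterate overflowed
print(np.linalg.norm(x_unit))            # close to 1
```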

#### Example of non-symmetric matrix

$$A=\begin{bmatrix}8 & 3 \\2 &7\end{bmatrix}, x^0=\begin{bmatrix}1 \\ 0\end{bmatrix}$$

In [None]:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [None]:
A = np.array([[8., 3.], [2., 7.]]) # diagonalizable
x = np.array([1., 0.])
x /= np.linalg.norm(x)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(f'True eigenvalues: {eigenvalues}')
print(f'True eigenvectors (columns): {eigenvectors}\n')

num_iter = 20
for i in range(num_iter):  # avoid shadowing the built-in iter()
    y = A @ x
    x = y / np.linalg.norm(y)

    # Rayleigh quotient (x has unit norm, so the denominator is 1)
    lambda_1 = np.dot(A @ x, x)
    print(f'# {i+1}: lambda_1: {lambda_1:.4f}')

print(f'\nv_1: {x}')

True eigenvalues: [ 10.0000  5.0000]
True eigenvectors (columns): [[ 0.8321 -0.7071]
 [ 0.5547  0.7071]]

# 1: lambda_1: 9.1176
# 2: lambda_1: 9.6552
# 3: lambda_1: 9.8624
# 4: lambda_1: 9.9412
# 5: lambda_1: 9.9732
# 6: lambda_1: 9.9873
# 7: lambda_1: 9.9938
# 8: lambda_1: 9.9970
# 9: lambda_1: 9.9985
# 10: lambda_1: 9.9992
# 11: lambda_1: 9.9996
# 12: lambda_1: 9.9998
# 13: lambda_1: 9.9999
# 14: lambda_1: 10.0000
# 15: lambda_1: 10.0000
# 16: lambda_1: 10.0000
# 17: lambda_1: 10.0000
# 18: lambda_1: 10.0000
# 19: lambda_1: 10.0000
# 20: lambda_1: 10.0000

v_1: [ 0.8321  0.5547]


#### General eigenvalue problems for symmetric matrices

After computing the dominant eigenvalue and its corresponding eigenvector, we proceed by updating the matrix $A$ through

$$A\leftarrow A - \lambda_1 v_1 v_1^T$$

This process is repeated iteratively to find more eigenvalues and eigenvectors

Unfortunately, this approach works meaningfully `only for symmetric matrices`, due to the `orthogonality` of their eigenvectors

In symmetric matrices, all eigenvectors corresponding to distinct eigenvalues are orthogonal

When we subtract a term involving $v_1 v_1^T$, we effectively remove the contribution of $v_1$ from the matrix

The orthogonality ensures that removing $v_1$'s influence doesn't interfere with the subsequent eigenvectors, allowing the power iterations to correctly converge to the next dominant eigenvalue and eigenvector
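
The role of orthogonality can be checked on two small matrices. The sketch below uses a hypothetical helper `deflate` and assumed $2\times 2$ examples: in the symmetric case the remaining eigenpair is untouched, while in the non-symmetric case the remaining eigenvector of the original matrix is no longer an eigenvector after the update.

```python
import numpy as np

def deflate(M, lam, v):
    """Subtract the rank-one term lam * v v^T (v is assumed to have unit norm)."""
    return M - lam * np.outer(v, v)

# Symmetric case: eigenpairs (3, [1, 1]/sqrt(2)) and (1, [1, -1]/sqrt(2))
S = np.array([[2.0, 1.0], [1.0, 2.0]])
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)
S_d = deflate(S, 3.0, v1)
print(np.allclose(S_d @ v1, 0.0))        # True: v1 now has eigenvalue 0
print(np.allclose(S_d @ v2, 1.0 * v2))   # True: (1, v2) is unchanged

# Non-symmetric case: eigenpairs (10, [3, 2]/sqrt(13)) and (5, [1, -1]/sqrt(2))
A = np.array([[8.0, 3.0], [2.0, 7.0]])
w1 = np.array([3.0, 2.0]) / np.sqrt(13)
w2 = np.array([1.0, -1.0]) / np.sqrt(2)
A_d = deflate(A, 10.0, w1)
print(np.allclose(A_d @ w2, 5.0 * w2))   # False: w2 is not preserved
```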

In [None]:
def power_iteration(A_sym, num_eigen, num_iter=2000, converge_tol=1e-6, eigen_tol=1e-6):
    n = A_sym.shape[0]
    eigenvalues = []
    eigenvectors = []
    A_current = A_sym.copy()

    for k in range(num_eigen):
        x = np.random.rand(n)
        x /= np.linalg.norm(x)

        for j in range(num_iter):
            y = A_current @ x
            norm_y = np.linalg.norm(y)

            # If norm is too small, stop iteration
            if norm_y < eigen_tol:
                print(f"Norm of (#{k+1}) eigenvector is too small, stopping iteration")
                break

            x_next = y / norm_y

            if np.linalg.norm(x_next - x) < converge_tol:
                break

            x = x_next

        if norm_y < eigen_tol:
            continue

        print(f'{j+1} iterations for eigenvalue #{k+1}')

        # Rayleigh quotient (since x is normalized, we can omit the denominator)
        eigenvalue = np.dot(A_sym @ x, x)

        if abs(eigenvalue) < eigen_tol:
            print(f"Eigenvalue {eigenvalue} is too small and is ignored")
            continue

        eigenvalues.append(eigenvalue)
        eigenvectors.append(x)

        A_current -= eigenvalue * np.outer(x, x)

    if not eigenvalues:
        print("No valid eigenvalues found.")
        return np.array([]), np.array([])

    return np.array(eigenvalues), np.column_stack(eigenvectors)

In [None]:
np.random.seed(42)

k = 5

A = np.random.rand(5, 5)

A_sym = (A + A.T) / 2

eigenvalues, eigenvectors = power_iteration(A_sym, k)

print("\nComputed eigenvalues:")
for idx, eigenvalue in enumerate(eigenvalues, 1):
    print(f"# {idx}: {eigenvalue:.4f}")

print("\nComputed eigenvectors (columns):")
print(eigenvectors)

true_eigenvalues, true_eigenvectors = np.linalg.eig(A_sym)
print("\nEigenvalues from NumPy:")
print(true_eigenvalues)

print("\nEigenvectors from NumPy:")
print(true_eigenvectors)

# Reconstruct A_sym from eigendecomposition and compare to original A_sym
A_sym_recon = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print("\nA_sym reconstructed from eigendecomposition:")
print(A_sym_recon)

print("\nOriginal A_sym:")
print(A_sym)

12 iterations for eigenvalue #1
2000 iterations for eigenvalue #2
10 iterations for eigenvalue #3
8 iterations for eigenvalue #4
2000 iterations for eigenvalue #5

Computed eigenvalues:
# 1: 2.2645
# 2: -0.6570
# 3: 0.4611
# 4: 0.0990
# 5: -0.0145

Computed eigenvectors (columns):
[[ 0.4067  0.2198  0.3005  0.0428  0.8332]
 [ 0.4865 -0.8603 -0.1312 -0.0666  0.0402]
 [ 0.5749  0.4306 -0.6653 -0.1404 -0.1471]
 [ 0.3846  0.0929  0.3070  0.7856 -0.3633]
 [ 0.3456  0.1326  0.5963 -0.5973 -0.3881]]

Eigenvalues from NumPy:
[ 2.2645 -0.6570  0.4611 -0.0145  0.0990]

Eigenvectors from NumPy:
[[-0.4067 -0.2198  0.3005  0.8332 -0.0428]
 [-0.4865  0.8603 -0.1312  0.0402  0.0666]
 [-0.5749 -0.4306 -0.6653 -0.1471  0.1404]
 [-0.3846 -0.0929  0.3070 -0.3633 -0.7856]
 [-0.3456 -0.1326  0.5963 -0.3881  0.5973]]

A_sym reconstructed from eigendecomposition:
[[ 0.3745  0.5534  0.3763  0.3910  0.3839]
 [ 0.5534  0.0581  0.9180  0.4527  0.4238]
 [ 0.3763  0.9180  0.8324  0.3685  0.2370]
 [ 0.3910  0.4527 

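The results agree with NumPy's up to the sign of each eigenvector and the ordering of the eigenvalues. Eigenvalues #2 and #5 hit the 2000-iteration cap because they are negative: the iterate flips sign at every step, so $\|x^{k+1}-x^k\|$ never drops below the tolerance even though the direction has converged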
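
When the dominant eigenvalue is negative, the normalized iterate alternates between $v$ and $-v$, so a stopping test based on $\|x^{k+1}-x^k\|$ alone never triggers. One possible remedy, shown here as a sketch with a hypothetical helper `has_converged` on an assumed diagonal test matrix, is to compare the iterates up to sign.

```python
import numpy as np

def has_converged(x_new, x_old, tol=1e-6):
    """Compare unit vectors up to sign, since v and -v span the same direction."""
    diff = min(np.linalg.norm(x_new - x_old), np.linalg.norm(x_new + x_old))
    return diff < tol

S = np.array([[-2.0, 0.0], [0.0, 1.0]])   # dominant eigenvalue is -2
x = np.array([1.0, 1.0]) / np.sqrt(2)

for step in range(1, 2001):
    y = S @ x
    x_new = y / np.linalg.norm(y)
    if has_converged(x_new, x):
        break
    x = x_new

print(step)                               # far below the 2000-step cap
print(np.linalg.norm(x_new - x))          # about 2: the plain test would not stop
```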

Next, we do a quick test on an (almost symmetric) matrix with a zero eigenvalue

In [None]:
A_zero_eigen = np.array([[4., 0.], [1e-10, 0.]])
eigenvalues, eigenvectors = power_iteration(A_zero_eigen, num_eigen=2)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)

2 iterations for eigenvalue #1
Norm of (#2) eigenvector is too small, stopping iteration
Eigenvalues: [ 4.0000]
Eigenvectors: [[ 1.0000]
 [ 0.0000]]


#### Compute `SVD`

While power iteration with deflation is restricted to symmetric matrices, one useful application is the singular value decomposition (SVD), because $A^TA$ is always symmetric

Recall that for a matrix $A\in \mathbf{R}^{m \times n}$ (tall, square or fat), its SVD is given by

$$A=U\Sigma V^T=\sum_{i=1}^r\sigma_i u_i v_i^T$$

where
* $A\in \mathbf{R}^{m \times n}$, $\text{rank}(A)=r$
* $U\in \mathbf{R}^{m \times r}$, $U^TU=I$
* $V\in \mathbf{R}^{n \times r}$, $V^TV=I$
* $\Sigma =\text{diag}(\sigma_1, \cdots, \sigma_r)$, $\sigma_1 \geq\cdots\geq \sigma_r > 0$
* $\sigma_1, \cdots, \sigma_r$ are `nonzero` singular values of $A$
* $v_i\in \mathbf{R}^n$ are `right singular vectors` of $A$
* $u_i\in \mathbf{R}^m$ are `left singular vectors` of $A$

In particular

* $v_i\in \mathbf{R}^n$ are eigenvectors of symmetric $A^TA$ (corresponding to `nonzero` eigenvalues), and `orthonormal basis` for $R(A^T)$
* $\sigma_i=\sqrt{\lambda_i(A^TA)}$ are `square root` of nonzero eigenvalues of $A^TA$
* $u_i\in \mathbf{R}^m$ are eigenvectors of symmetric $AA^T$ (corresponding to `nonzero` eigenvalues), and `orthonormal basis` for $R(A)$
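
The relation between singular values and the eigenvalues of $A^TA$ can be checked directly with NumPy. This is a small sketch on an assumed random $5\times 3$ matrix, which has full column rank with probability one.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending

print(np.allclose(np.sqrt(lam), s))       # True
```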

To keep things consistent, we don't want to run power iterations twice, once for the left and once for the right singular vectors. Instead, once we have a singular value $\sigma_i$ and the corresponding right singular vector $v_i$, we can use the following equation to get the left singular vector $u_i$

$$Av_i=\sigma_iu_i$$
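
As a quick sanity check of this relation, the sketch below recovers each $u_i$ from $v_i$ and $\sigma_i$ returned by `np.linalg.svd` on an assumed random matrix and compares it with NumPy's own left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
for i in range(len(s)):
    v_i = Vt[i]                   # i-th right singular vector (row of V^T)
    u_i = A @ v_i / s[i]          # left singular vector from A v_i = sigma_i u_i
    print(np.allclose(u_i, U[:, i]))
```

Because $u_i$ is derived from $v_i$, the sign of each pair stays consistent, which would not be guaranteed if $u_i$ and $v_i$ came from two independent power iterations.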

In [None]:
np.random.seed(42)

A_svd = np.random.rand(5, 3)
num_sigma = min(A_svd.shape)

# Singular values and right singular vectors
sigma_sq_v, v_mat = power_iteration(A_svd.T @ A_svd, num_sigma)
sigmas = np.sqrt(sigma_sq_v)
sigma_mat = np.diag(sigmas)

# Left singular vectors
u_mat = []
for i in range(num_sigma):
    v_i = v_mat[:, i]
    u_i = A_svd @ v_i / sigmas[i]
    u_i /= np.linalg.norm(u_i)
    u_mat.append(u_i)

u_mat = np.column_stack(u_mat)

# Reconstruct A
A_reconstructed = u_mat @ sigma_mat @ v_mat.T

# Compare with NumPy's svd function
u_mat_np, sigma_np, vt_mat_np = np.linalg.svd(A_svd, full_matrices=False)
sigma_mat_np = np.diag(sigma_np)
A_reconstructed_np = u_mat_np @ sigma_mat_np @ vt_mat_np

10 iterations for eigenvalue #1
14 iterations for eigenvalue #2
2 iterations for eigenvalue #3


In [None]:
print("Singular values via power iteration:")
print(sigmas)
print("\nSingular values via NumPy:")
print(sigma_np)

print("\nv_mat")
print(v_mat.T)
print("\nv_mat via NumPy:")
print(vt_mat_np)

print("\nu_mat")
print(u_mat)
print("\nu_mat via NumPy:")
print(u_mat_np)

print("\nA_reconstructed:")
print(A_reconstructed)
print("\nA_reconstructed via NumPy:")
print(A_reconstructed_np)

Singular values via power iteration:
[ 1.9906  1.0096  0.5777]

Singular values via NumPy:
[ 1.9906  1.0096  0.5777]

v_mat
[[ 0.5246  0.5427  0.6559]
 [ 0.7287 -0.6847 -0.0163]
 [-0.4403 -0.4865  0.7546]]

v_mat via NumPy:
[[-0.5246 -0.5427 -0.6559]
 [ 0.7287 -0.6847 -0.0163]
 [-0.4403 -0.4865  0.7546]]

u_mat
[[ 0.5991 -0.3862 -0.1299]
 [ 0.2517  0.3238 -0.3839]
 [ 0.4495 -0.5552  0.0115]
 [ 0.5118  0.4815  0.7100]
 [ 0.3372  0.4539 -0.5758]]

u_mat via NumPy:
[[-0.5991 -0.3862 -0.1299]
 [-0.2517  0.3238 -0.3839]
 [-0.4495 -0.5552  0.0115]
 [-0.5118  0.4815  0.7100]
 [-0.3372  0.4539 -0.5758]]

A_reconstructed:
[[ 0.3745  0.9507  0.7320]
 [ 0.5987  0.1560  0.1560]
 [ 0.0581  0.8662  0.6011]
 [ 0.7081  0.0206  0.9699]
 [ 0.8324  0.2123  0.1818]]

A_reconstructed via NumPy:
[[ 0.3745  0.9507  0.7320]
 [ 0.5987  0.1560  0.1560]
 [ 0.0581  0.8662  0.6011]
 [ 0.7081  0.0206  0.9699]
 [ 0.8324  0.2123  0.1818]]
