## GMRES

### Overview

#### What is GMRES?

- GMRES is an iterative solver for a system of linear equations $Ax=b$.
- GMRES generalizes the conjugate gradient method to *nonsymmetric* matrices $A$.
- The name is short for *generalized minimal residual* method.
- Why do we discuss it in a chapter on least squares?
  - Part of its algorithm involves solving a least squares problem.

### Why do we care?

- GMRES can handle nonsymmetric matrices, for which the conjugate gradient method fails.
- GMRES is a good choice for solving large, sparse, nonsymmetric (square) linear systems $Ax=b$. [Sauer (2017) p. 235]
- GMRES deals with ill-conditioning by using orthogonality: it works with an orthonormal basis of the Krylov space instead of the nearly parallel vectors $r, Ar, A^2r, \ldots$ [Sauer (2017) p. 235]

### The method

#### Idea

- $K_k:=\operatorname{span}\{r, Ar, A^2 r, \ldots, A^{k-1} r\}$, with $r = b - Ax_0$, is called the *Krylov space* (Krylov subspace).
- The approximate solution $x_k$ at the $k$-th iteration is the best approximation from $x_0 + K_k$ in the least squares sense: it minimizes the residual norm $\|b - Ax\|_2$ over all $x \in x_0 + K_k$.
  - The conjugate gradient method uses a similar idea, and both belong to the *Krylov (subspace) methods*.
- As $k$ increases, $K_k$ expands, so the residual norm $\|b - Ax_k\|_2$ never increases.
  - In theory, GMRES is a direct method: in exact arithmetic it terminates with the exact solution after at most $n$ iterations if $A$ is nonsingular. [Sauer (2017) p. 237]
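To make the idea concrete, here is a deliberately naive NumPy sketch (not GMRES itself, and not how GMRES is implemented): for each $k$ it builds the Krylov vectors $r, Ar, \ldots, A^{k-1}r$ explicitly (only rescaled to unit length) and minimizes $\|r - AKc\|_2$ over $c$ with `np.linalg.lstsq`. The helper name `krylov_residual_norms`, the test matrix, and the seed are illustrative assumptions. For larger $n$ these Krylov vectors become nearly parallel, which is the ill-conditioning GMRES avoids by orthogonalizing.

```python
import numpy as np


def krylov_residual_norms(A, b, x0):
    """Naive illustration of the GMRES idea (not the GMRES algorithm itself).

    For k = 1, ..., n returns min ||b - A x||_2 over x in x0 + span{r, Ar, ..., A^{k-1} r},
    where r = b - A x0, using the Krylov vectors directly (each only rescaled to unit length).
    """
    n = len(b)
    r = b - A @ x0
    K = (r / np.linalg.norm(r)).reshape(-1, 1)      # columns: r, Ar, ..., A^{k-1} r (rescaled)
    norms = []
    for k in range(1, n + 1):
        # Best correction K c: minimize ||r - A K c||_2 (a small least squares problem)
        c, *_ = np.linalg.lstsq(A @ K, r, rcond=None)
        norms.append(np.linalg.norm(r - A @ (K @ c)))
        w = A @ K[:, -1]                            # next Krylov direction
        K = np.column_stack([K, w / np.linalg.norm(w)])
    return norms


rng = np.random.default_rng(0)
n = 6
A = 5.0 * np.eye(n) + rng.standard_normal((n, n))  # hypothetical nonsymmetric test matrix
b = rng.standard_normal(n)
norms = krylov_residual_norms(A, b, np.zeros(n))
print(norms)
```

The printed norms should be nonincreasing (the spaces are nested) and the last one should be at roundoff level, since $K_n = \mathbb{R}^n$ here.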

**Algorithm** (GMRES; Sauer (2017) p. 235)

- Given
  - $A$: $n$-by-$n$ matrix
  - $b$: vector of length $n$
- Initialize
  - $x_0$: initial guess
  - $r=b-A x_0$: initial residual 
  - $q_1=r /\|r\|_2$
- Compute
  - **for** $k=1,2, \ldots, m$
    - $y=A q_k$
    - **for** $j=1,2, \ldots, k$
      - $h_{j k}=q_j^T y $
      - $y=y-h_{j k} q_j$
    - $h_{k+1, k}=\|y\|_2$ (If $h_{k+1, k}=0$, skip next line and terminate at bottom.)
    - $q_{k+1}=y / h_{k+1, k}$
    - Minimize $\left\|H_k c_k- [\|r\|_2, 0, 0, \ldots, 0]^T \right\|_2$ for $c_k$
    - $x_k=Q_k c_k+x_0$

At the $k$-th step:
- $[\|r\|_2, 0, 0, \ldots, 0]^T$ has length $k+1$.
- $c_k$ has length $k$.
- $H_k$ is of size $(k+1)\times k$ and given by

$$
H_k = \left[\begin{array}{cccc}
h_{11} & h_{12} & \cdots & h_{1 k} \\
h_{21} & h_{22} & \cdots & h_{2 k} \\
& h_{32} & \cdots & h_{3 k} \\
& & \ddots & \vdots \\
& & & h_{k+1, k}
\end{array}\right]
$$

- $Q_k$ is of size $n\times k$, with the orthonormal vectors $q_1, \ldots, q_k$ as its columns:

$$
Q_k = \left[\begin{array}{c:c:c} 
& & \\
& & \\
q_1 & \cdots & q_k \\
& & \\
& &
\end{array}\right]
$$
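A minimal NumPy sketch of the algorithm above, assuming dense NumPy arrays and using `np.linalg.lstsq` for the small least squares step (practical codes instead update a QR factorization of $H_k$ with Givens rotations). The function name `gmres`, the relative breakdown tolerance `tol`, and the test matrix are our own illustrative choices, not part of the notes.

```python
import numpy as np


def gmres(A, b, x0, m, tol=1e-12):
    """Sketch of GMRES following the algorithm above.

    Returns the last iterate x_k and the residual norms ||b - A x_k||_2 for k = 0, 1, ...
    Indices are 0-based: column k of Q holds q_{k+1} from the notes.
    """
    n = len(b)
    r = b - A @ x0
    beta = np.linalg.norm(r)
    if beta == 0.0:                      # x0 already solves Ax = b
        return x0.copy(), [0.0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r / beta                   # q_1 = r / ||r||_2
    x, history = x0.copy(), [beta]
    for k in range(m):
        y = A @ Q[:, k]                  # y = A q_k
        y_norm0 = np.linalg.norm(y)
        for j in range(k + 1):           # modified Gram-Schmidt (inner loop)
            H[j, k] = Q[:, j] @ y
            y = y - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(y)
        # Assumption: the test "h_{k+1,k} = 0" is done with a relative tolerance.
        breakdown = H[k + 1, k] <= tol * y_norm0
        if not breakdown:
            Q[:, k + 1] = y / H[k + 1, k]
        # Least squares: minimize ||H_k c - [beta, 0, ..., 0]^T||_2, H_k of size (k+2) x (k+1)
        rhs = np.zeros(k + 2)
        rhs[0] = beta
        c, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], rhs, rcond=None)
        x = x0 + Q[:, :k + 1] @ c        # x_k = Q_k c_k + x_0
        history.append(np.linalg.norm(b - A @ x))
        if breakdown:
            break
    return x, history


rng = np.random.default_rng(0)
n = 8
A = 5.0 * np.eye(n) + rng.standard_normal((n, n))   # hypothetical nonsymmetric test matrix
b = rng.standard_normal(n)
x, history = gmres(A, b, np.zeros(n), m=n)
print(np.linalg.norm(A @ x - b), history)
```

With `m=n` the final residual should be at roundoff level; with a smaller `m` the sketch simply returns the best iterate found in $x_0 + \operatorname{span}\{q_1, \ldots, q_m\}$.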


**Detail 1**

- At step $k$ of the method, we enlarge the Krylov space by adding the direction $A^k r$ (through $y = Aq_k$),
- orthogonalize it against the current basis (the inner loop, i.e., modified Gram-Schmidt), which yields $q_{k+1}$,
- and then use least squares to find the best correction in the span of $q_1, \ldots, q_k$.
  - That is, we find the correction $x_{add}$ ($Q_k c_k$ in the algorithm) that minimizes the residual norm and add it to $x_0$.
  - This step involves its own details.

**Detail 2**


- It holds that $AQ_k = Q_{k+1} H_k$ for each $k$.
  - This is a consequence of Gram-Schmidt (the inner loop): with $y=A q_j$, we have (in exact arithmetic) the orthogonal decomposition $y = \underbrace{(q_1^T y)}_{h_{1j}} q_1 + \underbrace{(q_2^T y)}_{h_{2j}} q_2 + \cdots + \underbrace{(q_{j+1}^T y)}_{h_{j+1,j}}q_{j+1}$ for $j=1,2,\ldots,k$. Collecting these $k$ columns gives

$$
\begin{split}
AQ_k &= A\left[\begin{array}{c:c:c} 
  & & \\
  & & \\
  q_1 & \cdots & q_k \\
  & & \\
  & &
  \end{array}\right] \\
&=\left[\begin{array}{c:c:c} 
  & & \\
  & & \\
  Aq_1 & \cdots & Aq_k \\
  & & \\
  & &
  \end{array}\right]
\\
&=
\left[
  \begin{array}{c:c:c:c}
    Q_{k+1}
    \begin{pmatrix}
      h_{11}\\
      h_{21}\\
      0\\
      \vdots\\
      0
    \end{pmatrix}
    &
    Q_{k+1}
    \begin{pmatrix}
      h_{12}\\
      h_{22}\\
      h_{32}\\
      \vdots\\
      0
    \end{pmatrix}
    &
    \cdots
    &
    Q_{k+1}
    \begin{pmatrix}
      h_{1k}\\
      h_{2k}\\
      h_{3k}\\
      \vdots \\
      h_{k+1, k}
    \end{pmatrix}
  \end{array}
\right]
\\
& = Q_{k+1}
\left[\begin{array}{cccc}
h_{11} & h_{12} & \cdots & h_{1 k} \\
h_{21} & h_{22} & \cdots & h_{2 k} \\
& h_{32} & \cdots & h_{3 k} \\
& & \ddots & \vdots \\
& & & h_{k+1, k}
\end{array}\right]
\\
&= Q_{k+1}H_k .
\end{split}
$$
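Together with $x_k = Q_k c_k + x_0$ from the algorithm, this identity explains why the small least squares problem in the algorithm is the right one. A short derivation in exact arithmetic, writing $e_1 = [1, 0, \ldots, 0]^T \in \mathbb{R}^{k+1}$ (notation introduced only here) so that $r = \|r\|_2\, q_1 = \|r\|_2\, Q_{k+1} e_1$:

$$
\begin{split}
b - A x_k &= b - A(Q_k c_k + x_0) = r - AQ_k c_k \\
&= \|r\|_2\, Q_{k+1} e_1 - Q_{k+1} H_k c_k \\
&= Q_{k+1}\left(\|r\|_2\, e_1 - H_k c_k\right).
\end{split}
$$

Since $Q_{k+1}$ has orthonormal columns, $\|Q_{k+1} z\|_2 = \|z\|_2$ for every $z \in \mathbb{R}^{k+1}$, hence

$$
\|b - A x_k\|_2 = \left\|H_k c_k - [\|r\|_2, 0, \ldots, 0]^T\right\|_2 .
$$

Minimizing the residual over all corrections $Q_k c_k$ (a problem in $\mathbb{R}^n$) therefore reduces to a $(k+1)\times k$ least squares problem.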


### Computation

The computation (implementation) of GMRES is a topic of the computational project.

## Nonlinear least squares

**Theorem** (Vector dot product rule and matrix/vector product rule)

Let $u\left(x_1, \ldots, x_n\right)$ and $v\left(x_1, \ldots, x_n\right)$ be $\mathbb{R}^n$-valued functions, and let $A\left(x_1, \ldots, x_n\right)$ be an $n \times n$ matrix function. The dot product $u^T v$ is a scalar function. Then we have


$$
\nabla\left(u^T v\right)=v^T D u+u^T D v,
$$

and

$$
D(A v)=A \cdot D v+\sum_{i=1}^n v_i D a_i,
$$

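where $Du$ and $Dv$ are the Jacobian matrices of $u$ and $v$, the gradient $\nabla(u^T v)$ is written as a row vector, and $a_i$ denotes the $i$-th column of $A$.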
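Both rules can be sanity-checked with central finite differences. In this sketch the functions `u`, `v`, `A`, the helper `jacobian`, the point `x0`, and the step `h` are hypothetical illustrative choices; $D$ is taken to be the Jacobian (column $j$ holds the partial derivatives with respect to $x_j$).

```python
import numpy as np


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x: column j approximates the partial derivative w.r.t. x_j."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append(np.atleast_1d((f(x + e) - f(x - e)) / (2 * h)))
    return np.column_stack(cols)


# Hypothetical R^3 -> R^3 functions and a 3x3 matrix function
def u(x):
    return np.array([x[0] * x[1], np.sin(x[2]), x[0] ** 2])


def v(x):
    return np.array([x[2], np.exp(x[0]), x[1] * x[2]])


def A(x):
    return np.array([[x[0], x[1], 1.0],
                     [x[2], x[0] * x[1], 0.0],
                     [np.sin(x[1]), 1.0, x[2] ** 2]])


x0 = np.array([0.3, -1.2, 0.7])
Du, Dv = jacobian(u, x0), jacobian(v, x0)

# Dot product rule: grad(u^T v) = v^T Du + u^T Dv (a row vector)
lhs_dot = jacobian(lambda x: u(x) @ v(x), x0).ravel()
rhs_dot = v(x0) @ Du + u(x0) @ Dv

# Matrix/vector product rule: D(Av) = A Dv + sum_i v_i D a_i, with a_i the i-th column of A
lhs_mv = jacobian(lambda x: A(x) @ v(x), x0)
rhs_mv = A(x0) @ Dv + sum(v(x0)[i] * jacobian(lambda x, i=i: A(x)[:, i], x0) for i in range(3))

print(np.abs(lhs_dot - rhs_dot).max(), np.abs(lhs_mv - rhs_mv).max())
```

Both printed differences should be at the level of the finite-difference error, far below the size of the entries.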

### Appendix

#### More information on GMRES

- Convergence
  - (Pessimistic) For every nonincreasing sequence $a_1, \cdots, a_{m-1}, a_m = 0$, one can find a matrix $A$ such that $\|r_n\| = a_n$ for all $n$, where $r_n$ is the $n$-th residual. In particular, it is possible to find a matrix for which the residual stays constant for $m - 1$ iterations and only drops to zero at the last iteration. (Reference: [Wikipedia](https://en.wikipedia.org/wiki/Generalized_minimal_residual_method#Convergence); the instructor also heard this in a plenary talk at a very reliable conference, though the details are not remembered. I remember being surprised that even solving a linear system can be inherently difficult.)
  - (Optimistic in practice) In practice, though, GMRES often performs well. This can be proven in specific situations. 
- Relationship with MINRES
  - MINRES is similar to the conjugate gradient (CG) method, but it only requires the matrix to be symmetric (indefinite matrices are allowed), whereas CG requires it to be symmetric positive definite.
  - The GMRES method is essentially a generalization of MINRES to arbitrary matrices. (See the technical remarks for more details if interested.)
- The matrix $H_k$ appearing in GMRES is a (nonsquare) upper *Hessenberg matrix*. 
  - An upper Hessenberg matrix has zero entries below the first subdiagonal, i.e., $h_{ij} = 0$ whenever $i > j + 1$.
  - This remark is meant to familiarize you with the terminology.
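The Hessenberg structure of $H_k$, and the tridiagonal structure for symmetric $A$ behind the Lanczos iteration mentioned in the technical remarks, can be observed numerically. The following NumPy sketch runs the Arnoldi loop of the algorithm for a random nonsymmetric matrix and a random symmetric one, and reports (i) the largest entry of $AQ_k - Q_{k+1}H_k$, (ii) the largest entry of $H_k$ below the first subdiagonal, and (iii) the largest entry above the first superdiagonal. The helper name `arnoldi`, the sizes, and the seed are illustrative choices, and breakdown ($h_{k+1,k}=0$) is assumed not to occur.

```python
import numpy as np


def arnoldi(A, r, k):
    """k steps of the Arnoldi iteration (the q_j / h_ij loops of the GMRES algorithm).

    Returns Q of size n x (k+1) and H of size (k+1) x k; assumes no breakdown.
    """
    n = len(r)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r / np.linalg.norm(r)
    for j in range(k):
        y = A @ Q[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ y
            y = y - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(y)
        Q[:, j + 1] = y / H[j + 1, j]
    return Q, H


rng = np.random.default_rng(1)
n, k = 10, 6
M = rng.standard_normal((n, n))   # generic nonsymmetric matrix
S = M + M.T                       # symmetric matrix
r = rng.standard_normal(n)

results = {}
for name, mat in [("nonsymmetric", M), ("symmetric", S)]:
    Q, H = arnoldi(mat, r, k)
    results[name] = (
        np.abs(mat @ Q[:, :k] - Q @ H).max(),   # (i)  A Q_k = Q_{k+1} H_k
        np.abs(np.tril(H, -2)).max(),           # (ii) below the first subdiagonal
        np.abs(np.triu(H, 2)).max(),            # (iii) above the first superdiagonal
    )
    print(name, results[name])
```

For both matrices (i) should be at roundoff level and (ii) exactly zero; (iii) should be at roundoff level only in the symmetric case, where $H_k$ is tridiagonal.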
  

###### Technical remarks

**Remark** (Technical remarks on GMRES)

- (GMRES and MINRES) The GMRES method is essentially a generalization of MINRES to arbitrary matrices. Both minimize the 2-norm of the residual and do the same calculations in exact arithmetic when the matrix is symmetric. MINRES is a short-recurrence method with a constant memory requirement, whereas GMRES requires storing a basis of the whole Krylov space, so its memory requirement is roughly proportional to the number of iterations. On the other hand, GMRES tends to suffer less from loss of orthogonality. (Reference: Wikipedia, one of whose original references is broken; treat this remark as advice, not as established fact until confirmed.)
- (No optimal short-recurrence method; further study needed) The Arnoldi iteration (the computation of the $q_j$'s) reduces to the Lanczos iteration for symmetric matrices. The corresponding Krylov subspace method is the minimal residual method (MINRES) of Paige and Saunders. 
    - Unlike in the nonsymmetric case, MINRES is given by a three-term recurrence relation. 
    - It can be shown that there is no Krylov subspace method for general matrices that is given by a short recurrence relation and yet minimizes the residual norms as GMRES does. (Reference: [Wikipedia](https://en.wikipedia.org/wiki/Generalized_minimal_residual_method#Comparison_with_other_solvers))
- (Hessenberg matrix) Any square matrix is unitarily similar to an upper Hessenberg matrix. (Reference: [Wikipedia](https://en.wikipedia.org/wiki/Hessenberg_matrix))
  - This follows immediately from *Schur triangularization* (the *Schur decomposition*; [Schur decomposition Wikipedia](https://en.wikipedia.org/wiki/Schur_decomposition)), since an upper triangular matrix is in particular upper Hessenberg.
  - When triangularization is needed, it is more efficient to first compute a Hessenberg matrix and then move on to a triangular matrix. (See the more detailed remark on [Wikipedia](https://en.wikipedia.org/wiki/Hessenberg_matrix#Computer_programming).)
- (Convergence of GMRES) 
  - A result of Greenbaum, Pták, and Strakoš states that for every nonincreasing sequence $a_1, \cdots, a_{m-1}, a_m = 0$, one can find a matrix $A$ such that $\|r_n\| = a_n$ for all $n$, where $r_n$ is the $n$-th residual. In particular, it is possible to find a matrix for which the residual stays constant for $m - 1$ iterations and only drops to zero at the last iteration. (Reference: [Wikipedia](https://en.wikipedia.org/wiki/Generalized_minimal_residual_method#Convergence); the instructor also heard this in a plenary talk at a very reliable conference, though the details are not remembered. I remember being surprised that even solving a linear system can be inherently difficult.)
  - In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of $A$, that is $\left(A^T+A\right) / 2$, is positive definite, then
$$
\left\|r_n\right\| \leq\left(1-\frac{\lambda_{\min }^2\left(\tfrac{1}{2}\left(A^T+A\right)\right)}{\lambda_{\max }\left(A^T A\right)}\right)^{n / 2}\left\|r_0\right\|,
$$
where $\lambda_{\min }(M)$ and $\lambda_{\max }(M)$ denote the smallest and largest eigenvalues of the matrix $M$, respectively. 

If $A$ is symmetric and positive definite, then we even have
$$
\left\|r_n\right\| \leq\left(\frac{\kappa_2(A)^2-1}{\kappa_2(A)^2}\right)^{n / 2}\left\|r_0\right\|,
$$
where $\kappa_2(A)$ denotes the condition number of $A$ in the Euclidean norm.

In the general case, where $A$ is not positive definite (and $x_0 = 0$, so that $r_0 = b$), we have
$$
\frac{\left\|r_n\right\|}{\|b\|} \leq \inf _{p \in P_n}\|p(A)\| \leq \kappa_2(V) \inf _{p \in P_n} \max _{\lambda \in \sigma(A)}|p(\lambda)|,
$$
where $P_n$ denotes the set of polynomials $p$ of degree at most $n$ with $p(0)=1$, $V$ is the matrix of eigenvectors in the spectral decomposition $A = V\Lambda V^{-1}$ (so $A$ is assumed to be diagonalizable), and $\sigma(A)$ is the spectrum of $A$. Roughly speaking, this says that fast convergence occurs when the eigenvalues of $A$ are clustered away from the origin and $A$ is not too far from normality. (Reference: [Wikipedia](https://en.wikipedia.org/wiki/Generalized_minimal_residual_method#Convergence), whose original reference is Lloyd N. Trefethen and David Bau, III, *Numerical Linear Algebra*, Society for Industrial and Applied Mathematics, 1997.)
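Both sides of this convergence discussion can be checked on small examples with the NumPy sketch below, which computes GMRES residual norms with $x_0 = 0$ by brute force (an orthonormal basis of $K_k$ plus `np.linalg.lstsq` at every step). The pessimistic case uses the cyclic shift matrix ($Ae_i = e_{i+1}$, $Ae_n = e_1$) with $b = e_1$, an example chosen here for illustration: the Krylov space is $\operatorname{span}\{e_1, \ldots, e_k\}$, and for $k < n$ every $Ax$ in it is orthogonal to $b$, so the residual stays at $1$ until the last step. The optimistic case compares the residuals for a random matrix with positive definite symmetric part against the first bound above. The helper name `gmres_residual_norms`, the sizes, and the seed are assumptions.

```python
import numpy as np


def gmres_residual_norms(A, b, m):
    """Residual norms ||r_k||_2 for k = 0, ..., m of GMRES with x0 = 0 (so r_0 = b).

    Brute force: keep an orthonormal basis Q of K_k = span{b, Ab, ..., A^{k-1} b}
    and minimize ||b - A Q c||_2 with a dense least squares solve at every k.
    """
    Q = (b / np.linalg.norm(b)).reshape(-1, 1)
    norms = [np.linalg.norm(b)]
    for k in range(1, m + 1):
        c, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
        norms.append(np.linalg.norm(b - A @ (Q @ c)))
        if k == m:
            break
        w = A @ Q[:, -1]
        for _ in range(2):                      # Gram-Schmidt, repeated once for safety
            w = w - Q @ (Q.T @ w)
        Q = np.column_stack([Q, w / np.linalg.norm(w)])   # assumes no breakdown before step m
    return np.array(norms)


# Pessimistic case: cyclic shift matrix, b = e_1 (illustrative choice)
n_shift = 8
shift = np.roll(np.eye(n_shift), 1, axis=0)     # shift @ e_i = e_{i+1}, shift @ e_n = e_1
e1 = np.eye(n_shift)[:, 0]
stagnation = gmres_residual_norms(shift, e1, n_shift)
print("shift matrix:", np.round(stagnation, 12))

# Optimistic case: symmetric part (A + A^T)/2 positive definite (random, illustrative)
rng = np.random.default_rng(0)
n, m = 40, 15
A = 3.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
lam_min_sym = np.linalg.eigvalsh((A + A.T) / 2).min()    # must be > 0 for the bound
lam_max_AtA = np.linalg.norm(A, 2) ** 2                  # largest eigenvalue of A^T A
bound = (1 - lam_min_sym**2 / lam_max_AtA) ** (np.arange(m + 1) / 2) * np.linalg.norm(b)
res = gmres_residual_norms(A, b, m)
print("max residual / bound:", (res / bound).max())
```

The first printout should show residual $1$ for every step before the last and $0$ at the last one; in the second case the ratio of residual to bound should not exceed $1$.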

---
This work is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)