## GMRES

### Overview

#### What is GMRES?

- GMRES is an iterative solver for a system of linear equations: $Ax=b$.
- GMRES generalizes the conjugate gradient method to *nonsymmetric* matrices $A$.
- The name is short for *generalized minimum residual* method.
- Why discuss it in a chapter on least squares?
  - Each iteration of GMRES solves a small least squares problem.

### Why do we care?

- GMRES can handle nonsymmetric matrices, for which the conjugate gradient method fails.
- GMRES is a good choice for solving large, sparse, nonsymmetric (square) linear systems $Ax=b$. [Sauer (2017) p. 235]
- GMRES copes with ill-conditioning by orthogonalizing the basis it builds. [Sauer (2017) p. 235]

### The method

#### Idea

- The *Krylov space* is $K_k:=\operatorname{span}\{r, Ar, A^2 r, \ldots, A^{k-1} r\}$, where $r=b-Ax_0$ is the initial residual.
- The approximate solution $x_k$ at the $k$-th iteration is the vector in $x_0+K_k$ that minimizes the residual norm $\|b-Ax\|_2$.
  - The conjugate gradient method uses a similar idea; both belong to the family of *Krylov methods*.
- As $k$ increases, $K_k$ expands, so the residual norm $\|b-Ax_k\|_2$ never increases.
  - In exact arithmetic, GMRES is a direct method: if $A$ is nonsingular, it reaches the exact solution in at most $n$ iterations. [Sauer (2017) p. 237]
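To make the idea concrete, here is a deliberately naive sketch in Python with NumPy (the language and the helper names `krylov_matrix` and `best_in_krylov` are assumptions, not part of the source). It minimizes $\|b-Ax\|_2$ over $x_0+\operatorname{span}\{r, Ar, \ldots, A^{k-1}r\}$ by stacking the Krylov vectors (rescaled, but not orthogonalized) and calling a dense least squares solver. It is not GMRES itself; it only illustrates the defining property that the residual norms are nonincreasing in $k$ and reach rounding level at $k=n$.

```python
import numpy as np

def krylov_matrix(A, r, k):
    """Columns r, Ar, ..., A^{k-1} r (each rescaled to unit length), spanning K_k."""
    cols = [r / np.linalg.norm(r)]
    for _ in range(k - 1):
        w = A @ cols[-1]
        cols.append(w / np.linalg.norm(w))  # rescaling keeps the span; no orthogonalization
    return np.column_stack(cols)

def best_in_krylov(A, b, x0, k):
    """Minimize ||b - A x||_2 over x in x0 + K_k with a dense least squares solve."""
    r = b - A @ x0
    K = krylov_matrix(A, r, k)
    # x = x0 + K c  gives  b - A x = r - (A K) c
    c, *_ = np.linalg.lstsq(A @ K, r, rcond=None)
    return x0 + K @ c

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)  # nonsymmetric test matrix (arbitrary choice)
b = rng.standard_normal(n)
x0 = np.zeros(n)

residuals = [np.linalg.norm(b - A @ best_in_krylov(A, b, x0, k)) for k in range(1, n + 1)]
print(residuals)  # nonincreasing in k; the k = n entry is at rounding level
```

Without orthogonalization, the stacked Krylov vectors quickly point in nearly the same direction, which is the ill-conditioning GMRES avoids by building an orthonormal basis instead.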

**Algorithm** (GMRES; Sauer (2017) p. 235)

- Given
  - $A$: $n$-by-$n$ matrix
  - $b$: vector of length $n$
- Initialize
  - $x_0$: initial guess
  - $r=b-A x_0$: initial residual 
  - $q_1=r /\|r\|_2$
- Compute
  - **for** $k=1,2, \ldots, m$
    - $y=A q_k$
    - **for** $j=1,2, \ldots, k$
      - $h_{j k}=q_j^T y $
      - $y=y-h_{j k} q_j$
    - $h_{k+1, k}=\|y\|_2$ (If $h_{k+1, k}=0$, skip next line and terminate at bottom.)
    - $q_{k+1}=y / h_{k+1, k}$
    - Minimize $\left\|H_k c_k- [\|r\|_2, 0, 0, \ldots, 0]^T \right\|_2$ for $c_k$
    - $x_k=Q_k c_k+x_0$
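The algorithm translates almost line by line into the minimal Python/NumPy sketch below. Assumptions: the function name `gmres` and the breakdown tolerance `tol` are illustrative choices (the algorithm tests $h_{k+1,k}=0$ exactly), indices are 0-based, and the small least squares problem is solved with `numpy.linalg.lstsq` rather than the Givens-rotation update commonly used in practical implementations.

```python
import numpy as np

def gmres(A, b, x0, m, tol=1e-12):
    """Sketch of GMRES following the algorithm above (modified Gram-Schmidt Arnoldi).

    Runs at most m steps (assumes m <= n) and stops early if h_{k+1,k} is
    negligible relative to ||r||_2.
    """
    n = b.shape[0]
    r = b - A @ x0
    beta = np.linalg.norm(r)
    if beta == 0.0:
        return x0.copy()
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r / beta
    x = x0.copy()
    for k in range(m):              # 0-based k here is step k+1 of the algorithm
        y = A @ Q[:, k]
        for j in range(k + 1):      # orthogonalize against q_1, ..., q_{k+1}
            H[j, k] = Q[:, j] @ y
            y = y - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(y)
        # Small least squares problem: minimize || H_k c - beta e_1 ||_2
        rhs = np.zeros(k + 2)
        rhs[0] = beta
        c, *_ = np.linalg.lstsq(H[: k + 2, : k + 1], rhs, rcond=None)
        x = x0 + Q[:, : k + 1] @ c
        if H[k + 1, k] <= tol * beta:   # breakdown: x is already (numerically) exact
            break
        Q[:, k + 1] = y / H[k + 1, k]
    return x

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n) + 2.0 * np.eye(n)  # nonsymmetric test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
for m in (5, 10, n):
    x = gmres(A, b, x0, m)
    print(m, np.linalg.norm(b - A @ x))  # residual should shrink as m grows
```

Each pass re-solves the whole $(k+1)\times k$ problem, which keeps the sketch close to the algorithm at the cost of extra work. Practical implementations also restart after a fixed number of steps to bound the memory taken by $Q$.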

At the $k$-th step,

- the vector $[\|r\|_2, 0, 0, \ldots, 0]^T$ has length $k+1$,
- $c_k$ has length $k$, and
- $H_k$ is an upper Hessenberg matrix of size $(k+1)\times k$, given by

$$
H_k = \left[\begin{array}{cccc}
h_{11} & h_{12} & \cdots & h_{1 k} \\
h_{21} & h_{22} & \cdots & h_{2 k} \\
& h_{32} & \cdots & h_{3 k} \\
& & \ddots & \vdots \\
& & & h_{k+1, k}
\end{array}\right]
$$

- $Q_k$ is of size $n\times k$, with the orthonormal vectors $q_1, \ldots, q_k$ as its columns:

$$
Q_k = \left[\begin{array}{c:c:c} 
& & \\
& & \\
q_1 & \cdots & q_k \\
& & \\
& &
\end{array}\right]
$$


**Detail 1**

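- At step $k$ of the method, we enlarge the Krylov space by the direction of $A^k r$ (through $y=Aq_k$),
- orthogonalize the new vector against the existing basis (the inner loop, i.e., modified Gram-Schmidt),
- and then use least squares to find the best approximation in $x_0+K_k$.
  - This is done by finding the correction $x_{add}$ ($Q_k c_k$ in the algorithm) and adding it to $x_0$.
  - Since $r=\|r\|_2\, q_1 = Q_{k+1}(\|r\|_2 e_1)$ with $e_1=[1,0,\ldots,0]^T$, and $AQ_k=Q_{k+1}H_k$ (Detail 2), we get $b-A(x_0+Q_k c) = Q_{k+1}\left(\|r\|_2 e_1 - H_k c\right)$. Because $Q_{k+1}$ has orthonormal columns, minimizing the residual in $\mathbb{R}^n$ reduces to the small $(k+1)\times k$ problem $\min_c \|H_k c - \|r\|_2 e_1\|_2$ solved in the algorithm.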
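The least squares step can be checked numerically: for any coefficient vector $c$, the residual $\|b-A(x_0+Q_kc)\|_2$ in $\mathbb{R}^n$ should equal $\|\,\|r\|_2e_1-H_kc\|_2$ in $\mathbb{R}^{k+1}$, where $e_1=[1,0,\ldots,0]^T$. The Python/NumPy sketch below assumes a hypothetical helper `arnoldi` that runs the inner loops of the algorithm for $k$ steps; the random test data are arbitrary.

```python
import numpy as np

def arnoldi(A, r, k):
    """k steps of modified Gram-Schmidt Arnoldi: returns Q_{k+1} (n x (k+1)) and H_k ((k+1) x k)."""
    n = r.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r / np.linalg.norm(r)
    for j in range(k):
        y = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ y
            y = y - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(y)
        Q[:, j + 1] = y / H[j + 1, j]  # assumes no breakdown (h_{j+1,j} != 0)
    return Q, H

rng = np.random.default_rng(2)
n, k = 8, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)
r = b - A @ x0
beta = np.linalg.norm(r)
Q, H = arnoldi(A, r, k)

c = rng.standard_normal(k)                           # any coefficient vector
big = np.linalg.norm(b - A @ (x0 + Q[:, :k] @ c))    # residual in R^n
e1 = np.zeros(k + 1)
e1[0] = beta
small = np.linalg.norm(e1 - H @ c)                   # residual in R^{k+1}
print(big, small)  # equal up to rounding
```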

**Detail 2**


- It holds that $AQ_k = Q_{k+1} H_k$ for each $k$.
  - This is a consequence of Gram-Schmidt (the inner loop): for $j=1,2,\ldots,k$, the vector $Aq_j$ has the orthogonal expansion $Aq_j = \underbrace{(q_1^T Aq_j)}_{h_{1j}} q_1 + \underbrace{(q_2^T Aq_j)}_{h_{2j}} q_2 + \cdots + \underbrace{(q_{j+1}^T Aq_j)}_{h_{j+1,j}}q_{j+1}$, so the $j$-th column of $AQ_k$ is $Q_{k+1}$ times the $j$-th column of $H_k$.

$$
\begin{split}
AQ_k &= A\left[\begin{array}{c:c:c} 
  & & \\
  & & \\
  q_1 & \cdots & q_k \\
  & & \\
  & &
  \end{array}\right] \\
&=\left[\begin{array}{c:c:c} 
  & & \\
  & & \\
  Aq_1 & \cdots & Aq_k \\
  & & \\
  & &
  \end{array}\right]
\\
&=
\left[
  \begin{array}{c:c:c:c}
    Q_{k+1} 
    \begin{pmatrix}
      h_{11}\\
      h_{21}\\
      0\\
      \vdots\\
      0
    \end{pmatrix} 
    &
    Q_{k+1} 
    \begin{pmatrix}
      h_{12}\\
      h_{22}\\
      h_{32}\\
      \vdots\\
      0
    \end{pmatrix}
    &
    \cdots 
    &
    Q_{k+1} 
    \begin{pmatrix}
      h_{1k}\\
      h_{2k}\\
      h_{3k}\\
      \vdots \\
      h_{k+1, k}
    \end{pmatrix}
  \end{array}
\right]
\\
& = Q_{k+1}
\left[\begin{array}{c:c:c:c}
h_{11} & h_{12} & \cdots & h_{1 k} \\
h_{21} & h_{22} & \cdots & h_{2 k} \\
& h_{32} & \cdots & h_{3 k} \\
& & \ddots & \vdots \\
& & & h_{k+1, k}
\end{array}\right]
\\
&= Q_{k+1}H_k
\end{split}
$$
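The identity $AQ_k=Q_{k+1}H_k$ can also be checked numerically. The sketch below (Python/NumPy; the helper `arnoldi` and the random test data are illustrative assumptions) runs $k$ steps of the inner loops and checks the identity, the orthonormality of $q_1,\ldots,q_{k+1}$, and that $H_k$ is zero below its first subdiagonal.

```python
import numpy as np

def arnoldi(A, r, k):
    """k steps of modified Gram-Schmidt Arnoldi: returns Q_{k+1} (n x (k+1)) and H_k ((k+1) x k)."""
    n = r.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r / np.linalg.norm(r)
    for j in range(k):
        y = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ y
            y = y - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(y)
        Q[:, j + 1] = y / H[j + 1, j]  # assumes no breakdown (h_{j+1,j} != 0)
    return Q, H

rng = np.random.default_rng(3)
n, k = 10, 5
A = rng.standard_normal((n, n))
r = rng.standard_normal(n)
Q, H = arnoldi(A, r, k)

print(np.allclose(A @ Q[:, :k], Q @ H))       # A Q_k = Q_{k+1} H_k
print(np.allclose(Q.T @ Q, np.eye(k + 1)))    # q_1, ..., q_{k+1} are orthonormal
print(np.allclose(np.tril(H, -2), 0.0))       # H_k is upper Hessenberg
```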


## Nonlinear least squares

**Theorem** (Vector dot product rule; matrix/vector product rule)

Let $u\left(x_1, \ldots, x_n\right)$ and $v\left(x_1, \ldots, x_n\right)$ be $\mathbb{R}^n$-vector-valued functions, and let $A\left(x_1, \ldots, x_n\right)$ be an $n \times n$ matrix function. The dot product $u^T v$ is a scalar function. Writing $Du$ and $Dv$ for the Jacobian matrices (so the gradient of a scalar function is a row vector), we have


$$
\nabla\left(u^T v\right)=v^T D u+u^T D v,
$$

and

$$
D(A v)=A \cdot D v+\sum_{i=1}^n v_i D a_i,
$$

where $a_i$ denotes the $i$-th column of $A$ and $v_i$ the $i$-th entry of $v$.
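Both rules can be sanity-checked with central finite differences. In the Python/NumPy sketch below, `jacobian` is a hypothetical helper, and `u`, `v`, `A` are arbitrary smooth test functions on $\mathbb{R}^3$ chosen only for illustration; column $j$ of each Jacobian approximates the partial derivative with respect to $x_j$.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: column j approximates d f / d x_j."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * h)
    return J

# Hypothetical test functions: R^3 -> R^3 and R^3 -> R^{3x3}
def u(x):
    return np.array([x[0] * x[1], np.sin(x[2]), x[0] ** 2])

def v(x):
    return np.array([np.exp(x[1]), x[0] + x[2], x[1] * x[2]])

def A(x):
    return np.array([[x[0], 1.0, x[1]],
                     [x[2] ** 2, x[0] * x[1], 0.0],
                     [np.cos(x[0]), x[2], 2.0]])

x = np.array([0.3, -0.7, 1.1])

# Dot product rule: grad(u^T v) = v^T Du + u^T Dv (row vector)
lhs1 = jacobian(lambda z: u(z) @ v(z), x)[0]
rhs1 = v(x) @ jacobian(u, x) + u(x) @ jacobian(v, x)

# Matrix/vector product rule: D(Av) = A Dv + sum_i v_i D a_i
lhs2 = jacobian(lambda z: A(z) @ v(z), x)
rhs2 = A(x) @ jacobian(v, x) + sum(
    v(x)[i] * jacobian(lambda z, i=i: A(z)[:, i], x) for i in range(3)
)
print(np.allclose(lhs1, rhs1, atol=1e-6), np.allclose(lhs2, rhs2, atol=1e-6))
```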

### Appendix

---
This work is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)