### Conjugate gradient method

**Problem of interest**

Given an $n$-by-$n$ symmetric positive definite (SPD) matrix $A$ and a vector $b \in \mathbb{R}^n$, find the vector $x \in \mathbb{R}^n$ such that

$$ Ax = b. $$


#### Method: Conjugate gradient method

**Data**

- $x_0 \in \mathbb{R}^n$: initial guess

**Initialize**

- $d_0=r_0=b-Ax_0$

**Main computation**

- **for** $k = 0, 1, 2, \ldots, n-1$
    - **if** $r_k = 0$ **stop**, **end**
    - $\alpha_{k}=\frac{r_{k}^{T} r_{k}}{d_{k}^{T} A d_{k}}$ (compute the $d_k$ component of the error $e_k$ in the $A$-inner product)
    - $x_{k+1}=x_{k}+\alpha_{k} d_{k}$ (remove that component from the error)
    - $r_{k+1}=r_{k}-\alpha_{k} A d_{k}$ (compute the new residual)
    - $\beta_{k}=\frac{r_{k+1}^{T} r_{k+1}}{r_{k}^{T} r_{k}}$ (minus the $d_k$ component of $r_{k+1}$ in the $A$-inner product)
    - $d_{k+1}=r_{k+1}+\beta_{k} d_{k}$ (Gram-Schmidt step with respect to the $A$-inner product)

**end**
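A minimal sketch of the steps above in plain Python, with dense matrices stored as lists of rows. The helper names `dot`, `matvec`, and `conjugate_gradient` are illustrative, and the tolerance `tol` is an assumption not present in the algorithm: it replaces the exact test $r_k = 0$, which rarely holds exactly in floating point.

```python
def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, x0, tol=1e-12):
    """Conjugate gradient for SPD A; at most n iterations, as in the loop above.

    `tol` is an assumed threshold on ||r_k||, standing in for the exact test r_k = 0.
    """
    n = len(b)
    x = [float(xi) for xi in x0]
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # r_0 = b - A x_0
    d = list(r)                                          # d_0 = r_0
    rr = dot(r, r)                                       # r_k^T r_k
    for _ in range(n):
        if rr ** 0.5 <= tol:                             # "if r_k = 0 stop"
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                          # alpha_k
        x = [xi + alpha * di for xi, di in zip(x, d)]    # x_{k+1}
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)] # r_{k+1}
        rr_new = dot(r, r)
        beta = rr_new / rr                               # beta_k
        d = [ri + beta * di for ri, di in zip(r, d)]     # d_{k+1}
        rr = rr_new
    return x
```

For example, `conjugate_gradient([[4, 1], [1, 3]], [1, 2], [0, 0])` returns approximately $(1/11, 7/11)$ after two iterations, as expected for a $2 \times 2$ SPD system.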

**History**

1. First proposed by Schmidt (1908) [^1] (*the* Schmidt in Gram-Schmidt)
1. Independently re-invented by Fox, Huskey, and Wilkinson (1948) [^2]
1. Hestenes and Stiefel (1952) made this idea explicit and practical. [^3]
1. In floating-point arithmetic, CG does not reach the exact solution in $n$ steps because of round-off error. It became popular only after Reid (1971) showed its value as an iterative method for large, sparse matrices. [^4]

[^1]: Schmidt (1908) Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten

[^2]: Fox, Huskey, and Wilkinson (1948) Notes on the solution of algebraic linear simultaneous equations

[^3]: Hestenes and Stiefel (1952) Methods of conjugate gradients for solving linear systems 

[^4]: Reid (1971) On the method of conjugate gradients for the solution of large sparse systems of linear equations

#### Notation/Settings

| expression | meaning |
|---|---|
| $x$ | true solution ($Ax=b$) |
| $x_k$ | $k$-th approximate solution by the conjugate gradient method ($k=0,1,2,\cdots$)|
| $e_k$ | $=x - x_k$, error of $x_k$ |
| $r_k$ | $=b-Ax_k$, residual of $x_k$ |
| $d_k$ | $k$-th search direction; the $d_k$ are mutually $A$-conjugate ($d_i^T A d_j = 0$ for $i \neq j$) |
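Since $Ax = b$, the residual is the error mapped through $A$, so it can be computed without knowing $x$:

$$
r_k = b - Ax_k = Ax - Ax_k = A(x - x_k) = Ae_k.
$$

Because $A$ is SPD and hence nonsingular, $r_k = 0$ exactly when $e_k = 0$.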

#### Idea behind the conjugate gradient method



##### Warm up (Gaussian elimination)

1. Change the perspective

View the process of finding the solution as removing the components of the error one by one. Unless we are extremely lucky, the error of the initial guess will have nonzero components along every direction.

For the moment, to avoid introducing too many symbols, let us overload the notation and let the $d_k$ be the canonical basis vectors of $\mathbb{R}^{n}$.

We can expand the initial error in the canonical basis.

$$
e_{0}=\sum_{k=0}^{n-1} \eta_k d_k
$$

2. Remove components one by one

Back substitution can then be seen as removing the $d_{n-2}$ component from the error, then the $d_{n-3}$ component, and so on down to $d_{0}$. (The $d_{n-1}$ component is already absent, since after elimination the last unknown is read off exactly.)

$$
\begin{bmatrix}
1&-\frac{1}{2}&\frac 3 4 & \frac 1 2 \\
0&1& \frac 3 5 & \frac {26} 5\\
0&0&1& \underbrace{2}_{x_1} 
\end{bmatrix}
\longrightarrow	
\begin{bmatrix}
1&-\frac{1}{2}&0 & -1 \\
0&1& 0& 4\\
0&0&1& \underbrace{2}_{x_2}  
\end{bmatrix}
\longrightarrow	
\begin{bmatrix}
1&0&0 & 1 \\
0&1& 0& 4\\
0&0&1& \underbrace{2}_{x_3}  
\end{bmatrix}
$$

![Gaussian elimination from error improvement point of view](../images/CG01.png)
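The component-removal view of back substitution can be checked with a short sketch. The function `back_substitution_snapshots` and the $3 \times 3$ system below are hypothetical, made up for illustration (a unit upper-triangular $U$ with true solution $(1, -2, 3)$, not the matrix above); exact `Fraction` arithmetic avoids round-off. Each step makes one more trailing entry of the error exactly zero, while the leading entries may still change along the way.

```python
from fractions import Fraction


def back_substitution_snapshots(U, c):
    """Back substitution on the augmented system [U | c], U unit upper triangular.

    Clearing column j above the diagonal makes entry j-1 of the right-hand
    column exact. Returns the right-hand column before any step and after
    each step, so the error x - column can be inspected stage by stage.
    """
    n = len(c)
    col = [Fraction(v) for v in c]
    snapshots = [list(col)]
    for j in range(n - 1, 0, -1):      # columns n-1, ..., 1
        for i in range(j):             # rows above the diagonal
            col[i] -= Fraction(U[i][j]) * col[j]
        snapshots.append(list(col))
    return snapshots


U = [[1, 2, -1], [0, 1, 3], [0, 0, 1]]   # hypothetical example
c = [-6, 7, 3]
x_true = [1, -2, 3]
for k, col in enumerate(back_substitution_snapshots(U, c)):
    print(k, [int(xi - ci) for xi, ci in zip(x_true, col)])
# 0 [7, -9, 0]
# 1 [4, 0, 0]
# 2 [0, 0, 0]
```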

3. Algorithm-friendly summary

**Remark** Removing the $d_k$ component from the error $e_k$ is equivalent to adding it to the approximate solution $x_k$: since $e_k = x - x_k$ and the true solution $x$ is fixed, every change to $x_k$ changes $e_k$ by the same vector with the opposite sign.

Given $x_k$, take an exact step that removes the $d_k$ component at each iteration:

$$
x_{k+1}=x_k \blue{+ \eta_k} d_k,
$$

or

$$
e_{k+1}=e_k \blue{- \eta_k} d_k
$$
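This update is only useful if $\eta_k$ can be computed without knowing $x$. With the canonical basis it cannot. If instead the directions are $A$-conjugate ($d_i^T A d_j = 0$ for $i \neq j$) and $e_0 = \sum_{j=0}^{n-1} \eta_j d_j$, then after removing $d_0, \ldots, d_{k-1}$ we have $e_k = \sum_{j \ge k} \eta_j d_j$, and taking the $A$-inner product with $d_k$ gives

$$
\eta_k = \frac{d_k^T A e_k}{d_k^T A d_k} = \frac{d_k^T r_k}{d_k^T A d_k},
$$

using $r_k = b - Ax_k = A(x - x_k) = Ae_k$. The sketch below assumes such directions come from a hypothetical helper `a_conjugate_directions`, which runs Gram-Schmidt on the canonical basis in the $A$-inner product. It is a generic conjugate-directions illustration that stores all $n$ directions, not the CG recurrence itself. Removing $\eta_k d_k$ from the error means adding it to $x_k$.

```python
def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def a_conjugate_directions(A):
    """Hypothetical helper: Gram-Schmidt on the canonical basis in the
    A-inner product <u, v>_A = u^T A v, giving d_0, ..., d_{n-1}."""
    n = len(A)
    dirs = []
    for k in range(n):
        v = [1.0 if i == k else 0.0 for i in range(n)]   # canonical basis vector
        for d in dirs:
            Ad = matvec(A, d)
            coef = dot(v, Ad) / dot(d, Ad)               # <v, d>_A / <d, d>_A
            v = [vi - coef * di for vi, di in zip(v, d)]
        dirs.append(v)
    return dirs


def remove_components(A, b, x0):
    """x_{k+1} = x_k + eta_k d_k with eta_k = d_k^T r_k / d_k^T A d_k,
    which removes the d_k component from the error e_k."""
    x = [float(xi) for xi in x0]
    for d in a_conjugate_directions(A):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # r_k = b - A x_k
        eta = dot(d, r) / dot(d, matvec(A, d))
        x = [xi + eta * di for xi, di in zip(x, d)]
    return x
```

After $n$ steps every component of $e_0$ has been removed, so `remove_components` returns the solution up to round-off.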


