## Nonlinear System

### Overview



#### Take-aways

After studying this material, we will be able to

- TBF

**Problem of interest**

Given $F:\mathbb{R}^{n}\to \mathbb{R}^{n}$, find $x \in \mathbb{R}^{n}$ such that

$$ F(x) = 0, $$

where $0$ denotes the $n$-dimensional zero vector.

#### Methods

- Newton's method
- Broyden's methods



**Remark**

- There are many other methods. 
- Methods for systems of nonlinear equations are considerably harder to study.
- We touch on the most basic ones. 

### Newton's method

**Motivation**

We can generalize the 1D version of Newton's method.

> **Algorithm** (Newton's method 1D)
>
> Given a differentiable function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $n\ge 0$,
>
> $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. $$
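
As a reminder of what is being generalized, the 1D iteration above can be sketched in a few lines of Python. The function name `newton_1d`, the stopping rule `|f(x)| < tol`, and the test function $f(x)=x^2-2$ are illustrative assumptions, not part of the original notes.

```python
def newton_1d(f, df, x0, tol=1e-12, maxiter=50):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until |f(x_n)| < tol."""
    x = float(x0)
    for n in range(maxiter):
        fx = f(x)
        if abs(fx) < tol:
            return x, n          # converged after n updates
        x = x - fx / df(x)       # "divide by the derivative"
    return x, maxiter

# Illustrative problem (an assumption): f(x) = x^2 - 2, whose positive root is sqrt(2).
root, n = newton_1d(lambda x: x**2 - 2, lambda x: 2 * x, 1.0)
print("root =", root, "updates =", n)
```

The only step that does not carry over verbatim to $\mathbb{R}^n$ is the division `fx / df(x)`, which is exactly what the question below is about.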

**Question** 

What is the analog of dividing by the derivative in higher dimensions?

![Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/der_NewtonMethodTaylor_lp2000.png)

**Definition** (Jacobian matrix)

If $F:\mathbb{R}^{n} \to \mathbb{R}^{n}$ is defined by $F(x)=(f_1 (x), f_2(x), \cdots, f_n(x))$, where $x=(x_1, x_2, \cdots, x_n)$, then its *Jacobian matrix*, denoted $DF(x)$, is given by

$$
DF(x)=
\left[\begin{array}{ccc}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n}
\end{array}\right].
$$


<!-- 
For example, if $n=3$ and $F:\mathbb{R}^{3} \to \mathbb{R}^{3}$ is defined by $F(x)=(f_1 (x), f_2(x), f_3(x))$, where $x=(u,v,w)$, or in component form, $F(u, v, w)=(f_1(u, v, w), f_2(u, v, w), f_3(u, v, w))$, then its Jacobian matrix is given by

$$
D F(x)=\begin{bmatrix}
\frac{\partial f_1}{\partial u} & \frac{\partial f_1}{\partial v} & \frac{\partial f_1}{\partial w} \\
\frac{\partial f_2}{\partial u} & \frac{\partial f_2}{\partial v} & \frac{\partial f_2}{\partial w} \\
\frac{\partial f_3}{\partial u} & \frac{\partial f_3}{\partial v} & \frac{\partial f_3}{\partial w}
\end{bmatrix}.
$$

$$
\begin{aligned}
& f_1(u, v, w)=0 \\
& f_2(u, v, w)=0 \\
& f_3(u, v, w)=0
\end{aligned}
$$ -->

**Example**

Suppose 

$$
F(u, v)=\left(e^{u+v}, \sin u\right).
$$

Then, the Jacobian matrix at $(0,0)$ is

$$
DF(0,0) =
\left[\begin{array}{cc}
e^0 & e^0 \\
\cos 0 & 0
\end{array}\right]
=
\left[\begin{array}{cc}
1 & 1 \\
1 & 0
\end{array}\right].
$$
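
A hand-computed Jacobian like the one above can be sanity-checked numerically. The sketch below compares the analytic Jacobian of $F(u,v)=(e^{u+v},\sin u)$ with a central finite-difference estimate. The helper `fd_jacobian` and the step size `h` are assumptions introduced for illustration only.

```python
import numpy as np

def F(x):
    """F(u, v) = (exp(u + v), sin(u)) from the example above."""
    u, v = x
    return np.array([np.exp(u + v), np.sin(u)])

def DF(x):
    """Analytic Jacobian of F, evaluated at the point x = (u, v)."""
    u, v = x
    return np.array([[np.exp(u + v), np.exp(u + v)],
                     [np.cos(u),     0.0]])

def fd_jacobian(F, x, h=1e-6):
    """Hypothetical helper: central-difference estimate of the Jacobian at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.empty((F(x).size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)   # j-th column: dF/dx_j
    return J

x = np.array([0.0, 0.0])
print(DF(x))              # analytic Jacobian at (0, 0)
print(fd_jacobian(F, x))  # numerical estimate, should agree closely
```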

**Remark**

- The Jacobian matrix must be evaluated at a point before it is used, just as we do with the 1D derivative $f'$.

**Algorithm** (Newton's method - multi-dimensional version)

**Input**

- $F:\mathbb{R}^{n} \to \mathbb{R}^{n}$ (vector field)
- $DF:\mathbb{R}^{n} \to \mathbb{R}^{n\times n}$ (Jacobian matrix)
- $x_0 \in \mathbb{R}^{n}$ (initial guess)

**Main loop**

- **For** $k=0,1,2,\cdots$, do
  - $x_{k+1}=x_k-\left(D F\left(x_k\right)\right)^{-1} F\left(x_k\right)$

**Output**

- $x_\infty$ (approximate solution of $F(x)=0$, where $0$ is the zero vector of length $n$)

**Remark** 

- In the main loop, we use Gaussian elimination instead of inverting the Jacobian matrix. (Inverting a matrix is really expensive.)
- The actual algorithm then reads:
  - $D F\left(x_k\right) s=-F\left(x_k\right)$ (solve for $s$)
  - $x_{k+1}=x_k+s$
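
The solve-then-update form above can be sketched with NumPy, where `np.linalg.solve` plays the role of Gaussian elimination. The function name `newton_system`, the stopping rule $\|F(x_k)\| < \text{tol}$, and the iteration cap are assumptions. The test system is the 2D example used in the Broyden demonstration at the end of these notes, and its Jacobian `DF` is derived by hand here rather than taken from the source.

```python
import numpy as np

def newton_system(F, DF, x0, tol=1e-10, maxiter=50):
    """Newton's method for F(x) = 0: solve DF(x_k) s = -F(x_k), then x_{k+1} = x_k + s."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxiter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x, k
        s = np.linalg.solve(DF(x), -Fx)   # linear solve, no explicit inverse
        x = x + s
    return x, maxiter

def F(x):
    u, v = x
    return np.array([6*u**3 + u*v - 3*v**2 - 4,
                     u**2 - 18*u*v**2 + 16*v**3 + 1])

def DF(x):
    # Hand-derived partial derivatives of F (an addition, not from the source).
    u, v = x
    return np.array([[18*u**2 + v,     u - 6*v],
                     [2*u - 18*v**2,  -36*u*v + 48*v**2]])

x, k = newton_system(F, DF, [1.5, 1.5])
print("x =", x, "after", k, "Newton steps")
print("||F(x)|| =", np.linalg.norm(F(x)))
```

Starting from $(1.5, 1.5)$, the iterates approach the root $(1, 1)$. Each step costs one Jacobian evaluation and one linear solve.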

#### Analysis

**Summary**

- The analysis of multidimensional Newton's method is beyond the scope of our class.
- The convergence rate of multidimensional Newton's method remains second order (quadratic) near a root at which the Jacobian is invertible.
- The derivation of the method and its convergence rate rely on the multidimensional Taylor theorem.
- The following is a linear approximation version of the multidimensional Taylor theorem.

$$
F(x)=F\left(x_0\right)+D F\left(x_0\right) \cdot\left(x-x_0\right)+O\left(\|x-x_0 \|^2\right)
$$
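
To connect this expansion with the algorithm, here is a short derivation sketch. It expands around the current iterate $x_k$ instead of $x_0$, drops the $O(\|x-x_k\|^2)$ remainder, and assumes $DF(x_k)$ is invertible:

$$
\begin{aligned}
0 = F(x) &\approx F\left(x_k\right)+D F\left(x_k\right)\left(x-x_k\right), \\
D F\left(x_k\right)\left(x-x_k\right) &= -F\left(x_k\right), \\
x_{k+1} &= x_k-\left(D F\left(x_k\right)\right)^{-1} F\left(x_k\right).
\end{aligned}
$$

The middle line is the linear system $DF(x_k)s=-F(x_k)$ with $s=x_{k+1}-x_k$, and the last line is the update used in the main loop of Newton's method.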

**Remark** (Multidimensional Taylor Theorem)

- The multidimensional version of the Taylor theorem requires a bit of preparation called *multi-index* notation. Without it, even the most motivated person would find it extremely taxing to work in only four dimensions. However, using *multi-index* notation, the theorem reads almost exactly the same as in 1D. (For a look, see [Wikipedia](https://en.wikipedia.org/wiki/Taylor%27s_theorem#Taylor's_theorem_for_multivariate_functions))



### Broyden's methods

**Motivation**

What if the Jacobian matrix is not available?

**Remark**

- In 1D, we have the secant method. However, it is not clear what a secant line should be in a multi-dimensional setting.

**Algorithm** (Broyden's method 1)

The following algorithm is borrowed from Sauer (2017) Numerical Analysis 3rd ed. p. 140.

**Input**

- $F:\mathbb{R}^{n} \to \mathbb{R}^{n}$ (vector field)
- $A_0\in\mathbb{R}^{n\times n}$ (initial approximate Jacobian matrix)
- $x_0 \in \mathbb{R}^{n}$ (initial guess)

**Main loop**

- **For** $i=0,1,2,\cdots$, do
  - $x_{i+1}=x_i-A_i^{-1} F\left(x_i\right)$
  - $\delta_{i+1}=x_{i+1}-x_i$
  - $\Delta_{i+1}=F\left(x_{i+1}\right)-F\left(x_i\right)$
  - $A_{i+1}=A_i+\frac{\left(\Delta_{i+1}-A_i \delta_{i+1}\right) \delta_{i+1}^T}{\delta_{i+1}^T \delta_{i+1}}$

**Output**

- $x_\infty$ (approximate solution of $F(x)=0$, where $0$ is the zero vector of length $n$)
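
The main loop of Broyden's method 1 can be sketched as follows. This is a minimal illustration, not Sauer's code. The names `broyden1` and `broyden1_update`, the stopping rule, and the choice of $A_0$ in the demo (the Jacobian of the example system evaluated by hand at $x_0$) are all assumptions. As with Newton's method, the step is obtained by solving $A_i\,\delta_{i+1}=-F(x_i)$ rather than forming $A_i^{-1}$.

```python
import numpy as np

def broyden1_update(A, d, D):
    """Rank-one update A + (D - A d) d^T / (d^T d) for column vectors d and D."""
    return A + ((D - A @ d) @ d.T) / (d.T @ d)

def broyden1(F, x0, A0, tol=1e-10, maxiter=100):
    """Sketch of Broyden's method 1; F maps a 1D array to a 1D array."""
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    A = np.array(A0, dtype=float)                  # copy so A0 is left untouched
    Fx = np.asarray(F(x.ravel()), dtype=float).reshape(-1, 1)
    for i in range(maxiter):
        d = np.linalg.solve(A, -Fx)                # delta_{i+1}: solve A_i d = -F(x_i)
        x = x + d
        Fx_new = np.asarray(F(x.ravel()), dtype=float).reshape(-1, 1)
        if np.linalg.norm(Fx_new) < tol:
            return x.ravel(), i + 1
        A = broyden1_update(A, d, Fx_new - Fx)     # uses Delta_{i+1} = F(x_{i+1}) - F(x_i)
        Fx = Fx_new
    return x.ravel(), maxiter

def F(x):
    # Example system used later in these notes.
    u, v = x
    return np.array([6*u**3 + u*v - 3*v**2 - 4,
                     u**2 - 18*u*v**2 + 16*v**3 + 1])

# Assumption: A0 is the Jacobian of F at x0 = (1.5, 1.5), evaluated by hand.
A0 = np.array([[42.0, -7.5], [-37.5, 27.0]])
x, n = broyden1(F, [1.5, 1.5], A0)
print("x =", x, "iterations =", n, "||F(x)|| =", np.linalg.norm(F(x)))
```

After every update the new matrix satisfies the secant condition $A_{i+1}\delta_{i+1}=\Delta_{i+1}$, which is the multi-dimensional substitute for the secant line.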

**Algorithm** (Broyden's method 2)

The following algorithm is borrowed from Sauer (2017) Numerical Analysis 3rd ed. p. 141.

**Input**

- $F:\mathbb{R}^{n} \to \mathbb{R}^{n}$ (vector field)
- $B_0\in\mathbb{R}^{n\times n}$ (initial approximate inverse Jacobian matrix)
- $x_0 \in \mathbb{R}^{n}$ (initial guess)

**Main loop**

- **For** $i=0,1,2,\cdots$, do
  - $x_{i+1}=x_i-B_i F\left(x_i\right)$
  - $\delta_{i+1}=x_{i+1}-x_i$
  - $\Delta_{i+1}=F\left(x_{i+1}\right)-F\left(x_i\right)$
  - $B_{i+1}=B_i+\frac{\left(\delta_{i+1}-B_i \Delta_{i+1}\right) \delta_{i+1}^T B_i}{\delta_{i+1}^T B_i \Delta_{i+1}}$

**Output**

- $x_\infty$ (approximate solution of $F(x)=0$, where $0$ is the zero vector of length $n$)

**Remark** 

- Broyden's method 2 directly approximates the inverse of the Jacobian, while Broyden's method 1 approximates the Jacobian itself.
  - In each method, the last line of the main loop updates the approximation of the Jacobian (method 1) or of its inverse (method 2).
- Both methods converge superlinearly (to simple roots): faster than linear convergence but slower than quadratic convergence.
- If the Jacobian is available, setting $B_0=DF(x_0)^{-1}$ usually speeds up method 2.
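- Broyden called the update in method 1 the *good* method. Method 2 as stated above is the same update rewritten for the inverse via the Sherman-Morrison formula, so with $B_0=A_0^{-1}$ the two methods generate the same iterates in exact arithmetic. The update that Broyden called the *bad* method is a different one: it replaces $\delta_{i+1}^T B_i$ by $\Delta_{i+1}^T$ in both the numerator and the denominator of the method 2 update.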
- There are further improved methods based on Broyden's methods, but we do not discuss them.
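
The relationship between the two update formulas can be checked numerically: if $B_i=A_i^{-1}$, the method 2 update gives exactly the inverse of the method 1 update (this is the Sherman-Morrison formula). The matrix and vectors below are arbitrary values chosen for illustration, not data from the notes.

```python
import numpy as np

# Arbitrary illustrative data (assumptions): a nonsingular A, a step d, and a change D.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = np.linalg.inv(A)                       # start with B = A^{-1}
d = np.array([[1.0], [2.0]])               # plays the role of delta_{i+1}
D = A @ d + np.array([[0.5], [-0.3]])      # plays the role of Delta_{i+1}

# Method 1 update of the approximate Jacobian.
A_new = A + ((D - A @ d) @ d.T) / (d.T @ d)

# Method 2 update of the approximate inverse Jacobian.
B_new = B + ((d - B @ D) @ d.T @ B) / (d.T @ B @ D)

print(np.allclose(B_new, np.linalg.inv(A_new)))   # prints True
```

This also explains the first row of the comparison table: method 2 avoids the linear solve because it carries the inverse along directly.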

**Remark** (Comparison between Broyden's method 1 and 2)

| | Broyden's method 1 | Broyden's method 2 |
|---|---|---|
| Linear system  | must be solved to move forward | no need to solve a linear system |
| Approximate Jacobian | Available | Not available (only approximation of inverse is available) |

- It may seem that obtaining an approximate Jacobian is unnecessary as long as the algorithm gives us the solution. However, according to Sauer (2017) Numerical Analysis 3rd ed. (p. 141), some applications need the approximate Jacobian as well as the solution.



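```python
import numpy as np

def broyden(F, x0, B0=None, tol=1e-6, maxiter=100):
    """
    Solve F(x) = 0 using Broyden's method 2 (update of the approximate inverse Jacobian).

    Shapes
        Input: row vectors
        Internal computation: convert to column vectors
        Output: convert back to row vectors
    """
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    # Default to the identity; copy B0 so the in-place update leaves the caller's array untouched.
    B = np.eye(len(x)) if B0 is None else np.array(B0, dtype=float)
    Fx = F(x).reshape(-1, 1)
    for i in range(maxiter):
        x_new = x - B @ Fx                             # x_{i+1} = x_i - B_i F(x_i)
        d = x_new - x                                  # delta_{i+1}
        Fx_new = F(x_new).reshape(-1, 1)
        D = Fx_new - Fx                                # Delta_{i+1}
        B += ((d - B @ D) @ d.T @ B) / (d.T @ B @ D)   # update approximate inverse Jacobian
        x, Fx = x_new, Fx_new
        if np.linalg.norm(Fx) < tol:
            break
    return x.reshape(-1,), i + 1
```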

```python
def F(x):
    """
    A vector field from 2D to 2D.

    Shapes
        Input: any 1D array of length 2 or 2D array with shape (2, 1)
        Output: row vector of shape (2,)
    """
    if x.ndim == 1:
        x = x.reshape(-1, 1)

    u = x[0, 0]
    v = x[1, 0]

    f1 = 6*u**3 + u*v - 3*v**2 - 4
    f2 = u**2 - 18*u*v**2 + 16*v**3 + 1

    return np.array([f1, f2])

x0 = np.array([1.5, 1.5])

x_exact = np.array([1., 1.])

x, N = broyden(F, x0)

print("N =", N)
print("x =", x)
print("F(x) =", F(x))
print("||F(x)|| =", np.linalg.norm(F(x)))
print("||x - x_exact|| =", np.linalg.norm(x - x_exact))
```

```
N = 53
x = [0.99999998 0.99999998]
F(x) = [-3.47756147e-07  1.54001867e-07]
||F(x)|| = 3.8033000524341814e-07
||x - x_exact|| = 2.9093487026234643e-08
```
