### Root Finding

#### Take-aways

After studying this chapter, we will be able to

- state the main problem of interest,
- explain some standard root finding methods, 
  - write the methods (pseudo-algorithm): bisection, Newton's method, secant method, and fixed point iteration
  - explain their mathematical and computational pros and cons, 
- explain why they work or related facts at an intuitive level,
  - intuition behind the four methods,
- give theoretical arguments about important facts,
  - derivation of Newton's method,
  - contraction mapping theorem,
  - convergence of fixed point iteration,
- give precise results on the four methods and related facts with the help of references

#### Overview

##### Problem of interest

> ***Problem of interest***
>
> Given a function $f:\mathbb{R} \to \mathbb{R}$, find $\xi\in\mathbb{R}$ such that
> $$f(\xi)=0.$$

##### Methods

1. Bisection method
1. Newton's method
1. Secant method
1. Fixed point iteration

#### Bisection method

##### Method

![Bisection illustration](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Bisection_method.svg/1024px-Bisection_method.svg.png)

Figure: Wikipedia


> **Algorithm** (Bisection method)
> 
> - Get an initial interval $[a, b]$ with a sign-change: $f(a) f(b) < 0$.
> 
> - Choose $N$, maximum number of iterations.
> 
> - for i from 1 to N:
> <br>$\quad$ $\displaystyle c \leftarrow \frac{a + b}{2}$
> <br>$\quad$ if $f(a) f(c) \le 0$ then:
> <br>$\quad$ $\quad$ $b \leftarrow c$
> <br>$\quad$ else:
> <br>$\quad$ $\quad$ $a \leftarrow c$
> <br>$\quad$ end if
> <br>end for
> 
> - The approximate root is the final value of $c$.
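
The pseudo-algorithm above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `bisection`, the parameter `n_iter` (assumed to be at least 1), and the test function $f(x)=x^2-2$ on $[1,2]$ are choices made here for the example.

```python
import math

def bisection(f, a, b, n_iter=50):
    """Approximate a root of f in [a, b] by repeatedly halving the interval."""
    # The method needs a sign change on the initial interval.
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(n_iter):
        c = (a + b) / 2
        if f(a) * f(c) <= 0:
            b = c  # a sign change (or a zero) lies in [a, c]
        else:
            a = c  # otherwise the sign change lies in [c, b]
    return c

# Illustrative use: approximate sqrt(2) as the root of x^2 - 2 on [1, 2].
root = bisection(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, abs(root - math.sqrt(2.0)))
```

Each pass halves the bracketing interval, so the number of iterations needed for a given accuracy can be chosen in advance from the length of $[a, b]$.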

##### Summary

- If convergent, the bisection method converges to the solution linearly.
> **Definition** (Linear convergence) 
> 
> Let $\{x_n\}_{n\in\mathbb{N}_0}$ be a sequence that converges to $\xi$. We say that it converges *linearly* if there exists $\lambda\in(0,1)$ such that
> $$ e_{n+1} = \lambda e_n, $$
> where $e_n:=|x_n - \xi|$ for $n=0, 1, 2,\cdots$. In words, it means *each error is a fixed fraction of the previous one*.

> **Remark** (flexibility in definitions of asymptotic behaviors)
>
> The above definition is too strong (i.e., hard to satisfy) to be useful. 
> 
> 1. For example, if the following is the case,
> 
> $$
> e_n= 1, 1/2, \mathbf{1/3}, 1/4, 1/8, 1/16, \cdots,
> $$
> 
> then the single term $1/3$ breaks the definition. But it is more reasonable to still regard this error as decaying linearly.
> Therefore, we usually require the condition $e_{n+1}=\lambda e_n$ except possibly for a finite number of exceptions: "there exists $N\in \mathbb{N}$ such that, for all $n\ge N$, we have $e_{n+1}=\lambda e_n$."
> 
> 2. Also, if 
> 
> $$
> e_n= 1, 1, 1/2, 1/3, 1/4, 1/9, 1/8, 1/27, \cdots,
> $$
> 
> the behavior is similar to linear convergence, but not quite: the pattern differs between odd- and even-numbered terms. Thus, in a more rigorous context, where word-by-word translations are fundamental to communication between people from broad backgrounds, we usually prefer to use inequalities: "there exists $N\in \mathbb{N}$ such that, for all $n\ge N$, we have $e_{n+2}\le \lambda e_n$." In the current example, the statement is true if we set $\lambda=1/2$.
> 
> However, in this course, we will not pursue that level of rigor for convergence rates. We will use a somewhat flexible description unless there is a specific reason to be more precise.
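
The two example sequences in the remark can be checked by computing their ratios directly. The sketch below uses exact fractions; the helper name `ratios` and its `step` parameter are introduced here only for illustration.

```python
from fractions import Fraction as F

def ratios(errors, step=1):
    """Return e[n + step] / e[n] for every n where both terms exist."""
    return [errors[n + step] / errors[n] for n in range(len(errors) - step)]

# First example: the single term 1/3 disturbs the constant ratio 1/2.
first = [F(1), F(1, 2), F(1, 3), F(1, 4), F(1, 8), F(1, 16)]
# Second example: odd- and even-numbered terms follow different patterns.
second = [F(1), F(1), F(1, 2), F(1, 3), F(1, 4), F(1, 9), F(1, 8), F(1, 27)]

r1 = ratios(first)            # consecutive ratios e_{n+1} / e_n
r2 = ratios(second, step=2)   # two-step ratios e_{n+2} / e_n

print([str(r) for r in r1])
print([str(r) for r in r2])
print(all(r <= F(1, 2) for r in r2))
```

In the first list the ratio settles at $1/2$ once the term $1/3$ has passed; in the second, every two-step ratio is at most $1/2$, which is the inequality form of the condition with $\lambda = 1/2$.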


##### Analysis

![Convergence of bisection](https://jhparkyb.github.io/resources/notes/na/104ASlides_RootFinding011.png)

Proof: See Kincaid and Cheney (2002) p. 79.

#### Newton's method

##### Method

**Terminology**

It is also called the *Newton-Raphson* method.


**Geometric intuition**

[Newton's method: Geogebra interactive module](https://www.geogebra.org/m/n6KXp4hE)

Creator: Lenore Horner

[Newton's method: Still illustration](https://math24.net/images/newtons-method1.svg)

Figure: https://math24.net/newtons-method.html

> **Algorithm** (Newton's method)
>
> Given a differentiable function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $n\ge 0$,
>
> $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. $$

**Derivation of Newton's method using tangent line intuition**

See Board work.
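
The board work is not reproduced in these notes. As a sketch of the standard tangent-line argument (which may differ in detail from what was presented), take the tangent line to the graph of $f$ at the point $(x_n, f(x_n))$:

$$ y = f(x_n) + f'(x_n)\,(x - x_n). $$

Setting $y=0$ and solving for $x$, assuming $f'(x_n)\neq 0$, gives the point where the tangent line crosses the $x$-axis, which is taken as the next iterate:

$$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. $$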

> ***Remark***
>
> There are different styles of algorithm or pseudo-algorithm.
>
| Mathematics- or idea-oriented pseudo-algorithm | Coding-oriented pseudo-algorithm |
|---|---|
|Given an initial guess $x_0$, <br> compute <br> $ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$ for $n\ge 0$. | Input (or Data): $x_0$, $f$, $f'$ <br> Set: $Tol>0$, $x_{pre} \gets x_0$, $x \gets x_0 - \frac{f(x_0)}{f'(x_0)}$ <br> While $\|x - x_{pre}\| > Tol$: <br> $ \quad \quad x_{pre} \gets x $ <br> $\quad \quad x \gets x - \frac{f(x)}{f'(x)}$ |
| Focus on the essence | Also consider some details in implementation. In particular, this usually includes *stopping criteria*. |  

##### Summary

> **Theorem** (Local, quadratic convergence of Newton's method)
> 
> Let $f$ be twice continuously differentiable and $f(\xi)=0$. If $f'(\xi) \neq 0$, then Newton's method is locally and quadratically convergent to $\xi$. That is, the method converges to the zero $\xi$ if the initial guess $x_0$ is sufficiently close to $\xi$. 

> **Definition** (Quadratic convergence) 
> 
> Let $\{x_n\}_{n\in\mathbb{N}_0}$ be a sequence that converges to $\xi$. We say that it converges *quadratically* if there exists $C>0$ such that
> $$ \lim_{n\to\infty} \frac{e_{n+1}}{e_n^2} = C, $$
> where $e_n:=|x_n - \xi|$ for $n=0, 1, 2,\cdots$. In words, it means *each error is roughly proportional to the square of the previous error*.
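
To see the definition at work, the sketch below runs a few Newton steps on $f(x)=x^2-2$ and prints the ratios $e_{n+1}/e_n^2$. The test function, the starting point $x_0=2$, and the number of steps are choices made for this illustration. For a simple root, the limit $C$ is commonly given as $|f''(\xi)/(2f'(\xi))|$; that formula is quoted here as a standard fact rather than derived in these notes.

```python
import math

f = lambda x: x * x - 2.0
fp = lambda x: 2.0 * x
xi = math.sqrt(2.0)            # exact zero, used only to measure errors

x = 2.0
errors = [abs(x - xi)]
for _ in range(4):             # a few steps; more would hit rounding error
    x = x - f(x) / fp(x)
    errors.append(abs(x - xi))

# Ratios e_{n+1} / e_n^2 from the definition of quadratic convergence.
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(4)]
for n, r in enumerate(ratios):
    print(n, errors[n], r)
print("f''(xi) / (2 f'(xi)) =", 1.0 / (2.0 * xi))
```

The ratios approach a constant instead of shrinking to zero or blowing up, which is what the limit in the definition expresses.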

> **Question** (How fast is a quadratic convergence?)
>
> Suppose Newton's method starts to exhibit quadratic convergence from the 5th iteration with $e_5 = 0.01$. What will be the error after four more iterations? For simplicity, assume $C=1$.
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Type your answer in clicker.
> 4. Feel free to say out loud.

##### Analysis

**Derivation of Newton's method using Taylor theorem**

![Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/104ASlides_RootFinding014.png)

[Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/der_NewtonMethodTaylor_lp2000.png)
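
In case the linked images are unavailable, the usual Taylor-theorem argument can be sketched as follows (the notation $\eta_n$ is introduced here, and the linked derivation may be organized differently). Assuming $f$ is twice continuously differentiable, expand $f$ about the current iterate $x_n$ and evaluate at the zero $\xi$:

$$ 0 = f(\xi) = f(x_n) + f'(x_n)(\xi - x_n) + \frac{f''(\eta_n)}{2}(\xi - x_n)^2 $$

for some $\eta_n$ between $x_n$ and $\xi$. If $x_n$ is close to $\xi$, the last term is small compared with the others. Dropping it and solving for $\xi$, assuming $f'(x_n)\neq 0$, gives an approximation of $\xi$ that is taken as the next iterate:

$$ \xi \approx x_n - \frac{f(x_n)}{f'(x_n)} =: x_{n+1}. $$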



In favor of more computational activities, we skip the proof of the quadratic convergence of Newton's method. But since Newton's method remains highly relevant today and has inspired many other methods, we include some of its history.

> **Historical note**
>
> 1. Babylonians (1894 BC - 539 BC) used the method to approximate square roots: $\sqrt{2}$ accurately up to seven places. (Ref: [2, 3])
>       ![Babylonian clay tablet](https://projectlovelace.net/static_prod/img/YBC7289.jpg)
>
>       Figure: Project Lovelace
> 1. In 1669, the method was employed by Newton for the cubic equation $x^3 -2x-5 = 0$. (Ref: [1])
> 1. In 1690, Raphson described the method for a general cubic equation $x^3 - bx = c$. (Ref: [1])
> 1. In 1818, Fourier proved the quadratic convergence of the method. (Ref: [1])
> 1. In 1829, Cauchy proved a convergence theorem which does not assume the existence of a solution. (existence of a solution is a consequence; but it assumes some other conditions on the iterates) (Ref: [1])
> 1. In 1939, Kantorovich proved a convergence theorem in a very general setting. (Ref: [1])
> 1. In 1948, Kantorovich proved an improved version, which is now called Kantorovich's theorem or the Newton-Kantorovich theorem: existence of a solution is not assumed and the convergence is quadratic in a very general setting. (Ref: [1])
> 
> Reference
> 
> [1] Brezinski (2001) Numerical Analysis: Historical Developments in the 20th Century. p. 242
> 
> [2] Sauer (2017) Numerical Analysis p. 41
> 
> [3] Wikipedia (Babylonia) 




```python
import numpy as np

def newton(f, fp, ini, tol=1e-8, max_iter=20):
    """
    Return an approximate root of a function using Newton's method.

    INPUT
        f: function whose zero is sought.
        fp: derivative of f (name from 'f prime')
        ini: initial guess
        tol: tolerance for the stopping criterion. If consecutive iterates differ by less than this, the iteration is considered convergent.
        max_iter: maximum number of iterations
    OUTPUT
        approximate zero and the number of iterations. When the maximum number of iterations is reached without meeting the tolerance, the last iterate is returned with a warning message.
    """
    x = ini
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        x_pre = x
        x = x - f(x)/fp(x)

        # stop as soon as consecutive iterates agree to within tol
        if np.abs(x - x_pre) < tol:
            break
    else:
        # the loop ended without a break: the tolerance was never met
        print("   Warning (newton): maximum number of iterations reached.\n     --> The output may not be close enough to the zero.")
    return x, n_iter

# find the square root of 2 as the zero of x^2 - 2
f = lambda x: x*x - 2.
fp = lambda x: 2*x

x0 = 10.

appr, n_iter = newton(f, fp, x0, max_iter=7)
sol = np.sqrt(2.)

print("Newton's method : ", appr, f"   ({n_iter} iterations taken)")
print("True solution   : ", sol)
print("Error           : ", appr - sol)
```

```
   Warning (newton): maximum number of iterations reached.
     --> The output may not be close enough to the zero.
Newton's method :  1.4142135623730954    (7 iterations taken)
True solution   :  1.4142135623730951
Error           :  2.220446049250313e-16
```



#### Secant method

##### Method

**Idea**: Replace $f'(x_n)$ with something similar in Newton's method.

![Secant method](https://mathworld.wolfram.com/images/eps-svg/SecantMethod_800.svg)

Figure: Wolfram MathWorld.

> ***Algorithm*** (Secant method)
>
> Given $x_0, x_1\in\mathbb{R}$, compute, for $n\ge 1$,
>
> $$ x_{n+1}=x_{n}-f\left(x_{n}\right)\frac{\left(x_{n}-x_{n-1}\right)}{f\left(x_{n}\right)-f\left(x_{n-1}\right)} $$
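
The update formula above can be sketched in Python as follows. The function name `secant`, the stopping rule based on `tol`, and the cap `max_iter` are assumptions made for this illustration; the formula itself only prescribes the update.

```python
import math

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Return (approximate root, number of iterations) via the secant method."""
    f0, f1 = f(x0), f(x1)
    for n in range(1, max_iter + 1):
        if f1 == f0:
            # The secant line is horizontal, so the update is undefined.
            raise ZeroDivisionError("f(x_n) equals f(x_{n-1})")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2, n
        # Shift the pair; only one new function evaluation per iteration.
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1, max_iter

# Illustrative use: sqrt(2) as the root of x^2 - 2, starting from 1 and 2.
root, n_iter = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, n_iter, abs(root - math.sqrt(2.0)))
```

Storing `f0` and `f1` avoids re-evaluating $f$ at the previous iterate, which is what makes the method cost one function evaluation per step.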

##### Summary

- If the secant method converges, its rate of convergence is the *golden ratio* ($\approx 1.618$).
- The user must supply **two initial guesses**.
- It requires **only function evaluations**, not derivatives. 

##### Analysis

In favor of more computational activities, we skip the proof of the *superlinear* convergence (i.e., a convergence rate that is faster than linear: $e_{k+1} \approx C e_k^\alpha$ with $\alpha>1$) of the secant method.  

#### Fixed point iteration

##### Method

**Terminology**

It is also called *Picard iteration* or *functional iteration*.

**Geometric interpretation**

"A picture paints a thousand words." 

![Fixed point iteration](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Cosine_fixed_point.svg/1920px-Cosine_fixed_point.svg.png)

Figure: Wikipedia

[Fixed point iteration](https://www.geogebra.org/m/qUbg7Z6W) (Geogebra construction due to stuart.cork)



> **Algorithm** (Fixed point iteration - general)
>
> Given a function $g:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $n\ge 0$,
>
> $$ x_{n+1} = g(x_n). $$

> **Algorithm** (Fixed point iteration - root finding for $f$)
>
> Given a function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, set $g(x)=x+f(x)$, compute, for $n\ge 0$, 
>
> $$ x_{n+1} = g(x_n). $$
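
A minimal Python sketch of the iteration is given below. The function name `fixed_point`, the stopping rule, and the example $f(x)=\cos x - x$ (for which $x+f(x)=\cos x$, matching the cosine figure above) are choices made here for illustration.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=200):
    """Iterate x <- g(x) until consecutive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    return x, max_iter  # tolerance not met within max_iter steps

# Root finding for f via the fixed point problem x = x + f(x).
f = lambda x: math.cos(x) - x
g = lambda x: x + f(x)   # equals cos(x) up to rounding

xi, n_iter = fixed_point(g, 1.0)
print(xi, n_iter, f(xi))
```

The iteration needs many more steps than Newton's method to reach a comparable accuracy, in line with its linear convergence.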

##### Summary

- If repeated application of a function $g$ converges to $\xi$, then $\xi$ solves $x=g(x)$. (Some condition on $g$ is needed: see Analysis below.)
- If it converges, the fixed point iteration converges *linearly*: there exists $\lambda\in(0,1)$ such that $e_{k+1}\approx \lambda e_k$. 
- If you want to solve the equation $f(x)=0$, set $g(x):=x+f(x)$ and apply the fixed point iteration to $g$. Then, the fixed point $\xi$ satisfies 
    $$\xi = g(\xi)=\xi+f(\xi) \quad \text{implies} \quad f(\xi)=0.$$

##### Analysis

> **Definition** (Fixed point)
> $x$ is called a *fixed point* of the function $g$ if $g(x)=x$.


> **Definition** (Contractive/Contraction mapping)
> A function $g:D \to \mathbb{R}$ is called *contractive* or a *contractive mapping/contraction* if there is $\lambda\in[0,1)$ such that $|g(x)-g(y)|\le \lambda|x-y|$ for all $x,y\in D$.

> **Theorem** (Contraction mapping is continuous)
> If $g:D\to \mathbb{R}$ is contractive, it is continuous.

> **Theorem** (Absolute convergence implies convergence)
> If $\sum_{n=1}^\infty x_n$ is absolutely convergent, i.e., $\sum_{n=1}^\infty |x_n| < \infty$, then $\sum_{n=1}^\infty x_n$ also converges.

> **Theorem** (Contraction Mapping Theorem)
> Let $D$ be a closed subset of $\mathbb{R}$. If $g:D \to D$ is a contraction, then it has a unique fixed point. Moreover, this fixed point is the limit of the functional iteration starting with any initial guess.

[Proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm1_lp3000.png)

[Proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm2_lp3001.png)

Proof outline

1. $x_n= (x_n - x_{n-1}) + (x_{n-1} - x_{n-2}) + \cdots + (x_{1} - x_{0}) + x_0$ absolutely converges, hence converges.
   - $|x_n - x_{n-1}| \le \lambda^{n-1} |x_1 - x_0|$
2. Pass $x_{n+1}=g(x_n)$ to the limit $n\to \infty$.
3. Uniqueness
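
The bound in step 1 can be checked numerically on a concrete contraction. The sketch below assumes $g=\cos$ on $D=[0,1]$: $\cos$ maps $[0,1]$ into itself, and $|g'(x)|=|\sin x|\le \sin 1<1$ there, so $\lambda=\sin 1$ is one admissible contraction constant. The example and the variable names are chosen here for illustration.

```python
import math

g = math.cos
lam = math.sin(1.0)      # bound on |g'(x)| = |sin x| over D = [0, 1]

xs = [1.0]               # x_0 = 1, then x_{n+1} = g(x_n)
for _ in range(30):
    xs.append(g(xs[-1]))

d1 = abs(xs[1] - xs[0])

# Step 1 of the outline: |x_n - x_{n-1}| <= lam^(n-1) |x_1 - x_0|.
bound_holds = all(
    abs(xs[n] - xs[n - 1]) <= lam ** (n - 1) * d1
    for n in range(1, len(xs))
)

# The increments are therefore dominated by a geometric series.
partial_sum = sum(abs(xs[n] - xs[n - 1]) for n in range(1, len(xs)))
geometric_bound = d1 / (1.0 - lam)

print(bound_holds, partial_sum, geometric_bound)
```

Because the partial sums of $|x_n - x_{n-1}|$ stay below the geometric bound, the telescoping series converges absolutely, which is the content of step 1.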



> **Question** 
>
> The above proof outline did not use one condition and used another condition implicitly. What are they? 
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Type your answer in clicker.
> 4. Feel free to say out loud.

(Homework questions will ask you what happens if you ignore them.)

#### Comparisons of root-finding methods

| | Bisection | Newton | Secant | Fixed point |
|---|---|---|---|---|
| need $f(x)$ | O | O | O | O |
| need $f'(x)$ | - | O | - | - |
| rate of convergence | 1 | 2 | 1.618 | 1 |
| rate of convergence <br> per two function eval's | 1 <br> (with smaller contraction constant) | 2 | $1.618^2\approx 2.618$ | 1 <br> (with smaller contraction constant) |
| global convergence | yes <br> if $f(a)f(b)<0$ | no | no | practically no |
| solution boxed | yes | no | no | generally, no |
| generalization <br> to high dimensions <br> (intellectual effort) | awkward | yes, <br> but gradient may be unavailable  | yes, <br> but not very trivial <br> (called quasi-Newton methods)| yes |
| generalization <br> to high dimensions <br> (numerical aspects) | N/A | demanding | depends | depends |


---
This work is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)
Part of the content of this notebook is borrowed from [Elementary Numerical Analysis (with Python)](https://lemesurierb.people.cofc.edu/elementary-numerical-analysis-python/preface.html) written by Brenton LeMesurier, College of Charleston and University of Northern Colorado. Thanks to Dr. LeMesurier for sharing excellent notes.

<!-- 
[proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding015.png)

[proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding016.png)
-->