### Root Finding

#### Take-aways (Chapter)

After studying this chapter, we will be able to

- say what is the main problem of interest,
- explain some standard root finding methods, 
  - write the methods (pseudo-algorithm): bisection, Newton's method, secant method, and fixed point iteration
  - explain their mathematical and computational pros and cons, 
- explain why they work or related facts at an intuitive level,
  - intuition behind the four methods,
- give theoretical arguments about important facts,
  - derivation of Newton's method,
  - contraction mapping theorem,
  - convergence of fixed point iteration,
- give precise results on the four methods and related facts with the help of references,
- write a program that solves an equation,
  - write code that implements at least two of the main root finding methods,
  - report computational results that highlight some important aspects of the methods or problem.

#### Overview


##### Problem of interest

> ***Problem of interest***
>
> Given a function $f:\mathbb{R} \to \mathbb{R}$, find $\xi\in\mathbb{R}$ such that
> $$f(\xi)=0.$$

##### Methods

1. Bisection method
1. Newton's method
1. Secant method
1. Fixed point iteration

##### Why do we care about root finding?

1. mathematical problem 
   1. Polynomials of degree 5 or higher have no general solution formula in radicals. (Abel and Galois)
   2. Even for polynomials of degree 3 or 4, approximate zeros are often all we need, and the exact formulas are complicated.
   3. Transcendental equations.
2. Many other applications end up resulting in equations to solve.
   1. $x-\tan(x)=0$ (diffraction of light)
   2. $ x -a \sin(x) = b$, where $a,b$ take various values (planetary orbits)
   3. Finding solutions to differential equations (ODE and PDE) results in a system of algebraic equations. 

#### Bisection method

##### Method

![Bisection illustration](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Bisection_method.svg/1024px-Bisection_method.svg.png)

Figure: Wikipedia


> **Algorithm** (Bisection method)
> 
> **Data**
> - $f$: function
> - $[a, b]$: initial interval with a sign-change: $f(a) f(b) < 0$
> 
> **Initialize**
> - TOL: error tolerance
>
> **Main computation**
> 
> - **while** $(b-a)/2 > $ TOL 
>   - $c \leftarrow \frac{a + b}{2}$
>   - if $f(c) = 0$, **stop**, **end**
>   - if $f(a) f(c) < 0$ then:
>       - $b \leftarrow c$
>   - else:
>       - $a \leftarrow c$
>
> **Result**
>
> - The final interval $[a,b]$ contains a root.
> - The approximate root is the final value of $c$.
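
A minimal Python sketch of the bisection pseudo-algorithm above. The function name `bisection`, the default `tol`, and the `max_iter` safeguard are assumptions made for illustration and are not part of the algorithm as stated.

```python
def bisection(f, a, b, tol=1e-8, max_iter=200):
    """Approximate a root of f in [a, b] by bisection (illustrative helper)."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("need a sign change: f(a) * f(b) < 0")
    c = (a + b) / 2
    for _ in range(max_iter):          # safeguard against an endless loop
        if (b - a) / 2 <= tol:         # same stopping test as the while-loop
            break
        c = (a + b) / 2                # midpoint of the current interval
        fc = f(c)
        if fc == 0:                    # landed exactly on a root
            break
        if fa * fc < 0:                # sign change in [a, c]
            b = c
        else:                          # sign change in [c, b]
            a, fa = c, fc
    return c


# Example: the positive root of x^2 - 2 on [1, 2]
root = bisection(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```

The test function $x^2-2$ on $[1,2]$ is only an example; any continuous $f$ with a sign change on $[a,b]$ could be used.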

##### Summary

- The bisection method converges to the solution linearly.

> **Definition** (Linear convergence) 
> 
> Let $\{x_n\}_{n\in\mathbb{N}_0}$ be a sequence that converges to $\xi$. We say that it converges *linearly* if there exists $\lambda\in(0,1)$ such that
> $$ e_{n+1} = \lambda e_n, $$
> where $e_n:=|x_n - \xi|$ for $n=0, 1, 2,\cdots$. In words, it means *each error is a fixed fraction of the previous one*.

> **Theorem** (Linear convergence of bisection method)
>
> Suppose the bisection method is applied to solve an equation $f(x)=0$, where $f:[a,b]\to{\mathbb{R} }$ is a continuous function and satisfies $f(a)f(b) < 0$. Let $[a_0, b_0]=[a,b], [a_1, b_1], [a_2, b_2], \cdots$ be the intervals generated by the method and let $c_n=(a_n+b_n)/2$ be the midpoint of $[a_n,b_n]$. Then $\lim_{n\to\infty} a_n=\lim_{n\to\infty} b_n = \lim_{n\to\infty} c_n=\xi$, where $\xi\in[a,b]$ satisfies $f(\xi)=0$. Furthermore, the error satisfies
> 
> $$
> |c_n - \xi| \le 2^{-(n+1)}(b-a)
> $$
>
> Proof: See Kincaid and Cheney (2002) p. 79.
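
As a worked example of how the bound can be used (the numbers $b-a=1$ and $\text{TOL}=10^{-6}$ are chosen here only for illustration): the error is guaranteed to be at most TOL as soon as

$$
2^{-(n+1)}(b-a) \le \text{TOL} \quad\Longleftrightarrow\quad n \ge \log_2\left(\frac{b-a}{\text{TOL}}\right) - 1 .
$$

With $b-a=1$ and $\text{TOL}=10^{-6}$, we have $\log_2(10^6)\approx 19.93$, so $n=19$ suffices: $2^{-20}\approx 9.54\times 10^{-7}\le 10^{-6}$, whereas $2^{-19}\approx 1.91\times 10^{-6}$ is still too large. Each step improves the bound by one binary digit.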

> **Remark** (flexibility in definitions of convergence rate)
>
> The above definition is too strong (i.e., hard to satisfy) to be useful. 
> 
> 1. For example, if the following is the case,
> 
> $$
> e_n= 1, 1/2, \mathbf{1/3}, 1/4, 1/8, 1/16, \cdots,
> $$
> 
> then a single number $1/3$ messes up the definition. But it is more reasonable to still regard this error as decaying linearly.
> Therefore, we usually require the condition $e_{n+1}=\lambda e_n$ to hold with at most finitely many exceptions. This is why we often see the following statement in a more rigorous context: "there exists $N\in \mathbb{N}$ such that, for all $n\ge N$, we have $e_{n+1}=\lambda e_n$."
> 
> 2. Also, if 
> 
> $$
> e_n= 1, 1, 1/2, 1/3, 1/4, 1/9, 1/8, 1/27, \cdots,
> $$
> 
> the behavior is similar to linear convergence, but not quite: the pattern differs between odd- and even-numbered terms. Thus, in a more rigorous context, where precise, literal wording is essential for communication between people from broad backgrounds, we usually prefer to use inequalities: "there exists $N\in \mathbb{N}$ such that, for all $n\ge N$, we have $e_{n+2}\le \lambda e_n$." In the current example, the statement is true if we set $\lambda=1/2$.
> 
> In this course, we will pay more attention to the idea rather than trying to be very accurate about the statements of convergence rates.


##### Analysis

- We skip the proof of the convergence of the bisection method (a) because it is intuitively evident, and (b) to make room for more hands-on computations. 
- However, the proof is a great exercise in what we have learned from real analysis. I encourage you to try it, and I welcome any questions.

#### Newton's method

##### Take-aways (lecture)



After studying this chapter, we will be able to

- start by clarifying the problem of interest,
- explain some standard root finding methods, 
  - write the methods (pseudo-algorithm): Newton's method (and maybe secant method)
  - explain their mathematical and computational pros and cons, 
- explain why they work or related facts at an intuitive level,
  - intuition behind Newton's method,
    - geometric
    - Taylor theorem
- give theoretical arguments about important facts,
  - derivation of Newton's method,
    - geometric
    - Taylor theorem
  - describe quadratic convergence with concrete numbers,
- give precise convergence results on Newton's method and related facts with the help of references,
- write a program that solves an equation,
  - write code that implements Newton's method,
  - report computational results that highlight some important aspects of the methods or problem.

##### First step



> ***Problem of interest***
>
> Given a function $f:\mathbb{R} \to \mathbb{R}$, find $\xi\in\mathbb{R}$ such that
> $$f(\xi)=0.$$

##### Method



**Terminology**

It is also called the *Newton-Raphson* method.


**Geometric intuition**

[Newton's method: Geogebra interactive module](https://www.geogebra.org/m/n6KXp4hE)

Creator: Lenore Horner

[Newton's method: Still illustration](https://math24.net/images/newtons-method1.svg)

Figure: https://math24.net/newtons-method.html

**Derivation of Newton's method using tangent line intuition**

(Tell the teacher what you want to try.)

See Board work.

##### Algorithm

> **Algorithm** (Newton's method)
>
> Given a differentiable function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $n\ge 0$,
>
> $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. $$

> **Question** (Newton's method)
>
> Write down Newton's method on a piece of paper.
>  
> - Repeat it until you get the method precisely. 
> - Consult notes/books only after finishing a trial.
> - Feel free to have a conversation with your peers. 
> - Share the clues/tricks/mnemonic device, etc.
>
> (Reminder) This is **about atmosphere and process**, not getting it right at once.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Feel free to say out loud.

> ***Remark***
>
> There are different styles of algorithm or pseudo-algorithm.
>
| Mathematics- or idea-oriented pseudo-algorithm | Coding-oriented pseudo-algorithm |
|---|---|
|Given an initial guess $x_0$, <br> compute <br> $ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$ for $n\ge 0$. | Input (or Data): $x_0$, $f$, $f'$ <br> Set: $Tol>0$, $x \gets x_0$ <br> While $\|x - x_{pre}\| > Tol$: <br> $ \quad \quad x_{pre} \gets x $ <br> $\quad \quad x \gets x - \frac{f(x)}{f'(x)}$ |
| Focus on the essence | Also consider some details in implementation. In particular, this usually includes *stopping criteria*. |  

##### Summary



- Convergence of Newton's method is not guaranteed.
- If it converges, it converges quadratically fast.

> **Theorem** (Local, quadratic convergence of Newton's method)
> 
> Let $f$ be twice continuously differentiable and $f(\xi)=0$. If $f'(\xi) \neq 0$, then Newton's method is locally and quadratically convergent to $\xi$. That is, (local convergence) the method converges to the zero $\xi$ if the initial guess $x_0$ is sufficiently close to $\xi$, and (quadratic convergence) there exists $C>0$ such that
> $$ \lim_{n\to\infty} \frac{e_{n+1}}{e_n^2} = C, $$
> where $e_n:=|x_n - \xi|$ and $x_n$ ($n=0,1,2,\cdots$) is the sequence generated by Newton's method.

> **Definition** (Quadratic convergence) 
> 
> Let $\{x_n\}_{n\in\mathbb{N}_0}$ be a sequence that converges to $\xi$. We say that it converges quadratically fast if there exists $C>0$ such that
> $$ \lim_{n\to\infty} \frac{e_{n+1}}{e_n^2} = C, $$
> where $e_n:=|x_n - \xi|$ for $n=0, 1, 2,\cdots$. In words, it means *each error is roughly a constant times the square of the previous error*.

> **Question** (How fast is a quadratic convergence?)
>
> Suppose Newton's method starts to manifest quadratic convergence from the 5th iteration with $e_5 = 0.01$. What do you guess the error will be after four more iterations? For simplicity, assume $C=1$.
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Type your answer in clicker.
> 4. Feel free to say out loud.

**Remark** (Divergence of Newton's method)

- Newton's method may diverge, even though it converges fast when it does converge.
- If $f'(x_n)=0$ for some $n$, the method breaks down.

![Divergence of Newton's method](https://amsi.org.au/ESA_Senior_Years/imageSenior/2a_numerical_methods_graph_7.png)

Figure: https://amsi.org.au/


##### Analysis



<!-- 
- In favor of more computational activities, we skip the proof the quadratic convergence of Newton's method. 
- Instead, we derive the Newton's method from calculus point of view. It involves a great application of Taylor's theorem. 
- Also, we include some history of the Newton's method in place of its convergence proof. 
- Newton's method is highly relevant even these days and since it inspires many other methods. 
- -->

**Derivation of Newton's method using Taylor theorem**

1. Let $\xi$ be a root, i.e., $f(\xi)=0$. 
2. Expand $f(\xi)$ around the current position, say, $x_n$. 
3. Take the linear approximation, namely, ignore the second order term or higher.
4. Solve for $\xi$, and call it $x_{n+1}$. 

**Remark**

- (In step 1) Pretending to know the solution is often the start of the magic.
- (In step 2) What about the other way around?
- (In step 3) What did we lose and what did we obtain? 

<!-- ![Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/104ASlides_RootFinding014.png) -->

[Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/der_NewtonMethodTaylor_lp2000.png)
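
Written out, the four steps above read as follows. This is a sketch that assumes $f$ is twice continuously differentiable and $f'(x_n)\neq 0$; the intermediate point $\eta_n$ is notation introduced here.

$$
0 = f(\xi) = f(x_n) + f'(x_n)(\xi - x_n) + \frac{f''(\eta_n)}{2}(\xi - x_n)^2
$$

for some $\eta_n$ between $x_n$ and $\xi$ (Taylor's theorem). Ignoring the second order term gives

$$
0 \approx f(x_n) + f'(x_n)(\xi - x_n) \quad\Longrightarrow\quad \xi \approx x_n - \frac{f(x_n)}{f'(x_n)} =: x_{n+1}.
$$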


In favor of more computational activities, we skip the proof of the quadratic convergence of Newton's method. But since Newton's method remains highly relevant today and has inspired many other methods, we include some of its history.

> **Historical note**
>
> 1. Babylonians (1894 BC - 539 BC) used the method to approximate square roots: $\sqrt{2}$ accurately up to seven places. (Ref: [2, 3])
>       ![Babylonian clay tablet](https://projectlovelace.net/static_prod/img/YBC7289.jpg)
>
>       Figure: Project Lovelace
> 1. In 1669, the method was employed by Newton for the cubic equation $x^3 -2x-5 = 0$. (Ref: [1])
> 1. In 1690, Raphson described the method for a general cubic equation $x^3 - bx = c$. (Ref: [1])
> 1. In 1818, Fourier proved the quadratic convergence of the method. (Ref: [1])
> 1. In 1829, Cauchy proved a convergence theorem which does not assume the existence of a solution. (existence of a solution is a consequence; but it assumes some other conditions on the iterates) (Ref: [1])
> 1. In 1939, Kantorovich proved a convergence theorem in a very general setting. (Ref: [1])
> 1. In 1948, Kantorovich proved an improved version, which is now called Kantorovich's theorem or the Newton-Kantorovich theorem: existence of a solution is not assumed and the convergence is quadratic in a very general setting. (Ref: [1])
> 
> Reference
> 
> [1] Brezinski (2001) Numerical Analysis: Historical Developments in the 20th Century. p. 242
> 
> [2] Sauer (2017) Numerical Analysis p. 41
> 
> [3] Wikipedia (Babylonia) 

##### Computational example (Babylonians)



> **Problem** (Computing $\sqrt{2}$)
>
> Write code that computes an approximate value of $\sqrt{2}$ using Newton's method.

(Step 1) Cast the problem as a root finding problem and summarize it. (Intellectual work needed for $f$ and $f'$)

(Step 2) Write a (programming) function that implements Newton's method.

(Step 3) Set up the computation (function, initial guess, etc.) and implement it.

(Step 4) Reorganize the result for specific purposes.
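
For Step 1, one possible choice (among others) is $f(x)=x^2-2$, whose positive zero is $\sqrt{2}$. Then $f'(x)=2x$ and the Newton update becomes

$$
x_{n+1} = x_n - \frac{x_n^2-2}{2x_n} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right),
$$

that is, the average of $x_n$ and $2/x_n$. This averaging rule is often called the Babylonian method.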


**Remark**

- True solution can be obtained from [Wolfram alpha: N[sqrt[2], 20]](https://www.wolframalpha.com/input?i=N%5Bsqrt%5B2%5D%2C+20%5D).
  - `N[sqrt[2], 20]` (numerical value of $\sqrt{2}$ to 20 significant digits) gives us 1.4142135623730950488.
- 20 digits are enough because double-precision floating point arithmetic can only resolve relative differences down to about $2^{-52} \approx 2.22\times 10^{-16}$.
  - This number is called *machine epsilon*.
  - Machine epsilon depends on data type. (See [Wikipedia](https://en.wikipedia.org/wiki/Machine_epsilon) page for details)
  - Wolfram alpha can handle higher precision by using more computing resources than floating point arithmetic.
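
The value of machine epsilon can be checked directly. The sketch below assumes that Python's `float` is an IEEE 754 double, which is the case on common platforms; it uses only the standard library.

```python
import sys

eps = sys.float_info.epsilon      # gap between 1.0 and the next larger float

print(eps)                        # 2.220446049250313e-16
print(eps == 2.0 ** -52)          # True
print(1.0 + eps > 1.0)            # True: the sum is a different float than 1.0
print(1.0 + eps / 2 == 1.0)       # True: the sum rounds back to 1.0
```

This is why the reference value for $\sqrt{2}$ is stored as `1.4142135623730951` even though more digits are typed in.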

In [5]:
import numpy as np

def newton(f, fp, ini, tol=1e-8, max_iter=20):
    """
    Return an approximate root of a function using Newton's method.

    INPUT
        f: function whose zero is sought.
        fp: derivative of f (name from 'f prime')
        ini: initial guess
        tol: tolerance for stopping criterion. If consecutive iterates differ by less than this, the iteration is considered convergent.
        max_iter: maximum number of iterations
    OUTPUT
        approximate zero and the number of iterations performed. When the maximum number of iterations is reached, the last iterate is returned (the warning below is currently disabled).
    """
    x = ini
    for i in range(max_iter):
        x_pre = x
        x = x - f(x)/fp(x)

        if np.abs(x - x_pre) < tol: 
            break
    # Optional warning, kept disabled so that runs with a small max_iter stay quiet:
    # if i == max_iter - 1:
    #     print("   Warning (newton): maximum number of iterations reached.\n     --> The output may not be close enough to the zero.")
    return x, i + 1

# find the square root
f = lambda x: x*x - 2.
fp = lambda x: 2.*x

x0 = 10.
max_iter = 4

appr, n_iter = newton(f, fp, x0, max_iter=max_iter)  # n_iter avoids shadowing the builtin iter
sol = 1.4142135623730950488 # obtained from Wolfram Alpha
err = np.abs(appr - sol)

print("Newton's method : ", appr, f"   ({n_iter} iterations taken)")
print("True solution   : ", sol)
print("Error           : ", err)

Newton's method :  1.444238094866232    (4 iterations taken)
True solution   :  1.4142135623730951
Error           :  0.030024532493136746


In [6]:
import pandas as pd

N = 10
df = pd.DataFrame(columns=[r'\# iterations', 'error'])  # raw string avoids an invalid escape sequence

for i in range(1, N+1):
    appr, n_iter = newton(f, fp, x0, max_iter=i)
    err = np.abs(appr - sol)
    df.loc[i] = [n_iter, err]  # store the absolute error

df

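| max_iter | \# iterations | error |
|---|---|---|
| 1 | 1.0 | 3.685786 |
| 2 | 2.0 | 1.331865 |
| 3 | 3.0 | 0.3229813 |
| 4 | 4.0 | 0.03002453 |
| 5 | 5.0 | 0.0003120928 |
| 6 | 6.0 | 3.442917e-08 |
| 7 | 7.0 | 2.220446e-16 |
| 8 | 8.0 | 0.0 |
| 9 | 8.0 | 0.0 |
| 10 | 8.0 | 0.0 |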
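
To connect these errors with the theorem on quadratic convergence, the following sketch estimates the ratio $e_{n+1}/e_n^2$ for the same problem. It uses only the standard library, and `math.sqrt(2.0)` stands in for the true solution. The limiting value $|f''(\xi)/(2f'(\xi))| = 1/(2\sqrt{2})\approx 0.354$ is the standard constant for Newton's method, stated here as background rather than taken from the notes. Only the first six iterates are used, because later errors are at the level of machine epsilon and the ratio is no longer meaningful.

```python
import math

xi = math.sqrt(2.0)      # reference value of the root
x = 10.0                 # same initial guess as above
errors = []
for _ in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)   # Newton step for f(x) = x^2 - 2
    errors.append(abs(x - xi))

# ratio e_{n+1} / e_n^2 for consecutive iterates
ratios = [e_next / e ** 2 for e, e_next in zip(errors, errors[1:])]
for n, r in enumerate(ratios, start=1):
    print(f"e_{n + 1} / e_{n}^2 = {r:.5f}")

limit = 1.0 / (2.0 * xi)  # |f''(xi) / (2 f'(xi))| for this f
print(f"expected limit   = {limit:.5f}")
```

The printed ratios should settle near the expected limit, which is what "quadratic" means in practice: the number of correct digits roughly doubles per step.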


#### Secant method

**Motivation**

Newton's method is great. But it requires $f'(x)$ as well as $f(x)$.

How can we overcome this?

##### Method

**Idea**: Replace $f'(x_n)$ in Newton's method with something similar.

**Geometric intuition**

![Secant method](https://mathworld.wolfram.com/images/eps-svg/SecantMethod_800.svg)

Figure: Wolfram MathWorld.

[Secant method: Geogebra interactive module](https://www.geogebra.org/m/vpk4geyu)

Author: Marian Choy


> ***Algorithm*** (Secant method)
>
> Given $x_0, x_1\in\mathbb{R}$, compute, for $n\ge 1$,
>
> $$ x_{n+1}=x_{n}-f\left(x_{n}\right)\frac{\left(x_{n}-x_{n-1}\right)}{f\left(x_{n}\right)-f\left(x_{n-1}\right)} $$
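
A minimal Python sketch of the secant update above. The function name `secant`, the stopping rule based on consecutive iterates, and the defaults `tol` and `max_iter` are assumptions for illustration; the algorithm as stated does not prescribe a stopping criterion.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Approximate a root of f from two initial guesses (illustrative helper)."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break                      # horizontal secant line: update undefined
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1                # shift the pair of iterates
        x1, f1 = x2, f(x2)             # one new function evaluation per step
        if abs(x1 - x0) < tol:
            break
    return x1


# Example: the positive root of x^2 - 2 from the guesses 1 and 2
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```

Note that each step reuses the previous function value, so only one new evaluation of $f$ is needed per iteration and no derivative is required.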

##### Summary

- If the secant method converges, its rate of convergence is the *golden ratio* ($\approx 1.618$).
- User must feed **two initial guesses**.
- It requires **only the function evaluation**, but not the derivatives. 

##### Analysis

In favor of more computational activities, we skip the proof of the *superlinear* convergence (i.e., a convergence rate that is faster than linear: $e_{k+1} \approx C e_k^\alpha$ with $\alpha>1$) of the secant method.  

#### Fixed point iteration

##### Method

**Terminology**

It is also called *Picard iteration* or *functional iteration*.

**Geometric interpretation**

"A picture paints a thousand words." 

![Fixed point iteration](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Cosine_fixed_point.svg/1920px-Cosine_fixed_point.svg.png)

Figure: Wikipedia

[Fixed point iteration: Geogebra interactive module](https://www.geogebra.org/m/qUbg7Z6W) 

Author: stuart.cork



> **Algorithm** (Fixed point iteration - general)
>
> Given a function $g:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $n\ge 0$,
>
> $$ x_{n+1} = g(x_n). $$

> **Algorithm** (Fixed point iteration - root finding for $f$)
>
> Given a function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, set $g(x)=x+f(x)$, compute, for $n\ge 0$, 
>
> $$ x_{n+1} = g(x_n). $$
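
A minimal Python sketch of the iteration, applied to $g(x)=\cos x$ as in the figure above. With $f(x)=\cos x - x$, this $g$ is exactly $x+f(x)$. The function name `fixed_point`, the stopping rule, and the defaults are assumptions for illustration.

```python
import math


def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- g(x) until consecutive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, n            # approximate fixed point, iterations used
        x = x_new
    return x, max_iter                 # no convergence detected within max_iter


x_star, n_iter = fixed_point(math.cos, 1.0)
print(x_star, n_iter)
print(abs(x_star - math.cos(x_star)))  # residual of the equation x = cos(x)
```

The iteration converges here, but much more slowly than Newton's method on the earlier example. For other choices of $f$, the map $g(x)=x+f(x)$ need not converge at all.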

##### Summary

- If repeated applications of a function $g$ converge to $\xi$, then $\xi$ solves $x=g(x)$. (Some condition on $g$ is needed: see Analysis below.)
- If it converges, the fixed point iteration converges *linearly*: $e_{k+1}\approx \lambda e_k$ for some $0<\lambda<1$. 
- If you want to solve the equation $f(x)=0$, set $g(x):=x+f(x)$ and apply the fixed point iteration to $g$. Then, the fixed point $\xi$ satisfies 
    $$\xi = g(\xi)=\xi+f(\xi) \quad \text{implies} \quad f(\xi)=0.$$

##### Analysis

> **Definition** (Fixed point)
> $x$ is called a *fixed point* of the function $g$ if $g(x)=x$.


> **Definition** (Contractive/Contraction mapping)
> A function $g:D \to \mathbb{R}$ is called *contractive* or a *contractive mapping/contraction* if there is $\lambda\in[0,1)$ such that $|g(x)-g(y)|\le \lambda|x-y|$ for all $x,y\in D$.

> **Theorem** (Contraction mapping is continuous)
> If $g:D\to \mathbb{R}$ is contractive, it is continuous.

> **Theorem** (Absolute convergence implies convergence)
> If $\sum_{n=1}^\infty x_n$ is absolutely convergent, i.e., $\sum_{n=1}^\infty |x_n| < \infty$, then $\sum_{n=1}^\infty x_n$ also converges.

> **Theorem** (Contraction Mapping Theorem)
> Let $D$ be a closed subset of $\mathbb{R}$. If $g:D \to D$ is a contraction, then it has a unique fixed point. Moreover, this fixed point is the limit of the functional iteration starting with any initial guess.

[Proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm1_lp3000.png)

[Proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm2_lp3001.png)

Proof outline

1. $x_n= (x_n - x_{n-1}) + (x_{n-1} - x_{n-2}) + \cdots + (x_{1} - x_{0}) + x_0$ is a partial sum of an absolutely convergent series, hence $\{x_n\}$ converges.
   - $|x_n - x_{n-1}| \le \lambda^{n-1} |x_1 - x_0|$
2. Pass $x_{n+1}=g(x_n)$ to the limit $n\to \infty$.
3. Uniqueness
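
The bound in step 1 can be checked numerically. The sketch below assumes $g=\cos$ on $D=[0,1]$ with $\lambda=\sin(1)$, which bounds $|g'(x)|=|\sin x|$ on $D$. These choices, and the starting point $x_0=0.5$, are illustrative and not taken from the notes.

```python
import math

g = math.cos
lam = math.sin(1.0)        # contraction constant for cos on D = [0, 1]

xs = [0.5]                 # x_0 in D
for _ in range(30):
    xs.append(g(xs[-1]))   # x_{n+1} = g(x_n)

d1 = abs(xs[1] - xs[0])
# check |x_n - x_{n-1}| <= lam^(n-1) * |x_1 - x_0| for n = 1, ..., 30
# (1e-15 absorbs floating point rounding)
holds = all(
    abs(xs[n] - xs[n - 1]) <= lam ** (n - 1) * d1 + 1e-15
    for n in range(1, len(xs))
)
print(holds)

# the geometric series bounds the total distance travelled by the iterates
print(d1 / (1.0 - lam))
```

Since $\sum_n \lambda^{n-1}|x_1-x_0| = |x_1-x_0|/(1-\lambda)$ is finite, the telescoping sum in step 1 converges absolutely.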



> **Question** 
>
> The above proof outline did not use one condition and used another condition implicitly. What are they? 
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Type your answer in clicker.
> 4. Feel free to say out loud.

(Homework questions will ask you what happens if you ignore them.)

#### Comparisons of root-finding methods

| | Bisection | Newton | Secant | Fixed point |
|---|---|---|---|---|
| need $f(x)$ | O | O | O | O |
| need $f'(x)$ | - | O | - | - |
| rate of convergence | 1 | 2 | 1.618 | 1 |
| rate of convergence <br> per two function eval's | 1 <br> (with smaller contraction constant) | 2 | $1.618^2\approx 2.618$ | 1 <br> (with smaller contraction constant) |
| global convergence | yes <br> if $f(a)f(b)<0$ | no | no | practically no |
| solution boxed | yes | no | no | generally, no |  
| generalization <br> to high dimensions <br> (intellectual effort) | awkward | yes, <br> but gradient may be unavailable  | yes, <br> but not very trivial <br> (called quasi-Newton methods)| yes |
| generalization <br> to high dimensions <br> (numerical aspects) | N/A | demanding | depends | depends |


---
This work is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)
Part of the content of this notebook is borrowed from [Elementary Numerical Analysis (with Python)](https://lemesurierb.people.cofc.edu/elementary-numerical-analysis-python/preface.html) written by Brenton LeMesurier, College of Charleston and University of Northern Colorado. Thanks to Dr. LeMesurier for sharing excellent notes.

<!-- 
[proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding015.png)

[proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding016.png)
-->