## Root Finding

#### Take-aways (Chapter)

After studying this chapter, we will be able to

- say what is the main problem of interest,
- explain some standard root finding methods, 
  - write the methods (pseudo-algorithm): bisection, Newton's method, secant method, and fixed point iteration
  - explain their mathematical and computational pros and cons, 
- explain why they work or related facts at an intuitive level,
  - intuition behind the four methods,
- give theoretical arguments about important facts,
  - derivation of Newton's method,
  - contraction mapping theorem,
  - convergence of fixed point iteration,
- give precise results on the four methods and related facts with the help of references,
- write a program that solves an equation,
  - write code that implements at least two of the main root finding methods,
  - report computational results that highlight some important aspects of the methods or problem.

### Overview


#### Problem of interest

> ***Problem of interest***
>
> Given a function $f:\mathbb{R} \to \mathbb{R}$, find $x\in\mathbb{R}$ such that
> $$f(x)=0.$$

#### Methods

1. Bisection method
1. Newton's method
1. Secant method
1. Fixed point iteration

#### Why do we care about root finding?

1. Mathematical problems
   1. Polynomials of degree 5 or higher have no general solution formula in radicals. (Abel and Galois)
   2. Even for polynomials of degree 3 or 4, we often need only approximate zeros, and their formulas are complicated.
   3. Transcendental equations.
2. Many other applications end up requiring equations to be solved.
   1. $x-\tan(x)=0$ (diffraction of light)
   2. $x - a \sin(x) = b$, where $a,b$ take various values (planetary orbits)
   3. Finding solutions to differential equations (ODE and PDE) results in a system of algebraic equations.




**Remark** (Aside: Cubic and quartic formula)

- In regard to the first reason, even the formulas for cubic (degree 3) or quartic (degree 4) equations are complicated.
- Also, they involve unusual numerical procedures: complex numbers can appear in intermediate steps even when all the roots are real.

Solution for $ax^3+bx^2+cx+d=0$:

![Cubic formula](https://www.curtisbright.com/quartic/cubic.png)

Solution for $ax^4 + bx^3 + cx^2 + dx + e = 0$:

![Quartic formula](https://www.curtisbright.com/quartic/quartic.png)

Figures: https://www.curtisbright.com/quartic/


### Bisection method

#### Method


##### Idea


![Bisection illustration](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Bisection_method.svg/1024px-Bisection_method.svg.png)

Figure: Wikipedia


##### Algorithm

**Algorithm** (Bisection method)

**Data**
- $f$: function
- $[a, b]$: initial interval with a sign-change: $f(a) f(b) < 0$

**Initialize**
- TOL: error tolerance

**Main computation**

- **while** $(b-a)/2 > $ TOL 
  - $c \leftarrow \frac{a + b}{2}$
  - if $f(c) = 0$, **stop**, **end**
  - if $f(a) f(c) < 0$ then:
      - $b \leftarrow c$
  - else:
      - $a \leftarrow c$

**Result**
- The final interval $[a,b]$ contains a root.
- The approximate root is the midpoint $(a+b)/2$ of the final interval.
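
The algorithm above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `bisection`, the default tolerance, and the choice to return the midpoint of the final interval together with an iteration count are assumptions made for illustration.

```python
def bisection(f, a, b, tol=1e-8):
    """Approximate a root of f in [a, b], assuming f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    n_iter = 0
    while (b - a) / 2 > tol:
        c = (a + b) / 2          # midpoint of the current interval
        fc = f(c)
        n_iter += 1
        if fc == 0:              # landed exactly on a root
            return c, n_iter
        if fa * fc < 0:          # sign change in [a, c]
            b, fb = c, fc
        else:                    # sign change in [c, b]
            a, fa = c, fc
    return (a + b) / 2, n_iter   # midpoint of the final interval


root, n_iter = bisection(lambda x: x * x - 2, 1.0, 2.0)
print(root, n_iter)
```

Storing `fa` and `fc` means each pass through the loop costs a single new function evaluation.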

#### Convergence

- The bisection method converges to a solution globally and linearly.



**Definition** (Linear convergence) 

Let $e_i$ denote the error at step $i$ of an iterative method. If

$$ 
\lim_{i\to\infty}\frac{e_{i+1}}{e_i} = S < 1,
$$

the method is said to obey linear convergence with rate $S$.



**Theorem** (Linear convergence of bisection method)

Suppose the bisection method is applied to solve an equation $f(x)=0$, where $f:[a,b]\to{\mathbb{R} }$ is a continuous function and satisfies $f(a)f(b) < 0$. Let $[a_0, b_0]=[a,b], [a_1, b_1], [a_2, b_2], \cdots$ be the intervals generated by the method and let $c_i=(a_i+b_i)/2$ be the midpoint of $[a_i,b_i]$. Then $\lim_{i\to\infty} a_i=\lim_{i\to\infty} b_i = \lim_{i\to\infty} c_i=x$, where $x\in[a,b]$ satisfies $f(x)=0$. Furthermore, the error satisfies

$$
|c_i - x| \le 2^{-(i+1)}(b-a)
$$



Proof: See Kincaid and Cheney (2002) p. 79.
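
As a quick use of the error bound (a worked example with an assumed interval length and tolerance, not taken from the reference): to guarantee $|c_i - x| \le \text{TOL}$ it suffices that

$$
2^{-(i+1)}(b-a) \le \text{TOL}
\quad\Longleftrightarrow\quad
i \ge \log_2\frac{b-a}{\text{TOL}} - 1 .
$$

For $b-a=1$ and $\text{TOL}=10^{-6}$, we have $\log_2 10^6 \approx 19.93$, so $i=19$ suffices; indeed $2^{-20}\approx 9.5\times 10^{-7}$. Each additional step gains one binary digit of accuracy.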

##### Analysis

- We skip the proof of the convergence of the bisection method (a) because it is intuitively evident, and (b) to leave room for more hands-on computations. 
- However, the proof is a great exercise involving what we have learned from real analysis. I encourage you to try it and welcome any questions.

### Newton's method

#### Overview



- Part 1: What we can do
  - Derive Newton's method using Taylor's theorem
  - Compute $\sqrt{2}$
  - Quadratic convergence
- Part 2: What to be careful of
  - Local convergence
  - Slow down $\longrightarrow$ Modified Newton's method

#### Part 1: What we can do


> ***Problem of interest***
>
> Given a function $f:\mathbb{R} \to \mathbb{R}$, find $x\in\mathbb{R}$ such that
> $$f(x)=0.$$

##### Why do we care?


**Newton's method**

- Fast convergence
- Generalizes to more complex settings
- Inspires many other methods

**Root finding**

1. Many applications boil down to solving an equation.
   1. $x-\tan(x)=0$ (diffraction of light)
   2. $x - a \sin(x) = b$, where $a,b$ take various values (planetary orbits)
   3. Finding solutions to differential equations (ODE and PDE) results in a system of algebraic equations. 
2. Mathematical problems
   1. Transcendental equations.
   2. Polynomials of degree 5 or higher have no general solution formula in radicals. (Abel and Galois)
   3. Even for polynomials of degree 3 or 4, we often need only approximate zeros, and their formulas are complicated.




##### Method


> **Algorithm** (Newton's method or Newton-Raphson's method)
>
> Given a differentiable function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $i\ge 0$,
>
> $$ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}. $$



##### Geometric intuition

![Newton's method: Still illustration](https://math24.net/images/newtons-method1.svg)

Figure: https://math24.net/newtons-method.html

##### Derivation


**Remark** (Refresher: Geometric understanding of Taylor series)

![Power series as a function approximation](https://suzyahyah.github.io/assets/Calculus-taylor.png)

[Interactive Geogebra by Guillermo Bautista: $e^x$](https://www.geogebra.org/m/u25naf28)

<!-- [Interactive Geogebra by matheagle: $ln(1 + x^2) + sin(3x)$](https://www.geogebra.org/m/YtnuMjEF)

Geogebra activity suggestion
- Type in a simple function such as $e^x$, $1/(1-x)$, $\sin(x)$, etc.
- Move the slide bar to vary $n$.  -->


In words,

- Partial sums behave more and more similarly to the specific function, in this case $e^x$: 
    - $1$, 
    - $1+x$, 
    - $1+x+\frac{x^2}{2!}$, 
    - $1+x+\frac{x^2}{2!}+\frac{x^3}{3!}$, 
    - $\vdots$
- We can imagine that if we add infinitely many of them, that will behave exactly the same as $e^x$.


**Theorem** (Taylor's theorem)

Let $x$ and $x_0$ be real numbers, and let $f$ be $k + 1$ times continuously differentiable on the interval between $x$ and $x_0$. Then there exists a number $c$ between $x$ and $x_0$ such that

$$
\begin{split}
f(x) &= f(x_0) + f'(x_0)(x-x_0)+ \frac{f''(x_0)}{2!}(x-x_0)^2+\cdots
\\
&\quad +\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k +\frac{f^{(k+1)}(c)}{(k+1)!}(x-x_0)^{k+1}.
\end{split}
$$


**Proof** 

Notation: In the proof, $\xi$ plays the role of $c$.

[proof of Taylor's theorem with Lagrange remainder 1](https://jhparkyb.github.io/resources/notes/na/pf_TaylorThmLag1_lp3000.png)

[proof of Taylor's theorem with Lagrange remainder 2](https://jhparkyb.github.io/resources/notes/na/pf_TaylorThmLag2_lp3001.png)

**Corollary** (Taylor's theorem - linear approximation version; $k=1$)

Let $x$ and $x_0$ be real numbers, and let $f$ be twice continuously differentiable on the interval between $x$ and $x_0$. Then there exists a number $c$ between $x$ and $x_0$ such that

$$
f(x) = f(x_0) + f'(x_0)(x-x_0)+ \frac{f''(c)}{2!}(x-x_0)^2.
$$



**Derivation of Newton's method using Taylor theorem**

1. Let $x$ be a root, i.e., $f(x)=0$. 
2. Expand $f(x)$ around the current position, say, $x_i$. 
3. Take the linear approximation, namely, ignore the second order term or higher.
4. Solve for $x$, and call it $x_{i+1}$. 



[Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/der_NewtonMethodTaylor_lp2000.png)
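
The four steps can be written out as follows. This is a sketch based on the corollary above with $x_0$ replaced by $x_i$; the notation in the linked board work may differ. With $f(x)=0$,

$$
0 = f(x) = f(x_i) + f'(x_i)(x-x_i) + \frac{f''(c)}{2!}(x-x_i)^2 .
$$

Ignoring the second order term and assuming $f'(x_i)\neq 0$,

$$
0 \approx f(x_i) + f'(x_i)(x-x_i)
\quad\Longrightarrow\quad
x \approx x_i - \frac{f(x_i)}{f'(x_i)} .
$$

Calling the right-hand side $x_{i+1}$ gives Newton's method. The discarded term is proportional to $(x-x_i)^2$, which hints at why the error is squared at each step when $x_i$ is already close to the root.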



**Remark**

- (In step 1) Pretending to know the solution is often the start of the magic.
- (In step 2) What about the other way around?
- (In step 3) What did we lose and what did we obtain? 

<!-- ![Derivation of Newton's method](https://jhparkyb.github.io/resources/notes/na/104ASlides_RootFinding014.png) -->


##### Computational example (computing $\sqrt{2}$)


> **Problem** (Computing $\sqrt{2}$)
>
> Write code that computes an approximate value of $\sqrt{2}$ using Newton's method.

(Step 1) Cast the problem as a root finding problem and summarize it. (Intellectual work needed for $f$ and $f'$)

(Step 2) Write a (programming) function that implements Newton's method.

(Step 3) Set up the computation (function, initial guess, etc.) and implement it.

(Step 4) Post-process and report the result.
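
For Step 1, one natural choice (an assumption here; other functions with root $\sqrt{2}$ would also work) is $f(x)=x^2-2$, so that $f'(x)=2x$ and Newton's method becomes

$$
x_{i+1} = x_i - \frac{x_i^2-2}{2x_i} = \frac{1}{2}\left(x_i + \frac{2}{x_i}\right).
$$

Each step replaces $x_i$ by the average of $x_i$ and $2/x_i$, two numbers whose product is $2$.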


###### Details

**Remark**

- The true solution can be obtained from [Wolfram alpha: N[sqrt[2], 20]](https://www.wolframalpha.com/input?i=N%5Bsqrt%5B2%5D%2C+20%5D).
  - `N[sqrt[2], 20]` (numerical value of $\sqrt{2}$ to 20 significant digits) gives us 1.4142135623730950488.
- 20 digits are enough because computers using double-precision floating point arithmetic can resolve relative differences only down to around $2^{-52} \approx 2.22\times 10^{-16}$.
  - This number is called *machine epsilon*.
  - Machine epsilon depends on the data type. (See the [Wikipedia](https://en.wikipedia.org/wiki/Machine_epsilon) page for details.)
  - Wolfram alpha can handle higher precision by using more computing resources than floating point arithmetic.
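
The value of machine epsilon can be checked directly. The snippet below is a small illustration that assumes Python floats are IEEE double precision, which is the case on common platforms.

```python
import sys

eps = sys.float_info.epsilon      # machine epsilon for Python floats
print(eps)                        # 2.220446049250313e-16
print(eps == 2.0 ** -52)          # True
print(1.0 + eps > 1.0)            # True: eps is still visible next to 1
print(1.0 + eps / 2 == 1.0)       # True: half of eps is rounded away
```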

###### Implementation

In [1]:
import numpy as np

def newton(f, fp, ini, tol=1e-8, max_iter=20):
    """
    Return an approximate root of a function using Newton's method.

    INPUT
        f: function whose zero is sought.
        fp: derivative of f (name from 'f prime')
        ini: initial guess
        tol: tolerance for the stopping criterion. If consecutive iterates differ by less than this, the iteration is considered convergent.
        max_iter: maximum number of iterations (assumed to be at least 1)
    OUTPUT
        approximate zero and the number of iterations performed. When the maximum number of iterations is reached, the last iterate is returned.
    """
    x = ini
    for i in range(max_iter):
        x_pre = x
        x = x - f(x)/fp(x)  # Newton update; undefined if fp(x) == 0

        # Stop when consecutive iterates are close enough.
        if np.abs(x - x_pre) < tol: 
            break

    # Optional warning, currently disabled:
    # if i == max_iter - 1:
    #     print("   Warning (newton): maximum number of iterations reached.\n     --> The output may not be close enough to the zero.")
    return x, i + 1


In [2]:
# Problem settings
f = lambda x: x*x - 2.
fp = lambda x: 2.*x

x0 = 10.

# Numerical settings
max_iter = 100

appr, n_iter = newton(f, fp, x0, max_iter=max_iter)  # n_iter avoids shadowing the built-in iter
sol = 1.4142135623730950488 # obtained from Wolfram Alpha
err = np.abs(appr - sol)

print("Newton's method : ", appr, f"   ({n_iter} iterations taken)")
print("True solution   : ", sol)
print("Error           : ", err)

Newton's method :  1.4142135623730951    (8 iterations taken)
True solution   :  1.4142135623730951
Error           :  0.0


##### Convergence


**Summary**

- Convergence of Newton's method is not guaranteed. $\longrightarrow$ local convergence.
- If convergent, it is very fast. $\longrightarrow$ quadratic convergence.



**Terminology** (local convergence)

An iterative method is called locally convergent to $r$ if the method converges to $r$ for initial guesses sufficiently close to $r$.


**Definition** (Quadratic convergence) 

Let $e_i$ denote the error after step $i$ of an iterative method. The iteration is quadratically convergent if

$$
M=\lim _{i \rightarrow \infty} \frac{e_{i+1}}{e_i^2}<\infty.
$$

In words, it means *the new error is roughly a constant times the square of the previous error*.


**Theorem** (Local, quadratic convergence of Newton's method)

Let $f$ be twice continuously differentiable and $f(r) = 0$. If $f'(r) \neq 0$, then Newton's method is locally and quadratically convergent to $r$. The error $e_i = |x_i - r|$ at step $i$ satisfies

$$
\lim _{i \rightarrow \infty} \frac{e_{i+1}}{e_i^2} = M <\infty,
$$

where $M= \left|\dfrac{f''(r)}{2 f'(r)}\right|$.

> **Question** (How fast is a quadratic convergence?)
>
> Suppose Newton's method starts to manifest quadratic convergence from the 5th iteration with $e_5 = 0.1$. Guess what the error will be after three more iterations. For simplicity, assume $M=1$. It is more fun to guess without thinking much.
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Feel free to say out loud.

**Remark**

- Revisit computation of $\sqrt 2$, and see if it shows quadratic convergence.

In [4]:
import pandas as pd

N = 10
df = pd.DataFrame(columns=['# iterations', 'error'])

for i in range(1, N+1):
    # Rerun Newton's method, capped at i iterations, and record the error.
    appr, n_iter = newton(f, fp, x0, max_iter=i)
    err = np.abs(appr - sol)
    df.loc[i] = [n_iter, err]

df

| | # iterations | error |
|---|---|---|
| 1 | 1.0 | 3.685786 |
| 2 | 2.0 | 1.331865 |
| 3 | 3.0 | 0.3229813 |
| 4 | 4.0 | 0.03002453 |
| 5 | 5.0 | 0.0003120928 |
| 6 | 6.0 | 3.442917e-08 |
| 7 | 7.0 | 2.220446e-16 |
| 8 | 8.0 | 0.0 |
| 9 | 8.0 | 0.0 |
| 10 | 8.0 | 0.0 |
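
To see the constant $M$ of the theorem emerge, one can monitor the ratios $e_{i+1}/e_i^2$. In double precision the errors hit machine epsilon too quickly for this, so the sketch below uses Python's `decimal` module with 100 significant digits. That choice, and all variable names, are assumptions made for illustration. For $f(x)=x^2-2$ and $r=\sqrt{2}$, the theorem predicts $M = |f''(r)/(2f'(r))| = 1/(2\sqrt{2}) \approx 0.353553$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100          # work with 100 significant digits
root = Decimal(2).sqrt()         # reference value of sqrt(2)

x = Decimal(10)                  # same initial guess as in the example above
errors = []
for _ in range(8):
    x = x - (x * x - 2) / (2 * x)    # Newton step for f(x) = x^2 - 2
    errors.append(abs(x - root))

# Ratios e_{i+1} / e_i^2, which should approach M.
ratios = [errors[i + 1] / errors[i] ** 2 for i in range(len(errors) - 1)]
M = 1 / (2 * root)               # |f''(r) / (2 f'(r))| with r = sqrt(2)

for i, q in enumerate(ratios, start=1):
    print(f"e_{i+1} / e_{i}^2 = {float(q):.6f}")
print(f"M = {float(M):.6f}")
```

The early ratios are far from $M$ because the iterates are not yet close to the root; the limit describes the behavior only once the iteration has entered the local regime.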


**Remark**

- If this did not surprise you, look at the example below.

In [8]:
"""This example shows how computer arithmetic can be limited by the precision of floating point numbers. The output must be 1 from mathematical point of view.

    Suggested parameter
        N = 10, 100, 170, 171
"""

N = 171

prod = 1.

for i in range(1, N+1):
    prod = prod * i

for i in range(1, N+1):
    prod = prod / i

print(prod)

inf


#### Part 2: What to be careful of

**Summary**

- $f'$ must exist and $f'(x_i)\neq 0$.
- Newton's method may diverge. 
- If $f'(r)=0$ (multiple root), convergence slows down to linear.
  - Modified Newton's method recovers quadratic convergence.

##### $f'$ must be available

##### Divergence

**Remark** (Divergence of Newton's method)

- Newton's method converges fast when it converges, but it may diverge (i.e., it is only locally convergent).
- If $f'(x_i)=0$ at some step, the method breaks down because the update divides by zero.

<!-- | | | |
|---|---|---|
|![Divergence of Newton's method](https://amsi.org.au/ESA_Senior_Years/imageSenior/2a_numerical_methods_graph_7.png) | ![Oscillation of Newton's method](https://i.stack.imgur.com/yPC4a.png)| ![Failure of Newton's method](https://mmerevise.co.uk/app/uploads/2021/07/Method-Fail-2-e1650551922512.png.webp) | -->

![Divergence of Newton's method](https://amsi.org.au/ESA_Senior_Years/imageSenior/2a_numerical_methods_graph_7.png) 

![Oscillation of Newton's method](https://i.stack.imgur.com/yPC4a.png)

![Failure of Newton's method](https://mmerevise.co.uk/app/uploads/2021/07/Method-Fail-2-e1650551922512.png.webp)


Figure: https://amsi.org.au/, StackExchange, MME
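
Both failure modes are easy to reproduce numerically. The two test functions below are standard textbook-style examples chosen for illustration, not the functions plotted in the figures, and `newton_iterates` is a hypothetical helper.

```python
import math

def newton_iterates(f, fp, x0, n):
    """Return [x0, x1, ..., xn] produced by n Newton steps (no stopping test)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x - f(x) / fp(x))
    return xs

# Oscillation: for h(x) = x^3 - 2x + 2 starting at 0, the iterates cycle 0, 1, 0, 1, ...
h = lambda x: x**3 - 2*x + 2
hp = lambda x: 3*x**2 - 2
cycle = newton_iterates(h, hp, 0.0, 6)
print(cycle)

# Divergence: for arctan starting at 1.5, the iterates move away from the root 0.
k = math.atan
kp = lambda x: 1 / (1 + x*x)
away = newton_iterates(k, kp, 1.5, 4)
print([abs(x) for x in away])
```

In the second example a starting point close enough to $0$ would converge, which is exactly what local convergence means.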


**Remark** 

- Local convergence can still be useful.
  - In time-evolution applications, the next state can be formulated as the solution to an equation.
  - The previous state serves as a good initial guess for the next state.

![Time evolution: Phase field crystal equation](https://www.mdpi.com/crystals/crystals-12-01271/article_deploy/html/images/crystals-12-01271-g006-550.jpg)

Figure: MDPI (Crystal phase field morphology; an example of time evolution)

##### $f'(r)=0$ slows down

**Example** (Slow down of Newton's method)

Implement Newton's method for $x^2=0$.

In [44]:
g = lambda x: x*x
gp = lambda x: 2*x

sol = 0.

x0 = 10


In [45]:
import pandas as pd

N = 10
df = pd.DataFrame(columns=['# iterations', 'error'])

for i in range(1, N+1):
    # Rerun Newton's method, capped at i iterations, and record the error.
    appr, n_iter = newton(g, gp, x0, max_iter=i)
    err = np.abs(appr - sol)
    df.loc[i] = [n_iter, err]

df

| | # iterations | error |
|---|---|---|
| 1 | 1.0 | 5.0 |
| 2 | 2.0 | 2.5 |
| 3 | 3.0 | 1.25 |
| 4 | 4.0 | 0.625 |
| 5 | 5.0 | 0.3125 |
| 6 | 6.0 | 0.15625 |
| 7 | 7.0 | 0.078125 |
| 8 | 8.0 | 0.039062 |
| 9 | 9.0 | 0.019531 |
| 10 | 10.0 | 0.009766 |


**What is happening?**

Geogebra interactive module: suggested settings

1. x^2 - 2
2. x^2

[Newton's method: Geogebra interactive module](https://www.geogebra.org/m/n6KXp4hE)

Creator: Lenore Horner


Mathematics makes it clearer.

- $g(x)=x^2$
- $g'(x)=2x$

$$
\begin{aligned}
x_{i+1} & =x_i-\frac{g\left(x_i\right)}{g^{\prime}\left(x_i\right)} \\
& =x_i-\frac{x_i^2}{2 x_i} \\
& =\frac{x_i}{2}
\end{aligned}
$$


**Theorem** (Linear convergence for multiple roots)

Assume that the $(m + 1)$-times continuously differentiable function $f$ on $[a,b]$ has a multiplicity $m$ root at $r$. Then Newton’s Method is locally convergent to $r$, and the error $e_i$ at step $i$ satisfies

$$
\lim _{i \rightarrow \infty} \frac{e_{i+1}}{e_i}=S,
$$

where $S=(m-1)/m$.


##### Remedy: Modified Newton's method

**Theorem** (Modified Newton's method)

If $f$ is $(m + 1)$-times continuously differentiable on $[a,b]$, which contains a root $r$ of multiplicity $m > 1$, then *Modified Newton’s Method* 

$$
x_{i+1}=x_i-\frac{m f\left(x_i\right)}{f^{\prime}\left(x_i\right)}
$$

converges locally and quadratically to $r$.
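
The effect of the factor $m$ can be observed on a small example. The polynomial $p(x)=(x-1)^2(x+2)$, which has a double root ($m=2$) at $r=1$, and the helper `newton_steps` are assumptions made for illustration.

```python
def newton_steps(f, fp, x0, n, m=1):
    """Take n steps of x <- x - m * f(x) / f'(x); m = 1 is plain Newton."""
    x = x0
    for _ in range(n):
        d = fp(x)
        if d == 0:        # zero slope: the update is undefined, so stop
            break
        x = x - m * f(x) / d
    return x

# Hypothetical test function with a double root (m = 2) at r = 1.
p = lambda x: (x - 1)**2 * (x + 2)
pp = lambda x: 3 * (x - 1) * (x + 1)   # derivative of p

r = 1.0
for n in range(1, 5):
    e_plain = abs(newton_steps(p, pp, 2.0, n) - r)
    e_mod = abs(newton_steps(p, pp, 2.0, n, m=2) - r)
    print(n, e_plain, e_mod)
```

With $m=1$ the error shrinks by a factor approaching $(m-1)/m = 1/2$ per step, while with $m=2$ it is roughly squared at each step. In practice $m$ is usually not known in advance, which limits the direct use of this remedy.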

### Secant method

**Motivation**

Newton's method is great. But it requires $f'(x)$ as well as $f(x)$.

How can we overcome this?

##### Method

**Idea**: Replace $f'(x_i)$ in Newton's method with something similar.

**Geometric intuition**

![Secant method](https://mathworld.wolfram.com/images/eps-svg/SecantMethod_800.svg)

Figure: Wolfram MathWorld.

[Secant method: Geogebra interactive module](https://www.geogebra.org/m/vpk4geyu)

Author: Marian Choy


> ***Algorithm*** (Secant method)
>
> Given $x_0, x_1\in\mathbb{R}$, compute, for $i\ge 1$,
>
> $$ x_{i+1}=x_{i}-f\left(x_{i}\right)\frac{\left(x_{i}-x_{i-1}\right)}{f\left(x_{i}\right)-f\left(x_{i-1}\right)} $$
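
A minimal Python sketch of the secant iteration follows. The name `secant`, the stopping rule based on consecutive iterates, and the default parameters are assumptions for illustration. Function values are carried over between steps so that each step costs one new evaluation of $f$.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Approximate a root of f from two initial guesses (max_iter >= 1 assumed)."""
    f0, f1 = f(x0), f(x1)
    for i in range(max_iter):
        if f1 == f0:              # flat secant line: the update is undefined
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)        # the only new function evaluation in this step
        if abs(x1 - x0) < tol:    # consecutive iterates are close enough
            break
    return x1, i + 1


root, n_iter = secant(lambda x: x * x - 2, 1.0, 2.0)
print(root, n_iter)
```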

##### Summary

- If the secant method converges, its rate of convergence is the *golden ratio* ($\approx 1.618$).
- The user must supply **two initial guesses**.
- It requires **only the function evaluation**, but not the derivatives. 

##### Analysis

In favor of more computational activities, we skip the proof of the *superlinear* convergence of the secant method (i.e., a convergence rate that is faster than linear: $e_{k+1} \approx C e_k^\alpha$ with $\alpha>1$).  

### Fixed point iteration

##### Method

**Terminology**

It is also called *Picard iteration* or *functional iteration*.

**Geometric interpretation**

"A picture paints a thousand words." 

![Fixed point iteration](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Cosine_fixed_point.svg/1920px-Cosine_fixed_point.svg.png)

Figure: Wikipedia

[Fixed point iteration: Geogebra interactive module](https://www.geogebra.org/m/qUbg7Z6W) 

Author: stuart.cork



> **Algorithm** (Fixed point iteration - general)
>
> Given a function $g:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, compute, for $i\ge 0$,
>
> $$ x_{i+1} = g(x_i). $$

> **Algorithm** (Fixed point iteration - root finding for $f$)
>
> Given a function $f:\mathbb{R}\to\mathbb{R}$ and an initial guess $x_0\in\mathbb{R}$, set $g(x)=x+f(x)$, compute, for $i\ge 0$, 
>
> $$ x_{i+1} = g(x_i). $$

##### Summary

- If repeated application of a function $g$ converges to $x$, then $x$ solves $x=g(x)$. (Some condition on $g$ is needed: see Analysis below.)
- If it converges, the fixed point iteration converges *linearly*: there exists $\lambda$ with $0<\lambda<1$ such that $e_{k+1}\approx \lambda e_k$. 
- If you want to solve the equation $f(x)=0$, set $g(x):=x+f(x)$ and apply the fixed point iteration to $g$. Then, the fixed point $x$ satisfies 
    $$x = g(x)=x+f(x) \quad \text{implies} \quad f(x)=0.$$
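
As an illustration, consider solving $\cos(x)-x=0$. With $f(x)=\cos(x)-x$ we get $g(x)=x+f(x)=\cos(x)$, the function shown in the figure above. The sketch below is a minimal implementation; the name `fixed_point`, the stopping rule, and the defaults are assumptions.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=200):
    """Iterate x <- g(x) until consecutive iterates differ by less than tol."""
    x = x0
    for i in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, i + 1
        x = x_new
    return x, max_iter           # tolerance not met within max_iter steps


f = lambda x: math.cos(x) - x    # equation to solve: f(x) = 0
g = lambda x: x + f(x)           # fixed point form: x = g(x)

x_star, n_steps = fixed_point(g, 1.0)
print(x_star, n_steps, f(x_star))
```

The iteration needs noticeably more steps than Newton's method for a comparable accuracy, in line with its linear convergence.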

##### Analysis

> **Definition** (Fixed point)
> $x$ is called a *fixed point* of the function $g$ if $g(x)=x$.


> **Definition** (Contractive/Contraction mapping)
> A function $g:D \to \mathbb{R}$ is called *contractive* or a *contractive mapping/contraction* if there is $\lambda\in[0,1)$ such that $|g(x)-g(y)|\le \lambda|x-y|$ for all $x,y\in D$.

> **Theorem** (Contraction mapping is continuous)
> If $g:D\to \mathbb{R}$ is contractive, it is continuous.

> **Theorem** (Absolute convergence implies convergence)
> If $\sum_{i=1}^\infty x_i$ is absolutely convergent, i.e., $\sum_{i=1}^\infty |x_i| < \infty$, then $\sum_{i=1}^\infty x_i$ also converges.

> **Theorem** (Contraction Mapping Theorem)
> Let $D$ be a closed subset of $\mathbb{R}$. If $g:D \to D$ is a contraction, then it has a unique fixed point. Moreover, this fixed point is the limit of the functional iteration starting with any initial guess.

[Proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm1_lp3000.png)

[Proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/pf_ContractionMappingThm2_lp3001.png)

Proof outline

1. The telescoping sum $x_i= (x_i - x_{i-1}) + (x_{i-1} - x_{i-2}) + \cdots + (x_{1} - x_{0}) + x_0$ is a partial sum of an absolutely convergent series, hence $x_i$ converges.
   - $|x_i - x_{i-1}| \le \lambda^{i-1} |x_1 - x_0|$
2. Pass $x_{i+1}=g(x_i)$ to the limit $i\to \infty$.
3. Uniqueness
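
Steps 1 and 3 can be made slightly more explicit (a sketch that follows the outline; the linked proof may organize the details differently). For step 1, applying the contraction property repeatedly gives $|x_i - x_{i-1}| = |g(x_{i-1}) - g(x_{i-2})| \le \lambda |x_{i-1} - x_{i-2}| \le \cdots \le \lambda^{i-1}|x_1 - x_0|$, so for every $n$

$$
\sum_{i=1}^{n} |x_i - x_{i-1}| \le |x_1 - x_0| \sum_{i=1}^{n} \lambda^{i-1} \le \frac{|x_1 - x_0|}{1-\lambda} < \infty .
$$

For step 3, if $x=g(x)$ and $y=g(y)$, then

$$
|x-y| = |g(x)-g(y)| \le \lambda |x-y|
\quad\Longrightarrow\quad
(1-\lambda)|x-y| \le 0
\quad\Longrightarrow\quad
x=y .
$$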



> **Question** 
>
> The above proof outline did not use one condition and used another condition implicitly. What are they? 
>
> (Reminder) This is **about atmosphere**, not getting it right.
> 
> 1. Think for a short time.
> 2. Share your guess with your pair.
> 3. Type your answer in clicker.
> 4. Feel free to say out loud.

(Homework questions will ask you what happens if you ignore them.)

#### Comparisons of root-finding methods

| | Bisection | Newton | Secant | Fixed point |
|---|---|---|---|---|
| need $f(x)$ | O | O | O | O |
| need $f'(x)$ | - | O | - | - |
| rate of convergence | 1 | 2 | 1.618 | 1 |
| rate of convergence <br> per two function eval's | 1 <br> (with smaller contraction constant) | 2 | $1.618^2\approx 2.618$ | 1 <br> (with smaller contraction constant) |
| global convergence | yes <br> if $f(a)f(b)<0$ | no | no | practically no |
| solution boxed | yes | no | no | generally, no |
| generalization <br> to high dimensions <br> (intellectual effort) | awkward | yes, <br> but gradient may be unavailable  | yes, <br> but not very trivial <br> (called quasi-Newton methods)| yes |
| generalization <br> to high dimensions <br> (numerical aspects) | N/A | demanding | depends | depends |


### Appendix

#### Styles of algorithm

| Mathematics- or idea-oriented pseudo-algorithm | Coding-oriented pseudo-algorithm |
|---|---|
|Given an initial guess $x_0$, <br> compute <br> $ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$ for $i\ge 0$. | Input (or Data): $x_0$, $f$, $f'$ <br> Set: $Tol>0$, $x \gets x_0$ <br> While $\|x - x_{pre}\| > Tol$: <br> $ \quad \quad x_{pre} \gets x $ <br> $\quad \quad x \gets x - \frac{f(x)}{f'(x)}$ |
| Focus on the essence | Also consider some details in implementation. In particular, this usually includes *stopping criteria*. |  

#### History of Newton's method



In favor of more computational activities, we skip the proof of the quadratic convergence of Newton's method. But since Newton's method is highly relevant even these days and since it inspires many other methods, we include some history about it.

> **Historical note**
>
> 1. Babylonians (1894 BC - 539 BC) used the method to approximate square roots: $\sqrt{2}$ accurately up to seventh decimal place. (Ref: [2, 3])
>       ![Babylonian clay tablet](https://projectlovelace.net/static_prod/img/YBC7289.jpg)
>
>       Figure: Project Lovelace
> 1. In 1669, the method was employed by Newton for the cubic equation $x^3 -2x-5 = 0$. (Ref: [1])
> 1. In 1690, Raphson described the method for a general cubic equation $x^3 - bx = c$. (Ref: [1])
> 1. In 1818, Fourier proved the quadratic convergence of the method. (Ref: [1])
> 1. In 1829, Cauchy proved a convergence theorem which does not assume the existence of a solution. (existence of a solution is a consequence; but it assumes some other conditions on the iterates) (Ref: [1])
> 1. In 1939, Kantorovich proved a convergence theorem in a very general setting. (Ref: [1])
> 1. In 1948, Kantorovich proved an improved version, which is now called Kantorovich's theorem or the Newton-Kantorovich theorem: existence of a solution is not assumed and the convergence is quadratic in a very general setting. (Ref: [1])
> 
> Reference
> 
> [1] Brezinski (2001) Numerical Analysis: Historical Developments in the 20th Century. p. 242
> 
> [2] Sauer (2017) Numerical Analysis p. 41
> 
> [3] Wikipedia (Babylonia) 

#### Wilkinson's polynomial

**Example** (Wilkinson's polynomial)

$$
\begin{aligned}
P(x) & =\prod_{i=1}^{20}(x-i)=(x-1)(x-2) \cdots(x-20) \\
& =x^{20}-210 x^{19}+20615 x^{18}+\cdots+2432902008176640000
\end{aligned}
$$

In [33]:
wilkinson_coeff= np.array([    
                     1.,
                  -210.,
                 20615.,
              -1256850.,
              53327946.,
           -1672280820.,
           40171771630.,
         -756111184500.,
        11310276995381.,
      -135585182899530.,
      1307535010540395.,
    -10142299865511450.,
     63030812099294896.,
   -311333643161390656.,
   1206647803780373248.,
  -3599979517947607040.,
   8037811822645051392.,
 -12870931245150988288.,
  13803759753640704000.,
  -8752948036761600000.,
   2432902008176640000.])

appr_roots = np.roots(wilkinson_coeff)

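def poly_eval(coeff, x):
    """
    Return the value at x of the polynomial whose coefficients are listed
    in coeff from the highest degree down to the constant term.
    """
    n = len(coeff)
    total = 0.  # named total to avoid shadowing the built-in sum
    for i in range(n):
        total += coeff[n-1-i]*x**i
    return total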

print(poly_eval(wilkinson_coeff, appr_roots[19]))
print(appr_roots[19])

15324.993485094896
0.9999999999998672
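
One way to probe this result is to evaluate the factored form $\prod_{i=1}^{20}(x-i)$ at the same point. The helper `wilkinson_factored` below is hypothetical and only meant as a cross-check. Since $|P'(1)| = 19! \approx 1.2\times 10^{17}$, a first-order estimate of $P(x)$ near $x=1$ is $|x-1|\cdot 19!$, so a perturbation of about $1.3\times 10^{-13}$ in the root already changes $P$ by roughly $10^{4}$.

```python
import math

def wilkinson_factored(x, n=20):
    """Evaluate (x - 1)(x - 2)...(x - n) directly from the factors."""
    prod = 1.0
    for i in range(1, n + 1):
        prod *= (x - i)
    return prod

x_near_1 = 0.9999999999998672                   # approximate root printed above
print(wilkinson_factored(x_near_1))             # factored form at that point
print((1.0 - x_near_1) * math.factorial(19))    # first-order estimate |x - 1| * 19!
```

Both numbers have the same order of magnitude as the value printed above, which suggests that the large residual reflects how steep the polynomial is near its roots rather than a gross failure of `np.roots`.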


---
This work is licensed under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)
Part of the content of this notebook is borrowed from [Elementary Numerical Analysis (with Python)](https://lemesurierb.people.cofc.edu/elementary-numerical-analysis-python/preface.html) written by Brenton LeMesurier, College of Charleston and University of Northern Colorado. Thanks to Dr. LeMesurier for sharing excellent notes.

<!-- 
[proof of contraction mapping theorem 1](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding015.png)

[proof of contraction mapping theorem 2](https://jhparkyb.github.io/resources/notes/na/104ABoardWork_RootFinding016.png)
-->