# Systems of Nonlinear Equations

---

__Problem.__ Let us find a solution $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ of the system of $n$ equations 

\begin{align*}
f_1(x)&=0,\\
f_2(x)&=0,\\
&\vdots \\
f_n(x)&=0,
\end{align*}

and $n$ unknowns $x=(x_1,x_2,\ldots,x_n)$. Denoting $f=(f_1,f_2,\ldots,f_n)^T$, we can write this system as

$$
f(x)=0.
$$

We shall describe the __Newton's method__ and three __quasi-Newton__ methods:

2. __Broyden's__ method,
3. __Davidon-Fletcher-Powell__ method, and 
3. __Broyden-Fletcher-Goldfarb-Schano__ method.

Given starting approximation $x^{(0)}$, all methods generate sequence of points $x^{(n)}$ which, under certain condditions, converges towards the solution $\xi$. 

__Remark.__ Description of the methods and examples are taken from the textbook [Numerička matematika, poglavlje 4.4][RS04].

[RS04]: http://www.mathos.unios.hr/pim/Materijali/Num.pdf "R. Scitovski, 'Numerička matematika', Sveučilište u Osijeku, Osijek, 2004."

## Newton's method

__Jacobian__ or __Jacobi matrix__ of functions $f$ in the point $x$ is the matrix of first partial derivatives 

$$
J(f,x)=\begin{pmatrix} \displaystyle\frac{\partial f_1(x)}{\partial x_1} & \displaystyle\frac{\partial f_1(x)}{\partial x_2} & \cdots &
\displaystyle\frac{\partial f_1(x)}{\partial x_n} \\
\displaystyle\frac{\partial f_2(x)}{\partial x_1} & \displaystyle\frac{\partial f_2(x)}{\partial x_2} & \cdots &
\displaystyle\frac{\partial f_2(x)}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\displaystyle\frac{\partial f_n(x)}{\partial x_1} & \displaystyle\frac{\partial f_n(x)}{\partial x_2} & \cdots &
\displaystyle\frac{\partial f_n(x)}{\partial x_n} 
\end{pmatrix}.
$$

Given starting approximation $x^{(0)}$, we compute the sequence of points

$$
x^{(k+1)}=x^{(k)}-s^{(k)}, \quad k=0,1,2,\ldots,
$$

where $s^{(k)}$ is the solution of the system of linear equations

$$
J\big(f,x^{(k)}\big)\cdot s=f\big(x^{(k)}\big).
$$

Jacobians are computed using the package [`ForwardDiff.jl`](http://www.juliadiff.org/ForwardDiff.jl/perf_diff.html#derivatives). Plotting is done using the package `Plots.jl`.

In [1]:
using ForwardDiff
using Plots
plotly()
using LinearAlgebra

In [2]:
function Newton(f::Function,J::Function,x::Vector{T},ϵ::Float64=1e-10) where T
    iter=0
    s=ones(T,size(x))
    ξ=x
    while norm(s)>ϵ && iter<100
        s=J(x)\f(x)
        ξ=x-s
        iter+=1
        x=ξ
    end
    ξ,iter
end

Newton (generic function with 2 methods)

### Example 1

(Dennis and Schnabel, 1996) The solutions of the system

\begin{align*}
2(x+y)^2+(x-y)^2-8&=0\\
5x^2+(y-3)^2-9&=0
\end{align*}

are the points $T_1=(1,1)$ and $T_2\approx(-1.18,1.59)$.

In [3]:
# Vector function
f₁(x)=[2(x[1]+x[2])^2+(x[1]-x[2])^2-8,5*x[1]^2+(x[2]-3)^2-9]

f₁ (generic function with 1 method)

In [4]:
f₁((1.0,2))

2-element Array{Float64,1}:
 11.0
 -3.0

Let us plot the functions and the contours in order to approximately locate the zeros: 

In [5]:
# Number of points
m=101
X=range(-2,stop=3,length=m)
Y=range(-2,stop=3,length=m)
# First applicate
surface(X,Y,(x,y)->f₁([x,y])[1],xlabel="x",ylabel="y",colorbar=false)
# Second applicate
surface!(X,Y,(x,y)->f₁([x,y])[2],seriescolor=:blues)

In [6]:
# Locate the solutions using contour plot
contour(X,Y,(x,y)->f₁([x,y])[1],contour_labels=true)
contour!(X,Y,(x,y)->f₁([x,y])[2],contour_labels=true)

In [7]:
# Clearer picture
contour!(clims=(0,0.01),xlabel="x",ylabel="y",colorbar=:none)

We see that the approximate zeros are $T_1=(-1.2,1.5)$ i $T_2=(1,1)$. Moreover, $T_2$ is exactly equal $(1,1)$ (one iteration in the third computation). Furthermore, the method does not always converge (fourth computation).   

In [8]:
J₁(x)=ForwardDiff.jacobian(f₁,x)

J₁ (generic function with 1 method)

In [9]:
# For example
J₁([1.0,2])

2×2 Array{Float64,2}:
 10.0  14.0
 10.0  -2.0

In [10]:
Newton(f₁,J₁,[-1.0,0.0]), Newton(f₁,J₁,[0.5,1.1]), 
Newton(f₁,J₁,[1.0,1.0]), Newton(f₁,J₁,[0.0,0.0])

(([-1.183467003241957, 1.5868371427229244], 8), ([1.0, 1.0], 6), ([1.0, 1.0], 1), ([NaN, NaN], 2))

### Example 2

(Dennis and Schnabel, 1996) The solutions of the system

\begin{align*}
x_1^2-x_2^2-2&=0\\
e^{x_1-1}+x_2^3-2&=0
\end{align*}

are the points $T_1=(1,1)$ and $T_2\approx (-0.71,1.22)$.

In [11]:
f₂(x)=[x[1]^2+x[2]^2-2,exp(x[1]-1)+x[2]^3-2]
contour(X,Y,(x,y)->f₁([x,y])[1],contour_labels=true)
contour!(X,Y,(x,y)->f₁([x,y])[2],contour_labels=true)
contour!(clims=(0,0.01),xlabel="x",ylabel="y",colorbar=:none)

In [12]:
J₂(x)=ForwardDiff.jacobian(f₂,x)
Newton(f₂,J₂,[-1.0,1]), Newton(f₂,J₂,[0.8,1.2])

(([-0.7137474114864426, 1.220886822189675], 5), ([1.0, 0.9999999999999999], 5))

### Example 3

(Dennis and Schnabel, 1996) Solve $f(x)=0$, where

$$
f(x)=\begin{pmatrix}x_1 \\ x_2^2-x_2 \\ e^{x_3}-1 \end{pmatrix}.
$$

The exact solutions are $T_1=(0,0,0)$ and $T_2=(0,-1,0)$. We shall compute the solutions using several starting points.

In [13]:
f₃(x)=[x[1],x[2]^2+x[2],exp(x[3])-1]
J₃(x)=ForwardDiff.jacobian(f₃,x)

J₃ (generic function with 1 method)

In [14]:
Newton(f₃,J₃,[-1.0,1.0,0.0]),Newton(f₃,J₃,[1.0,1,1]),
Newton(f₃,J₃,[-1.0,1,-10]),Newton(f₃,J₃,[0.5,-1.5,0])

(([0.0, 0.0, 0.0], 7), ([0.0, 0.0, 7.783745890945912e-17], 7), ([0.0, 0.06666666666666665, NaN], 2), ([0.0, -1.0, 0.0], 6))

### Example 4

(Rosenbrock parabolic valley) Given is the function 

$$
f(x)=100\,(x_2-x_1)^2+(1-x_1)^2.
$$

We want to find possible extremal points, thet is, we want to solve the equation

$$
\mathop{\mathrm{grad}} f(x)=0.
$$

In [15]:
f₄(x)=100(x[2]-x[1]^2)^2+(1-x[1])^2

f₄ (generic function with 1 method)

In [16]:
# Plot the function using X and Y from Example 1
surface(X,Y,(x,y)->f₄([x,y]), seriescolor=:blues, xlabel="x", ylabel="y")

In [17]:
# This function is very demanding w.r.t. finding extremal points
g₄(x)=ForwardDiff.gradient(f₄,collect(x))
contour(X,Y,(x,y)->g₄([x,y])[1],contour_labels=true)
contour!(X,Y,(x,y)->g₄([x,y])[2],contour_labels=true)
contour!(clims=(-0.5,0.5),xlabel="x",ylabel="y",colorbar=:none)

By observing contours, we conclude that this example is numerically demanding, while it is easy to see analytically that the only zero is $T=(1,1)$.

Here the vector function is given as the gradient of the scalar function $f$, so the Jacobi matrix is computed using function `FowardDiff.hessian()` which approximates the matrix of second partial derivatives of the scalar function $f$. 

In [18]:
Newton(g₄,x->ForwardDiff.hessian(f₄,x),[-1.0,2.0])

([1.0, 1.0], 7)

### Example 5

Let

$$
f(x)=\sum_{i=1}^{11} \bigg(x_3 \cdot \exp\bigg(-\frac{(t_i-x_1)^2}{x_2}\bigg)-y_i\bigg)^2,
$$

where the pairs  $(t_i,y_i)$ are given by the table:

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $t_i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 | 
| $y_i$ | 0.001 | .01 | .04 | .12 | .21 | .25 | .21 | .12 | .04 | .01 | .001 |

We want to solve the equation

$$
\mathop{\mathrm{grad}} f(x)=0.
$$


Unlike previous examples, where the condition number is

$$\kappa(J)=O(10)$$ 

in Examples 1, 2 and 3 and

$$\kappa(J)=O(1000)$$ 

in Example 4, here

$$\kappa(J)>O(10^6)$$ 

so the method is inaccurate and does not converge towrds the exact solution $x=(4.93,2.62,0.28)$.

In [19]:
t=collect(0:10)
y=[0.001,0.01,0.04,0.12,0.21,0.25,0.21,0.12,0.04,0.01,0.001]
f₅(x)=sum([( x[3]*exp(-((t[i]-x[1])^2/x[2]))-y[1])^2 for i=1:11])

f₅ (generic function with 1 method)

In [20]:
# Starting point is very near the solution
x₀=[4.9,2.63,0.28]
f₅(x₀)
g₅(x)=ForwardDiff.gradient(f₅,x)
J₅(x)=ForwardDiff.hessian(f₅,x)
f₅(x₀), g₅(x₀), cond(J₅(x₀))

(0.15775257532454934, [2.715533686349289e-6, 0.029985961264699698, 1.1324744072032398], 173703.69351181446)

In [21]:
x₅,iter₅=Newton(g₅,J₅,x₀,1e-8)

([6.502441314556678, 0.0024508472251587193, -1.608880054863579e-7], 100)

In [22]:
g₅(x₅)

3-element Array{Float64,1}:
  1.523560685122021e-51
  2.043007334415462e-49
 -3.073712817083802e-47

In [23]:
Newton(g₅,J₅,[4.9,2.62,0.28],1e-8)

([NaN, NaN, NaN], 13)

## Broyden's method

Given starting approximation $x_0$ and matrix $B_0$, for each $k=0,1,2,\ldots$, we compute:

\begin{align*}
B_k \cdot s_k & = -f(x_k) \quad \textrm{(system)}\\
x_{k+1}&=x_{k}+s_k\\
y_k&=f(x_{k+1})-f(x_{k})\\
B_{k+1}&=B_k+\frac{(y_k-B_ks_k)s_k^T}{s_k\cdot s_k}
\end{align*}

In this manner, we avoid the computation of the Jacobi matrix in each step. We can take
$B_0=J(x_0)$, but also some other matrix.

In [24]:
function Broyden(f::Function,B::Matrix,x::Vector{T},ϵ::Float64=1e-10) where T
    iter=0
    s=ones(T,length(x))
    ξ=x
    while norm(s)>ϵ && iter<100
        s=-(B\f(x))
        ξ=x+s
        y=f(ξ)-f(x)
        B=B+(y-B*s)*(s/(s⋅s))'
        x=ξ
        iter+=1
    end
    ξ,iter
end

Broyden (generic function with 2 methods)

In [25]:
# Example 1
Broyden(f₁,J₁([-1.0,0.0]),[-1.0,0.0]), 
Broyden(f₁,J₁([1.0,1.5]),[1.0,1.5])

(([-1.1834670032419574, 1.586837142722924], 12), ([0.9999999999999917, 0.9999999999999969], 7))

In [26]:
# Explain behaviour the method when we set B₀=I
eye(n)=Matrix{Float64}(I,n,n)
Broyden(f₁,eye(2),[-1.0,0.0]), Broyden(f₁,eye(2),[1.0,1.5]),
Broyden(f₁,eye(2),[-1,1.5])

(([-0.035639080851908034, -2.5316602095531238], 100), ([0.9160215171939703, -3.045472257505457], 100), ([-1.183467003241957, 1.5868371427229242], 14))

In [27]:
# Example 2
x0=[-1.0,1]
x1=[0.8,1.2]
Broyden(f₂,J₂([-1.0,1]),[-1.0,1]), Broyden(f₂,J₂([0.8,1.2]),[0.8,1.2])

(([-0.7137474114864439, 1.2208868221896745], 9), ([1.0000000000000002, 0.9999999999999999], 9))

In [28]:
# Example 3
Broyden(f₃,J₃([-1.0,1,0]),[-1.0,1,0]), Broyden(f₃,J₃([0.5,-1.5,0]),[0.5,-1.5,0])

(([0.0, 5.965361234900634e-26, 0.0], 9), ([0.0, -1.0, 0.0], 8))

In [29]:
# Example 4
Broyden(g₄,(x->ForwardDiff.hessian(f₄,x))([-1.0,2]),[-1.0,2]), # ali
Broyden(g₄,(x->ForwardDiff.hessian(f₄,x))([1,2.0]),[-1.0,2]),
Broyden(g₄,(x->ForwardDiff.hessian(f₄,x))([0.8,0.5]),[0.8,0.5])

(([0.8340640648114012, 0.6950420895200371], 100), ([1.0000000000000009, 1.0000000000000018], 4), ([1.0000000000000073, 1.0000000000000144], 29))

In [30]:
# Example 5
x₀=[4.9,2.6,0.2]
x₅,iter₅=Broyden(g₅,(x->ForwardDiff.hessian(f₅,x))(x₀),x₀)

([18.90031560465981, 1.3239864192858994, 0.06655170834063737], 6)

In [31]:
g₅(x₅)

3-element Array{Float64,1}:
  1.8552699559685368e-29
 -6.235898383995474e-29
 -2.0734616070093272e-29

## Davidon-Fletcher-Powell (DFP) method

DFP is an optimization method which finds extremal points of the function 
$F:\mathbb{R}^n \to \mathbb{R}$, in which case $f(x)=\mathop{\mathrm{grad}}F(x)$.

Given initial approximation $x_0$ and matrix $H_0$, for $k=0,1,2,\ldots$, we compute:

\begin{align*}
s_k&=-H_k f(x_k)\\
\beta_k&=\mathop{\mathrm{arg\ min}}_\beta F(x_{k}+\beta s_k) \\
s_k&=\beta_k s_k\\
x_{k+1}&=x_{k}+s_k \\
y_k&=f(x_{k+1})-f(x_{k})\\
H_{k+1}&=H_k+ \frac{s_k s_k^T}{y_k\cdot s_k}-\frac{H_k y_k y_k^T H_k}{y_k\cdot (H_k y_k)}.
\end{align*}

We can take $H_0=I$. Notice that the iteration step does not require solving a system linear equations. Instead,
all updates are performed using $O(n^2)$ operations.

The one-dimensional minimzation along the line $x_{k}+\beta s_k$ is preformed by finding zeros of the directional derivative using bisection.

In [32]:
function Bisection(f::Function,a::T,b::T,ϵ::Float64=1e-10) where T
    fa=f(a)
    fb=f(b)
    x=T
    fx=T
    if fa*fb>zero(T)
        # return "Incorrect interval"
        if abs(fa)>abs(fb)
            return b,fb,0
        else
            return a,fa,0
        end
    end
    iter=0
    while b-a>ϵ && iter<1000
        x=(b+a)/2.0
        fx=f(x)
        if fa*fx<zero(T)
            b=x
            fb=fx
        else
            a=x
            fa=fx
        end
        iter+=1
        # @show x,fx
    end
    x,fx,iter
end

Bisection (generic function with 2 methods)

In [33]:
function DFP(f::Function,H::Matrix,x::Vector{T},ϵ::Float64=1e-10) where T
    iter=0
    s=ones(T,length(x))
    ξ=x
    while norm(s)>ϵ && iter<50
        s=-H*f(x)
        s0=s/norm(s)
        F(ζ)=f(x+ζ*s)⋅s0
        β,fx,iterb=Bisection(F,0.0,1.0,10*eps())
        s*=β
        ξ=x+s
        y=f(ξ)-f(x)
        z=H*y
        H=H+(s/(y⋅s))*s'-(z/(y⋅z))*z'
        x=ξ
        iter+=1
    end
    ξ,iter
end

DFP (generic function with 2 methods)

### Example 6

Let us find extremal points of the function 

$$
f(x,y)=(x+2y-7)^2+(2x+y-5)^2.
$$

The function has minimum at $(1,3)$.

In [34]:
f₆(x) = (x[1] + 2*x[2]-7)^2 + (2*x[1] + x[2]-5)^2

f₆ (generic function with 1 method)

In [35]:
f₆([1,2])

5

In [36]:
g₆(x)=ForwardDiff.gradient(f₆,x)

g₆ (generic function with 1 method)

In [37]:
DFP(g₆,eye(2),[0.8,2.7],eps())

([1.0, 3.0], 4)

In [38]:
# Example 4
DFP(g₄,eye(2),[0.9,1.1])

([1.0000000000000002, 1.0000000000000004], 9)

In [39]:
# Example 5
DFP(g₅,eye(3),[4.9,2.6,0.2])

([4.896114590499003, 46.19785538124626, 0.0015891269930557736], 25)

## Broyden-Fletcher-Goldfarb-Schano (BFGS) method

BFGS is an optimization method for finding extremal points of the function
$F:\mathbb{R}^n \to \mathbb{R}$, in which case $f(x)=\mathop{\mathrm{grad}}F(x)$.

The method is similar to the DFP method, with somewhat better convergence properties.

Let $F:\mathbb{R}^n \to \mathbb{R}$, whose minmum we seek, and let
$f(x)=\mathop{\mathrm{grad}} F(x)$.

Given initial approximation $x_0$ and matrix $H_0$, for $k=0,1,2,\ldots$, we compute:

\begin{align*}
s_k&=-H_k f(x_k)\\
\beta_k&=\mathop{\mathrm{arg\ min}}F(x_{k}+\beta_k s_k) \\
s_k&=\beta_k s_k\\
x_{k+1}&=x_{k}+s_k \\
y_k&=f(x_{k+1})-f(x_{k})\\
H_{k+1}&=\bigg(I-\frac{s_k y_k^T}{y_k\cdot s_k}\bigg)H_k
\bigg( I-\frac{y_k s_k^T}{y_k\cdot s_k}\bigg)+\frac{s_k s_k^T}{y_k\cdot s_k}.
\end{align*}

We can take $H_0=I$. Notice that the iteration step does not require solving a system linear equations. Instead,
all updates are performed using $O(n^2)$ operations.

The one-dimensional minimzation along the line $x_{k}+\beta s_k$ is preformed by finding zeros of the directional derivative using bisection.

In [40]:
function BFGS(f::Function,H::Matrix,x::Vector{T},ϵ::Float64=1e-10) where T
    iter=0
    s=ones(T,length(x))
    ξ=x
    while norm(s)>ϵ && iter<50
        s=-H*f(x)
        s0=s/norm(s)
        F(ζ)=f(x+ζ*s)⋅s0
        β,fx,iterb=Bisection(F,0.0,1.0,10*eps())
        s*=β
        ξ=x+s
        y=f(ξ)-f(x)
        z=H*y
        α=y⋅s
        s1=s/α
        H=H-s1*z'-z*s1'+s1*(y⋅z)*s1'+s1*s'
        x=ξ
        iter+=1
    end
    ξ,iter
end

BFGS (generic function with 2 methods)

In [41]:
BFGS(g₆,eye(2),[0.8,2.7],eps())

([1.0, 3.0], 4)

In [42]:
# Example 4
BFGS(g₄,eye(2),[0.9,1.1])

([1.0, 1.0000000000000002], 9)

In [43]:
# Example 5
BFGS(g₅,eye(3),[4.9,2.6,0.2])

([2.0890765836488763, 31226.563051597408, 0.0010005592043970898], 50)

## Julia packages

Previous programs are simple illustrative implementations of the mentioned algorithms.
Julia package [NLsolve.jl](https://github.com/JuliaNLSolvers/NLsolve.jl) is used for solving systems of linear equations and the package [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) is used for nonlinear optimization.

In [44]:
using NLsolve

In [45]:
# Example 1
function f₁!(fvec,x)
    fvec[1] = 2(x[1]+x[2])^2+(x[1]-x[2])^2-8
    fvec[2] = 5*x[1]^2+(x[2]-3)^2-9
end

f₁! (generic function with 1 method)

In [46]:
nlsolve(f₁!,[-1.0,0])

Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 0.0]
 * Zero: [-1.1834670032425283, 1.5868371427230779]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

In [47]:
nlsolve(f₁!,[0.5,1.1])

Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.5, 1.1]
 * Zero: [1.0000000000000002, 1.0000000000000002]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6

In [48]:
# Example 2
function f₂!(fvec,x)
    fvec[1] = x[1]^2+x[2]^2-2
    fvec[2] = exp(x[1]-1)+x[2]^3-2
end
nlsolve(f₂!,[-1.0,1]), nlsolve(f₂!,[0.8,1.2])

(Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0]
 * Zero: [-0.7137474114758742, 1.2208868222037403]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5, Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.8, 1.2]
 * Zero: [0.9999999999940328, 1.0000000000115203]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5)

In [49]:
# Example 3
function f₃!(fvec,x)
    fvec[1] = x[1]
    fvec[2] = x[2]^2+x[2]
    fvec[3] = exp(x[3])-1
end
nlsolve(f₃!,[-1.0,1.0,0.0]), nlsolve(f₃!,[1.0,1,1]),
nlsolve(f₃!,[-1.0,1,-10]), nlsolve(f₃!,[0.5,-1.5,0])

(Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0, 0.0]
 * Zero: [0.0, 2.3283064364709337e-10, 0.0]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6, Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [1.0, 1.0, 1.0]
 * Zero: [0.0, 2.3283064364709337e-10, 1.223235687436471e-12]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6, Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0, -10.0]
 * Zero: [0.0, 4.9978109751571114e-11, 8.131707812509303e-17]
 * Inf-norm of residuals: 0.000000
 * Iterations: 20
 * Convergence: 

In [50]:
using Optim



In [51]:
# Example 4
optimize(f₄,[-1.0,2],Optim.BFGS())

 * Status: success

 * Candidate solution
    Final objective value:     5.375030e-17

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 5.13e-09 ≰ 0.0e+00
    |x - x'|/|x'|          = 5.13e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 9.67e-17 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.80e+00 ≰ 0.0e+00
    |g(x)|                 = 2.10e-11 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    35
    f(x) calls:    102
    ∇f(x) calls:   102


In [52]:
# Example 5 - again there is no convergence
optimize(f₅,[4.9,2.6,0.2],Optim.BFGS())

 * Status: success

 * Candidate solution
    Final objective value:     3.572548e-12

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 1.87e+05 ≰ 0.0e+00
    |x - x'|/|x'|          = 8.70e-02 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.03e-16 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.69e-04 ≰ 0.0e+00
    |g(x)|                 = 9.69e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    41
    f(x) calls:    137
    ∇f(x) calls:   137
