# Nonlinear systems

## Newton method

Optimisation:
$$
x_{k+1} = x_k-\nabla^2 f(x_k)^{-1} \nabla f(x_k)
$$
or
$$
\nabla^2 f(x_k) x_{k+1} = \nabla^2 f(x_k) x_k- \nabla f(x_k)
$$
It is equivalent to search $x_k$ such that $\nabla f(x_k) = 0$.

The Newton method can be also applied to more general nonlinear system.

In [1]:
using LinearAlgebra

In [2]:
function NLSNewton(g::Function, h:: Function,
        xstart::Vector, δ::Float64 = 1e-4, nmax::Int64 = 1000)

    k = 1
    x = xstart
    n = length(x)
    δ2 = δ*δ
    H = zeros(n,n)
    dfx = ones(n)
    
    g(x, dfx)

    while (dot(dfx,dfx) > δ2 && k <= nmax)
        k += 1
        g(x,dfx)
        h(x,H)
        # println(x, dfx, H)
        # Hs = dfx, x_{k+1} = x_k - s
        x -= H\dfx  # x = x - s
    end
    
    return x
end

NLSNewton (generic function with 3 methods)

Consider for instance the nonlinear system
\begin{align*}
    4x+2y+2xz-10 &= 0 \\
    2x+y+yz-6 &= 0 \\
    x^2 + y^2 - 5 &= 0
\end{align*}
We first implement the correponding mapping.

In [3]:
function g(x::Vector, d::Vector)
    d[1] = 4*x[1]+2*x[2]+2*x[1]*x[3]-10
    d[2] = 2*x[1]+x[2]+x[2]*x[3]-6
    d[3] = x[1]*x[1]+x[2]*x[2]-5    
end

g (generic function with 1 method)

We can easily check that (1,2,1) satisfies this system.

In [4]:
d = zeros(3)
x = [ 1; 2; 1]
h = g(x, d)

0

Jacobian
$$
\begin{pmatrix}
    4+2z & 2 & 2x \\
    2 & 1+2z & y \\
    2x & 2y & 0
\end{pmatrix}
$$

In [5]:
function J(x::Vector, H::Matrix)
    H[1,1] = 4+2*x[3]
    H[2,2] = 2+2*x[3]
    H[3,3] = 0.0
    H[1,2] = H[2,1] = 2
    H[1,3] = H[3,1] = 2*x[1]
    H[2,3] = x[2]
    H[3,2] = 2*x[2]
end

J (generic function with 1 method)

In [6]:
NLSNewton(g, J, x)

3-element Array{Int64,1}:
 1
 2
 1

In [7]:
x0 = [1,1,1.0]

3-element Array{Float64,1}:
 1.0
 1.0
 1.0

In [8]:
NLSNewton(g, J, x0)

3-element Array{Float64,1}:
 0.9999968216652463
 2.000001589214705 
 1.000007945411086 

The Newton method to solve a nonlinear however suffers from the same issues than in the optimization context. Its convergence is guaranteed only if the starting point is in a neighborhood of the solution. The recurence can even be not well defined.

In [9]:
NLSNewton(g, J, [0, 0 ,0.0])

SingularException: SingularException(3)

## Example 2

Solve the problem
\begin{align*}
x^3 + y &= 1 \\
y^3 − x &= −1.
\end{align*}

The mapping of the system can be defined by

In [10]:
function g2(x::Vector, d::Vector)
    d[1] = x[1]^3+x[2]-1
    d[2] = x[2]^3-x[1]+1
    return d
end

g2 (generic function with 1 method)

In [11]:
d = [1.0, 1.0]
x = [1.0, 0]

2-element Array{Float64,1}:
 1.0
 0.0

In [12]:
g2(x, d)
d

2-element Array{Float64,1}:
 0.0
 0.0

In [13]:
function J2(x::Vector, H::Matrix)
    H[1,1] = 3*x[1]^2
    H[1,2] = 1
    H[2,1] = -1
    H[2,2] = 3*x[2]^2
    return H
end

J2 (generic function with 1 method)

In [14]:
H = [0 0; 0 0]
J2(x,H)

2×2 Array{Int64,2}:
  3  1
 -1  0

In [15]:
x = [1.0, 1.0]
NLSNewton(g2, J2, x)

2-element Array{Float64,1}:
 0.9999999999999971    
 1.8263043495851585e-12

Perform the first iteration by hand. We have to solve the system
$$
\begin{pmatrix}
3*1^2 & 1 \\ -1 & 3*1^2
\end{pmatrix}
\begin{pmatrix}
d_1 \\ d_2
\end{pmatrix}
=
\begin{pmatrix}
1^1+1-1 \\ 1^3-1+1
\end{pmatrix}
$$
or
$$
\begin{pmatrix}
3 & 1 \\ -1 & 3
\end{pmatrix}
\begin{pmatrix}
d_1 \\ d_2
\end{pmatrix}
=
\begin{pmatrix}
1 \\ 1
\end{pmatrix}
$$
Equivalently, we have to solve
\begin{align*}
3d_1 + d_2 &= 1 \\
-d_1 + 3d_2 &= 1 \\
\end{align*}
We deduce from it that
$$
10 d_2 = 4
$$
or
$$
d_2 = 0.4
$$
and therefore
$$
d_1 = 0.2
$$
We can verify this result by numerically solving the corresponding linear system

In [16]:
A = [3 1 ; -1 3]
b = [1 ; 1]
A\b

2-element Array{Float64,1}:
 0.20000000000000004
 0.39999999999999997

Therefore, the Newton iteration gives
$$
\begin{pmatrix}
x_{k+1} \\ y_{k+1}
\end{pmatrix}
=
\begin{pmatrix}
1 \\ 1
\end{pmatrix}
-
\begin{pmatrix}
0.2 \\ 0.4
\end{pmatrix}
=
\begin{pmatrix}
0.8 \\ 0.6
\end{pmatrix}
$$

## Example 3

\begin{align*}
a+b+c &= 6 \\
a^2+b^2+c^2 &= 14 \\
a^3+b^3+c^3 &= 36
\end{align*}

In [17]:
function g3(x::Vector, d::Vector)
    for i = 1:3
        d[i] = x[1]^i+x[2]^i+x[3]^i
    end
    d[1] -= 6
    d[2] -= 14
    d[3] -= 36
    return d
end

g3 (generic function with 1 method)

In [18]:
function J3(x::Vector, H::Matrix)
    for i = 1:3
        for j = 1:3
            H[i,j] = i*x[j]^(i-1)
        end
    end
    return H
end

J3 (generic function with 1 method)

In [19]:
x = [1.0, 1.0, 1.0]
NLSNewton(g3, J3, x)

SingularException: SingularException(2)

In [20]:
x = [2.0, 1.0, 0.0]
NLSNewton(g3, J3, x)

3-element Array{Float64,1}:
 2.9999999999999996
 0.9999999999999994
 2.0000000000000013

## Application to KKT conditions

Consider the linearly constrained problem
\begin{align*}
\min\ & f(x) \\
\mbox{t.q. } & Ax = b
\end{align*}
where $A \in \mathbb{R}^{m \times n}$.

The Lagrangian of this problem is
$$
L(x,\mu) = f(x) + \sum_{i = 1}^m \mu_i(a_i^Tx-b_i)
$$
where $a_i$ designs the $i^{th}$ row of the matrix and $b_i$ the $i^{th}$ element of $b$.

The KKT conditions of this problem are
\begin{align*}
\nabla f(x) + A^T\mu &= 0 \\
Ax - b &= 0
\end{align*}
This is a nonlinear system that can be solved using the Newton method.

Consider for instance the problem
$$
\min f(x,y) = -10x^2+10y^2+4\sin(xy)-2x+x^4
$$

In [109]:
f = x -> -10*x[1]^2+10*x[2]^2+4*sin(x[1]*x[2])-2*x[1]+x[1]^4
function ∇f(x:: Vector, g:: Vector)
    g[1] = -20*x[1]+4*x[2]*cos(x[1]*x[2])-2+4*x[1]^3
    g[2] = 20*x[2]+4*x[1]*cos(x[1]*x[2])
    return g
end
function Hess(x:: Vector, H:: Matrix)
    H[1,1] = -20-4*x[2]^2*sin(x[1]*x[2])+12*x[1]^2
    H[2,1] = H[1,2] = 4*cos(x[1]*x[2])-4*x[1]*x[2]*sin(x[1]*x[2])
    H[2,2] = 20-4*x[1]^2*sin(x[1]*x[2])
    return H
end

Hess (generic function with 1 method)

In [129]:
using ForwardDiff

gr = x -> ForwardDiff.gradient(f, x);
He = x -> ForwardDiff.hessian(f, x)

function gr!(x::Vector, storage::Vector)
    s = gr(x)
    storage[1:length(s)] = s[1:length(s)]
end

function He!(x::Vector, storage::Matrix)
    s = He(x)
    n, m = size(s)
    storage[1:n,1:m] = s[1:n,1:m]
end

He! (generic function with 1 method)

In [130]:
f(x)

-30.26405504009334

In [131]:
x = [2.0; -3.0]

2-element Array{Float64,1}:
  2.0
 -3.0

In [132]:
grad = zeros(2)
∇f(x,grad)

2-element Array{Float64,1}:
 -21.522043439804392
 -52.31863770679707 

In [133]:
gr!(x,grad)

2-element Array{Float64,1}:
 -21.522043439804392
 -52.31863770679707 

In [134]:
hess = zeros(2,2)
Hess(x,hess)

2×2 Array{Float64,2}:
 17.941   10.5467
 10.5467  15.5294

In [135]:
He!(x,hess)
hess

2×2 Array{Float64,2}:
 17.941   10.5467
 10.5467  15.5294

In [136]:
sol = NLSNewton(∇f, Hess, x)

2-element Array{Float64,1}:
 -0.09632573825923413 
  0.019265114479965564

In [137]:
sol = NLSNewton(gr!, He!, x)

2-element Array{Float64,1}:
 -0.09632573825923413 
  0.019265114479965564

In [138]:
∇f(sol, grad)

2-element Array{Float64,1}:
 -3.5158508049359938e-15
  5.551115123125783e-17 

In [139]:
Hess(sol, hess)

2×2 Array{Float64,2}:
 -19.8887    3.99998
   3.99998  20.0001 

We have identifies a saddle point!

Let's start with another starting point. 

In [140]:
sol = [-2.21022, 0.329748]
∇f(sol, grad)

2-element Array{Float64,1}:
 -1.91503605222465e-5  
 -1.5587288675789068e-5

In [141]:
x = [-10; -10.0]
sol = NLSNewton(gr!, He!, x)

2-element Array{Float64,1}:
 -2.210219520077777 
  0.3297484569954491

In [142]:
sol = NLSNewton(∇f, Hess, x)

2-element Array{Float64,1}:
  2.3066301277034658 
 -0.33230864873179117

In [143]:
sol = [2.30663, -0.332309]
∇f(sol, grad)

2-element Array{Float64,1}:
 -5.903555937436522e-6 
 -1.2307122703170137e-5

In [144]:
x = [2.5, -0.3]
sol = NLSNewton(gr!, He!, x)

2-element Array{Float64,1}:
  2.306630127703466 
 -0.3323086487317913

In [126]:
∇f(sol, grad)

2-element Array{Float64,1}:
  2.842170943040401e-14
 -5.329070518200751e-15

Consider now the constrained program
\begin{align*}
\min\ & f(x,y) = -10x^2+10y^2+4\sin(xy)-2x+x^4 \\
\text{s.t. } & 0.1x+y=1
\end{align*}
The Lagrangien is now
$$
L(x,y,\mu) = -10x^2+10y^2+4\sin(xy)-2x+x^4 + \mu(0.1x+y-1)
$$
and the KKT conditions are
\begin{align*}
-20x+4y\cos(xy)-2+4x^3 + 0.1\mu &= 0 \\
20y+4x\cos(xy) + \mu &= 0 \\
0.1x+y-1 &= 0
\end{align*}

In order to solve this system, develop the Jacobian matrix
$$
J(x,y,\mu) =
\begin{pmatrix}
-20 - 4y\sin(xy)+12x^2 & 4\cos(xy)-4xy\cos(xy) & 0.1 \\
4\cos(xy)-4xy\cos(xy) & 20 - 4x^2\sin(xy) & 1 \\
0.1 & 1 & 0
\end{pmatrix}
$$

Let's implement these operators.

In [146]:
function L!(x, L)
    L[1] = -20*x[1]+4*x[2]*cos(x[1]*x[2])-2+4*x[1]^3+0.1*x[3]
    L[2] = 20*x[2]+4*x[1]*cos(x[1]*x[2])+x[3]
    L[3] = 0.1*x[1]+x[2]-1
    return L
end

function J!(x, J)
    J[1,1] = -20-4*x[2]^2*sin(x[1]*x[2])+12*x[1]^2
    J[2,1] = J[1,2] = 4*cos(x[1]*x[2])-4*x[1]*x[2]*sin(x[1]*x[2])
    J[2,2] = 20-4*x[1]^2*sin(x[1]*x[2])
    J[3,3] = 0
    J[1,3] = J[3,1] = 0.1
    J[2,3] = J[3,2] = 1
    return J
end

J! (generic function with 1 method)

In [147]:
x = [2.5, -0.3, 1.0]
sol = NLSNewton(L!, J!, x)

3-element Array{Float64,1}:
   2.3298897130224083
   0.7670110286977592
 -13.340493481573121 

In [148]:
x = [-2.5, -2.5, 1.0]
sol = NLSNewton(L!, J!, x)

3-element Array{Float64,1}:
  -1.9843163000402988
   1.19843163000403  
 -29.702536010170014 

We have found two solutions to the KKT problem.

Note that in this example, we could also have used the linear constraint to eliminate one variable in the objective function, and only have to solve a one-dimensional program.

Two difficulties remain.

Firstly, we have to globalize the method, i.e. ensures that it converges from any starting point.

Secondly, in order to deal with inequality constraints, we have to be able to determine the active set at the solution.