# Line search methods

Consider $f \in C^2$. A descent method computes the iterates
$$
x_{k+1} = x_k + \alpha^* d_k
$$
where $d_k$ is a search direction and $\alpha^*$ approximately minimizes $f(x_k + \alpha d_k)$ with respect to the step length $\alpha$.
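
To make the role of $d_k$ explicit, the following standard first-order argument may help. It assumes $\alpha > 0$ and uses only the notation above. A direction $d_k$ is a descent direction at $x_k$ when
$$
\nabla f(x_k)^T d_k < 0,
$$
since the first-order expansion
$$
f(x_k + \alpha d_k) = f(x_k) + \alpha \nabla f(x_k)^T d_k + o(\alpha)
$$
then guarantees $f(x_k + \alpha d_k) < f(x_k)$ for all sufficiently small $\alpha > 0$. The steepest descent choice $d_k = -\nabla f(x_k)$ satisfies this condition whenever $\nabla f(x_k) \neq 0$.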

In [1]:
using Optim
using Plots

Several line search techniques are available in Julia, as described at https://github.com/JuliaNLSolvers/LineSearches.jl

Consider again the Rosenbrock example, whose mathematical expression is

$$
f(x,y) = (1-x)^2 + 100(y-x^2)^2
$$

Its gradient is

$$
\nabla f(x,y) =
\begin{pmatrix}
-2(1-x)-400x(y-x^2) \\
200(y-x^2)
\end{pmatrix}
$$

and its Hessian is

$$
\nabla^2 f(x,y) =
\begin{pmatrix}
2 - 400(y-x^2) + 800x^2 & -400x \\
-400x & 200
\end{pmatrix}
=
\begin{pmatrix}
2 - 400y + 1200x^2 & -400x \\
-400x & 200
\end{pmatrix}
$$

The minimizer is located at $(1,1)$. Indeed,
$$
\nabla f(1,1) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
$$
and
$$
\nabla^2 f(1,1) =
\begin{pmatrix}
802 & -400 \\ -400 & 200
\end{pmatrix}
$$
The leading principal minors are positive, since they equal 802 and $802\times200-400^2= 400$ respectively, so the Hessian matrix is positive definite.
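
The positive definiteness claim can also be checked numerically. The sketch below assumes only the standard library LinearAlgebra and the Hessian values given above. The variable names are illustrative.

```julia
using LinearAlgebra

# Hessian of the Rosenbrock function at (1, 1), copied from the text above
H = [802.0 -400.0; -400.0 200.0]

# Leading principal minors (Sylvester's criterion)
minor1 = H[1, 1]
minor2 = det(H)

# Independent checks: Cholesky-based test and eigenvalues
posdef = isposdef(H)
λ = eigvals(Symmetric(H))

println("leading minors: ", minor1, ", ", minor2)
println("isposdef: ", posdef, ", eigenvalues: ", λ)
```

The two eigenvalues differ by several orders of magnitude (roughly 0.4 and 1001.6), so the Hessian is positive definite but poorly conditioned near the minimizer.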

In [2]:
# Rosenbrock function
# Source: https://bitbucket.org/lurk3r/optim.jl

function rosenbrock(x::Vector)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_gradient!(storage::Vector, x::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

function rosenbrock_hessian!(storage::Matrix, x::Vector)
    storage[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2
    storage[1, 2] = -400.0 * x[1]
    storage[2, 1] = -400.0 * x[1]
    storage[2, 2] = 200.0
end

rosenbrock_hessian! (generic function with 1 method)

In [3]:
using Plots

default(size=(600,600), fc=:heat)
x, y = 0:0.02:1.0, 0:0.02:1.0
z = Surface((x,y)->rosenbrock([x,y]), x, y)
surface(x,y,z, linealpha = 0.3)

In [4]:
Plots.contour(x,y,z, linealpha = 0.1, levels=1600)

We can minimize it using the optimize function provided by the Optim.jl package.

In [None]:
res = optimize(rosenbrock, rosenbrock_gradient!,
               [20.0, 20.0],
               Optim.GradientDescent(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = true))

Iter     Function value   Gradient norm 
     0     1.444036e+07     3.040038e+06
 * time: 0.004014015197753906
     1     2.680958e+06     8.881503e+05
 * time: 0.5950589179992676
     2     6.986401e+02     4.430920e+03
 * time: 0.5972509384155273
     3     1.422712e+01     2.423975e+02
 * time: 0.5991179943084717
     4     1.241984e+01     7.750435e-01
 * time: 0.600794792175293
     5     1.241547e+01     1.200614e+01
 * time: 0.602463960647583
     6     1.241110e+01     8.544267e-01
 * time: 0.6040987968444824
     7     1.241103e+01     8.557174e-01
 * time: 0.6064949035644531
     8     1.241096e+01     8.544292e-01
 * time: 0.6080398559570312
     9     1.241089e+01     8.557152e-01
 * time: 0.6092000007629395
    10     1.241081e+01     8.544285e-01
 * time: 0.6119699478149414
    11     1.241074e+01     8.557130e-01
 * time: 0.6123499870300293
    12     1.241067e+01     8.544278e-01
 * time: 0.6125988960266113
    13     1.241060e+01     8.557108e-01
 * time: 0.6128399372

   119     1.240293e+01     8.555956e-01
 * time: 0.6556689739227295
   120     1.240285e+01     8.543922e-01
 * time: 0.6558599472045898
   121     1.240278e+01     8.555934e-01
 * time: 0.6560509204864502
   122     1.240271e+01     8.543915e-01
 * time: 0.6562387943267822
   123     1.240264e+01     8.555912e-01
 * time: 0.6564288139343262
   124     1.240256e+01     8.543909e-01
 * time: 0.6566169261932373
   125     1.240249e+01     8.555891e-01
 * time: 0.6568069458007812
   126     1.240242e+01     8.543902e-01
 * time: 0.6570010185241699
   127     1.240235e+01     8.555869e-01
 * time: 0.6571907997131348
   128     1.240227e+01     8.543896e-01
 * time: 0.6573808193206787
   129     1.240220e+01     8.555848e-01
 * time: 0.657585859298706
   130     1.240213e+01     8.543889e-01
 * time: 0.65777587890625
   131     1.240206e+01     8.555826e-01
 * time: 0.6579668521881104
   132     1.240198e+01     8.543882e-01
 * time: 0.6581549644470215
   133     1.240191e+01     8.555805e

In [None]:
res = optimize(rosenbrock, rosenbrock_gradient!,
               [20.0, 20.0],
               Optim.BFGS(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = true))
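
The calls above rely on the default line search of each solver. As a sketch of how a technique from LineSearches.jl might be selected instead, the block below passes `LineSearches.BackTracking()` through the `linesearch` keyword of `Optim.BFGS`. The names `rosen`, `rosen_grad!` and `res_bt` are local to this example, and the starting point `[0.0, 0.0]` and tolerance are arbitrary choices.

```julia
using Optim, LineSearches

# Local copies of the Rosenbrock function and its gradient,
# so that this sketch is self-contained
rosen(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

function rosen_grad!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    G[2] = 200.0 * (x[2] - x[1]^2)
    return G
end

# BFGS with a backtracking line search instead of the default one
res_bt = optimize(rosen, rosen_grad!, [0.0, 0.0],
                  Optim.BFGS(linesearch = LineSearches.BackTracking()),
                  Optim.Options(g_tol = 1e-8))

println(Optim.minimizer(res_bt))
println(Optim.iterations(res_bt))
```

Other line searches from the package can be swapped in the same way, which makes it easy to compare iteration counts on the same problem.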

In [None]:
using BenchmarkTools

@btime res = optimize(rosenbrock, rosenbrock_gradient!,
                   [0.0, 0.0], Optim.BFGS(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = false))

In [None]:
iter = Optim.trace(res)

## Differentiation in Julia

Computing the gradient and Hessian of complicated, and sometimes even simple, functions can be tedious. To alleviate this burden, one can use numerical derivatives or automatic differentiation.

### Numerical derivatives

Numerical differentiation functions are provided by the Calculus package, as illustrated below.

In [None]:
using Calculus, LinearAlgebra
rg = Calculus.gradient(rosenbrock)

Let's evaluate the newly constructed gradient function at the solution [1,1].

In [None]:
gsol = rg([1,1])

The result is close to zero, but approximation errors remain. They can prevent convergence to the right solution, or at least degrade the accuracy of the solution, as the norm of the computed gradient shows.

In [None]:
norm(gsol)
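
To see where this error comes from, here is a minimal hand-written central difference scheme. It is an illustrative sketch, not the implementation used by the Calculus package. The helper `central_gradient`, its step size `h`, and the test point are assumptions made for this example.

```julia
using LinearAlgebra

# Local copies of the Rosenbrock function and its analytic gradient
rosen(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosen_grad(x) = [-2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1],
                 200.0 * (x[2] - x[1]^2)]

# Central finite differences: (f(x + h e_i) - f(x - h e_i)) / (2h)
function central_gradient(f, x::Vector{Float64}; h::Float64 = cbrt(eps()))
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2h)
    end
    return g
end

x = [-1.2, 1.0]
err = norm(central_gradient(rosen, x) - rosen_grad(x))
gnorm_at_min = norm(central_gradient(rosen, [1.0, 1.0]))

println("finite-difference error at $x: ", err)
println("approximate gradient norm at the minimizer: ", gnorm_at_min)
```

With this step size the approximate gradient norm at $(1,1)$ is expected to be small but nonzero, well above a tolerance such as `g_tol = 1e-12`, so a solver fed with such a gradient may be unable to certify convergence at that tolerance.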

In [None]:
storage = [0.0,0.0]
function rg!(storage::Vector, x::Vector)
    # Copy the finite-difference gradient into the preallocated storage
    copyto!(storage, rg(x))
end

In [None]:
storage

In [None]:
@btime res = optimize(rosenbrock, rg!,
               [0.0, 0.0],
               Optim.BFGS(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = false))

### Automatic differentiation

In [None]:
using Pkg
Pkg.add("ForwardDiff")

In [None]:
using ForwardDiff

g = x -> ForwardDiff.gradient(rosenbrock, x);
H = x -> ForwardDiff.hessian(rosenbrock, x)

function g!(storage::Vector, x::Vector)
    # Copy the automatically differentiated gradient into the preallocated storage
    copyto!(storage, g(x))
end

In [None]:
g([1.0,1.0])

In [None]:
res = optimize(rosenbrock, g!,
               [0.0, 0.0],
               Optim.BFGS(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = false))

In [None]:
@btime res = optimize(rosenbrock, g!,
               [0.0, 0.0],
               Optim.BFGS(),
               Optim.Options(g_tol = 1e-12,
                             store_trace = true,
                             show_trace = false))
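
The wrapper `g!` above allocates a temporary vector on every call. As a possible alternative, ForwardDiff provides an in-place variant, `ForwardDiff.gradient!`, that writes directly into the storage. The sketch below assumes the ForwardDiff package is installed. The names `rosen` and `rosen_ad_grad!` are local to this example.

```julia
using ForwardDiff

# Local copy of the Rosenbrock function, written generically so that
# ForwardDiff can evaluate it on dual numbers
rosen(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# In-place gradient: ForwardDiff writes directly into `storage`
function rosen_ad_grad!(storage, x)
    ForwardDiff.gradient!(storage, rosen, x)
    return storage
end

storage = zeros(2)
rosen_ad_grad!(storage, [-1.2, 1.0])
println(storage)
```

Unlike finite differences, the result agrees with the analytic gradient up to rounding, which is why tight tolerances remain attainable with automatic differentiation.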

## Newton method

$$
x_{k+1} = x_k-\nabla^2 f(x_k)^{-1} \nabla f(x_k)
$$
or, equivalently, as the linear system
$$
\nabla^2 f(x_k) x_{k+1} = \nabla^2 f(x_k) x_k- \nabla f(x_k)
$$
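
As a worked illustration, the first two Newton iterations on the Rosenbrock function can be carried out by hand. The starting point $x_0 = (0,0)$ is chosen here for convenience. At $x_0$,
$$
\nabla f(0,0) = \begin{pmatrix} -2 \\ 0 \end{pmatrix},
\qquad
\nabla^2 f(0,0) = \begin{pmatrix} 2 & 0 \\ 0 & 200 \end{pmatrix},
\qquad
x_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} - \begin{pmatrix} -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
$$
At $x_1$,
$$
\nabla f(1,0) = \begin{pmatrix} 400 \\ -200 \end{pmatrix},
\qquad
\nabla^2 f(1,0) = \begin{pmatrix} 1202 & -400 \\ -400 & 200 \end{pmatrix},
$$
and the system $\nabla^2 f(1,0)\, s = \nabla f(1,0)$ has solution $s = (0,-1)^T$, so that
$$
x_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
$$
The minimizer is reached in two iterations, but the objective is not monotone along the way: $f(x_0) = 1$, $f(x_1) = 100$, $f(x_2) = 0$. This lack of guaranteed decrease is one reason to combine the Newton direction with a line search.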


In [None]:
function Newton(f::Function, g::Function, h::Function,
        xstart::Vector, verbose::Bool = false,
        δ::Float64 = 1e-6, nmax::Int64 = 1000)

    k = 1
    x = xstart
    n = length(x)
    δ2 = δ*δ
    H = zeros(n,n)+I
    dfx = ones(n)

    if (verbose)
        fx = f(x)
        println("$k. x = $x, f(x) = $fx")
    end

    g(dfx, x)

    # Stop when the squared gradient norm is below δ² or after nmax iterations
    while (dot(dfx,dfx) > δ2 && k <= nmax)
        k += 1
        h(H,x)
        # Solve H s = dfx, then x_{k+1} = x_k - s
        x -= H\dfx
        # Refresh the gradient at the new iterate for the stopping test
        g(dfx,x)
        if (verbose)
            fx = f(x)
            println("$k. x = $x, f(x) = $fx ")
        end
    end

    # Return the final iterate
    return x
end

In [None]:
Newton(rosenbrock, rosenbrock_gradient!, rosenbrock_hessian!, [-100.0,100.0], true)

## Implementation of a line search algorithm

A very basic skeleton of a line search implementation follows.

In [None]:
function ls(f::Function, g::Function, h::Function,
        x0::Vector,
        direction::Function, steplength::Function,
        δ::Float64 = 1e-8, nmax::Int64 = 1000)

    k = 0
    x = x0
    δ2 = δ*δ
    n = length(x)

    dfx = ones(n)

    g(dfx, x)

#    println("$k. $x")

    # Iterate until the squared gradient norm drops below δ² or nmax is exceeded
    while (dot(dfx,dfx) > δ2 && k <= nmax)
        # Compute the search direction
        d, dfx = direction(f,g,h,x)
        # Compute the step length along d
        α = steplength(f,dfx,x,d)
        # Update the iterate
        x += α*d
        k += 1
        # Refresh the gradient so the stopping test uses the new iterate
        g(dfx, x)
#        println("$k. $x")
    end

    # Return the final iterate
    return x
end

In [None]:
constantStep(f::Function, dfx:: Vector, x:: Vector, d::Vector) = 1

In [None]:
function direction(f::Function, g:: Function, h:: Function, x::Vector)
    n = length(x)
    df = ones(n)
    H = zeros(n,n)+I
    g(df,x)
    h(H,x)
    # Newton direction: solve H d = -∇f(x); the gradient is returned as well
    return -H\df, df
end

In [None]:
ls(rosenbrock, rosenbrock_gradient!, rosenbrock_hessian!,
    [0.0,0.0], direction, constantStep)

In [None]:
function ArmijoStep(f::Function, dfx::Vector, x::Vector, d::Vector,
    αmax::Float64 = 1.0, β::Float64 = 0.1, κ::Float64 = 0.2)

    # β scales the required decrease, κ is the backtracking factor
    s = β*dot(dfx,d)
    α = αmax

    fx = f(x)
    fxcand = f(x+α*d)

    # Backtrack until the sufficient decrease condition holds
    while (fxcand > fx+α*s)
        α *= κ
        fxcand = f(x+α*d)
    end

    return α
end
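
In the notation of `ArmijoStep`, the rule being implemented is the sufficient decrease (Armijo) condition, stated here in its usual form. The step length is the first value $\alpha = \alpha_{\max} \kappa^j$, $j = 0, 1, 2, \ldots$, such that
$$
f(x_k + \alpha d_k) \leq f(x_k) + \alpha \beta \nabla f(x_k)^T d_k,
\qquad 0 < \beta < 1, \quad 0 < \kappa < 1.
$$
When $d_k$ is a descent direction, $\nabla f(x_k)^T d_k < 0$, so the right-hand side lies below $f(x_k)$ and the condition holds for all sufficiently small $\alpha > 0$. If $d_k$ is not a descent direction, the backtracking loop has no guarantee of terminating.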

In [None]:
@btime ls(rosenbrock, rosenbrock_gradient!, rosenbrock_hessian!,
          [0.0,0.0], direction, ArmijoStep)

In [None]:
function direction3(f::Function, g:: Function, h:: Function, x::Vector)
    n = length(x)
    df = ones(n)
    H = zeros(n,n)+I
    g(df,x)
    h(H,x)
    # Keep only the diagonal of the Hessian (diagonal approximation)
    H[1,2] = H[2,1] = 0.0
    return -H\df, df
end

In [None]:
ls(rosenbrock, rosenbrock_gradient!, rosenbrock_hessian!,
    [0.0,0.0], direction3, ArmijoStep)