**1. Optimization** Here I write some code to practice optimization techniques. This is fundamental for my understanding of the programming languages, so I need to invest a lot of time in it this year, either with Python or with Julia. I think it is useful to check both.

**1.1 Gradient descent method** This is quite intuitive: we follow the gradient until it is close to 0. In all cases, what we want is a routine or algorithm that, for a given function $F(x)$, iterates $x_{k+1}= x_k + \lambda \delta_k$. This works well for local optima, though convergence is slow. Here $\delta_k=-F'(x_k)$, the negative gradient, when we minimize (and $\delta_k=+F'(x_k)$ when we maximize), and the step size $\lambda>0$ is set arbitrarily; it is not yet endogenized in the algorithm.
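A minimal sketch of this update rule on a one-dimensional toy problem, just to see the mechanics. The function $f(x)=(x-3)^2$, the name `gradient_descent_1d`, and the values of `λ`, `tol` and `maxiter` are illustrative assumptions, not taken from the class notes. Since we minimize here, the step goes against the gradient.

```julia
# Toy problem (assumption for illustration): minimize f(x) = (x - 3)^2
fprime(x) = 2 * (x - 3)              # analytical derivative of f

function gradient_descent_1d(fprime, x0; λ=0.1, tol=1e-10, maxiter=10_000)
    x = x0
    for k in 1:maxiter
        x_new = x - λ * fprime(x)    # x_{k+1} = x_k - λ F'(x_k)
        if abs(x_new - x) < tol      # stop once the step is negligible
            return x_new, k
        end
        x = x_new
    end
    return x, maxiter                # iteration cap reached without meeting tol
end

xmin, niter = gradient_descent_1d(fprime, 0.0)
println("x ≈ ", round(xmin; digits=6), " after ", niter, " iterations")
```

With a fixed `λ` the distance to the minimizer shrinks by a constant factor each iteration (here $1-2\lambda$), which is the slow, linear convergence mentioned above.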

Let's first try to replicate the first example given in class. We want to minimize the function $F(x,y)=(1-x)^2 + 100(y-x^{2})^{2}$. The analytical solution is easy to find: it is simply $(x,y)=(1,1)$. Let's see if our method delivers this same solution.
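The analytical solution can be checked from the first-order conditions. A short derivation, written for a generic weight $c>0$ on the second term (the code below uses $c=100$; the symbol $c$ is introduced only for this check):

$$
\begin{aligned}
F(x,y) &= (1-x)^2 + c\,(y-x^2)^2, \\
\frac{\partial F}{\partial y} &= 2c\,(y-x^2) = 0 \;\Longrightarrow\; y = x^2, \\
\frac{\partial F}{\partial x} &= -2(1-x) - 4c\,x\,(y-x^2) = 0 \;\Longrightarrow\; x = 1 \quad \text{(using } y = x^2\text{)},
\end{aligned}
$$

so $(x,y)=(1,1)$, where $F=0$. Since $F\ge 0$ everywhere, this is the global minimum for any $c>0$.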


In [15]:
using ForwardDiff

F(x)= (1-x[1])^2 + 100*(x[2]-x[1]^2)^2
J= x->ForwardDiff.gradient(F,x)
λ=0.0001

x0=[0.0,0.0]

maxiter=10_000
crit=1.0
tol=1e-8
iter=0

while crit>tol
    iter+=1
    x1= x0- λ*J(x0)             # descent step: x_{k+1} = x_k - λ ∇F(x_k)
    crit=maximum(abs.(x1-x0))   # sup-norm of the step just taken
    x0= copy(x1)
    if crit<=tol
        println(round.(x0; digits=6))
    end
end



Now let's try another function with multiple solutions, and see whether we can write an algorithm that uses this method to find all of them. As I said before, one of the main limitations of this method is that it only detects local optima: which maximum it finds depends on the starting point.

In [None]:
using ForwardDiff

G(x)= -(x[1]^2 - 1)^2 - (x[2]^2 - 1)^2 ## quartic with maxima at (±1, ±1)
J= x->ForwardDiff.gradient(G,x)

x0=[0.0,0.0]
λ=0.01
crit=1.0
tol=1e-8

maxiter = 10_000

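# (0,0) is a stationary point (zero gradient), so the start must be perturbed
if maximum(abs.(J(x0))) < 1e-12
    x0 .+= 1 .* randn(2)   # random perturbation with unit scale
end

iter=0                     # reset the counter so this cell does not depend on the previous one
while crit > tol && iter < maxiter
    iter +=1
    x1=x0+λ*J(x0)               # ascent step, since we maximize G
    crit=maximum(abs.(x1-x0))
    x0=copy(x1)
end

println(round.(x0; digits=6))   # one of the maxima (±1, ±1)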

##This is an interesting problem because (0,0) is a stationary point (the gradient is zero there; it is a local minimum of G), so we need to perturb the initial point. The method then converges to a local maximum, and depending on the perturbation we find one of 4 different maxima: (1,1), (1,-1), (-1,1), (-1,-1). To find them all I need to randomize the direction of the perturbation.
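The idea of randomizing the perturbation can be sketched as a simple multi-start loop. This is only an illustration under my own assumptions: the helper names `gradG` and `ascend`, the perturbation scale `0.5`, the number of restarts and the seed are all hypothetical choices, and the gradient is coded by hand so the sketch needs no packages.

```julia
using Random

# Hand-coded gradient of G(x) = -(x1^2 - 1)^2 - (x2^2 - 1)^2
gradG(x) = [-4 * x[1] * (x[1]^2 - 1), -4 * x[2] * (x[2]^2 - 1)]

# Plain gradient ascent from a given starting point
function ascend(x0; λ=0.01, tol=1e-10, maxiter=100_000)
    x = copy(x0)
    for _ in 1:maxiter
        step = λ * gradG(x)
        x = x + step
        if maximum(abs.(step)) < tol   # stop once the step is negligible
            break
        end
    end
    return x
end

Random.seed!(1234)                     # arbitrary seed, only for reproducibility
found = Set{Tuple{Float64,Float64}}()
for _ in 1:50
    start = 0.5 .* randn(2)            # random perturbation around (0, 0)
    xstar = ascend(start)
    push!(found, Tuple(round.(xstar; digits=4)))
end
println(sort(collect(found)))          # the distinct maxima reached
```

Each start lands in the maximum of its own quadrant, so with enough random restarts the set `found` should collect all four maxima.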

#So bottom line: it is important to understand the class of functions and the methods we have at hand to deal with the particular problem efficiently. 

**1.2 Newton method** This is a better method, where we endogenize lambda and omega: the step is computed from the Hessian instead of being set by hand.
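One way to see where the Newton step comes from, sketched under the assumption that $F$ is twice differentiable and the Hessian $H(x_k)$ is invertible (here $d$ is just a name for the step):

$$
\begin{aligned}
F(x_k + d) &\approx F(x_k) + F'(x_k)^{\top} d + \tfrac{1}{2}\, d^{\top} H(x_k)\, d, \\
0 &= F'(x_k) + H(x_k)\, d \;\Longrightarrow\; d = -H(x_k)^{-1} F'(x_k), \\
x_{k+1} &= x_k - H(x_k)^{-1} F'(x_k).
\end{aligned}
$$

So the Hessian plays the role that the hand-picked step size had in gradient descent.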

In [None]:
using ForwardDiff
F(x)= (1-x[1])^2 + 100*(x[2]-x[1]^2)^2
J= x->ForwardDiff.gradient(F,x)
H= x->ForwardDiff.hessian(F,x)

x0=[0.0,0.0]
crit=1
tol= 1e-8

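while crit>tol
    x1 = x0 - H(x0) \ J(x0)     ## interesting function: left division solves H*d = J without forming the inverse, more efficient!
    crit=maximum(abs.(J(x0)))   # gradient size at the point we just left
    x0=copy(x1)
end
println(round.(x0; digits=6))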

#This is faster than before, but it needs a twice-differentiable function, which is not always easy to have. Also, it is still a method that uses only local approximations.
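To make the speed comparison concrete, here is a small sketch that counts Newton iterations on the same function. It is not the cell above: the derivatives are coded by hand so that no package is needed, and the names `grad_rosen`, `hess_rosen` and `newton_rosen`, as well as the `maxiter` cap, are assumptions made for this illustration.

```julia
using LinearAlgebra   # standard library, loaded explicitly for the `\` solve

# Hand-coded gradient of F(x) = (1 - x1)^2 + 100 (x2 - x1^2)^2
function grad_rosen(x)
    g1 = -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2)
    g2 = 200 * (x[2] - x[1]^2)
    return [g1, g2]
end

# Hand-coded Hessian of the same function
function hess_rosen(x)
    h11 = 2 - 400 * (x[2] - x[1]^2) + 800 * x[1]^2
    h12 = -400 * x[1]
    return [h11 h12; h12 200.0]
end

function newton_rosen(x0; tol=1e-8, maxiter=50)
    x = copy(x0)
    for k in 1:maxiter
        g = grad_rosen(x)
        if maximum(abs.(g)) < tol
            return x, k - 1               # number of Newton steps actually taken
        end
        x = x - hess_rosen(x) \ g         # Newton step via a linear solve
    end
    return x, maxiter                     # cap reached without meeting tol
end

xN, nN = newton_rosen([0.0, 0.0])
println("x ≈ ", round.(xN; digits=6), " after ", nN, " Newton steps")
```

Working the steps by hand from $(0,0)$ gives $(0,0)\to(1,0)\to(1,1)$, so for this starting point Newton needs only a couple of steps, while gradient descent with a small fixed `λ` needs a very large number of iterations.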

In [1]:
##Finally, we can directly apply the Optim optimization package in Julia.

using Optim
F = x->(1-x[1])^2+100*(x[2]-x[1]*x[1])^2 
x0 = [0.0,0.0]
result = optimize(F, x0, BFGS(),
                   Optim.Options(g_tol = 1e-12);
                   autodiff=:forward)

 * Status: success

 * Candidate solution
    Final objective value:     1.429810e-30

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 1.31e-10 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.31e-10 ≰ 0.0e+00
    |f(x) - f(x')|         = 7.65e-21 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 5.35e+09 ≰ 0.0e+00
    |g(x)|                 = 4.35e-14 ≤ 1.0e-12

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    17
    f(x) calls:    55
    ∇f(x) calls:   55
