## Chapter 27: Optimization

This chapter covers how to find the maximum or minimum of a function. We'll start with simple functions of one variable and then move to functions of two or more variables.

In [None]:
using Plots, Revise

We'll start with a simple parabola:

In [None]:
f(x) = (x-1)^2

In [None]:
plot(f,-0.25,2.25,legend=false)

In [None]:
includet("../julia-files/Rootfinding.jl")
using .Rootfinding, ForwardDiff, LinearAlgebra

To find the minimum, we'll search for where the derivative is 0, using the `newton` root finder from the `Rootfinding` module. Instead of computing the derivative by hand, though, we'll use the automatic differentiation in the `ForwardDiff` package.

In [None]:
newton(x->ForwardDiff.derivative(f,x),0)

That converged quickly, but only because the derivative $f'(x) = 2(x-1)$ is linear, and Newton's method finds the root of a linear function in a single step.
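
Newton's method applied to $f'(x) = 0$ uses the update $x_{n+1} = x_n - f'(x_n)/f''(x_n)$. The following is a minimal sketch of that update with hand-coded derivatives instead of `ForwardDiff`. The helper `newton_minimize` is hypothetical; it is not the `newton` function from `Rootfinding.jl`, whose internals may differ. It shows that for any quadratic the first step lands exactly on the minimizer.

```julia
# Hypothetical helper (not the `newton` from Rootfinding.jl): Newton's method for
# minimization, x ← x - f'(x)/f''(x), with hand-coded first and second derivatives.
function newton_minimize(fp, fpp, x; tol = 1e-10, max_steps = 50)
    steps = 0
    while abs(fp(x)) > tol
        steps < max_steps || error("no convergence after $max_steps steps")
        x -= fp(x) / fpp(x)   # Newton step for the equation f'(x) = 0
        steps += 1
    end
    return x, steps
end

# f(x) = (x-1)^2:  f'(x) = 2(x-1),  f''(x) = 2
xa, na = newton_minimize(x -> 2(x - 1), x -> 2.0, 0.0)    # (1.0, 1)

# any quadratic behaves the same way, e.g. 3(x+2)^2 + 1 started far away
xb, nb = newton_minimize(x -> 6(x + 2), x -> 6.0, 10.0)   # (-2.0, 1)
```

For a quadratic, $f''$ is constant and $f'$ is linear, so the Newton step is exact; for other functions the step is only a local approximation and several iterations are needed.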

In [None]:
f2(x) = sin(x)^2

In [None]:
plot(f2,-1,8)

From the plot above, $\sin^2 x$ has many local minima, at the integer multiples of $\pi$. We'll try to find the one near 3.5, which should be $\pi \approx 3.1416$.

In [None]:
newton(x->ForwardDiff.derivative(f2,x),3.5)
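
Newton's method only looks for zeros of $f'$, so it converges to whichever critical point is nearby, and that can be a maximum as easily as a minimum. Here is a minimal sketch of checking the result with the second-derivative test, using hand-coded derivatives of $f_2(x) = \sin^2 x$; the helpers `newton_root` and `classify` are hypothetical names for this illustration.

```julia
# Hand-coded derivatives of f2(x) = sin(x)^2
f2p(x)  = sin(2x)        # f2'(x)  = 2 sin(x) cos(x)
f2pp(x) = 2cos(2x)       # f2''(x)

# Hypothetical helper: Newton's method for a root of g, given its derivative dg
function newton_root(g, dg, x; tol = 1e-12, max_steps = 50)
    for _ in 1:max_steps
        abs(g(x)) <= tol && return x
        x -= g(x) / dg(x)
    end
    error("no convergence after $max_steps steps")
end

# Second-derivative test at a critical point c
classify(fpp, c) = fpp(c) > 0 ? :minimum : fpp(c) < 0 ? :maximum : :inconclusive

c1 = newton_root(f2p, f2pp, 3.5)          # converges near π
c2 = newton_root(f2p, f2pp, 1.4)          # converges near π/2
classify(f2pp, c1), classify(f2pp, c2)    # (:minimum, :maximum)
```

Starting at 1.4 instead of 3.5 lands on $\pi/2$, where $\sin^2 x$ has a maximum, so it is worth checking $f''$ (or the plot) before calling a critical point a minimum.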

#### Optimizing functions of more than one variable

Let's examine a function of two variables. The following is a circular paraboloid, which you can think of as a parabola revolved around the vertical axis:

In [None]:
g(x::Vector) = x[1]^2+x[2]^2

In [None]:
xrange = LinRange(-2,2,101)
yrange = LinRange(-2,2,101)
surface(xrange,yrange,(x,y)->g([x,y]))

The following is a contour plot of the same:

In [None]:
xrange = LinRange(-2,2,251)
yrange = LinRange(-2,2,251)
contour(xrange,yrange,(x,y)->g([x,y]), aspect_ratio = :equal, fill=true)

In [None]:
ForwardDiff.gradient(g,[0.5,1])
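
By hand, $\nabla g(x,y) = (2x, 2y)$, so at $(0.5, 1)$ the gradient is $(1, 2)$. A central-difference approximation, $\partial f/\partial x_i \approx \big(f(\vec{x}+h\vec{e}_i) - f(\vec{x}-h\vec{e}_i)\big)/(2h)$, is a handy independent check on automatic differentiation. A minimal sketch; the helper `fd_gradient` and the step `h = 1e-6` are our own choices, not part of `ForwardDiff`.

```julia
# Hypothetical helper: approximate ∇f(x) with central differences,
# (f(x + h eᵢ) - f(x - h eᵢ)) / (2h), one coordinate at a time.
function fd_gradient(f, x::Vector{Float64}; h = 1e-6)
    grad = similar(x)
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h                               # perturb only the i-th coordinate
        grad[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return grad
end

g(x::Vector) = x[1]^2 + x[2]^2
fd_gradient(g, [0.5, 1.0])   # ≈ [1.0, 2.0], matching ∇g = (2x, 2y)
```

Finite differences cost two function evaluations per coordinate and lose accuracy to rounding, which is why automatic differentiation is preferred for the actual optimization.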

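We're going to look at the method of steepest descent (gradient descent). It starts at some point and repeatedly steps opposite the gradient, because $-\nabla f$ points in the direction in which $f$ decreases fastest. Each step is $\vec{x}_{n+1} = \vec{x}_n - \gamma\,\nabla f(\vec{x}_n)$, where $\gamma > 0$ is the step size.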

In [None]:
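function gradientDescent(f::Function, x₀::Vector; γ = 0.25, max_steps = 100, tol = 1e-8)
  local steps = 0
  local ∇f₀ = ForwardDiff.gradient(f, x₀)   # gradient at the starting point
  # step opposite the gradient until it is (nearly) zero or we run out of steps
  while norm(∇f₀) > tol && steps < max_steps
    x₀ -= γ * ∇f₀                          # steepest-descent update
    ∇f₀ = ForwardDiff.gradient(f, x₀)      # gradient at the new point
    steps += 1
  end
  @show steps
  # also catches divergence: a NaN gradient fails this test
  norm(∇f₀) <= tol || throw(ErrorException("Gradient descent did not converge after $steps steps"))
  x₀
end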

In [None]:
gradientDescent(g,[0.5,1])
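
For the paraboloid $g$, the iteration can be worked out by hand, which explains why it converges steadily. Since $\nabla g(\vec{x}) = 2\vec{x}$,

$$\vec{x}_{n+1} = \vec{x}_n - \gamma\,\nabla g(\vec{x}_n) = (1-2\gamma)\,\vec{x}_n,$$

so with $\gamma = 0.25$ every step halves the distance to the minimum at the origin, and

$$\nabla g(\vec{x}_n) = 2\vec{x}_n = 2^{1-n}\,\vec{x}_0.$$

Starting from $\vec{x}_0 = (0.5, 1)$, where $\|\nabla g(\vec{x}_0)\| = \sqrt{5}$, shrinking the gradient below $10^{-8}$ takes about $\log_2(\sqrt{5}\times 10^{8}) \approx 27.7$ halvings. The same formula shows that this fixed-step iteration converges for $g$ only when $|1-2\gamma| < 1$, that is, $0 < \gamma < 1$.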

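Let's look at another function, a version of the Rosenbrock function, which is famous in optimization circles as a test problem. Its minimum is at $(1,1)$, at the bottom of a long, narrow, curved valley that follows $y = x^2$: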

In [None]:
rose(x::Vector) = (1-x[1])^2+50*(x[2]-x[1]^2)^2

In [None]:
xrange = LinRange(-1,3,251)
yrange = LinRange(-1,4,251)
surface(xrange,yrange,(x,y)->rose([x,y]))

In [None]:
contour(xrange,yrange,(x,y)->rose([x,y]),levels=[1,2,3,10,50,100,200,500,1000,10_000], fill=true)

In [None]:
ForwardDiff.gradient(rose,[-0.5,0.5])

If we run gradient descent on `rose` with the default step size $\gamma = 0.25$:

In [None]:
gradientDescent(rose,[-0.5,0.5])

What's going on? Let's take the first step by hand and compare the gradient before and after it.

In [None]:
x0 = [-0.5,0.5]
∇f0 = ForwardDiff.gradient(rose,x0)

In [None]:
x1 = x0 - 0.25∇f0

In [None]:
∇f1 = ForwardDiff.gradient(rose,x1)
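
Why does a fixed step of $\gamma = 0.25$ fail here? The following sketch gives a rough, local explanation, assuming the behavior near the minimum is close to that on a quadratic: there, fixed-step gradient descent is stable only when $\gamma < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian. The Hessian of `rose` is derived by hand, the helper names `hess_rose` and `model_step` are hypothetical, and only the `LinearAlgebra` standard library is used.

```julia
using LinearAlgebra

# Hessian of rose(x) = (1 - x₁)^2 + 50(x₂ - x₁^2)^2, differentiated by hand
function hess_rose(x)
    h11 = 2 - 200x[2] + 600x[1]^2   # ∂²f/∂x₁²
    h12 = -200x[1]                  # ∂²f/∂x₁∂x₂
    h22 = 100.0                     # ∂²f/∂x₂²
    return [h11 h12; h12 h22]
end

H = hess_rose([1.0, 1.0])               # [402 -200; -200 100] at the minimum (1, 1)
λmax = maximum(eigvals(Symmetric(H)))   # largest curvature, about 501.6
γ_limit = 2 / λmax                      # fixed steps above this are unstable near (1, 1)

# 1-D model along the stiffest direction: q(t) = λmax*t^2/2 gives t ← (1 - γλmax)t
model_step(γ, t, n) = (1 - γ * λmax)^n * t
grown  = model_step(0.25, 1.0, 10)    # |1 - 0.25λmax| > 1, so this blows up
shrunk = model_step(0.003, 1.0, 10)   # |1 - 0.003λmax| < 1, so this shrinks
```

Near $(1,1)$ this suggests a fixed $\gamma$ has to be below about 0.004, far smaller than the 0.25 that worked for the paraboloid. The analysis is only local; at the starting point $(-0.5, 0.5)$ the curvature is different.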

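The gradient at `x1` is far larger than the gradient at `x0`: with $\gamma = 0.25$ the step overshoots the valley, and the iterates run away.

Let's play with the $\gamma$ parameter. Smaller values of $\gamma$ (together with a larger `max_steps`) can be tried, but finding one that works takes trial and error. The Barzilai–Borwein method instead chooses the step size adaptively at each iteration from the two most recent points $\vec{x}_0$ and $\vec{x}_1$:
$$\gamma = \frac{\left|(\vec{x}_1-\vec{x}_0)\cdot (\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0))\right|}{\|\nabla f(\vec{x}_1) - \nabla f(\vec{x}_0)\|^2}, \qquad \vec{x}_2 = \vec{x}_1 - \gamma\,\nabla f(\vec{x}_1).$$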

In [None]:
function gradientDescentBB(f::Function, x₀::Vector; max_steps = 100, tol = 1e-4)
  local steps = 0
  local ∇f₀ = ForwardDiff.gradient(f, x₀)
  local x₁ = x₀ - 0.25 * ∇f₀                 # one fixed step so that two points are available
  local ∇f₁ = ForwardDiff.gradient(f, x₁)
  # stop when the gradient at the current point is small
  while norm(∇f₁) > tol && steps < max_steps
    Δ∇f = ∇f₁ - ∇f₀                          # change in gradient
    γ = abs(dot(x₁ - x₀, Δ∇f)) / norm(Δ∇f)^2 # Barzilai–Borwein step size
    x₀, ∇f₀ = x₁, ∇f₁                        # keep only the two most recent points
    x₁ = x₁ - γ * ∇f₁
    ∇f₁ = ForwardDiff.gradient(f, x₁)
    steps += 1
  end
  @show steps
  norm(∇f₁) <= tol || throw(ErrorException("BB gradient descent did not converge after $steps steps"))
  x₁
end

In [None]:
gradientDescentBB(rose,[-0.5,0.5])
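
To see the Barzilai–Borwein step size in isolation, here is a minimal, dependency-free sketch that takes the gradient as a function instead of calling `ForwardDiff`. The name `bb_descent`, the initial step `γ₀ = 0.1`, and the test function $q(\vec{x}) = x_1^2 + 10x_2^2$ are assumptions for illustration.

```julia
using LinearAlgebra

# Hypothetical sketch of Barzilai–Borwein (BB) gradient descent.
# ∇f is a user-supplied gradient function; no automatic differentiation is used.
function bb_descent(∇f, x₀::Vector{Float64}; γ₀ = 0.1, tol = 1e-8, max_steps = 1000)
    g₀ = ∇f(x₀)
    x₁ = x₀ - γ₀ * g₀                    # one fixed-size step so two points are available
    for steps in 1:max_steps
        g₁ = ∇f(x₁)
        norm(g₁) <= tol && return x₁, steps
        s = x₁ - x₀                      # change in position
        y = g₁ - g₀                      # change in gradient
        γ = abs(dot(s, y)) / dot(y, y)   # BB step size (the formula above)
        x₀, g₀ = x₁, g₁                  # keep only the two most recent points
        x₁ = x₁ - γ * g₁                 # gradient step with the adaptive size
    end
    error("BB descent did not converge in $max_steps steps")
end

# Ill-conditioned quadratic q(x) = x₁^2 + 10x₂^2 with its minimum at the origin
∇q(x) = [2x[1], 20x[2]]
x_bb, n_bb = bb_descent(∇q, [0.5, 1.0])
```

On an ill-conditioned quadratic like $q$, any single fixed $\gamma$ is a compromise between the flat and the steep direction, while the BB step keeps adapting. Unlike fixed-step descent, the BB iteration is not guaranteed to decrease $f$ at every step, so occasional increases along the way are expected.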

#### Exercise
- Produce a contour plot of the function $f(x,y) = \sin(0.5x^2-0.25y^2+2)\cos(x+y)$ on the domain $[0,\pi]\times[0,\pi]$.
- See if you can find the minimum of $f(x,y)$ using gradient descent.
- See if you can find the minimum of $f(x,y)$ using the Barzilai–Borwein gradient descent method.
- Find the maximum of $f(x,y)$ by minimizing $-f(x,y)$.

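### Using the JuMP package

JuMP is a Julia package for modeling optimization problems, and Ipopt is an open-source solver for nonlinear optimization problems that JuMP can call. The next cells minimize the standard Rosenbrock function, which uses the coefficient 100 rather than the 50 in `rose`, starting from $(0,0)$; its minimum is also at $(1,1)$.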

In [None]:
using JuMP, Ipopt

In [None]:
model = Model(Ipopt.Optimizer)
set_optimizer_attribute(model,"print_level",5) # Ipopt verbosity: 0 (no output) to 12 (most detail); 5 is the default
@variable(model, x, start = 0.0)
@variable(model, y, start = 0.0)

@NLobjective(model, Min, (1 - x)^2 + 100 * (y - x^2)^2)

optimize!(model)
@show value(x),value(y)

### Exercise
Use the JuMP package to find both the minimum and the maximum of the function $f(x,y)$ from the previous exercise on $[0,\pi]\times[0,\pi]$.

### Minimizing a function of three or more variables

Functions of three or more variables are hard to visualize. Three-dimensional contour (isosurface) plots exist, but they are generally hard to read, so instead we'll use the techniques above to find a minimum of
$$ h(x,y,z) = \sin(x-y^2-\pi z)\cos(2x+3z^3)$$

In [None]:
h(x) = sin(x[1]-x[2]^2-pi*x[3])*cos(2*x[1]+3*x[3]^3)
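
For comparison with what `ForwardDiff` computes automatically, here is the gradient of the `h` defined in the cell above, worked out by hand. Writing $u = x - y^2 - \pi z$ and $v = 2x + 3z^3$, so that $h = \sin u \cos v$,

$$\frac{\partial h}{\partial x} = \cos u\cos v - 2\sin u\sin v,\qquad
\frac{\partial h}{\partial y} = -2y\cos u\cos v,\qquad
\frac{\partial h}{\partial z} = -\pi\cos u\cos v - 9z^2\sin u\sin v.$$

A critical point is where all three vanish. Because $|\sin u\cos v| \le 1$, any point where $h = -1$ (for example, where $\sin u = 1$ and $\cos v = -1$) is a global minimum, which gives a way to judge whether a local minimum found numerically is also global.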

In [None]:
min_h = gradientDescentBB(h,[1.0,1.0,1.0])

In [None]:
h(min_h)

In [None]:
model = Model(Ipopt.Optimizer)
@variable(model, x, start = 1.0)
@variable(model, y, start = 1.0)
@variable(model, z, start=1.0)

@NLobjective(model, Min, sin(x-y^2-pi*z)*cos(2*x+3*z^3))

optimize!(model)
@show value(x),value(y),value(z)