# 4-11 Gradient-Based Optimization, Part 1
* Gradients in Calculus.jl
* Steepest Descent Method

In [None]:
using Revealables
include("files/answers.jl")

##Using the `Calculus` Package
The syntax for finding gradients and Hessians with the Calculus package for Julia takes a little getting used to. Try the following:

In [None]:
using Calculus
f(x, y) = x^2 + 2x*y
g = Calculus.gradient(x -> f(x[1], x[2]))

This creates a function `g` (you could name it whatever you like) that finds the gradient of `f` at a point. 
You can evaluate the gradient using:

In [None]:
g([2, 1])

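This should give you approximately `[6.0, 4.0]`. The package estimates derivatives with finite differences, so the last few digits may be slightly off.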

Verify that this is, in fact, the gradient of $x^2 + 2xy$ at $(2, 1)$.
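One way to do that check is sketched below, using only base Julia. The partial derivatives are worked out by hand, and `central_difference_gradient` is a hypothetical helper written for this illustration (it is not part of the Calculus package) that estimates the same gradient numerically.

```julia
# Analytic gradient of f(x, y) = x^2 + 2x*y, worked out by hand:
#   df/dx = 2x + 2y,   df/dy = 2x
f(x, y) = x^2 + 2x*y
analytic_gradient(x, y) = [2x + 2y, 2x]

# Hypothetical helper: central-difference estimate of the gradient at `point`.
function central_difference_gradient(func, point; step = 1e-6)
    grad = zeros(length(point))
    for i in eachindex(point)
        forward = float.(point)    # fresh Float64 copies of the point
        backward = float.(point)
        forward[i] += step
        backward[i] -= step
        grad[i] = (func(forward) - func(backward)) / (2 * step)
    end
    return grad
end

exact = analytic_gradient(2, 1)
approx = central_difference_gradient(p -> f(p[1], p[2]), [2, 1])
println("analytic: ", exact)
println("numeric:  ", approx)
```

The two printed vectors should agree to several decimal places.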

You can also evaluate the Hessian using similar language.

In [None]:
f(x, y) = x^2 + 2x*y
h = hessian(x -> f(x[1], x[2]))

In [None]:
h([2.0, 1.0])  # This should give approximately the matrix [2.0 2.0; 2.0 0.0]

In [None]:
using LinearAlgebra  # provides eigvals (eig was removed in Julia 1.0)
eigvals(h([2.0, 1.0]))  # Should give about -1.236 and 3.236: mixed signs, so the surface is saddle-shaped
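As a cross-check that does not depend on numerical differentiation, the sketch below enters the hand-derived Hessian directly and asks the standard-library `LinearAlgebra` module for its eigenvalues. The variable names are made up for this illustration.

```julia
using LinearAlgebra

# Hand-derived Hessian of f(x, y) = x^2 + 2x*y (constant, because f is quadratic):
#   f_xx = 2,  f_xy = f_yx = 2,  f_yy = 0
exact_hessian = [2.0 2.0; 2.0 0.0]

# Symmetric(...) marks the matrix as symmetric, so eigvals returns real values
# sorted from smallest to largest.
eigenvalues = eigvals(Symmetric(exact_hessian))
println(eigenvalues)

# One negative and one positive eigenvalue means the Hessian is indefinite.
is_indefinite = minimum(eigenvalues) < 0 < maximum(eigenvalues)
println("indefinite (saddle-shaped): ", is_indefinite)
```

The exact eigenvalues are $1 \pm \sqrt{5}$, which matches the decimal values quoted above.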

##Tips When Using `Calculus`
* Remember to start with `using Calculus` every time you open a new interface with Julia.
* Note that `x[1]`, `x[2]` and `x[3]` must be used within `gradient` and `hessian` instead of `x`, `y`, `z`. This indexes the elements in the array `x`.
* When in doubt, type in decimals. This creates a `Float64` array instead of an integer (`Int64`) array, which will prevent most type-based errors.

###Practice Problem A
Use the Calculus package to evaluate the gradient and the eigenvalues of the Hessian of:
1. $5x^2 - 3xy^2 + 2y$ at $(1, -5)$
2. $3xy + 2x^2z - y^2z$ at $(3, 1, 0)$


In [None]:
# Code here

In [None]:
revealable(ans411A)

##Minimizing With Steepest Descent
The Steepest Descent Method for minimization begins with the idea that the gradient is a vector pointing in the direction of fastest increase for the function. The steepest descent method uses the <font color="red">negative</font> of the gradient, which points in the direction of fastest decrease.

After that, it works much like the Cyclic Coordinate Search technique, where the function is minimized in that direction and the procedure repeated.

##First Steps
Suppose you are trying to minimize $f(x, y) = x^2 - 4x + 5y^2 - 3y$,
from initial point $(5, 6)$.

First, you would find the gradient at $(5, 6)$. The gradient is $\langle 2x - 4, 10y - 3 \rangle$, which at $(5, 6)$ comes out to `[6, 57]`. This gives us the direction of <font color="blue">ascent</font>. Since we are trying to <font color="red">minimize</font>, we will use the opposite vector: $\langle -6, -57 \rangle$.

The next step is to use the vector translation formula: 
$$\text{new point} = \text{old point} + \text{scalar} \cdot \text{vector}$$
to get coordinates for a new point:
$$\begin{align}
\text{new point} &= (5, 6) + a \cdot \langle -6, -57 \rangle \\
\text{new point} &= (5 - 6a, 6 - 57a)\end{align}$$
    
Then we can plug this new point back into the function: $f(\color{green}{x}, \color{purple}{y}) = \color{green}{x}^2 - 4\color{green}{x} + 5\color{purple}{y}^2 - 3\color{purple}{y}$ becomes 
$$f(a) = \color{green}{(5 - 6a)}^2 - 4\color{green}{(5 - 6a)} + 5\color{purple}{(6 - 57a)}^2 - 3\color{purple}{(6 - 57a)}$$
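The substitution can also be left to Julia. The sketch below is one possible way to do it, with illustrative names (`new_point`, `f_along_line`, `f_by_hand`) that are not part of any package. It builds $f(a)$ from the starting point and the negative gradient, then compares it with the hand-written expression.

```julia
f(x, y) = x^2 - 4x + 5y^2 - 3y
analytic_gradient(x, y) = [2x - 4, 10y - 3]    # hand-derived partial derivatives

start_point = [5.0, 6.0]
descent_direction = -analytic_gradient(start_point...)   # negative gradient at (5, 6)

# new point = old point + scalar * vector
new_point(a) = start_point .+ a .* descent_direction

# f restricted to the line through start_point along descent_direction
f_along_line(a) = f(new_point(a)...)

# The same one-variable function, typed out as in the text
f_by_hand(a) = (5 - 6a)^2 - 4 * (5 - 6a) + 5 * (6 - 57a)^2 - 3 * (6 - 57a)

println(f_along_line(0.1), "   ", f_by_hand(0.1))
```

Both functions should print the same value (up to rounding) for any choice of `a`, and `f_along_line(0.0)` is simply $f(5, 6)$.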



###Practice Problem B
Let $f(x, y) = x^2 + 2y^2 - y$. 
1. Find the gradient at the initial point $(10, 12)$.
2. Use the negative gradient to transform $f(x, y)$ into a function $f(a)$, using $a$ for the unknown scalar multiplier. Do not simplify.

In [None]:
# Calculate here

In [None]:
revealable(ans411B)

##The Next Steps
After transforming $f(x, y)$ into $f(a)$, the next step is to minimize $f(a)$.

You can then substitute the minimizing value of $a$ into the expressions for $x$ and $y$ to find the actual coordinates of the new point. 

Essentially, what this procedure does is find the vector of steepest descent, then create the cross-section along that vector, then minimize the function along the cross-section.
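For the worked example from First Steps, that step might look like the sketch below. `golden_section_minimize` is a hypothetical stand-in for whichever Unit 2 program you prefer, and the search interval $[0, 1]$ is an assumption that happens to contain the minimizer for this function.

```julia
f(x, y) = x^2 - 4x + 5y^2 - 3y
start_point = [5.0, 6.0]
descent_direction = [-6.0, -57.0]           # negative gradient at (5, 6)
f_along_line(a) = f((start_point .+ a .* descent_direction)...)

# Hypothetical stand-in for a Unit 2 program: golden-section search on [lo, hi].
# Assumes the function has a single minimum inside the interval.
function golden_section_minimize(func, lo, hi; tolerance = 1e-8)
    ratio = (sqrt(5) - 1) / 2
    while hi - lo > tolerance
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if func(left) < func(right)
            hi = right      # the minimum lies in [lo, right]
        else
            lo = left       # the minimum lies in [left, hi]
        end
    end
    return (lo + hi) / 2
end

best_a = golden_section_minimize(f_along_line, 0.0, 1.0)
next_point = start_point .+ best_a .* descent_direction
println("a = ", best_a)
println("new point = ", next_point)
println("f went from ", f(start_point...), " to ", f(next_point...))
```

Because this $f(a)$ expands to the quadratic $16281a^2 - 3285a + 167$, the result can be checked against the vertex formula: $a = 3285 / 32562 \approx 0.1009$.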

###Practice Problem C
1. Using any minimization program from Unit 2, minimize your $f(a)$ from Problem B. 
2. Use the resulting value of $a$ to find the new point.

In [None]:
# Run an old program here

In [None]:
# Find the new point here

In [None]:
revealable(ans411C)

##The Last Step
The last step is to repeat the procedure until a condition is met – often one or a combination of: 
* a number of iterations
* a small enough gradient magnitude (the gradient approaches the zero vector as the optimal point is neared)
* a small enough step, that is, a small enough change in the point produced by the scalar $a$
* a low enough change in the function value between the old and new points
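The sketch below shows one way these conditions could be combined in a loop, applied to the worked example from First Steps rather than to the practice function. Everything here is illustrative: the function names, the tolerance values, and the assumption that the best step always lies in $[0, 1]$ (the `max_step` argument) are choices made for this example, and the gradient is supplied by hand.

```julia
using LinearAlgebra   # for norm

# Hypothetical line-search helper: golden-section search on [lo, hi].
function golden_section_minimize(func, lo, hi; tolerance = 1e-10)
    ratio = (sqrt(5) - 1) / 2
    while hi - lo > tolerance
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if func(left) < func(right)
            hi = right
        else
            lo = left
        end
    end
    return (lo + hi) / 2
end

function steepest_descent(f, grad, start; max_iterations = 100,
                          gradient_tolerance = 1e-6, step_tolerance = 1e-10,
                          value_tolerance = 1e-12, max_step = 1.0)
    point = float.(start)
    for iteration in 1:max_iterations
        g = grad(point)
        # Stop: the gradient magnitude is small enough
        if norm(g) < gradient_tolerance
            return point, iteration - 1
        end
        # Minimize f along the negative gradient (assumes the best a is in [0, max_step])
        a = golden_section_minimize(t -> f(point .- t .* g), 0.0, max_step)
        next_point = point .- a .* g
        step_length = norm(next_point .- point)
        value_change = abs(f(next_point) - f(point))
        point = next_point
        # Stop: the point or the function value barely changed
        if step_length < step_tolerance || value_change < value_tolerance
            return point, iteration
        end
    end
    return point, max_iterations   # Stop: iteration budget used up
end

f(p) = p[1]^2 - 4 * p[1] + 5 * p[2]^2 - 3 * p[2]
grad_f(p) = [2 * p[1] - 4, 10 * p[2] - 3]

minimizer, iterations_used = steepest_descent(f, grad_f, [5.0, 6.0])
println(minimizer, " after ", iterations_used, " iterations")
```

Setting the gradient to zero by hand gives the true minimum at $(2, 0.3)$, so the printed point should land close to that.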

###Practice Problem D
Repeat the procedure twice more for the function $f(x, y) = x^2 + 2y^2 - y$.

Automate as much as possible, but don’t write a full program (yet).

In [None]:
# Minimize

In [None]:
# Find the new point

In [None]:
revealable(ans411D)

###Practice Problem E
Write a program that will repeat the procedure for $n$ iterations. 

The program should print out the last two calculated points (so you can see if you need to raise the iterations for better convergence).

Test for 5 iterations, then 10.

In [None]:
# Program here

In [None]:
# Test here

In [None]:
revealable(ans411E)

###Practice Problem F
Use your program to minimize 
$f(x, y) = (x - 2)^4 + (x - 2y)^2$
from initial point $(0, 0)$.

In [None]:
# Run your program here

In [None]:
revealable(ans411F)

##Problems with Steepest Descent
As you may have noticed on Problem F, there are occasions where the steepest descent method takes a very long time to converge. 

The reason is that if the function is very flat near the minimum (as in an $x^4$ function, such as the $(x - 2)^4$ term in Problem F), the gradient gets very small and each iteration goes a very short distance.
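To see how quickly the gradient fades, the sketch below evaluates the hand-derived gradient of the Problem F function at a few points approaching its minimum at $(2, 1)$. The points are chosen along the line $x = 2y$, which is an assumption made to keep the example simple.

```julia
using LinearAlgebra   # for norm

# Hand-derived gradient of f(x, y) = (x - 2)^4 + (x - 2y)^2
grad_f(x, y) = [4 * (x - 2)^3 + 2 * (x - 2y), -4 * (x - 2y)]

# Points approaching the minimum (2, 1) along the line x = 2y
for distance in (1.0, 0.1, 0.01)
    x = 2 + distance
    y = x / 2
    println("x-distance = ", distance, "   gradient norm = ", norm(grad_f(x, y)))
end
```

Along this line the gradient norm is $4d^3$ for an $x$-distance of $d$, so cutting the distance by a factor of 10 shrinks the gradient by a factor of about 1000.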

There are refinements, but the best one is an entirely different method – next lesson!