University of Michigan - ROB 101 Computational Linear Algebra

# Homework 9.  Optimization
### Due: 11/19 at 9 PM Eastern

#### Purpose: Learn how to optimize functions using Julia.
- Skills
    - Computing a Gradient of a function
    - Computing the Hessian of a function
    - Finding Local minima and maxima of functions
- Knowledge
    - Using the SymPy package to do symbolic math for you. All those messy derivatives
      in Calculus that a person learns to do by hand, a computer can also do, 
      and without mistakes. 
    - Understand the significance of minima and maxima in different contexts
 
    
#### Task:
Complete and run the cells below as directed.

### Problem 1:  Calculate the gradient of f(x, y, z) at (-1, -1.1, 1) using the symmetric difference method.
Use a step size h of 0.001.  Your answer should be a 3D vector.
## $$ f(x, y, z) = xye^z - 14yz + \sin(x)\cos(y) - e^{\tan(z/y)} $$

As an example, the symmetric difference approximation to the partial derivative of f in the x-direction at a point (x, y, z) is: $$ \frac{\partial f(x, y, z)}{\partial x} \approx \frac{f(x+h, y, z) - f(x-h, y, z)}{2h} $$
You will need to compute the partial derivatives in all three directions (x, y, and z); together they form the gradient.
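A minimal sketch of the symmetric difference formula in Julia, applied to a made-up two-variable function `p(x, y)` (a hypothetical stand-in, not the homework's f) so that the result can be checked against the exact derivative. For comparison it also computes a one-sided (forward) difference: the symmetric formula's truncation error shrinks like $h^2$, while the forward formula's shrinks like $h$. The names `p`, `dpdx_exact`, `h_demo`, `xa`, and `ya` are chosen only for this illustration.

```julia
# Hypothetical test function (not the homework's f) with a known derivative
p(x, y) = x^2 * y + sin(y)
dpdx_exact(x, y) = 2 * x * y          # analytic ∂p/∂x, used only to measure the error

h_demo = 0.001
(xa, ya) = (0.5, 2.0)                 # arbitrary evaluation point

# symmetric (central) difference: truncation error shrinks like h^2
dpdx_central = (p(xa + h_demo, ya) - p(xa - h_demo, ya)) / (2 * h_demo)
# one-sided (forward) difference, for comparison: error shrinks like h
dpdx_forward = (p(xa + h_demo, ya) - p(xa, ya)) / h_demo

err_central = abs(dpdx_central - dpdx_exact(xa, ya))
err_forward = abs(dpdx_forward - dpdx_exact(xa, ya))
println("central error = ", err_central, ", forward error = ", err_forward)
```

Because `p` is quadratic in x, the symmetric difference here is exact up to rounding, whereas the forward difference is off by $h \cdot y_a = 0.002$.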

In [None]:
#f is declared for you.  Use step size h and evaluate at the point (x0, y0, z0)
f(x, y, z) = x*y*exp(z) - 14y*z + sin(x)*cos(y) - exp((tan(z/y)))
h = 0.001
(x0, y0, z0) = (-1, -1.1, 1)

#first, calculate df/dx 
# your code here
throw(ErrorException())

#calculate df/dy 
# your code here
throw(ErrorException())

#calculate df/dz 
# your code here
throw(ErrorException())

#Create the gradient vector, and save it as gradVect
# your code here
throw(ErrorException())

In [None]:
#autograder cell

### Problem 2:  Solving for the gradient analytically with SymPy
The Trick:  There is a Python package that we can use in Julia called SymPy.  It includes a lot of symbolic math functions, but there is one in particular we are interested in today:

The diff() function. It computes analytical (symbolic) derivatives of functions that would frighten
most humans! But not you, a denizen of ROB 101. You know that Julia is your friend when
you are in need! [Truth in advertising: MATLAB does symbolic math too!]


In [None]:
#an example
#using Pkg
#Pkg.add("SymPy")
using SymPy
@vars x y z # declare your variables
g(x,y,z)=exp(x*y*z)+sin(x/y) # make a gnarly function
diff(g(x,y,z),z) #finds the partial derivative with respect to z
                 # you can do the same with x and y. See below!

#### diff(f(x, y, z), x) returns an expression for the partial derivative of f with respect to x, valid at any (x, y, z)

Using SymPy, find the three expressions for df/dx, df/dy, and df/dz.  
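One way to turn SymPy expressions into numbers is to substitute values for the symbols with `subs` and convert the result to a Julia number with `N`. The sketch below does this for a hypothetical expression in two symbols `a` and `b` (not the homework's f); the helper name `grad_ex` is likewise just for illustration. It assumes SymPy.jl is installed and uses the `@syms` macro, which declares symbolic variables much like `@vars` in the example above.

```julia
using SymPy

@syms a b                              # symbols used only in this example
ex = a^2 * b + sin(b)                  # hypothetical expression (not the homework's f)
dex_da = diff(ex, a)                   # symbolic partial derivative: 2*a*b
dex_db = diff(ex, b)                   # symbolic partial derivative: a^2 + cos(b)

# substitute numbers for the symbols with subs, then convert to a Julia number with N
grad_ex(av, bv) = [N(subs(dex_da, a => av, b => bv)),
                   N(subs(dex_db, a => av, b => bv))]

g_demo = grad_ex(0.5, 2.0)
println(g_demo)
```

The same pattern, with three symbols and three derivative expressions, is one possible way to build a numerical gradient function from symbolic results.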

In [None]:
#Use SymPy to find an expression for df/dx
#here is f declared again, if you need it
f(x, y, z) = x*y*exp(z) - 14y*z + sin(x)*cos(y) - exp((tan(z/y)))
# your code here
throw(ErrorException())

In [None]:
#use SymPy to find an expression for df/dy
# your code here
throw(ErrorException())

In [None]:
#use SymPy to find an expression for df/dz
# your code here
throw(ErrorException())

In [None]:
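#Define a function Grad(x, y, z) that uses your three SymPy expressions to return the gradient of f at (x, y, z) as a vector
# your code here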
throw(ErrorException())

In [None]:
#Your implementation of Grad should return the same answer as you got in problem 1
@show grad_at_1 = Grad(-1, -1.1, 1)
@assert isapprox(gradVect, grad_at_1, atol = 1E-4)

### Example:  The Hessian
A reminder that the Hessian is the Jacobian of the transpose of the gradient of a function $f$: is that a mouthful or what? Why do we care? The local extrema of a differentiable function (maxima and minima), together with another kind of critical point called a saddle point, "live" in the zero set of $\nabla f(x)$. Hence, if $x^\ast$ is a local minimum of $f$, it satisfies $\nabla f(x^\ast)=0$, which means it is a root of $\nabla f(x)=0$. We can therefore apply Newton-Raphson to the gradient function to find its roots, and hence find local minima, for example. 

$$\begin{equation}
    \label{eq:Hessian}
    \nabla^2 f(x) := \frac{\partial}{\partial x} \left[ \nabla f(x) \right]^\top
\end{equation}$$
Here $\nabla^2 f(x)$ is the Hessian of $f$ at the point $x$.

The following function uses symmetric difference approximations to compute both the gradient and the Hessian of f at x0.
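To see why the double loop in gradHess approximates the Hessian, here is a minimal sketch of the same four-point symmetric difference for a single entry,
$$\frac{\partial^2 f}{\partial x_i \partial x_j} \approx \frac{f(x+he_i+\delta e_j) - f(x+he_i-\delta e_j) - f(x-he_i+\delta e_j) + f(x-he_i-\delta e_j)}{4h\delta},$$
checked on a quadratic form $x^\top A x$, whose exact Hessian is $A + A^\top$. The helper `hessian_entry`, the function `quad`, and the matrix `A` are hypothetical, chosen only for this check.

```julia
using LinearAlgebra

# Hypothetical helper: four-point symmetric difference for one Hessian entry,
# the same formula gradHess uses inside its double loop
function hessian_entry(f, x, i, j; h = 0.01, delta = 0.01)
    n = length(x)
    ei = zeros(n); ei[i] = 1.0          # unit vector in direction i
    ej = zeros(n); ej[j] = 1.0          # unit vector in direction j
    return (f(x + h*ei + delta*ej) - f(x + h*ei - delta*ej) -
            f(x - h*ei + delta*ej) + f(x - h*ei - delta*ej)) / (4*h*delta)
end

A = [2.0 1.0 0.0; 3.0 4.0 1.0; 0.0 5.0 6.0]   # arbitrary non-symmetric test matrix
quad(x) = dot(x, A * x)                       # quadratic form x'Ax, returns a scalar
xq = [1.0, -2.0, 0.5]

H = [hessian_entry(quad, xq, i, j) for i in 1:3, j in 1:3]
H_exact = A + A'                              # exact Hessian of x'Ax
println("max abs error = ", maximum(abs.(H - H_exact)))
```

For a quadratic, the linear terms cancel in the four-point formula, so the approximation matches $A + A^\top$ up to rounding error.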

In [None]:
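# run me, don't change me. I will compute gradients and Hessians for you. 
# I am a workhorse of a function!
#
using LinearAlgebra   # provides diagm

# f: function of an n x 1 vector returning a scalar (or 1x1 matrix); x0: n x 1 evaluation point
# returns grad1 (1 x n row vector) and H (n x n Hessian), both by symmetric differences
function gradHess(f,x0) 
    n=size(x0,1)
    H=zeros(n,n)
    grad1=zeros(1,n)
    Id=diagm(0=>fill(1., size(H,1)))   # n x n identity; Id[:,i] is the i-th unit vector
    delta=0.01
    h=delta
    for i=1:n
        # central difference for the i-th partial derivative; [1] extracts the scalar from a 1x1 result
        grad1[i]=(f(x0+ h*Id[:,i]) -f(x0 -  h*Id[:,i]))[1]/(2*h)
        for j=1:n
            # four-point central difference for the mixed partial d^2 f / (dx_i dx_j)
            H[i,j]=(f(x0+ h*Id[:,i] + delta*Id[:,j])-  f(x0+  h*Id[:,i]-delta*Id[:,j]) - f(x0-  h*Id[:,i] + delta*Id[:,j])+ f(x0- h*Id[:,i]-delta*Id[:,j]))[1]/(4*h*delta)
        end
    end
    return  grad1, H
end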

In [None]:
# run me, don't change me. I am creating a super complicated function to show you
# how efficiently gradHess can compute the gradient and the Hessian. I wish you luck
# doing this analytically! Go for it, make my day!  
#
#A2 and A4 are random 20x20 matrices, x0 is a random 20x1 vector, and f(x) is built from A2 and A4
using LinearAlgebra
using Random
Random.seed!(4321);
n=20;
A2=rand(n,n);
A4=rand(n,n);
f(x)= x'*A2*x + x'*x*x'*A4*x;
x0=rand(n,1)
#Here is an example using the gradHess function
(gradN, Hess) =gradHess(f, x0)
@show gradN
Hess

### Problem 3:  Use gradHess to minimize a function g(x), where $x\in \mathbb{R}^{20}$
Please re-read Section 11.4 in our textbook. A few key points are summarized here, but for the full context, please see the book! We have $g:\mathbb{R}^n \to \mathbb{R}$. We seek
$$x^\ast = {\rm arg~min}_{x\in \mathbb{R}^n} g(x) $$
We know that $x^\ast$ is a root of $\nabla g(x)$. Because the gradient is a row vector, we take its transpose and turn it into a column vector so that
$$ \nabla g^\top: \mathbb{R}^n \to \mathbb{R}^n.$$
We know how to find roots of vector-valued functions: we apply Newton-Raphson. To apply it to $\nabla g^\top(x)$, we need the Jacobian of $\nabla g^\top(x)$, which is the Hessian. Yes, $\nabla^2g(x):= \frac{\partial }{\partial x }\nabla g^\top(x).$

Lucky you, we have provided free of charge the awesome function:

**(grad, Hess) = gradHess(g, x0)**

which computes both the gradient and the Hessian. You'll have to transpose the gradient yourself! We think you can handle it. Your mission, therefore, is to implement the algorithm below on a gnarly function $g:\mathbb{R}^n \to \mathbb{R}$ and find its minimum to a tolerance of ${\rm tol}=10^{-6}$!
### Newton-Raphson applied to the transpose of the gradient so that a local minimum can be found: you must put this in some kind of loop (for or while), just as you did in the HW for the Bisection Algorithm. 
### In a loop, solve the top equation for $\Delta x_{k} $ and then use it to update the second equation.
$$
\nabla^2 g(x_k)~ \Delta x_{k} = - \left[\nabla g(x_k) \right]^\top
$$
$$
x_{k+1}= x_k + \Delta x_{k}
$$
#### Exit the loop when $||\nabla g(x_k)^\top|| < {\rm tol} $.
g(x) and an initial xk for k=0 are declared for you.  You may find that page 188 of the booklet is helpful here.
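A minimal sketch of the Newton-Raphson loop above, run on a hypothetical two-variable function $\varphi(x) = (x_1-1)^4 + (x_1-1)^2 + (x_2+2)^2$ with hand-coded gradient and Hessian instead of gradHess, so the answer can be checked: the minimizer is $(1, -2)$. The names `phi`, `grad_phi`, `hess_phi`, `newton_min`, and `x_star_demo` are made up for this illustration. With gradHess, you would instead transpose the row-vector gradient it returns before solving.

```julia
using LinearAlgebra

# Hypothetical test function (not the homework's g); its minimizer is (1, -2)
phi(x) = (x[1] - 1)^4 + (x[1] - 1)^2 + (x[2] + 2)^2

# analytic gradient, already transposed into a column vector [∇phi(x)]'
grad_phi(x) = [4 * (x[1] - 1)^3 + 2 * (x[1] - 1), 2 * (x[2] + 2)]

# analytic Hessian; it is diagonal for this particular phi
function hess_phi(x)
    d11 = 12 * (x[1] - 1)^2 + 2
    return [d11 0.0; 0.0 2.0]
end

# Newton-Raphson on the transposed gradient: stop when ||[∇g(x_k)]'|| < tol
function newton_min(gradT, hess, x0; tol = 1e-6, maxiter = 100)
    xk = copy(x0)
    for k in 0:maxiter
        gk = gradT(xk)
        if norm(gk) < tol
            return xk, k               # converged after k Newton steps
        end
        dx = hess(xk) \ (-gk)          # solve ∇²g(x_k) Δx_k = -[∇g(x_k)]'
        xk = xk + dx                   # x_{k+1} = x_k + Δx_k
    end
    error("Newton-Raphson did not converge in $maxiter iterations")
end

x_star_demo, iters = newton_min(grad_phi, hess_phi, [3.0, 5.0])
println("x* = ", x_star_demo, " after ", iters, " steps; phi(x*) = ", phi(x_star_demo))
```

Here the Hessian is positive definite everywhere, so every Newton direction is a descent direction. For a function like g, whose Hessian need not be positive definite, the same iteration may instead converge to a saddle point or a maximum, so it is worth checking the Hessian at the result.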

In [None]:
# run me, don't change me.
# this cell declares g(x), a function that depends on 20 variables, and xk for k=0, 
# which you are to use as your starting point
n=20 
Random.seed!(4321); 
A2=rand(n,n) 
A4=rand(n,n) 
g(x)= -x'*A2'*A2*x + x'*x*x'*A4'*A4*x
k=0
xk=100*rand(n,1)-200*rand(n,1)

In [None]:
#Use this cell to find the minimum
#Save the value of the minimizer, x*, in a variable called x_star
#Save the minimum value of g in a variable called g_min
#Use the gradHess function that we provided to calculate the gradient and the Hessian
# your code here
throw(ErrorException())

In [None]:
#autograder cell

In [None]:
#autograder cell