# 4-12 Gradient-Based Optimization, Part 2
* Conjugate Gradient Method

In [None]:
using Revealables
include("files/answers.jl")

##The Premise
<img src="files/4-12/convslow.png" style="padding:0em 1em 0em 0em" width=200 align="left" />In the last lesson, we found that the steepest descent method is slow to converge when the gradient is close to 0. The reason is that each step is orthogonal to (at right angles to) the previous step.

Part of the problem is that when the contour lines are non-circular, perpendicular (orthogonal) lines don't head exactly towards the center of the region.


The __conjugate gradient method__ avoids the problems of the steepest descent method by nudging the vectors so that they are not precisely orthogonal to each other. 

<img src="files/4-12/nudge.png" width=500 />

Instead, they are distorted so that if the contours were circular, the vectors would be orthogonal; the less circular the contours, the more distortion is needed.

Vectors formed by this type of distortion are called __conjugates__. 

There is a lot of complicated math behind how to accomplish this nudging of the vectors, which we won't get into. 

What you need to know is that it involves a multiplier, which can be found using the two vectors involved, in this formula:

$$s=\frac{v_n \cdot v_n}{v_{n-1} \cdot v_{n-1}}$$

where the · symbol is the *dot product*, detailed in the next section. Because the dot product of a vector with itself is the square of its magnitude, an alternate way of thinking about $s$ is as the squared magnitude of the new vector divided by the squared magnitude of the previous one.

##Dot Products (Review)
The dot product of two vectors a and b, symbolized a · b, is the sum of the products of corresponding elements. *(What?)*

If $a = [a_1, a_2, a_3, ...]$ and $b = [b_1, b_2, b_3, ...]$,

then $a · b = a_1b_1 + a_2b_2 + a_3b_3 + ...$.

So, $$\begin{align}
\left[ \begin{array}{ccc} 3 & 1 & -2 \end{array} \right] \cdot \left[ \begin{array}{ccc} -6 & 3 & 0\end{array} \right] &= (3)(-6) + (1)(3) + (-2)(0) \\
&= -18 + 3 + 0 \\
&= -15\end{align}$$

In Julia, the dot product of two column arrays can be found using `dot(a,b)` (in Julia 0.7 and later, load it first with `using LinearAlgebra`). The arrays have to be in *column form*, entered `[3, 1, -2]` or `[3; 1; -2]` (rather than `[3 1 -2]`).
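
As a quick check of the worked example above, here is a minimal sketch in Julia. The variable names `d` and `d_by_hand` are made up for this illustration, and the `using LinearAlgebra` line assumes Julia 0.7 or later.

```julia
using LinearAlgebra   # assumption: Julia 0.7 or later, where `dot` lives in this library

a = [3, 1, -2]        # column form: commas (or semicolons) between entries
b = [-6, 3, 0]

d = dot(a, b)             # 3*(-6) + 1*3 + (-2)*0
d_by_hand = sum(a .* b)   # the same thing: sum of the element-wise products

println(d)                # -15
println(dot(a, a))        # 9 + 1 + 4 = 14, the squared magnitude of a
```

The last line shows why a vector dotted with itself gives the square of its magnitude.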

###Practice Problem A
Using the two given gradients, and the formula

$$s=\frac{v_n \cdot v_n}{v_{n-1} \cdot v_{n-1}}$$

where v is the negative-gradient vector, find the value of the multiplier $s$.
1. $g_0 = [3, 2]$, $g_1 = [0.5, -0.2]$
2. $g_2 = [0.2, 0.01]$, $g_1 = [1, -0.2]$

If you did these by hand, check your work using Julia. 

In [None]:
# Find the value of `s`

In [None]:
revealable(ans412A)

##The First Steps
The conjugate gradient method begins in exactly the same way as the steepest descent method:
1. Find the gradient at the original point $x_0$ and use its negative vector as the steepest descent direction.
2. Use $new~point = old~point + scalar \cdot vector$ to formulate a new point in terms of the scalar variable $a$.
3. Substitute the $new~point$ into $f$ to obtain a single-variable function $f(a)$, then minimize $f(a)$.
4. Substitute the minimizing value of $a$ into the $new~point$ formula to find the new point, $x_1$.
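
To see the four steps in action, here is a worked sketch on a made-up function, $h(x, y) = x^2 + 4y^2$, starting from $(2, 1)$. This function and starting point are chosen only for illustration and are not one of the practice problems.

$$\begin{align}
\nabla h(2, 1) &= [4, 8], \quad v_0 = \langle -4, -8 \rangle \\
new~point &= (2 - 4a,\; 1 - 8a) \\
h(a) &= (2 - 4a)^2 + 4(1 - 8a)^2 = 272a^2 - 80a + 8 \\
h'(a) &= 544a - 80 = 0 \;\Rightarrow\; a = \tfrac{5}{34} \approx 0.1471 \\
x_1 &\approx (1.4118, -0.1765)
\end{align}$$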

###Practice Problem B
Using the function $f(x, y) = (x - y)^2 + 20y^2$ at the initial point $(2, -2)$, use your steepest-descent algorithm or program from the last lesson to find the next point.

Then, find the gradient and negative-gradient vector at the new point.

In [None]:
# Calculate here

In [None]:
revealable(ans412B)

##Next Steps
After the new point is found, its gradient and directional vector are calculated. The next step is to nudge this new vector, which is currently orthogonal to the previous one, so that it instead becomes the conjugate.

First, calculate the multiplier:
$$s=\frac{v_n \cdot v_n}{v_{n-1} \cdot v_{n-1}}$$

Then, form the new vector as follows:
$$v = v_n + s \cdot v_{n-1}$$
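
Here is a minimal sketch of those two formulas in Julia. The vectors `v_prev` and `v_new` are hypothetical values picked for illustration (they are not from the practice problems), and `v_conj` is just a name chosen for the nudged vector.

```julia
using LinearAlgebra   # assumption: Julia 0.7 or later

# Hypothetical negative-gradient vectors, for illustration only
v_prev = [-4.0, -8.0]   # plays the role of v_{n-1}
v_new  = [-3.0, 1.5]    # plays the role of v_n; orthogonal to v_prev

println(dot(v_new, v_prev))   # 0.0, so the two vectors start out orthogonal

s = dot(v_new, v_new) / dot(v_prev, v_prev)   # multiplier: 11.25 / 80
v_conj = v_new + s * v_prev                   # the nudged (conjugate) vector

println(s)        # 0.140625
println(v_conj)   # [-3.5625, 0.375]
```

Notice that `v_conj` is no longer orthogonal to `v_prev`: their dot product is 11.25 rather than 0.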

###Practice Problem C
From Problem B, you should have:
* Original point $(2, -2)$ with gradient $[8, -88]$ and directional vector $<-8, 88>$
* New point $(1.8097, 0.0935)$ with gradient $[3.432, 0.308]$ and directional vector $<-3.432,  -0.308>$

Use these to calculate the new (conjugate) vector.

In [None]:
# Calculate here

In [None]:
revealable(ans412C)

##Final Steps
Once you have the conjugate vector, repeat the procedure with the current point and new vector.

The steps, in order:
1. From point $x_0$:
  * use gradient to get vector $v_0$
  * minimize $f(x_0 + a·v_0)$ to get $x_1$.<br /><br />
2. At point $x_1$:
  * use gradient to get vector $v_1$
  * calculate $s$
  * nudge vector $v_1$ by $s·v_0$ to get modified $v_1$
  * minimize $f(x_1 + a·v_1)$ to get $x_2$.<br /><br />
3. Repeat Step 2 until convergence is attained.
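
The steps above can be sketched as a short Julia program. Treat this as one possible outline under stated assumptions, not the definitive version: the test function `h`, its gradient `grad_h`, and the helper `golden_min` are all made up for this illustration, and the line search simply assumes the best value of $a$ lies between 0 and 1. The sketch also assumes the usual Fletcher-Reeves arrangement, in which $s$ is computed from the raw negative gradients while the nudge uses the previous (already nudged) direction.

```julia
using LinearAlgebra   # assumption: Julia 0.7 or later

# Hypothetical test function and its gradient (not one of the practice problems)
h(p) = p[1]^2 + 4 * p[2]^2
grad_h(p) = [2 * p[1], 8 * p[2]]

# Golden-section search for the value in [lo, hi] that minimizes phi.
# Assumes phi has a single minimum on that interval.
function golden_min(phi, lo, hi; tol=1e-10)
    r = (sqrt(5) - 1) / 2
    c = hi - r * (hi - lo)
    d = lo + r * (hi - lo)
    while hi - lo > tol
        if phi(c) < phi(d)
            hi = d          # minimum lies in [lo, d]
        else
            lo = c          # minimum lies in [c, hi]
        end
        c = hi - r * (hi - lo)
        d = lo + r * (hi - lo)
    end
    return (lo + hi) / 2
end

function conjugate_gradient(f, grad, x0; tol=1e-6, maxiter=100)
    x = float.(x0)
    g = grad(x)
    v = -g                                  # Step 1: start with steepest descent
    for k in 1:maxiter
        norm(g) < tol && break              # stop once the gradient is nearly zero
        a = golden_min(t -> f(x + t * v), 0.0, 1.0)   # minimize f(x + a·v)
        x = x + a * v                       # move to the new point
        g_new = grad(x)
        s = dot(g_new, g_new) / dot(g, g)   # multiplier from the two gradients
        v = -g_new + s * v                  # nudge the new vector by s·(previous vector)
        g = g_new
    end
    return x
end

xmin = conjugate_gradient(h, grad_h, [2.0, 1.0])
println(xmin)   # should be very close to [0.0, 0.0]
```

Because `h` is a simple two-variable quadratic, the search should land essentially on the minimum after two line searches; a less well-behaved function will usually need more iterations and possibly a wider search interval for $a$.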

###Practice Problem D
Using the conjugate vector $<-3.4452, -0.174>$ and the point $(1.8097, 0.0935)$ from Problem C, complete one more iteration of the conjugate gradient  method. 

In [None]:
# One more time!

In [None]:
revealable(ans412D)

###Practice Problem E
Modify your steepest descent program to include the conjugate-gradient vector nudge.

Then, use it to find the minimum of 
	$f(x, y) = (x - 2)^4 + (x - 2y)^2$
from initial point $(0, 0)$ – the same problem that caused trouble with steepest-descent in the last lesson.

In [None]:
# Edit your steepest descent program

In [None]:
# Test your program here

In [None]:
revealable(ans412E)