# 1. Programming Gradient Descent in 1D

#### Exercise 1

Improve the `gradient_descent` function by adding:

* *a tolerance and a maximum number of iterations*: stop iterating as soon as the absolute value of `Df(x)` falls below the tolerance or the maximum number of iterations has been reached.
* *progress info*: print the current iteration count, the value of `x` and the value of `f(x)` every 100 iterations (*Hint*: the second code cell shows how to format variables into text).
* *a `verbose` parameter*: to turn the progress output on or off.

#### Solution

In [None]:
""" Third version of gradient descent. Stops when the absolute value of the gradient
    is no larger than `TOL`, or when the maximum number of iterations `maxiter` has been reached."""
function gradient_descent(f, Df, x; alpha = 0.1, TOL = 1e-10, maxiter = 1000, verbose = false)
    
    N_iter = 0
    grad = Df(x)
    while N_iter < maxiter && abs(grad) > TOL
        x = x - alpha*grad
        grad = Df(x)
        N_iter += 1
        
        if verbose && N_iter % 100 == 0 # print progress every 100 iterations
            fx = f(x)
            println("Iter. $N_iter,\tx = $x,\tf(x) = $fx")
        end
    end
    
    return (x, f(x))
end

# 2. Arrays, packages and Multidimensional Gradient Descent

#### Exercise 2

Build the function $g(x) = x_1^2 + 2x_2^2$ and its gradient using only matrix and scalar-matrix operations (Hint: the transpose of the vector `x` is `x'`).

#### Solution

In [None]:
g2(x) = x' * [1 0; 0 2] * x
Dg2(x) = 2 .* [1 0; 0 2] * x
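
For a symmetric matrix $A$, the gradient of $x^T A x$ is $2Ax$, which is what `Dg2` computes. One way to sanity-check a hand-written gradient is to compare it with a finite-difference approximation. The sketch below does that; `numerical_gradient`, its step size `h` and the test point `x0` are illustrative assumptions, not part of the notebook.

```julia
# Same definitions as in the solution cell above
g2(x) = x' * [1 0; 0 2] * x
Dg2(x) = 2 .* [1 0; 0 2] * x

# Hypothetical helper (not from the notebook): central finite differences
function numerical_gradient(f, x; h = 1e-6)
    grad = zeros(length(x))
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h                                  # perturb only coordinate i
        grad[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return grad
end

x0 = [1.0, -3.0]
println(Dg2(x0))                     # analytic gradient: [2.0, -12.0]
println(numerical_gradient(g2, x0))  # should agree up to rounding error
```

If the two printed vectors differ by more than rounding error, the analytic gradient is probably wrong.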

#### Exercise 3

Bring the gradient descent of the first notebook into the multidimensional realm, and make it output the _history_ of `x`s and `f(x)`s (Hint: consider using the concatenate functions that were explained above. You may also use `push!`.)

#### Solution

In [None]:
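using LinearAlgebra # provides `norm`

""" Gradient descent for multidimensional functions. Stops when the norm of the gradient
    is no larger than `TOL`, or when the maximum number of iterations `maxiter` has been reached.
    Returns the history of `x`s (one column per iterate) and of `f(x)`s."""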
function gradient_descent(f, Df, x; alpha = 0.1, TOL = 1e-10, maxiter = 1000, verbose = false)
    N_iter = 0
    grad = Df(x)
    
    xn = x
    fn = [f(x)]
    while N_iter < maxiter && norm(grad) > TOL
        x = x - alpha*grad
        
        xn = [xn x]
        fn = [fn f(x)]
        
        grad = Df(x)
        N_iter += 1
        
        if verbose && N_iter % 100 == 0 # print progress every 100 iterations
            fx = fn[end]
            println("Iter. $N_iter,\tx = $x,\tf(x) = $fx")
        end
    end
    
    return (xn, fn)
end
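
The hint in Exercise 3 also mentions `push!`. As a rough sketch of that alternative, the variant below stores the history in vectors that grow in place instead of rebuilding the arrays by concatenation on every iteration. The name `gradient_descent_history` and the starting point `[1.0, 1.0]` are assumptions made for illustration, and the progress printing is left out to keep it short.

```julia
using LinearAlgebra  # provides `norm`

# Hypothetical variant (name not from the notebook): same iteration as above,
# but the history is collected with `push!`.
function gradient_descent_history(f, Df, x; alpha = 0.1, TOL = 1e-10, maxiter = 1000)
    xs = [x]        # vector of iterates (a vector of vectors)
    fs = [f(x)]     # vector of function values
    grad = Df(x)
    N_iter = 0
    while N_iter < maxiter && norm(grad) > TOL
        x = x - alpha * grad
        push!(xs, x)
        push!(fs, f(x))
        grad = Df(x)
        N_iter += 1
    end
    return xs, fs
end

# Function and gradient from Exercise 2
g2(x) = x' * [1 0; 0 2] * x
Dg2(x) = 2 .* [1 0; 0 2] * x

xs, fs = gradient_descent_history(g2, Dg2, [1.0, 1.0])
println(length(xs) - 1, " iterations, final x = ", xs[end])

xn = reduce(hcat, xs)  # one column per iterate, same layout as `xn` in the solution
```

Here `xs` is a vector of vectors; `reduce(hcat, xs)` converts it to the matrix layout returned by the concatenation-based solution when that is more convenient, for example for plotting.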