# Gradient Descent Practice

Gradient Descent is the core numerical optimization technique used in Machine Learning. In this practice we are going to code the Gradient Descent algorithm and apply it to 1D and 2D functions.

The algorithm works exactly the same way for higher-dimensional functions; the only difference is that the process can no longer be visualized.
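As a rough illustration of that claim, the same update can be applied unchanged to a vector of any length. This sketch assumes the objective $f(\mathbf x) = \|\mathbf x\|^2$ in 10 dimensions (chosen only for illustration), whose gradient is $2\mathbf x$; `step` is a hypothetical helper, not part of this practice.

```python
import numpy as np

def step(x, grad, alpha=0.1):
    """One Gradient Descent update; works for scalars and arrays of any shape."""
    return x - alpha * grad

# Illustrative 10-dimensional objective f(x) = ||x||^2 with gradient 2x.
x = np.full(10, 5.0)
for _ in range(50):
    x = step(x, 2 * x)

print(np.linalg.norm(x))  # shrinks toward 0; nothing in the loop depends on the dimension
```

Only the shape of `x` changes between 1D, 2D, and 10D; the update line itself stays the same.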

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import json_tricks

answer = {}

# Task 1. Code the Gradient Descent Algorithm

The inputs to your function are:
- current position $\mathbf x$, `x`
- gradient of the objective function $\nabla L$ at that position, `grad`
- learning rate $\alpha$, `alpha`

Code the update step of the Gradient Descent algorithm and return the new position.
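For reference, the standard update moves against the gradient: $\mathbf x_{\text{new}} = \mathbf x - \alpha \nabla L(\mathbf x)$. A minimal sketch follows, using the hypothetical name `gd_update` so it does not clash with the `grad_descent` you write. The design choice worth noting is that it builds a new value instead of modifying `x` in place.

```python
import numpy as np

def gd_update(x, grad, alpha=0.1):
    # Step against the gradient. Building a new array (instead of `x -= ...`)
    # leaves the caller's `x` untouched, so points already stored in a
    # history list are not overwritten.
    return x - alpha * grad

print(gd_update(5.0, 10.0))  # 4.0: for f(x) = x^2 the gradient at x = 5 is 10

p = np.array([-1.0, 0.0])
q = gd_update(p, np.array([2.0, -4.0]), alpha=0.5)
print(p, q)  # p is unchanged; q is the new point
```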

In [None]:
def grad_descent(x, grad, alpha=0.1):
    res = x
    ## YOUR CODE HERE
    return res

# Task 2. Code Objective Function

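We will use a very simple objective function:

$f(x) = x^2$

Code this function and its gradient below. Both should also work element-wise on NumPy arrays, since the plotting cell calls `f_1d` on an array of points.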
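The derivative of $x^2$ is $2x$. A common way to sanity-check a hand-written gradient is to compare it with the central finite difference $\frac{f(x+h) - f(x-h)}{2h}$. The sketch below uses hypothetical names (`square`, `square_grad`, `numeric_grad`) so it does not replace your own functions; all of them work element-wise on NumPy arrays.

```python
import numpy as np

def square(x):
    return x ** 2

def square_grad(x):
    return 2 * x

def numeric_grad(f, x, h=1e-5):
    # Central difference approximation of f'(x); its truncation error is O(h^2).
    return (f(x + h) - f(x - h)) / (2 * h)

xs = np.linspace(-6, 6, 13)
print(np.max(np.abs(square_grad(xs) - numeric_grad(square, xs))))  # close to 0
```

The same check can be pointed at `f_1d` and `grad_f_1d` once they are implemented.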

In [None]:
def f_1d(x):
    res = x
    ## YOUR CODE HERE
    return res

def grad_f_1d(x):
    res = x
    ## YOUR CODE HERE
    return res


In [None]:
position = 5

history = []
for index in range(10):
    grad = grad_f_1d(position)
    position = grad_descent(position, grad)
    history.append(position)

xs = np.linspace(-6, 6, 100)
ys = f_1d(xs)

history = np.array(history)

plt.plot(xs, ys, label="Loss function")
plt.plot(history, f_1d(history), 'o', label="Optimization path")
dx = np.diff(history)        # change in x between consecutive steps
dy = np.diff(f_1d(history))  # change in loss between consecutive steps
plt.quiver(history[:-1], f_1d(history)[:-1], dx, dy, angles="xy", scale_units="xy", scale=1, color="red", label="Steps")

plt.legend()
plt.title("Optimization Path with Arrows")
plt.show()

answer['1d'] = history.tolist()
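For this particular objective the iteration can be worked out by hand: $x_{k+1} = x_k - \alpha \cdot 2x_k = (1 - 2\alpha)\,x_k$, so $x_k = (1 - 2\alpha)^k x_0$. With $\alpha = 0.1$ and $x_0 = 5$, every step shrinks the position by a factor of 0.8, which is why the arrows get shorter near the minimum; if your implementation is correct, `history` should match $5 \cdot 0.8^k$ for $k = 1, \dots, 10$. The same formula shows that any $\alpha > 1$ gives $|1 - 2\alpha| > 1$, so the iterates grow instead of converging. The sketch below checks this with a hypothetical helper, `run_gd_on_square`, written independently of your `grad_descent`.

```python
import numpy as np

def run_gd_on_square(x0, alpha, steps):
    # Plain Gradient Descent on f(x) = x^2, whose gradient is 2x.
    # Like the cell above, it records positions after each step (x0 excluded).
    x = x0
    path = []
    for _ in range(steps):
        x = x - alpha * 2 * x
        path.append(x)
    return np.array(path)

k = np.arange(1, 11)
path = run_gd_on_square(5.0, 0.1, 10)
closed_form = 5.0 * (1 - 2 * 0.1) ** k
print(np.allclose(path, closed_form))  # True

# A learning rate above 1 flips the sign and grows |x| at every step.
diverging = run_gd_on_square(5.0, 1.1, 5)
print(diverging)
```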

# Task 3. 2D optimization

The simple test case passes. Let us now look at a 2D function and how it gets optimized.

In this case we will optimize a slightly more sophisticated function, a variant of the Rosenbrock function (which more commonly uses the coefficient 100 instead of 10):

$f(x, y) = (1 - x)^2 + 10 (y - x^2)^2$

Note that in this case the input to the loss function is a 2D point, passed as a NumPy array of its coordinates, for example:

```
numpy.array([1, 2])
```

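Write functions that compute the objective function and its gradient. The plotting cell also calls `loss_2d` on a stacked grid of shape `(2, 400, 400)`, so the objective should operate element-wise on `x[0]` and `x[1]`.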
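Differentiating term by term gives

$$\frac{\partial f}{\partial x} = -2(1 - x) - 40x\,(y - x^2), \qquad \frac{\partial f}{\partial y} = 20\,(y - x^2).$$

A sketch under these formulas, with the hypothetical names `rosen10`, `rosen10_grad`, and `numeric_grad_2d`: the coordinates are read as `p[0]` and `p[1]`, so the same code accepts a single point of shape `(2,)` and a stacked grid of shape `(2, 400, 400)`. The analytic gradient is compared with central finite differences.

```python
import numpy as np

def rosen10(p):
    # f(x, y) = (1 - x)^2 + 10 (y - x^2)^2, evaluated element-wise
    x, y = p[0], p[1]
    return (1 - x) ** 2 + 10 * (y - x ** 2) ** 2

def rosen10_grad(p):
    x, y = p[0], p[1]
    dfdx = -2 * (1 - x) - 40 * x * (y - x ** 2)
    dfdy = 20 * (y - x ** 2)
    # Stack along the first axis so the result has the same layout as p; cast to float.
    return np.stack([dfdx, dfdy]).astype(float)

def numeric_grad_2d(f, p, h=1e-6):
    # Central differences, one coordinate at a time (single point only).
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

start = np.array([-1.0, 0.0])
print(rosen10(start), rosen10_grad(start))  # 14.0 [-44. -20.]
print(np.allclose(rosen10_grad(start), numeric_grad_2d(rosen10, start)))  # True
print(rosen10(np.array([1.0, 1.0])))  # 0.0 at the minimum (1, 1)

grid = np.stack(np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-1, 3, 400)))
print(rosen10(grid).shape)  # (400, 400)
```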

In [None]:
def loss_2d(x):
    res = x.sum()
    ## YOUR CODE HERE
    return res

def loss_2d_grad(x):
    grad = np.zeros_like(x, dtype=float)  # float buffer, even if x holds integers
    ## YOUR CODE HERE
    return grad

In [None]:
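position = np.array([-1.0, 0.0])  # floats, so updates are never truncated to integers

history = [position.copy()]
for index in range(1000):
    grad = loss_2d_grad(position)
    position = grad_descent(position, grad, alpha=0.02)
    history.append(position.copy())  # copy, in case grad_descent updates in place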

xs = np.linspace(-2, 2, 400)
ys = np.linspace(-1, 3, 400)

xs, ys = np.meshgrid(xs, ys)

history = np.array(history)

plt.contour(xs, ys, loss_2d(np.stack([xs, ys])), levels=np.logspace(0, 3, 20))
x, y = history[:, 0], history[:, 1]
dx = np.diff(x)
dy = np.diff(y)
plt.quiver(x[:-1], y[:-1], dx, dy, angles="xy", scale_units="xy", scale=1, color="red", width=0.005)
plt.plot(x, y, 'o', label="Optimization path")

plt.legend()
plt.title("Optimization Path on the 2D Loss")
plt.show()

answer['2d'] = history.tolist()

# Afterword

That is really impressive! We have coded one of the two main algorithms of Deep Learning: Gradient Descent!

- How does gradient descent behave on these functions? Does its behavior make sense?
- Which of the steps could be dangerous?
- How would you improve Gradient Descent to avoid that unwanted dangerous step?
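One possible direction for the last question, not necessarily the one intended here, is to cap how far a single update may move. The sketch below uses gradient-norm clipping; `clipped_step` and `max_norm` are hypothetical names. Other standard remedies include a smaller or decaying learning rate, or a backtracking line search that shrinks $\alpha$ until the loss actually decreases. The example gradient $(-44, -20)$ is the gradient of the Task 3 objective at the start point $(-1, 0)$.

```python
import numpy as np

def clipped_step(x, grad, alpha=0.02, max_norm=1.0):
    # Rescale the gradient if its norm exceeds max_norm, so a single
    # update moves at most alpha * max_norm.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return x - alpha * grad

x = np.array([-1.0, 0.0])
g = np.array([-44.0, -20.0])
print(np.linalg.norm(clipped_step(x, g) - x))  # about 0.02 instead of roughly 0.97
```

Small gradients pass through unchanged, so clipping only affects the steep regions where a plain step would overshoot.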


In [None]:
json_tricks.dump(answer, '.answer.json')