<!--NAVIGATION-->
# < [Autograd](2-Autograd.ipynb) | Optimization | [Modules](4-Modules.ipynb) >

# Optimization

## Notebook Introduction

In this short notebook, we will see how to use the gradients obtained with Autograd to optimize an objective function.  
Then we will present some off-the-shelf PyTorch optimizers and learning rate schedulers.  
As eye candy, we will finish with some live optimization visualizations.

___

## Google Colab only!

In [7]:
# execute only if you're using Google Colab
# !wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/binder/requirements.txt -O requirements.txt
# !pip install -qr requirements.txt
# !wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/live_plot.py -O live_plot.py

___

In [8]:
import torch
import numpy as np

In [9]:
%matplotlib notebook
torch.set_printoptions(precision=3)

## Optimizing "by hand"

We will start with a simple example: minimizing the square function.


In [10]:
def f(x):
    return x ** 2

We will minimize the function $f$ "by hand" using the gradient descent algorithm.

As a reminder, the update step of the algorithm is:
$$x_{t+1} = x_{t} - \lambda \nabla_x f (x_t)$$

Note:
- The gradient information $\nabla_x f (x)$ is stored in `x.grad`. Once we have run the `backward` function, we can use it to do our update step.
- We need to do `x.data = ...` in the update step since we want to change `x` in place without autograd tracking the change.
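
To make the update rule concrete, here is a short worked example for the square function $f(x) = x^2$ used in this notebook. It assumes the starting point $x_0 = 8$ and the learning rate $\lambda = 0.01$ from the cell below. Since $\nabla_x f(x) = 2x$, each step simply shrinks $x$ by a constant factor:

$$x_{t+1} = x_t - \lambda \cdot 2 x_t = (1 - 2\lambda)\, x_t \quad\Rightarrow\quad x_t = (1 - 2\lambda)^t\, x_0$$

$$x_1 = 0.98 \times 8 = 7.84, \qquad f(x_1) = 7.84^2 = 61.4656$$

So with a correct update loop, the printed value of $f$ should decrease from 64 at every iteration.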

In [11]:
# YOUR TURN

x0 = 8
lr = 0.01
iterations = 10

x = torch.Tensor([x0]).requires_grad_()
y = f(x)

for i in range(iterations):

    # < YOUR CODE HERE >

    print(y.data)

tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])
tensor([64.])


#### Why do we have x.data?

If you do `x = ...`, then x is not a leaf variable anymore and will have a computation history. Since it is not a leaf anymore after the first iteration, its gradient will not be available at the second iteration.

Workarounds:
 - Define x as a new leaf variable requiring gradient at each iteration using `detach()` and `requires_grad_()`
 - Update `x.data` so that it is not recorded by autograd
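
The two workarounds above can be sketched as follows. This is one possible way to write them, not the only one: the variable names `x_detach` and `x_data` and the choice of three iterations are assumptions made for illustration. Both variants should end at the same value of `x`.

```python
import torch

def f(x):
    return x ** 2

lr = 0.01

# Workaround 1: rebuild x as a fresh leaf tensor at every iteration.
x = torch.tensor([8.0], requires_grad=True)
for _ in range(3):
    y = f(x)
    y.backward()
    # (x - lr * x.grad) carries a computation history;
    # detach() drops it and requires_grad_() turns the result into a new leaf.
    x = (x - lr * x.grad).detach().requires_grad_()
x_detach = x

# Workaround 2: overwrite x.data so autograd does not record the update.
x = torch.tensor([8.0], requires_grad=True)
for _ in range(3):
    y = f(x)
    y.backward()
    x.data = x.data - lr * x.grad
    x.grad.zero_()  # gradients accumulate across backward() calls, so reset them
x_data = x

print(x_detach.item(), x_data.item())
print(x_detach.is_leaf, x_data.is_leaf)
```

Note that the first variant gets a fresh, empty `grad` with each new leaf, while the second keeps the same tensor and therefore has to reset `x.grad` by hand.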

## Optimizing with an optimizer

### Different optimizers
PyTorch provides the most common optimization algorithms encapsulated into "optimizer classes".  
An optimizer is a useful object that automatically loops through all the parameters of your model and performs the (potentially complex) update step for you.

You first need to import the `torch.optim` module.

In [12]:
import torch.optim as optim

Below are the most commonly used optimizers. Each of them has its own specific parameters, which you can check in the [PyTorch documentation](https://pytorch.org/docs/master/optim.html#algorithms).

In [13]:
parameters = [x]  # This should be the list of model parameters

optimizer = optim.SGD(parameters, lr=0.01, momentum=0.9)
optimizer = optim.Adam(parameters, lr=0.01)
optimizer = optim.Adadelta(parameters, lr=0.01)
optimizer = optim.Adagrad(parameters, lr=0.01)
optimizer = optim.RMSprop(parameters, lr=0.01)
optimizer = optim.LBFGS(parameters, lr=0.01)

# and there is more ...
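
As a rough illustration of how these optimizers differ, the sketch below runs a few of them for ten steps on the square function, starting from the same point. The `factories` dictionary, the number of steps, and the starting value are assumptions chosen for this example. Each optimizer gets its own fresh parameter, since sharing one tensor between several optimizers would mix their updates.

```python
import torch
import torch.optim as optim

def f(x):
    return x ** 2

# Each entry builds an optimizer for a given parameter list,
# using the same settings as in the cell above.
factories = {
    "SGD+momentum": lambda p: optim.SGD(p, lr=0.01, momentum=0.9),
    "Adam": lambda p: optim.Adam(p, lr=0.01),
    "Adagrad": lambda p: optim.Adagrad(p, lr=0.01),
    "RMSprop": lambda p: optim.RMSprop(p, lr=0.01),
}

final_x = {}
for name, make in factories.items():
    x = torch.tensor([8.0], requires_grad=True)  # fresh parameter per optimizer
    optimizer = make([x])
    for _ in range(10):
        optimizer.zero_grad()
        f(x).backward()
        optimizer.step()
    final_x[name] = x.item()

for name, value in final_x.items():
    print(f"{name:>12}: x = {value:.3f}")
```

With the same nominal learning rate, the optimizers move at quite different speeds, which is why learning rates usually need to be tuned per optimizer.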

### Using an optimizer

Now, let's use an optimizer to do the optimization!

You will need 2 new functions:
- `optimizer.zero_grad()`: This function resets the gradients of the parameters (`x` here); otherwise they accumulate across calls to `backward`
- `optimizer.step()`: This function applies an update step
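
To see why `zero_grad()` matters, here is a minimal sketch on a single parameter. The names `first`, `accumulated` and `cleared` are assumptions used only for this illustration. It calls `backward` twice without resetting, then resets and applies one update step.

```python
import torch
import torch.optim as optim

x = torch.tensor([8.0], requires_grad=True)
optimizer = optim.SGD([x], lr=0.01)

# Two backward passes without resetting: the gradients add up.
(x ** 2).backward()
first = x.grad.clone()        # d(x^2)/dx = 2 * 8 = 16
(x ** 2).backward()
accumulated = x.grad.clone()  # 16 + 16 = 32

# zero_grad() clears the stored gradient. Depending on the PyTorch version,
# it either fills x.grad with zeros or sets it to None.
optimizer.zero_grad()
cleared = x.grad is None or bool((x.grad == 0).all())

# One clean iteration: backward, then step.
(x ** 2).backward()
optimizer.step()              # x <- x - lr * grad = 8 - 0.01 * 16

print(first.item(), accumulated.item(), cleared, x.item())
```

Plain SGD without momentum performs exactly the hand-written update from the previous section, so the result of the single step matches the first iterate of gradient descent "by hand".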

In [14]:
# YOUR TURN

x0 = 8
lr = 0.01
iterations = 10

x = torch.Tensor([x0]).requires_grad_()
y = f(x)

# Define your optimizer
optimizer = None  # < YOUR CODE HERE >

for i in range(iterations):
    
    # < YOUR CODE HERE >
    
    print(y.data)

### Learning rate scheduler

Learning rate schedulers adjust the learning rate during training, typically reducing it according to a pre-defined schedule.  
Below are some of the schedulers available in PyTorch.

In [None]:
optim.lr_scheduler.LambdaLR
optim.lr_scheduler.ExponentialLR
optim.lr_scheduler.MultiStepLR
optim.lr_scheduler.StepLR

# and some more ...

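Let's try `optim.lr_scheduler.ExponentialLR`, which multiplies the learning rate by a constant factor `gamma` at every call to `scheduler.step()`.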

In [None]:
def f(x):
    return x.abs() * 5

In [None]:
x0 = 8
lr = 0.5
iterations = 15

x = torch.Tensor([x0]).requires_grad_()
optimizer = optim.SGD([x], lr=lr)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, 0.8)

for i in range(iterations):
    optimizer.zero_grad()
    y = f(x)
    y.backward()
    optimizer.step()
    scheduler.step()
    print(y.data, " | lr : ", optimizer.param_groups[0]['lr'])
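
As a sanity check on the schedule itself, the sketch below records the learning rate after each `scheduler.step()` and compares it with the closed form $lr_0 \cdot \gamma^{t}$. The lists `lrs` and `expected` and the choice of five iterations are assumptions made for this illustration.

```python
import torch
import torch.optim as optim

lr0, gamma = 0.5, 0.8

x = torch.tensor([8.0], requires_grad=True)
optimizer = optim.SGD([x], lr=lr0)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

lrs = []
for t in range(5):
    optimizer.zero_grad()
    (x.abs() * 5).backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate after the parameter update
    lrs.append(optimizer.param_groups[0]["lr"])

# After t + 1 calls to scheduler.step(), the learning rate is lr0 * gamma ** (t + 1).
expected = [lr0 * gamma ** (t + 1) for t in range(5)]

for got, want in zip(lrs, expected):
    print(f"{got:.5f}  (expected {want:.5f})")
```

The gradient of $5|x|$ has constant magnitude 5, so a fixed learning rate would keep taking steps of the same size and oscillate around 0, whereas a decaying learning rate makes the steps shrink over time.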

## Live Plots 

Below are some live plots to see what actually happens when you optimize a function.  
You can play with learning rates and optimizers, and also define new functions to optimize!

### 2D Plot - Optimization process

In [None]:
from live_plot import init_2dplot, add_point_2d

In [None]:
def function_2d(x):
    return x ** 2 / 20 + x.sin().tanh()

In [None]:
x0 = 8
lr = 3
iterations = 15

x_range = torch.arange(-10, 10, 0.1)
init_2dplot(x_range, function_2d, delta_=0.5)

x = torch.Tensor([x0]).requires_grad_()
optimizer = torch.optim.Adam([x], lr=lr)

for i in range(iterations):
    optimizer.zero_grad()
    f = function_2d(x)
    f.backward()
    add_point_2d(x, f)
    optimizer.step()

### 3D Plot - Optimization process

In [None]:
from live_plot import init_3dplot, add_point_3d

__Choose a function below and run the cell__

In [None]:
elev, azim = 40, 250
x0, y0 = 6, 0
x_range = torch.arange(-10, 10, 1).float()
y_range = torch.arange(-15, 10, 2).float()

def function_3d(x, y):
    return x ** 2 - y ** 2

In [None]:
elev, azim = 30, 130
x0, y0 = 10, -4
x_range = torch.arange(-10, 15, 1).float()
y_range = torch.arange(-15, 10, 2).float()

def function_3d(x, y):
    return x ** 3 - y ** 3

In [None]:
elev, azim = 80, 130
x0, y0 = 4, -5
x_range = torch.arange(-10, 10, .5).float()
y_range = torch.arange(-10, 10, 1).float()

def function_3d(x, y):
    return (x ** 2 + y ** 2).sqrt().sin()

In [None]:
elev, azim = 37, 120
x0, y0 = 6, -15
x_range = torch.arange(-10, 12, 1).float()
y_range = torch.arange(-25, 5, 1).float()

# Settings to try for this function: lr=0.15, momentum=0.5
def function_3d(x, y):
    return (x ** 2 / 20 + x.sin().tanh()) * (y.abs()) ** 1.2 + 5 * x.abs() + (y + 7)**2 / 10

__Optimize the function__

In [None]:
init_3dplot(x_range, y_range, function_3d, elev, azim, delta_=0.1)

#x0 = 
#y0 = 

lr = 0.01
iterations = 15

x = torch.Tensor([x0]).requires_grad_()
y = torch.Tensor([y0]).requires_grad_()
optimizer = torch.optim.SGD([x, y], lr=lr)

for i in range(iterations):
    optimizer.zero_grad()
    f = function_3d(x, y)
    f.backward()
    add_point_3d(x, y, f)
    optimizer.step()

___

## Don't forget to download the notebook, otherwise your changes may be lost

![Download the notebook](figures/notebook-download.png)
