In [0]:
import torch.optim as optim
import torch

In [2]:
dir(optim)

['ASGD',
 'Adadelta',
 'Adagrad',
 'Adam',
 'AdamW',
 'Adamax',
 'LBFGS',
 'Optimizer',
 'RMSprop',
 'Rprop',
 'SGD',
 'SparseAdam',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'lr_scheduler']

Create the params tensor and instantiate a stochastic gradient descent (SGD) optimizer

In [0]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-5
optimizer = optim.SGD([params], lr=learning_rate)

In [0]:
# t_c = torch.tensor([1.0, 3.0, 7.0])
# t_u = torch.tensor([35.7, 55.9, 58.2])
# t_un = 0.1 * t_u


t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

In [0]:
def model(t_u, w, b):
    return w * t_u + b

In [0]:
def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()

In [7]:
t_p = model(t_u, *params)
loss = loss_fn(t_p, t_c)
loss.backward()
optimizer.step()
params

tensor([ 9.5483e-01, -8.2600e-04], requires_grad=True)
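For plain SGD (no momentum, no weight decay), the step above should amount to subtracting the learning rate times the gradient from each parameter. The following is a minimal sketch that checks this by hand on the same data; the names `params_opt` and `params_manual` are introduced here for illustration only and do not appear in the notebook.

```python
import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

lr = 1e-5

# Path A: let optim.SGD apply the update.
params_opt = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params_opt], lr=lr)
loss_fn(model(t_u, *params_opt), t_c).backward()
optimizer.step()

# Path B (illustrative): apply params <- params - lr * grad by hand.
params_manual = torch.tensor([1.0, 0.0], requires_grad=True)
loss_fn(model(t_u, *params_manual), t_c).backward()
with torch.no_grad():  # the update itself must not be recorded by autograd
    params_manual -= lr * params_manual.grad

print(torch.allclose(params_opt, params_manual))
```

The in-place update is wrapped in `torch.no_grad()` because autograd refuses in-place modification of a leaf tensor that requires gradients; the optimizer handles this internally.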

Loop-ready code: zero_grad is called before backward, so gradients left over from a previous iteration do not accumulate into the current ones

In [8]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)
t_p = model(t_un, *params)
loss = loss_fn(t_p, t_c)
optimizer.zero_grad()
loss.backward()
optimizer.step()
params

tensor([1.7761, 0.1064], requires_grad=True)
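To see why the zero_grad call matters, the sketch below calls backward twice without clearing the gradient and compares the result with a freshly computed gradient. The data `x`, `y` and the helper `loss_of` are made up for this illustration and are not part of the notebook.

```python
import torch
import torch.optim as optim

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=1e-2)  # used here only for zero_grad

# Toy data, assumed for illustration.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

def loss_of(p):
    w, b = p
    return ((w * x + b - y) ** 2).mean()

loss_of(params).backward()
first_grad = params.grad.clone()

# Second backward without zeroing: the new gradient is added to the old one.
loss_of(params).backward()
accumulated_grad = params.grad.clone()

# Clearing first gives the plain gradient again.
optimizer.zero_grad()
loss_of(params).backward()
fresh_grad = params.grad.clone()

print(torch.allclose(accumulated_grad, 2 * first_grad))
print(torch.allclose(fresh_grad, first_grad))
```

Because params never changes in this sketch, the accumulated gradient is exactly twice the single-pass gradient. In a real loop the stale gradient would come from the previous parameter values and silently corrupt the update.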

Update the training loop so that it takes the optimizer and the params tensor as arguments

In [0]:
def training_loop(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        t_p = model(t_u, *params)    # forward pass
        loss = loss_fn(t_p, t_c)
        optimizer.zero_grad()        # clear gradients from the previous epoch
        loss.backward()              # populate params.grad
        optimizer.step()             # update params in place
        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
    return params

In [10]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)
training_loop(
 n_epochs = 5000,
 optimizer = optimizer,
 params = params,
 t_u = t_un,
 t_c = t_c)

Epoch 500, Loss 7.860118
Epoch 1000, Loss 3.828538
Epoch 1500, Loss 3.092191
Epoch 2000, Loss 2.957697
Epoch 2500, Loss 2.933134
Epoch 3000, Loss 2.928648
Epoch 3500, Loss 2.927830
Epoch 4000, Loss 2.927680
Epoch 4500, Loss 2.927651
Epoch 5000, Loss 2.927648


tensor([  5.3671, -17.3012], requires_grad=True)

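Instantiate a different optimizer, Adam, instead of SGD. This run uses the un-normalized input t_u and a learning rate of 1e-1, so the learned weight comes out at about one tenth of the value obtained on t_un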

In [11]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-1
optimizer = optim.Adam([params], lr=learning_rate)
training_loop(
 n_epochs = 2000,
 optimizer = optimizer,
 params = params,
 t_u = t_u,
 t_c = t_c)

Epoch 500, Loss 7.612903
Epoch 1000, Loss 3.086700
Epoch 1500, Loss 2.928578
Epoch 2000, Loss 2.927646


tensor([  0.5367, -17.3021], requires_grad=True)
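The Adam run above trains on the raw t_u rather than the rescaled t_un. As a rough sketch of why that is notable, the code below runs plain SGD with lr=1e-2 on the same raw input for comparison. The helper `final_loss` and the choice of 100 SGD epochs are assumptions made for this illustration, not part of the notebook.

```python
import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

def final_loss(make_optimizer, n_epochs):
    """Train on the raw t_u and return the loss after the last step."""
    params = torch.tensor([1.0, 0.0], requires_grad=True)
    optimizer = make_optimizer([params])
    for _ in range(n_epochs):
        loss = loss_fn(model(t_u, *params), t_c)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return loss_fn(model(t_u, *params), t_c)

sgd_loss = final_loss(lambda p: optim.SGD(p, lr=1e-2), n_epochs=100)
adam_loss = final_loss(lambda p: optim.Adam(p, lr=1e-1), n_epochs=2000)

print('SGD on raw t_u: ', float(sgd_loss))   # expected to be non-finite
print('Adam on raw t_u:', float(adam_loss))
```

With inputs of this magnitude, the gradient with respect to w is large enough that a fixed step of 1e-2 overshoots and the SGD loss blows up, whereas Adam rescales each parameter's step and still converges. This is one reason the SGD runs in this notebook use t_un instead.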

Shuffling the elements of a tensor amounts to finding a permutation of its indices using randperm

In [12]:
n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)
shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]
train_indices, val_indices 

(tensor([ 3, 10,  0,  7,  4,  6,  9,  2,  5]), tensor([8, 1]))
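Since randperm draws a new permutation on every run, the split above changes each time the cell is executed. If a repeatable split is wanted, one option is to pass a seeded generator, as in the sketch below. The seed value 0 is arbitrary and chosen only for illustration.

```python
import torch

n_samples = 11                      # same size as t_u in this notebook
n_val = int(0.2 * n_samples)        # 20% held out, rounded down to 2

# A dedicated generator keeps the split repeatable without
# touching the global random state.
generator = torch.Generator().manual_seed(0)
shuffled_indices = torch.randperm(n_samples, generator=generator)

train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

# The two index sets should be disjoint and together cover every sample.
all_indices = torch.sort(torch.cat([train_indices, val_indices])).values
print(len(train_indices), len(val_indices))
print(torch.equal(all_indices, torch.arange(n_samples)))
```

Note that the slicing relies on n_val being at least 1: with n_val equal to 0, `shuffled_indices[:-n_val]` would be an empty tensor rather than the full set.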

Use the index tensors to build the training and validation sets from the data tensors,
then rescale the inputs as before

In [0]:
train_t_u = t_u[train_indices]
train_t_c = t_c[train_indices]
val_t_u = t_u[val_indices]
val_t_c = t_c[val_indices]
train_t_un = 0.1 * train_t_u
val_t_un = 0.1 * val_t_u

The training loop barely changes. The only addition is evaluating the validation loss at every
epoch, which gives a chance to recognize whether the model is overfitting

In [0]:
def training_loop(n_epochs, optimizer, params,
                  train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss_fn(train_t_p, train_t_c)
        val_t_p = model(val_t_u, *params)
        val_loss = loss_fn(val_t_p, val_t_c)
        optimizer.zero_grad()
        train_loss.backward()    # only the training loss drives the update
        optimizer.step()
        if epoch <= 3 or epoch % 500 == 0:
            print('Epoch {}, Training loss {}, Validation loss {}'.format(
                epoch, float(train_loss), float(val_loss)))
    return params

In [15]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)
training_loop(
 n_epochs = 3000,
 optimizer = optimizer,
 params = params,
 train_t_u = train_t_un,
 val_t_u = val_t_un,
 train_t_c = train_t_c,
 val_t_c = val_t_c)

Epoch 1, Training loss 90.21489715576172, Validation loss 36.03684616088867
Epoch 2, Training loss 41.471065521240234, Validation loss 11.09175968170166
Epoch 3, Training loss 34.153743743896484, Validation loss 12.061867713928223
Epoch 500, Training loss 6.606430530548096, Validation loss 7.840882301330566
Epoch 1000, Training loss 3.0961873531341553, Validation loss 5.740539073944092
Epoch 1500, Training loss 2.6341655254364014, Validation loss 5.080069065093994
Epoch 2000, Training loss 2.5733556747436523, Validation loss 4.853822231292725
Epoch 2500, Training loss 2.565351724624634, Validation loss 4.773495197296143
Epoch 3000, Training loss 2.564297676086426, Validation loss 4.744585037231445


tensor([  5.3099, -16.8489], requires_grad=True)

The validation loss is never backpropagated, so it can be computed inside the torch.no_grad
context manager. Check that the context manager works by asserting that requires_grad is False on the val_loss tensor

In [0]:
def training_loop(n_epochs, optimizer, params,
                  train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss_fn(train_t_p, train_t_c)
        with torch.no_grad():    # no graph is built for the validation pass
            val_t_p = model(val_t_u, *params)
            val_loss = loss_fn(val_t_p, val_t_c)
            assert not val_loss.requires_grad
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
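The effect of the context manager can also be inspected in isolation. The sketch below computes the same expression once with autograd tracking and once under torch.no_grad, then compares the requires_grad and grad_fn attributes. The tensor `x` and the names `tracked` and `untracked` are made up for this illustration.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])   # toy input, assumed for illustration
w, b = params

# Normal mode: the result is attached to the autograd graph.
tracked = (w * x + b).mean()

# Under no_grad the same computation is not recorded.
with torch.no_grad():
    untracked = (w * x + b).mean()

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)

# The numerical value is the same either way.
print(torch.allclose(tracked.detach(), untracked))
```

Skipping the graph for the validation pass saves memory and time, and it makes it impossible to call backward on the validation loss by accident.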

Define a calc_forward function that takes data as input and runs model and loss_fn with or without autograd, according to a Boolean is_train argument

In [0]:
def calc_forward(t_u, t_c, is_train):
    # Note: params is read from the enclosing (global) scope.
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
        return loss
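As a usage sketch, the same idea can drive both passes of a training loop. The variant below takes params as an explicit argument instead of reading a global, which is a change made for this illustration. The small training and validation tensors are a hand-picked subset of the rescaled data, assumed only for the example.

```python
import torch
import torch.optim as optim

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

def calc_forward(t_u, t_c, params, is_train):
    # Autograd is enabled only when is_train is True.
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss

# Illustrative subset of the rescaled data (t_un values and their t_c).
train_t_un = torch.tensor([3.57, 5.59, 5.82, 8.19])
train_t_c = torch.tensor([0.5, 14.0, 15.0, 28.0])
val_t_un = torch.tensor([5.63])
val_t_c = torch.tensor([11.0])

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=1e-2)

initial_loss = float(calc_forward(train_t_un, train_t_c, params, False))
for epoch in range(1, 101):
    train_loss = calc_forward(train_t_un, train_t_c, params, True)
    val_loss = calc_forward(val_t_un, val_t_c, params, False)
    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()

print(train_loss.requires_grad, val_loss.requires_grad)
print(initial_loss, float(train_loss))
```

Passing params explicitly keeps the function free of hidden state, at the cost of one more argument at each call site.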