# Week 5, Callbacks

<h2>Schedule for the next few weeks:</h2>

**Week 5:** Callbacks

**Week 6:** Optimizers

**Week 7:** Loss Functions

**Week 8:** Custom Architecture Introduction

<h3> <b>Key Terms and Definitions:</b> </h3>

**Model**: 'A mathematical representation of a real-world process.' In our case, a model consists of an architecture (and its learned parameters), and we call these deep learning models.

**Loss**: The value of the loss function, which grades the model's performance. Training tries to make it as small as possible.

**Metric**: A human-readable way to grade how well our model works. E.g., accuracy (a metric) vs. cross-entropy loss (a loss).

**Gradient Descent**: The optimization algorithm used to update the parameters of our model, i.e., to 'fit' it.

**Callback**: A function that is called when a particular event happens during training (e.g., the start of an epoch or the end of a batch).
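To see why a metric reads more 'prettily' than a loss, here is a small worked example in plain Python (no fastai or PyTorch; the predicted probabilities are made up for illustration). Accuracy only asks whether the top prediction is right, while cross-entropy loss also penalizes how unsure the model was.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-probability assigned to the correct class."""
    return sum(-math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def accuracy(probs, targets):
    """Fraction of samples whose highest-probability class is the target."""
    correct = sum(max(range(len(p)), key=p.__getitem__) == t for p, t in zip(probs, targets))
    return correct / len(targets)

# Hypothetical predicted class probabilities for 4 samples over 3 classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.4, 0.35, 0.25],  # right, but barely
    [0.3, 0.3, 0.4],    # wrong
]
targets = [0, 1, 0, 1]

print(f"accuracy:      {accuracy(probs, targets):.2f}")
print(f"cross-entropy: {cross_entropy(probs, targets):.3f}")
```

Accuracy treats the barely-right third sample the same as the confident first one; the loss does not, which is what makes it useful for gradient descent.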


In fastai, training happens inside the training loop, and every step of that loop can be customized with a callback.

<h2>Examples of useful callbacks, some you already know!</h2>

* <b>OneCycleScheduler</b>
* <b>LRFinder</b>
* <b>SaveModelCallback</b>
* <b>CSVLogger</b>
* <b>MixedPrecision</b>

Today we are going to go through how some of these work, what it's like to build callbacks in fastai, and what can be done with them.




<h2> The basic training loop </h2>

Here is a simplified version of the code:

In [0]:
# model is the network we are training.
# train_dl is the DataLoader for the training set in your DataBunch;
#      iterating over it yields one batch (xb, yb) at a time.
# opt_fn is the optimizer that updates the model's parameters. We use Adam;
#      plain (stochastic) gradient descent, SGD, is also common in papers and models.
# loss_func grades the model's output against the labels.
def train(model, train_dl, epochs, opt_fn, loss_func):
  for _ in range(epochs):
    # What is _? It is a placeholder for a loop variable we never use,
    # see the Code Examples section for more
    model.train() # set the model to training mode
    for xb,yb in train_dl: # for each batch of inputs x and labels y in the dataset
      out = model(xb) # forward pass: get the model's output
      loss = loss_func(out, yb) # grade the output against the labels
      loss.backward() # backward pass: compute gradients and accumulate them in each parameter's .grad
      opt_fn.step() # update every parameter using its current gradient
      opt_fn.zero_grad() # reset the gradients to zero for the next batch
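The same steps (forward pass, loss, backward pass, optimizer step, zero the gradients) can be run without any library. A minimal sketch, assuming a one-parameter model `y = w * x`, a mean-squared-error loss, and plain SGD; the gradient is worked out by hand instead of by `loss.backward()`, and all names here are illustrative only.

```python
import random

def train(data, epochs, lr=0.1, batch_size=4, seed=0):
    """Fit y = w * x with mini-batch SGD, mirroring the loop above."""
    data = list(data)                        # copy, so shuffling doesn't touch the caller's list
    rng = random.Random(seed)
    w, grad = 0.0, 0.0                       # the model's only parameter and its gradient
    for _ in range(epochs):
        rng.shuffle(data)                    # a DataLoader would typically shuffle each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            xb = [x for x, y in batch]
            yb = [y for x, y in batch]
            out = [w * x for x in xb]                                    # forward pass
            loss = sum((o - y) ** 2 for o, y in zip(out, yb)) / len(yb)  # MSE loss
            # "backward": accumulate d(loss)/dw into grad
            grad += sum(2 * (o - y) * x for o, y, x in zip(out, yb, xb)) / len(yb)
            w -= lr * grad                   # optimizer step
            grad = 0.0                       # zero the gradient
    return w, loss

data = [(x / 10, 2 * x / 10) for x in range(1, 21)]  # points on the line y = 2x
w, last_loss = train(data, epochs=50)
print(w, last_loss)
```

Forgetting the `grad = 0.0` line would make each step use the sum of all past gradients, which is exactly why the real loop calls `zero_grad()`.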

<h2>PyTorch for comparison</h2>

In [0]:
# AverageMeter and accuracy are helpers defined in PyTorch's ImageNet example
# (pytorch/examples, imagenet/main.py); they are not part of torch itself.
def pyTrain(train_dl, model, optimizer, criterion, device):
  losses = AverageMeter('Loss', ':.4e') # running average of the loss, and how to print it
  top1 = AverageMeter('Acc@1', ':6.2f') # running top-1 accuracy
  top5 = AverageMeter('Acc@5', ':6.2f') # running top-5 accuracy

  model.train() # set the model to training mode
  for i, (images, target) in enumerate(train_dl): # for every batch in the training loader
    images = images.to(device, non_blocking=True) # move the inputs to the GPU
    target = target.to(device, non_blocking=True) # move the targets to the GPU

    output = model(images) # compute the output
    loss = criterion(output, target) # and the loss

    acc1, acc5 = accuracy(output, target, topk=(1,5)) # is the target among the top 1 / top 5 predictions?
    losses.update(loss.item(), images.size(0))
    top1.update(acc1[0], images.size(0))
    top5.update(acc5[0], images.size(0))

    optimizer.zero_grad() # zero the gradients
    loss.backward() # backward pass
    optimizer.step() # update the parameters
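Because `AverageMeter` comes from the example rather than from `torch`, here is a simplified re-implementation of what it does (our own sketch of its behaviour, not the example's exact code). It keeps a running average weighted by batch size, which is why the loop passes `images.size(0)` to `update`.

```python
class AverageMeter:
    """Running average of a value, weighted by how many samples each update covers."""
    def __init__(self, name, fmt=':f'):
        self.name, self.fmt = name, fmt
        self.reset()

    def reset(self):
        self.val = self.avg = self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.val = val              # most recent value
        self.sum += val * n         # weight by the number of samples in the batch
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)

losses = AverageMeter('Loss', ':.4e')
losses.update(0.9, n=32)   # a full batch of 32
losses.update(0.5, n=8)    # a smaller final batch
print(losses)
```

The average is (0.9 * 32 + 0.5 * 8) / 40 = 0.82, not the unweighted (0.9 + 0.5) / 2 = 0.7, so a small last batch cannot skew the epoch's numbers.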

![The basic training loop](https://pouannes.github.io/fastai-callbacks/fastai_training_loop_vanilla.png#center)

Now let's look at where callbacks can hook into the loop:

In [0]:
# callbacks is any object that provides the event methods called below
def train(model, train_dl, epochs, opt_fn, loss_func, callbacks):
  callbacks.on_train_begin() # we can act even before any epoch starts

  for _ in range(epochs):
    callbacks.on_epoch_begin() # at the start of each epoch
    model.train()

    for xb,yb in train_dl:
      callbacks.on_batch_begin() # even before we get our output!
      out = model(xb)

      callbacks.on_loss_begin() # we have the output and can modify it before computing the loss
      loss = loss_func(out, yb)

      callbacks.on_backward_begin() # right before the backward pass
      loss.backward()
      callbacks.on_backward_end() # after the backward pass, before the optimizer step

      opt_fn.step()
      callbacks.on_step_end() # after the optimizer updates the parameters

      opt_fn.zero_grad()
      callbacks.on_batch_end() # after we finish a batch

    callbacks.on_epoch_end() # end of the epoch

  callbacks.on_train_end() # end of training
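One way to make these hook positions concrete: a sketch in plain Python, assuming a `Callback` base class whose methods all do nothing by default, and a hypothetical `CallbackList` helper that forwards each event to every callback. A callback then overrides only the events it cares about; here `EventLogger` (also hypothetical) just records the order in which events fire. The actual training steps are stubbed out as comments.

```python
class Callback:
    """Base class: every event is a no-op unless a subclass overrides it."""
    def on_train_begin(self): pass
    def on_epoch_begin(self): pass
    def on_batch_begin(self): pass
    def on_loss_begin(self): pass
    def on_backward_begin(self): pass
    def on_backward_end(self): pass
    def on_step_end(self): pass
    def on_batch_end(self): pass
    def on_epoch_end(self): pass
    def on_train_end(self): pass

class CallbackList:
    """Hypothetical helper: calling an event name calls it on every callback, in order."""
    def __init__(self, cbs): self.cbs = list(cbs)
    def __getattr__(self, event):          # only reached for names not set in __init__
        def fire():
            for cb in self.cbs: getattr(cb, event)()
        return fire

class EventLogger(Callback):
    """Records which per-batch events fired."""
    def __init__(self): self.events = []
    def on_batch_begin(self): self.events.append('batch_begin')
    def on_loss_begin(self): self.events.append('loss_begin')
    def on_backward_begin(self): self.events.append('backward_begin')
    def on_backward_end(self): self.events.append('backward_end')
    def on_step_end(self): self.events.append('step_end')
    def on_batch_end(self): self.events.append('batch_end')

def train(batches, epochs, callbacks):
    callbacks.on_train_begin()
    for _ in range(epochs):
        callbacks.on_epoch_begin()
        for xb, yb in batches:
            callbacks.on_batch_begin()
            # forward pass would go here
            callbacks.on_loss_begin()
            # loss computation would go here
            callbacks.on_backward_begin()
            # loss.backward() would go here
            callbacks.on_backward_end()
            # optimizer step would go here
            callbacks.on_step_end()
            # zero_grad would go here
            callbacks.on_batch_end()
        callbacks.on_epoch_end()
    callbacks.on_train_end()

logger = EventLogger()
train(batches=[(1, 2)], epochs=1, callbacks=CallbackList([logger]))
print(logger.events)
```

The no-op base class is what lets a callback implement a single event without breaking the loop.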

![The training loop with callback hooks](https://pouannes.github.io/fastai-callbacks/fastai_training_loop_callbacks.png#center)

<h2> Callback Handler </h2>

With these hooks we can do almost anything, but something is missing: the callbacks above receive no arguments, so how can they do ANYTHING without access to the state of training itself? They sit around the training steps, not inside them.

Welcome to the <b>CallbackHandler</b>

The CallbackHandler takes the relevant data from the training loop, passes it to the callbacks, and returns values that can change the loop's behaviour (for example, a modified batch or a flag to skip a step).

Let's look at an example. The handler can send the new training data for the batch, *xb* and *yb*, to the callbacks and get back a (possibly modified) batch.
Here our CallbackHandler is called cb_handler:

In [0]:
xb,yb = cb_handler.on_batch_begin(xb,yb) # callbacks may return a modified batch

Now let's think of another example. I want to skip the backward pass if my loss is too high, and if my loss is too small, perhaps scale it up!

In [0]:
loss, skip_backward = cb_handler.on_backward_begin(loss)
if not skip_backward: loss.backward() # if we decide we don't want to skip backward
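A minimal sketch of the idea, assuming plain Python numbers stand in for tensors: a simplified, hypothetical `SimpleCallbackHandler` (not fastai's `CallbackHandler`) passes the loss to each callback and keeps whatever comes back. The hypothetical `LossGuard` callback implements the rule above: skip the backward pass if the loss is too high, scale it up if it is too small. The thresholds are made up.

```python
class LossGuard:
    """Skips backward when loss > max_loss; scales the loss up when loss < min_loss."""
    def __init__(self, max_loss=10.0, min_loss=1e-3, scale=100.0):
        self.max_loss, self.min_loss, self.scale = max_loss, min_loss, scale

    def on_backward_begin(self, loss):
        if loss > self.max_loss:
            return loss, True                 # (loss, skip_backward)
        if loss < self.min_loss:
            return loss * self.scale, False
        return loss, False

class SimpleCallbackHandler:
    def __init__(self, callbacks): self.callbacks = callbacks

    def on_backward_begin(self, loss):
        skip = False
        for cb in self.callbacks:
            if hasattr(cb, 'on_backward_begin'):
                loss, cb_skip = cb.on_backward_begin(loss)  # each callback sees the latest loss
                skip = skip or cb_skip                      # any callback can ask to skip
        return loss, skip

cb_handler = SimpleCallbackHandler([LossGuard()])
for loss in [50.0, 0.5, 0.0001]:
    new_loss, skip_backward = cb_handler.on_backward_begin(loss)
    print(loss, '->', new_loss, 'skip' if skip_backward else 'backward')
```

The training loop itself never needs to know why the backward pass was skipped; it only reads the returned flag.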

The new training loop, simplified:

In [0]:
def train(learn, epochs, callbacks, metrics):
  cb_handler = CallbackHandler(callbacks)
  cb_handler.on_train_begin(epochs, learn, metrics)

  for epoch in range(epochs):
    learn.model.train()
    cb_handler.on_epoch_begin(epoch)

    for xb,yb in learn.data.train_dl:
      xb,yb = cb_handler.on_batch_begin(xb,yb) # callbacks may modify the batch

      out = learn.model(xb)
      out = cb_handler.on_loss_begin(out) # callbacks may modify the output

      loss = learn.loss_func(out, yb)

      loss, skip_backward = cb_handler.on_backward_begin(loss)
      if not skip_backward: loss.backward()
      if not cb_handler.on_backward_end(): learn.opt.step() # True means skip the step

      if not cb_handler.on_step_end(): learn.opt.zero_grad() # True means skip zeroing the gradients
      if cb_handler.on_batch_end(loss): break # True means stop this epoch

    val_loss, mets = validate(learn.data.valid_dl, learn.model, metrics)
    if cb_handler.on_epoch_end(val_loss, mets): break # True means stop training

  cb_handler.on_train_end()
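In fastai, callbacks usually ask for changes by returning a dict such as `{'stop_training': True}`, which the handler merges into its state; this is the form the `OneCycleScheduler` and `LRFinder` code in this notebook returns. A simplified sketch of that mechanism, using our own hypothetical `MiniHandler` (not fastai's code) and a hypothetical `StopAfter` callback that ends training after a fixed number of batches. As in the loop above, `True` means stop.

```python
class StopAfter:
    """Hypothetical callback: request a stop once `max_batches` batches have run."""
    def __init__(self, max_batches): self.max_batches, self.seen = max_batches, 0
    def on_batch_end(self, **kwargs):
        self.seen += 1
        if self.seen >= self.max_batches:
            return {'stop_epoch': True, 'stop_training': True}

class MiniHandler:
    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.state = {'stop_epoch': False, 'stop_training': False}

    def _call(self, event, **kwargs):
        for cb in self.callbacks:
            res = getattr(cb, event, lambda **kw: None)(**kwargs)  # missing events are no-ops
            if res: self.state.update(res)   # callbacks return a dict of changes (or None)

    def on_epoch_begin(self):
        self.state['stop_epoch'] = False

    def on_batch_end(self, loss):
        self._call('on_batch_end', last_loss=loss)
        return self.state['stop_epoch']       # True -> break out of the batch loop

    def on_epoch_end(self):
        self._call('on_epoch_end')
        return self.state['stop_training']    # True -> break out of the epoch loop

def train(n_epochs, n_batches, callbacks):
    handler, batches_run = MiniHandler(callbacks), 0
    for epoch in range(n_epochs):
        handler.on_epoch_begin()
        for _ in range(n_batches):
            batches_run += 1                  # a real loop would train on a batch here
            if handler.on_batch_end(loss=0.0): break
        if handler.on_epoch_end(): break
    return epoch, batches_run

print(train(n_epochs=5, n_batches=4, callbacks=[StopAfter(6)]))
```

Here training stops during the second epoch (index 1) after 6 batches in total, even though 5 epochs of 4 batches were requested.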

<h2> Let's build a few </h2>

I will walk through 3 examples today: OneCycleScheduler, LRFinder, and an accuracy metric (AverageMetric).

What is OneCycleScheduler? It is based on this [paper](https://arxiv.org/pdf/1803.09820.pdf) by Leslie Smith. 
From the documentation: 

"Create a Callback that handles the hyperparameters settings following the 1cycle policy for learn. lr_max should be picked with the lr_find test. In phase 1, the learning rates goes from lr_max/div_factor to lr_max linearly while the momentum goes from moms[0] to moms[1] linearly. In phase 2, the learning rates follows a cosine annealing from lr_max to 0, as the momentum goes from moms[1] to moms[0] with the same annealing."

https://docs.fast.ai/callbacks.one_cycle.html#OneCycleScheduler

https://docs.fast.ai/callbacks.one_cycle.html#What-is-1cycle?
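The schedule can be written out directly. A sketch in plain Python, assuming cosine annealing for both phases: the quoted docs describe phase 1 as linear, but the implementation shown in this notebook sets `self.phases` to `annealing_cos` for both. `annealing_cos` below uses the same formula as fastai's helper of that name; `one_cycle` is our own hypothetical helper and indexes iterations slightly differently from fastai's `Scheduler`.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle(n_iter, lr_max, moms=(0.95, 0.85), div_factor=25., pct_start=0.3, final_div=None):
    """Return per-iteration (lr, mom) pairs for a 1cycle schedule (cosine in both phases)."""
    final_div = div_factor * 1e4 if final_div is None else final_div
    a1 = int(n_iter * pct_start)          # phase 1 (warm-up) iterations
    a2 = n_iter - a1                      # phase 2 (annealing) iterations
    sched = []
    for i in range(n_iter):
        if i < a1:                        # phase 1: lr goes up, momentum goes down
            pct = i / a1
            lr = annealing_cos(lr_max / div_factor, lr_max, pct)
            mom = annealing_cos(moms[0], moms[1], pct)
        else:                             # phase 2: lr goes down to lr_max/final_div, momentum back up
            pct = (i - a1) / a2
            lr = annealing_cos(lr_max, lr_max / final_div, pct)
            mom = annealing_cos(moms[1], moms[0], pct)
        sched.append((lr, mom))
    return sched

sched = one_cycle(n_iter=100, lr_max=1e-2)
print(sched[0], sched[30], sched[-1])
```

The learning rate peaks exactly where momentum bottoms out: high LR with low momentum in the middle of training, and the reverse at both ends.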


* One convention to note: a fastai callback asks the loop to break (or to skip a step) by returning True, for example `{'stop_training': True}`.

In [0]:
from fastai.torch_core import *
from fastai.basic_data import DataBunch
from fastai.callback import *
from fastai.basic_train import Learner, LearnerCallback

<h3> OneCycleScheduler</h3>

In [0]:
class OneCycleScheduler(LearnerCallback): # We give it a name and say we inherit from the LearnerCallback class
  def __init__(self, learn:Learner, lr_max:float, moms:Floats=(0.95,0.85), # pass in the learner, a max LR, and the momentums
               div_factor:float=25., pct_start:float=0.3, final_div:float=None, # pct_start is the fraction of iterations spent in phase 1 (LR rising)
               tot_epochs:int=None, start_epoch:int=None):
    super().__init__(learn)
    self.lr_max = lr_max
    self.div_factor = div_factor # the starting LR is lr_max/div_factor
    self.pct_start = pct_start
    self.final_div = final_div

    if self.final_div is None: self.final_div = div_factor * 1e4 # the final LR is lr_max/final_div

    self.moms = tuple(listify(moms,2)) # see the Code Examples section for what listify does
    if is_listy(self.lr_max): self.lr_max = np.array(self.lr_max) # e.g. one lr_max per layer group

    self.start_epoch, self.tot_epochs = start_epoch, tot_epochs

In [0]:
  def steps(self, *steps_cfg:StartOptEnd):
    'Build a learning rate schedule for all parameters'
    return [Scheduler(step, n_iter, func=func)
           for (step, (n_iter,func)) in zip(steps_cfg, self.phases)]
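`steps` pairs each `(start, end)` with one phase `(n_iter, func)` and wraps it in a `Scheduler`. Below is a simplified stand-in for fastai's `Scheduler` (our own sketch of its behaviour: `start`, `step()`, `is_done`; not the library source), used to show how `OneCycleScheduler.on_batch_end` walks through the list with `idx_s`. The phase lengths 3 and 7 are made up to keep the output short.

```python
import math

def annealing_cos(start, end, pct):
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

class Scheduler:
    """Steps from `start` to `end` over `n_iter` iterations using `func`."""
    def __init__(self, vals, n_iter, func):
        self.start, self.end = vals
        self.n_iter, self.func, self.n = max(1, n_iter), func, 0

    def step(self):
        self.n += 1
        return self.func(self.start, self.end, self.n / self.n_iter)

    @property
    def is_done(self):
        return self.n >= self.n_iter

phases = ((3, annealing_cos), (7, annealing_cos))       # (n_iter, func) per phase
lr_max = 1e-2
steps_cfg = ((lr_max / 25, lr_max), (lr_max, lr_max / 25e4))  # 25e4 = div_factor * 1e4
lr_scheds = [Scheduler(step, n_iter, func=func) for (step, (n_iter, func)) in zip(steps_cfg, phases)]

lrs, idx_s = [lr_scheds[0].start], 0
while idx_s < len(lr_scheds):                           # what on_batch_end does, batch after batch
    lrs.append(lr_scheds[idx_s].step())
    if lr_scheds[idx_s].is_done: idx_s += 1             # move on to the next phase
print(len(lrs), [f'{lr:.2e}' for lr in lrs])
```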

In [0]:
  def on_train_begin(self, n_epochs:int, epoch:int, **kwargs:Any)->None:
    'Initialize the parameters based on the schedule above'
    res = {'epoch':self.start_epoch} if self.start_epoch is not None else None
    self.start_epoch = ifnone(self.start_epoch, epoch)
    self.tot_epochs = ifnone(self.tot_epochs, n_epochs)
    
    n = len(self.learn.data.train_dl) * self.tot_epochs
    a1 = int(n*self.pct_start)
    a2 = n-a1
    
    self.phases = ((a1, annealing_cos), (a2, annealing_cos))
    low_lr = self.lr_max/self.div_factor # sets our lower LR
    
    self.lr_scheds = self.steps((low_lr, self.lr_max),
                               (self.lr_max, self.lr_max/self.final_div))
    
    self.moms_scheds = self.steps(self.moms, (self.moms[1], self.moms[0]))
    
    self.opt = self.learn.opt
    self.opt.lr, self.opt.mom = self.lr_scheds[0].start, self.moms_scheds[0].start
    self.idx_s = 0
    return res

In [0]:
  def jump_to_epoch(self, epoch:int)->None:
    for _ in range(len(self.learn.data.train_dl) * epoch):
      self.on_batch_end(True)

In [0]:
  def on_batch_end(self, train, **kwargs:Any)->None:
    'Take one step forward on the schedule for optim parameters'
    if train:
      if self.idx_s >= len(self.lr_scheds): return {'stop_training': True, 'stop_epoch': True}
      self.opt.lr = self.lr_scheds[self.idx_s].step()
      self.opt.mom = self.moms_scheds[self.idx_s].step()
      
      # When one scheduler is done, move to the next. 
      # (in one cycle we have two)
      
      if self.lr_scheds[self.idx_s].is_done:
        self.idx_s += 1

In [0]:
  def on_epoch_end(self, epoch, **kwargs:Any)->None:
    'Tell the learner to stop training on finish'
    if epoch > self.tot_epochs: return {'stop_training': True}

<h3>LRFinder</h3>

In [0]:
class LRFinder(LearnerCallback):
  "Mock training from `start_lr` to `end_lr` for `num_it` iterations."
  def __init__(self, learn:Learner, start_lr:float=1e-7, end_lr:float=10, 
               num_it:int=100, stop_div:bool=True):
    super().__init__(learn)
    self.data = learn.data
    self.stop_div = stop_div
    self.sched = Scheduler((start_lr, end_lr), num_it, annealing_exp) # annealing_exp moves the LR exponentially from start_lr to end_lr
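`annealing_exp` spaces the learning rates evenly on a log scale, so the `num_it` mock iterations cover many orders of magnitude between `start_lr` and `end_lr`. A sketch using the formula of fastai's helper, `start * (end/start) ** pct`, with the default arguments above:

```python
def annealing_exp(start, end, pct):
    """Exponential interpolation: pct=0 -> start, pct=1 -> end."""
    return start * (end / start) ** pct

start_lr, end_lr, num_it = 1e-7, 10, 100
# Like Scheduler.step(), evaluate at pct = 1/num_it, 2/num_it, ..., 1
lrs = [annealing_exp(start_lr, end_lr, i / num_it) for i in range(1, num_it + 1)]
# Consecutive learning rates differ by the constant factor (end_lr/start_lr) ** (1/num_it)
print(f'{lrs[0]:.3e} {lrs[49]:.3e} {lrs[-1]:.3e}')
```

Halfway through the test (`lrs[49]`, pct = 0.5) the LR is the geometric mean of the endpoints, 1e-3, rather than the arithmetic mean of about 5, which is why small learning rates get as many test points as large ones.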

In [0]:
  def on_train_begin(self, pbar, **kwargs:Any)->None:
    'Initialize optimizer and hyperparameters'
    setattr(pbar, 'clean_on_interrupt', True)
    # pbar.clean_on_interrupt = True
    
    self.learn.save('tmp')
    self.opt = self.learn.opt
    self.opt.lr = self.sched.start
    self.stop = False
    self.best_loss = 0.
    return {'skip_validate': True}

In [0]:
  def on_batch_end(self, iteration:int, smooth_loss:TensorOrNumber, **kwargs:Any)->None:
    'Determine if loss is getting exponentially worse quickly'
    if iteration == 0 or smooth_loss < self.best_loss: self.best_loss = smooth_loss
    
    self.opt.lr = self.sched.step() # move to the next LR on the exponential schedule

    if self.sched.is_done or (self.stop_div and (smooth_loss > 4*self.best_loss or torch.isnan(smooth_loss))):
      # Use the smoothed loss to decide whether to stop, since it is less noisy than the raw loss
      return {'stop_epoch': True, 'stop_training': True} # end the epoch early and stop training
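`smooth_loss` is not the raw batch loss. A sketch of the stopping rule, assuming the smoothing works like fastai v1's `SmoothenValue` (a bias-corrected exponential moving average with `beta=0.98`); `find_stop` is a hypothetical helper and the loss sequence is synthetic, made up only to show the rule firing.

```python
import math

class SmoothenValue:
    """Bias-corrected exponential moving average (assumed to mirror fastai v1's helper)."""
    def __init__(self, beta=0.98):
        self.beta, self.n, self.mov_avg = beta, 0, 0.0

    def add_value(self, val):
        self.n += 1
        self.mov_avg = self.beta * self.mov_avg + (1 - self.beta) * val
        self.smooth = self.mov_avg / (1 - self.beta ** self.n)   # undo the early bias toward 0

def find_stop(losses, beta=0.98, stop_div=True):
    """Return the iteration at which LRFinder's divergence rule would stop, or None."""
    smoother, best_loss = SmoothenValue(beta), 0.
    for iteration, loss in enumerate(losses):
        smoother.add_value(loss)
        smooth_loss = smoother.smooth
        if iteration == 0 or smooth_loss < best_loss: best_loss = smooth_loss
        if stop_div and (smooth_loss > 4 * best_loss or math.isnan(smooth_loss)):
            return iteration
    return None

# Synthetic raw losses: slowly improving, then blowing up as the LR gets too large.
losses = [2.0 - 0.01 * i for i in range(100)] + [1.5 ** k for k in range(1, 30)]
print(find_stop(losses))
```

Because the smoothed loss lags the raw one, a single noisy spike does not end the test; only a sustained blow-up (or a NaN) does.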

In [0]:
  def on_train_end(self, **kwargs:Any)->None:
    'Cleanup the weights'
    self.learn.load('tmp', purge=False)
    if hasattr(self.learn.model, 'reset'): self.learn.model.reset()
    for cb in self.callbacks:
      if hasattr(cb, 'reset'): cb.reset()
    print('LR Finder is complete, type {learner_name}.recorder.plot() to see the graph')

<h2> Accuracy</h2>

<h3>Metric</h3>

In [0]:
def accuracy(input:Tensor, targs:Tensor)->Rank0Tensor:
    "Compute accuracy with `targs` when `input` is bs * n_classes."
    n = targs.shape[0]
    input = input.argmax(dim=-1).view(n,-1)
    targs = targs.view(n,-1)
    return (input==targs).float().mean()

<h3> Callback </h3>

In [0]:
class AverageMetric(Callback):
    "Wrap a `func` in a callback for metrics computation."
    def __init__(self, func):
        # If it's a partial, use func.func
        name = getattr(func,'func',func).__name__
        self.func, self.name = func, name

    def on_epoch_begin(self, **kwargs):
        "Set the inner value to 0."
        self.val, self.count = 0.,0

    def on_batch_end(self, last_output, last_target, **kwargs):
        "Update metric computation with `last_output` and `last_target`."
        if not is_listy(last_target): last_target=[last_target]
        self.count += last_target[0].size(0)
        val = self.func(last_output, *last_target)
        self.val += last_target[0].size(0) * val.detach().cpu()

    def on_epoch_end(self, last_metrics, **kwargs):
        "Set the final result in `last_metrics`."
        return add_metrics(last_metrics, self.val/self.count)

In [0]:
# add_metrics(last_metrics, self.val/self.count) returns the same dict as:
{'last_metrics': last_metrics + [self.val/self.count]}
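To see why `AverageMetric` multiplies each batch's value by the batch size, here is a plain-Python mirror of its three events (a hypothetical `AverageMetricSketch`, with lists instead of tensors and no fastai). The epoch has one full batch and a smaller last batch; `0.42` is a made-up value standing in for whatever is already in `last_metrics`.

```python
def accuracy(preds, targs):
    """Fraction of predicted class indices that match the targets."""
    return sum(p == t for p, t in zip(preds, targs)) / len(targs)

class AverageMetricSketch:
    """Plain-Python mirror of AverageMetric's on_epoch_begin / on_batch_end / on_epoch_end."""
    def __init__(self, func): self.func, self.name = func, func.__name__
    def on_epoch_begin(self, **kwargs): self.val, self.count = 0., 0
    def on_batch_end(self, last_output, last_target, **kwargs):
        bs = len(last_target)
        self.count += bs
        self.val += bs * self.func(last_output, last_target)   # weight the batch mean by batch size
    def on_epoch_end(self, last_metrics, **kwargs):
        return {'last_metrics': last_metrics + [self.val / self.count]}

# Hypothetical epoch: a batch of 4 (3 correct) and a last batch of 1 (0 correct).
batches = [([0, 1, 1, 2], [0, 1, 2, 2]),
           ([0], [1])]
metric = AverageMetricSketch(accuracy)
metric.on_epoch_begin()
for preds, targs in batches:
    metric.on_batch_end(last_output=preds, last_target=targs)
result = metric.on_epoch_end(last_metrics=[0.42])
naive = sum(accuracy(p, t) for p, t in batches) / len(batches)
print(result, naive)
```

The weighted result is 3/5 = 0.6, the true accuracy over all 5 samples; averaging the two batch accuracies naively would give 0.375.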

# Useful Links

https://docs.fast.ai/callbacks.html

https://docs.fast.ai/callback.html

https://forums.fast.ai/t/using-the-callback-system-in-fastai/16216


# Code Examples

### for _ in range(epoch)

In [0]:
for _ in range(3):
  print('hello')

hello
hello
hello


### Listify

In [0]:
from fastai.core import *
from fastai.callback import *
from fastai.basic_train import *

In [0]:
moms = (0.95,0.85)
listify(moms,2)

[0.95, 0.85]

In [0]:
tuple(listify(moms,2))

(0.95, 0.85)

### Is Listy

In [0]:
moms = listify(slice(0.95,0.85)) # if we pass in a slice as lr_max
is_listy(moms)

True

In [0]:
if is_listy(moms): moms = np.array(moms)
print(moms)

[slice(0.95, 0.85, None)]
