# Data

A **Transform** is an object that:
- behaves like a function
- has an optional setup method that will initialize some inner state 
- has an optional decode that will reverse the function (this reversal may not be perfect)

These steps are needed for most data preprocessing tasks, so fastai provides a class that encapsulates them.

In general, our data is always a tuple (input,target) (sometimes with more than one input or more than one target).
A special behavior of Transforms is that they always get applied over tuples. 

For example, when a transform resizes an item, we don't want to resize the tuple as a whole; instead, we want to resize the input (if applicable) and the target (if applicable) separately.

It's the same for batch transforms that do data augmentation: when the input is an image and the target is a segmentation mask, the transform needs to be applied (the same way) to the input and the target.

In [None]:
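class Transform:
    def setups(self, items): ...  # optional: initialize inner state from `items`
    def encodes(self, x): ...     # the function itself
    def decodes(self, x): ...     # optional: (possibly imperfect) reverse of `encodes`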
        
tfm = Transform()
tfm.setup([...])
x2 = tfm(x1)
x1 = tfm.decode(x2)
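
To make the tuple behavior concrete, here is a minimal pure-Python sketch. `ToyTransform` and `Standardize` are hypothetical stand-ins written for this note, not fastai classes, and the sketch ignores the type dispatch that the real `Transform` performs.

```python
class ToyTransform:
    "Simplified stand-in for a Transform (hypothetical, not the fastai class)"
    def setups(self, items): pass
    def encodes(self, x): return x
    def decodes(self, x): return x

    def setup(self, items): self.setups(items)

    def _apply(self, f, x):
        # Tuples are handled element by element, never as a whole
        if isinstance(x, tuple): return tuple(f(o) for o in x)
        return f(x)

    def __call__(self, x): return self._apply(self.encodes, x)
    def decode(self, x):   return self._apply(self.decodes, x)


class Standardize(ToyTransform):
    "Subtract the mean and divide by the range seen during setup"
    def setups(self, items):
        self.mean = sum(items) / len(items)
        self.span = max(items) - min(items)
    def encodes(self, x): return (x - self.mean) / self.span
    def decodes(self, x): return x * self.span + self.mean


tfm = Standardize()
tfm.setup([0.0, 5.0, 10.0])
encoded = tfm((10.0, 0.0))      # applied to each element of the tuple
decoded = tfm.decode(encoded)   # reverses the encoding, again per element
```

Here the setup step stores a mean of 5 and a range of 10, so the tuple `(10.0, 0.0)` is encoded element by element and `decode` recovers the original values.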

To compose several transforms together, fastai provides the **Pipeline** class.

We define a Pipeline by passing it a list of Transforms; it will then compose the transforms inside it. When you call Pipeline on an object, it will automatically call the transforms inside, in order.

The only part that doesn't work the same way as in Transform is the setup. To properly set up a Pipeline of Transforms on some data, you need to use a TfmdLists.

In [None]:
tfms = Pipeline([tfm1, tfm2])
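
The composition can be sketched in a few lines of plain Python. `ToyPipeline` is a hypothetical stand-in, and representing each transform as an (encode, decode) pair of functions is an assumption made only to keep the example short. Decoding in reverse order is the natural inverse of in-order encoding.

```python
class ToyPipeline:
    "Simplified stand-in for a Pipeline (hypothetical, for illustration only)"
    def __init__(self, funcs):
        # Each entry is an (encode, decode) pair of plain functions
        self.funcs = list(funcs)

    def __call__(self, x):
        # Apply every encode function in order
        for enc, _ in self.funcs: x = enc(x)
        return x

    def decode(self, x):
        # Undo the transforms starting from the last one
        for _, dec in reversed(self.funcs): x = dec(x)
        return x


to_int  = (int, str)
add_one = (lambda x: x + 1, lambda x: x - 1)

pipe = ToyPipeline([to_int, add_one])
y = pipe("41")        # int("41") + 1
x = pipe.decode(y)    # str(42 - 1)
```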

Your data is usually a set of raw items (like filenames, or rows in a DataFrame) to which you want to apply a succession of transformations. We just saw that a succession of transformations is represented by a Pipeline in fastai. 

The class that groups together this Pipeline with your raw items is called **TfmdLists**.

At initialization, the TfmdLists will automatically call the setup method of each Transform in order, providing them not with the raw items but the items transformed by all the previous Transforms in order. 

We can get the result of our Pipeline on any raw element just by indexing into the TfmdLists.

The TfmdLists is named with an "s" because it can handle a training and a validation set with a splits argument. You just need to pass the indices of which elements are in the training set, and which are in the validation set. You can then access them through the train and valid attributes.

In [None]:
cut = int(len(items)*0.8)
splits = [list(range(cut)), list(range(cut,len(items)))]

tls = TfmdLists(items, [tfm1, tfm2], splits=splits)
x2 = tls.train[0]
x1 = tls.decode(x2)
tls.show(x1)
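
The setup order described above is the subtle part, so here is a pure-Python sketch of it. All three classes are hypothetical stand-ins; in particular, this sketch sets up on every item, which is a simplification chosen for the example.

```python
class Split:
    def setup(self, items): pass
    def __call__(self, x): return x.split()

class Numericalize:
    def setup(self, items):
        # `items` are already tokenized here, not raw strings
        self.vocab = sorted({tok for toks in items for tok in toks})
    def __call__(self, x): return [self.vocab.index(t) for t in x]

class ToyTfmdLists:
    "Simplified stand-in for TfmdLists (hypothetical)"
    def __init__(self, items, tfms, splits):
        self.items, self.tfms, self.splits = items, [], splits
        for t in tfms:
            # Set up `t` on the items transformed by all previous transforms
            t.setup([self._apply(o) for o in items])
            self.tfms.append(t)
    def _apply(self, x):
        for t in self.tfms: x = t(x)
        return x
    def __getitem__(self, i): return self._apply(self.items[i])
    def subset(self, k): return [self[i] for i in self.splits[k]]
    @property
    def train(self): return self.subset(0)
    @property
    def valid(self): return self.subset(1)


raw = ["b a", "c b", "a a"]
toy_tls = ToyTfmdLists(raw, [Split(), Numericalize()], splits=[[0, 1], [2]])
first = toy_tls[0]
```

`Numericalize` can only build its vocabulary because it receives tokens rather than raw strings, which is why each transform is set up on the output of the ones before it.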

If we used one TfmdLists for the inputs and another for the targets, we would end up with two separate objects, which is not what we want.

**Datasets** will apply two (or more) pipelines in parallel to the same raw object and build a tuple with the result.

Like TfmdLists, it will automatically do the setup.

Like TfmdLists, we can pass along splits to split our data between training and validation sets.

When we index into a Datasets, it will return us a tuple with the results of each pipeline.

It can also decode any processed tuple or show it directly.

In [None]:
dsets = Datasets(items, [x_tfms, y_tfms], splits=splits)
x,y = dsets.valid[0]
dsets.decode((x,y))
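
As a rough illustration of "several pipelines in parallel on the same raw item", consider the sketch below. `ToyDatasets` is a hypothetical stand-in, the filenames are made up, and each pipeline is reduced to a plain list of functions.

```python
class ToyDatasets:
    "Simplified stand-in for Datasets: one pipeline per tuple element"
    def __init__(self, items, pipelines, splits):
        self.items, self.pipelines, self.splits = items, pipelines, splits
    def _run(self, pipeline, x):
        for f in pipeline: x = f(x)
        return x
    def __getitem__(self, i):
        # Every pipeline starts from the same raw item
        return tuple(self._run(p, self.items[i]) for p in self.pipelines)
    def subset(self, k): return [self[i] for i in self.splits[k]]


names = ["cat_001.jpg", "dog_002.jpg", "cat_003.jpg"]   # made-up filenames
toy_x_tfms = [str.upper]
toy_y_tfms = [lambda name: name.split("_")[0], {"cat": 0, "dog": 1}.__getitem__]

toy_dsets = ToyDatasets(names, [toy_x_tfms, toy_y_tfms], splits=[[0, 1], [2]])
x_item, y_item = toy_dsets[1]
valid_items = toy_dsets.subset(1)
```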

The last step is to convert our Datasets object to a **DataLoaders**, which can be done with the dataloaders method. 

dataloaders directly calls DataLoader on each subset of our Datasets. fastai's DataLoader expands the PyTorch class of the same name and is responsible for collating the items from our datasets into batches. 

It has a lot of points of customization, but the most important ones are:
- after_item : Applied on each item after grabbing it inside the dataset.
- before_batch : Applied on the list of items before they are collated. This is the ideal place to pad items to the same size.
- after_batch : Applied on the batch as a whole after its construction.

The dl_type argument tells dataloaders to use the SortedDL class of DataLoader, and not the usual one. SortedDL constructs batches by putting samples of roughly the same lengths into batches.

In [None]:
dls = dsets.dataloaders(dl_type=SortedDL, before_batch=pad_input)
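
The three hooks and the effect of sorting by length can be mimicked without any library. The sketch below is an assumption-laden toy: `toy_batches` and `pad_to_longest` are hypothetical helpers, collation is a naive `zip`, and the ascending sort order is an arbitrary choice.

```python
def toy_batches(dataset, bs, after_item, before_batch, after_batch):
    "Yield batches, calling the three hooks at the points described above"
    for start in range(0, len(dataset), bs):
        samples = [after_item(o) for o in dataset[start:start + bs]]
        samples = before_batch(samples)      # still a list of items, not yet collated
        batch = list(zip(*samples))          # naive collation: group by position
        yield after_batch(batch)

def pad_to_longest(samples, pad=0):
    "Pad every input sequence to the length of the longest one in the list"
    n = max(len(x) for x, _ in samples)
    return [(x + [pad] * (n - len(x)), y) for x, y in samples]


data = [([1, 2, 3], 0), ([4], 1), ([5, 6], 0), ([7, 8, 9, 10], 1)]
# Grouping samples of similar length keeps the amount of padding small
data = sorted(data, key=lambda s: len(s[0]))
batches = list(toy_batches(data, bs=2, after_item=lambda s: s,
                           before_batch=pad_to_longest, after_batch=lambda b: b))
```

With sorting, each batch needs a single padding value; without it, the length-1 and length-3 sequences would share the first batch and need more padding.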

Use the DataLoaders.test_dl() method to create a batch from individual items:

In [None]:
self.dls.test_dl([item], rm_type_tfms=rm_type_tfms, num_workers=0)

Type transforms:

In [2]:
class TransformBlock():
    "A basic wrapper that links defaults transforms for the data block API"
    
    def __init__(self, type_tfms=None, item_tfms=None, batch_tfms=None, dl_type=None, dls_kwargs={}):
        self.type_tfms  =            L(type_tfms)
        self.item_tfms  = ToTensor + L(item_tfms)
        self.batch_tfms =            L(batch_tfms)
        self.dl_type    =              dl_type
        self.dls_kwargs =              dls_kwargs          

In [3]:
def CategoryBlock(vocab=None, sort=True, add_na=False):
    "`TransformBlock` for single-label categorical targets"
    
    type_tfms=Categorize(vocab=vocab, sort=sort, add_na=add_na)

In [4]:
def MultiCategoryBlock(encoded=False, vocab=None, add_na=False):
    "`TransformBlock` for multi-label categorical targets"
    
    if encoded:
        type_tfms=EncodedMultiCategorize(vocab=vocab) 
    else:
        type_tfms=[MultiCategorize(vocab=vocab, add_na=add_na), OneHotEncode]

In [5]:
def RegressionBlock(n_out=None):
    "`TransformBlock` for float targets"
    
    type_tfms=RegressionSetup(c=n_out)

In [None]:
class DataBlock():
    "Generic container to quickly build `Datasets` and `DataLoaders`"
       
    source =>  Datasets.items
    get_items => Datasets.items
    splitter =>  Datasets.splits    
    
    blocks = (TransformBlock,TransformBlock)*
    n_inp =>  Datasets.n_inp
    
    getters =>  Datasets.tfms
    type_tfms* =>  Datasets.tfms    
    
    default_item_tfms* => Dataloaders.after_item
    item_tfms => Dataloaders.after_item
    
    default_batch_tfms* => Dataloaders.after_batch
    batch_tfms => Dataloaders.after_batch
    
    dl_type = TfmdDL* =>  Datasets.dl_type    
    -> dataloaders
    dls_kwargs* => Dataloaders.kwargs
    
    def __init__(self, blocks=None, dl_type=None, getters=None, n_inp=None, item_tfms=None, batch_tfms=None):
        # Properties initialized by blocks
        blocks = L(b() if callable(b) else b for b in blocks)
        self.type_tfms = blocks.attrgot('type_tfms', L())
        
        self.default_item_tfms  = _merge_tfms(*blocks.attrgot('item_tfms',  L()))
        self.default_batch_tfms = _merge_tfms(*blocks.attrgot('batch_tfms', L()))
        
        for b in blocks: 
            if getattr(b, 'dl_type', None) is not None: self.dl_type = b.dl_type
        if dl_type is not None: self.dl_type = dl_type
        self.dataloaders = delegates(self.dl_type.__init__)(self.dataloaders)
        self.dls_kwargs = merge(*blocks.attrgot('dls_kwargs', {}))
        
        # Pipeline
        self.n_inp = ifnone(n_inp, max(1, len(blocks)-1))
        self.getters = ifnone(getters, [noop]*len(self.type_tfms))
        if self.get_x:
            self.getters[:self.n_inp] = L(self.get_x)
        if self.get_y:
            self.getters[self.n_inp:] = L(self.get_y)
        self.new(item_tfms, batch_tfms)
        
    def new(self, item_tfms=None, batch_tfms=None):
        "Create a new `DataBlock` with other `item_tfms` and `batch_tfms`"
        self.item_tfms  = _merge_tfms(self.default_item_tfms,  item_tfms)
        self.batch_tfms = _merge_tfms(self.default_batch_tfms, batch_tfms)
        
    def datasets(self, source, verbose=False):
        "Create a `Datasets` object from `source`"
        self.source = source                     
        items = (self.get_items or noop)(source)
        splits = (self.splitter or RandomSplitter())(items)  
        return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
    
    def dataloaders(self, source, path='.', verbose=False, **kwargs):
        dsets = self.datasets(source)
        kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
        return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

# Model

In [None]:
def save_model(file, model, opt, with_opt=True, pickle_protocol=2)
def load_model(file, model, opt, with_opt=None, device=None, strict=True)

file can be a Path object, a string or an opened file object. 

pickle_protocol is passed along to torch.save.

If a device is passed, the model is loaded on it, otherwise it's loaded on the CPU.

If strict is True, the file must contain weights for exactly every parameter key in model; if strict is False, only the keys that are in the saved model are loaded in model.
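
The effect of strict can be pictured with plain dictionaries standing in for the model and the saved weights. `toy_load` is a hypothetical helper written for this note; the error type and return value are choices made for the sketch, not a description of what load_model does internally.

```python
def toy_load(model_state, saved_state, strict=True):
    "Copy saved weights into model_state, mimicking the strict flag described above"
    missing    = sorted(set(model_state) - set(saved_state))
    unexpected = sorted(set(saved_state) - set(model_state))
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    # Only keys present on both sides are copied
    for k in saved_state:
        if k in model_state: model_state[k] = saved_state[k]
    return missing, unexpected


toy_model = {"body.w": 0.0, "head.w": 0.0}
toy_saved = {"body.w": 1.5}
missing, unexpected = toy_load(toy_model, toy_saved, strict=False)
```

With strict=False the head keeps its current value, which is the usual situation when reusing a pretrained body with a new head.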

# Loss function & Optimizer

## Loss function

In [None]:
@log_args
class BBoxLblLoss(Module):    
    def __init__(self):
        self.reduction = 'mean'
        self.l1_loss = F.l1_loss
        self.binary_cross_entropy = F.binary_cross_entropy_with_logits

    def forward(self, activations, bbox_target, label_target):        
        bbox_activations, label_activations = activations
        encoded_label_target = one_hot_encode_class(label_target, 20)
        bbox_loss = self.l1_loss(bbox_activations, bbox_target.squeeze(1), reduction=self.reduction)            
        label_loss = self.binary_cross_entropy(label_activations, encoded_label_target, reduction=self.reduction)        
        if self.reduction=='none':
            bbox_loss = bbox_loss.mean(1)
            label_loss = label_loss.mean(1)
        return bbox_loss + 20*label_loss # scale the two numbers in the same range
    
    def activation(self, activations):               
        bbox_activations, label_activations = activations
        return bbox_activations, torch.sigmoid(label_activations)
    
    def decodes(self, activations):
        bbox_activations, label_activations = activations
        return bbox_activations, (torch.argmax(label_activations,1)+1).unsqueeze(1)

## Optimizer

Learner.create_opt()

Creates an optimizer with default hyper-parameters.

This method is called internally to create the optimizer; the hyper-parameters are then adjusted by what you pass to Learner.fit or your particular schedulers (see callback.schedule).

# Training

## Learner - training loop

The training loop is defined in Learner (see below) and consists of a minimal set of instructions. Looping through the data, we:
- compute the output of the model from the input
- calculate a loss between this output and the desired target
- compute the gradients of this loss with respect to all the model parameters
- update the parameters accordingly
- zero all the gradients
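
The five steps above can be sketched on a one-parameter model with a hand-written gradient. This is a toy under stated assumptions (a model `y = w*x`, squared-error loss, plain SGD), not the Learner code.

```python
def train(data, lr=0.1, n_epoch=50):
    "Fit y = w*x with plain SGD on a squared-error loss, following the five steps above"
    w, grad, loss = 0.0, 0.0, None
    for _ in range(n_epoch):
        for x, y in data:
            pred = w * x                 # 1. output of the model
            loss = (pred - y) ** 2       # 2. loss against the target
            grad += 2 * (pred - y) * x   # 3. gradient of the loss w.r.t. w (accumulated)
            w -= lr * grad               # 4. parameter update
            grad = 0.0                   # 5. zero the gradient
    return w, loss


w, last_loss = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The gradient is accumulated with `+=` on purpose: that is why step 5 matters, since skipping it would mix gradients from different batches.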

Any tweak of this training loop is defined in a Callback to avoid over-complicating the code of the training loop, and to make it easy to mix and match different techniques (since they'll be defined in different callbacks). 

Learner groups together a model, dataloaders and a loss_func to handle training.

opt_func will be used to create an optimizer when Learner.fit is called, with lr as a default learning rate. 

splitter is a function that takes self.model and returns a list of parameter groups (or just one parameter group if there are no different parameter groups). The default is trainable_params, which returns all trainable parameters of the model.

cbs is one or a list of Callbacks to pass to the Learner. Callbacks are used for every tweak of the training loop. Each Callback is registered as an attribute of Learner (with a snake-case name, e.g. train_eval for TrainEvalCallback). At creation, all the callbacks in defaults.callbacks (TrainEvalCallback, Recorder and ProgressCallback) are associated to the Learner.

metrics is an optional list of metrics, that can be either functions or Metrics (see below).

path and model_dir are used to save and/or load models. Often path will be inferred from dls, but you can override it or pass a Path object to model_dir. Make sure you can write in path/model_dir!

wd is the default weight decay used when training the model; moms, the default momentums used in Learner.fit_one_cycle. wd_bn_bias controls if weight decay is applied to BatchNorm layers and bias.

Lastly, train_bn controls if BatchNorm layers are trained even when they are supposed to be frozen according to the splitter. Our empirical experiments have shown that it's the best behavior for those layers in transfer learning.

You can use regular PyTorch functionality for most of the arguments of the Learner, although the experience will be smoother with pure fastai objects and you will be able to use the full functionality of the library. The expectation is that the training loop will work smoothly even if you did not use fastai end to end. What you might lose are interpretation objects or showing functionality. The list below explains how to use plain PyTorch objects for all the arguments and what you might lose.

The most important is opt_func. If you are not using a fastai optimizer, you will need to write a function that wraps your PyTorch optimizer in an OptimWrapper. See the optimizer module for more details. This is to ensure the library's schedulers/freeze API work with your code.

dls is a DataLoaders object, that you can create from standard PyTorch dataloaders. By doing so, you will lose all showing functionality like show_batch/show_results. You can check the data block API or the mid-level data API tutorial to learn how to use fastai to gather your data!

model is a standard PyTorch model. You can use any model you like, just make sure it accepts the number of inputs you have in your DataLoaders and returns as many outputs as you have targets.

loss_func can be any loss function you like. It needs to be one of fastai's if you want to use Learner.predict or Learner.get_preds, or you will have to implement special methods (see more details after the BaseLoss documentation).

In [None]:
class Learner():
    "Group together a `model`, some `dls` and a `loss_func` to handle training"
        
    path = if path is None: dls.path or Path('.')
    model_dir = 'models'
    
    dls
    model -> model.to(dls.device) / if hasattr(model,'reset') model.reset()
    splitter = trainable_params   
    loss_func = if None: dls.train_ds.loss_func 
                else:  "Could not infer loss function from the data"
    
    opt_func = Adam
    lr = defaults.lr
    wd = None
    wd_bn_bias = False
    train_bn = True
    moms = (0.95,0.85,0.95)
    opt = None
    
    metrics -> _metrics / _metrics = L(v).map(mk_metric)
    
    training = False
    cbs = L(defaults.callbacks)+L(cbs)
       
    loss = tensor(0.)
    n_epoch = 1
    epoch = 0
    dl = dls.train / dls.valid
    n_iter = len(dl)
    iter = i
    xb,yb = _split(b): self.dls.n_inp or len(b)-1
    pred = model(*xb)
    loss = loss_func(pred, *yb)
    
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
 
    Learner.x,Learner.y = add_props(lambda i,x: detuplify((x.xb,x.yb)[i]))

    @delegates(save_model)
    def save(self, file, **kwargs):
        "Save model and optimizer state (if `with_opt`) to `self.path/self.model_dir/file`" 
    @delegates(load_model)
    def load(self, file, with_opt=None, device=None, **kwargs):
        "Load model and optimizer state (if `with_opt`) from `self.path/self.model_dir/file` using `device`"
    
     
        
    def add_cbs(self, cbs):
        "Add `cbs` to the list of `Callback` and register `self` as their learner"
    def add_cb(self, cb):
        "Add `cb` to the list of `Callback` and register `self` as their learner"
    def remove_cbs(self, cbs):
        "Remove `cbs` from the list of `Callback` and deregister `self` as their learner"
    def remove_cb(self, cb):
        "Remove `cb` from the list of `Callback` and deregister `self` as their learner"
    @contextmanager
    def added_cbs(self, cbs):
        "Context manager that temporarily adds `cbs`"      
    @contextmanager
    def removed_cbs(self, cbs):
        "Context manager that temporarily removes `cbs`"
    def ordered_cbs(self, event):
        "Return the list of `Callback`, in order, for an `event` in the training loop"     
    def __call__(self, event_name): 
        "Call `event_name` for all `Callback`s in `self.cbs`"

    def show_training_loop(self):
        "Show each step in the training loop"
        # At each step, callbacks are shown in order, which can help debugging.
        # ... else: print(f'{" "*indent} - {s:15}:', self.ordered_cbs(s))



    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            # Create optimizer
            self.create_opt()          # if reset_opt or not self.opt: self.create_opt()
            self.opt.set_hypers(wd=wd) # if wd is None: wd = self.wd 
            self.opt.set_hypers(lr=lr) # self.lr if lr is None else lr

            try:
                #_do_begin_fit
                self.n_epoch,self.loss = n_epoch,tensor(0.);         self('begin_fit')
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch;          self('begin_epoch')
                        #_do_epoch_train
                        try:
                            self.dl = self.dls.train;                        self('begin_train')
                            self.all_batches()
                        except CancelTrainException:                         self('after_cancel_train')
                        finally:                                             self('after_train')
                        #_do_epoch_validate
                        try:
                            self.dl = self.dls.valid;                        self('begin_validate')
                            with torch.no_grad(): self.all_batches()
                        except CancelValidException:                         self('after_cancel_validate')
                        finally:                                             self('after_validate')
                    except CancelEpochException:   self('after_cancel_epoch')
                    finally:                       self('after_epoch')

            except CancelFitException:             self('after_cancel_fit')
            finally:
                self('after_fit')
                self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None                                          
    
    def create_opt(self):
        "Create an optimizer with default hyper-parameters"
        self.opt = self.opt_func(self.splitter(self.model), lr=self.lr)
        if not self.wd_bn_bias:
            for p in bn_bias_params(self.model, True).map(self.opt.state): p['do_wd'] = False
        if self.train_bn:
            for p in bn_bias_params(self.model, False).map(self.opt.state): p['force_train'] = True
    
    def all_batches(self):
        self.n_iter = len(self.dl)
        for o in enumerate(self.dl): self.one_batch(*o)

    def one_batch(self, i, b):
        self.iter = i
        try:
            self._split(b);                                  self('begin_batch')
            self.pred = self.model(*self.xb);                self('after_pred')
            if len(self.yb) == 0: return # <-- inference
            self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
            if not self.training: return # <-- validation
            self.loss.backward();                            self('after_backward')
            self.opt.step();                                 self('after_step')
            self.opt.zero_grad()
        except CancelBatchException:                         self('after_cancel_batch')
        finally:                                             self('after_batch')
                    
    def validate(self, ds_idx=1, dl=None, cbs=None):
        if dl is None: dl = self.dls[ds_idx]
        with self.added_cbs(cbs), self.no_logging(), self.no_mbar():
            self('begin_fit')
            self('begin_epoch')
            try:
                self.dl = dl;                                    self('begin_validate')
                with torch.no_grad(): self.all_batches()
            except CancelValidException:                         self('after_cancel_validate')
            finally:                                             self('after_validate')
            self('after_epoch')
            self('after_fit')
        return getattr(self, 'final_record', None)          

## Callbacks

In [None]:
class Callback(GetAttr):
    "Basic class handling tweaks of the training loop by changing a `Learner` in various events"
    learn = None
    run = True
    run_train = True
    run_valid = True

    def __call__(self, event_name):
        "Call `self.{event_name}` if it's defined"
        # `_run` (computation elided here) tells whether this event should fire given run_train/run_valid
        if self.run and _run: getattr(self, event_name, noop)()
        if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit

Callback.name

Name of the Callback, converted from camel case to snake case and with 'Callback' removed.

Callbacks become attributes of Learner:

In [None]:
learn = Learner(...)
assert isinstance(learn.cbs[0], TrainEvalCallback)
assert hasattr(learn, 'train_eval')
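
The naming rule can be approximated with two regular expressions. `callback_attr_name` is a hypothetical helper written for this note; it only illustrates the convention and is not the function the library uses.

```python
import re

def callback_attr_name(cls_name, suffix='Callback'):
    "Hypothetical helper: strip `suffix`, then convert CamelCase to snake_case"
    # Fall back to the suffix itself when nothing is left (e.g. for 'Callback')
    base = re.sub(rf'{suffix}$', '', cls_name) or suffix
    # Insert an underscore at every lower-to-upper boundary, then lowercase
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', base).lower()


names = [callback_attr_name(n) for n in ('TrainEvalCallback', 'Recorder', 'ProgressCallback')]
```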

There is a shortcut to avoid having to write self.learn.bla for any bla attribute we seek: you can just write self.bla.

Note that this only works for reading an attribute; to change it, you have to access it explicitly with self.learn.bla. For example, if the learner's a is 1, self.a += 1 creates an a attribute equal to 2 on the callback instead of setting the learner's a to 2. It also issues a warning that something is probably wrong.
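
The read-versus-write asymmetry follows from how Python attribute lookup works, which the sketch below reproduces with `__getattr__`. `ToyLearner` and `ToyCallback` are hypothetical stand-ins, and unlike the behavior described above this toy does not issue any warning.

```python
class ToyLearner:
    def __init__(self): self.a = 1

class ToyCallback:
    "Reads fall through to `learn`; writes land on the callback itself"
    def __init__(self, learn): self.learn = learn
    def __getattr__(self, name):
        # Only called when normal lookup on the callback fails
        return getattr(self.learn, name)


toy_learn = ToyLearner()
cb = ToyCallback(toy_learn)
read_value = cb.a                  # read through to toy_learn.a
cb.a += 1                          # creates cb.a = 2; toy_learn.a is untouched
shadowed = (cb.a, toy_learn.a)
cb.learn.a += 1                    # the explicit form changes the learner
```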

A callback can implement actions on the following events:
- begin_fit: called before doing anything, ideal for initial setup.
- begin_epoch: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.
- begin_train: called at the beginning of the training part of an epoch.
- begin_batch: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).
- after_pred: called after computing the output of the model on the batch. It can be used to change that output before it's fed to the loss.
- after_loss: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).
- after_backward: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).
- after_step: called after the step and before the gradients are zeroed.
- after_batch: called at the end of a batch, for any clean-up before the next one.
- after_train: called at the end of the training phase of an epoch.
- begin_validate: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.
- after_validate: called at the end of the validation part of an epoch.
- after_epoch: called at the end of an epoch, for any clean-up before the next one.
- after_fit: called at the end of training, for final clean-up.
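
To see the order in which these events fire, the sketch below replays them with no model and no data. `Tracer` and `toy_fit` are hypothetical helpers; the cancel events and exception handling of the real loop are left out.

```python
class Tracer:
    "Callback-like object that records every event it receives"
    def __init__(self): self.events = []
    def __call__(self, event_name): self.events.append(event_name)

def toy_fit(n_epoch, n_train_batches, n_valid_batches, cbs):
    "Fire the events in the order listed above (no model, no data)"
    def fire(name):
        for cb in cbs: cb(name)
    fire('begin_fit')
    for _ in range(n_epoch):
        fire('begin_epoch')
        fire('begin_train')
        for _ in range(n_train_batches):
            for e in ('begin_batch', 'after_pred', 'after_loss',
                      'after_backward', 'after_step', 'after_batch'): fire(e)
        fire('after_train')
        fire('begin_validate')
        for _ in range(n_valid_batches):
            # Validation stops after the loss: no backward pass, no step
            for e in ('begin_batch', 'after_pred', 'after_loss', 'after_batch'): fire(e)
        fire('after_validate')
        fire('after_epoch')
    fire('after_fit')


tracer = Tracer()
toy_fit(n_epoch=1, n_train_batches=1, n_valid_batches=1, cbs=[tracer])
```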

The following attributes of Learner are available and updated during the training loop:

- model: the model used for training/validation
- dls: the underlying DataLoaders
- loss_func: the loss function used
- opt: the optimizer used to update the model parameters
- opt_func: the function used to create the optimizer
- cbs: the list containing all Callbacks
- dl: current DataLoader used for iteration
- x/xb: last input drawn from self.dl (potentially modified by callbacks). xb is always a tuple (potentially with one element) and x is detuplified. You can only assign to xb.
- y/yb: last target drawn from self.dl (potentially modified by callbacks). yb is always a tuple (potentially with one element) and y is detuplified. You can only assign to yb.
- pred: last predictions from self.model (potentially modified by callbacks)
- loss: last computed loss (potentially modified by callbacks)
- n_epoch: the number of epochs in this training
- n_iter: the number of iterations in the current self.dl
- epoch: the current epoch index (from 0 to n_epoch-1)
- iter: the current iteration index in self.dl (from 0 to n_iter-1)

The following attributes are added by TrainEvalCallback and should be available unless you went out of your way to remove that callback:
- train_iter: the number of training iterations done since the beginning of this training
- pct_train: from 0. to 1., the percentage of training iterations completed
- training: flag to indicate if we're in training mode or not

The following attribute is added by Recorder and should be available unless you went out of your way to remove that callback:
- smooth_loss: an exponentially-averaged version of the training loss

In [None]:
TrainEvalCallback

By default, metrics are computed on the validation set only, although that can be changed by adjusting train_metrics and valid_metrics. 

beta is the weight used to compute the exponentially weighted average of the losses (which gives the smooth_loss attribute to Learner).

The logger attribute of a Learner determines what happens to those metrics. By default, it just prints them.

Recorder.plot_loss(skip_start=5, with_valid=True)

Plots the losses from skip_start onward.

In [None]:
class Recorder(Callback):
    "Callback that registers statistics (lr, loss and metrics) during training"
    
    remove_on_fetch = True
    run_after = TrainEvalCallback
    
    add_time = True
    train_metrics = False
    valid_metrics = True
    metric_names
    
    self.loss = AvgLoss()
    self.smooth_loss = AvgSmoothLoss(beta=beta)
    
    start_epoch
    log
    lrs
    iters
    losses
    values
    final_record 
    
    def __init__(self, add_time=True, train_metrics=False, valid_metrics=True, beta=0.98):
        store_attr(self, 'add_time,train_metrics,valid_metrics')
        self.loss,self.smooth_loss = AvgLoss(),AvgSmoothLoss(beta=beta)

    def begin_fit(self):
        "Prepare state for training"
        self.lrs,self.iters,self.losses,self.values = [],[],[],[]
        names = self.metrics.attrgot('name')
        if self.train_metrics and self.valid_metrics:
            names = L('loss') + names
            names = names.map('train_{}') + names.map('valid_{}')
        elif self.valid_metrics: names = L('train_loss', 'valid_loss') + names
        else: names = L('train_loss') + names
        if self.add_time: names.append('time')
        self.metric_names = 'epoch'+names
        self.smooth_loss.reset()

    def after_batch(self):
        "Update all metrics and records lr and smooth loss in training"
        if len(self.yb) == 0: return
        mets = self._train_mets if self.training else self._valid_mets
        for met in mets: met.accumulate(self.learn)
        if not self.training: return
        self.lrs.append(self.opt.hypers[-1]['lr'])
        self.losses.append(self.smooth_loss.value)
        self.learn.smooth_loss = self.smooth_loss.value

    def begin_epoch(self):
        "Set timer if `self.add_time=True`"
        self.cancel_train,self.cancel_valid = False,False
        if self.add_time: self.start_epoch = time.time()
        self.log = L(getattr(self, 'epoch', 0))

    def begin_train   (self): self._train_mets[1:].map(Self.reset())
    def begin_validate(self): self._valid_mets.map(Self.reset())
    def after_train   (self): self.log += self._train_mets.map(_maybe_item)
    def after_validate(self): self.log += self._valid_mets.map(_maybe_item)
    def after_cancel_train(self):    self.cancel_train = True
    def after_cancel_validate(self): self.cancel_valid = True

    def after_epoch(self):
        "Store and log the loss/metric values"
        self.learn.final_record = self.log[1:].copy()
        self.values.append(self.learn.final_record)
        if self.add_time: self.log.append(format_time(time.time() - self.start_epoch))
        self.logger(self.log)
        self.iters.append(self.smooth_loss.count)

    def plot_loss(self, skip_start=5, with_valid=True):
        plt.plot(list(range(skip_start, len(self.losses))), self.losses[skip_start:], label='train')
        if with_valid:
            idx = (np.array(self.iters)<skip_start).sum()
            plt.plot(self.iters[idx:], L(self.values[idx:]).itemgot(1), label='valid')
            plt.legend()

In [None]:
ProgressCallback

## Metrics

Metrics can be simple averages (like accuracy), but sometimes their computation is more complex and can't be averaged over batches (like precision or recall), which is why we need a special class for them. For simple functions that can be computed as averages over batches, we can use the class AvgMetric; otherwise, you'll need to implement the following methods.

Note: If your Metric has state depending on tensors, don't forget to store it on the CPU to avoid any potential memory leaks.

In [None]:
class Metric():
    "Blueprint for defining a metric"
    
    def reset(self): pass
    "Reset inner state to prepare for new computation"
    
    def accumulate(self, learn): pass
    "Use `learn` to update the state with new results"
    
    @property
    def value(self): raise NotImplementedError
    "The value of the metric"
        
    @property
    def name(self): return class2attr(self, 'Metric')
    "Name of the `Metric`, converted to snake case and with Metric removed"

In [None]:
class AvgMetric(Metric):
    "Average the values of `func` taking into account potential different batch sizes"
    def __init__(self, func):  self.func = func
    def reset(self):           self.total,self.count = 0.,0
    def accumulate(self, learn):
        bs = find_bs(learn.yb)
        self.total += to_detach(self.func(learn.pred, *learn.yb))*bs
        self.count += bs
    @property
    def value(self): return self.total/self.count if self.count != 0 else None
    @property
    def name(self):  return self.func.func.__name__ if hasattr(self.func, 'func') else  self.func.__name__
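
The reason AvgMetric multiplies by the batch size is easiest to see with numbers. The sketch below uses made-up accuracy values and a hypothetical helper, `weighted_average`, that repeats the same total/count arithmetic on plain floats.

```python
def weighted_average(batch_values, batch_sizes):
    "Average per-batch metric values, weighting each by its batch size"
    total = sum(v * bs for v, bs in zip(batch_values, batch_sizes))
    count = sum(batch_sizes)
    return total / count if count != 0 else None


# Two batches: accuracy 1.0 on 64 items, then 0.5 on a final batch of 2 items
weighted = weighted_average([1.0, 0.5], [64, 2])   # 65 correct out of 66
naive    = sum([1.0, 0.5]) / 2                     # treats both batches as equal
```

The naive mean gives the small last batch as much weight as the full one, which understates the accuracy actually measured over all 66 items.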

In [None]:
class ValueMetric(Metric):
    "Use to include a pre-calculated metric value (for instance calculated in a `Callback`) and returned by `func`"
    def __init__(self, func, metric_name=None): store_attr(self, 'func, metric_name')
        
    @property
    def value(self): return self.func()

    @property
    def name(self): return self.metric_name if self.metric_name else self.func.__name__

In [None]:
class AvgLoss(Metric):
    "Average the losses taking into account potential different batch sizes"
    def reset(self):           self.total,self.count = 0.,0
    def accumulate(self, learn):
        bs = find_bs(learn.yb)
        self.total += to_detach(learn.loss.mean())*bs
        self.count += bs
    @property
    def value(self): return self.total/self.count if self.count != 0 else None
    @property
    def name(self):  return "loss"

In [None]:
class AvgSmoothLoss(Metric):
    "Smooth average of the losses (exponentially weighted with `beta`)"
    def __init__(self, beta=0.98): self.beta = beta
    def reset(self):               self.count,self.val = 0,tensor(0.)
    def accumulate(self, learn):
        self.count += 1
        self.val = torch.lerp(to_detach(learn.loss.mean(), gather=False), self.val, self.beta)
    @property
    def value(self): return self.val/(1-self.beta**self.count)
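
The same computation can be followed on plain floats. `torch.lerp(loss, val, beta)` equals `beta*val + (1-beta)*loss`, and dividing by `1 - beta**count` corrects the bias toward the initial value of 0. The helper below is a sketch written for this note.

```python
def smooth_losses(losses, beta=0.98):
    "Debiased exponentially weighted average after each new loss"
    val, out = 0.0, []
    for count, loss in enumerate(losses, start=1):
        val = beta * val + (1 - beta) * loss      # same as lerp(loss, val, beta)
        out.append(val / (1 - beta ** count))     # bias correction
    return out


constant = smooth_losses([4.0, 4.0, 4.0])          # stays at 4.0 thanks to debiasing
stepped  = smooth_losses([2.0, 4.0], beta=0.5)     # 2.0, then 2.5 / 0.75
```

Without the division, the first smoothed value for a constant loss of 4.0 would be only 0.08 with the default beta.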

## Transfer learning

In [None]:
@patch
def freeze_to(self:Learner, n):
    "Freeze parameter groups up to `n`"
    if self.opt is None: self.create_opt()
    self.opt.freeze_to(n)
    self.opt.clear_state()

@patch
def freeze(self:Learner): 
    "Freeze up to last parameter group"
    self.freeze_to(-1)

@patch
def unfreeze(self:Learner): 
    "Unfreeze the entire model"
    self.freeze_to(0)
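
The indices passed to `freeze_to` follow Python slicing conventions. The toy model below is a simplified sketch of that idea, assuming that freezing amounts to switching a trainable flag per parameter group. The group names and the `trainable` key are hypothetical, and this is not the optimizer's actual implementation.

```python
def freeze_to(groups, n):
    "Mark groups before index `n` as frozen and groups from `n` onward as trainable."
    for g in groups[n:]: g['trainable'] = True
    for g in groups[:n]: g['trainable'] = False
    return [g['trainable'] for g in groups]

# Three hypothetical parameter groups, e.g. early layers, later layers, head
groups = [{'name': 'body_1'}, {'name': 'body_2'}, {'name': 'head'}]

frozen = freeze_to(groups, -1)    # like Learner.freeze: only the last group trains
unfrozen = freeze_to(groups, 0)   # like Learner.unfreeze: every group trains
```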

# Inference

## Learner - inference

> Learner.validate(ds_idx=1, dl=None, cbs=None)

Validate on dl with potential new cbs.

> Learner.get_preds(ds_idx=1, dl=None, with_input=False, with_decoded=False, with_loss=False, act=None, inner=False, reorder=True, save_preds=None, save_targs=None, concat_dim=0)

Get the predictions and targets on the ds_idx-th dataset or dl, optionally with_input and with_loss

with_decoded will also return the decoded predictions using the decodes function of the loss function (if it exists). For instance, fastai's CrossEntropyFlat takes the argmax of the predictions in its decodes.

Depending on the loss_func attribute of Learner, an activation function will be picked automatically so that the predictions make sense. For instance, if the loss is a cross-entropy loss, a softmax will be applied, and if the loss is binary cross-entropy with logits, a sigmoid will be applied. If you want to make sure a certain activation function is applied, you can pass it with act.

save_preds and save_targs should be used when your predictions are too big to fit in memory all at once. Pass a Path object that points to a folder where the predictions and targets will be saved.

concat_dim is the batch dimension, where all the tensors will be concatenated.

inner is an internal argument that tells get_preds it's being called from inside another training loop, to avoid recursion errors.

Note: If you want to use the option with_loss=True with a custom loss function, make sure it has a reduction attribute that supports 'none'.
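
As an illustration of the note above, here is a minimal sketch of a custom loss with a `reduction` attribute that supports 'none'. `ToyMSELoss` and `not_reduced` are hypothetical names working on plain lists. The context manager only assumes that per-item losses are obtained by temporarily switching `reduction` to 'none', which may differ from what `Learner` does internally.

```python
from contextlib import contextmanager

class ToyMSELoss:
    "Hypothetical custom loss whose `reduction` attribute supports 'mean' and 'none'."
    def __init__(self, reduction='mean'): self.reduction = reduction
    def __call__(self, preds, targs):
        losses = [(p - t) ** 2 for p, t in zip(preds, targs)]
        if self.reduction == 'none': return losses              # one loss per item
        if self.reduction == 'mean': return sum(losses) / len(losses)
        raise ValueError(f"Unsupported reduction: {self.reduction}")

@contextmanager
def not_reduced(loss_func):
    "Temporarily switch `loss_func.reduction` to 'none', then restore it."
    old = loss_func.reduction
    loss_func.reduction = 'none'
    try:
        yield loss_func
    finally:
        loss_func.reduction = old

loss_func = ToyMSELoss()
preds, targs = [1.0, 2.0, 4.0], [1.0, 0.0, 1.0]
with not_reduced(loss_func):
    per_item = loss_func(preds, targs)   # list of per-item losses
mean_loss = loss_func(preds, targs)      # back to the usual scalar
```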

> Learner.predict(item, rm_type_tfms=None, with_input=False)

Return the prediction on item, fully decoded, loss function decoded and probabilities

It returns a tuple of three elements with, in reverse order,

- the prediction from the model, potentially passed through the activation of the loss function (if it has one)
- the decoded prediction, using the potential decodes method from it
- the fully decoded prediction, using the transforms used to build the Datasets/DataLoaders

rm_type_tfms is a deprecated argument that should not be used and will be removed in a future version. 

with_input will add the decoded inputs to the result.
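
The three elements can be pictured as three successive stages applied to the raw model output. The sketch below is a toy classification example in pure Python, assuming a softmax activation and an argmax `decodes`. The `vocab`, the `logits`, and the helper functions are all hypothetical, and no fastai call is involved.

```python
import math

def softmax(logits):
    "Activation: turn raw scores into probabilities."
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decodes(probs):
    "Loss-function decoding: index of the most likely class."
    return max(range(len(probs)), key=probs.__getitem__)

vocab = ['cat', 'dog', 'bird']   # hypothetical class names
logits = [0.5, 2.0, -1.0]        # hypothetical raw model output for one item

probs = softmax(logits)          # third element of the tuple
dec_pred = decodes(probs)        # second element
full_dec = vocab[dec_pred]       # first element
res = (full_dec, dec_pred, probs)
```

Reading `res` from right to left follows the order in which the values are computed, which is why the description above lists the elements in reverse order.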

> Learner.show_results(ds_idx=1, dl=None, max_n=9, shuffle=True, **kwargs)

Show some predictions on ds_idx-th dataset or dl

Will show max_n samples (unless the batch size of ds_idx or dl is less than max_n, in which case it will show as many samples as the batch contains) and shuffle the data unless you pass shuffle=False. kwargs are application-dependent.

In [None]:
class Learner(): 
                
    def get_preds(self, ds_idx=1, dl=None, with_input=False, with_decoded=False, with_loss=False, act=None,
                  inner=False, reorder=True, save_preds=None, save_targs=None, concat_dim=0):
        # Prepare dataloader
        if dl is None: dl = self.dls[ds_idx].new(shuffled=False, drop_last=False)
        if reorder and hasattr(dl, 'get_idxs'):
            idxs = dl.get_idxs()
            dl = dl.new(get_idxs = _ConstantFunc(idxs))
           
        # Prepare context
        cb = GatherPredsCallback(with_input=with_input, with_loss=with_loss, save_preds=save_preds,
                                 save_targs=save_targs, concat_dim=concat_dim)
        ctx_mgrs = [self.no_logging(), self.added_cbs(cb), self.no_mbar()]
        if with_loss: ctx_mgrs.append(self.loss_not_reduced())
        with ExitStack() as stack:
            for mgr in ctx_mgrs: stack.enter_context(mgr)
                
            # Compute output of model
            self(event.begin_epoch if inner else [event.begin_fit, event.begin_epoch])
            try:
                self.dl = dl;                                    self('begin_validate')
                with torch.no_grad(): self.all_batches()
            except CancelValidException:                         self('after_cancel_validate')
            finally:                                             self('after_validate')
            self(event.after_epoch if inner else [event.after_epoch, event.after_fit])
            
            # Apply loss_func.activation then (optionally) loss_func.decodes on output
            if act is None: act = getattr(self.loss_func, 'activation', noop)
            res = cb.all_tensors()
            pred_i = 1 if with_input else 0
            if res[pred_i] is not None:
                res[pred_i] = act(res[pred_i])
                if with_decoded: res.insert(pred_i+2, getattr(self.loss_func, 'decodes', noop)(res[pred_i]))
                    
            # Reorder results
            if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
            return tuple(res)
        
        # Cleanup
        self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

        
    def predict(self, item, rm_type_tfms=None, with_input=False):
        # Create dataloader from a single item
        dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms, num_workers=0)
        
        # Get decoded predictions
        inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
        
        # Dataloaders.decode_batch on (inputs + decoded preds) [tuplify / detuplify]
        i = getattr(self.dls, 'n_inp', -1)
        inp = (inp,) if i==1 else tuplify(inp)
        dec = self.dls.decode_batch(inp + tuplify(dec_preds))[0]
        dec_inp,dec_targ = map(detuplify, [dec[:i],dec[i:]])
        res = dec_targ,dec_preds[0],preds[0]
        
        # Return
        if with_input: res = (dec_inp,) + res
        return res

    
    def show_results(self, ds_idx=1, dl=None, max_n=9, shuffle=True, **kwargs):
        # Get one batch
        if dl is None: dl = self.dls[ds_idx].new(shuffle=shuffle)
        b = dl.one_batch()
        
        # Dataloaders.show_results on decoded preds
        _,_,preds = self.get_preds(dl=[b], with_decoded=True)
        self.dls.show_results(b, preds, max_n=max_n, **kwargs)
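
The "Reorder results" step in `get_preds` relies on a property of argsort: if the dataloader visited the items in the order `idxs`, then indexing the results with `argsort(idxs)` puts them back in dataset order. Here is a small pure-Python sketch of that property, with a hypothetical visiting order and a hand-written `argsort` standing in for `tensor(idxs).argsort()`.

```python
def argsort(seq):
    "Indices that would sort `seq`."
    return sorted(range(len(seq)), key=seq.__getitem__)

# Hypothetical order in which the dataloader visited the items
idxs = [2, 0, 3, 1]
# Predictions come back in visiting order: position k holds the result for item idxs[k]
preds_in_visit_order = [f"pred_for_item_{i}" for i in idxs]

order = argsort(idxs)
preds_in_dataset_order = [preds_in_visit_order[k] for k in order]
```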

> tuplify(o, use_list=False, match=None)

Make o a tuple

In [None]:
test_eq(tuplify(None),())
test_eq(tuplify([1,2,3]),(1,2,3))
test_eq(tuplify(1,match=[1,2,3]),(1,1,1))

> detuplify(x)

If x is a tuple with one thing, extract it

In [None]:
test_eq(detuplify(()),None)
test_eq(detuplify([1]),1)
test_eq(detuplify([1,2]), [1,2])
test_eq(detuplify(np.array([[1,2]])), np.array([[1,2]]))

In [None]:
GatherPredsCallback

## Learner - export & import

In [None]:
@patch
def export(self:Learner, fname='export.pkl', pickle_protocol=2):
    "Export the content of `self` without the items and the optimizer state for inference"
    if rank_distrib(): return # don't export if child proc
    self._end_cleanup()
    old_dbunch = self.dls
    self.dls = self.dls.new_empty()
    state = self.opt.state_dict() if self.opt is not None else None
    self.opt = None
    with warnings.catch_warnings():
        # To avoid the warnings that come from PyTorch about the model not being checked
        warnings.simplefilter("ignore")
        torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
    self.create_opt()
    if state is not None: self.opt.load_state_dict(state)
    self.dls = old_dbunch

The Learner is saved in self.path/fname, using pickle_protocol. Note that serialization in Python saves the names of functions, not the code itself. Therefore, any custom code you have for models, data transformations, loss functions, etc. should be put in a module that you import in your training environment before exporting, and in your deployment environment before loading.
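
The point about names versus code can be checked with the standard `pickle` module alone. The sketch below pickles a standard-library function (chosen only because it is importable everywhere) and shows that the payload holds a reference to the module and function name, so loading fails when that name cannot be found.

```python
import math
import pickle

# Pickle a function with the same protocol number that `export` uses by default
payload = pickle.dumps(math.sqrt, protocol=2)

# The payload only references the function by module and name
has_module = b"math" in payload
has_name = b"sqrt" in payload

# Loading looks the name up again and returns the very same object
restored = pickle.loads(payload)

# If the name cannot be found at load time, unpickling fails
broken = payload.replace(b"sqrt", b"no_such_function")
try:
    pickle.loads(broken)
    load_failed = False
except (AttributeError, pickle.UnpicklingError):
    load_failed = True
```

The same lookup happens for every custom class or function reachable from an exported Learner, which is why that code has to be importable under the same name when loading.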

In [None]:
def load_learner(fname, cpu=True):
    "Load a `Learner` object in `fname`, optionally putting it on the `cpu`"
    distrib_barrier()
    res = torch.load(fname, map_location='cpu' if cpu else None)
    if hasattr(res, 'to_fp32'): res = res.to_fp32()
    if cpu: res.dls.cpu()
    return res

Warning: load_learner requires all your custom code be in the exact same place as when exporting your Learner (the main script, or the module you imported it from).