# A basic training loop

## From the last notebook...

In [1]:
import pickle, gzip, torch, math, numpy as np, torch.nn.functional as F
from pathlib import Path
from IPython.core.debugger import set_trace
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

**Load the MNIST training images and labels**<br>
This data is downloaded in notebook 001a_nn_basics, so make sure you have run through that first.<br>
We print the min and max of the features to get a feel for the range of feature values.<br>

In [3]:
DATA_PATH = Path('data')
PATH = DATA_PATH/'mnist'

with gzip.open(PATH/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')

x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))
x_train.min(),x_train.max()

(tensor(0.), tensor(0.9961))

**Now set the batch size, number of epochs and learning rate**

Handy reference for abbreviations:<br>
https://github.com/fastai/fastai_v1/blob/master/docs/abbr.md


In [4]:
bs=64
epochs = 2
lr=0.2

**Load the training and validation datasets**

TensorDataset is a PyTorch utility class that implements the PyTorch Dataset API for a given list of tensors.<br>
<br>
The Dataset API allows iteration, indexing and slicing along the first dimension of each tensor passed in.<br>
In this case we are passing in two tensors: training features and classification targets.<br>
Each item of the TensorDataset is therefore a tuple of length two of the form (x_features, y_target).

In case you aren't already familiar with the PyTorch Dataset class, this tutorial covers the Dataset abstract class, how to use it, and how to implement custom datasets for new kinds of data:<br>
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html


In [5]:
train_ds = TensorDataset(x_train, y_train)
valid_ds = TensorDataset(x_valid, y_valid)
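
As a minimal sketch of the indexing and slicing behaviour described above, here is a TensorDataset built from toy tensors. The names `x_toy`, `y_toy` and `toy_ds` are made up for illustration and stand in for the real MNIST tensors.

```python
import torch
from torch.utils.data import TensorDataset

# Toy stand-ins for x_train / y_train: 5 samples with 784 features each.
x_toy = torch.zeros(5, 784)
y_toy = torch.arange(5)

toy_ds = TensorDataset(x_toy, y_toy)

n_items = len(toy_ds)   # length is the size of the first dimension
x0, y0 = toy_ds[0]      # indexing returns one (features, target) pair
xs, ys = toy_ds[1:3]    # slicing returns a tuple of sliced tensors

print(n_items, x0.shape, y0.item(), xs.shape, ys.tolist())
```

Every tensor is indexed along its first dimension, so features and targets always stay aligned.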

In [6]:
def loss_batch(model, xb, yb, loss_fn, opt=None):
    loss = loss_fn(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
        
    return loss.item(), len(xb)

def fit(epochs, model, loss_fn, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb,yb in train_dl: loss_batch(model, xb, yb, loss_fn, opt)

        model.eval()
        with torch.no_grad():
            losses,nums = zip(*[loss_batch(model, xb, yb, loss_fn)
                                for xb,yb in valid_dl])
        val_loss = np.sum(np.multiply(losses,nums)) / np.sum(nums)

        print(epoch, val_loss)
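
The validation loss in `fit` is a size-weighted mean of the per-batch losses, which matters when the last batch is smaller than the others. The sketch below uses made-up loss values and batch sizes to show the difference from a plain mean.

```python
# Hypothetical per-batch results: two full batches of 128 and a final batch of 16.
losses = [0.50, 0.40, 1.00]
nums = [128, 128, 16]

# Size-weighted mean, as in fit(): sum(loss_i * n_i) / sum(n_i)
weighted = sum(l * n for l, n in zip(losses, nums)) / sum(nums)

# A plain mean of the batch losses over-weights the small final batch.
unweighted = sum(losses) / len(losses)

print(round(weighted, 4), round(unweighted, 4))
```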

In [7]:
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func=func
        
    def forward(self, x): return self.func(x)

## Simplify nn.Sequential layers

**Function composition** is a great way to capture and parameterize common operations as a single concept.<br>
You can see this with the *PoolFlatten* function below.<br>
The *Lambda* layer we created in the last notebook makes it easy to quickly create PyTorch layers for the same purpose.<br>

In [8]:
def ResizeBatch(*size): return Lambda(lambda x: x.view((-1,)+size))
def Flatten(): return Lambda(lambda x: x.view((x.size(0), -1)))
def PoolFlatten(): return nn.Sequential(nn.AdaptiveAvgPool2d(1), Flatten())
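
To see what *PoolFlatten* does to a batch, the sketch below repeats the definitions so it runs on its own and feeds in random activations. The input shape (8, 10, 4, 4) is an assumption chosen for illustration.

```python
import torch
from torch import nn

class Lambda(nn.Module):
    """Wrap a plain function so it can be used as a layer."""
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x): return self.func(x)

def Flatten(): return Lambda(lambda x: x.view((x.size(0), -1)))
def PoolFlatten(): return nn.Sequential(nn.AdaptiveAvgPool2d(1), Flatten())

# Hypothetical activations: batch of 8, 10 channels, 4x4 spatial grid.
acts = torch.randn(8, 10, 4, 4)
pooled = nn.AdaptiveAvgPool2d(1)(acts)   # shape (8, 10, 1, 1)
out = PoolFlatten()(acts)                # shape (8, 10)

print(pooled.shape, out.shape)
```

Each output value is the mean of one channel over its spatial grid, so the result has one number per class channel.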

**Define the model**<br>
Thanks to the named PyTorch nn.Modules above, the intent of each layer in our model is clearer, and the model is less error-prone when we make changes.<br>
This is an example of a stripped-down CNN.<br>
You will nearly always use a kernel size of 3.

In [9]:
model = nn.Sequential(
    ResizeBatch(1,28,28),
    nn.Conv2d(1,  16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    PoolFlatten()
)
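
The spatial size after each stride-2 convolution in the model above can be checked with the usual convolution output-size formula, floor((n + 2*padding - kernel_size) / stride) + 1. The helper name `conv_out_size` is made up for this sketch.

```python
def conv_out_size(n, kernel_size, stride, padding):
    """Output size of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

sizes = [28]
for _ in range(3):  # three conv layers, each kernel_size=3, stride=2, padding=1
    sizes.append(conv_out_size(sizes[-1], kernel_size=3, stride=2, padding=1))

print(sizes)  # [28, 14, 7, 4]
```

The final 10-channel 4x4 activations are what *PoolFlatten* averages down to 10 values per image.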

**Define a *get_data* function**<br>
It's often convenient in your notebook to define a *get_data* function that encapsulates the work of setting up the training, validation and sometimes test data.<br>
Parameterizing *get_data* makes it easy to do things like change the batch size.<br>
<br>
Notice in this scenario that we shuffle the training dataloader but not the validation dataloader. We want the validation loss to be calculated the same way every time so that we can tell whether we are still learning and not overfitting.<br>
Shuffling the training data changes which samples land in each batch from epoch to epoch, which helps prevent overfitting to a particular ordering when the gradients are calculated for each batch.

In [10]:
def get_data(train_ds, valid_ds, bs):
    return (DataLoader(train_ds, batch_size=bs, shuffle=True),
            DataLoader(valid_ds, batch_size=bs*2))

train_dl,valid_dl = get_data(train_ds, valid_ds, bs)
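
As a rough check on how many batches each dataloader yields, the arithmetic below assumes 50,000 training and 10,000 validation examples (the usual split for this MNIST pickle, not stated in this notebook) and the DataLoader default of keeping the final partial batch.

```python
import math

bs = 64
n_train, n_valid = 50_000, 10_000   # assumed dataset sizes

train_batches = math.ceil(n_train / bs)        # training uses bs
valid_batches = math.ceil(n_valid / (bs * 2))  # validation uses bs*2

print(train_batches, valid_batches)  # 782 79
```

The validation batch size is doubled, presumably because no gradients need to be stored during validation, so larger batches fit in the same memory.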

**Use cross entropy for our loss function**<br>
We are doing classification, so cross_entropy is the standard choice of loss function.<br>
Here is a tutorial explaining why we use cross entropy for classification tasks:<br>
https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

In [11]:
loss_fn = F.cross_entropy

**Set the optimizer**<br>
We stick with stochastic gradient descent as our optimizer

In [12]:
opt = optim.SGD(model.parameters(), lr=lr)

**Test our loss function**<br>
Try out our loss function on one batch of X features and y targets to make sure it's working correctly

In [13]:
loss_fn(model(x_valid[0:bs]), y_valid[0:bs])

tensor(2.3003, grad_fn=<NllLossBackward>)
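
A loss near 2.30 is what an untrained 10-class model should produce: if every class gets roughly equal probability, the cross entropy is ln(10). The pure-Python sketch below computes this for one example; the helper name `cross_entropy` and the chosen target index are illustrative only.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy for one example: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 10   # no preference among the 10 classes
chance_loss = cross_entropy(uniform_logits, target=3)

print(round(chance_loss, 4))  # 2.3026, i.e. ln(10)
```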

**Fit**<br>
Everything looks ready, so let's call the fit function we developed earlier for two epochs and confirm we are learning.

In [14]:
fit(epochs, model, loss_fn, opt, train_dl, valid_dl)

0 1.0272689867973328
1 0.8396794363021851


## Transformations; refactor network

We are going to refactor some of the data transformations out of the network and into a pipeline that is applied to the data coming out of the Dataloaders.<br>
This is more flexible, simplifies the model, and will be useful later when we want to apply additional transformations for things like data augmentation.

**Define a transformation**<br>
*mnist2image* is a utility function that reshapes our features into 28x28 images.<br>
The PyTorch DataLoaders are iterables that return a tuple like (X, y) on each iteration.<br>
<br>
X is a batch of features, where the first dimension is the number of samples in the batch and the remaining dimensions define the shape of the features.<br>
y is the target variable to be learned; in this case it is an integer representing one of 10 image classes.

With MNIST data, each sample's X features start out as a flat vector of 784 values. We want to convert the features to 1x28x28 images. This helper function does that for an entire batch of features and passes the target variable through as is.


In [15]:
def mnist2image(b): return b[0].view(-1,1,28,28), b[1]

In [16]:
from collections.abc import Iterable  # the ABCs are no longer importable from `collections` on Python 3.10+
from functools import reduce

def is_listy(x): return isinstance(x, (list,tuple))

def listify(p=None, q=None):
    if p is None: p=[]
    elif not isinstance(p, Iterable): p=[p]
    n = q if type(q)==int else 1 if q is None else len(q)
    if len(p)==1: p = p * n
    return p

def compose(funcs):
    return reduce(lambda f, g: lambda z: f(g(z)), listify(funcs), lambda o: o)
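
One detail worth noting is the order in which `compose` applies its functions: because of the way the reduction nests, the last function in the list runs first. The sketch below uses the same reduction without `listify`, with two made-up functions, `add_one` and `double`.

```python
from functools import reduce

def compose(funcs):
    # Same reduction as above, without listify: the identity is the seed,
    # and each new function g is applied before the functions already folded in.
    return reduce(lambda f, g: lambda z: f(g(z)), funcs, lambda o: o)

def add_one(x): return x + 1
def double(x): return x * 2

pipeline = compose([add_one, double])   # computes add_one(double(x))
result = pipeline(5)

print(result)  # 11
```

So a list such as `[partial(to_device, default_device), mnist2image]` runs `mnist2image` first and `to_device` second.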

In [17]:
class IterPipe():
    def __init__(self, iterator, funcs): self.iter,self.func = iterator,compose(funcs)
    def __len__(self): return len(self.iter)
    def __iter__(self): return map(self.func, self.iter)
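
The wrapper can be tried without PyTorch. This simplified sketch takes a single function instead of calling `compose`, and uses a plain list of (x, y) pairs as a stand-in for a DataLoader; `batches` and `double_x` are made up for illustration.

```python
class IterPipe():
    def __init__(self, iterator, func): self.iter, self.func = iterator, func
    def __len__(self): return len(self.iter)
    def __iter__(self): return map(self.func, self.iter)

# Stand-in for a DataLoader: a list of (x, y) "batches".
batches = [([1, 2], 0), ([3, 4], 1)]

def double_x(b): return [v * 2 for v in b[0]], b[1]

pipe = IterPipe(batches, double_x)

first_pass = list(pipe)
second_pass = list(pipe)   # __iter__ builds a fresh map on every pass

print(len(pipe), first_pass)
```

Because `__iter__` creates a new `map` each time, the pipe can be iterated once per epoch, and `__len__` keeps progress bars working.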

In [18]:
def get_dl(ds, bs, shuffle, tfms=None):
    return IterPipe(DataLoader(ds, batch_size=bs, shuffle=shuffle), tfms)

def get_data(train_ds, valid_ds, bs, train_tfms=None, valid_tfms=None):
    return (get_dl(train_ds, bs, shuffle=True, tfms=train_tfms),
            get_dl(valid_ds, bs*2, shuffle=False, tfms=valid_tfms))

In [19]:
train_dl,valid_dl = get_data(train_ds, valid_ds, bs, mnist2image, mnist2image)

In [20]:
x,y = next(iter(valid_dl))

In [21]:
valid_ds[0][0].shape, x[0].shape

(torch.Size([784]), torch.Size([1, 28, 28]))

In [22]:
torch.allclose(valid_ds[0][0], x[0].view(-1))

True

In [23]:
def conv2_relu(nif, nof, ks, stride):
    return nn.Sequential(nn.Conv2d(nif, nof, ks, stride, padding=ks//2), nn.ReLU())

def simple_cnn(actns, kernel_szs, strides):
    layers = [conv2_relu(actns[i], actns[i+1], kernel_szs[i], stride=strides[i])
        for i in range(len(strides))]
    layers.append(PoolFlatten())
    return nn.Sequential(*layers)

In [24]:
def get_model():
    model = simple_cnn([1,16,16,10], [3,3,3], [2,2,2])
    return model, optim.SGD(model.parameters(), lr=lr)

In [25]:
model,opt = get_model()

In [26]:
fit(epochs, model, loss_fn, opt, train_dl, valid_dl)

0 0.9104695863723755
1 0.7454537752151489


## CUDA

In [27]:
# TODO: handle non-lists (e.g. single tensor)
def to_device(device, b): return [o.to(device) for o in b]

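# Fall back to the CPU when no GPU is available so the notebook still runs.
default_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')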
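
One possible way to address the TODO above is to recurse into lists and tuples and call `.to` on anything else. This is only a sketch, not the notebook's implementation, and the name `to_device_any` is hypothetical.

```python
import torch

def to_device_any(device, b):
    """Move a tensor, or a (possibly nested) list/tuple of tensors, to `device`."""
    if isinstance(b, (list, tuple)):
        return [to_device_any(device, o) for o in b]
    return b.to(device)

cpu = torch.device('cpu')   # CPU keeps the example runnable without a GPU
single = to_device_any(cpu, torch.zeros(2))
batch = to_device_any(cpu, (torch.zeros(2, 784), torch.zeros(2, dtype=torch.long)))

print(single.device, len(batch))
```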

In [28]:
from functools import partial

tfms = [partial(to_device, default_device), mnist2image]
train_dl,valid_dl = get_data(train_ds, valid_ds, bs, tfms, tfms)

In [29]:
def get_model():
    model = simple_cnn([1,16,16,10], [3,3,3], [2,2,2]).to(default_device)
    return model, optim.SGD(model.parameters(), lr=lr)

In [30]:
model,opt = get_model()

In [31]:
fit(epochs, model, loss_fn, opt, train_dl, valid_dl)

0 0.7942686936378479
1 0.48481653938293456


## Learner

In [32]:
from tqdm import tqdm, tqdm_notebook, trange, tnrange

def fit(epochs, model, loss_fn, opt, train_dl, valid_dl):
    for epoch in tnrange(epochs):
        model.train()
        it = tqdm_notebook(train_dl, leave=False)
        for xb,yb in it:
            loss,_ = loss_batch(model, xb, yb, loss_fn, opt)
            it.set_postfix_str(loss)

        model.eval()
        with torch.no_grad():
            losses,nums = zip(*[loss_batch(model, xb, yb, loss_fn)
                                for xb,yb in valid_dl])
        val_loss = np.sum(np.multiply(losses,nums)) / np.sum(nums)

        print(epoch, val_loss)

In [33]:
class DataBunch():
    def __init__(self, train_ds, valid_ds, bs=64, device=None, train_tfms=None, valid_tfms=None):
        self.device = default_device if device is None else device
        dev_tfm = [partial(to_device, self.device)]
        self.train_dl = get_dl(train_ds, bs,   shuffle=True,  tfms=dev_tfm + listify(train_tfms))
        self.valid_dl = get_dl(valid_ds, bs*2, shuffle=False, tfms=dev_tfm + listify(valid_tfms))

class Learner():
    def __init__(self, data, model):
        self.data,self.model = data,model.to(data.device)

    def fit(self, epochs, lr, opt_fn=optim.SGD):
        opt = opt_fn(self.model.parameters(), lr=lr)
        loss_fn = F.cross_entropy
        fit(epochs, self.model, loss_fn, opt, self.data.train_dl, self.data.valid_dl)

In [34]:
data = DataBunch(train_ds, valid_ds, bs, train_tfms=mnist2image, valid_tfms=mnist2image)
model = simple_cnn([1,16,16,10], [3,3,3], [2,2,2])
learner = Learner(data, model)
opt_fn = partial(optim.SGD, momentum=0.9)
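
The `partial` call fixes `momentum` in advance, so that `Learner.fit` only has to supply the parameters and learning rate when it calls `opt_fn(self.model.parameters(), lr=lr)`. The sketch below shows the resulting optimizer settings, using a small `nn.Linear` as a hypothetical stand-in for the CNN.

```python
from functools import partial
import torch
from torch import nn, optim

opt_fn = partial(optim.SGD, momentum=0.9)       # momentum is fixed up front

toy_model = nn.Linear(4, 2)                     # hypothetical stand-in for the CNN
opt = opt_fn(toy_model.parameters(), lr=0.04)   # lr/5 with lr=0.2

group = opt.param_groups[0]
print(group['lr'], group['momentum'])
```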

In [35]:
learner.fit(4, lr/5, opt_fn=opt_fn)

0 0.457903608417511
1 0.33018289823532104
2 0.29600199661254883
3 0.2516591844558716

In [36]:
learner = Learner(data, simple_cnn([1,16,16,10], [3,3,3], [2,2,2]))

In [37]:
learner.fit(1, lr/5, opt_fn=opt_fn)
learner.fit(2, lr, opt_fn=opt_fn)
learner.fit(1, lr/5, opt_fn=opt_fn)

0 0.4901185554504395
0 0.2518716604232788
1 0.21359226384162902
0 0.14774560313224794

In [38]:
# TODO: metrics