Sample notebook for playing around with MNIST and fastai.

In [None]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [None]:
from fastai.vision.all import *
from fastbook import *

matplotlib.rc('image', cmap='Greys')

In [None]:
print("fastai MNIST samples data is at '{}'".format(URLs.MNIST_SAMPLE))

path = untar_data(URLs.MNIST_SAMPLE)

print("untar_data places it at '{}'".format(path))

for p in path.ls():
    print(p)

Load up the training and validation data samples.

In [None]:
train_3 = (path/'train'/'3').ls().sorted()
train_7 = (path/'train'/'7').ls().sorted()
valid_3 = (path/'valid'/'3').ls().sorted()
valid_7 = (path/'valid'/'7').ls().sorted()

train_3_tens = [tensor(Image.open(o)) for o in train_3]
train_7_tens = [tensor(Image.open(o)) for o in train_7]
valid_3_tens = [tensor(Image.open(o)) for o in valid_3]
valid_7_tens = [tensor(Image.open(o)) for o in valid_7]

Show a sample three.

In [None]:
show_image(train_3_tens[0])

Show a sample seven.

In [None]:
show_image(train_7_tens[0])

Now we need the training data (x) and the labels (y) for both threes and sevens. Use `1` for threes and `0` for sevens.

In [None]:
train_x = torch.cat([torch.stack(train_3_tens).float()/255, torch.stack(train_7_tens).float()/255])
print(train_x.shape)

# Flatten each 28x28 image into a single row of 784 features with the view() method.

train_x = train_x.view(-1, 28*28)
print(train_x.shape)
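
To make the reshape concrete, here is a small plain-Python sketch of what flattening does, using a 2x3 nested list as a stand-in for a 28x28 image tensor. It assumes row-major order, which is how `view()` reads a contiguous tensor. The helper name `flatten_image` is hypothetical and not part of fastai or PyTorch.

```python
def flatten_image(image):
    """Flatten a 2D nested list into one row, reading row by row (row-major)."""
    return [pixel for row in image for pixel in row]

# A tiny 2x3 "image" standing in for a 28x28 MNIST digit.
toy_image = [[0.0, 0.1, 0.2],
             [0.3, 0.4, 0.5]]

flat = flatten_image(toy_image)
n_cols = len(toy_image[0])

# The pixel at (row, col) ends up at index row * n_cols + col.
row, col = 1, 2
print(flat)
print(flat[row * n_cols + col] == toy_image[row][col])
```

For a real 28x28 image the same rule puts the pixel at (row, col) at index `row*28 + col` of the 784 features.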

In [None]:
train_y = tensor([1]*len(train_3_tens) + [0]*len(train_7_tens))
print(train_y.shape)

# train_y is a vector of 12,396 labels; unsqueeze it into a [12396, 1] matrix
train_y = train_y.unsqueeze(1)
print(train_y.shape)

A Dataset in PyTorch is required to return a tuple of (x,y) when indexed. Python provides a `zip` function which, when combined with `list`, provides a simple way to get this functionality:


In [None]:
trainset = list(zip(train_x,train_y))
x,y = trainset[0]
x.shape,y
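
As a rough illustration of why `list(zip(...))` satisfies the indexing requirement, here is a toy sketch in which plain Python lists stand in for the feature and label tensors. The names `toy_x`, `toy_y`, and `toy_dataset` are made up for this example.

```python
# Plain lists stand in for the feature tensor and the label tensor.
toy_x = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
toy_y = [1, 1, 0]

# zip pairs the i-th features with the i-th label; list makes the result indexable.
toy_dataset = list(zip(toy_x, toy_y))

x0, y0 = toy_dataset[0]   # indexing returns an (x, y) tuple
print(len(toy_dataset), x0, y0)
```

Iterating over a tensor walks its first dimension, so zipping `train_x` with `train_y` pairs each image row with its label in the same way.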

Now do the same for the validation set.

In [None]:
valid_x = torch.cat([torch.stack(valid_3_tens).float()/255, torch.stack(valid_7_tens).float()/255]).view(-1, 28*28)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
validset = list(zip(valid_x, valid_y))
x,y = validset[0]
x.shape,y

This will create a set of weights and a bias wrapped in a single class.

Sample graph of the sigmoid function.

In [None]:
plot_function(torch.sigmoid,title='Sigmoid', min=-9, max=9)
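
For a numeric feel of the same curve, this is a minimal plain-Python sketch of the logistic sigmoid, 1 / (1 + e^(-x)), evaluated over the range plotted above. The standalone `sigmoid` helper is written here for illustration. It is not the `torch.sigmoid` implementation.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x)), always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-x))

# Sample a few points across the plotted range.
for x in (-9, -2, 0, 2, 9):
    print(f"sigmoid({x:>2}) = {sigmoid(x):.4f}")
```

The value is exactly 0.5 at zero and flattens toward 0 and 1 at the extremes, which is why a threshold of 0.5 on the sigmoid output corresponds to a raw output of zero.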

### Key point from the original notebook: Difference between `metric` and `loss`
Having defined a loss function, now is a good moment to recapitulate why we did this. After all, we already had a metric, which was overall accuracy. So why did we define a loss?

The key difference is that the **`metric` is to drive human understanding** and the **`loss` is to drive automated learning**. To drive automated learning, the loss must be a function that has a meaningful derivative. It can't have big flat sections and large jumps, but instead must be reasonably smooth. This is why we designed a loss function that would respond to small changes in confidence level. This requirement means that sometimes it does not really reflect exactly what we are trying to achieve, but is rather a compromise between our real goal, and a function that can be optimized using its gradient. The loss function is calculated for each item in our dataset, and then at the end of an epoch the loss values are all averaged and the overall mean is reported for the epoch.

Metrics, on the other hand, are the numbers that we really care about. These are the values that are printed at the end of each epoch that tell us how our model is really doing. It is important that we learn to focus on these metrics, rather than the loss, when judging the performance of a model.
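
The difference can be sketched with a few hand-picked numbers. The example below is a plain-Python approximation that works on probabilities (values already passed through a sigmoid) rather than raw model outputs. `toy_loss` and `toy_accuracy` are hypothetical helpers written for this illustration.

```python
def toy_loss(probs, targets):
    """Mean distance from the correct label: 1-p when the target is 1, p when it is 0."""
    return sum((1 - p) if t == 1 else p for p, t in zip(probs, targets)) / len(probs)

def toy_accuracy(probs, targets):
    """Fraction of predictions on the correct side of the 0.5 threshold."""
    return sum((p > 0.5) == (t == 1) for p, t in zip(probs, targets)) / len(probs)

targets = [1, 0, 1]
before = [0.60, 0.30, 0.40]   # third prediction is on the wrong side of 0.5
after  = [0.70, 0.30, 0.45]   # slightly better, but no prediction crosses 0.5

print(toy_accuracy(before, targets), toy_accuracy(after, targets))
print(toy_loss(before, targets), toy_loss(after, targets))
```

Accuracy is identical for both sets of predictions because nothing crossed the threshold, while the loss drops. That small, smooth change is what gives gradient descent something to follow.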

Now let's actually do a training run.

In [None]:
traindl = DataLoader(trainset, batch_size=256)
xb,yb = first(traindl)
print("Training example: {}, {}".format(xb.shape,yb.shape))

validdl = DataLoader(validset, batch_size=256)
xt, yt = first(validdl)
print("Validation example: {}, {}".format(xt.shape, yt.shape))
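
As a rough sketch of how a batch size of 256 carves up the training set, the snippet below computes the batch sizes for 12,396 items. It assumes the final partial batch is kept rather than dropped, which is an assumption about the loader's settings and not something this notebook states. `batch_sizes` is a hypothetical helper.

```python
import math

def batch_sizes(n_items, batch_size):
    """Sizes of consecutive batches when the last partial batch is kept."""
    n_batches = math.ceil(n_items / batch_size)
    return [min(batch_size, n_items - i * batch_size) for i in range(n_batches)]

sizes = batch_sizes(12396, 256)   # 12,396 training items, as noted earlier
print(len(sizes), sizes[0], sizes[-1])
```

Under that assumption, one epoch is 48 full batches plus one smaller batch of 108 items.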

In [None]:
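def mnist_loss(predictions, targets):
    # Squash raw model outputs into (0, 1) so they can be compared with the 0/1 labels.
    predictions = predictions.sigmoid()
    # Distance from the correct label: 1-p for threes (target 1), p for sevens (target 0).
    return torch.where(targets == 1, 1-predictions, predictions).mean()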

In [None]:
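def batch_accuracy(xb, yb):
    # xb holds the model's raw outputs for a batch (not the input images); yb holds the 0/1 labels.
    preds = xb.sigmoid()
    # A prediction counts as a three when its sigmoid output is above 0.5.
    correct = (preds > 0.5) == yb
    return correct.float().mean()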

In [None]:
dls = DataLoaders(traindl, validdl)

simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),
    nn.Linear(30, 1)
)


In [None]:
learn = Learner(dls, simple_net, opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)

In [None]:
learn.fit(40, 0.1)