We load the data mainly for its shape parameters. The focus of this notebook is on specifying a neural network (NN) model whose dimensions are set by the shape of the data.

In [1]:
from fastai.vision.all import *

# Cache the flattened arrays in a pickle so later runs can skip decoding the PNGs.
pickle_path = URLs.path('mnist_png')/'mnist_png.pkl'
path = untar_data(URLs.MNIST)/'training'

if not pickle_path.exists():
    pickle_path.parent.mkdir(parents=True, exist_ok=True)
    # Label each grayscale image by its parent folder name; hold out 1/6 for validation.
    ds = DataBlock(
        blocks = (ImageBlock(PILImageBW), CategoryBlock),
        get_items = get_image_files,
        get_y = parent_label,
        splitter = RandomSplitter(1/6, seed=0)
    ).datasets(path)

    # Flatten each 28x28 image into a 784-vector of floats scaled to [0, 1].
    xs, ys = zip(*ds.train, *ds.valid)
    xs = np.stack(L(map(lambda x: np.array(x, dtype=np.float32).reshape(-1), xs))) / 255.
    ys = np.array(ys, dtype=np.int64)

    # The first len(ds.train) items came from the training split, the rest from validation.
    x_train, x_valid = xs[:len(ds.train)], xs[len(ds.train):]
    y_train, y_valid = ys[:len(ds.train)], ys[len(ds.train):]

    save_pickle(pickle_path, [x_train, y_train, x_valid, y_valid])

    del ds, xs, ys, x_train, y_train, x_valid, y_valid

# Load the cached NumPy arrays and convert them to PyTorch tensors.
x_train, y_train, x_valid, y_valid = map(tensor, load_pickle(pickle_path))
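The preprocessing step turns each image into one row of a 2-D float array. As a small sketch of just that step, the NumPy code below uses random `uint8` arrays as hypothetical stand-ins for the decoded MNIST PNGs (no fastai involved), then flattens and rescales them the same way:

```python
import numpy as np

# Hypothetical stand-ins for decoded MNIST PNGs: six 28x28 uint8 grayscale images.
rng = np.random.default_rng(0)
images = [rng.integers(0, 255, size=(28, 28), dtype=np.uint8, endpoint=True)
          for _ in range(6)]

# Flatten each image to a 784-vector and scale pixel values into [0, 1],
# mirroring the reshape(-1) and / 255. steps in the cell above.
xs = np.stack([np.asarray(img, dtype=np.float32).reshape(-1) for img in images]) / 255.

print(xs.shape)  # (6, 784)
print(xs.dtype)  # float32
```

Each row is one image, so the second dimension (784) is what the notebook later reads off as `m`.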


In [2]:
n, m = x_train.shape
c = y_train.max() + 1
nh = 50

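We specify a two-layer model with a single output unit: a linear layer, a ReLU, and a second linear layer that maps the `nh` hidden activations to one number.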

In [3]:
w1 = torch.randn(m, nh)
b1 = torch.zeros(nh)
w2 = torch.randn(nh, 1)
b2 = torch.zeros(1)
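These shapes fix how many numbers the model has to learn. Assuming `m = 784` (28x28 MNIST pixels) and `nh = 50` as above, the count works out as follows:

```python
# Shape parameters from the cells above: m input pixels, nh hidden units, one output.
m, nh, n_out = 784, 50, 1

# Each linear layer contributes a weight matrix plus a bias vector.
layer1 = m * nh + nh         # w1: (784, 50), b1: (50,)
layer2 = nh * n_out + n_out  # w2: (50, 1),   b2: (1,)

print(layer1, layer2, layer1 + layer2)  # 39250 51 39301
```

Almost all of the parameters sit in the first layer, because its size scales with the number of input pixels.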

In [4]:
def lin(x, w, b):
    return x @ w + b

In [5]:
t = lin(x_valid, w1, b1)
t.shape

torch.Size([10000, 50])
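The shape follows from the matrix product: `(10000, 784) @ (784, 50)` gives `(10000, 50)`, and the bias `b1` of shape `(50,)` is broadcast across all 10000 rows. A scaled-down NumPy sketch (the sizes 4, 6 and 3 are arbitrary, chosen for illustration) checks the same rule:

```python
import numpy as np

# Small stand-ins for x_valid, w1, b1: 4 samples, 6 features, 3 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6)).astype(np.float32)
w = rng.standard_normal((6, 3)).astype(np.float32)
b = np.arange(3, dtype=np.float32)

def lin(x, w, b):
    # (samples, features) @ (features, hidden) -> (samples, hidden);
    # b with shape (hidden,) is broadcast across every row.
    return x @ w + b

t = lin(x, w, b)
print(t.shape)  # (4, 3)

# Broadcasting the bias gives the same result as adding it to each row explicitly.
assert np.allclose(t, np.stack([row @ w + b for row in x]))
```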

In [6]:
def relu(x):
    return x.clamp_min(0.)
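`clamp_min(0.)` replaces every negative entry with 0, which is the rectified linear unit `max(x, 0)`. A NumPy equivalent, using `np.maximum` in place of PyTorch's `clamp_min`, makes the elementwise behaviour explicit:

```python
import numpy as np

def relu(x):
    # Same effect as x.clamp_min(0.): negative entries become 0,
    # non-negative entries pass through unchanged.
    return np.maximum(x, 0.)

x = np.array([[-2.0, 0.0, 3.5],
              [1.0, -0.5, -7.0]])
print(relu(x).tolist())  # [[0.0, 0.0, 3.5], [1.0, 0.0, 0.0]]
```

This is why many entries in the output below are exactly `0.0000`.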

In [7]:
t = relu(t)
t

tensor([[15.5903,  0.0000,  8.9234,  ...,  0.0000,  0.0000,  1.9597],
        [27.5410,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  9.4916],
        [ 7.2958,  0.0000,  3.1869,  ...,  0.0000, 10.5645,  0.0962],
        ...,
        [ 8.1671,  0.0000,  0.0000,  ...,  1.3344, 14.9702,  0.0000],
        [12.3464,  0.0000, 18.7875,  ...,  1.7658,  5.5055,  0.0000],
        [ 1.3115,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  2.3366]])

In [8]:
def model(xb):
    l1 = lin(xb, w1, b1)
    l2 = relu(l1)
    return lin(l2, w2, b2)
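The model is the composition linear, ReLU, linear. The sketch below mirrors it in NumPy (substituted for PyTorch, with a hypothetical random batch of 8 flattened images) and annotates the shape at each step, which is why the predictions end up with a trailing dimension of size 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, nh = 784, 50  # input features and hidden units, as in the notebook

# Randomly initialised parameters with the same shapes as w1, b1, w2, b2.
w1 = rng.standard_normal((m, nh)).astype(np.float32)
b1 = np.zeros(nh, dtype=np.float32)
w2 = rng.standard_normal((nh, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

def lin(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.)

def model(xb):
    l1 = lin(xb, w1, b1)    # (batch, 784) -> (batch, 50)
    l2 = relu(l1)           # same shape, negatives zeroed
    return lin(l2, w2, b2)  # (batch, 50) -> (batch, 1)

xb = rng.random((8, m), dtype=np.float32)  # hypothetical batch of 8 inputs in [0, 1)
print(model(xb).shape)  # (8, 1)
```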

In [9]:
preds = model(x_train)
preds.shape, y_train.shape


(torch.Size([50000, 1]), torch.Size([50000]))

We now compute a loss against these predictions. Here we use mean squared error (`mse`) for demonstration, although it is not an appropriate loss for categorization.
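One way to see the mismatch: when a single output is compared directly against the digit label, squared error treats the classes as points on a number line, so some wrong answers cost far more than others even though each is simply a different class. A minimal illustration with a hypothetical true label of 9:

```python
def squared_error(pred, target):
    # Squared difference between a single numeric prediction and a label.
    return (pred - target) ** 2

true_label = 9  # hypothetical true digit
print(squared_error(8, true_label))  # 1
print(squared_error(0, true_label))  # 81
```

Predicting 8 and predicting 0 are equally wrong as classifications, yet the second is penalised 81 times as heavily.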

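`preds` has shape `(50000, 1)` while `y_train` has shape `(50000,)`, so subtracting one from the other would broadcast to a `(50000, 50000)` matrix. Indexing out the trailing dimension of `preds` avoids this erroneous broadcasting: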

In [10]:
(preds[:, 0] - y_train).shape

torch.Size([50000])
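To make the pitfall concrete, here is a small NumPy sketch (PyTorch follows NumPy's broadcasting semantics, so the shapes behave the same way); `n = 5` stands in for the 50000 training samples:

```python
import numpy as np

n = 5  # small stand-in for the 50000 training samples
preds = np.zeros((n, 1))           # model output shape: (n, 1)
targs = np.arange(n, dtype=float)  # label shape: (n,)

# (n, 1) - (n,) broadcasts to (n, n): every prediction minus every target.
wrong = preds - targs
print(wrong.shape)  # (5, 5)

# Dropping the trailing dimension gives the intended elementwise difference.
right = preds[:, 0] - targs
print(right.shape)  # (5,)
```

No error is raised in the broadcast case, which is what makes this mistake easy to miss: a loss averaged over the `(n, n)` matrix still returns a single number.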

The `mse` function therefore applies the same indexing to its `output` argument.

In [11]:
def mse(output, targ):
    return (output[:, 0] - targ).pow(2).mean()
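As a hand-checkable sketch, the same function written for NumPy arrays (`** 2` in place of `.pow(2)`) on three made-up predictions: the differences are 0, -1 and 2, the squares 0, 1 and 4, and their mean is 5/3:

```python
import numpy as np

def mse(output, targ):
    # output: (n, 1) predictions; targ: (n,) targets.
    # output[:, 0] drops the trailing dimension so the subtraction is elementwise.
    return ((output[:, 0] - targ) ** 2).mean()

output = np.array([[1.0], [2.0], [4.0]])
targ = np.array([1.0, 3.0, 2.0])
print(mse(output, targ))  # 5/3, about 1.667
```

`output[:, 0]` assumes exactly one output column; `output.squeeze(-1)` would have the same effect here.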


In [12]:
mse(preds, y_train)


tensor(670.1931)