We illustrate how we might write a class that indexes like numpy arrays and torch tensors.

In [1]:
from pathlib import Path
from urllib.request import urlretrieve
import gzip, pickle

MNIST_URL='https://github.com/mnielsen/neural-networks-and-deep-learning/blob/master/data/mnist.pkl.gz?raw=true'
path_data = Path('data')
path_data.mkdir(exist_ok=True)
path_gz = path_data/'mnist.pkl.gz'

if not path_gz.exists():
    urlretrieve(MNIST_URL, path_gz)

with gzip.open(path_gz, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')

import torch
from torch import tensor

x_train,y_train,x_valid,y_valid = map(tensor, (x_train,y_train,x_valid,y_valid))
x_train.shape, x_train.type()


(torch.Size([50000, 784]), 'torch.FloatTensor')

In [2]:
x_valid.shape

torch.Size([10000, 784])

We create a model that performs the following:

`dataset` (N-by-784) @ `weights` (784-by-10) + `bias` (10) = `preds` (N-by-10),

where `@` indicates matrix multiplication. We have abstracted away the activation functions; the 10 output dimensions encode a probability distribution expressing the model's confidence that the input is each digit.
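As a quick shape check, here is a minimal sketch of the same computation on random stand-in data. The names `x`, `w`, `b`, and `n` are placeholders, not the notebook's variables, and softmax is assumed here as the omitted activation that turns raw outputs into probabilities.

```python
import torch

torch.manual_seed(0)
n = 5                      # hypothetical batch size
x = torch.rand(n, 784)     # stand-in for N flattened 28x28 images
w = torch.randn(784, 10)   # stand-in weights
b = torch.zeros(10)        # stand-in bias

preds = x @ w + b          # the bias broadcasts across the N rows
print(preds.shape)

# Assumption: softmax is the activation; it normalizes each row to sum to 1.
probs = preds.softmax(dim=1)
print(probs.sum(dim=1))
```

The raw `preds` can be negative or larger than 1, so they only become a probability distribution after a normalizing step such as the softmax assumed above.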

In [3]:
torch.manual_seed(1)
weights = torch.randn(784, 10)
bias = torch.zeros(10)


Demo of matrix multiplication as a set of dot products, computed on a five-row slice of the validation set.

In [4]:
m1 = x_valid[:5]
m2 = weights

ar, ac = m1.shape 
br, bc = m2.shape
(ar, ac), (br, bc)

t1 = torch.zeros(ar, bc)
print(t1.shape)

for i in range(ar):         # 5
    for j in range(bc):     # 10
        for k in range(ac): # 784
            t1[i, j] += m1[i, k] * m2[k, j]

t1

torch.Size([5, 10])


tensor([[-10.9417,  -0.6844,  -7.0038,  -4.0066,  -2.0857,  -3.3588,   3.9127,
          -3.4375, -11.4696,  -2.1153],
        [ 14.5430,   5.9977,   2.8914,  -4.0777,   6.5914, -14.7383,  -9.2787,
           2.1577, -15.2772,  -2.6758],
        [  2.2204,  -3.2171,  -4.7988,  -6.0453,  14.1661,  -8.9824,  -4.7922,
          -5.4446, -20.6758,  13.5657],
        [ -6.7097,   8.8998,  -7.4611,  -7.8966,   2.6994,  -4.7260, -11.0278,
         -12.9776,  -6.4443,   3.6376],
        [ -2.4444,  -6.4034,  -2.3984,  -9.0371,  11.1772,  -5.7724,  -8.9214,
          -3.7862,  -8.9827,   5.2797]])

In [5]:
t1.shape


torch.Size([5, 10])
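Each entry of the result is the dot product of one row of the left matrix with one column of the right matrix. A minimal sketch of that identity follows, using small stand-in tensors (`a` and `b` are placeholders rather than the MNIST slice, and the indices `i`, `j` are arbitrary):

```python
import torch

torch.manual_seed(0)
a = torch.randn(2, 3)   # stand-in for m1
b = torch.randn(3, 4)   # stand-in for m2

i, j = 1, 2
# Entry (i, j) as an explicit sum over the shared dimension k.
entry_loop = sum(a[i, k] * b[k, j] for k in range(a.shape[1]))
# The same entry as the dot product of row i of a and column j of b.
entry_dot = torch.dot(a[i], b[:, j])

print(torch.isclose(entry_loop, entry_dot, atol=1e-6))
```

The innermost `k` loop in the cell above computes exactly this sum, once for every `(i, j)` pair.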

We abstract the triple loop into its own function.

In [6]:
def matmul(a, b):
    (ar, ac), (br, bc) = a.shape, b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):             # rows of a
        for j in range(bc):         # columns of b
            for k in range(ac):     # shared inner dimension
                c[i, j] += a[i, k] * b[k, j]
    return c
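One way to sanity-check such a function is to compare it with PyTorch's built-in `@` operator on small random inputs. The sketch below restates the triple loop so that it runs on its own; the inputs `a` and `b` and the tolerance are illustrative choices, not part of the notebook.

```python
import torch

def matmul(a, b):
    (ar, ac), (br, bc) = a.shape, b.shape
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

torch.manual_seed(0)
a = torch.randn(3, 4)   # small illustrative inputs
b = torch.randn(4, 2)

result = matmul(a, b)
expected = a @ b        # PyTorch's built-in matrix multiplication

# A small tolerance allows for float32 rounding differences in summation order.
print(torch.allclose(result, expected, atol=1e-5))
```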


We then time how long it takes to run on the five-row slice.

In [7]:
%time _ = matmul(m1, m2)


CPU times: user 549 ms, sys: 0 ns, total: 549 ms
Wall time: 559 ms


The innermost loop of `matmul` runs the following number of times:

In [8]:
ar * bc * ac


39200
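To put that count in perspective, the following back-of-the-envelope sketch combines it with the wall time reported above. The timing is machine-specific, and the extrapolation to the full validation set assumes the cost scales linearly with the number of rows, so treat the numbers as rough estimates only.

```python
ar, ac, bc = 5, 784, 10        # shapes of m1 (5x784) and m2 (784x10)
inner_iters = ar * bc * ac     # iterations of the innermost loop
wall_time_s = 0.559            # wall time reported above (machine-specific)

per_iter_us = wall_time_s / inner_iters * 1e6
print(f"{per_iter_us:.1f} microseconds per inner iteration")

# Hypothetical extrapolation: all 10000 validation rows instead of 5,
# assuming the run time grows linearly with the row count.
full_rows = 10_000
est_s = wall_time_s * full_rows / ar
print(f"roughly {est_s / 60:.0f} minutes for {full_rows} rows")
```

At roughly 14 microseconds per multiply-and-add, the pure-Python loop is far too slow for the full dataset, which is the motivation for speeding it up.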