# Test PyTorch notebooks

Demonstrates a basic PyTorch training loop for MNIST handwritten-digit classification.

Adapted from [https://youtu.be/OMDn66kM9Qc](https://youtu.be/OMDn66kM9Qc)

In [1]:
import torch
from torch import nn
from torch import optim
from torchvision import datasets, transforms
from torch.utils.data import random_split, DataLoader


In [2]:
# GPU sanity check (uncomment on a machine with CUDA):
#torch.randn(5).cuda()
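The commented-out `.cuda()` calls throughout this notebook hard-code the GPU. A common device-agnostic alternative (a sketch, not the notebook's original approach; the `device` name is our own) picks the device once and allocates or moves tensors there:

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Sanity check: allocate a small tensor directly on the chosen device
t = torch.randn(5, device=device)
print(t.device)
```

The same `device` can then stand in for each `.cuda()` call, e.g. `model.to(device)` and `x.to(device)`.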

In [3]:
# Basic model: a 3-layer MLP mapping 784 pixels -> 64 -> 64 -> 10 logits
model = nn.Sequential(
    nn.Linear(28*28, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
)
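The MLP maps a flattened 28x28 image (784 values) to 10 logits, one per digit. A quick check with a random dummy batch (batch size 32 is only an illustration, random values stand in for pixels) confirms the output shape, and the parameter count follows from the layer sizes: (784*64 + 64) + (64*64 + 64) + (64*10 + 10) = 55,050.

```python
import torch
from torch import nn

# Same architecture as the notebook's basic model
model = nn.Sequential(
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Dummy batch of 32 flattened images (random values, not MNIST)
x = torch.rand(32, 28 * 28)
logits = model(x)
print(logits.shape)  # torch.Size([32, 10])

# Weights plus biases across the three Linear layers
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 55050
```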

In [4]:
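# More flexible model: a subclass of nn.Module with a custom forward()
# The skip connection (h2 + h1) makes this ResNet-style model train faster than the plain MLP above
class ResNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.l1 = nn.Linear(28*28, 64)
    self.l2 = nn.Linear(64, 64)
    self.l3 = nn.Linear(64, 10)
    self.do = nn.Dropout(0.1)

  def forward(self, x):
    h1 = nn.functional.relu(self.l1(x))
    h2 = nn.functional.relu(self.l2(h1))
    do = self.do(h2 + h1)  # residual connection, then dropout
    logits = self.l3(do)
    return logits
# To train this model instead of the MLP, uncomment the line below
# (on a GPU, also uncomment the .cuda() lines in the training loop; on CPU, drop .cuda()):
#model = ResNet().cuda()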
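The skip connection `h2 + h1` works because `l1` and `l2` both output 64 features, so the two tensors can be added element-wise. Dropout has no learnable weights, so this model has the same 55,050 parameters as the plain MLP. Dropout is also only active in training mode, which is why the training loop switches between `model.train()` and `model.eval()`. A small check under those assumptions (the class is re-declared so the snippet runs on its own; seed and batch size are arbitrary):

```python
import torch
from torch import nn

class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 64)
        self.l2 = nn.Linear(64, 64)
        self.l3 = nn.Linear(64, 10)
        self.do = nn.Dropout(0.1)

    def forward(self, x):
        h1 = nn.functional.relu(self.l1(x))
        h2 = nn.functional.relu(self.l2(h1))
        # Residual (skip) connection: both operands are (b, 64)
        do = self.do(h2 + h1)
        return self.l3(do)

torch.manual_seed(0)
model = ResNet()
x = torch.rand(8, 28 * 28)

# Dropout adds no parameters, so the count matches the plain MLP
n_params = sum(p.numel() for p in model.parameters())

# In eval mode dropout is the identity, so repeated passes agree exactly
model.eval()
with torch.no_grad():
    same = torch.equal(model(x), model(x))

# In train mode each pass samples a fresh dropout mask
model.train()
with torch.no_grad():
    differ = not torch.equal(model(x), model(x))

print(n_params, same, differ)
```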

In [5]:
# Define optimiser: plain SGD with learning rate 0.01
optimiser = optim.SGD(model.parameters(), lr=1e-2)
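With no momentum or weight decay, `optim.SGD` updates every parameter as `p = p - lr * p.grad`. The sketch below uses a tiny stand-in linear layer rather than the MNIST model (layer size, seed and input are arbitrary). It checks that `step()` matches that manual update, and shows why the training loop clears gradients first: `backward()` adds into `.grad` instead of overwriting it.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lr = 1e-2
layer = nn.Linear(3, 1)          # toy stand-in for the MNIST model
opt = optim.SGD(layer.parameters(), lr=lr)
x = torch.randn(4, 3)

# backward() accumulates: two calls without zeroing double the gradient
layer(x).sum().backward()
g1 = layer.weight.grad.clone()
layer(x).sum().backward()
accumulated = torch.allclose(layer.weight.grad, 2 * g1)

# Start from a clean gradient, then take one SGD step
opt.zero_grad()
layer(x).sum().backward()
expected = layer.weight.detach() - lr * layer.weight.grad
opt.step()
matches_manual = torch.allclose(layer.weight.detach(), expected)

print(accumulated, matches_manual)
```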

In [6]:
# Define loss: cross-entropy on raw logits (log-softmax is applied internally)
loss = nn.CrossEntropyLoss()
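`nn.CrossEntropyLoss` takes raw logits (neither model above ends in a softmax) and integer class labels. By default it averages, over the batch, the negative log-softmax probability of the correct class. A check against that formula, using made-up logits for a 3-class problem:

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])   # correct class index for each row

builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax over classes, pick the target entry, negate, average
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```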

In [7]:
# Load the 60,000-image MNIST training set and split it into 55,000 train / 5,000 validation images
train_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
train, val = random_split(train_data, [55000, 5000])
train_loader = DataLoader(train, batch_size=32)
val_loader = DataLoader(val, batch_size=32)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
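`random_split` draws a random permutation of the 60,000 training images, so the 55,000/5,000 partition changes on every run; passing a seeded `torch.Generator` makes it reproducible. With `batch_size=32` and the default `drop_last=False`, the loaders yield ceil(55000/32) = 1,719 and ceil(5000/32) = 157 batches per epoch. The sketch below assumes a synthetic stand-in dataset of the same size so it runs without downloading MNIST; the seed 42 is arbitrary, and `shuffle=True` on the training loader is an addition (the notebook's loader iterates in the same order every epoch).

```python
import torch
from torch.utils.data import TensorDataset, random_split, DataLoader

# Synthetic stand-in for MNIST's 60,000 training samples (indices only)
full = TensorDataset(torch.arange(60000))

# A seeded generator makes the split identical across runs
gen = torch.Generator().manual_seed(42)
train, val = random_split(full, [55000, 5000], generator=gen)

# shuffle=True reshuffles the training set each epoch (not used in the notebook)
train_loader = DataLoader(train, batch_size=32, shuffle=True)
val_loader = DataLoader(val, batch_size=32)

print(len(train), len(val))                 # 55000 5000
print(len(train_loader), len(val_loader))   # 1719 157
```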



In [8]:
%%time
# Training and validation loops
nb_epochs = 5
for epoch in range(nb_epochs):
  losses = list()
  accuracies = list()
  model.train()
  for batch in train_loader:
    x, y = batch
    # x: b x 1 x 28 x 28 (batch, channel, height, width)
    b = x.size(0)
    #x = x.view(b, -1).cuda()
    x = x.view(b, -1)
    # 1. Forward pass
    l = model(x) # l for logits
    # 2. Compute the objective function
    #y = y.cuda()
    J = loss(l, y)
    # 3. Clear the gradients left over from the previous step
    model.zero_grad()
    # conceptually: for p in model.parameters(): p.grad.zero_()
    # 4. Accumulate the partial derivatives of J w.r.t. the parameters
    J.backward()
    # conceptually: p.grad += dJ/dp for every parameter p
    # 5. Step in the opposite direction of the gradient
    optimiser.step()
    # equivalent manual update (lr = 1e-2):
    # with torch.no_grad(): for p in model.parameters(): p -= lr * p.grad
    # .item() extracts the Python float from the scalar loss tensor
    losses.append(J.item())
    accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())
  print(f'Epoch {epoch+1},train loss: {torch.tensor(losses).mean():.2f}')
  print(f'training accuracy: {torch.tensor(accuracies).mean():.2f}')
  losses = list()
  model.eval()
  accuracies = list()
  for batch in val_loader:
    x, y = batch
    # x: b x 1 x 28 x 28 (batch, channel, height, width)
    b = x.size(0)
    #x = x.view(b, -1).cuda()
    x = x.view(b, -1)
    # 1. Forward pass only; no_grad() skips building the autograd graph
    with torch.no_grad():
      l = model(x) # l for logits
    # 2. Compute the objective function
    #y = y.cuda()
    J = loss(l, y)
    losses.append(J.item())
    accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())
  print(f'Epoch {epoch+1},validation loss: {torch.tensor(losses).mean():.2f}')
  print(f'validation accuracy: {torch.tensor(accuracies).mean():.2f}')

Epoch 1,train loss: 1.24
training accuracy: 0.67
Epoch 1,validation loss: 0.47
validation accuracy: 0.87
Epoch 2,train loss: 0.39
training accuracy: 0.89
Epoch 2,validation loss: 0.34
validation accuracy: 0.90
Epoch 3,train loss: 0.32
training accuracy: 0.91
Epoch 3,validation loss: 0.30
validation accuracy: 0.92
Epoch 4,train loss: 0.28
training accuracy: 0.92
Epoch 4,validation loss: 0.27
validation accuracy: 0.93
Epoch 5,train loss: 0.25
training accuracy: 0.93
Epoch 5,validation loss: 0.24
validation accuracy: 0.93
CPU times: user 30 s, sys: 118 ms, total: 30.2 s
Wall time: 30.2 s
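The training and validation loops above repeat the same batch logic. One way to factor it out (a sketch, not the notebook's code) is a hypothetical `run_epoch` helper that moves each batch to a chosen device instead of relying on the commented-out `.cuda()` calls, and weights loss and accuracy by batch size, so the smaller final batch (24 training images, 8 validation images) does not count as much as a full batch, unlike the plain mean of per-batch values printed above. It is demonstrated on synthetic, learnable stand-in data (one bright pixel encodes the label), not on MNIST; the seed, learning rate and epoch count are arbitrary.

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

def run_epoch(model, loader, loss_fn, device, optimiser=None):
    """Hypothetical helper: trains if an optimiser is given, else evaluates.
    Returns (mean loss, accuracy), both weighted by batch size."""
    training = optimiser is not None
    model.train(training)  # same as model.eval() when training is False
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            x = x.view(x.size(0), -1).to(device)  # b x 1 x 28 x 28 -> b x 784
            y = y.to(device)
            logits = model(x)
            J = loss_fn(logits, y)
            if training:
                optimiser.zero_grad()
                J.backward()
                optimiser.step()
            total_loss += J.item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += y.size(0)
    return total_loss / seen, correct / seen

# Synthetic stand-in for MNIST: one bright pixel encodes the label
torch.manual_seed(0)
labels = torch.randint(0, 10, (256,))
images = torch.rand(256, 1, 28, 28) * 0.1
images.view(256, -1)[torch.arange(256), labels] = 1.0
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
opt = optim.SGD(model.parameters(), lr=1e-1)
loss_fn = nn.CrossEntropyLoss()

before, _ = run_epoch(model, loader, loss_fn, device)
for _ in range(20):
    run_epoch(model, loader, loss_fn, device, opt)
after, acc = run_epoch(model, loader, loss_fn, device)
print(f"loss before {before:.2f}, after {after:.2f}, accuracy {acc:.2f}")
```

Using `torch.set_grad_enabled(training)` lets one code path serve both modes: gradients are tracked only when an optimiser is passed, matching the notebook's use of `torch.no_grad()` during validation.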


In [9]:
# Debugging leftover (pdb `p` command) to print batch accuracy: y.eq(l.detach().argmax(dim=1)).float().mean()
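The accuracy expression used in both loops compares each row's highest-scoring class with the label and averages the resulting booleans. A worked example with three made-up samples:

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.0],    # highest score: class 0
                       [0.0, 3.0, 1.0],    # highest score: class 1
                       [1.0, 0.0, 0.5]])   # highest score: class 0
y = torch.tensor([0, 2, 0])

preds = logits.argmax(dim=1)          # tensor([0, 1, 0])
acc = y.eq(preds).float().mean()      # 2 of 3 correct
print(acc.item())
```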

In [10]:
!nvidia-smi

Sat Jun  4 20:40:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    27W / 250W |      2MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces