# Basic Lightning example

Adapted from [https://youtu.be/OMDn66kM9Qc](https://youtu.be/OMDn66kM9Qc)


In [1]:

! pip install torchmetrics

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchmetrics
  Downloading torchmetrics-0.9.0-py3-none-any.whl (418 kB)
Installing collected packages: torchmetrics
Successfully installed torchmetrics-0.9.0


In [2]:
! pip install pytorch-lightning

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytorch-lightning
  Downloading pytorch_lightning-1.6.4-py3-none-any.whl (585 kB)
Collecting pyDeprecate>=0.3.1
  Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Collecting PyYAML>=5.4
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting fsspec[http]!=2021.06.0,>=2021.05.0
  Downloading fsspec-2022.5.0-py3-none-any.whl (140 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting asynctest==0.13.0
  Downloading asynctest-0.13.0-py3-none

In [3]:
import torch
from torch import nn
from torch import optim
from torchvision import datasets, transforms
from torch.utils.data import random_split, DataLoader
import pytorch_lightning as pl
from torchmetrics import Accuracy

In [4]:
#torch.randn(5).cuda()

## Lightning makes it possible to eliminate some of the boilerplate

The five parts of a typical PyTorch training script:

1. model
2. optimizer
3. data
4. training loop
5. validation loop

Advantages of Lightning:
- it puts the model and each batch on the correct device, so there is no need to call `.cuda()` by hand
- the number of epochs and the use of a GPU are set in a single line: `trainer = pl.Trainer(max_epochs=5, gpus=1)`

In [5]:
class ResNet(pl.LightningModule):
  def __init__(self):
    super().__init__()
    self.l1 =  nn.Linear(28*28, 64)
    self.l2 = nn.Linear(64,64)
    self.l3 = nn.Linear(64,10)
    self.do = nn.Dropout(0.1)
    self.loss = nn.CrossEntropyLoss()
    train_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
    self.train_split, self.val_split = random_split(train_data, [55000, 5000])

  def forward(self, x):
    h1 = nn.functional.relu(self.l1(x))
    h2 = nn.functional.relu(self.l2(h1))
    do = self.do(h2 + h1)
    logits = self.l3(do)
    return logits

  def configure_optimizers(self):
    #params = model.parameters()
    optimiser = optim.SGD(self.parameters(), lr=1e-2)
    return optimiser

  def _shared_step(self, batch):
    x, y = batch
    # x: b * 1 * 28 * 28, flattened to b * 784
    b = x.size(0)
    x = x.view(b, -1)
    # 1 forward
    logits = self(x)
    # 2 compute the objective func
    J = self.loss(logits, y)
    # fraction of correct predictions in this batch
    acc = (logits.argmax(dim=1) == y).float().mean()
    return J, acc

  def training_step(self, batch, batch_idx):
    J, acc = self._shared_step(batch)
    # self.log shows the metric in the progress bar
    self.log('train_acc', acc, prog_bar=True)
    return J

  def validation_step(self, batch, batch_idx):
    # in the validation loop we want metrics for the whole validation set:
    # values logged here are averaged over the epoch by Lightning,
    # so no validation_epoch_end is needed
    J, acc = self._shared_step(batch)
    self.log('val_loss', J, prog_bar=True)
    self.log('val_acc', acc, prog_bar=True)

  #def prepare_data(self):
  #  datasets.MNIST('data', train=True, download=True,transform=transforms.ToTensor())

  #def setup(self,stage=None):
  #  dataset = datasets.MNIST('data', train=True, download=False,transform=transforms.ToTensor())
  #  self.train, self.val = random_split(dataset,[55000,5000])
    

  def train_dataloader(self):
    #train_data = datasets.MNIST('data', train=True, download=True,transform=transforms.ToTensor())
    #self.train, self.val = random_split(train_data,[55000,5000])
    #train_loader = DataLoader(train_data,batch_size=32)
    #val_loader = DataLoader(val,batch_size=32)
    train_loader = DataLoader(self.train_split, batch_size=16)
    return train_loader

  def val_dataloader(self):
    #val_loader = DataLoader(self.val,batch_size=32)
    val_loader = DataLoader(self.val_split, batch_size=16)
    return val_loader

model = ResNet()

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw



In [6]:
%%time
trainer = pl.Trainer(progress_bar_refresh_rate=20, max_epochs=5)
trainer.fit(model)

  f"Setting `Trainer(progress_bar_refresh_rate={progress_bar_refresh_rate})` is deprecated in v1.5 and"
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /content/lightning_logs

  | Name | Type             | Params
------------------------------------------
0 | l1   | Linear           | 50.2 K
1 | l2   | Linear           | 4.2 K 
2 | l3   | Linear           | 650   
3 | do   | Dropout          | 0     
4 | loss | CrossEntropyLoss | 0     
------------------------------------------
55.1 K    Trainable params
0         Non-trainable params
55.1 K    Total params
0.220     Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]
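
The parameter counts in the model summary above can be checked by hand. The sketch below assumes each `nn.Linear(in, out)` holds an `in * out` weight matrix plus `out` biases, and that every parameter is stored as a 4-byte float32; `linear_params` is a helper name made up for this illustration.

```python
# Each nn.Linear(in_features, out_features) holds a weight matrix plus a bias vector.
def linear_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features

layers = {
    "l1": linear_params(28 * 28, 64),  # flattened 28x28 image -> 64 hidden units
    "l2": linear_params(64, 64),
    "l3": linear_params(64, 10),       # 10 digit classes
}
total = sum(layers.values())
size_mb = total * 4 / 1e6  # assumption: 4 bytes per float32 parameter

for name, count in layers.items():
    print(f"{name}: {count}")
print(f"total: {total}, approx. size: {size_mb:.3f} MB")
```

Dropout and the loss have no trainable parameters, which is why they show 0 in the table.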

In [7]:
! ls lightning_logs/

version_0


In [8]:
# Lightning saves checkpoints under lightning_logs/version_N/checkpoints (by default the last epoch, unless a metric is monitored)
! ls lightning_logs/version_4/checkpoints

ls: cannot access 'lightning_logs/version_4/checkpoints': No such file or directory
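
The `ls` above fails because this run only created `version_0`, so the hard-coded `version_4` does not exist. One way to avoid guessing the number is to look up the newest `version_N` folder first. This is a minimal sketch using only the standard library; `latest_version_dir` and `list_checkpoints` are hypothetical helper names, and the `.ckpt` suffix and folder layout are assumed from the paths shown in this notebook.

```python
from pathlib import Path
from typing import List, Optional


def latest_version_dir(log_root: str = "lightning_logs") -> Optional[Path]:
    """Return the version_N directory with the highest N, or None if there is none."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    versions = [
        p for p in root.glob("version_*")
        if p.is_dir() and p.name.split("_")[-1].isdigit()
    ]
    # Compare numerically so that version_10 ranks above version_9.
    return max(versions, key=lambda p: int(p.name.split("_")[-1]), default=None)


def list_checkpoints(log_root: str = "lightning_logs") -> List[Path]:
    """List checkpoint files of the newest run, or an empty list if none exist."""
    latest = latest_version_dir(log_root)
    if latest is None:
        return []
    ckpt_dir = latest / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    return sorted(ckpt_dir.glob("*.ckpt"))


print(list_checkpoints())
```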


In [10]:
'''
from torch.nn.modules import batchnorm
# training and validation loops
nb_epochs = 5
for epoch in range(nb_epochs):
  losses = list()
  accuracies = list()
  model.train()
  for batch in train_loader:
    x, y = batch
    # x: b* 1 * 28 * 28
    b = x.size(0)
    #x = x.view(b, -1).cuda()
    x = x.view(b, -1)
    # 1 forward
    l = model(x) # l for logit
    #import pdb; pdb.set_trace()
    # 2 compute the objective func
    #y = y.cuda()
    J = loss(l,y)
    # 3 cleaning the gradients
    model.zero_grad()
    # equivalent to params.grad.zero_()
    # 4 accumulate the partial derivatives of J wrt the params
    J.backward()
    # params.grad.add_(dJ/dparams)
    # 5 step in the opposite direction of the gradient
    optimiser.step()
    # could have also done: with torch.no_grad(): params = params - eta * params.grad
    # show value of scalar tensor
    losses.append(J.item())
    accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())
  print(f'Epoch {epoch+1},train loss: {torch.tensor(losses).mean():.2f}')
  print(f'training accuracy: {torch.tensor(accuracies).mean():.2f}')
  losses = list()
  model.eval()
  accuracies = list()
  for batch in val_loader:
    x, y = batch
    # x: b* 1 * 28 * 28
    b = x.size(0)
    #x = x.view(b, -1).cuda()
    x = x.view(b, -1)
    # 1 forward
    with torch.no_grad():
      l = model(x) # l for logit
    # 2 compute the objective func
    #y = y.cuda()
    J = loss(l,y)
    losses.append(J.item())
    accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())
  print(f'Epoch {epoch+1},validation loss: {torch.tensor(losses).mean():.2f}')
  print(f'validation accuracy: {torch.tensor(accuracies).mean():.2f}')'''

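
The disabled loop above depends on `train_loader`, `loss` and `optimiser` from an earlier, non-Lightning version of the notebook, so it cannot run on its own. The sketch below replays its five steps in a self-contained form. It assumes random tensors as a stand-in for MNIST and a smaller two-layer model, both chosen only for illustration; it is not the notebook's original training code.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Assumption: 64 random "images" and labels stand in for one MNIST batch.
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

model = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
optimiser = optim.SGD(model.parameters(), lr=1e-2)

losses = []
for step in range(20):
    logits = model(x.view(x.size(0), -1))  # 1 forward on the flattened batch
    J = loss_fn(logits, y)                 # 2 compute the objective func
    model.zero_grad()                      # 3 clear the old gradients
    J.backward()                           # 4 accumulate dJ/dparams
    optimiser.step()                       # 5 step against the gradient
    losses.append(J.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

These are the lines that Lightning runs for you once `training_step` returns the loss.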
In [11]:
!nvidia-smi

Sat Jun  4 20:48:23 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    27W / 250W |      2MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces