[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/guiwitz/DSL_CV2_PyTorch/blob/main/notebooks/06-Advanced_training.ipynb)
# Full training

In a previous notebook we saw all the essential parts of a training loop (optimizer, stepping, zeroing gradients etc.). That loop involved many steps that are easy to forget, and it ignored most of the data-related steps:

- correctly batching the data and running training over the entire dataset
- keeping some data for validation
- running quality checks during training etc.

Our training loop will look something like this:

```
for batch in dataset:
    compute a prediction and its loss
    do a round of optimization
    compute the current accuracy
    
    every nth iteration:
        compute the accuracy on the validation dataset
```
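To make the pseudocode concrete, here is a minimal sketch of the same loop in plain PyTorch. The data is a synthetic stand-in (random features with a made-up labelling rule), and `validate_every`, the layer sizes and the learning rate are illustrative assumptions rather than values from this course.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in data (an assumption, not the course images):
# 200 samples with 8 features; the label says whether the features sum to > 0.
x = torch.randn(200, 8)
y = (x.sum(dim=1) > 0).long()
train_set, valid_set = random_split(TensorDataset(x, y), [160, 40])
train_loader = DataLoader(train_set, batch_size=20, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=20)

model = nn.Sequential(nn.Linear(8, 10), nn.ReLU(), nn.Linear(10, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

def accuracy(logits, labels):
    # fraction of samples whose highest-scoring class matches the label
    return (logits.argmax(dim=1) == labels).float().mean().item()

validate_every = 4  # hypothetical choice of "n"
valid_history = []
step = 0
for epoch in range(20):
    for xb, yb in train_loader:
        # compute a prediction and its loss
        logits = model(xb)
        loss = loss_fn(logits, yb)
        # do a round of optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # compute the current accuracy on this training batch
        train_acc = accuracy(logits, yb)
        step += 1
        # every nth iteration: compute the accuracy on the validation dataset
        if step % validate_every == 0:
            model.eval()
            with torch.no_grad():
                accs = [accuracy(model(xv), yv) for xv, yv in valid_loader]
            valid_history.append(sum(accs) / len(accs))
            model.train()

print(f"last validation accuracy: {valid_history[-1]:.2f}")
```

Even in this small form, the bookkeeping (zeroing gradients, switching between train and eval mode, disabling gradients for validation) takes up most of the loop.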

Several higher-level libraries simplify writing this loop. Here we use the PyTorch Lightning library, which stays very close to PyTorch but brings many simplifications.

The idea is essentially to skip writing the loops ourselves. Instead, we just specify what should happen at each iteration (e.g. compute the accuracy) and provide a dataloader; PyTorch Lightning then takes care of going over all batches, zeroing the gradients, stepping the optimizer etc. We again create a model by subclassing, but this time we subclass the specialized ```LightningModule```.

In [1]:
import pytorch_lightning as pl
import torch
from torch import nn
import torch.nn.functional as F



In [2]:
class Mynetwork_pl(pl.LightningModule):
    def __init__(self, input_size, num_categories):
        super(Mynetwork_pl, self).__init__()
        
        # define the layers here, e.g. two fully connected layers
        self.layer1 = nn.Linear(input_size, 10)
        self.layer2 = nn.Linear(10, num_categories)
        
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        
        # flatten the input
        x = x.flatten(start_dim=1)
        # define the sequence of operations in the network including e.g. activations
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        
        return x
    
    def training_step(self, batch, batch_idx):
        
        x, y = batch
        output = self(x)
        loss = self.loss(output, y)
        
        self.log('loss', loss, on_epoch=True, prog_bar=True, logger=True)

        return loss
    
    def validation_step(self, batch, batch_idx):
        
        x, y = batch
        output = self(x)
        accuracy = (torch.argmax(output,dim=1) == y).sum()/len(y)

        self.log('accuracy', accuracy, on_epoch=True, prog_bar=True, logger=True)
        
        return accuracy
        
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


In [3]:
if 'model' in locals():
    del model
model = Mynetwork_pl(1024, 2)

We see that the loss is now defined in the ```__init__``` function and that we have added three methods:
- ```training_step``` defines what happens at each training step: computing, logging and returning the loss
- ```validation_step``` defines what happens at each validation step: computing and logging the accuracy
- ```configure_optimizers``` defines the optimizer

## Dataloader

We recreate our simple dataloader for circles and triangles:

In [4]:
import torchvision
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
import pandas as pd
from torch.utils.data import random_split
import skimage

transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.RandomRotation(20)
])

class Tricircle(Dataset):
    def __init__(self, image_path, labels, transform=None):
        super(Tricircle, self).__init__()
        self.image_path = image_path
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        
        # use the path stored on the dataset, not the global variable
        x = skimage.io.imread(self.image_path.joinpath(f'image_{index}.tif'))
        if self.transform is not None:
            x = self.transform(x)
        y = torch.tensor(self.labels['class'].values[index])
        
        return x, y

    def __len__(self):

        return len(self.labels)

image_path = Path('../data/triangles_circles/images/')
labels = pd.read_csv('../data/triangles_circles/triangles_circles.csv')
tridata = Tricircle(image_path=image_path, labels=labels, transform=transforms)

# keep 80% of the data for training and the rest for validation
train_size = int(0.8 * len(tridata))
valid_size = len(tridata)-train_size

train_data, valid_data = random_split(tridata, [train_size, valid_size])

train_loader = DataLoader(train_data, batch_size=10)
validation_loader = DataLoader(valid_data, batch_size=10)

In [5]:
im_batch, label_batch = next(iter(validation_loader))

In [6]:
im_batch.shape

torch.Size([10, 1, 32, 32])
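This shape explains the `input_size` of 1024 used when creating the model: `forward` flattens every dimension except the batch dimension, and 1 × 32 × 32 = 1024. A quick check on a dummy tensor of the same shape (zeros stand in for the real images):

```python
import torch

batch = torch.zeros(10, 1, 32, 32)   # same shape as im_batch
flat = batch.flatten(start_dim=1)    # keep the batch axis, merge channel, height, width
print(flat.shape)                    # torch.Size([10, 1024])
```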

In [7]:
valid_size

40
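As a sanity check of the split arithmetic, the sketch below repeats the 80/20 split on a dummy dataset. It assumes 200 samples, which is consistent with a validation size of 40 but is not stated explicitly above, and it passes a seeded `generator` to `random_split` (an optional addition) so that the split is reproducible.

```python
import math
import torch
from torch.utils.data import TensorDataset, random_split

n_samples = 200  # assumption: consistent with valid_size == 40 for an 80/20 split
dataset = TensorDataset(torch.zeros(n_samples, 1), torch.zeros(n_samples))

train_size = int(0.8 * len(dataset))
valid_size = len(dataset) - train_size
train_data, valid_data = random_split(
    dataset,
    [train_size, valid_size],
    generator=torch.Generator().manual_seed(0),  # optional: same split on every run
)
print(len(train_data), len(valid_data))  # 160 40

# number of batches per epoch for a batch size of 20
batch_size = 20
n_train_batches = math.ceil(len(train_data) / batch_size)
n_valid_batches = math.ceil(len(valid_data) / batch_size)
print(n_train_batches, n_valid_batches)  # 8 2
```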

## Training

Now we can finally train the network. For that we use a Lightning ```Trainer```, the object that runs the training and validation loops for us. We only have to specify how many times we want to go over the entire dataset (epochs):

In [8]:
from pytorch_lightning.trainer import Trainer

In [9]:
trainer = pl.Trainer(max_epochs=30, enable_progress_bar=True)
#trainer = pl.Trainer(max_epochs=10, accelerator='gpu') #on Colab or GPU
#trainer = pl.Trainer(max_epochs=10, accelerator='mps') #on mac m1

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Then we can run the training by specifying which dataset to use and which model:

In [10]:
train_loader = DataLoader(dataset=train_data, batch_size=20)#, num_workers=0)
validation_loader = DataLoader(dataset=valid_data, batch_size=20)#, num_workers=0)

In [11]:
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=validation_loader)


  | Name   | Type             | Params
--------------------------------------------
0 | layer1 | Linear           | 10.2 K
1 | layer2 | Linear           | 22    
2 | loss   | CrossEntropyLoss | 0     
--------------------------------------------
10.3 K    Trainable params
0         Non-trainable params
10.3 K    Total params
0.041     Total estimated model params size (MB)
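The numbers in this summary can be reproduced by hand: a linear layer has `in_features × out_features` weights plus `out_features` biases. The sketch below rebuilds the two layers on their own and counts their parameters; the size estimate assumes 4 bytes per parameter (float32).

```python
from torch import nn

layer1 = nn.Linear(1024, 10)
layer2 = nn.Linear(10, 2)

def n_params(module):
    # total number of scalar parameters (weights and biases)
    return sum(p.numel() for p in module.parameters())

print(n_params(layer1))  # 10250 = 1024 * 10 + 10
print(n_params(layer2))  # 22 = 10 * 2 + 2

total = n_params(layer1) + n_params(layer2)
size_mb = total * 4 / 1e6  # 4 bytes per float32 parameter
print(total, size_mb)
```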


Epoch 0: 100%|███████████| 8/8 [00:01<00:00,  5.19it/s, v_num=16, loss_step=0.612, accuracy=0.750]
Epoch 1: 100%|█| 8/8 [00:00<00:00, 71.67it/s, v_num=16, loss_step=0.576, accuracy=0.750, loss_epoc
...
Epoch 29: 100%|█| 8/8 [00:00<00:00, 50.67it/s, v_num=16, loss_step=0.294, accuracy=0.900, loss_epo

`Trainer.fit` stopped: `max_epochs=30` reached.


In [12]:
val_data, val_label = next(iter(validation_loader))

In [13]:
model(val_data).argmax(axis=1)

tensor([0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1])

In [14]:
val_label

tensor([0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1])
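Comparing the two tensors by eye is error-prone, so here is a short check that applies the same accuracy formula as in `validation_step` to the values printed above (copied by hand into new tensors named `pred` and `label`):

```python
import torch

# predictions and labels copied from the two outputs above
pred = torch.tensor([0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1])
label = torch.tensor([0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1])

correct = pred == label
accuracy = correct.sum() / len(label)
wrong_idx = torch.nonzero(~correct).flatten().tolist()

print(accuracy.item())  # approximately 0.9 (18 of 20 correct)
print(wrong_idx)        # [9, 17]
```

For this batch, 18 of the 20 predictions match the labels; the two errors are at positions 9 and 17.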