# Basic Training and Inference

## 0. Dataset and DataLoader classes

Datasets are classes that give access to individual data samples. A map-style Dataset implements 2 methods:
- \_\_len\_\_: return the total number of samples
- \_\_getitem\_\_: retrieve a specific sample by index

DataLoaders are classes that handle data processing for training. They take care of tasks such as:
- creating batches
- shuffling data
- loading data in parallel with multiple worker processes (`num_workers`)


In [1]:
# minimal custom Dataset implementation

import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __init__(self, data_path, transform=None):
        # load the whole .npy array into memory; heavier preprocessing could also go here
        self.data = np.load(data_path)
        self.transform = transform # optional callable applied to each sample
    
    def __len__(self):
        return len(self.data) # number of samples (size of the first array dimension)
    
    def __getitem__(self, index):
        sample = self.data[index] # select one sample by index

        if self.transform:
            sample = self.transform(sample)

        return sample

PyTorch (together with torchvision) also ships pre-built Dataset classes for common cases, such as tensors already in memory or folders of images.

In [2]:
# Regular tensors (already in memory)

from torch.utils.data import TensorDataset

X = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float)
y = torch.tensor([0, 1, 0], dtype=torch.float)

dataset = TensorDataset(X, y)

dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
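Creating the DataLoader does not load anything yet; batches are produced when you iterate over it. A minimal sketch reusing the toy tensors above (`batch_shapes` is just an illustrative name) shows how 3 samples are split with `batch_size=2`:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# same toy data as above: 3 samples with 2 features each
X = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float)
y = torch.tensor([0, 1, 0], dtype=torch.float)
dataloader = DataLoader(TensorDataset(X, y), batch_size=2, shuffle=True)

batch_shapes = []
for batch_X, batch_y in dataloader:
    # each batch stacks up to batch_size samples along a new first dimension
    batch_shapes.append((tuple(batch_X.shape), tuple(batch_y.shape)))
    print(batch_X.shape, batch_y.shape)

# 3 samples with batch_size=2: one full batch of 2, then a final batch of 1
print(batch_shapes)
```

Because of `shuffle=True`, which samples end up in each batch changes from run to run, but the batch sizes do not. Passing `drop_last=True` would discard the incomplete final batch.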

In [3]:
# Images

from torchvision import datasets, transforms

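transform = transforms.Compose([ # image preprocessing pipeline, applied to each image in order
    transforms.Resize(256), # scale the shorter side to 256 px, keeping the aspect ratio
    transforms.CenterCrop(256), # crop to 256x256 so every image in a batch has the same size
    transforms.ToTensor(), # PIL image -> float tensor of shape (C, H, W) with values in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.75, 0.75, 0.75]) # per channel: (x - mean) / std
])

# ImageFolder expects one subdirectory per class, e.g. random_images/<class_name>/<image files>
train_dataset = datasets.ImageFolder("../../../data/random_images", transform=transform)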

train_data_loader = DataLoader(
    train_dataset, 
    batch_size=32, 
    shuffle=True, 
    num_workers=4
)

Two additional features can be used to customize a DataLoader:
1. Sampling methods (which sample indices are drawn, and in what order)
    - passed as the `sampler` argument when constructing the DataLoader
    - cannot be combined with `shuffle=True`, since the sampler already decides the order
2. Collate functions (how samples are combined into a batch)
    - passed as the `collate_fn` argument when constructing the DataLoader
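As one example of a sampler, `torch.utils.data.WeightedRandomSampler` draws indices with probabilities proportional to per-sample weights, which is a common way to oversample a rare class. The data below is a hypothetical imbalanced toy set, and inverse class frequency is only one possible weighting choice:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# hypothetical imbalanced toy data: four samples of class 0, two of class 1
features = torch.arange(6, dtype=torch.float).unsqueeze(1)
labels = torch.tensor([0, 0, 0, 0, 1, 1])
dataset = TensorDataset(features, labels)

# weight each sample by the inverse frequency of its class
class_counts = torch.bincount(labels)                 # tensor([4, 2])
sample_weights = 1.0 / class_counts[labels].float()   # rare-class samples get larger weights

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)

# the sampler decides the order, so shuffle is left at its default
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
drawn = sum(len(batch_labels) for _, batch_labels in loader)
print(drawn)  # 6 indices drawn in total, as requested by num_samples

# combining a sampler with shuffle=True is rejected at construction time
raised = False
try:
    DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)
except ValueError as err:
    raised = True
    print(err)
```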

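A collate function receives `batch`, a list of samples as returned by `__getitem__`, and returns the assembled batch. For variable-length sequences, for example, a collate function can pad the sequences to a common length and return `padded_sequences`, `labels`, and `sequence_lengths`.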
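A minimal sketch of such a padding collate function, assuming each sample is a hypothetical `(sequence_tensor, label)` pair; it uses `torch.nn.utils.rnn.pad_sequence` to pad with zeros up to the longest sequence in the batch. The names `pad_collate` and `data` are illustrative:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# hypothetical variable-length data; a plain list of (sequence, label) pairs works as a map-style dataset
data = [
    (torch.tensor([1.0, 2.0, 3.0]), 0),
    (torch.tensor([4.0, 5.0, 6.0, 7.0, 8.0]), 1),
    (torch.tensor([9.0, 10.0]), 0),
    (torch.tensor([11.0, 12.0, 13.0, 14.0]), 1),
]

def pad_collate(batch):
    # batch is a list of whatever __getitem__ returns: here (sequence, label) tuples
    sequences, labels = zip(*batch)
    sequence_lengths = torch.tensor([len(seq) for seq in sequences])
    # pad every sequence with zeros up to the longest one in this batch -> (batch, max_len)
    padded_sequences = pad_sequence(list(sequences), batch_first=True, padding_value=0.0)
    return padded_sequences, torch.tensor(labels), sequence_lengths

# a custom collate_fn works together with shuffle=True
loader = DataLoader(data, batch_size=2, shuffle=True, collate_fn=pad_collate)
for padded_sequences, labels, sequence_lengths in loader:
    print(padded_sequences.shape, labels, sequence_lengths)

# calling the collate function directly on the first two samples
padded, labs, lengths = pad_collate(data[:2])
print(padded)
```

Returning `sequence_lengths` lets a recurrent model ignore the padded positions later, for example via `torch.nn.utils.rnn.pack_padded_sequence`.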

## 1. Build a basic CNN

In [4]:
# setup the model architecture

from torch import nn
from torch.nn import functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels=1, #pictures are greyscale
            out_channels=32, 
            kernel_size=3
        )
        self.conv2 = nn.Conv2d(
            in_channels=32, 
            out_channels=64, 
            kernel_size=3
        )
        self.dropout1 = nn.Dropout(p=0.25)
        self.dropout2 = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(9216, 128) # ((28 - 2 - 2) / 2)^2 * 64 = 12 * 12 * 64 = 9216: two 3x3 convs, then 2x2 max pooling
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x) #layer 1: (N, 1, 28, 28) -> (N, 32, 26, 26)
        x = F.relu(x) #activation function 1
        x = self.conv2(x) #layer 2: -> (N, 64, 24, 24)
        x = F.relu(x) #activation function 2
        x = F.max_pool2d(x, 2) #layer 2 - 2x2 max pooling halves height and width: -> (N, 64, 12, 12)
        x = self.dropout1(x) #layer 2 - dropout to reduce overfitting
        x = torch.flatten(x, 1) #flatten all dimensions except the batch: -> (N, 9216)
        x = self.fc1(x) #layer 3: -> (N, 128)
        x = F.relu(x) #layer 3 - activation function
        x = self.dropout2(x) #layer 3 - dropout to reduce overfitting
        x = self.fc2(x) #layer 4: -> (N, 10), one score per digit class
        output = F.log_softmax(x, dim = 1) #layer 4 - log-probabilities over the 10 classes
        return output
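The `9216` input size of `fc1` has to match what the convolution stack actually produces. A quick way to check it, sketched here with standalone layers configured like `conv1` and `conv2` above, is to push a dummy 28x28 input through them and read off the flattened size:

```python
import torch
from torch import nn
from torch.nn import functional as F

# standalone layers with the same configuration as conv1 and conv2 in the CNN
conv1 = nn.Conv2d(1, 32, kernel_size=3)
conv2 = nn.Conv2d(32, 64, kernel_size=3)

x = torch.zeros(1, 1, 28, 28)          # dummy batch: one greyscale 28x28 image
x = F.relu(conv1(x))                   # -> (1, 32, 26, 26)
x = F.relu(conv2(x))                   # -> (1, 64, 24, 24)
x = F.max_pool2d(x, 2)                 # -> (1, 64, 12, 12)
flat_features = torch.flatten(x, 1).shape[1]

print(x.shape)         # torch.Size([1, 64, 12, 12])
print(flat_features)   # 9216
```

Recent PyTorch versions also provide `nn.LazyLinear`, which infers `in_features` on the first forward pass instead of requiring this number up front.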


In [None]:
# Train Model

import torch.optim as optim

# create model instance and load MNIST data
model = CNN()

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.75,)) # one (mean, std) pair per channel; MNIST has a single channel
])

train_data = datasets.MNIST(
    "../../../data", 
    train=True, 
    download=True, 
    transform=transform
)
test_data = datasets.MNIST(
    "../../../data", 
    train=False, 
    download=True, 
    transform=transform
)

train_data_loader = DataLoader(
    train_data, 
    batch_size=1000, 
    shuffle=True
)
test_data_loader = DataLoader(
    test_data,
    batch_size=1000,
    shuffle=False # sample order does not affect evaluation
)

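# model training
device = torch.device("cpu") # train on CPU; torch.device("cuda") would select a GPU if one is available
model.to(device) # model parameters must be on the same device as the input batches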

learning_rate = 0.001
epochs = 15

optimizer = optim.Adam(model.parameters(), lr = learning_rate) #different optimizers may have significantly different results

for epoch in range(1, epochs + 1):
    print(f"Train Epoch: {epoch}")
    model.train()
    for batch_id, (data, target) in enumerate(train_data_loader): #for each batch, also get the index of batch
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data) #forward pass
        loss = F.nll_loss(output, target) #negative log-likelihood loss; expects the log-probabilities produced by log_softmax
        loss.backward() #compute gradients
        optimizer.step() #update weights

        if batch_id % 20 == 0: #log training progress every 20 batches
            print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
                epoch, batch_id * len(data), len(train_data_loader.dataset),
                100. * batch_id / len(train_data_loader), loss.item()))
    
    model.eval() #evaluation mode: disables dropout while testing
    test_loss = 0 #reset the accumulators for this epoch
    correct = 0
    with torch.no_grad(): #disable gradient calculation
        for data, target in test_data_loader: #for each batch
            data, target = data.to(device), target.to(device)
            output = model(data) #forward pass
            test_loss += F.nll_loss(output, target, reduction="sum").item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_data_loader.dataset)

    print("\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
        test_loss, correct, len(test_data_loader.dataset),
        100. * correct / len(test_data_loader.dataset)))



Train Epoch: 1

Test set: Average loss: 0.0862, Accuracy: 9748/10000 (97%)

Train Epoch: 2

Test set: Average loss: 0.0530, Accuracy: 9823/10000 (98%)

Train Epoch: 3

Test set: Average loss: 0.0407, Accuracy: 9865/10000 (99%)

Train Epoch: 4

Test set: Average loss: 0.0351, Accuracy: 9874/10000 (99%)

Train Epoch: 5

Test set: Average loss: 0.0328, Accuracy: 9888/10000 (99%)

Train Epoch: 6

Test set: Average loss: 0.0293, Accuracy: 9897/10000 (99%)

Train Epoch: 7

Test set: Average loss: 0.0308, Accuracy: 9897/10000 (99%)

Train Epoch: 8

Test set: Average loss: 0.0275, Accuracy: 9914/10000 (99%)

Train Epoch: 9

Test set: Average loss: 0.0288, Accuracy: 9910/10000 (99%)

Train Epoch: 10

Test set: Average loss: 0.0267, Accuracy: 9919/10000 (99%)

Train Epoch: 11

Test set: Average loss: 0.0268, Accuracy: 9913/10000 (99%)

Train Epoch: 12

Test set: Average loss: 0.0258, Accuracy: 9917/10000 (99%)

Train Epoch: 13

Test set: Average loss: 0.0267, Accuracy: 9914/10000 (99%)

Train Ep
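The training loop pairs the model's `log_softmax` output with `F.nll_loss`. As a small check on random, illustrative logits and targets, this combination gives the same value as `F.cross_entropy` applied to raw scores, and both equal the mean of minus the log-probability assigned to the true class:

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # illustrative raw scores: batch of 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])    # illustrative true class indices

log_probs = F.log_softmax(logits, dim=1)

# what the notebook does: log_softmax in the model, nll_loss in the training loop
loss_nll = F.nll_loss(log_probs, target)
# fused equivalent applied directly to the raw scores
loss_ce = F.cross_entropy(logits, target)
# by hand: pick each sample's log-probability for its true class, negate, average
loss_manual = -log_probs[torch.arange(4), target].mean()

print(torch.allclose(loss_nll, loss_ce), torch.allclose(loss_nll, loss_manual))  # True True
```

This is why the model ends with `log_softmax`: if it returned raw scores instead, `F.cross_entropy` would be the matching loss.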

In [24]:
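# save model

import os

os.makedirs("./models", exist_ok=True) # torch.save does not create missing directories

torch.save(model.state_dict(), "./models/MNIST_CNN_weights.pt") #saves only the learned weights (recommended)
torch.save(model, "./models/MNIST_CNN_model.pt") #pickles the whole module; loading needs the CNN class definition and can break when the code or project layout changes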


In [None]:
# load model

model_state_reloaded = CNN() #initialize the instance 
model_state_reloaded.load_state_dict( #restore the weights
    torch.load("./models/MNIST_CNN_weights.pt", weights_only=True)
)

model_reloaded = torch.load("./models/MNIST_CNN_model.pt", weights_only=False) #unpickles the whole saved module; only load files you trust this way
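To confirm that the state-dict route restores the weights exactly, a self-contained round trip can compare every tensor before and after. The small `nn.Sequential` model and the temporary file path are hypothetical stand-ins for the trained CNN and `./models/MNIST_CNN_weights.pt`:

```python
import os
import tempfile

import torch
from torch import nn

def build_model():
    # hypothetical small architecture standing in for CNN()
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pt")
    torch.save(model.state_dict(), path)

    # re-create the same architecture, then restore the saved tensors into it
    reloaded = build_model()
    state = torch.load(path, weights_only=True, map_location="cpu")
    reloaded.load_state_dict(state)

# every saved tensor should match the original exactly
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), reloaded.state_dict().values())
)
print(same)  # True

reloaded.eval()  # switch to inference behavior before making predictions
```

`map_location="cpu"` lets weights saved on a GPU be loaded on a CPU-only machine.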

### 1a. Inference with the CNN

The following steps run inference with an already trained model. Most of them already appear in the evaluation part of the training loop above.

0. Load the model and weights (only if necessary)
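1. Set the model to evaluation mode with `model.eval()` (this disables dropout)
2. Run the forward pass, `raw_output = model(input_data)`, inside `with torch.no_grad():` so that no gradients are computed
3. Transform and interpret `raw_output` as required; since this CNN returns log-probabilities, `raw_output.argmax(dim=1)` gives the predicted classes and `raw_output.exp()` gives the class probabilities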
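The steps above can be sketched as follows. To keep the example self-contained, a hypothetical tiny model with the same input shape `(N, 1, 28, 28)` and the same log-probability output stands in for the trained CNN, and `input_data` is random placeholder data rather than real digits:

```python
import torch
from torch import nn

# hypothetical stand-in for the trained CNN: same input shape, log-probabilities out
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))

input_data = torch.rand(4, 1, 28, 28)  # placeholder batch of 4 greyscale 28x28 images

model.eval()                     # step 1: inference behavior (e.g. dropout disabled)
with torch.no_grad():            # step 2: forward pass without gradient tracking
    raw_output = model(input_data)

# step 3: interpret the output; the model returns log-probabilities
probabilities = raw_output.exp()              # each row sums to 1
predicted_classes = raw_output.argmax(dim=1)  # index of the most likely class per image
print(predicted_classes)
```

For a single image of shape `(1, 28, 28)`, add the batch dimension first with `image.unsqueeze(0)`.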

## 2. Build an LSTM