# Basic Training and Inference

## 0. Datasets and DataLoader class

Datasets are classes that access data samples. They require 2 methods:
- \_\_len\_\_: return the total numbers of samples
- \_\_getitem\_\_: retrieve a specific sample by index

DataLoaders are classes that handles data procesing fro training. They deal with tasks such as:
- create batches
- shuffle data
- parallel processing (loading)


In [1]:
# raw implementation of custom data class

import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __init__(self, data_path, transform=None):
        self.data = np.load(data_path)
        #users can actually put data processing pipeline here
        self.transform = transform
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        sample = self.data(index)

        if self.transform:
            sample = self.transform(sample)

        return sample

In PyTorch, there are already pre-built Dataset classes to handle common datasets.

In [2]:
# Regular tensors (already in memory)

from torch.utils.data import TensorDataset

X = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float)
y = torch.tensor([0, 1, 0], dtype=torch.float)

dataset = TensorDataset(X, y)

dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

In [3]:
# Images

from torchvision import datasets, transforms

transform = transforms.Compose([ #tensor processing pipeline
    transforms.Resize(256), #resizes image
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.75, 0.75, 0.75])
])

train_dataset = datasets.ImageFolder("../../../data/random_images", transform=transform)

train_data_loader = DataLoader(
    train_dataset, 
    batch_size=32, 
    shuffle=True, 
    num_workers=4
)

There are 2 additional features can be used to customize DataLoaders:
1. Sampling methods
    - entered as `sampler` argument in declaration
2. Collate functions (how samples are combined into a batch)
    - This cannot be used with shuffle
    - entered as `collate_fn` argument in declaration

Collate functions take in `batch` as arguments and returns `padded_sequences`, `labels`, and `sequence_lengths`.

## 1. Build a basic feedforward neural network

Build a simple 3 layer (1 hidden layer) neural network on MNIST data.

In [4]:
# Setup a model architecture

from torch import nn
from torch.nn import functional as F

class FeedForwardNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedForwardNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim) #weights to go to hidden layer
        self.fc2 = nn.Linear(hidden_dim, output_dim) #weights to go to output

    def forward(self, x):
        first_layer = self.fc1(x)
        activated_first_layer = torch.tanh(first_layer) #use tanh as activation function for 1st layer
        second_layer = self.fc2(activated_first_layer)
        output = torch.sigmoid(second_layer) #use sigmoid as activation function for 2nd layer
        return output

In [5]:
# Train Model
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.optim as optim

# create model instance and load MNIST data
model = FeedForwardNN(
    input_dim=28*28, #image size
    hidden_dim=100, #hidden nodes
    output_dim=10 #10 categories as output
)

batch_size = 5000

transform = transforms.Compose([
    transforms.ToTensor()
])

train_data = datasets.MNIST(
    "../../../data", 
    train=True, 
    download=True, 
    transform=transform
)
test_data = datasets.MNIST(
    "../../../data", 
    train=False, 
    download=True, 
    transform=transform
)

train_data_loader = DataLoader(
    train_data, 
    batch_size=batch_size, 
    shuffle=True,
    drop_last=True
)
test_data_loader = DataLoader(
    test_data,
    batch_size=batch_size,
    shuffle=True,
    drop_last=True
)

# model training
device = torch.device("cpu") # train on cpu

learning_rate = 0.001
epochs = 10

optimizer = optim.Adam(model.parameters(), lr = learning_rate) #different optimizers may have significantly different results

for epoch in range(1, epochs + 1):
    print(f"Train Epoch: {epoch}")
    model.train()
    for batch_id, (data, target) in enumerate(train_data_loader): #for each batch, also get the index of batch
        data, target = data.to(device), target.to(device)
        data = data.view(-1, 28*28).requires_grad_()
        optimizer.zero_grad()
        output = model(data) #forward pass
        criterion = nn.CrossEntropyLoss()
        loss = criterion(output, target) #cross entropy loss as loss function
        loss.backward() #compute gradients
        optimizer.step() #update weights

        if batch_id % 10 == 0: #update on training iterations
            print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
                epoch, batch_id * len(data), len(train_data_loader.dataset),
                100. * batch_id / len(train_data_loader), loss.item()))
    
    model.eval() #test set
    test_loss = 0 #0 out losses
    correct = 0
    print(f"Evaluate Epoch: {epoch}")
    with torch.no_grad(): #disable gradient calculation
        for data, target in test_data_loader: #for each batch
            data, target = data.to(device), target.to(device)
            data = data.view(-1, 28*28)
            output = model(data) #forward pass
            criterion = nn.CrossEntropyLoss()
            test_loss += criterion(output, target)
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_data_loader.dataset)

    print("\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
        test_loss, correct, len(test_data_loader.dataset),
        100. * correct / len(test_data_loader.dataset)))


Train Epoch: 1
Evaluate Epoch: 1

Test set: Average loss: 0.0004, Accuracy: 7601/10000 (76%)

Train Epoch: 2
Evaluate Epoch: 2

Test set: Average loss: 0.0004, Accuracy: 8096/10000 (81%)

Train Epoch: 3
Evaluate Epoch: 3

Test set: Average loss: 0.0004, Accuracy: 8353/10000 (84%)

Train Epoch: 4
Evaluate Epoch: 4

Test set: Average loss: 0.0004, Accuracy: 8500/10000 (85%)

Train Epoch: 5
Evaluate Epoch: 5

Test set: Average loss: 0.0004, Accuracy: 8594/10000 (86%)

Train Epoch: 6
Evaluate Epoch: 6

Test set: Average loss: 0.0003, Accuracy: 8711/10000 (87%)

Train Epoch: 7
Evaluate Epoch: 7

Test set: Average loss: 0.0003, Accuracy: 8780/10000 (88%)

Train Epoch: 8
Evaluate Epoch: 8

Test set: Average loss: 0.0003, Accuracy: 8844/10000 (88%)

Train Epoch: 9
Evaluate Epoch: 9

Test set: Average loss: 0.0003, Accuracy: 8898/10000 (89%)

Train Epoch: 10
Evaluate Epoch: 10

Test set: Average loss: 0.0003, Accuracy: 8946/10000 (89%)



In [6]:
# save model

torch.save(model.state_dict(), "./models/MNIST_feedforward_weights.pt") #saves model weights
torch.save(model, "./models/MNIST_feedforward_model.pt") #not recommended for production as it would break


In [7]:
# load model

model_state_reloaded = FeedForwardNN( #must use the same parameters
    input_dim=28*28,
    hidden_dim=100, 
    output_dim=10
) #initialize the instance 
model_state_reloaded.load_state_dict( #restore the weights
    torch.load("./models/MNIST_feedforward_weights.pt", weights_only=True)
)

model_reloaded = torch.load("./models/MNIST_feedforward_model.pt", weights_only=False) #loads the whole saved instance

### 1a. Inferencing on Feedforward model

The following steps are for inferencing an existing trained model. Most of it is already implemented in the sample training code.

0. Load the model and weights (if necessary)
1. Set model in eval mode `model.eval()`
2. Conduct forward pass with input `raw_output = model(input_data)` and ensure no gradients are calculated along the way `with torch.no_grad():`
3. Transform and interpret raw_output for use cases as required