## Working with data

PyTorch has two primitives for working with data:

1. `torch.utils.data.DataLoader`: wraps an iterable around a `Dataset`.
2. `torch.utils.data.Dataset`: stores the samples and their corresponding labels.
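
A minimal sketch of how the two primitives fit together, assuming a hypothetical toy dataset (`SquaresDataset` is illustrative only, not part of PyTorch): a map-style `Dataset` subclass needs `__len__` and `__getitem__`, and `DataLoader` turns it into an iterable of batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the float i, its label is i squared."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return self.n

    def __getitem__(self, idx):
        # Return one (sample, label) pair for the given index
        x = torch.tensor([float(idx)])
        y = torch.tensor(float(idx) ** 2)
        return x, y


ds = SquaresDataset(10)
loader = DataLoader(ds, batch_size=4)

# The default collate function stacks per-sample tensors into batch tensors
for xb, yb in loader:
    print(xb.shape, yb.shape)
```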

In [1]:
import torch
from torch import nn 
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as `TorchText`, `TorchVision`, and `TorchAudio`, all of which include datasets. These datasets are subclasses of `torch.utils.data.Dataset`.

Every TorchVision `Dataset` accepts two arguments, `transform` and `target_transform`, to modify the samples and the labels, respectively.

In [2]:
# Download training data from open datasets
training_data = datasets.FashionMNIST(
    root="data",  # where data should be downloaded or looked for
    train=True,  # each Dataset has training and test split
    download=True,  # Download if not available at `root`
    transform=ToTensor(),
)

# Download test data from open datasets
test_data = datasets.FashionMNIST(
    root="data", 
    train=False, 
    download=True, 
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz

Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw



We pass the `Dataset` as an argument to `DataLoader`. This wraps an iterable over our dataset and supports **automatic batching**, sampling, shuffling, and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable returns a batch of 64 features and labels.
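
To make batching concrete, here is a sketch on a tiny in-memory `TensorDataset` (the 10 toy samples are an assumption for illustration). With the 60,000 FashionMNIST training images and `batch_size=64`, the same arithmetic gives 938 batches, the last one holding only 32 samples.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 toy samples with 3 features each, plus integer labels 0..9
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# Without drop_last, the final batch holds the leftover 10 % 4 = 2 samples
loader = DataLoader(dataset, batch_size=4, shuffle=True)
sizes = [len(yb) for _, yb in loader]
print(len(loader), sizes)

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print(len(loader_drop))

# shuffle=True reorders samples every pass, but each sample is still seen exactly once
seen = torch.cat([yb for _, yb in loader]).sort().values
print(torch.equal(seen, labels))
```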

In [3]:
batch_size = 64

# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
  print(f"Shape of X [N, C, H, W]: {X.shape}")
  # print(X[0])
  print(f"Shape of y: {y.shape} {y.dtype}")
  # print(y[0])
  break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


## Creating Models

Neural networks are composed of layers/modules that perform operations on data. The `torch.nn` namespace provides all the building blocks you need to build your own network. **Every** module in PyTorch subclasses `nn.Module`. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

> `nn.Module` is an implementation of the `Composite` design pattern. 
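
To see the composite structure in practice, the sketch below repeats this section's `NeuralNetwork` definition so it runs on its own. The root module only lists its direct children, while `parameters()` and `modules()` recurse through the whole tree, which is how an optimizer receives every layer's weights from a single call.

```python
import torch
from torch import nn


# Same architecture as the NeuralNetwork class defined in this section
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


model = NeuralNetwork()

# Direct children only: the parent treats each child as an opaque module
child_names = [name for name, _ in model.named_children()]
print(child_names)

# modules() walks the whole tree: the root plus every nested module
n_modules = len(list(model.modules()))

# parameters() also recurses, so leaf weights from every Linear layer are included
total = sum(p.numel() for p in model.parameters())
print(n_modules, total)
```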

To define a neural network in PyTorch:
- Get the device for training, using an accelerator (GPU) if one is available.
- Create a class that inherits from `nn.Module`.
- Define the layers of the network in the `__init__` method.
- Specify how data passes through the network in the `forward` method; every `nn.Module` subclass implements its operations on input data there.
- Move the model to the accelerator device to speed up its operations.

In [4]:
# Get CPU or GPU device for training
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    self.flatten = nn.Flatten()
    self.linear_relu_stack = nn.Sequential(
        nn.Linear(in_features=28*28, out_features=512),
        nn.ReLU(), 
        nn.Linear(512, 512), 
        nn.ReLU(), 
        nn.Linear(512, 10)  # 10 output logits, one per FashionMNIST class
    )  # nn.Sequential is an ordered container of modules

  def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits


model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To use the model, we pass it the input data. This executes the model's `forward` method, along with some **background operations** (such as running any registered forward and backward hooks).

> Do not call `model.forward()` directly. `nn.Module` is callable!
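
One of those background operations is running hooks. The sketch below, using a throwaway `nn.Linear` layer and a hypothetical `record_call` hook, shows that a forward hook fires when the module is called but is skipped when `forward()` is invoked directly.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
calls = []


# Hypothetical hook: records the output shape each time it fires
def record_call(module, inputs, output):
    calls.append(output.shape)


handle = model.register_forward_hook(record_call)
x = torch.rand(1, 4)

model(x)          # goes through nn.Module.__call__, so the hook runs
model.forward(x)  # bypasses __call__, so the hook is silently skipped
print(len(calls))

handle.remove()  # detach the hook when it is no longer needed
```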

The model is not trained yet, but we can already run a forward pass through it using its randomly initialized parameters.

In [5]:
X = torch.rand(1, 28, 28, device=device)  # one random 28x28 "grayscale image"
logits = model(X)  # call the module, not forward(), so hooks run
pred_probab = nn.Softmax(dim=1)(logits)  # logits -> class probabilities
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([3])


## Optimizing the Model Parameters

To train a model, we need a loss function (the objective to minimize) and an optimizer (the optimization algorithm).

In [6]:
loss_fn = nn.CrossEntropyLoss()  # Goal: class predictions need to align with training label
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # Stochastic Gradient Descent

**Loss fn**: We pass our model's output logits to `nn.CrossEntropyLoss`, which will normalize the logits (Log Softmax, `nn.LogSoftmax`) and compute prediction error (Negative Log Likelihood, `nn.NLLLoss`). 
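
The equivalence described above can be checked numerically. This sketch uses random logits and made-up targets; note that the targets are class indices rather than one-hot vectors.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # a batch of 4 raw model outputs over 10 classes
targets = torch.tensor([3, 0, 9, 1])  # made-up class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Manual version: batch mean of -log p(correct class)
manual = -torch.log_softmax(logits, dim=1)[torch.arange(4), targets].mean()
print(ce.item(), nll.item(), manual.item())
```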

**Optimizer**: We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter. 

Inside the training loop, optimization happens in 3 steps:
1. Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients by default add up; to prevent double counting, we explicitly zero them at each iteration. 
2. Backpropagate the prediction loss with a call to `loss.backward()`. PyTorch **deposits** the gradients of the loss w.r.t. each parameter in that parameter's `.grad` attribute.
3. Once we have our gradients, we call `optimizer.step()` to adjust the parameters by the gradients collected in the backward pass. 
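
A small sketch of why zeroing matters and what the three steps do, assuming a toy `nn.Linear(3, 1)` regression model with `nn.MSELoss` and random data in place of FashionMNIST. With plain SGD (no momentum), `optimizer.step()` should amount to `w <- w - lr * grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()
X, y = torch.randn(8, 3), torch.randn(8, 1)

# Gradients accumulate: two backward passes without zeroing double the .grad values
loss_fn(model(X), y).backward()
g1 = model.weight.grad.clone()
loss_fn(model(X), y).backward()
accumulated_ok = torch.allclose(model.weight.grad, 2 * g1)

# The three steps, in order
optimizer.zero_grad()         # 1. reset the accumulated gradients
loss = loss_fn(model(X), y)
loss.backward()               # 2. populate each parameter's .grad
w_before = model.weight.detach().clone()
optimizer.step()              # 3. plain SGD update: w <- w - lr * grad
step_ok = torch.allclose(model.weight, w_before - lr * model.weight.grad)
print(accumulated_ok, step_ok)
```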

In [7]:
def train(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  model.train()

  for batch, (X, y) in enumerate(dataloader):  # one epoch
    X, y = X.to(device), y.to(device)

    # Compute prediction error
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation
    optimizer.zero_grad()  # clear gradients accumulated from the previous batch
    loss.backward()
    optimizer.step()

    if batch % 100 == 0:
      loss, current = loss.item(), batch * len(X)
      print(f"Loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

In [8]:
def test(dataloader, model, loss_fn):
  size = len(dataloader.dataset)
  num_batches = len(dataloader)
  model.eval()
  test_loss, correct = 0, 0
  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      pred = model(X)
      test_loss += loss_fn(pred, y).item()
      correct += (pred.argmax(1) == y).type(torch.float).sum().item()

  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}, Avg loss: {test_loss:>8f} \n")

The training process is conducted over several epochs (passes over the entire dataset). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the accuracy increase and the loss decrease with every epoch.

In [9]:
epochs = 5
for t in range(epochs):
  print(f"Epoch {t+1}\n--------------------")
  train(train_dataloader, model, loss_fn, optimizer)
  test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
--------------------
Loss: 2.307557 [    0/60000]
Loss: 2.287271 [ 6400/60000]
Loss: 2.274483 [12800/60000]
Loss: 2.267368 [19200/60000]
Loss: 2.253418 [25600/60000]
Loss: 2.222031 [32000/60000]
Loss: 2.230790 [38400/60000]
Loss: 2.197815 [44800/60000]
Loss: 2.200097 [51200/60000]
Loss: 2.168135 [57600/60000]
Test Error: 
 Accuracy: 40.2, Avg loss: 2.163605 

Epoch 2
--------------------
Loss: 2.170888 [    0/60000]
Loss: 2.155220 [ 6400/60000]
Loss: 2.111791 [12800/60000]
Loss: 2.130727 [19200/60000]
Loss: 2.080860 [25600/60000]
Loss: 2.015082 [32000/60000]
Loss: 2.049182 [38400/60000]
Loss: 1.972299 [44800/60000]
Loss: 1.984206 [51200/60000]
Loss: 1.913921 [57600/60000]
Test Error: 
 Accuracy: 51.5, Avg loss: 1.914083 

Epoch 3
--------------------
Loss: 1.939876 [    0/60000]
Loss: 1.904718 [ 6400/60000]
Loss: 1.809485 [12800/60000]
Loss: 1.854599 [19200/60000]
Loss: 1.746787 [25600/60000]
Loss: 1.683400 [32000/60000]
Loss: 1.716342 [38400/60000]
Loss: 1.616112 [44800/60000]

## Saving Models

A common way to save a model is to serialize the **internal state dictionary** (containing the model parameters). 
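
To see what is actually being serialized, the sketch below inspects the state dict of a copy of this section's `NeuralNetwork` (repeated so the snippet is self-contained). Only the `Linear` layers contribute entries, since `Flatten` and `ReLU` have no parameters.

```python
import torch
from torch import nn


# Same architecture as the NeuralNetwork class defined earlier
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


sd = NeuralNetwork().state_dict()

# An ordered mapping from dotted module path to tensor
for name, tensor in sd.items():
    print(name, tuple(tensor.shape))
```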

In [10]:
torch.save(model.state_dict(), "model.pth")  # torch.save writes an object (here the state dict) to a file path or file-like buffer
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


## Loading Models

The process of loading a model includes re-creating the model structure and loading the state dictionary into it.
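
The two loading steps can be sketched with an in-memory buffer in place of `model.pth`. `make_model` is a hypothetical helper that builds a small MLP with a configurable hidden width; the second half suggests why the structure has to match, since loading into a narrower model fails with a size-mismatch `RuntimeError`.

```python
import io

import torch
from torch import nn


def make_model(hidden):
    # Hypothetical helper: a small MLP whose hidden width can vary
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, hidden), nn.ReLU(), nn.Linear(hidden, 10))


trained = make_model(512)
buffer = io.BytesIO()  # in-memory stand-in for "model.pth"
torch.save(trained.state_dict(), buffer)

# 1. Re-create the same structure, 2. load the saved tensors into it
buffer.seek(0)
restored = make_model(512)
result = restored.load_state_dict(torch.load(buffer, weights_only=True))
same = all(torch.equal(a, b) for a, b in zip(trained.parameters(), restored.parameters()))
print(result, same)

# A different architecture cannot accept these tensors because the shapes differ
buffer.seek(0)
try:
    make_model(256).load_state_dict(torch.load(buffer, weights_only=True))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)
```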

In [11]:
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth", weights_only=True))  # load only tensors, not arbitrary pickled objects

<All keys matched successfully>

In [16]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[55][0], test_data[55][1]
with torch.no_grad():
  pred = model(x)
  predicted, actual = classes[pred[0].argmax(0)], classes[y]
  print(f"Predicted: {predicted}, Actual: {actual}")

Predicted: Pullover, Actual: Pullover


PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`. These can be persisted via the `torch.save` method. 

To load model weights, you need to create an instance of the same model first, and then load the parameters using the `load_state_dict()` method.

> Be sure to call the `model.eval()` method before running inference to set dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent results.
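
A sketch of why `model.eval()` matters, using a toy model with an `nn.Dropout(p=0.5)` layer (the layer sizes are arbitrary). In training mode dropout zeroes random elements, so repeated calls on the same input can differ; in evaluation mode it becomes the identity and the output is deterministic.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.rand(2, 4)

model.train()  # dropout active: random elements zeroed, survivors scaled by 1 / (1 - p)
train_outs = [model(x) for _ in range(2)]  # two calls will usually differ

model.eval()  # switches every submodule to evaluation mode; dropout becomes the identity
with torch.no_grad():
    eval_out_1, eval_out_2 = model(x), model(x)
print(torch.equal(eval_out_1, eval_out_2))
```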

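> Loading the model parameters requires reconstructing the same model architecture. Alternatively, the **model** itself (structure plus parameters) can be saved:

```python
torch.save(model, "model.pth")

...

# weights_only=False is needed to unpickle a full model; only do this for files you trust
model = torch.load("model.pth", weights_only=False)
```


This approach uses Python's `pickle` module to serialize the model, so it relies on the actual class definition (here `NeuralNetwork`) being available when loading the model.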