Source: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

## Import Data

In [3]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [4]:
batch_size = 64

# create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
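`DataLoader` stacks samples along a new leading dimension N, which gives the `[N, C, H, W]` layout shown above. When the dataset size is not a multiple of `batch_size`, the last batch is smaller. The sketch below uses 150 made-up zero images as a stand-in for FashionMNIST:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 fake 1x28x28 "images" with integer labels (hypothetical stand-in data)
images = torch.zeros(150, 1, 28, 28)
labels = torch.arange(150) % 10
loader = DataLoader(TensorDataset(images, labels), batch_size=64)

first_X, first_y = next(iter(loader))
print(first_X.shape)   # torch.Size([64, 1, 28, 28]) -> [N, C, H, W]

batch_sizes = [X.shape[0] for X, y in loader]
print(len(loader))     # 3 batches: ceil(150 / 64)
print(batch_sizes)     # [64, 64, 22]; the last batch is partial unless drop_last=True
```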


## Create Models

To define a neural network in PyTorch, we create a class that inherits from `nn.Module`. We define the layers of the network in the `__init__` method and specify how data passes through the network in the `forward` method.
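As a minimal sketch of this pattern (the class `TinyNet` and its layer sizes are hypothetical, chosen only for illustration), layers assigned as attributes in `__init__` are registered as submodules, so their weights appear in `parameters()`, and calling the instance runs `forward`:

```python
import torch
from torch import nn


class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()  # must run before any layers are assigned
        # Layers assigned as attributes are registered as submodules,
        # so their weights show up in net.parameters()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.out = nn.Linear(8, 2)

    def forward(self, x):
        # Describes how a batch of inputs flows through the layers
        return self.out(self.act(self.hidden(x)))


net = TinyNet()
batch = torch.randn(3, 4)   # 3 samples with 4 features each
logits = net(batch)         # calling the module invokes forward()
print(logits.shape)         # torch.Size([3, 2])

n_params = sum(p.numel() for p in net.parameters())
print(n_params)             # (4*8 + 8) + (8*2 + 2) = 58
```

Calling `net(batch)` rather than `net.forward(batch)` is the usual style, since `nn.Module.__call__` also runs any registered hooks around `forward`.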

To accelerate operations in the neural network, we move it to the GPU (CUDA) or MPS device if one is available, and otherwise fall back to the CPU.

In [5]:
# get cpu, gpu or mps device for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using mps device
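The same fallback chain can be wrapped in a small helper that returns a `torch.device` object. This is a sketch; `get_device` is a hypothetical name, not a PyTorch function:

```python
import torch


def get_device() -> torch.device:
    """Hypothetical helper: pick the fastest available backend, else the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps (PyTorch 1.12+) reports False on non-Apple hardware
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = get_device()
# Tensors and models must live on the same device to be used together
t = torch.ones(2, 3).to(device)
print(t.device.type)  # "cuda", "mps", or "cpu" depending on the machine
```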


In [6]:
# define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [7]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
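One way to check the architecture printed above is to list each layer's parameter shapes and count the total. The sketch below re-declares the same `NeuralNetwork` class so it runs on its own; the arithmetic in the final comment assumes the three `Linear` layers shown above:

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    # Same architecture as the model defined above
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


model = NeuralNetwork()

# Each Linear(in, out) holds an (out x in) weight matrix plus `out` biases;
# Flatten and ReLU have no learnable parameters
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

total = sum(p.numel() for p in model.parameters())
print(total)  # (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669706

# A fake batch of 64 images flows through to 64 rows of 10 logits
out = model(torch.zeros(64, 1, 28, 28))
print(out.shape)  # torch.Size([64, 10])
```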


## Optimizing the Model Parameters

To train a model, we need a loss function and an optimizer.

In [8]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
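A short sketch of what these two objects do, using made-up numbers. `nn.CrossEntropyLoss` expects raw logits (which is why the model's last layer has no softmax): it applies log-softmax internally, then averages the negative log-probability of the correct class. Plain `SGD` without momentum or weight decay moves each parameter against its gradient, `p <- p - lr * grad`:

```python
import torch
import torch.nn.functional as F
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # 2 samples, 3 classes (made-up values)
targets = torch.tensor([0, 2])              # class indices, like y from the dataloader

# Built-in loss vs. the same computation written out by hand
loss = nn.CrossEntropyLoss()(logits, targets)
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(torch.allclose(loss, manual))  # True

# One SGD step on a toy parameter vector
w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
(w ** 2).sum().backward()  # gradient of sum(w^2) is 2 * w = [2, -4]
opt.step()                 # w <- w - 0.1 * [2, -4]
print(w.detach())          # approximately [0.8, -1.6]
```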

In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust the model's parameters.

In [12]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
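The `optimizer.zero_grad()` call is needed because PyTorch accumulates gradients: each `backward()` adds into the existing `.grad` of every parameter instead of overwriting it. A minimal demonstration on a single scalar parameter `w` (a made-up stand-in for the model's weights):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # a single scalar "parameter"
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).backward()            # d(3w)/dw = 3
first = w.grad.item()         # 3.0

(3 * w).backward()            # a second backward() ADDS into w.grad
second = w.grad.item()        # 6.0: gradients from both passes were summed

optimizer.zero_grad()         # clears w.grad (sets it to None by default in PyTorch 2.x)
(3 * w).backward()
third = w.grad.item()         # 3.0 again
print(first, second, third)   # 3.0 6.0 3.0
```

In `train`, `zero_grad()` runs after `step()`, which has the same effect as calling it before `backward()`: each batch starts from clean gradients.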

We also check the model's performance against the test dataset to ensure it is learning.

In [10]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy {(100*correct) :>0.1f}%, Avg loss: {test_loss:>8f} \n")
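The accuracy computation in `test` treats the index of the largest logit as the predicted class. A small sketch with made-up logits for four samples and three classes:

```python
import torch

# Made-up logits for a batch of 4 samples over 3 classes
pred = torch.tensor([[0.1, 2.0, -1.0],
                     [1.5, 0.2, 0.3],
                     [0.0, 0.1, 3.0],
                     [2.0, 1.0, 0.0]])
y = torch.tensor([1, 0, 1, 0])

predicted_class = pred.argmax(1)     # index of the largest logit per row: [1, 0, 2, 0]
hits = predicted_class == y          # [True, True, False, True]
correct = hits.type(torch.float).sum().item()
print(correct)                       # 3.0
print(correct / len(y))              # 0.75
```

In `test`, these per-batch counts are summed over every batch and divided by `size`, the number of images in the test set.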

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the accuracy increase and the loss decrease with every epoch.

In [13]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n---------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
---------------------------
loss: 2.305757 [   64/60000]
loss: 2.296130 [ 6464/60000]
loss: 2.278633 [12864/60000]
loss: 2.269385 [19264/60000]
loss: 2.247319 [25664/60000]
loss: 2.230033 [32064/60000]
loss: 2.234461 [38464/60000]
loss: 2.210865 [44864/60000]
loss: 2.203962 [51264/60000]
loss: 2.164205 [57664/60000]
Test Error: 
 Accuracy 42.1%, Avg loss: 2.164632 

Epoch 2
---------------------------
loss: 2.175488 [   64/60000]
loss: 2.170919 [ 6464/60000]
loss: 2.114862 [12864/60000]
loss: 2.125118 [19264/60000]
loss: 2.072998 [25664/60000]
loss: 2.020918 [32064/60000]
loss: 2.052995 [38464/60000]
loss: 1.983420 [44864/60000]
loss: 1.979616 [51264/60000]
loss: 1.904405 [57664/60000]
Test Error: 
 Accuracy 54.2%, Avg loss: 1.907066 

Epoch 3
---------------------------
loss: 1.934356 [   64/60000]
loss: 1.914192 [ 6464/60000]
loss: 1.800701 [12864/60000]
loss: 1.838264 [19264/60000]
loss: 1.718788 [25664/60000]
loss: 1.679312 [32064/60000]
loss: 1.708526 [38464/60000]
loss: 1
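The progress lines in the output follow from the batch arithmetic in `train`: with 60,000 training images and `batch_size = 64` there are 938 batches per epoch, and a line is printed every 100 batches. A quick check in plain Python:

```python
import math

dataset_size = 60_000   # FashionMNIST training images (the "/60000" in the log)
batch_size = 64

# The last batch is partial: 60000 % 64 = 32 samples
num_batches = math.ceil(dataset_size / batch_size)

# Same condition and counter as in train(): log when batch % 100 == 0
logged = [(batch + 1) * batch_size for batch in range(num_batches) if batch % 100 == 0]
print(num_batches)   # 938
print(len(logged))   # 10 progress lines per epoch
print(logged)        # starts at 64 and ends at 57664
```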

## Saving Models

A common way to save a model is to serialize its internal state dictionary (containing the model parameters).

In [14]:
torch.save(model.state_dict(), "model.pth")
print("Saved pytorch model state to model.pth")

Saved pytorch model state to model.pth
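The state dictionary maps each learnable tensor to a name derived from its attribute path in the model. The sketch below re-declares the same `NeuralNetwork` class so it runs standalone and prints what `model.state_dict()` contains; `Flatten` and `ReLU` contribute no entries because they have no parameters:

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    # Same architecture as the model defined above
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


model = NeuralNetwork()
state = model.state_dict()  # ordered mapping: parameter name -> tensor
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# Keys use the Sequential indices of the Linear layers (0, 2, 4),
# e.g. linear_relu_stack.0.weight with shape (512, 784)
```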


## Loading Models

The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [15]:
model = NeuralNetwork().to(device)
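# weights_only=True (PyTorch 1.13+) restricts unpickling to tensors and plain containers
model.load_state_dict(torch.load("model.pth", weights_only=True))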

<All keys matched successfully>
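A self-contained sketch of the save/load round trip, using a small `nn.Linear` as a stand-in for `NeuralNetwork` and an in-memory `io.BytesIO` buffer instead of `model.pth`. By default `torch.load` puts tensors back on the device they were saved from, so a checkpoint written on a CUDA or MPS machine may need `map_location="cpu"` to load elsewhere; `weights_only=True` assumes PyTorch 1.13 or later:

```python
import io

import torch
from torch import nn

src = nn.Linear(3, 2)          # small stand-in for NeuralNetwork
buffer = io.BytesIO()          # in-memory file instead of "model.pth"
torch.save(src.state_dict(), buffer)
buffer.seek(0)                 # rewind before reading

# Load every tensor onto the CPU regardless of where it was saved
state = torch.load(buffer, map_location="cpu", weights_only=True)

dst = nn.Linear(3, 2)          # re-create the structure, then load the weights into it
result = dst.load_state_dict(state)
print(result)                                # <All keys matched successfully>
print(torch.equal(src.weight, dst.weight))   # True
```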

This model can now be used to make predictions.

In [16]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f"Predicted: {predicted}, Actual: {actual}")

Predicted: Ankle boot, Actual: Ankle boot
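In the cell above, `test_data[0][0]` has shape `[1, 28, 28]`; because `nn.Flatten` starts at dimension 1, the leading channel dimension happens to act as a batch of one, which is why `pred[0]` works. For several images at once, one approach is to stack them along a new first dimension and apply softmax to turn logits into probabilities. A sketch with an untrained stand-in model and random images (both hypothetical, so the predicted labels are arbitrary):

```python
import torch
from torch import nn

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Untrained stand-in with the same input/output sizes as NeuralNetwork
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

images = torch.rand(5, 1, 28, 28)  # hypothetical batch of 5 grayscale images

with torch.no_grad():
    logits = model(images)                 # shape [5, 10]
    probs = torch.softmax(logits, dim=1)   # each row sums to 1
    confidence, idx = probs.max(dim=1)     # top probability and its class index per image

for c, i in zip(confidence.tolist(), idx.tolist()):
    print(f"{classes[i]}: {c:.1%}")
```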
