## PyTorch Basics
#### ML workflow implemented in PyTorch

Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models.
We’ll use the FashionMNIST dataset to train a neural network that predicts which of the following classes an input image belongs to: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, or Ankle boot.
#### Set up PyTorch and TorchVision

## Working with data
PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.
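As a minimal sketch of how the two primitives fit together, the block below defines a hypothetical `ToyDataset` (not part of this tutorial) that holds random tensors shaped like FashionMNIST images, and wraps it in a `DataLoader`. The class name, sample count, and seed are assumptions chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: random 1x28x28 'images' with integer labels."""

    def __init__(self, num_samples=10):
        generator = torch.Generator().manual_seed(0)
        self.samples = torch.rand(num_samples, 1, 28, 28, generator=generator)
        self.labels = torch.arange(num_samples) % 10

    def __len__(self):
        # Number of samples stored in the dataset.
        return len(self.samples)

    def __getitem__(self, index):
        # Return one (sample, label) pair.
        return self.samples[index], self.labels[index]


toy_dataset = ToyDataset()
toy_loader = DataLoader(toy_dataset, batch_size=4)

# The DataLoader stacks individual samples into batches.
first_x, first_y = next(iter(toy_loader))
print(first_x.shape, first_y.shape)
```

A map-style Dataset only needs `__len__` and `__getitem__`; the DataLoader takes care of grouping the returned pairs into batches.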

In [2]:
#pip install torchvision


In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO. In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, which modify the samples and labels respectively.
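To illustrate what a `target_transform` might look like, here is a small sketch of a label transform that one-hot encodes an integer class index. The function name `one_hot_label` is hypothetical, and the tutorial itself does not use a target transform; any callable that takes a label and returns the transformed label could be passed in the same way.

```python
import torch

NUM_CLASSES = 10  # FashionMNIST has 10 classes


def one_hot_label(label):
    """Turn an integer class index into a one-hot float vector."""
    encoded = torch.zeros(NUM_CLASSES, dtype=torch.float)
    encoded[label] = 1.0
    return encoded


# Applied directly to a label here; a dataset would call it on every label
# if it were passed as target_transform=one_hot_label.
print(one_hot_label(3))
```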

In [4]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)



We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [9]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for x, y in test_dataloader:
    print(f" Shape of x [N, C, H, W]:\n {x.shape}")
    print(f" Shape of y: {y.shape} {y.dtype}")
    break

 Shape of x [N, C, H, W]:
 torch.Size([64, 1, 28, 28])
 Shape of y: torch.Size([64]) torch.int64
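The batching behaviour is easier to see on a tiny dataset. The following sketch uses a synthetic `TensorDataset` of 10 samples (an assumption for illustration, not the tutorial data) to show how the last batch can be smaller, and how `drop_last` and `shuffle` change the iteration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten synthetic samples with one feature each, labelled 0..9.
features = torch.arange(10, dtype=torch.float).reshape(10, 1)
labels = torch.arange(10)
tiny_dataset = TensorDataset(features, labels)

# 10 samples with batch_size=4 gives batches of 4, 4 and 2.
loader = DataLoader(tiny_dataset, batch_size=4)
batch_sizes = [len(x) for x, _ in loader]
print(batch_sizes)  # [4, 4, 2]

# drop_last=True discards the incomplete final batch.
loader_drop = DataLoader(tiny_dataset, batch_size=4, drop_last=True)
print(len(loader_drop))  # 2

# shuffle=True changes the order, but every sample is still seen once.
torch.manual_seed(0)
shuffled = DataLoader(tiny_dataset, batch_size=4, shuffle=True)
seen = sorted(int(label) for _, y in shuffled for label in y)
print(seen == list(range(10)))  # True
```

By the same arithmetic, the 60,000 training images with a batch size of 64 yield 938 batches, the last of which holds 32 images.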


## Creating Models
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the `__init__` function and specify how data passes through the network in the forward function. To accelerate operations in the neural network, we move it to an accelerator such as CUDA, MPS, MTIA, or XPU. If an accelerator is available, we use it; otherwise, we use the CPU.

In [10]:
#check device type
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")



Using cpu device


In [11]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
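A quick way to sanity-check a model definition is to push a dummy batch through it and count its parameters. The sketch below repeats the network defined above so that it runs on its own; the random input `dummy_batch` is an assumption standing in for real images, and everything stays on the CPU.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)


model = NeuralNetwork()

# Three fake grayscale images of size 28x28.
dummy_batch = torch.rand(3, 1, 28, 28)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([3, 10])

# Weights and biases of the three Linear layers.
num_params = sum(p.numel() for p in model.parameters())
print(num_params)  # 669706
```

Each input image produces one row of 10 raw scores (logits), one per class.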


## Optimizing the Model Parameters
To train a model, we need a loss function and an optimizer.

In [12]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
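To make the loss function less of a black box, the sketch below compares `nn.CrossEntropyLoss` against a hand computation for a single sample. The logits and target are made-up values; the point is that the loss equals the negative log of the softmax probability assigned to the correct class.

```python
import math

import torch
from torch import nn

# One sample with three made-up class scores; the correct class is index 0.
raw_scores = [2.0, 1.0, 0.1]
logits = torch.tensor([raw_scores])
target = torch.tensor([0])

loss = nn.CrossEntropyLoss()(logits, target).item()

# Manual computation: -log(softmax(logits)[target]).
exps = [math.exp(v) for v in raw_scores]
manual = -math.log(exps[0] / sum(exps))

print(f"{loss:.4f} {manual:.4f}")
```

Because the loss applies softmax internally, the model can return raw logits, as the network above does.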

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

In [13]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (x, y) in enumerate(dataloader):
        x, y = x.to(device), y.to(device)

        # Compute prediction error
        pred = model(x)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(x)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
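The backward, step, zero_grad sequence can be checked on a model small enough to follow by hand. This sketch swaps in a single `nn.Linear` layer and `nn.MSELoss` (both assumptions for illustration, not the tutorial's model or loss) and confirms that plain SGD moves each weight by the learning rate times its gradient.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)
learning_rate = 0.1
optimizer = torch.optim.SGD(layer.parameters(), lr=learning_rate)

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

weight_before = layer.weight.detach().clone()

# Forward pass and loss.
loss = nn.MSELoss()(layer(x), y)

# Backpropagation fills .grad on every parameter.
loss.backward()
grad = layer.weight.grad.detach().clone()

# The optimizer applies the update, then gradients are cleared for the next batch.
optimizer.step()
optimizer.zero_grad()

# Plain SGD without momentum: new_weight = old_weight - lr * grad.
expected = weight_before - learning_rate * grad
update_matches = torch.allclose(layer.weight.detach(), expected)
print(update_matches)  # True
```

Clearing the gradients matters because PyTorch accumulates them across `backward()` calls by default.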

We also check the model’s performance against the test dataset to ensure it is learning.

In [14]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
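The accuracy line in the test function can be traced on a hand-made example. The predictions and labels below are invented for illustration: three samples, two classes.

```python
import torch

# Made-up scores for 3 samples over 2 classes, and their true labels.
pred = torch.tensor([[0.1, 2.0], [1.5, 0.3], [0.2, 0.9]])
y = torch.tensor([1, 0, 0])

# argmax(1) picks the highest-scoring class in each row.
predicted_classes = pred.argmax(1)
print(predicted_classes.tolist())  # [1, 0, 1]

# Compare with the labels, convert booleans to floats, and count the matches.
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(correct)  # 2.0
```

Two of the three predictions match, so the accuracy here is 2/3.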

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

In [15]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300275  [   64/60000]
loss: 2.285605  [ 6464/60000]
loss: 2.266729  [12864/60000]
loss: 2.264498  [19264/60000]
loss: 2.242782  [25664/60000]
loss: 2.209967  [32064/60000]
loss: 2.222518  [38464/60000]
loss: 2.183590  [44864/60000]
loss: 2.180926  [51264/60000]
loss: 2.157121  [57664/60000]
Test Error: 
 Accuracy: 45.1%, Avg loss: 2.141966 

Epoch 2
-------------------------------
loss: 2.150041  [   64/60000]
loss: 2.137763  [ 6464/60000]
loss: 2.072348  [12864/60000]
loss: 2.096393  [19264/60000]
loss: 2.040276  [25664/60000]
loss: 1.977141  [32064/60000]
loss: 2.014658  [38464/60000]
loss: 1.922233  [44864/60000]
loss: 1.923998  [51264/60000]
loss: 1.875288  [57664/60000]
Test Error: 
 Accuracy: 56.6%, Avg loss: 1.854428 

Epoch 3
-------------------------------
loss: 1.881334  [   64/60000]
loss: 1.854585  [ 6464/60000]
loss: 1.727232  [12864/60000]
loss: 1.783282  [19264/60000]
loss: 1.673020  [25664/60000]
loss: 1.624694  [32064/600

## Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [17]:

torch.save(model.state_dict(), "SmilesPyTorchModel.pth")
print("Saved PyTorch Model State to SmilesPyTorchModel.pth")

Saved PyTorch Model State to SmilesPyTorchModel.pth
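To see what a state dictionary actually contains, the sketch below inspects one for a deliberately small model. The `small_model` here is a hypothetical stand-in, not the tutorial network; the same `state_dict()` call works on any `nn.Module`.

```python
import torch
from torch import nn

# A small stand-in model with two Linear layers.
small_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# The state dict maps parameter names to tensors.
state = small_model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))

state_keys = list(state.keys())
```

Only layers with learnable parameters appear; the ReLU at index 1 contributes no entries.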


## Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [19]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("SmilesPyTorchModel.pth", weights_only=True))

<All keys matched successfully>

In [24]:
NeuralNetwork().to(device)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
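One way to convince yourself that loading restores the trained weights is a save-and-load round trip. This sketch uses a hypothetical `make_model` helper and a temporary file (both assumptions for illustration) and checks that the restored model produces the same outputs as the original.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical helper: rebuilds the same architecture each time.
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


original = make_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pth")
    torch.save(original.state_dict(), path)

    # A fresh model starts with different random weights...
    restored = make_model()
    # ...until the saved state dict is loaded into it.
    result = restored.load_state_dict(torch.load(path, weights_only=True))

sample = torch.rand(5, 4)
with torch.no_grad():
    same = torch.equal(original(sample), restored(sample))
print(same)  # True
```

Note that a freshly constructed model, such as the bare `NeuralNetwork().to(device)` call above, carries new random weights until `load_state_dict` is called on it.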

## Making Predictions with the Custom Model

In [20]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x1, y = test_data[0][0], test_data[0][1]
with torch.no_grad(): 
    x1 = x1.to(device)
    pred = model(x1)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
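The model returns raw logits, so `argmax` is enough to pick a class. If probabilities are wanted as well, softmax can be applied first. The sketch below uses made-up logits for one image rather than real model output.

```python
import torch
from torch import nn

classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Made-up logits for one image; the last class has the highest score.
logits = torch.tensor([[0.5, 0.1, 0.2, 0.0, 0.3, 1.0, 0.2, 1.5, 0.4, 3.0]])

# Softmax over the class dimension turns logits into probabilities.
probabilities = nn.Softmax(dim=1)(logits)
top_index = probabilities.argmax(1).item()

print(classes[top_index])  # Ankle boot
print(f"{probabilities[0, top_index].item():.3f}")
```

Softmax preserves the ordering of the scores, so the predicted class is the same with or without it.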


In [26]:
test_data[3][0], 

(tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0824, 0.4824, 0.4235, 0.3882, 0.3882, 0.3294, 0.3255,
           0.3373, 0.3608, 0.2745, 0.0235, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.4157, 0.9725, 0.9020, 0.8039, 0.9373, 0.8314, 0.6824,
           0.8431, 0.8118, 0.5451, 0.3647, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.4980, 0.8471, 0.8353, 0.8039, 0.8392, 0.8392, 0.7569,
           0.8980, 0.7882, 0.6471, 0.3882, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.5725, 0.7647, 0.8980, 0.8314, 0.8941, 0.8431, 0.8196,
           0.9020, 0.8392, 0.6431, 0.2118, 

In [30]:
classes[test_data[3][1]]

'Trouser'

In [31]:
classes[model(test_data[3][0].to(device)).argmax()]

'Trouser'
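Passing a single image straight to the model works because of how `nn.Flatten` treats dimensions: it keeps the first dimension and flattens the rest, so the channel dimension of a `[1, 28, 28]` image ends up acting as a batch of one. The sketch below, using random tensors as stand-ins for images, shows this and how several images could be stacked into a proper batch.

```python
import torch
from torch import nn

flatten = nn.Flatten()  # default start_dim=1: keep dim 0, flatten the rest

# A single image shaped [C, H, W]; dim 0 (the channel) is kept as-is.
single_image = torch.rand(1, 28, 28)
single_flat = flatten(single_image)
print(single_flat.shape)  # torch.Size([1, 784])

# Stacking four images adds an explicit batch dimension: [N, C, H, W].
images = [torch.rand(1, 28, 28) for _ in range(4)]
batch = torch.stack(images)
batch_flat = flatten(batch)
print(batch.shape)  # torch.Size([4, 1, 28, 28])
print(batch_flat.shape)  # torch.Size([4, 784])
```

This shortcut relies on the images having exactly one channel; for multi-channel images, adding the batch dimension explicitly with `unsqueeze(0)` is the safer habit.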