# PyTorch QuickGuide

Author: Kellen Sullivan

This is a quick guide to loading prebuilt models provided by PyTorch so you can train and deploy them on your own dataset. If you are interested in learning all the PyTorch basics and how to build your very own model, check out the official PyTorch beginner's guide [here](https://docs.pytorch.org/tutorials/beginner/basics/intro.html).

## Getting Started
To get started using PyTorch, you first have to install it! To do so, open a new terminal and run the following command:

- If you are on Windows/Mac: `pip3 install torch torchvision`
- If you are on Linux: `pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu`

Note that these commands will use your CPU as the compute platform. If you have an NVIDIA GPU, you can install PyTorch with CUDA as the compute platform to greatly speed up training. Check out the official installation guide to learn more: https://pytorch.org/get-started/locally/

You can confirm PyTorch was successfully installed by importing the package and printing out its version.

In [2]:
import torch

print(torch.__version__)

2.8.0+cpu
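
If you installed a CUDA build, a common pattern is to pick the compute device once and reuse it. The following is a minimal sketch of that pattern; the names `device` and `x` are illustrative and not part of the original guide.

```python
import torch

# Use the GPU when a CUDA build and a compatible GPU are present, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tensors created on a device stay there; operands must share a device to interact
x = torch.ones(2, 3, device=device)
print(x.device.type)
```

The rest of this guide runs on the CPU. To use a GPU, you would also need to move the model and each batch of data to it with `.to(device)`.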


## Loading a Prebuilt Model

PyTorch has libraries such as [torchvision](https://docs.pytorch.org/vision/0.8/models.html), [torchaudio](https://docs.pytorch.org/audio/stable/models.html), and [torchtext](https://docs.pytorch.org/text/stable/models.html) that provide prebuilt models to quickly process image data, audio data, or text data. In this quickguide, we will use a prebuilt model from torchvision to classify images into 1 of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. 

### Datasets

Before we load in a prebuilt model, we need to get data. Thankfully torchvision provides datasets that can be easily accessed to train neural networks in PyTorch, including the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset that we will be training a model on!

CIFAR-10 contains small 32×32 pixel color images across 10 categories.
However, the prebuilt ResNet-18 model we will later use to classify CIFAR-10 images was originally trained on [ImageNet](https://www.image-net.org/download.php), a much larger dataset with 224×224 pixel images and 1,000 output classes.

Because of these differences, we’ll need to do two things:
1. Resize the CIFAR-10 images to match the input size expected by ResNet-18 (224x224).
2. Modify the model’s final layer so it predicts 10 classes instead of 1,000. We will do this later in the Prebuilt Models section.

To start, we’ll define a transform to resize the CIFAR-10 images and prepare them for use with the ResNet-18 prebuilt model.

In [3]:
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # resize images: 32x32 -> 224x224
    transforms.ToTensor()
])

training_data = datasets.CIFAR10(
    root="data", 
    train=True, 
    download=True, 
    transform=transform
)

test_data = datasets.CIFAR10(
    root="data", 
    train=False, 
    download=True, 
    transform=transform
)

We load the training data and test data separately so that we can train the model on the training data and then test it on never-before-seen data. This confirms that the model is actually learning how to classify images and not just memorizing the training dataset.

`training_data` is a set of 50,000 images with their corresponding labels. Each raw CIFAR-10 image is 3x32x32 (the 3 represents the three color channels: red, green, and blue); after the transform above, each sample is a 3x224x224 tensor.

`test_data` is a set of 10,000 images with their corresponding labels, transformed in the same way.

The parameters do the following:
- `root` is the path where the train/test data is stored
- `train` specifies training or test dataset
- `download`, if true, downloads the data from the internet if it's not available at `root`.
- `transform` specifies the transformations applied to each image (labels have a separate `target_transform` parameter, which we do not use here)

To check that we successfully loaded the data, let's print a random example image from the training dataset.

In [1]:
import matplotlib.pyplot as plt
import random

idx = random.randint(0, len(training_data) - 1)
img, label = training_data[idx]

# show image
plt.imshow(img.permute(1, 2, 0))
plt.title(training_data.classes[label])
plt.axis("off")
plt.show()


### DataLoader
Before we can start training, we need an efficient way to feed images from our dataset into our model. Thankfully, PyTorch’s `DataLoader` class makes this simple. It wraps an existing dataset and handles batching, shuffling, and loading data in parallel, which can greatly improve training efficiency. The key difference between a dataset and `DataLoader` is that a dataset provides access to individual samples, while a `DataLoader` controls how those samples are efficiently fed to the model during training.

The following code creates two DataLoaders, one for training and one for testing:

In [None]:
train_loader = torch.utils.data.DataLoader(
    training_data,
    batch_size=64, # number of samples loaded per batch
    shuffle=True,  # randomize the order of samples
    num_workers=2  # number of subprocesses used to load data
)

test_loader = torch.utils.data.DataLoader(
    test_data,
    batch_size=64,
    shuffle=False,
    num_workers=2
)

These parameters determine how your data is batched, shuffled, and loaded:
- `batch_size` is how many samples are loaded per batch.
- `shuffle=True` reshuffles the samples at the start of every pass over the data (epoch), so the model does not see them in the same order each time.
- `num_workers` is the number of subprocesses used to load data in parallel.
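
To make the batching behavior concrete, here is a small sketch that wraps ten random "images" in a `DataLoader` and prints the shape of each batch. The `TensorDataset` of random data is a stand-in for CIFAR-10, and `num_workers` is left at its default of 0 to keep the example simple.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 3x32x32 images with random labels in the range 0..9
images = torch.rand(10, 3, 32, 32)
labels = torch.randint(0, 10, (10,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Collect the shape of every image batch the loader yields
batch_shapes = [tuple(batch_images.shape) for batch_images, _ in loader]
print(batch_shapes)  # [(4, 3, 32, 32), (4, 3, 32, 32), (2, 3, 32, 32)]
```

Note that the last batch is smaller because 10 is not a multiple of 4. The same thing happens with CIFAR-10, since 50,000 is not a multiple of 64.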

### Prebuilt Models

Now that we have loaded our data and can iterate through it, we can load our prebuilt model! You can read about the many prebuilt models provided by torchvision [here](https://docs.pytorch.org/vision/0.9/models.html). For this quickguide we will use the [ResNet-18](https://docs.pytorch.org/vision/2.0/models/generated/torchvision.models.resnet18.html) model due to its moderate size and robust feature extraction capabilities.

In [None]:
import torchvision.models as models
import torch.nn as nn

resnet_model = models.resnet18(weights=None) # load only the model structure; weights are randomly initialized, not pretrained

You can view the structure of any PyTorch model by simply printing it:

In [8]:
print(resnet_model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Notice that by default, the ResNet-18 model’s final layer is defined as follows:

`(fc): Linear(in_features=512, out_features=1000, bias=True)`

Since the CIFAR-10 dataset has only 10 classes, we need to adjust the final layer so the model produces 10 output scores instead of the default 1000. We can do this by replacing the last layer with a new fully connected layer that maps the same 512 input features to 10 output nodes:

In [None]:
resnet_model.fc = nn.Linear(resnet_model.fc.in_features, 10)  # overwrite resnet_model.fc layer to have 10 output scores

print(resnet_model)

The final layer should now have 10 outputs, matching the exact structure we need to classify CIFAR-10 images!

## Train a Model

Our ResNet18 model contains over 11 million parameters (weights and biases) that influence how the model classifies images. In order for the model to accurately classify images, it must learn the optimal values for those weights and biases through training. 

Before we can train our model, we need to define two key components: a loss function and an optimizer.
- The [loss function](https://docs.pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#loss-function) measures how far the model’s predictions are from the correct labels. The lower the loss, the better the model is performing.
- The [optimizer](https://docs.pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimizer) updates the model’s parameters based on the loss, nudging them in directions that reduce future errors. It relies on gradients computed by an algorithm called backpropagation, which measures how much each parameter contributed to the loss and how it should be adjusted.

In [None]:
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet_model.parameters(), lr=0.001) # lr: learning rate affects how quickly the model adjusts parameters
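
To get a feel for what `nn.CrossEntropyLoss` measures, the sketch below computes it for a single made-up prediction and compares it with the same value computed by hand. The logits and label are hypothetical values chosen for illustration.

```python
import torch
import torch.nn as nn

# Raw scores (logits) for one image over three hypothetical classes
logits = torch.tensor([[2.0, 0.5, -1.0]])
label = torch.tensor([0])  # the correct class is index 0

loss = nn.CrossEntropyLoss()(logits, label)

# Same value by hand: negative log of the softmax probability of the true class
manual = -torch.log_softmax(logits, dim=1)[0, label.item()]

print(loss.item(), manual.item())
```

The more probability the model assigns to the correct class, the closer the loss gets to zero.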

We train the model by iterating over all the images in our training dataset. For each batch of images and their corresponding labels:
1. We reset the gradients using `optimizer.zero_grad()` so updates from previous batches don’t accumulate.
2. We pass the images through the model to get predictions.
3. We calculate the loss between the predictions and true labels.
4. We backpropagate the loss using `loss.backward()`, which computes how each parameter should change.
5. We update the model parameters with `optimizer.step()`.

Over many iterations, the model gradually adjusts its 11 million+ parameters to improve its predictions.

In [None]:
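resnet_model.train()  # make sure the model is in training mode (matters if eval() was called earlier)
running_loss = 0.0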
for i, (images, labels) in enumerate(train_loader): 
    optimizer.zero_grad()                 # reset gradients from the previous batch
    outputs = resnet_model(images)        # get model predictions for this batch of images
    loss = loss_function(outputs, labels) # calculate loss
    loss.backward()                       # compute gradients for each weight (how each weight should change to minimize loss)
    optimizer.step()                      # update model weights

    # Track running loss
    running_loss += loss.item()
    if (i + 1) % 500 == 0:  # print every 500 mini-batches 
        print(f"[Batch {i+1:5d}] loss: {running_loss / 500:.4f}")
        running_loss = 0.0

print("Finished Training")

[Batch   500] loss: 1.4802
Finished Training


If you want to learn more about how gradients and backpropagation work, check out the [PyTorch Optimization Tutorial](https://docs.pytorch.org/tutorials/beginner/basics/optimization_tutorial.html)
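
The training loop in this guide makes a single pass (one epoch) over the training data. A common extension is to repeat that pass for several epochs. The sketch below shows the pattern on a tiny stand-in model and random data so it runs in seconds; `train_one_epoch`, `tiny_model`, and the data are hypothetical and not part of the original guide.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, loss_function, optimizer):
    """Run one pass over loader and return the mean batch loss."""
    model.train()  # enable training behavior (e.g. BatchNorm updates)
    total_loss = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()                        # reset gradients
        loss = loss_function(model(inputs), labels)  # forward pass + loss
        loss.backward()                              # compute gradients
        optimizer.step()                             # update parameters
        total_loss += loss.item()
    return total_loss / len(loader)

torch.manual_seed(0)
data = TensorDataset(torch.rand(32, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8, shuffle=True)

tiny_model = nn.Linear(8, 2)  # stand-in for resnet_model
optimizer = optim.Adam(tiny_model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(3):
    epoch_losses.append(train_one_epoch(tiny_model, loader, loss_function, optimizer))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

With the real model you would pass `resnet_model` and `train_loader` instead; expect each epoch on the CPU to take a while.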

## Evaluate a Model

After training, we need to test how well our model generalizes to unseen data. This step is crucial because a model that performs well on training data might simply be memorizing it rather than learning meaningful patterns — a problem known as overfitting.

To evaluate performance, we use the test dataset, which the model has never seen during training. We pass each batch of test images through the model, compare its predictions to the true labels, and compute the overall accuracy.

In [None]:
correct, total = 0, 0
resnet_model.eval()    # set the model to evaluation mode (BatchNorm uses its running statistics; dropout, if any, is off)
with torch.no_grad():  # disable gradient tracking since we’re not updating weights
    for images, labels in test_loader: 
        outputs = resnet_model(images)                 # get model predictions for a batch of images
        _, predicted = torch.max(outputs, 1)           # select the class with the highest score
        total += labels.size(0)                        # count total number of images processed
        correct += (predicted == labels).sum().item()  # count how many predictions were correct

print(f"Test accuracy: {100 * correct / total:.2f}%")

Test accuracy: 62.28%


This gives the model's accuracy: the percentage of test images that were correctly classified. For more advanced analysis of classification models, you could also compute metrics like [precision](https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall#precision), [recall](https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall#recall_or_true_positive_rate), or a [confusion matrix](https://www.geeksforgeeks.org/machine-learning/confusion-matrix-machine-learning/), but accuracy is a good starting point for quick model evaluation.
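
As a rough illustration of those metrics, the sketch below builds a confusion matrix from a handful of made-up labels and predictions, then derives per-class recall and precision from it. The three-class setup and the tensors are hypothetical; with the real model you would accumulate `predicted` and `labels` inside the evaluation loop.

```python
import torch

num_classes = 3
labels = torch.tensor([0, 0, 1, 1, 2, 2])     # true classes
predicted = torch.tensor([0, 1, 1, 1, 2, 0])  # model's guesses

# confusion[t, p] counts samples of true class t that were predicted as class p
confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
for t, p in zip(labels.tolist(), predicted.tolist()):
    confusion[t, p] += 1

correct_per_class = confusion.diag().float()
recall = correct_per_class / confusion.sum(dim=1).float()     # per true class
precision = correct_per_class / confusion.sum(dim=0).float()  # per predicted class

print(confusion)
print("recall:", recall)
print("precision:", precision)
```

Rows of the matrix are true classes and columns are predictions, so off-diagonal entries show which classes get confused with each other.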

## Save a Model

Once training is complete, you can save your model so it can be reused later without retraining. To do so, we will use the `torch.save` function to save the entire model object (structure and weights) to a file called `model.pth`.

In [None]:
torch.save(resnet_model, 'model.pth') # save model to model.pth file

You can then reload the model, with all the weights it learned while training, using the `torch.load` function.

In [None]:
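# PyTorch 2.6+ defaults to weights_only=True; pass False to unpickle a full model (only for files you trust)
resnet_model = torch.load('model.pth', weights_only=False) # load full model stored in model.pth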
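
A commonly recommended alternative is to save only the model's learned parameters (its `state_dict`) rather than the whole model object. The sketch below shows that pattern on a small stand-in layer; the temporary file path and the names `model` and `restored` are illustrative.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # small stand-in for resnet_model

path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(model.state_dict(), path)  # save only the learned parameters

# To reload, rebuild the same architecture first, then load the parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, weights_only=True))

print(torch.equal(model.weight, restored.weight))  # True
```

For the model in this guide, "rebuild the same architecture" would mean calling `models.resnet18(weights=None)` and replacing `fc` with the 10-output layer before loading the parameters.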

## Conclusion and Further Readings

Congratulations on completing the PyTorch Quick Guide! 🎓🎉 You now have the skills to load, train, and test a PyTorch model!

If you want to learn all the PyTorch basics, check out some of these resources:
- [Official PyTorch Beginner's Guide](https://docs.pytorch.org/tutorials/beginner/basics/intro.html)
- [Deep Learning with PyTorch: A 60 Minute Blitz](https://docs.pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)