# TORCHVISION

`torchvision` is PyTorch's companion library for computer vision. It provides pre-trained models, datasets, transforms, and utilities for working with image data. It contains the following modules:

1. __torchvision.datasets__: provides access to popular vision datasets.
2. __torchvision.transforms__: offers preprocessing and data augmentation utilities.
3. __torchvision.models__: includes pre-trained models for classification, detection, and segmentation tasks.
4. __torchvision.utils__: provides helper functions for visualizing and saving images.


## Imports

In [10]:
from torchvision import datasets
from torchvision.transforms import ToTensor, Compose, Resize, Normalize
from torch.utils.data import DataLoader
from torchvision import models
import torch
import torch.nn as nn

## Datasets

Torchvision includes ready-made dataset classes for common tasks such as classification, detection, and segmentation. Many of them can download and prepare the data automatically, although some (ImageNet, for example) must be downloaded manually. Available datasets include:

1. MNIST
2. CIFAR10/100
3. ImageNet
4. COCO
5. VOC (Pascal VOC)
6. CelebA
7. LSUN
8. Cityscapes

Loading one of these datasets means creating an instance of its dataset class and specifying where the data lives, whether to download it, and which transforms to apply.

__Common dataset arguments__
- _root_: The directory where data will be stored/downloaded.
- _train_: Whether to load the training or test split.
- _transform_: Preprocessing transformations to apply to the images.
- _target_transform_: Transformations to apply to the labels.

In [2]:
train_dataset = datasets.CIFAR10(
    root="data", 
    train=True, 
    download=True, 
    transform=ToTensor()
)

test_dataset = datasets.CIFAR10(
    root="data", 
    train=False, 
    download=True, 
    transform=ToTensor()
)


Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:04<00:00, 36868602.44it/s]


Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified


Sometimes a single transform, as in the case above, is not enough. You can chain a series of transforms with the `Compose` class from `torchvision.transforms`.

In [3]:
# MNIST images are grayscale (one channel), so Normalize takes one mean and one std.
transforms = Compose([
    Resize((128, 128)),
    ToTensor(),
    Normalize(mean=[0.5], std=[0.5])
])

mnist_train_dataset = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transforms
)

mnist_test_dataset = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=transforms
)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 17601313.78it/s]


Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 476959.13it/s]


Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 4415922.73it/s]


Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 6564620.53it/s]

Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw






## Data Loaders

Torchvision datasets are compatible with the PyTorch `DataLoader`, which provides an easy way to iterate through a dataset in batches. Its main parameters include:


- _dataset_: The dataset from which to load data (e.g., Torchvision or custom dataset).
- _batch_size_: Number of samples per batch (default: 1).
- _shuffle_: Whether to shuffle the data (default: False).
- _num_workers_: Number of subprocesses for data loading (default: 0, meaning data is loaded in the main process).
- _pin_memory_: If True, the DataLoader copies tensors into pinned (page-locked) host memory before returning them, which speeds up transfers to the GPU.
- _drop_last_: Whether to drop the last incomplete batch (default: False).
- _sampler_: A custom sampler that dictates how data is drawn from the dataset.
- _collate_fn_: A function to merge samples into batches.
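
The effect of `drop_last` and `collate_fn` is easiest to see on a tiny dataset. The sketch below uses a ten-sample `TensorDataset` as a stand-in for a real dataset; `collate_as_dict` is a hypothetical helper written for this example, not part of PyTorch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten tiny fake "images" with integer labels (stand-ins for a real dataset).
toy_dataset = TensorDataset(torch.randn(10, 3, 8, 8), torch.arange(10))

# 10 samples with batch_size=4 gives batches of 4, 4, and 2.
keep_last = DataLoader(toy_dataset, batch_size=4, drop_last=False)
drop_last = DataLoader(toy_dataset, batch_size=4, drop_last=True)
print(len(keep_last), len(drop_last))  # 3 2


def collate_as_dict(samples):
    """Merge a list of (image, label) pairs into a dictionary of batched tensors."""
    images, labels = zip(*samples)
    return {"images": torch.stack(images), "labels": torch.stack(labels)}


dict_loader = DataLoader(toy_dataset, batch_size=4, collate_fn=collate_as_dict)
first_batch = next(iter(dict_loader))
print(first_batch["images"].shape)  # torch.Size([4, 3, 8, 8])
```

A custom `collate_fn` is mostly useful when samples cannot simply be stacked, for example detection targets with a different number of boxes per image.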

In [5]:
train_loader = DataLoader(
    dataset = train_dataset,
    batch_size = 64,
    shuffle = True,
    num_workers = 2
)

for images, labels in train_loader:
    print(images.shape, labels.shape)

torch.Size([64, 3, 32, 32]) torch.Size([64])
torch.Size([64, 3, 32, 32]) torch.Size([64])
...

## Transforms

This module provides preprocessing and augmentation operations for image data. Common transforms include:

1. `ToTensor()` - Converts PIL images or NumPy arrays to PyTorch tensors and scales pixel values to the range [0, 1]
2. `Resize()` - Resizes images to a specific size
3. `Normalize()` - Normalizes each channel with a given mean and standard deviation
4. `RandomHorizontalFlip()` - Randomly flips images horizontally
5. `ColorJitter()` - Randomly adjusts brightness, contrast, saturation, and hue


As in the earlier example, use `Compose` to chain these transforms:

In [None]:
transforms =  Compose([
    Resize((64, 64)),
    ToTensor(),
    Normalize(mean=[0.5,0.5,0.5],std=[0.5,0.5,0.5])
])

## Pre-trained models

TorchVision provides several pre-trained models for tasks like image classification, object detection, and segmentation. These models are trained on large datasets like ImageNet, COCO, or Pascal VOC.

### Categories of Pre-trained models
#### Image classification

1. ResNet
2. AlexNet
3. VGG
4. GoogLeNet
5. MobileNet
6. EfficientNet
7. Vision Transformers

Pre-trained on ImageNet

#### Object Detection & Instance Segmentation

1. Faster R-CNN
2. Mask R-CNN
3. RetinaNet
4. SSD

Pre-trained on COCO

#### Semantic Segmentation

1. DeepLabV3
2. FCN

Pre-trained on a subset of COCO restricted to the Pascal VOC categories

#### Video classification

1. R3D
2. R(2+1)D
3. MC3

Pre-trained on Kinetics

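#### Generative models

1. DCGAN

Torchvision does not ship pre-trained generative models. Architectures such as DCGAN are usually implemented by hand and trained on custom or standard datasets.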

In [9]:

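# `weights=` replaces the deprecated `pretrained=True` argument (torchvision >= 0.13)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# switch to inference mode and run a dummy batch through the network
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)  # avoids shadowing the built-in `input`
with torch.no_grad():
    output = model(dummy_input)
print(output.shape)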

torch.Size([1, 1000])


## Transfer learning with pre-trained models

You can fine-tune a pre-trained model for a custom task by freezing the feature-extractor layers and replacing the classifier head, so that only the new layer is trained.

In [None]:
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# freeze every pre-trained parameter
for param in model.parameters():
    param.requires_grad = False

# custom dataset with 10 classes (the ImageNet-1k weights were trained on 1,000 classes)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new layer is trainable by default

# pass only the new layer's parameters, so the optimizer updates nothing else;
# gradients come from loss.backward(), the optimizer only applies the updates
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.01)

## torchvision utils

### torchvision.models.list_models

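`list_models` returns the names of all models registered in torchvision. You can optionally restrict the result to one task by passing the corresponding module object, for example `torchvision.models.detection`. It has no option to filter by weights availability.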

In [11]:
from torchvision.models import list_models
all_models = list_models()
print(all_models)

['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'deeplabv3_mobilenet_v3_large', 'deeplabv3_resnet101', 'deeplabv3_resnet50', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'fasterrcnn_mobilenet_v3_large_320_fpn', 'fasterrcnn_mobilenet_v3_large_fpn', 'fasterrcnn_resnet50_fpn', 'fasterrcnn_resnet50_fpn_v2', 'fcn_resnet101', 'fcn_resnet50', 'fcos_resnet50_fpn', 'googlenet', 'inception_v3', 'keypointrcnn_resnet50_fpn', 'lraspp_mobilenet_v3_large', 'maskrcnn_resnet50_fpn', 'maskrcnn_resnet50_fpn_v2', 'maxvit_t', 'mc3_18', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'mvit_v1_b', 'mvit_v2_s', 'quantized_googlenet', 'quantized_inception_v3', 'quantized_mobilenet_v2

In [13]:
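# list_models() does not accept a `pretrained` argument, so this call fails.
# (The result is not named `models`, which would shadow the imported module.)
pretrained_models = list_models(pretrained=True)
print(pretrained_models)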

TypeError: list_models() got an unexpected keyword argument 'pretrained'

In [18]:
# List models for image classification.
# `module` expects a module object, not a string (a string raises AttributeError).
import torchvision
classification_models = list_models(module=torchvision.models)
print(classification_models)