# Torchvision 

PyTorch library that provides pretrained models, datasets, and image preprocessing transforms.

# Image classification 

## Binary Classification 

## Multiclass Classification 


Data from [Kaggle](https://www.kaggle.com/code/tirendazacademy/cats-dogs-classification-with-pytorch/input)

In [30]:
import torch

from torchvision import datasets 

import torchvision.transforms as transforms

import kagglehub

In [31]:
class Config:
    pass
    
config = Config()

In [32]:
# Download latest version
dataset_path = kagglehub.dataset_download("tongpython/cat-and-dog")

print("Path to dataset files:", dataset_path)


Path to dataset files: /Users/el_fer/.cache/kagglehub/datasets/tongpython/cat-and-dog/versions/1


In [33]:
train_dataset_path = dataset_path + '/training_set/training_set'
train_dataset = datasets.ImageFolder(root=train_dataset_path, transform=transforms.ToTensor())

In [34]:
train_dataset.classes

['cats', 'dogs']

# Pre-Trained models 

Training models from scratch takes a long time and requires a lot of data.

Pre-trained models are models that have already been trained on a large dataset. They can either be reused directly, if our task is similar to the one they were trained on, or adapted to a new task (transfer learning).

## Saving a model

In PyTorch you can save a model's weights (its `state_dict`) with:

`torch.save(model.state_dict(), "BinaryCNN.pth")`

By convention, the file extension is `.pt` or `.pth`.

## Loading models 

Loading a saved model takes two steps: first instantiate the model, then load the weights into it:

`new_model = BinaryCNN()` 

`new_model.load_state_dict(torch.load("BinaryCNN.pth"))` 
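The two steps above can be checked with a full save/load round trip. `BinaryCNN` is not defined in this notebook, so the class below is a hypothetical minimal stand-in; the file is written to a temporary directory instead of the working directory.

```python
import os
import tempfile

import torch
from torch import nn


class BinaryCNN(nn.Module):
    """Hypothetical minimal CNN standing in for the BinaryCNN referenced above."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 8, H, W) -> (N, 8, 1, 1)
        )
        self.classifier = nn.Linear(8, 1)  # a single logit for binary classification

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))


model = BinaryCNN()
model.eval()
path = os.path.join(tempfile.mkdtemp(), "BinaryCNN.pth")

# Save only the learned parameters (the state_dict), not the whole Python object
torch.save(model.state_dict(), path)

# Step 1: instantiate the same architecture. Step 2: load the saved weights into it.
new_model = BinaryCNN()
new_model.load_state_dict(torch.load(path, map_location="cpu"))
new_model.eval()  # switch to inference mode before making predictions

# The reloaded model produces the same outputs as the original
x = torch.randn(4, 3, 32, 32)
with torch.no_grad():
    same = torch.equal(model(x), new_model(x))
print(same)  # True
```

Because only the `state_dict` is stored, the class definition must be available when loading; this is why the model has to be instantiated first.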



In [35]:
from torchvision.models import (resnet18, ResNet18_Weights)

weights = ResNet18_Weights.DEFAULT 
model = resnet18(weights=weights)
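# weights.transforms() returns the preprocessing the model expects (resize, center crop, normalization).
# Named `preprocess` so it does not shadow the torchvision.transforms module imported above.
preprocess = weights.transforms()


In [36]:
from PIL import Image 

# convert("RGB") ensures 3 channels (e.g. drops a PNG alpha channel)
image = Image.open("cat2.jpg").convert("RGB")
image_tensor = preprocess(image)
# Add a batch dimension: (C, H, W) -> (1, C, H, W)
image_reshaped = image_tensor.unsqueeze(0)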

In [37]:
model.eval() 

with torch.no_grad(): 
    pred = model(image_reshaped).squeeze(0) 

pred_cls = pred.softmax(0) 
cls_id = pred_cls.argmax().item() 

cls_name = weights.meta["categories"][cls_id]  

print(cls_name)

Siamese cat
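The prediction above keeps only the single most likely class. To also inspect the runner-up classes, `torch.topk` can be applied to the softmax probabilities. The helper `top_k_predictions` and the toy logits and labels below are hypothetical; with the notebook's model you could call `top_k_predictions(pred, weights.meta["categories"], k=5)`.

```python
import torch


def top_k_predictions(logits, categories, k=3):
    """Return the k most likely (category, probability) pairs for one image's logits."""
    probs = logits.softmax(dim=0)           # logits -> probabilities that sum to 1
    values, indices = torch.topk(probs, k)  # k largest probabilities, in descending order
    return [(categories[i], v) for i, v in zip(indices.tolist(), values.tolist())]


# Toy example with made-up logits and labels (not real model output)
categories = ["tabby cat", "Siamese cat", "golden retriever"]
logits = torch.tensor([1.0, 3.0, 0.5])
for name, p in top_k_predictions(logits, categories, k=2):
    print(f"{name}: {p:.3f}")
```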


# Object Recognition 

Identifying objects in images. It is often performed using bounding boxes.

## Bounding boxes 

A rectangular box describing an object's spatial location within the image.

The ground-truth bounding box is the annotated box that gives the object's true location.

Bounding boxes are typically represented by the coordinates of their top-left and bottom-right corners:

`Bounding box = (x1, y1, x2, y2)`

Images are composed of pixels arranged in a 2D matrix. The origin (0, 0) is the top-left corner; x increases to the right and y increases downward.

To process images in PyTorch, we must first convert them into tensors.

The `ToTensor()` transform converts a PIL image to a tensor of type `torch.float32` with values scaled to [0.0, 1.0].

The `PILToTensor()` transform converts a PIL image to a tensor of type `torch.uint8` with values kept in [0, 255] (no scaling).


In [56]:
import torchvision.transforms as transforms 

transform = transforms.Compose([
    transforms.Resize(224), 
    transforms.PILToTensor()
])

image = Image.open("cat2.jpg")

image_tensor = transform(image)

In [57]:
image_tensor.dtype

torch.uint8

In [61]:
image_hwc = image_tensor.permute(1, 2, 0)  # (H, W, C) view for matplotlib; image_tensor itself stays (C, H, W)

## Drawing bounding boxes

In [63]:
import matplotlib.pyplot as plt
from torchvision.utils import draw_bounding_boxes 

# bbox = torch.tensor([x_min, y_min, x_max, y_max])
bbox = torch.tensor([2, 2, 40, 40])
# draw_bounding_boxes expects boxes of shape (N, 4): add a leading dimension
bbox_tensor = bbox.unsqueeze(0)

# draw_bounding_boxes expects a uint8 image in (C, H, W) layout, so pass image_tensor,
# not the permuted (H, W, C) version (that raises "Only grayscale and RGB images are supported")
img_bbox = draw_bounding_boxes(image_tensor, bbox_tensor, width=3, colors="red")

# Transform the tensor back to a PIL image for display
transform = transforms.Compose([
    transforms.ToPILImage()
])
plt.imshow(transform(img_bbox))
plt.show()