<a href="https://colab.research.google.com/github/subedinab/ML-projects/blob/master/cultural_heritage_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Classification using a ResNet9 model

Here, I am using the cultural heritage dataset, which has around 2.5k images of 6 cultural heritage sites around Nepal.

URL of the dataset : https://www.kaggle.com/datasets/nabarajsubedi/tcultural-heritage-classification

Dataset properties :

* Total number of images : 2476 images
* Training set size : 2000 images
* Test set size : 3000 images
* Pred set size : 7301 images
* Number of classes : 6
* Classes : 'Bhaktapur_Durbar_Square', 'Patan Dhurbar Square', 'phasupatinath', 'lumbini', 'swayambhunath', 'Boudhha'
* Image size : 150x150 pixels

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
import torch
import torchvision
import tarfile
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
from torchvision.datasets import ImageFolder
import torchvision.transforms as tt
from torch.utils.data import Dataset, random_split, DataLoader
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
project_name='image-classification-resnet9'

# Exploring the dataset

In [None]:
data_dir = '../input/tcultural-heritage-classification/classification'
print(os.listdir(data_dir))
classes = os.listdir(data_dir + "/seg_train/seg_train")
print(classes)

## Preparing Datasets and Dataloaders

## Data Preprocessing

Before we create our datasets, we set up **data augmentation**, a technique that artificially expands the training set by creating modified versions of its images. This improves the model's performance and its ability to generalize.

We can do this by resizing, shifting, flipping, cropping, or zooming in or out of an image, among many other transformations.





In [None]:
train_tfms = tt.Compose([tt.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
                         tt.Resize((150,150)),
                         tt.RandomCrop(150, padding=4, padding_mode='reflect'),
                         tt.RandomHorizontalFlip(),
                         tt.RandomRotation(10),
                         tt.ToTensor()])
                        # tt.Normalize(*stats,inplace=True)])
valid_tfms = tt.Compose([tt.Resize((150,150)),tt.ToTensor()])#, tt.Normalize(*stats)])
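
The commented-out tt.Normalize(*stats) calls above refer to a `stats` value that is never defined in this notebook. If normalization is wanted, one way to get it is to compute the per-channel mean and standard deviation of the training images. The sketch below is only an illustration: it uses a random tensor as a stand-in for a batch of real images, and the variable names are my own.

```python
import torch

# Stand-in for a batch of training images shaped (batch, channels, height, width).
# In the notebook this would come from the training data loader instead.
images = torch.rand(8, 3, 150, 150)

# Reduce over batch, height and width so that one value per channel remains.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Same layout as the (means, stds) pair expected by tt.Normalize(*stats).
stats = (tuple(mean.tolist()), tuple(std.tolist()))
print(stats)
```

With real data, the statistics should be computed on the training set only and then reused for the validation and test transforms.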

The dataset is split into 3 parts :

* Training set : This is used to train the model i.e. compute the loss and adjust the weights of the model using gradient descent.
* Validation set : This is used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
* Test set : This is used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.

In [None]:
train_ds = ImageFolder(data_dir+'/seg_train/seg_train', train_tfms)
valid_ds = ImageFolder(data_dir+'/seg_test/seg_test', valid_tfms)
test_ds = ImageFolder(data_dir+'/seg_pred', transform=valid_tfms)

In [None]:
len(train_ds), len(valid_ds), len(test_ds)

Each element of the training dataset is a tuple containing an image tensor and a label. Since the data consists of 150 x 150 px color images with 3 channels (RGB), each image tensor has the shape (3, 150, 150) :

In [None]:
img, label = train_ds[0]
img_shape = img.shape
img_shape

The list of classes is stored in the .classes property of the dataset. The numeric label for each element corresponds to the index of the element's class in that list.

In [None]:
train_ds.classes
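
To make the label numbering concrete: ImageFolder sorts the class folder names and numbers them from 0. The sketch below reproduces that mapping in plain Python. It assumes the folder names on disk match the class names listed in the dataset properties, which may not be exactly true.

```python
# Class names copied from the dataset properties; the real folder names are an assumption.
class_dirs = ['Bhaktapur_Durbar_Square', 'Patan Dhurbar Square', 'phasupatinath',
              'lumbini', 'swayambhunath', 'Boudhha']

# Sorting is case-sensitive, so names starting with an uppercase letter come first.
classes = sorted(class_dirs)
class_to_idx = {name: idx for idx, name in enumerate(classes)}

for name, idx in class_to_idx.items():
    print(idx, name)
```

So a label of 0 does not mean "first class in the description"; it means "first class in sorted order".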

This dataset consists of 3-channel color images (RGB). We can view an image using matplotlib, but we need to change the tensor dimensions to (150,150,3), because matplotlib expects channels to be the last dimension of an image (whereas in PyTorch they are the first dimension). We'll use the .permute tensor method to move the channels to the last dimension. Let's create a helper function to display an image and its label.

In [None]:
import matplotlib.pyplot as plt

def show_example(img, label):
    print('Label: ', train_ds.classes[label], "("+str(label)+")")
    plt.imshow(img.permute(1, 2, 0))
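
As a quick check of what .permute(1, 2, 0) does, the sketch below applies it to a random tensor of the same shape as one image. The tensor is a stand-in, not real data.

```python
import torch

# Stand-in for one image in PyTorch layout: (channels, height, width).
img = torch.rand(3, 150, 150)

# Reorder the axes to (height, width, channels), the layout matplotlib expects.
hwc = img.permute(1, 2, 0)

print(img.shape, '->', hwc.shape)
```

Only the axis order changes. The pixel at row 10, column 20 of channel 1 is img[1, 10, 20] before and hwc[10, 20, 1] after.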

In [None]:
show_example(*train_ds[1])

In [None]:
show_example(*train_ds[800])

Now, we'll create a DataLoader, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.

In [None]:
batch_size = 100

In [None]:
# PyTorch data loaders
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=3, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size*2, num_workers=3, pin_memory=True)
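
The number of batches per epoch follows directly from the dataset size and the batch size. The helper below is a hypothetical illustration (num_batches is not part of PyTorch), and the sample count of 2000 is the training set size quoted in the dataset properties.

```python
import math

def num_batches(n_samples, batch_size, drop_last=False):
    """Batches per epoch, mirroring how a DataLoader's length is computed."""
    if drop_last:
        # An incomplete final batch is discarded.
        return n_samples // batch_size
    # An incomplete final batch is kept as a smaller batch.
    return math.ceil(n_samples / batch_size)

print(num_batches(2000, 100))                   # training loader with batch_size = 100
print(num_batches(2050, 100))                   # a partial last batch still counts
print(num_batches(2050, 100, drop_last=True))   # unless drop_last=True
```

This is the same value that len(train_dl) returns, and it is what fit_one_cycle later passes to the scheduler as steps_per_epoch.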

Let's look at a batch of images from the dataset using the make_grid function from torchvision :

In [None]:
def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 12))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images[:32], nrow=8).permute(1, 2, 0))
        break

In [None]:
show_batch(train_dl)

# Using a GPU
It is advisable to use a GPU instead of a CPU when working with image datasets. CPUs are designed for general-purpose workloads, whereas GPUs have a large number of cores and can run many computations in parallel, which suits the matrix operations used to train deep learning models. Deep learning computations also move large amounts of data, so a GPU's high memory bandwidth helps as well.

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.

In [None]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

In [None]:
device = get_default_device()
device

Now we wrap our training and validation data loaders with DeviceDataLoader so that batches of data are automatically transferred to the GPU (if available).

In [None]:
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)

# Our Model


In a traditional neural network, each layer feeds only into the next layer. In this model we use a network with residual blocks, where each layer feeds into the next layer and also directly into the layers about 2–3 hops away. These skip connections make deeper networks easier to train, because gradients can also flow through the shortcut path.

In [None]:
class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x # ReLU can be applied before or after adding the input

In [None]:
simple_resnet = to_device(SimpleResidualBlock(), device)

for images, labels in train_dl:
    out = simple_resnet(images)
    print(out.shape)
    break

del simple_resnet, images, labels
torch.cuda.empty_cache()

In [None]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))
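
As a worked example of the accuracy calculation, the sketch below uses made-up scores for four images and three classes. The numbers are purely illustrative.

```python
import torch

# Made-up model outputs (one row per image, one column per class).
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 2.2],
                       [1.9, 0.4, 0.6]])
labels = torch.tensor([0, 1, 2, 1])

# The predicted class is the column with the highest score in each row.
preds = torch.argmax(logits, dim=1)

# Accuracy is the fraction of predictions that match the labels.
acc = (preds == labels).float().mean().item()
print(preds.tolist(), acc)
```

Three of the four predictions match, so the accuracy is 0.75. Note that validation_epoch_end averages per-batch accuracies, which equals the overall accuracy only when all batches have the same size.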

In [None]:
def conv_block(in_channels, out_channels, pool=False, pool_no=2):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(pool_no))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()

        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True, pool_no=3)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))

        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True, pool_no=5)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))

        self.classifier = nn.Sequential(nn.MaxPool2d(5),
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out

In [None]:
model = to_device(ResNet9(3, 6), device)
model
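
To see why the final nn.Linear layer takes 512 inputs, it helps to trace the spatial size of the feature maps through the pooling layers. The sketch below is plain arithmetic. It assumes 150x150 inputs and relies on nn.MaxPool2d(k) using a stride equal to k by default, while the 3x3 convolutions with padding=1 leave the size unchanged.

```python
def pooled(size, kernel):
    """Spatial size after nn.MaxPool2d(kernel) with its default stride (= kernel)."""
    return size // kernel

size = 150
trace = [("input", size)]

# (layer name, pooling kernel) in the order used by ResNet9.forward
for name, kernel in [("conv2", 3), ("conv3", 2), ("conv4", 5), ("classifier pool", 5)]:
    size = pooled(size, kernel)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:16s} {s} x {s}")
```

The sizes go 150, 50, 25, 5, 1, so the classifier flattens a 512 x 1 x 1 tensor into 512 features. A different input resolution would change these numbers and could break the Linear(512, num_classes) layer.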

# Train Our Model

In [None]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []

    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                                steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        # Training Phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # store a detached copy for logging only
            loss.backward()

            # Gradient clipping
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
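
The gradient clipping step in fit_one_cycle uses nn.utils.clip_grad_value_, which clamps every gradient element into the range [-grad_clip, grad_clip]. A minimal sketch on a toy parameter, with values chosen only for illustration:

```python
import torch
import torch.nn as nn

# Toy parameter; the loss sum(w^2) has gradient 2*w = [2.0, -4.0, 0.1].
w = nn.Parameter(torch.tensor([1.0, -2.0, 0.05]))
loss = (w ** 2).sum()
loss.backward()
before = w.grad.clone()

# Clamp each gradient element into [-0.1, 0.1], as with grad_clip = 0.1.
nn.utils.clip_grad_value_([w], clip_value=0.1)
after = w.grad.clone()

print(before.tolist(), '->', after.tolist())
```

Large gradient elements are cut down to the limit while small ones pass through unchanged, which limits the size of any single update step.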

In [None]:
history = [evaluate(model, valid_dl)]
history

In [None]:
epochs = 8
max_lr = 0.001
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.AdamW

In [None]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl,
                             grad_clip=grad_clip,
                             weight_decay=weight_decay,
                             opt_func=opt_func)

In [None]:
train_time='9:58'

Let's plot the validation set accuracies to study how the model improves over time.

In [None]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');

In [None]:
plot_accuracies(history)

Let's plot the training and validation losses to study the trend.

In [None]:
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');

In [None]:
plot_losses(history)

It's clear from the trend that our model isn't overfitting to the training data just yet. Finally, let's visualize how the learning rate changed over time, batch-by-batch over all the epochs.

In [None]:
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.');

In [None]:
plot_lrs(history)
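
The shape of that curve can be reproduced with a few lines of plain Python. The function below is a simplified sketch of a one-cycle schedule, not PyTorch's implementation. The parameter values (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing) are what I understand the OneCycleLR defaults to be, so treat them as assumptions. The 20 steps per epoch are also an assumption.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Simplified one-cycle schedule: cosine warm-up, then cosine decay."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1   # step at which max_lr is reached

    def cos_anneal(start, end, pct):
        # Moves smoothly from start (pct = 0) to end (pct = 1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

total_steps = 8 * 20   # epochs * assumed batches per epoch
schedule = [one_cycle_lr(s, total_steps, max_lr=0.001) for s in range(total_steps)]
print(schedule[0], max(schedule), schedule[-1])
```

The learning rate starts well below max_lr, rises to max_lr after about 30% of the steps, and then decays to a very small value by the end of training.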

# Predictions...

Let's predict some images. The test_ds dataset doesn't have labels, but the images are clear enough that we can judge by eye whether the model predicts them well or not.

In [None]:
def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds  = torch.max(yb, dim=1)
    # Retrieve the class label
    return train_ds.classes[preds[0].item()]
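
The model returns raw scores (logits), not probabilities. If a confidence value is wanted alongside the predicted class, softmax can be applied to the scores first. The sketch below uses made-up logits for six classes and is not part of the original predict_image function.

```python
import torch
import torch.nn.functional as F

# Made-up logits for a single image and six classes.
logits = torch.tensor([[1.0, 3.0, 0.5, 0.2, 0.1, 0.4]])

# Softmax turns the scores into probabilities that sum to 1 across classes.
probs = F.softmax(logits, dim=1)

# The largest probability and its index give the confidence and the predicted class.
conf, pred = torch.max(probs, dim=1)
print(pred.item(), round(conf.item(), 3))
```

Softmax does not change which class has the highest score, so the predicted label is the same as with torch.max on the raw outputs.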

In [None]:
img, _= test_ds[90]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

In [None]:
img, _= test_ds[219]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

In [None]:
img, _= test_ds[67]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

In [None]:
img, _= test_ds[79]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

In [None]:
img, _= test_ds[489]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

Pretty good predictions!


In [None]:
img, _= test_ds[6432]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

**But here are some images that could confuse even us.**

In [None]:
img, _= test_ds[745]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

In [None]:
img, _= test_ds[725]
plt.imshow(img.permute(1, 2, 0))
print('Predicted:', predict_image(img, model))

Two classes, 'building' and 'street', are present in these images. The model could reasonably guess either of them, and so could we. I cannot say which prediction is right or wrong, because labels for the test_ds dataset are not given. This could be resolved by treating the task as a multi-label image classification problem, where each image can belong to several classes, or by using a dataset in which each image belongs to exactly one of the given classes and the test set also has labels.

#### Still, our model is able to predict images with about 91% accuracy.

# Save and commit the model and notebook

Since we've trained our model for a long time and achieved a reasonable accuracy, it is a good idea to save the weights and biases to disk, so that we can reuse the model later and avoid retraining from scratch. Here's how you can save the model.

In [None]:
torch.save(model.state_dict(), 'image-classification-resnet.pth')
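
To reuse saved weights later, a model with the same architecture is created and the state dict is loaded into it. The sketch below shows the round trip with a tiny nn.Linear stand-in and a temporary file, both chosen only for illustration. In this notebook the model would be ResNet9(3, 6) and the file 'image-classification-resnet.pth'.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the trained model.
model_a = nn.Linear(4, 2)

# Save only the parameters (the state dict), not the whole model object.
path = os.path.join(tempfile.mkdtemp(), 'demo-weights.pth')
torch.save(model_a.state_dict(), path)

# Later: build a model with the same architecture and load the saved weights into it.
model_b = nn.Linear(4, 2)
model_b.load_state_dict(torch.load(path, map_location='cpu'))
model_b.eval()  # switch to evaluation mode before making predictions

same = all(torch.equal(p, q) for p, q in
           zip(model_a.state_dict().values(), model_b.state_dict().values()))
print(same)
```

The map_location argument lets weights saved on a GPU be loaded on a machine that only has a CPU.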

In [None]:
!pip install jovian --upgrade --quiet

In [None]:
import jovian

Record the hyperparameters of every experiment we do, to replicate it later and compare it against other experiments. We can record them using jovian.log_hyperparams.

In [None]:
jovian.reset()
jovian.log_hyperparams(arch='resnet9',
                       epochs=epochs,
                       lr=max_lr,
                       scheduler='one-cycle',
                       weight_decay=weight_decay,
                       grad_clip=grad_clip,
                       opt=opt_func.__name__)

Just as we have recorded the hyperparameters, we can also record the final metrics achieved by the model using jovian.log_metrics for reference, analysis and comparison.

In [None]:
jovian.log_metrics(val_loss=history[-1]['val_loss'],
                   val_acc=history[-1]['val_acc'],
                   train_loss=history[-1]['train_loss'],
                   time=train_time)

Now, save and commit our work using the jovian library. Along with the notebook, we can also attach the weights of our trained model, so that we can use it later.

In [None]:
jovian.commit(project=project_name, environment=None, outputs=['image-classification-resnet.pth'])