# Homework 5 - Transfer Learning

In this homework you'll experiment with applying transfer learning to fine-grained classification using the Flowers102 dataset in `torchvision.datasets`.  Fine-grained classification means distinguishing many categories or classes that look similar, such as related species of flowers.  Another example is distinguishing breeds of dogs, as opposed to distinguishing cats, dogs, and foxes.

Note: we were able to train all the models described in this homework in about 40 minutes on the T4 Compute Server.  The ConvNext model was the biggest and took the most time.

## The Flowers102 dataset

There are 102 classes of flowers each with between 40 and 258 images. The dataset is available in torchvision as `torchvision.datasets.Flowers102`.  You can find more information about the [dataset here](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/).  The labels for the classes are also [available here](https://gist.github.com/JosephKJ/94c7728ed1a8e0cd87fe6a029769cde1).  

The dataset has three splits each of which can be accessed with code like this:

```python
train_dataset = Flowers102(root=DATA_PATH, split='train', download=True, transform = transform_train)
```

To get the validation and testing splits, change `split` to `'val'` or `'test'`.

### Data Exploration (5 pts)

In this section you should explore the dataset a bit.  Plot a few examples and find at least two classes that have similar looking flowers.  Also, how many images are there per class in the training and validation sets?  You may want to start with transforms that don't add any augmentation for the purposes of exploring.
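
One way to answer the images-per-class question is to tally the labels in each split. The sketch below assumes every dataset item is an `(image, label)` pair, which is how `Flowers102` returns samples; `count_per_class` is a hypothetical helper name, and the toy list stands in for a real split.

```python
from collections import Counter

def count_per_class(dataset):
    """Return a Counter mapping class label -> number of samples.

    Assumes every item in `dataset` is an (image, label) pair.
    """
    return Counter(label for _, label in dataset)

# Toy stand-in for a real split: three samples of class 0, one of class 1.
toy_split = [("img_a", 0), ("img_b", 0), ("img_c", 1), ("img_d", 0)]
counts = count_per_class(toy_split)
print(counts.most_common())  # classes ordered from most to fewest samples
```

With the real data you would pass `train_dataset` or `valid_dataset` in place of `toy_split`. Iterating a dataset decodes every image, so this may take a little while.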


### Augmentation and DataLoaders (5 pts)

Build your transforms for training.  Remember that for testing and validation the transforms shouldn't add any augmentation.  The images should be $224 \times 224$ when transformed since our pretrained models were trained on Imagenet with the same size images.  We used `batch_size = 32` on the T4 Compute Servers.  For normalization use the statistics from Imagenet since the pretrained models we are using expect that normalization.
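
Normalization subtracts the per-channel mean and divides by the per-channel standard deviation, so a normalized image has to be converted back before it can be plotted with sensible colors. The following sketch works through that arithmetic on a single RGB pixel in plain Python; `normalize_pixel` and `denormalize_pixel` are hypothetical helper names, and the constants are the usual Imagenet statistics.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (x - mean) / std to each channel of a pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel: x * std + mean, channel by channel."""
    return [x * s + m for x, m, s in zip(rgb, mean, std)]

pixel = [0.5, 0.5, 0.5]                   # a mid-gray pixel
normalized = normalize_pixel(pixel)       # what the model would see
restored = denormalize_pixel(normalized)  # back to the [0, 1] range for plotting
print(normalized)
print(restored)
```

The same per-channel inverse applies to a whole image tensor when you want to display augmented training samples.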

### ResNet50 (5 pts)

The ResNet models establish good baselines for results.

Build a custom model class for ResNet50 (AI may be helpful here) with an adjustable number of output classes.  It should have methods to freeze and unfreeze the backbone.  Apply transfer learning by instantiating your model with the default Imagenet weights, training with the backbone frozen for 5 epochs, and then training with the backbone unfrozen for a suitable number of epochs (you may need to experiment).  Include graphics or display dataframes to show how the model is converging (at least for the unfrozen training).

Use the training and validation sets here.  The test set will be reserved for your final best model. 

What kind of validation accuracy are you able to achieve?  Is the model overfitting?
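
A simple way to look for overfitting is to track the gap between training and validation accuracy across epochs. The sketch below does this in plain Python; `accuracy_gaps` is a hypothetical helper, and the numbers are the train and validation accuracies copied from the unfrozen EfficientNet V2 Small run logged later in this notebook.

```python
def accuracy_gaps(train_acc, val_acc):
    """Return the per-epoch difference: train accuracy minus validation accuracy."""
    return [t - v for t, v in zip(train_acc, val_acc)]

# Accuracies from the unfrozen EfficientNet V2 Small run (epochs 0 through 9).
train_acc = [0.816667, 0.925490, 0.927451, 0.914706, 0.925490,
             0.974510, 0.984314, 0.991176, 0.995098, 0.995098]
val_acc = [0.761765, 0.859804, 0.810784, 0.819608, 0.839216,
           0.876471, 0.895098, 0.907843, 0.913725, 0.917647]

gaps = accuracy_gaps(train_acc, val_acc)
for epoch, gap in enumerate(gaps):
    print(f"epoch {epoch}: train - val = {gap:.3f}")
```

A gap that stays positive while validation accuracy keeps improving points to some overfitting rather than a failed run, but how much gap is acceptable remains a judgment call.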

Note: the training dataset is already pretty small, so downsampling it to expedite experimentation isn't a good idea.  You could, however, temporarily reduce the size of the images to, say, 128x128 in your transforms to get things working, then go back to 224x224 to train your models.  All final results should be done with 224x224.

### EfficientNet V2 Small (5 pts)

EfficientNet models are a modern upgrade to traditional convolutional neural networks, offering improved performance and efficiency.  Repeat what you did for ResNet50 for EfficientNet V2 Small.  Use AI to search for how to load it in torchvision and how to adapt it in your custom model class.

### ConvNeXt Small (5 pts)

ConvNeXt models are a family of convolutional neural networks that aim to modernize the design of traditional CNNs by incorporating elements from vision transformers. They provide a strong performance baseline for various computer vision tasks.  Use transfer learning to train a ConvNeXt Small (not Tiny) model on Flowers102.

### ViT Small (5 pts)

Vision Transformers (ViTs) are a type of neural network architecture that leverages the transformer model, originally designed for natural language processing, to process image data. Unlike Convolutional Neural Networks (CNNs), which use convolutional layers to capture spatial hierarchies, ViTs divide images into patches and process them as sequences, allowing for global context understanding. ViTs typically require more data to train from scratch compared to CNNs, but they can be effectively used for transfer learning on smaller datasets if the images are similar to those in the Imagenet dataset.  We'll learn more about transformer models in the second half of the course.
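
To make the patch idea concrete, the sketch below counts how many non-overlapping patches a square image is split into. The image size of 224 comes from this homework, and the patch size of 16 is read off the model name `vit_small_patch16_224`; `num_patches` is a hypothetical helper.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side

patches = num_patches(image_size=224, patch_size=16)
print(patches)  # 14 patches per side, so 14 * 14 patches in total
```

Each patch becomes one element of the sequence the transformer processes. Many ViT implementations also prepend a class token, which would make the sequence one element longer; treat that detail as an assumption about any specific model.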

We'll use the timm library which doesn't seem to be installed in CoCalc.  
To use ViT Small from the timm library, you can install timm with the following command:
```python
!pip install timm
```
Then, load the pre-trained ViT Small model with:
```python
import timm
model = timm.create_model('vit_small_patch16_224', pretrained=True)
```

The ViT Small model is pretrained on Imagenet and expects the same size images and same normalization as other models.  Typically we fine tune the whole model and don't train with a frozen backbone.  The learning rates used are usually smaller, too.  Do the same kind of fine tuning as you've done above using OneCycleLR with max_lr = 0.0005.  We found that the number of epochs needed was similar to the total number of epochs used in the two-phase training used by our other models.
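
To see what a one-cycle schedule does to the learning rate, here is a plain-Python sketch of a two-phase cosine schedule. It follows the shape documented for `torch.optim.lr_scheduler.OneCycleLR` under its default settings (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`, cosine annealing), but it is an illustration rather than the library's code, and the exact phase boundaries should be treated as an assumption. `one_cycle_lr` is a hypothetical helper, and 32 batches per epoch is an assumed value.

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) for a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

epochs, steps_per_epoch, max_lr = 10, 32, 0.0005  # steps_per_epoch is assumed
total_steps = epochs * steps_per_epoch
schedule = [one_cycle_lr(s, total_steps, max_lr) for s in range(total_steps)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

The rate starts at `max_lr / 25`, climbs to `max_lr` over roughly the first 30% of the steps, then decays to a value close to zero, which is why the logged `lr` column in a training run rises and then falls.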

### Apply Best Model to Test Data and Evaluate (10 pts)

Write a brief summary of your investigations above.  Include a graph comparing the training metrics from the fine-tuning phases on the validation data from above.

Generate a classification report comparing the predictions of your best model to the ground truth labels on the test dataset.  Summarize the highlights of the report.  A confusion matrix display probably isn't helpful because there are so many classes (set `display_confusion=False` if you use `evaluate_classifier` from `introdl.utils`), but you can look at slices of the confusion matrix.  Try to identify at least two classes which are being confused by your model and display examples, with proper labels, from those classes.
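
One way to find confused classes without displaying a 102 by 102 matrix is to count the misclassified (true, predicted) label pairs and list the most frequent ones. The sketch below does this with the standard library; `most_confused_pairs` is a hypothetical helper, and the label lists are made up for illustration.

```python
from collections import Counter

def most_confused_pairs(y_true, y_pred, top_k=3):
    """Return the top_k (true, predicted) label pairs among misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_k)

# Made-up labels for illustration only.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 0, 2, 1, 2]

pairs = most_confused_pairs(y_true, y_pred)
print(pairs)  # each entry is ((true_label, predicted_label), count)
```

With real predictions, the top pairs tell you which two classes to pull example images from.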

In [1]:
# imports and configuration

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torchvision.models as models
import torch.optim as optim
from torch.utils.data import DataLoader, Subset
from torchvision.datasets import Flowers102
from torch.optim.lr_scheduler import OneCycleLR

import torchvision.transforms.v2 as T
from torchvision.models import resnet34, ResNet34_Weights
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.models import resnet50, ResNet50_Weights

from torchinfo import summary

from introdl.utils import get_device, load_results, load_model, config_paths_keys
from introdl.idlmam import train_network
from introdl.visul import plot_training_metrics, plot_transformed_images, create_image_grid, evaluate_classifier

sns.set_theme(style='whitegrid')
plt.rcParams['figure.figsize'] = [8, 6]  # Set the default figure size (width, height) in inches

paths = config_paths_keys()
MODELS_PATH = paths['MODELS_PATH']
DATA_PATH = paths['DATA_PATH']

MODELS_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\models
DATA_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\data
TORCH_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads
HF_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads


In [None]:
# use Imagenet statistics for normalization
mean = [0.485, 0.456, 0.406]  # Imagenet
std = [0.229, 0.224, 0.225]  # Imagenet

In [7]:
transform_train = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(224, max_size=None),  # Resize so the shortest edge is 224
    T.CenterCrop(224),            # Center crop to 224x224
    T.RandomRotation(degrees=15),
    T.RandomCrop(224, padding=10),            # Random crop padded img to 224x224
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    T.RandomGrayscale(),
    T.Normalize(mean=mean, std=std),
    T.ToPureTensor()
])

transform_val = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(224, max_size=None),  # Resize so the shortest edge is 224
    T.CenterCrop(224),            # Center crop to 224x224
    T.Normalize(mean=mean, std=std),
    T.ToPureTensor()
])


In [8]:
train_dataset = Flowers102(root=DATA_PATH, split='train', download=True, transform = transform_train)
valid_dataset = Flowers102(root=DATA_PATH, split='val', download=True, transform = transform_val)
test_dataset = Flowers102(root=DATA_PATH, split='test', download=True, transform = transform_val)
batch_size = 32

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [None]:
class ResNet50Custom(nn.Module):
    """
    A custom ResNet50 model with a modified final layer for a specified number of output classes.

    Args:
        num_outputs (int): The number of output classes for the modified final layer.
        weights (ResNet50_Weights or None): Pretrained weights to load for ResNet50. If None, the model is randomly initialized.

    Methods:
        freeze_backbone(): Freezes all layers of the backbone except the final classification head.
        unfreeze_backbone(): Unfreezes all layers of the backbone.
    """
    def __init__(self, num_outputs: int, weights=None):
        """
        Initializes the ResNet50Custom model.

        Args:
            num_outputs (int): The number of output classes for the modified final layer.
            weights (ResNet50_Weights or None): Pretrained weights for ResNet50. Defaults to None.
        """
        super(ResNet50Custom, self).__init__()
        # Load ResNet50 with specified weights (pretrained or None)
        self.model = models.resnet50(weights=weights)
        
        # Replace the final fully connected layer to fit the desired output size
        in_features = self.model.fc.in_features
        self.model.fc = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, num_outputs).
        """
        return self.model(x)

    def freeze_backbone(self):
        """
        Freezes all layers of the backbone except the final classification head.
        This is useful for transfer learning scenarios where only the head is fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = False
        
        # Ensure the final fully connected layer remains trainable
        for param in self.model.fc.parameters():
            param.requires_grad = True

    def unfreeze_backbone(self):
        """
        Unfreezes all layers of the backbone, allowing the entire model to be fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = True

model = ResNet50Custom(num_outputs=102, weights=ResNet50_Weights.DEFAULT)
model.freeze_backbone()
loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_resnet50_frozen_backbone.pt'
epochs = 5

score_funcs = {'ACC':accuracy_score}
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file)

print('frozen backbone results')
print(results)
# load the model with the frozen backbone and unfreeze it
model = load_model(ResNet50Custom(num_outputs=102), MODELS_PATH / 'HW05_resnet50_frozen_backbone.pt')
model.unfreeze_backbone()

# Configure Training for unfrozen model
ckpt_file = MODELS_PATH / 'HW05_resnet50_unfrozen_backbone.pt'
epochs = 10
optimizer = optim.AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.001, epochs=epochs, steps_per_epoch=len(train_loader))

# Train and save
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file,
                        lr_schedule=scheduler,
                        scheduler_step_per_batch=True,
                        #early_stop_crit='max',
                        #early_stop_metric='ACC',
                        #patience=1,
                        pretend_train=False)

print(results)

In [92]:
device = get_device()
model = load_model(ResNet50Custom(num_outputs=102), MODELS_PATH / 'HW05_resnet50_unfrozen_backbone.pt', device)
conf_mat,report,missed_dataset=evaluate_classifier(model, test_dataset, device, display_confusion=False,img_size=(8,8),use_class_labels=False)
print(report)

The dataset has 6149 samples.
The model misclassified 754 samples.
              precision    recall  f1-score   support

           0       0.36      1.00      0.53        20
           1       1.00      0.90      0.95        40
           2       0.50      0.80      0.62        20
           3       0.62      0.67      0.64        36
           4       0.93      0.62      0.75        45
           5       0.85      0.92      0.88        25
           6       0.80      1.00      0.89        20
           7       0.84      1.00      0.92        65
           8       0.74      0.88      0.81        26
           9       0.93      1.00      0.96        25
          10       0.93      0.60      0.73        67
          11       0.86      0.94      0.90        67
          12       0.88      0.97      0.92        29
          13       0.87      0.93      0.90        28
          14       0.69      1.00      0.82        29
          15       0.73      0.76      0.74        21
          16  

In [16]:
class EfficientNetV2SCustom(nn.Module):
    """
    A custom EfficientNetV2-S model with a modified final layer for a specified number of output classes.

    Args:
        num_outputs (int): The number of output classes for the modified final layer.
        weights (EfficientNet_V2_S_Weights or None): Pretrained weights to load for EfficientNetV2-S. If None, the model is randomly initialized.

    Methods:
        freeze_backbone(): Freezes all layers of the backbone except the final classification head.
        unfreeze_backbone(): Unfreezes all layers of the backbone.
    """
    def __init__(self, num_outputs: int, weights=None):
        """
        Initializes the EfficientNetV2SCustom model.

        Args:
            num_outputs (int): The number of output classes for the modified final layer.
            weights (EfficientNet_V2_S_Weights or None): Pretrained weights for EfficientNetV2-S. Defaults to None.
        """
        super(EfficientNetV2SCustom, self).__init__()
        # Load EfficientNetV2-S with specified weights (pretrained or None)
        self.model = models.efficientnet_v2_s(weights=weights)
        
        # Replace the final fully connected layer to fit the desired output size
        in_features = self.model.classifier[1].in_features
        self.model.classifier[1] = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, num_outputs).
        """
        return self.model(x)

    def freeze_backbone(self):
        """
        Freezes all layers of the backbone except the final classification head.
        This is useful for transfer learning scenarios where only the head is fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = False
        
        # Ensure the final fully connected layer remains trainable
        for param in self.model.classifier[1].parameters():
            param.requires_grad = True

    def unfreeze_backbone(self):
        """
        Unfreezes all layers of the backbone, allowing the entire model to be fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = True

from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

# Example usage
model = EfficientNetV2SCustom(num_outputs=102, weights=EfficientNet_V2_S_Weights.DEFAULT)
model.freeze_backbone()
loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_effnetv2s_frozen_backbone.pt'
epochs = 5

score_funcs = {'ACC':accuracy_score}
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file)
print('results from first frozen')
print(results)
# load the model with the frozen backbone and unfreeze it
model = load_model(EfficientNetV2SCustom(num_outputs=102), MODELS_PATH / 'HW05_effnetv2s_frozen_backbone.pt')
model.unfreeze_backbone()

# Configure Training for unfrozen model
ckpt_file = MODELS_PATH / 'HW05_effnetv2s_unfrozen_backbone.pt'
epochs = 10
optimizer = optim.AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.001, epochs=epochs, steps_per_epoch=len(train_loader))

# Train and save
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file,
                        lr_schedule=scheduler,
                        scheduler_step_per_batch=True,
                        #early_stop_crit='max',
                        #early_stop_metric='ACC',
                        #patience=1,
                        pretend_train=False)

print('results from unfrozen')
print(results)

cuda


Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

results from first frozen
   epoch  total time  train loss  val loss  train ACC   val ACC
0      0   11.369319    4.492456  4.012067   0.058824  0.309804
1      1   22.745737    3.621311  3.471067   0.447059  0.483333
2      2   34.182627    3.006349  3.023964   0.650000  0.565686
3      3   46.747025    2.511653  2.690017   0.733333  0.595098
4      4   58.774077    2.100312  2.434196   0.794118  0.621569


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

results from unfrozen
   epoch  total time  train loss  val loss  train ACC   val ACC            lr
0      0   12.209569    1.282725  1.169192   0.816667  0.761765  2.845967e-04
1      1   25.082786    0.449274  0.609663   0.925490  0.859804  7.691054e-04
2      2   38.376692    0.292723  0.775564   0.927451  0.810784  9.999508e-04
3      3   50.758855    0.330240  0.632020   0.914706  0.819608  9.473978e-04
4      4   62.625299    0.265910  0.613509   0.925490  0.839216  8.062326e-04
5      5   74.513244    0.127086  0.533884   0.974510  0.876471  6.044147e-04
6      6   86.665337    0.064149  0.413011   0.984314  0.895098  3.819165e-04
7      7   98.682648    0.052109  0.360036   0.991176  0.907843  1.828066e-04
8      8  110.750937    0.027482  0.344016   0.995098  0.913725  4.652118e-05
9      9  122.982529    0.025984  0.338021   0.995098  0.917647  5.317392e-08


In [14]:
class EfficientNetV2MCustom(nn.Module):
    """
    A custom EfficientNetV2-M model with a modified final layer for a specified number of output classes.

    Args:
        num_outputs (int): The number of output classes for the modified final layer.
        weights (EfficientNet_V2_M_Weights or None): Pretrained weights to load for EfficientNetV2-M. If None, the model is randomly initialized.

    Methods:
        freeze_backbone(): Freezes all layers of the backbone except the final classification head.
        unfreeze_backbone(): Unfreezes all layers of the backbone.
    """
    def __init__(self, num_outputs: int, weights=None):
        """
        Initializes the EfficientNetV2MCustom model.

        Args:
            num_outputs (int): The number of output classes for the modified final layer.
            weights (EfficientNet_V2_M_Weights or None): Pretrained weights for EfficientNetV2-M. Defaults to None.
        """
        super(EfficientNetV2MCustom, self).__init__()
        # Load EfficientNetV2-M with specified weights (pretrained or None)
        self.model = models.efficientnet_v2_m(weights=weights)
        
        # Replace the final fully connected layer to fit the desired output size
        in_features = self.model.classifier[1].in_features
        self.model.classifier[1] = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, num_outputs).
        """
        return self.model(x)

    def freeze_backbone(self):
        """
        Freezes all layers of the backbone except the final classification head.
        This is useful for transfer learning scenarios where only the head is fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = False
        
        # Ensure the final fully connected layer remains trainable
        for param in self.model.classifier[1].parameters():
            param.requires_grad = True

    def unfreeze_backbone(self):
        """
        Unfreezes all layers of the backbone, allowing the entire model to be fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = True

from torchvision.models import efficientnet_v2_m, EfficientNet_V2_M_Weights

# Example usage
model = EfficientNetV2MCustom(num_outputs=102, weights=EfficientNet_V2_M_Weights.DEFAULT)
model.freeze_backbone()
loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_effnetv2m_frozen_backbone.pt'
epochs = 5

score_funcs = {'ACC':accuracy_score}
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file)
print('results from first frozen')
print(results)
# load the model with the frozen backbone and unfreeze it
model = load_model(EfficientNetV2MCustom(num_outputs=102), MODELS_PATH / 'HW05_effnetv2m_frozen_backbone.pt')
model.unfreeze_backbone()

# Configure Training for unfrozen model
ckpt_file = MODELS_PATH / 'HW05_effnetv2m_unfrozen_backbone.pt'
epochs = 10
optimizer = optim.AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.001, epochs=epochs, steps_per_epoch=len(train_loader))

# Train and save
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file,
                        lr_schedule=scheduler,
                        scheduler_step_per_batch=True,
                        #early_stop_crit='max',
                        #early_stop_metric='ACC',
                        #patience=1,
                        pretend_train=False)

print('results from unfrozen')
print(results)

cuda


Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

results from first frozen
   epoch  total time  train loss  val loss  train ACC   val ACC
0      0   11.541814    4.622006  4.198492   0.039216  0.201961
1      1   22.926343    3.984223  3.757851   0.254902  0.384314
2      2   34.616255    3.542271  3.425205   0.413725  0.450980
3      3   46.276812    3.191760  3.124381   0.503922  0.493137
4      4   57.802936    2.870678  2.910887   0.587255  0.500980


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

results from unfrozen
   epoch  total time  train loss  val loss  train ACC   val ACC            lr
0      0   13.637757    1.647008  1.153886   0.686275  0.716667  2.845967e-04
1      1   28.386930    0.562287  0.934798   0.857843  0.755882  7.691054e-04
2      2   45.844816    0.433745  0.998435   0.881373  0.741176  9.999508e-04
3      3   62.852023    0.497462  0.889549   0.863725  0.768627  9.473978e-04
4      4   79.829498    0.315314  0.693951   0.922549  0.815686  8.062326e-04
5      5   96.138303    0.213922  0.509777   0.937255  0.870588  6.044147e-04
6      6  113.896498    0.085581  0.375674   0.978431  0.909804  3.819165e-04
7      7  130.724488    0.041973  0.289377   0.989216  0.927451  1.828066e-04
8      8  146.978459    0.016492  0.279471   0.997059  0.928431  4.652118e-05
9      9  163.910269    0.015655  0.279825   0.996078  0.927451  5.317392e-08


In [104]:
class ConvNeXtSmallCustom(nn.Module):
    """
    A custom ConvNeXt-Small model with a modified final layer for a specified number of output classes.

    Args:
        num_outputs (int): The number of output classes for the modified final layer.
        weights (ConvNeXt_Small_Weights or None): Pretrained weights to load for ConvNeXt-Small. If None, the model is randomly initialized.

    Methods:
        freeze_backbone(): Freezes all layers of the backbone except the final classification head.
        unfreeze_backbone(): Unfreezes all layers of the backbone.
    """


Downloading: "https://download.pytorch.org/models/convnext_small-0c510722.pth" to C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads\hub\checkpoints\convnext_small-0c510722.pth
100%|██████████| 192M/192M [00:03<00:00, 57.2MB/s] 


cuda


Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

results from first frozen
   epoch  total time  train loss  val loss  train ACC   val ACC
0      0   12.888771    4.394079  3.640191   0.087255  0.365686
1      1   24.798908    3.363116  2.818370   0.476471  0.574510
2      2   36.376364    2.680339  2.246206   0.634314  0.689216
3      3   48.677295    2.154083  1.873590   0.738235  0.711765
4      4   60.036880    1.772913  1.596199   0.792157  0.759804


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

results from unfrozen
   epoch  total time  train loss  val loss  train ACC   val ACC            lr
0      0   16.460839    0.912522  0.547375   0.883333  0.918627  2.845967e-04
1      1   35.332454    0.348125  0.413000   0.955882  0.904902  7.691054e-04
2      2   54.415028    0.397739  0.703783   0.902941  0.799020  9.999508e-04
3      3   71.640449    0.634795  0.724583   0.811765  0.804902  9.473978e-04
4      4   88.223327    0.398121  0.500378   0.889216  0.852941  8.062326e-04
5      5  104.519420    0.230908  0.497983   0.940196  0.878431  6.044147e-04
6      6  120.740600    0.065830  0.314118   0.983333  0.919608  3.819165e-04
7      7  137.474455    0.024095  0.256581   0.996078  0.928431  1.828066e-04
8      8  153.966809    0.016123  0.241121   0.998039  0.933333  4.652118e-05
9      9  170.211968    0.011608  0.238230   0.999020  0.935294  5.317392e-08


In [None]:
    def __init__(self, num_outputs: int, weights=None):
        """
        Initializes the ConvNeXtSmallCustom model.

        Args:
            num_outputs (int): The number of output classes for the modified final layer.
            weights (ConvNeXt_Small_Weights or None): Pretrained weights for ConvNeXt-Small. Defaults to None.
        """
        super(ConvNeXtSmallCustom, self).__init__()
        # Load ConvNeXt-Small with specified weights (pretrained or None)
        self.model = convnext_small(weights=weights)
        
        # Replace the final fully connected layer to fit the desired output size
        in_features = self.model.classifier[2].in_features
        self.model.classifier[2] = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, num_outputs).
        """
        return self.model(x)

    def freeze_backbone(self):
        """
        Freezes all layers of the backbone except the final classification head.
        This is useful for transfer learning scenarios where only the head is fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = False
        
        # Ensure the final fully connected layer remains trainable
        for param in self.model.classifier[2].parameters():
            param.requires_grad = True

    def unfreeze_backbone(self):
        """
        Unfreezes all layers of the backbone, allowing the entire model to be fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = True

from torchvision.models import convnext_small, ConvNeXt_Small_Weights

# Example usage
model = ConvNeXtSmallCustom(num_outputs=102, weights='DEFAULT')
model.freeze_backbone()
loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_convnext_frozen_backbone.pt'
epochs = 5

score_funcs = {'ACC':accuracy_score}
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file)
print('results from first frozen')
print(results)
# load the model with the frozen backbone and unfreeze it
model = load_model(ConvNeXtSmallCustom(num_outputs=102), MODELS_PATH / 'HW05_convnext_frozen_backbone.pt')
model.unfreeze_backbone()

# Configure Training for unfrozen model
ckpt_file = MODELS_PATH / 'HW05_convnext_unfrozen_backbone.pt'
epochs = 10
optimizer = optim.AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.001, epochs=epochs, steps_per_epoch=len(train_loader))

# Train and save
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file,
                        lr_schedule=scheduler,
                        scheduler_step_per_batch=True,
                        #early_stop_crit='max',
                        #early_stop_metric='ACC',
                        #patience=1,
                        pretend_train=False)

print('results from unfrozen')
print(results)
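
The freeze-then-unfreeze pattern used above can be checked on a small scale by counting trainable parameters at each stage. The following is a minimal sketch, not the homework model: `toy` is a hypothetical two-layer stand-in for `ConvNeXtSmallCustom`, and `count_trainable` is a helper introduced only for this illustration.

```python
import torch.nn as nn
import torch.optim as optim


def count_trainable(module: nn.Module) -> int:
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hypothetical stand-in: toy[0] plays the backbone, toy[2] plays the head.
toy = nn.Sequential(
    nn.Linear(8, 4),  # "backbone": 8*4 weights + 4 biases = 36 parameters
    nn.ReLU(),
    nn.Linear(4, 3),  # "head": 4*3 weights + 3 biases = 15 parameters
)
total = count_trainable(toy)

# Stage 1: freeze everything, then re-enable only the head.
for p in toy.parameters():
    p.requires_grad = False
for p in toy[2].parameters():
    p.requires_grad = True
frozen_trainable = count_trainable(toy)

# Optionally hand the optimizer only the trainable parameters.
head_optimizer = optim.AdamW(p for p in toy.parameters() if p.requires_grad)
n_opt = sum(p.numel() for g in head_optimizer.param_groups for p in g["params"])

# Stage 2: unfreeze everything and build a fresh optimizer over all parameters.
for p in toy.parameters():
    p.requires_grad = True
full_optimizer = optim.AdamW(toy.parameters())
unfrozen_trainable = count_trainable(toy)

print(total, frozen_trainable, n_opt, unfrozen_trainable)
```

Passing every parameter to the optimizer while the backbone is frozen, as the cell above does, also works because parameters without gradients are skipped. Filtering simply makes the intent explicit.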

class ConvNeXtSmallCustom(nn.Module):
    """
    A custom ConvNeXt-Small model with a modified final layer for a specified number of output classes.

    Args:
        num_outputs (int): The number of output classes for the modified final layer.
        weights (ConvNeXt_Small_Weights, str, or None): Pretrained weights to load for ConvNeXt-Small, e.g. 'DEFAULT'. If None, the model is randomly initialized.

    Methods:
        freeze_backbone(): Freezes all layers of the backbone except the final classification head.
        unfreeze_backbone(): Unfreezes all layers of the backbone.
    """
    def __init__(self, num_outputs: int, weights=None):
        """
        Initializes the ConvNeXtSmallCustom model.

        Args:
            num_outputs (int): The number of output classes for the modified final layer.
            weights (ConvNeXt_Small_Weights, str, or None): Pretrained weights for ConvNeXt-Small, e.g. 'DEFAULT'. Defaults to None.
        """
        super().__init__()
        # Load ConvNeXt-Small with specified weights (pretrained or None)
        self.model = convnext_small(weights=weights)
        
        # Replace the final fully connected layer to fit the desired output size
        in_features = self.model.classifier[2].in_features
        self.model.classifier[2] = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, channels, height, width).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, num_outputs).
        """
        return self.model(x)

    def freeze_backbone(self):
        """
        Freezes every parameter except those of the final linear layer (classifier[2]).
        This is useful for transfer learning scenarios where only the new output layer is trained first.
        """
        for param in self.model.parameters():
            param.requires_grad = False
        
        # Ensure the final fully connected layer remains trainable
        for param in self.model.classifier[2].parameters():
            param.requires_grad = True

    def unfreeze_backbone(self):
        """
        Unfreezes all layers of the backbone, allowing the entire model to be fine-tuned.
        """
        for param in self.model.parameters():
            param.requires_grad = True

from torchvision.models import convnext_small, ConvNeXt_Small_Weights

# Example usage
model = ConvNeXtSmallCustom(num_outputs=102, weights='DEFAULT')
loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_convnext.pt'


epochs = 15
optimizer = optim.AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.001, epochs=epochs, steps_per_epoch=len(train_loader))

# Train and save
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file,
                        lr_schedule=scheduler,
                        scheduler_step_per_batch=True,
                        #early_stop_crit='max',
                        #early_stop_metric='ACC',
                        #patience=1,
                        pretend_train=False)

print('results from unfrozen')
print(results)

cuda


results from unfrozen
    epoch  total time  train loss  val loss  train ACC   val ACC            lr
0       0   17.845672    4.579647  4.177164   0.032353  0.223529  1.538093e-04
1       1   36.665318    3.324904  1.781848   0.413725  0.771569  4.412683e-04
2       2   56.014508    1.141581  0.681193   0.835294  0.849020  7.660623e-04
3       3   73.363088    0.571640  0.731323   0.883333  0.809804  9.741722e-04
4       4   89.867398    0.567349  0.971712   0.869608  0.759804  9.936971e-04
5       5  106.066579    0.547120  0.720939   0.862745  0.804902  9.484366e-04
6       6  122.392435    0.375067  0.698968   0.900980  0.812745  8.633307e-04
7       7  138.543738    0.170329  0.481625   0.962745  0.874510  7.459415e-04
8       8  157.585560    0.079554  0.503962   0.979412  0.865686  6.066995e-04
9       9  176.653614    0.067205  0.350411   0.990196  0.900980  4.579769e-04
10     10  196.826587    0.038177  0.338550   0.991176  0.908824  3.129885e-04
11     11  216.331042    0.012
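
The `lr` column above follows the one-cycle shape: a cosine warm-up from `max_lr / 25` to `max_lr` over the first 30% of the steps, then a cosine decay towards zero. The sketch below re-implements that curve in plain Python so the logged values can be checked by hand. It assumes the `OneCycleLR` defaults (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`, cosine annealing) and 32 batches per epoch (1020 training images at `batch_size = 32`, last partial batch kept). `one_cycle_lr` is a name introduced here for illustration, not a PyTorch function.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate after `step` scheduler steps of a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # step at which max_lr is reached
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct = 0) to `end` (pct = 1).
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


# Assumption: 1020 training images, batch_size = 32, drop_last = False.
steps_per_epoch = math.ceil(1020 / 32)
total_steps = 15 * steps_per_epoch

# Learning rate once the first epoch and the fifth epoch have finished.
lr_epoch0 = one_cycle_lr(1 * steps_per_epoch, total_steps, max_lr=0.001)
lr_epoch4 = one_cycle_lr(5 * steps_per_epoch, total_steps, max_lr=0.001)
print(f"{lr_epoch0:.6e}  {lr_epoch4:.6e}")
```

Under these assumptions the two values agree, to within rounding, with the `lr` entries logged for epochs 0 and 4 in the table above. The sketch also shows why `scheduler_step_per_batch=True` matters with this configuration: stepping once per epoch would cover only 15 of the 480 scheduled steps.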

## ViT Small

import timm
import torch.nn as nn

# Load pre-trained ViT-Small model
model = timm.create_model('vit_small_patch16_224', pretrained=True)

# Modify the classification head for 102 outputs
model.head = nn.Linear(model.head.in_features, 102)

loss_func = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.AdamW(model.parameters())  # AdamW optimizer

epochs = 15
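# NOTE: this schedule is sized for one step per batch (epochs * len(train_loader) steps),
# but the train_network call in this cell does not pass scheduler_step_per_batch=True
# (the ConvNeXt cells do). The lr column in the output stays near max_lr / 25 = 2e-5,
# which suggests the scheduler advanced only once per epoch in this run.
scheduler = OneCycleLR(optimizer, max_lr=0.0005, epochs=epochs, steps_per_epoch=len(train_loader))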

device = get_device()
print(device)

ckpt_file = MODELS_PATH / 'HW05_vit.pt'


score_funcs = {'ACC':accuracy_score}
results = train_network(model,
                        loss_func,
                        train_loader,
                        device=device,
                        val_loader=valid_loader,
                        epochs = epochs,
                        optimizer = optimizer,
                        lr_schedule=scheduler,
                        score_funcs = score_funcs,
                        checkpoint_file=ckpt_file)

print(results)


cuda


    epoch  total time  train loss  val loss  train ACC   val ACC        lr
0       0   11.626570    5.220364  4.602282   0.030392  0.070588  0.000020
1       1   25.202290    4.032850  3.722921   0.124510  0.180392  0.000020
2       2   39.728007    3.046734  2.836828   0.341176  0.397059  0.000020
3       3   54.210005    2.208314  2.126258   0.596078  0.575490  0.000021
4       4   68.995062    1.479452  1.578535   0.762745  0.728431  0.000021
5       5   83.802866    0.994949  1.165487   0.868627  0.815686  0.000021
6       6   98.626444    0.637018  0.880288   0.948039  0.863725  0.000022
7       7  113.155464    0.440195  0.731099   0.961765  0.891176  0.000023
8       8  128.264617    0.294525  0.610322   0.982353  0.918627  0.000024
9       9  144.346943    0.209792  0.539418   0.992157  0.922549  0.000025
10     10  159.578101    0.162213  0.476298   0.992157  0.929412  0.000026
11     11  173.948681    0.132028  0.443842   0.991176  0.930392  0.000027
12     12  188.169427    