# Fine-tuning pretrained models in PyTorch

The `12_fine_tuning_pretrained_models` notebook focuses on fine-tuning pretrained models to improve performance on new tasks. Fine-tuning involves freezing some layers so that they act as fixed feature extractors, and unfreezing others so that they can adapt to the new task.

This notebook covers dataset preparation, loading a pretrained model, freezing layers for feature extraction, unfreezing layers for fine-tuning, and training the model. Additionally, it explores evaluating model performance and experimenting with hyperparameters to optimize the fine-tuning process.

## Table of contents

1. [Understanding fine-tuning in transfer learning](#understanding-fine-tuning-in-transfer-learning)
2. [Setting up the environment](#setting-up-the-environment)
3. [Preparing the dataset](#preparing-the-dataset)
4. [Loading a pretrained model](#loading-a-pretrained-model)
5. [Freezing layers for feature extraction](#freezing-layers-for-feature-extraction)
6. [Unfreezing layers for fine-tuning](#unfreezing-layers-for-fine-tuning)
7. [Training the model](#training-the-model)
8. [Evaluating model performance](#evaluating-model-performance)
9. [Experimenting with hyperparameters](#experimenting-with-hyperparameters)
10. [Conclusion](#conclusion)

## Understanding fine-tuning in transfer learning

Fine-tuning is a transfer learning technique where a pretrained model, trained on a large dataset for one task, is adapted to a new, related task by allowing part of the model to be retrained (or fine-tuned) on the new dataset. The goal is to leverage the knowledge the model has already gained from the large dataset and adjust it to better suit the specific characteristics of the new task.

### **Why fine-tuning?**

Fine-tuning is particularly useful when:
- **You have limited data**: Instead of training a model from scratch, which would require a large dataset to avoid overfitting, fine-tuning allows you to adapt a model that has already learned important features from a large dataset (e.g., ImageNet).
- **The new task is related to the original task**: When the new task is somewhat similar to the task the model was originally trained on, fine-tuning the model helps adjust it to specific nuances of the new task while retaining the general knowledge it has already learned.
- **Performance improvements**: Fine-tuning can help squeeze better performance out of pretrained models by adapting them more closely to the new data.

### **Key concepts in fine-tuning**

#### **Pretrained models**

Fine-tuning always starts with a pretrained model. Pretrained models, such as ResNet, VGG, and others, are typically trained on large datasets like ImageNet. These models learn to extract general features from the data—like edges, textures, and shapes in the case of image datasets—that can be useful across a wide variety of tasks.

The layers of a pretrained model are typically divided into two types:
- **Early layers**: These layers learn low-level, general features (e.g., edges, gradients). These features are often useful across different tasks and tend to be more stable.
- **Later layers**: These layers learn higher-level, more task-specific features (e.g., object parts, specific categories) and may need more adaptation for new tasks.

#### **Feature extraction vs fine-tuning**

In **feature extraction**, the pretrained model's weights remain fixed (frozen), and only the final classifier layer is replaced and trained for the new task. This approach works well when the new dataset is small or when the new task is closely related to the original task.

In **fine-tuning**, the final classification layer is replaced, but some or all of the layers in the pretrained model are also unfrozen and retrained on the new dataset. This approach allows the model to adjust its previously learned features to better suit the specific task at hand.
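
As a rough illustration of the feature-extraction setup, the sketch below freezes every existing parameter and then swaps in a new classifier head. The small `nn.Sequential` model is only a toy stand-in for a pretrained network, and `num_classes = 10` is an assumed value; with a torchvision ResNet the head would be the `fc` attribute instead of the last item of a `Sequential`.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained network (assumption): a tiny "backbone"
# followed by a 1000-way classifier, mimicking an ImageNet-style model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1000),
)

# Feature extraction: freeze every existing parameter...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the head. Newly created layers are trainable by default.
num_classes = 10  # hypothetical number of classes in the new task
model[-1] = nn.Linear(8, num_classes)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['4.weight', '4.bias']
```

For fine-tuning, the same starting point applies, but some of the frozen parameters are switched back to `requires_grad = True` before training.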

Fine-tuning is generally more effective than feature extraction when:
- The new task differs noticeably from the original task.
- The new dataset is large enough to support retraining without overfitting.
- The model’s features need to be adapted to the specifics of the new dataset.

### **How fine-tuning works**

Fine-tuning involves selectively retraining layers of the pretrained model. Here’s a typical process:

1. **Replace the final layer**: The final classification layer of the pretrained model is removed and replaced with a new layer specific to the new task. For instance, if the original model was trained on ImageNet (which has 1,000 classes) but the new task has only 10 classes, the final layer is replaced with a new one that matches the number of classes in the new dataset.
2. **Freeze earlier layers**: Often, the early layers of the pretrained model are kept frozen. These layers contain low-level features that are general enough to be useful across many tasks. Freezing these layers helps preserve the features they have learned and reduces the risk of overfitting.
3. **Unfreeze later layers**: The later layers of the model are usually task-specific and are unfrozen so that they can be retrained on the new dataset. Fine-tuning these layers allows the model to adjust to the new task by learning task-specific features from the new data.
4. **Train with a lower learning rate**: Fine-tuning is typically done with a much lower learning rate than training from scratch. This is because the pretrained model’s weights already contain valuable information, and large updates to these weights could cause the model to forget what it has already learned (a phenomenon known as **catastrophic forgetting**). A lower learning rate helps make smaller adjustments to the weights, allowing the model to fine-tune its features without losing the general knowledge it has gained.
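
The four steps above can be sketched as follows. This is a minimal example under stated assumptions: the model is a toy stand-in for a pretrained CNN, the block names `early`, `late`, and `fc` are hypothetical, and the learning rate is an illustrative value, not a recommendation.

```python
from collections import OrderedDict

import torch
from torch import nn

# Toy stand-in for a pretrained CNN (assumption); block names are hypothetical.
model = nn.Sequential(OrderedDict([
    ("early", nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())),
    ("late", nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())),
    ("pool", nn.AdaptiveAvgPool2d(1)),
    ("flatten", nn.Flatten()),
    ("fc", nn.Linear(16, 1000)),
]))

# 1. Replace the final layer (10 classes is an assumed value).
model.fc = nn.Linear(model.fc.in_features, 10)

# 2. Freeze the early layers.
for p in model.early.parameters():
    p.requires_grad = False

# 3. Keep the later layers trainable (set explicitly for clarity).
for p in model.late.parameters():
    p.requires_grad = True

# 4. Train only the trainable parameters, with a small learning rate.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=1e-3, momentum=0.9)

# One training step on random data, just to show the mechanics.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```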

### **How fine-tuning affects the model**

#### **Gradient flow in fine-tuning**

When fine-tuning, gradients are computed for the parameters of the unfrozen layers and used to update their weights. Frozen parameters (those with `requires_grad` set to `False`) receive no gradients during backpropagation, so the optimizer leaves them unchanged. This allows the model to gradually adapt its features without overwriting the general-purpose features learned from the original dataset.

The layers that are unfrozen are typically the later layers, where more task-specific features are learned. The early layers remain frozen because the low-level features they capture are often relevant across a wide range of tasks.
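
A small experiment can make this concrete. The sketch below uses two linear layers as hypothetical stand-ins for the "early" and "late" parts of a network, freezes the first, and checks what happens after one optimizer step.

```python
import torch
from torch import nn

torch.manual_seed(0)

early = nn.Linear(4, 4)  # stands in for the frozen early layers
late = nn.Linear(4, 2)   # stands in for the trainable later layers
for p in early.parameters():
    p.requires_grad = False

weights_before = early.weight.clone()
optimizer = torch.optim.SGD(late.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = late(torch.relu(early(x))).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(early.weight.grad)                          # None
print(late.weight.grad is not None)               # True
print(torch.equal(weights_before, early.weight))  # True
```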

#### **Overfitting and regularization**

A key challenge in fine-tuning is avoiding overfitting, particularly when the new dataset is small. Pretrained models usually have many parameters, so retraining too many layers on a much smaller dataset can lead to overfitting. To mitigate this, regularization techniques such as **dropout** or **weight decay** can be applied during training. Additionally, freezing some layers and fine-tuning only the later ones helps reduce the risk of overfitting.
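
One way to apply both techniques, sketched under assumptions: dropout is placed in front of the new classifier layer, and weight decay is set through the optimizer. The feature size, class count, dropout probability, and optimizer settings below are illustrative values only.

```python
import torch
from torch import nn

in_features, num_classes = 512, 10  # hypothetical sizes

# New classifier head with dropout in front of the linear layer.
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(in_features, num_classes),
)

# Weight decay is configured on the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4, weight_decay=1e-2)

features = torch.randn(4, in_features)

head.train()  # dropout is active: activations are randomly zeroed
train_out = head(features)

head.eval()   # dropout is disabled: outputs are deterministic
eval_out = head(features)
```

Remember to call `model.eval()` before validation or testing, otherwise dropout keeps perturbing the predictions.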

#### **Transferability of layers**

The effectiveness of fine-tuning depends on the similarity between the original and new tasks. If the tasks are similar (e.g., both involve classifying everyday photographs), fine-tuning fewer layers might be sufficient, as the pretrained model already contains useful features. However, if the tasks are different (e.g., moving from classifying animals in photographs to classifying medical images), more layers may need to be fine-tuned, as the higher-level features learned by the pretrained model may not be as relevant.

### **Fine-tuning strategies**

There are several strategies for fine-tuning pretrained models, depending on the new task and dataset size:

- **Small dataset, similar task**: In this case, feature extraction may be sufficient. The features learned by the pretrained model are likely applicable to the new task, so freezing most layers and only retraining the classifier might yield good results.
- **Large dataset, similar task**: Fine-tuning the later layers can be effective, as the dataset size provides enough data to retrain the model without overfitting, while the model’s existing features still provide a strong starting point.
- **Small dataset, dissimilar task**: Fine-tuning can be risky due to overfitting, so freezing most layers and fine-tuning just a few may be a safer approach.
- **Large dataset, dissimilar task**: In this scenario, you can fine-tune more layers or even the entire model, as the large dataset will support deeper retraining and help the model learn new, task-specific features.

### **Advantages of fine-tuning**

Fine-tuning pretrained models offers several benefits:
- **Faster convergence**: Because the model starts from pretrained weights, it typically needs fewer epochs to reach good performance than a model trained from scratch.
- **Improved performance**: Fine-tuning allows the model to adapt its learned features to the specific details of the new task, often leading to better performance.
- **Resource efficiency**: Fine-tuning enables the reuse of large, expensive models that were trained on large datasets, reducing the need for costly retraining from scratch.

### **Challenges of fine-tuning**

While fine-tuning is effective, it also presents challenges:
- **Overfitting**: When the new dataset is small, there is a risk of overfitting, especially if too many layers are fine-tuned.
- **Catastrophic forgetting**: Fine-tuning with too high a learning rate can cause the model to forget the general-purpose features it has learned, leading to performance degradation.
- **Choosing which layers to fine-tune**: Determining which layers to freeze and which to unfreeze can be challenging and often requires experimentation.

## Setting up the environment


##### **Q1: How do you install the necessary libraries for loading pretrained models and training in PyTorch?**


##### **Q2: How do you import the required modules for model loading, training, and dataset handling in PyTorch?**


##### **Q3: How do you set up your environment to use a GPU if available, or fall back to CPU in PyTorch?**
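
A common pattern, shown here as a minimal sketch, is to pick the device once and move tensors (and later the model) to it:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 3).to(device)
print(device.type, x.device.type)
```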


##### **Q4: How do you set a random seed in PyTorch to ensure reproducibility of results?**
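
One possible approach is a small helper that seeds Python's `random` module and PyTorch together. The name `set_seed` and the default value are assumptions; if NumPy is used as well, `numpy.random.seed` would need the same treatment.

```python
import random

import torch

def set_seed(seed: int = 42) -> None:
    """Seed Python's and PyTorch's random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)  # also seeds CUDA generators when present

set_seed(0)
a = torch.rand(3)
set_seed(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True
```

Seeding makes runs repeatable on the same setup, but some GPU operations can still be nondeterministic.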

## Preparing the dataset


##### **Q5: How do you load an image dataset using `torchvision.datasets` in PyTorch?**


##### **Q6: How do you apply image transformations (e.g., resizing and normalization) to prepare the dataset for a pretrained model?**


##### **Q7: How do you split a dataset into training, validation, and test sets using PyTorch?**
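
A minimal sketch using `torch.utils.data.random_split`. The dataset here is synthetic (random tensors) so the example runs without downloads, and the 70/15/15 split is an assumed ratio.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in dataset (assumption): 100 random 32x32 RGB images, 10 classes.
dataset = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```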


##### **Q8: How do you create DataLoaders for efficient batch processing of the dataset in PyTorch?**
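
As a sketch, a `DataLoader` wraps a dataset and yields shuffled mini-batches. The synthetic dataset and the batch size of 16 are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a training split (assumption).
train_set = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

# Shuffle the training data; validation and test loaders usually use shuffle=False.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([16, 3, 32, 32]) torch.Size([16])
```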

## Loading a pretrained model


##### **Q9: How do you load a pretrained model, such as ResNet or VGG, from PyTorch’s `torchvision.models`?**


##### **Q10: How do you inspect and print the architecture of a pretrained model to understand its layers?**


##### **Q11: How do you modify the final fully connected layer of a pretrained model to match the number of classes in your dataset?**


##### **Q12: How do you print out the total number of trainable parameters in the pretrained model?**
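
One way to do this is to sum `numel()` over the parameters, filtering on `requires_grad`. The helper name `count_parameters` and the small model are hypothetical.

```python
from torch import nn

def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Small stand-in model (assumption) with its first layer frozen.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
for p in model[0].parameters():
    p.requires_grad = False

print(count_parameters(model))  # (67, 12)
```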

## Freezing layers for feature extraction


##### **Q13: How do you freeze all layers of the pretrained model to prevent their weights from being updated during training?**


##### **Q14: How do you verify that the pretrained layers are frozen by checking their `requires_grad` attribute?**


##### **Q15: How do you ensure that only the final fully connected layer is updated while the rest of the pretrained model remains frozen?**

## Unfreezing layers for fine-tuning


##### **Q16: How do you unfreeze specific layers or blocks in the pretrained model for fine-tuning?**
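
A possible approach is to freeze everything with `requires_grad_(False)` and then re-enable gradients on selected submodules. The block names below are hypothetical; in a torchvision ResNet the corresponding attributes would be, for example, `layer4` and `fc`.

```python
from collections import OrderedDict

from torch import nn

# Toy stand-in for a pretrained model (assumption); names are hypothetical.
model = nn.Sequential(OrderedDict([
    ("block1", nn.Linear(8, 8)),
    ("block2", nn.Linear(8, 8)),
    ("head", nn.Linear(8, 3)),
]))

model.requires_grad_(False)        # freeze everything
model.block2.requires_grad_(True)  # unfreeze the last block
model.head.requires_grad_(True)    # unfreeze the classifier

# Verify which parameters will be updated.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['block2.weight', 'block2.bias', 'head.weight', 'head.bias']
```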


##### **Q17: How do you unfreeze all layers of the model to fine-tune the entire model?**


##### **Q18: How do you verify which layers are unfrozen and will be updated during fine-tuning?**

## Training the model


##### **Q19: How do you define the loss function (e.g., CrossEntropyLoss) for training the fine-tuned model?**


##### **Q20: How do you configure the optimizer (e.g., Adam or SGD) to update the model's parameters during training?**


##### **Q21: How do you implement a training loop that performs forward pass, loss calculation, and backpropagation for fine-tuning the model?**
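
A minimal training loop might look like the sketch below. The function name `train_one_epoch`, the small fully connected model, and the random data are all assumptions chosen so the example runs quickly; a real run would use the fine-tuned model and the image DataLoaders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over the loader; return (mean loss, accuracy)."""
    model.train()
    running_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()               # clear old gradients
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # loss calculation
        loss.backward()                     # backpropagation
        optimizer.step()                    # parameter update
        running_loss += loss.item() * inputs.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += inputs.size(0)
    return running_loss / seen, correct / seen

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Synthetic data and a toy model (assumptions).
data = TensorDataset(torch.randn(32, 20), torch.randint(0, 3, (32,)))
loader = DataLoader(data, batch_size=8, shuffle=True)
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

for epoch in range(3):
    loss, acc = train_one_epoch(model, loader, criterion, optimizer, device)
    print(f"epoch {epoch + 1}: loss={loss:.4f} acc={acc:.2f}")
```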


##### **Q22: How do you implement gradient clipping to prevent exploding gradients during training in PyTorch?**
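
Gradient clipping is usually applied between `loss.backward()` and `optimizer.step()`. The sketch below uses `torch.nn.utils.clip_grad_norm_` with an assumed `max_norm` of 1.0; the inputs are deliberately scaled up to produce large gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Large-magnitude inputs (assumption) to provoke large gradients.
inputs = torch.randn(16, 10) * 100
targets = torch.randint(0, 2, (16,))

loss = nn.functional.cross_entropy(model(inputs), targets)
optimizer.zero_grad()
loss.backward()

# Rescale gradients in place so their combined L2 norm is at most max_norm.
# The return value is the norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
print(f"before: {norm_before.item():.3f}, after: {norm_after.item():.3f}")
```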


##### **Q23: How do you monitor and plot the training loss and accuracy over epochs during fine-tuning?**

## Evaluating model performance


##### **Q24: How do you evaluate the fine-tuned model on the validation dataset using PyTorch?**


##### **Q25: How do you calculate and print the accuracy of the fine-tuned model on the test set?**
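
A sketch of an accuracy helper follows. To keep the result easy to check, `nn.Identity` stands in for a fine-tuned model and the labels are constructed as the argmax of the inputs, so the expected accuracy is exactly 1.0. Both choices are assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()  # no gradients are needed for evaluation
def evaluate_accuracy(model, loader, device):
    model.eval()  # disable dropout, use running batch-norm statistics
    correct, total = 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        preds = model(inputs).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.size(0)
    return correct / total

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic "test set" whose labels equal the argmax of the inputs (assumption).
scores = torch.randn(20, 3)
test_loader = DataLoader(TensorDataset(scores, scores.argmax(dim=1)), batch_size=5)

accuracy = evaluate_accuracy(nn.Identity(), test_loader, device)
print(f"Test accuracy: {accuracy:.2%}")  # Test accuracy: 100.00%
```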


##### **Q26: How do you visualize the confusion matrix for the model’s predictions on the test set?**
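
Before plotting, the confusion matrix has to be computed. The sketch below does this in plain PyTorch with `torch.bincount`; the helper name and the example labels are hypothetical. Rows correspond to true classes and columns to predicted classes.

```python
import torch

def confusion_matrix(targets: torch.Tensor, preds: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Count (true class, predicted class) pairs; rows are true classes."""
    flat_index = targets * num_classes + preds
    counts = torch.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

# Hypothetical labels and predictions for a 3-class problem.
targets = torch.tensor([0, 0, 1, 1, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(targets, preds, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [1, 0, 1]])
```

The resulting tensor can then be passed to a plotting library, for example as a heatmap, to visualize where the model confuses classes.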


##### **Q27: How do you visualize the model’s predictions versus the ground truth labels on a batch of test images?**

## Experimenting with hyperparameters


##### **Q28: How do you adjust the learning rate for the unfrozen layers during fine-tuning?**
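
One option is to give the optimizer separate parameter groups, each with its own learning rate, so that unfrozen pretrained layers move more slowly than the new classifier. The module names and learning rates below are assumed values for illustration.

```python
from collections import OrderedDict

import torch
from torch import nn

# Toy stand-in model (assumption); names are hypothetical.
model = nn.Sequential(OrderedDict([
    ("backbone", nn.Linear(8, 8)),
    ("head", nn.Linear(8, 3)),
]))

optimizer = torch.optim.SGD(
    [
        {"params": model.backbone.parameters(), "lr": 1e-4},  # pretrained layers
        {"params": model.head.parameters(), "lr": 1e-2},      # new classifier
    ],
    lr=1e-3,  # default for any group that does not set its own "lr"
    momentum=0.9,
)

learning_rates = [group["lr"] for group in optimizer.param_groups]
print(learning_rates)  # [0.0001, 0.01]
```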


##### **Q29: How do you experiment with different batch sizes and observe the effect on training speed and performance?**


##### **Q30: How do you modify the number of training epochs and evaluate its effect on the fine-tuned model’s performance?**


##### **Q31: How do you experiment with different optimizers (e.g., Adam vs. SGD) and their parameters to optimize the fine-tuning process?**

## Conclusion