What is transfer learning?
--------------------------

Transfer learning (TL) is a [machine learning (ML)](https://aws.amazon.com/what-is/machine-learning/) technique where a model pre-trained on one task is fine-tuned for a new, related task. Training a new ML model is a time-consuming and intensive process that requires a large amount of data, computing power, and several iterations before it is ready for production. Instead, organizations use TL to retrain existing models on related tasks with new data. For example, if a machine learning model can identify images of dogs, it can be trained to identify cats using a smaller image set that highlights the feature differences between dogs and cats.

What are the benefits of transfer learning?
-------------------------------------------

TL offers the following benefits to researchers creating ML applications.

### Enhanced efficiency

Training ML models takes time as they build knowledge and identify patterns. It also requires a large dataset and is computationally expensive. In TL, a pre-trained model retains fundamental knowledge of tasks, features, weights, and functions, allowing it to adapt to new tasks faster. You can use a much smaller dataset and fewer resources, often while achieving comparable or better results.

### Increased accessibility

Building deep-learning neural networks requires large data volumes, resources, computing power, and time. TL overcomes these barriers to creation, allowing organizations to adopt ML for custom use cases. You can adapt existing models to your requirements at a fraction of the cost. For example, using a pre-trained image recognition model, you can create models for medical imaging analysis, environmental monitoring, or facial recognition with minimal adjustments.

### Improved performance

Models developed through TL often demonstrate greater robustness in diverse and challenging environments. They better handle real-world variability and noise, having been exposed to a wide range of scenarios in their initial training. They give better results and adapt to unpredictable conditions more flexibly.

What are the different transfer learning strategies?
----------------------------------------------------

The strategy you use to facilitate TL will depend on the domain of the model you are building, the task it needs to complete, and the availability of training data.

### Transductive transfer learning

*Transductive transfer learning* involves transferring knowledge from a specific source domain to a different but related target domain, with the primary focus being on the target domain. It is especially useful when there is little or no labeled data from the target domain.

In transductive transfer learning, the model makes predictions on target data using the knowledge it gained from the source domain. Because the target data is mathematically similar to the source data, the patterns the model has already learned carry over, and it needs less additional training.

For example, consider adapting a sentiment analysis model trained on product reviews to analyze movie reviews. The source domain (product reviews) and the target domain (movie reviews) differ in context and specifics but share similarities in structure and language use. The model quickly learns to apply its understanding of sentiment from the product domain to the movie domain.

### Inductive transfer learning

*Inductive transfer learning* applies when the source and target domains are the same, but the tasks the model must complete differ. Because the pre-trained model is already familiar with the source data, it trains faster on the new tasks.

An example of inductive transfer learning is in natural language processing (NLP). Models are pre-trained on a large set of texts and then fine-tuned for specific tasks like sentiment analysis. Similarly, computer vision models like VGG are pre-trained on large image datasets and then fine-tuned for tasks such as object detection.

### Unsupervised transfer learning

*Unsupervised transfer learning* uses a strategy similar to inductive transfer learning to develop new abilities. However, you use this form of transfer learning when you only have unlabeled data in both the source and target domains. 

The model learns the common features of unlabeled data to generalize more accurately when asked to perform a target task. This method is helpful if it is challenging or expensive to obtain labeled source data.

For example, consider the task of identifying different types of motorcycles in traffic images. Initially, the model is trained on a large set of unlabeled vehicle images. In this instance, the model independently determines the similarities and distinguishing features among different types of vehicles like cars, buses, and motorcycles. Next, the model is introduced to a small, specific set of motorcycle images. Having already learned general vehicle features, the model identifies motorcycle types significantly better than it did before this step.

What are the steps in transfer learning?
----------------------------------------

There are three main steps when fine-tuning a machine-learning model for a new task.

### Select a pre-trained model

First, select a pre-trained model with prior knowledge or skills for a related task. A useful way to choose a suitable model is to determine the source task of each candidate. If you understand the original tasks a model performed, you can find one that transitions more effectively to the new task.

### Configure your pre-trained models

After selecting your source model, configure it so that its knowledge carries over to the related task. There are three main methods of doing this.

#### *Freeze pre-trained layers*

Layers are the building blocks of neural networks. Each layer consists of a set of neurons and performs specific transformations on the input data. Weights are the parameters the network uses for decision-making. Initially set to random values, weights are adjusted during the training process as the model learns from the data.

By freezing the weights of the pre-trained layers, you keep them fixed, preserving the knowledge that the [deep learning](https://aws.amazon.com/what-is/deep-learning/) model obtained from the source task.
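As a minimal sketch of freezing, the toy example below models each layer as a list of weights with a `trainable` flag and skips frozen layers during a gradient-descent update. The `Layer` class, the `sgd_step` function, and the gradient values are hypothetical and chosen only for illustration. Deep learning frameworks expose their own mechanisms for this (in PyTorch, for instance, by setting `requires_grad = False` on a parameter).

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Toy layer: a flat list of weights plus a flag marking it trainable."""
    weights: list
    trainable: bool = True


def sgd_step(layers, grads, lr=0.1):
    """Apply one gradient-descent update, skipping frozen layers."""
    for layer, grad in zip(layers, grads):
        if not layer.trainable:
            continue  # frozen: weights stay exactly as pre-trained
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grad)]


# Hypothetical pre-trained layer and a freshly initialized layer for the new task.
pretrained = Layer(weights=[0.5, -1.2, 0.8])
head = Layer(weights=[0.0, 0.0])

# Freeze the pre-trained layer so training cannot overwrite it.
pretrained.trainable = False

# Made-up gradients, one list per layer, purely for demonstration.
grads = [[1.0, 1.0, 1.0], [0.5, -0.5]]
sgd_step([pretrained, head], grads)

print("pre-trained weights:", pretrained.weights)
print("new layer weights:  ", head.weights)
```

After the update, the pre-trained weights are unchanged while the new layer's weights have moved, which is the behavior freezing is meant to guarantee.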

#### *Remove the last layer*

In some use cases, you can also remove the last layers of the pre-trained model. In most ML architectures, the last layers are task-specific. Removing these final layers helps you reconfigure the model for new task requirements.

#### *Introduce new layers*

Introducing new layers on top of your pre-trained model helps you adapt to the specialized nature of the new task. The new layers adapt the model to the nuances and functions of the new requirement.
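The following sketch illustrates removing a task-specific last layer and adding a new one, under the simplifying assumption that a model is just an ordered list of layer functions. The helper names (`relu_layer`, `linear_layer`, `run`) and all weight values are hypothetical. A real pre-trained network would be loaded from a framework or model hub rather than written by hand.

```python
def relu_layer(weights):
    """Return a layer computing ReLU(W x) for a fixed weight matrix W."""
    def forward(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return forward


def linear_layer(weights):
    """Return a layer computing W x with no activation (a typical output head)."""
    def forward(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return forward


def run(model, x):
    """Pass input x through each layer of the model in order."""
    for layer in model:
        x = layer(x)
    return x


# Hypothetical pre-trained model: two feature layers and a 3-class output head.
pretrained = [
    relu_layer([[1.0, -1.0], [0.5, 0.5]]),
    relu_layer([[1.0, 0.0], [0.0, 1.0]]),
    linear_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),  # task-specific head
]

# Remove the last layer, keeping the general-purpose feature layers.
backbone = pretrained[:-1]

# Introduce a new head with a single output for the new task.
# It starts at zero and would be learned during training.
new_head = linear_layer([[0.0, 0.0]])
new_model = backbone + [new_head]

x = [2.0, 1.0]
print("old output size:", len(run(pretrained, x)))
print("new output size:", len(run(new_model, x)))
```

The feature layers are shared between the two models, so only the new head needs to be learned for the target task.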

### Train the model for the target domain

You train the model on target task data so that its output aligns with the new task. Before this training, the pre-trained model likely produces outputs that differ from those desired. After monitoring and evaluating the model's performance during training, you can adjust the hyperparameters or the baseline neural network architecture to improve the output further. Unlike weights, hyperparameters are not learned from the data. They are set in advance and play a crucial role in determining the efficiency and effectiveness of the training process. For example, you could adjust regularization parameters or the model's learning rate to improve its performance on the target task.
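To make the distinction between weights and hyperparameters concrete, here is a small sketch that trains a logistic-regression layer on fixed feature vectors using plain gradient descent. The feature values stand in for the output of a frozen pre-trained model and are invented for illustration, as are the function names and the default values of `learning_rate`, `l2`, and `epochs`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, learning_rate=0.5, l2=0.01, epochs=200):
    """Fit a logistic-regression layer with batch gradient descent.

    learning_rate, l2 (regularization strength) and epochs are
    hyperparameters: you choose them, the data does not.
    The returned w and b are weights: they are learned from the data.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        # The L2 penalty pulls weights toward zero to limit overfitting.
        w = [wi - learning_rate * (gw + l2 * wi) for wi, gw in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == bool(y)
        for x, y in zip(features, labels)
    )
    return correct / len(labels)


# Invented feature vectors standing in for frozen pre-trained features.
features = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.1, 2.2]]
labels = [1, 1, 0, 0]

w, b = train_head(features, labels)
print("training accuracy:", accuracy(w, b, features, labels))
```

In practice you would evaluate on held-out data rather than the training set, and rerun `train_head` with different `learning_rate` or `l2` values to compare results.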

![TL](https://www.ruder.io/content/images/2017/03/andrew_ng_drivers_ml_success-1.png)

![Image](https://github.com/user-attachments/assets/bcebc7eb-1a26-4497-96ab-6707b9610780)

![Image](https://github.com/user-attachments/assets/0f0b5768-4d9d-4da2-b0b5-cc574bffab15)

Transfer learning works because it leverages knowledge gained from solving one problem and applies it to a different but related problem. Here's why it is effective:

1. **Feature Reusability**: In deep learning, earlier layers of a model often learn general features (like edges, textures, or shapes in images), while later layers learn task-specific features. These general features are often useful across different tasks, so reusing them saves time and computational resources.

2. **Reduced Data Requirements**: Training a deep neural network from scratch requires a large amount of labeled data. Transfer learning allows you to use pre-trained models, which have already been trained on massive datasets (e.g., ImageNet), reducing the need for extensive labeled data for your specific task.

3. **Faster Training**: Since the model starts with pre-trained weights, it converges faster during fine-tuning compared to training from scratch. This is especially useful when computational resources are limited.

4. **Domain Similarity**: Transfer learning is particularly effective when the source domain (the domain the model was pre-trained on) and the target domain (your specific task) share similarities. For example, a model trained on general image classification can be fine-tuned for medical imaging tasks.

5. **Avoiding Overfitting**: When you have a small dataset, training a large model from scratch can lead to overfitting. Transfer learning mitigates this by starting with weights that already encode meaningful patterns, reducing the risk of overfitting.

In summary, transfer learning works because it builds on the general knowledge encoded in pre-trained models, making it easier and faster to adapt to new tasks with limited data.
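One way to see why starting from a pre-trained model can reduce training cost is to count the parameters that actually need to be learned. The sketch below does this arithmetic for a hypothetical fully connected network. The layer sizes (784, 512, 256) and the two-class new head are assumptions chosen only for illustration.

```python
def dense_params(n_in, n_out):
    """Parameters in a fully connected layer: one weight per connection plus biases."""
    return n_in * n_out + n_out


# Hypothetical pre-trained feature layers: 784 -> 512 -> 256.
backbone_sizes = [784, 512, 256]
backbone = sum(
    dense_params(a, b) for a, b in zip(backbone_sizes, backbone_sizes[1:])
)

# New task-specific layer: 256 features -> 2 classes.
head = dense_params(256, 2)

trainable_from_scratch = backbone + head   # every parameter is learned
trainable_with_frozen_backbone = head      # only the new layer is learned

print("from scratch:   ", trainable_from_scratch)
print("frozen backbone:", trainable_with_frozen_backbone)
```

With these sizes, training from scratch updates 533,762 parameters, while training only the new layer updates 514. Fewer trainable parameters is one reason, though not the only one, that a small target dataset can be sufficient.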


Ways of Doing Transfer Learning
-------------------------------

There are several approaches to implementing transfer learning, depending on the task and the available data:

1. **Feature Extraction**: 
    - Use the pre-trained model as a fixed feature extractor.
    - Remove the final classification layer and use the output of the intermediate layers as input features for a new model.

2. **Fine-Tuning**:
    - Unfreeze some or all of the layers of the pre-trained model.
    - Train the model on the new dataset with a lower learning rate to adjust the weights without losing the pre-trained knowledge.

3. **Pre-trained Embeddings**:
    - Use embeddings from pre-trained models (e.g., static word embeddings like Word2Vec or GloVe, or contextual embeddings from BERT for NLP tasks).
    - These embeddings capture semantic relationships and can be used as input features for downstream tasks.

4. **Hybrid Approach**:
    - Combine feature extraction and fine-tuning.
    - Freeze some layers of the pre-trained model while fine-tuning others.

5. **Domain Adaptation**:
    - Adapt a model trained on a source domain to perform well on a target domain with limited labeled data.
    - Techniques like adversarial training or domain-specific fine-tuning are often used.

6. **Zero-Shot Learning**:
    - Use a pre-trained model to perform tasks it has not been explicitly trained for.
    - This is achieved by leveraging the model's generalization capabilities.

Each method has its advantages and is chosen based on the similarity between the source and target tasks, the size of the target dataset, and the computational resources available.
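Feature extraction, fine-tuning, and the hybrid approach differ mainly in which layers are updated and how quickly. The sketch below expresses that difference as a per-layer learning-rate table, where a rate of `0.0` means the layer is frozen. The function `build_param_groups`, the layer names, and the default rates are hypothetical. Real frameworks offer their own ways to set per-layer learning rates.

```python
def build_param_groups(layer_names, mode, base_lr=1e-2, fine_tune_lr=1e-4,
                       n_unfrozen=1):
    """Return {layer_name: learning_rate}; a rate of 0.0 means frozen.

    layer_names: ordered layer names, with the new task-specific layer last.
    mode: "feature_extraction", "fine_tuning", or "hybrid".
    """
    *backbone, head = layer_names
    rates = {head: base_lr}  # the new layer always trains at the full rate
    for i, name in enumerate(backbone):
        if mode == "feature_extraction":
            rates[name] = 0.0  # pre-trained layers are a fixed feature extractor
        elif mode == "fine_tuning":
            rates[name] = fine_tune_lr  # small steps preserve learned knowledge
        elif mode == "hybrid":
            # Unfreeze only the last n_unfrozen pre-trained layers.
            is_unfrozen = i >= len(backbone) - n_unfrozen
            rates[name] = fine_tune_lr if is_unfrozen else 0.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return rates


layers = ["conv1", "conv2", "conv3", "head"]  # hypothetical layer names
for mode in ("feature_extraction", "fine_tuning", "hybrid"):
    print(mode, build_param_groups(layers, mode))
```

The lower rate for pre-trained layers reflects the advice above to fine-tune with a lower learning rate, so that updates adjust the existing weights rather than overwrite them.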
