# 1. Why Transfer Learning?

One of the main challenges of deep learning is **finding a good architecture** and good hyperparameters for a neural network, so that it performs well on a given task.
There are many choices to make: how many layers, how many neurons, which activation functions (sigmoid, ReLU, softmax, ...), which regularization techniques (dropout, batch normalization, ...), which optimization methods (Adam, ...), and so on.
These choices can have a significant impact on the accuracy, speed and generalization ability of the network. However, there is no universal formula or rule to determine the best configuration for every problem. Designing a good network often requires trial and error, domain knowledge and expert intuition, which makes it more of an art than a science. The trial-and-error process can be very tedious and expensive, especially for complex tasks that require large and deep networks.

Another challenge of deep learning is the **amount of data (and compute)** needed to train a neural network effectively. Deep learning models have a large number of parameters that need to be adjusted during the learning process. To avoid overfitting, these models need a lot of labeled data that captures the diversity and complexity of the problem domain. However, obtaining such data can be difficult, costly or even impossible in some cases. For example, in medical imaging or natural language processing, labeling data may require human experts or domain-specific knowledge that is not easily available. Moreover, some domains have inherent data scarcity or class imbalance that limits the amount of useful training data.

Transfer learning can address these challenges by allowing us to reuse existing models that have been pre-trained on large and rich datasets for similar or related tasks. For example, we can use a model that has been trained to recognize objects in a large dataset of natural images (such as **ImageNet**) as a starting point for a model that needs to classify medical images or detect faces. By doing so, we benefit from the features and representations learned by the pre-trained model, which capture general patterns and concepts that are relevant for both tasks. This way, we no longer have to design a new network from scratch, and we need less data to fine-tune the network for the new task.


# ImageNet & ILSVRC

ImageNet is a large visual database for visual object recognition research, created by Prof. Fei-Fei Li and her team in 2009. From 2010 to 2017, the ImageNet project ran an annual competition called ILSVRC (ImageNet Large Scale Visual Recognition Challenge), where research teams tested their algorithms on various visual recognition tasks using a subset of ImageNet with about 1.2 million training images and 1,000 classes. This subset is also known as ImageNet-1K.

<img src="resources/imagenet.png" width="800">

As the image below shows, around 2015 a CNN called ResNet reached a lower error rate on this benchmark than the estimated human error rate.

<img src="resources/imagenet2.png" width="800">

# 2. What is Transfer Learning?

This YouTube video clearly explains the essence of transfer learning.

<a href="https://www.youtube.com/embed/DyPW-994t7w?start=13&end=282"><img src="resources/TransferLearning_YouTube.png" width="800"></a>

# 3. Feature Extraction and Fine-tuning

## Feature extraction
 
This is the most common type of transfer learning, where we use the pre-trained model as a feature extractor for the new task. We remove the last layer (or layers) of the pre-trained model and add one or more new layers that are specific to the new task. We then **freeze the weights of the pre-trained model** and train only the new layers on the new data. This way, we reuse the features learned by the pre-trained model and adapt only how they are combined for the new task. For example, we can use a model pre-trained on ImageNet to extract features from images, and then add a new classifier layer to perform image classification on a different dataset.
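The freeze-and-train-the-head idea can be illustrated with a framework-free toy sketch. Everything in it is an assumption made for illustration: `BASE_W` is a hypothetical stand-in for pre-trained weights (a real backbone would be a deep CNN), `DATA` is a made-up two-feature dataset, and the new head is a single logistic-regression unit. In a framework such as Keras or PyTorch, freezing typically amounts to setting `layer.trainable = False` or `param.requires_grad = False` instead of writing the loop by hand.

```python
import math

# Hypothetical stand-in for a pre-trained backbone: a fixed 2 -> 3 linear
# projection followed by ReLU. Real weights would come from a large dataset.
BASE_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

# Made-up labeled data for the new task: label 1 if x[0] > x[1], else 0.
DATA = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([3.0, 1.0], 1),
        ([1.0, 3.0], 0), ([1.5, 0.5], 1), ([0.5, 1.5], 0)]


def extract_features(x, base_w):
    """Frozen backbone: map an input vector to a ReLU feature vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base_w]


def predict(x, base_w, head_w, head_b):
    """New task-specific head: logistic regression on top of the features."""
    feats = extract_features(x, base_w)
    z = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, base_w, epochs=200, lr=0.1):
    """Train only the head with SGD; base_w is read but never modified."""
    head_w, head_b = [0.0] * len(base_w), 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x, base_w)
            z = sum(w * f for w, f in zip(head_w, feats)) + head_b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the cross-entropy loss w.r.t. z
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


head_w, head_b = train_head(DATA, BASE_W)
for x, y in DATA:
    print(x, y, round(predict(x, BASE_W, head_w, head_b), 3))
```

Only `head_w` and `head_b` receive gradient updates, so the number of trainable parameters (four here) stays small no matter how large the frozen backbone is.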

## Fine-tuning

This is a type of transfer learning where we also update the weights of the pre-trained model itself, using the new data. Either we train the whole network without freezing anything, or (the preferred way) we first train the new layers with the pre-trained weights frozen, as in feature extraction, and then **unfreeze** some or all of the pre-trained layers and continue training. In both cases we use a **very small learning rate**, so that the updates adjust the pre-trained features rather than overwrite them. For example, we can take a model pre-trained on ImageNet and fine-tune it on a smaller dataset of flowers.
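The two-phase recipe (train with frozen weights first, then unfreeze and continue with a small learning rate) can be sketched with a framework-free toy example. All names and numbers are assumptions for illustration: `PRETRAINED` is a hypothetical stand-in for pre-trained weights, `DATA` is a made-up dataset, and the learning rates `0.1` and `0.01` are arbitrary choices that only show the relative difference between the phases.

```python
import math

# Hypothetical stand-in for pre-trained backbone weights (2 inputs -> 3 features).
PRETRAINED = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

# Made-up labeled data for the new task: label 1 if x[0] > x[1], else 0.
DATA = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([3.0, 1.0], 1),
        ([1.0, 3.0], 0), ([1.5, 0.5], 1), ([0.5, 1.5], 0)]


def forward(x, base_w, head_w, head_b):
    """Return pre-activations, ReLU features and the predicted probability."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in base_w]
    feats = [max(0.0, v) for v in pre]
    z = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return pre, feats, 1.0 / (1.0 + math.exp(-z))


def train(data, base_w, head_w, head_b, lr, epochs, train_base):
    """SGD on the head, and on the backbone too when train_base is True."""
    base_w = [row[:] for row in base_w]  # copy, so the caller's weights stay intact
    head_w = head_w[:]
    for _ in range(epochs):
        for x, y in data:
            pre, feats, p = forward(x, base_w, head_w, head_b)
            err = p - y  # gradient of the cross-entropy loss w.r.t. z
            if train_base:
                # Backpropagate through the head and the ReLU into the backbone.
                for j, row in enumerate(base_w):
                    if pre[j] > 0:
                        g = err * head_w[j]
                        base_w[j] = [w - lr * g * xi for w, xi in zip(row, x)]
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return base_w, head_w, head_b


# Phase 1: backbone frozen, only the new head is trained.
base1, hw1, hb1 = train(DATA, PRETRAINED, [0.0] * 3, 0.0,
                        lr=0.1, epochs=200, train_base=False)

# Phase 2: backbone unfrozen, everything is trained with a much smaller learning rate.
base2, hw2, hb2 = train(DATA, base1, hw1, hb1,
                        lr=0.01, epochs=50, train_base=True)

# How far did the backbone move away from the pre-trained weights?
drift = max(abs(a - b) for r1, r2 in zip(PRETRAINED, base2) for a, b in zip(r1, r2))
print("largest change in a backbone weight:", drift)
```

Because the head is already trained when the backbone is unfrozen, the errors flowing back are small, and together with the small learning rate this keeps the backbone close to its pre-trained values.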

<img src="resources/FineTune.png" width="1600">

# 4. Recap: Benefits of Transfer Learning

 - Far less labeled data is needed (only a fraction of what training from scratch would require)
 - Training is much faster, because we don't have to start from scratch

 BTW: transfer learning is the trick behind Google's Teachable Machine, which uses a pre-trained model called MobileNet so that it can classify custom images quickly and accurately.

 Another benefit of transfer learning is a lower carbon footprint: because we reuse pre-trained networks instead of training from scratch, we need far fewer compute resources.
