<a href="https://colab.research.google.com/github/lab30041954/ML_IESE_Course/blob/main/%5BML-08%5D%20Transfer%20learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [ML-08] Transfer learning

## What is transfer learning?

**Transfer learning** is a technique used in machine learning to leverage the knowledge gained from one task to improve the performance on another related task. This typically involves taking a model that has been pre-trained on a big data set and adapting it for the new task, by updating the model's parameter values with new data, specific for the new task. The data set used in this updating step is typically much smaller than the pre-training data set, which allows us to save money and time.

Transfer learning is critical in domains where data are only available in small amounts, or would be very expensive to collect. Since big data sets are needed to avoid overfitting with complex models, transfer learning is common practice in many real-world applications. In both **computer vision** and **natural language processing**, we can profit from existing models that have been released by their developers.

Transfer learning has two components:

* The pre-trained model. These models are usually extracted from public repositories, called **hubs**.

* The new data. These data have to be specific to the new task, called the **downstream task**.

When can transfer learning help you? When you can find a pre-trained model with the same kind of inputs, and there is a commonality in the tasks of the two models. It makes a difference because, when training a neural network model from scratch, we start with random parameter values. These initial values don't make any sense for the task on which we are training the model. Starting with the parameter values learned in a previous training, we are much closer to the optimal values.

## Sources of pre-trained models

* **Keras**. The Keras team has created two hubs, **KerasCV** (computer vision) and **KerasNLP** (natural language processing), which can be accessed by means of specific Python packages. They have selected the models, so there is plenty of choice and you will not miss anything relevant. These are (relatively) small models, and most of them can be managed by your computer.

* **Hugging Face**. The favorite hub, so far independent of the big corporations. In addition to the "serious" models that you can find in the Keras hubs, you will find in Hugging Face thousands of models, uploaded by the (registered) users, which are just retrained versions of those available in the Keras hubs. At the time of writing, the Hugging Face website claims to provide 1,990,650 models.

* **ModelScope** is a copycat of Hugging Face, launched by Alibaba Cloud. It is much smaller, although it is growing fast. It currently provides 94,634 models.

* **Kaggle Models**. Kaggle started as an independent platform for data science and machine learning competitions, adding later a hub for data sets. Everybody could post data, notebooks, etc. It was purchased by Google in 2017. Right now, those competitions have lost their glamour, but Kaggle offers, besides the data sets, a mix of courses, notebooks and pre-trained models. Though the (registered) members of the Kaggle community can post their models, as in Hugging Face, the relevant stuff can be easily found.

* **TensorFlow Hub**. It was initially part of the Keras/TensorFlow combo, but was integrated with Kaggle Models in November 2023.

* **Ollama**. An open-source project that serves as a platform for running LLMs on your local machine. Not exactly user-friendly, like many open-source projects, but quite powerful. It is presented as if you had to manage it from the shell, but there is a Python package that provides an easy way to integrate it in your workflow.

Some of this will show up in this course, in this and the following two lectures.

## Transfer learning for CNN models

Keras provides some powerful image classifiers, pre-trained on an ML classic, the **ImageNet** data set. ImageNet is the outcome of a project started by Fei-Fei Li, then a professor at Princeton, in 2006. We use one of these models, **VGG16**, in this lecture. It is based on a CNN architecture which is similar to the one used in the preceding lecture, though a bit bigger. Even if VGG16 is a dwarf (below 20M parameters in its convolutional base) compared to the most popular large language models, it will suffice for understanding the dynamics of transfer learning.

To illustrate this, let us take the model summarized below, trained on the MNIST data in the preceding lecture.

```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer)      │ (None, 28, 28, 1)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 26, 26, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 13, 13, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 11, 11, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_2 (MaxPooling2D)  │ (None, 5, 5, 64)       │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_3 (Conv2D)               │ (None, 3, 3, 64)       │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten_1 (Flatten)             │ (None, 576)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 64)             │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 10)             │           650 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
```

In the context of this lecture, we would say that it is the pre-trained model. We can see it as composed of two parts:

* The **convolutional base**, a stack of three `Conv2D` layers and two `MaxPooling2D` layers. This part encodes the image as a vector of length 576. This transformation is called an **embedding** (see next lecture). Because of the pretraining, the embedding generates features that are appropriate for recognizing shapes and corners.

* The **top classifier**, which is like an MLP model with a hidden layer of 64 nodes whose input is a vector of length 576. The `Flatten` layer does not have any parameter, so it can be included in this part or in the base. This top part classifies the embedding vectors as digits.
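The parameter counts in the summary can be checked by hand. The following sketch assumes 3 × 3 kernels and one bias term per filter or node (consistent with the output shapes and counts in the table); the helper names `conv2d_params` and `dense_params` are illustrative, not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per node for a Dense layer."""
    return inputs * units + units

# Convolutional base: three Conv2D layers (pooling layers have no parameters).
base = [conv2d_params(3, 1, 32), conv2d_params(3, 32, 64), conv2d_params(3, 64, 64)]

# Flattening the last (3, 3, 64) feature map gives the embedding vector.
embedding_length = 3 * 3 * 64

# Top classifier: hidden layer of 64 nodes plus a 10-class output layer.
top = [dense_params(embedding_length, 64), dense_params(64, 10)]

# An output layer with 26 classes instead of 10 would need this many parameters.
letters_output = dense_params(64, 26)

print(base, embedding_length, top, letters_output)
```

Most of the parameters sit in the last convolutional layer and in the first `Dense` layer, which happen to have the same count here.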

Now, suppose that we switch from digit recognition to letter recognition. Our model can be based on the same network architecture, except for the last `Dense` layer, which will be adapted so that it outputs 26 class probabilities (English alphabet) instead of 10. If we consider that the convolutional base, as it is, is also good for the new task, we can freeze the parameter values of that part, and train only the classifier. In practice, it would be as if we were using the embedding vectors produced by the convolutional base as the features for a classification model. Then, instead of transfer learning, we could call this feature engineering.

We can also unfreeze part of the convolutional base (*e.g.* the last `Conv2D` layer), or the whole thing, thus modifying its parameter values. This is called **fine-tuning**, because, even if the parameter values change, they don't change much. We will see in the following example how easy these tricks are in Keras.
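The idea of a frozen base with a trainable classifier on top can be mimicked with plain NumPy. The sketch below is only a toy analogy: the "base" is a fixed random projection followed by a ReLU (standing in for the convolutional base), the "top" is a logistic regression trained by gradient descent, and the data, sizes and learning rate are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: 200 points in 5 dimensions.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Frozen "base": a fixed projection plus ReLU, never updated during training.
W_base = rng.normal(size=(5, 16))
W_base_before = W_base.copy()

def embed(X):
    """Turn raw inputs into embedding vectors of length 16."""
    return np.maximum(X @ W_base, 0.0)

def loss_and_grads(features, y, w, b):
    """Binary cross-entropy of a logistic classifier, with its gradients."""
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return loss, features.T @ (p - y) / len(y), np.mean(p - y)

# Trainable "top classifier": only these parameters are updated.
w_top, b_top = np.zeros(16), 0.0

features = embed(X)  # computed once, since the base is frozen
initial_loss, _, _ = loss_and_grads(features, y, w_top, b_top)
for _ in range(500):
    _, grad_w, grad_b = loss_and_grads(features, y, w_top, b_top)
    w_top -= 0.05 * grad_w
    b_top -= 0.05 * grad_b
final_loss, _, _ = loss_and_grads(features, y, w_top, b_top)

print(initial_loss, final_loss)
```

Because the base never changes, the embeddings are computed only once, which is exactly why this setting can be read as feature engineering. Fine-tuning would correspond to also updating (part of) `W_base`, with small steps.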

## Example - The Dogs vs Cats data

### Introduction

Web services are often protected with a challenge that is supposed to be easy for people, but difficult for computers. This is often called **CAPTCHA** (Completely Automated Public Turing test to tell Computers and Humans Apart) or **HIP** (Human Interactive Proof). CAPTCHAs are used for many purposes, such as reducing email and blog spam and preventing brute-force attacks on website passwords.

**ASIRRA** (Animal Species Image Recognition for Restricting Access) is a CAPTCHA that works by asking users to distinguish between photographs of **cats** and **dogs**. This task is difficult for computers, but studies have shown that people can accomplish it quickly and accurately. ASIRRA is unique because of its partnership with **Petfinder.com**, the world's largest site devoted to finding homes for homeless pets. They have provided Microsoft Research with over three million images of cats and dogs, manually classified by people at thousands of animal shelters across the United States.

This example uses part of this data set, released for the *Dogs vs. Cats* Kaggle competition. It is inspired by the approach to transfer learning taken in chapter 9 of the Keras book (Chollet, 2021).

### The data set

The *Dogs vs. Cats* data set is available from many sources, including Kaggle and Keras. It contains 25,000 images of dogs and cats (12,500 from each class). The pictures are medium-resolution color JPG files. In this example, we use 2,000 images for training and 1,000 for testing. Both data sets are balanced.

The images have been pre-processed so they all have resolution 150 $\times$ 150. Setting a fixed resolution is needed for training and later application of the model obtained. This resolution, which has no special virtue, has been chosen so that this notebook can be run either on your laptop or in Google Colab, without pain. Note that you don't need to use square images, but it looks like a reasonable choice, since in the original image set we found both landscape and portrait orientation. The conversion is an easy job, which can be done with the Python package `opencv`.

The data come as four (zipped) folders of JPG files, `dogs_train`, `dogs_test`, `cats_train` and `cats_test`. As we have mentioned in the preceding lecture, an image is just an array with two spatial axes, called **height** and **width**, and a **depth** axis. For an RGB image, the dimension of the depth axis would be 3, since the image has three color channels, red, green, and blue. The height and width depend on the resolution of the images, which was not fixed in the original data set. This is why, before being converted to NumPy arrays that can be fed to a neural network, all the images had to be resized to a common resolution.

Sources:

1. W Cukierski (2013) *Dogs vs. Cats*, `https://kaggle.com/competitions/dogs-vs-cats`.

2. F Chollet (2021), *Deep Learning with Python*, Manning.

### Questions

Q1. Create a folder in the working directory of your current Python kernel and download there the ZIP files. Unzip these files, so you get the four folders mentioned above. The folders with training data contain 1,000 JPG files each, while those with test data contain 500 files each.

Q2. Write a function which reads the JPG files as NumPy arrays (the `matplotlib.pyplot` function `imread()` can be used here), and reshapes the arrays to shape `(1, 150, 150, 3)`. Apply your function to all the JPG files and concatenate the resulting arrays in such a way that you obtain an array `X_train`, with shape `(2000, 150, 150, 3)` and an array `X_test`, with shape `(1000, 150, 150, 3)`. Create the corresponding target vectors, with value 1 for the dogs and value 0 for the cats.

Q3. Train a CNN model from scratch and evaluate it. You can build the network by stacking four pairs of `Conv2D` and `MaxPooling2D` layers, a `Flatten` layer and two `Dense` layers. The last layer must output the two class probabilities.

Q4. Import the pre-trained model VGG16 from the Keras module `applications`. Include the convolutional base but not the top classifier. Then, freeze the parameter values of the convolutional base and add a densely connected classifier on top.

Q5. Train the new model and compare its performance on the test data with that of the model of question Q3.

### Q1a. Creating a data folder

The package `os` (included in the Python Standard Library) contains a collection of functions for common **operating system commands**. You can create, delete and copy files and folders from the Python kernel. We have seen it in the two previous lectures. The function `mkdir()` creates a folder, which will appear in the working directory of the current kernel unless you specify a different path.

In [1]:
import os
os.mkdir('data/')

*Note*. The slash (`/`) at the end of the folder name is not needed, but it may help to distinguish between files and folders.
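If the cell above is run twice, `os.mkdir()` raises a `FileExistsError`, because the folder already exists. A possible workaround, sketched here in a temporary directory so that nothing is written to your working directory, is the `os` function `makedirs()` with the argument `exist_ok=True`. The variable names are illustrative.

```python
import os
import tempfile

# A temporary directory keeps this sketch away from the real working directory.
with tempfile.TemporaryDirectory() as workdir:
    data_dir = os.path.join(workdir, 'data')

    os.makedirs(data_dir, exist_ok=True)   # creates the folder
    os.makedirs(data_dir, exist_ok=True)   # second call does nothing, no error
    created = os.path.isdir(data_dir)

    try:
        os.mkdir(data_dir)                 # plain mkdir() refuses an existing folder
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

print(created, mkdir_failed)
```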

### Q1b. Downloading the zip files

The package `requests` provides **HTTP functionality**. We import it as:

In [2]:
import requests

We specify our GitHub path as usual.

In [3]:
gitpath = 'https://raw.githubusercontent.com/lab30041954/Data/main/'

To select the files to be downloaded, we will loop over the following list:

In [4]:
gitlist = ['cats_train.zip', 'cats_test.zip', 'dogs_train.zip', 'dogs_test.zip']

The resources involved in the loop are:

* The `requests` function `get()` will send a **GET request** to GitHub. If the response is positive, the files specified will be read by the Python kernel. The argument `stream=True` is used for efficiency, but it probably does not make a difference here.

* The Python function `open()` will create a **file object** (whose name plays no role). Since these files don't exist yet, new (empty) files will be created. Then, the method `.write()` will write the content of the ZIP files to these new files. The argument `mode='wb'` means *write in binary mode*.


In [5]:
for f in gitlist:
	r = requests.get(gitpath + f, stream=True)
	conn = open('data/' + f, mode='wb')
	conn.write(r.content)
	conn.close()
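With `r.content`, the whole response is loaded in memory before writing, so `stream=True` only pays off when the response is read in chunks. The following variant is a sketch under assumptions: the helper names `write_chunks` and `download`, the chunk size and the timeout are illustrative choices, not part of the original notebook. It uses the `requests` methods `.raise_for_status()` and `.iter_content()`, and a `with` statement that closes the file even if something fails.

```python
import requests

def write_chunks(chunks, path):
    """Write an iterable of byte chunks to a new file, in binary mode."""
    with open(path, mode='wb') as conn:
        for chunk in chunks:
            conn.write(chunk)

def download(url, path, chunk_size=8192):
    """Download url to path chunk by chunk (hypothetical helper)."""
    r = requests.get(url, stream=True, timeout=30)
    r.raise_for_status()  # stop here if GitHub answers with an error code
    write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

In the loop above, the call would be `download(gitpath + f, 'data/' + f)`. For files of this size, the difference with the simpler version is probably negligible.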

### Q1c. Unzipping and removing the zip files

The package `zipfile` (also in the Python Standard Library) provides resources for zipping and unzipping files and folders in a simple way. We import it as:

In [6]:
import zipfile

To unzip the four files in a row, we first create a list of the files to unzip. Note that the `os` function `listdir()` lists both files and folders, so we have to exclude the folders from the list.

In [7]:
ziplist = [f for f in os.listdir('data/') if 'zip' in f]

Now, we loop over this list unzipping and removing the ZIP files one by one. The technicalities are:

* The `zipfile` constructor `ZipFile()` creates a **ZipFile object** associated to the specified file.

* The method `.extractall()` extracts the content of the ZIP file and writes it to disk.

* The Python keyword `del` is used to delete objects (from the Python kernel, not from the disk).

* The `os` function `remove()` is used to remove files from disk.

In [8]:
for f in ziplist:
	zf = zipfile.ZipFile('data/' + f, 'r')
	zf.extractall('data/')
	del zf
	os.remove('data/' + f)

Let us check that the process was carried out as expected. First, the folder `data`, which is in the working directory, contains four folders, with the appropriate names.

In [9]:
os.listdir('data/')

['dogs_train', 'cats_train', 'cats_test', 'dogs_test']

Second, every folder contains the expected number of files. For instance:

In [10]:
len(os.listdir('data/dogs_train/'))

1000
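The unzip-and-remove loop of this section can be rehearsed on a toy ZIP file, without touching the real data. The sketch below works in a temporary directory; the file and folder names are made up, the ZIP files are selected with `.endswith('.zip')` (a slightly stricter test than `'zip' in f`), and a `with` statement closes every ZipFile object, so `del` is not needed.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small ZIP file holding a folder with two fake image files.
    zip_path = os.path.join(workdir, 'toy_train.zip')
    with zipfile.ZipFile(zip_path, 'w') as zf:
        zf.writestr('toy_train/1.jpg', b'not a real image')
        zf.writestr('toy_train/2.jpg', b'not a real image')

    # Select the ZIP files by extension, then extract and remove them one by one.
    ziplist = [f for f in os.listdir(workdir) if f.endswith('.zip')]
    for f in ziplist:
        with zipfile.ZipFile(os.path.join(workdir, f), 'r') as zf:
            zf.extractall(workdir)
        os.remove(os.path.join(workdir, f))

    remaining = sorted(os.listdir(workdir))
    extracted = sorted(os.listdir(os.path.join(workdir, 'toy_train')))

print(remaining, extracted)
```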

### Q2a. Converting images to tensors

We import the resources to be used for the conversion.

In [11]:
import numpy as np, matplotlib.pyplot as plt

For our training job, we create a NumPy array from every JPG file. These arrays must have the same shape so they can be packed in the training and test features arrays, and processed by a neural network model. We write a function to loop over the folders just created in question Q1. The resources involved are:

* The `matplotlib.pyplot` function `imread()` converts a JPG file to a NumPy array. This is a classic function, also incorporated in many packages. It works the same for other image formats (such as BMP or PNG).

* Every image will be converted to a 3D array of shape `(150, 150, 3)`. This will be reshaped to `(1, 150, 150, 3)`. Remember that, in the preceding lecture, the input of the convolutional network was a 4D array.

In [12]:
def img_to_arr(f):
    arr = plt.imread(f)
    reshaped_arr = arr.reshape(1, 150, 150, 3)
    return reshaped_arr
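The reshaping step only adds a leading axis of length 1. The sketch below uses synthetic arrays in place of real pictures (the `uint8` type is what `imread()` typically returns for JPG files, taken here as an assumption) to show two equivalent ways of adding that axis, and what happens when a picture does not have the expected resolution.

```python
import numpy as np

# Synthetic stand-in for the array read from a 150 x 150 RGB JPG file.
img = np.zeros((150, 150, 3), dtype=np.uint8)

batch_a = img.reshape(1, 150, 150, 3)   # as in img_to_arr()
batch_b = img[np.newaxis, ...]          # same result, without hard-coding the size
batch_c = np.expand_dims(img, axis=0)   # same again

# A picture with a different resolution cannot be reshaped to (1, 150, 150, 3).
other = np.zeros((100, 120, 3), dtype=np.uint8)
try:
    other.reshape(1, 150, 150, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(batch_a.shape, batch_b.shape, batch_c.shape, reshape_failed)
```

So `img_to_arr()` doubles as a check: it fails on any image that has not been resized to 150 $\times$ 150.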

### Q2b. Training data

The training data will be made of two arrays: (a) the features, packed in an array `X_train` of shape `(2000, 150, 150, 3)`, and (b) the target vector `y_train`, which will be a 1D array with 1's (dogs) and 0's (cats). We create `X_train` using the first dog picture, so it has shape `(1, 150, 150, 3)`.

In [13]:
X_train = img_to_arr('data/dogs_train/' + os.listdir('data/dogs_train')[0])
X_train.shape

(1, 150, 150, 3)

Then, we loop over the folder `dogs_train`, adding dogs one by one with the NumPy function `concatenate()`. By default, the concatenation is carried out along the first axis (`axis=0`).

In [14]:
for i in range(1, 1000):
    X_train = np.concatenate([X_train, img_to_arr('data/dogs_train/' + os.listdir('data/dogs_train')[i])])
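Adding images one by one is easy to read, but every call to `concatenate()` copies everything stacked so far. An alternative, sketched here with tiny synthetic arrays instead of real pictures, is to collect the arrays in a list and concatenate them in a single call. Both approaches give the same result.

```python
import numpy as np

# Synthetic stand-ins for the arrays returned by img_to_arr(), kept tiny on purpose.
arrays = [np.full((1, 4, 4, 3), i, dtype=np.uint8) for i in range(10)]

# One by one, as in the loop above.
X_loop = arrays[0]
for arr in arrays[1:]:
    X_loop = np.concatenate([X_loop, arr])

# All at once: a single call on the whole list, along the first axis by default.
X_once = np.concatenate(arrays)

print(X_loop.shape, X_once.shape, np.array_equal(X_loop, X_once))
```

With the real data, the list could be built as `[img_to_arr('data/dogs_train/' + f) for f in os.listdir('data/dogs_train')]`, which also calls `listdir()` only once.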

Now, the cats from the training set.

In [15]:
for i in range(1000):
    X_train = np.concatenate([X_train, img_to_arr('data/cats_train/' + os.listdir('data/cats_train')[i])])

Finally, we rescale the pixel intensities to the 0-1 range.

In [16]:
X_train = X_train/255
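The division changes the data type, and this has a cost in memory. The following arithmetic is a sketch which assumes that the JPG files are read as `uint8` arrays (one byte per intensity), which is the usual case. Dividing by 255 promotes them to `float64` (eight bytes), while `float32`, which is enough for a neural network, takes half of that.

```python
import numpy as np

n_images, height, width, depth = 2000, 150, 150, 3
n_values = n_images * height * width * depth   # number of pixel intensities

bytes_uint8 = n_values * np.dtype('uint8').itemsize
bytes_float64 = n_values * np.dtype('float64').itemsize
bytes_float32 = n_values * np.dtype('float32').itemsize

# The same effect on a tiny synthetic array.
small = np.full((2, 4, 4, 3), 255, dtype=np.uint8)
scaled64 = small / 255                      # true division promotes to float64
scaled32 = small.astype('float32') / 255    # stays in float32

print(bytes_uint8 / 1e6, bytes_float64 / 1e6, bytes_float32 / 1e6)  # in MB
print(scaled64.dtype, scaled32.dtype)
```

So, the rescaled training array takes a bit more than 1 GB, which is something to keep in mind if you increase the resolution or the number of images.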

The NumPy functions `ones()` and `zeros()` allow for the creation of arrays of the specified shape, filled with 1's and 0's, respectively. We put the 1's first, so they are the target values for the dog pictures.

In [17]:
y_train = np.concatenate([np.ones(1000), np.zeros(1000)])

We check now that the shapes of these arrays are the expected ones.

In [18]:
X_train.shape, y_train.shape

((2000, 150, 150, 3), (2000,))

### Q2c. Test data

We follow the same steps for the test data.

In [19]:
X_test = img_to_arr('data/dogs_test/' + os.listdir('data/dogs_test')[0])
for i in range(1, 500):
    X_test = np.concatenate([X_test, img_to_arr('data/dogs_test/' + os.listdir('data/dogs_test')[i])])
for i in range(500):
    X_test = np.concatenate([X_test, img_to_arr('data/cats_test/' + os.listdir('data/cats_test')[i])])
X_test = X_test/255
y_test = np.concatenate([np.ones(500), np.zeros(500)])
X_test.shape, y_test.shape

((1000, 150, 150, 3), (1000,))

### Q3. Training a CNN model from scratch

We import the Keras function `Input()` and the modules `models` and `layers`, as in the two preceding lectures.


In [20]:
import os
os.environ['KERAS_BACKEND'] = 'jax'
from keras import Input, models, layers

Next, we specify the shape of the input tensor, which corresponds to an RGB image with resolution 150 $\times$ 150.

In [21]:
input_tensor = Input(shape=(150, 150, 3))

Now, the hidden layers. As in the preceding lecture we stack convolutional blocks (`Conv2D` plus `MaxPooling2D`). Since we are dealing with bigger images, we make the network larger, including a fourth block. The depth of the feature maps progressively increases in the network (from 32 to 128), while the size decreases (from 150 $\times$ 150 to 7 $\times$ 7). As we will see in the summary below, flattening the output of the fourth convolutional block leaves us with a tensor of length 6,272, so we reduce the dimensionality with a final `Dense` layer. This last layer returns a vector of length 512 which is expected to provide a representation of the image that helps the dogs vs cats classification. This dimensionality reduction is a standard procedure.

In [22]:
x1 = layers.Conv2D(32, (3, 3), activation='relu')(input_tensor)
x2 = layers.MaxPooling2D((2, 2))(x1)
x3 = layers.Conv2D(64, (3, 3), activation='relu')(x2)
x4 = layers.MaxPooling2D((2, 2))(x3)
x5 = layers.Conv2D(128, (3, 3), activation='relu')(x4)
x6 = layers.MaxPooling2D((2, 2))(x5)
x7 = layers.Conv2D(128, (3, 3), activation='relu')(x6)
x8 = layers.MaxPooling2D((2, 2))(x7)
x9 = layers.Flatten()(x8)
x10 = layers.Dense(512, activation='relu')(x9)

Finally, the output layer, which will return the predicted class probabilities.

In [23]:
output_tensor = layers.Dense(2, activation='softmax')(x10)
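To see what the softmax activation does with the two output nodes, here is a NumPy sketch with made-up numbers. It also shows how integer labels (0 for cats, 1 for dogs) pick one probability per image, which is the idea behind the sparse categorical cross-entropy loss used later in this example. The function `softmax()` is written here for illustration, it is not the Keras implementation.

```python
import numpy as np

def softmax(z):
    """Turn every row of scores into probabilities that add up to 1."""
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Made-up scores for three images, one column per class (cat, dog).
scores = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 0.0]])
probs = softmax(scores)

# Integer labels: the loss for an image is minus the log of the
# probability assigned to its true class.
labels = np.array([0, 1, 1])
losses = -np.log(probs[np.arange(len(labels)), labels])

print(probs)
print(losses)
```

The third image, with equal scores, gets probability 0.5 for each class and a loss equal to $\log 2 \approx 0.693$, which is what a model that has learned nothing would get on average.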

The successive application of these functions makes the CNN model, which works as a flow that starts with the input tensor and ends with the output tensor.

In [24]:
clf1 = models.Model(input_tensor, output_tensor)

The table returned by the method `.summary()` illustrates this network architecture, which involves 3.45M parameters.

In [25]:
clf1.summary()
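The figures of the summary can be reproduced with a few lines of arithmetic. This sketch assumes the Keras defaults used in the code above: 3 × 3 kernels without padding (every `Conv2D` layer trims two pixels) and 2 × 2 pooling (which halves the size, rounding down). The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Height (or width) after a Conv2D layer without padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Height (or width) after a MaxPooling2D layer."""
    return size // pool

size, channels, total_params = 150, 3, 0
sizes = []
for filters in [32, 64, 128, 128]:
    total_params += 3 * 3 * channels * filters + filters   # Conv2D weights + biases
    size = pool_out(conv_out(size))
    channels = filters
    sizes.append(size)

flat_length = size * size * channels           # output of the Flatten layer
total_params += flat_length * 512 + 512        # Dense layer with 512 nodes
total_params += 512 * 2 + 2                    # output layer

print(sizes, flat_length, total_params)
```

Note that more than 3.2M of the 3.45M parameters belong to the `Dense` layer with 512 nodes.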

Now we apply the `.compile()` and `.fit()` methods as in the preceding lectures. Ten epochs are enough to see the limitations of this approach. We get about 70% accuracy on the test data (not negligible), but with a clear overfitting issue. The training data do not seem to be enough for so many parameters.

In [26]:
clf1.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
clf1.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test));

Epoch 1/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 25s 371ms/step - acc: 0.5435 - loss: 0.7050 - val_acc: 0.5000 - val_loss: 0.6927
Epoch 2/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 24s 371ms/step - acc: 0.5645 - loss: 0.6831 - val_acc: 0.6450 - val_loss: 0.6489
Epoch 3/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 23s 362ms/step - acc: 0.6140 - loss: 0.6560 - val_acc: 0.6440 - val_loss: 0.6288
Epoch 4/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 25s 389ms/step - acc: 0.6490 - loss: 0.6300 - val_acc: 0.5540 - val_loss: 0.6819
Epoch 5/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 23s 364ms/step - acc: 0.6820 - loss: 0.6005 - val_acc: 0.6740 - val_loss: 0.6130
Epoch 6/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 23s 357ms/step - acc: 0.7340 - loss: 0.5327 - val_acc: 0.6890 - val_loss: 0.6387
Epoch 7/10
63/63 ━━━━━━━━━━━━━━━━━━━━ 23s 360ms/

### Q4a. Pre-trained CNN model

The Keras module `applications` is a legacy of Keras 2 that provides a limited (compared to the current repositories) supply of pre-trained models, but is more than enough for this example. It contains a collection of deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning (see the homework). The model **VGG16** is a (relatively) simple CNN model with a convolutional base made of `Conv2D` and `MaxPooling2D` layers. Importing this model is straightforward.

In [27]:
from keras.applications import VGG16

We instantiate a VGG16 model. Note the choices made:

* The argument `weights='imagenet'` specifies that the initial parameter values are those obtained from training the model with the ImageNet data. We can update them in the training process, or freeze them, as we do in this example. We can also update only a subset of the weights (typically those from the last layers).

* The model can be seen as a convolutional base plus a densely connected classifier on top. With the argument `include_top=False`, this classifier, which would return probabilities for the 1,000 ImageNet classes, is discarded.

* The argument `input_shape=(150, 150, 3)` is needed only for the summary below. When creating our new model, the input shape will be specified in the usual way.

The summary shows that the VGG16 base is made of five convolutional blocks. These blocks contain two or three `Conv2D` layers. Within every block, the height and width are kept constant with a trick called **padding** (look at the Keras book if you are interested).

In [28]:
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
conv_base.summary()

We freeze the parameter values of the convolutional base, so they will not be adapted to the cats vs dogs data. This is optional, and it is even possible to freeze only the initial layers, as proposed in the homework. Freezing the whole convolutional base is pretty easy:

In [29]:
conv_base.trainable = False

### Q4b. Adding a densely connected classifier on top

Next, we build a new network adding three layers on top of the convolutional base, which is managed here as a single component. The top layers are of the same type as in the network of question Q3, though the hidden `Dense` layer has 256 nodes here, instead of 512. Note that the 14,714,688 parameters of the convolutional base appear here as non-trainable.

In [30]:
input_tensor = Input(shape=(150, 150, 3))
x1 = conv_base(input_tensor)
x2 = layers.Flatten()(x1)
x3 = layers.Dense(256, activation='relu')(x2)
output_tensor = layers.Dense(2, activation='softmax')(x3)
clf2 = models.Model(input_tensor, output_tensor)
clf2.summary()
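The split between trainable and non-trainable parameters can be checked by hand. The sketch below relies on the standard VGG16 configuration (five blocks with 2, 2, 3, 3 and 3 `Conv2D` layers and 64, 128, 256, 512 and 512 filters, 3 × 3 kernels with padding, and a 2 × 2 pooling layer closing every block). This configuration is not spelled out in the text above, so take it as an assumption of the sketch.

```python
# Assumed VGG16 configuration: (number of Conv2D layers, filters) per block.
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, channels, base_params = 150, 3, 0
for n_convs, filters in blocks:
    for _ in range(n_convs):
        base_params += 3 * 3 * channels * filters + filters  # padding keeps the size
        channels = filters
    size = size // 2                                          # pooling halves the size

flat_length = size * size * channels                 # output of the Flatten layer
top_params = (flat_length * 256 + 256) + (256 * 2 + 2)   # the two Dense layers

print(base_params, (size, size, channels), flat_length, top_params)
```

So, with the convolutional base frozen, about 2.1M parameters are left for training, fewer than in the model of question Q3.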

### Q5. Training the new model

Finally, we train and evaluate the new model. The improvement, with respect to the smaller network of question Q3, is quite clear.

In [31]:
clf2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
clf2.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test));

Epoch 1/5
63/63 ━━━━━━━━━━━━━━━━━━━━ 122s 2s/step - acc: 0.7710 - loss: 0.6388 - val_acc: 0.8610 - val_loss: 0.2944
Epoch 2/5
63/63 ━━━━━━━━━━━━━━━━━━━━ 126s 2s/step - acc: 0.9160 - loss: 0.2307 - val_acc: 0.8790 - val_loss: 0.2619
Epoch 3/5
63/63 ━━━━━━━━━━━━━━━━━━━━ 127s 2s/step - acc: 0.9355 - loss: 0.1675 - val_acc: 0.8680 - val_loss: 0.2803
Epoch 4/5
63/63 ━━━━━━━━━━━━━━━━━━━━ 132s 2s/step - acc: 0.9600 - loss: 0.1167 - val_acc: 0.8870 - val_loss: 0.2547
Epoch 5/5
63/63 ━━━━━━━━━━━━━━━━━━━━ 131s 2s/step - acc: 0.9790 - loss: 0.0782 - val_acc: 0.8760 - val_loss: 0.3003


### Removing the data

Finally, you may remove the data from your computer. Note that in Google Colab, the data are deleted automatically unless you save them in your Google Drive.

In [32]:
for d in os.listdir('data/'):
    for f in os.listdir('data/' + d):
        os.remove('data/' + d + '/' + f)
    os.rmdir('data/' + d)
os.rmdir('data')
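The nested loop above removes the files one by one, because `os.rmdir()` only works on empty folders. A shorter alternative is the function `rmtree()` of the package `shutil` (also in the Python Standard Library), which removes a folder with everything inside. The sketch below tries it on a throwaway folder with fake files, so your real data are not affected.

```python
import os
import shutil
import tempfile

# Build a throwaway folder tree that mimics the layout of the data folder.
workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, 'data')
for d in ['dogs_train', 'cats_train']:
    os.makedirs(os.path.join(data_dir, d))
    with open(os.path.join(data_dir, d, '1.jpg'), mode='wb') as conn:
        conn.write(b'not a real image')

# Remove the folder and all its content in a single call.
shutil.rmtree(data_dir)
data_removed = not os.path.exists(data_dir)

# Clean up the temporary directory itself.
shutil.rmtree(workdir)
print(data_removed)
```

Since `rmtree()` does not ask for confirmation, check the path twice before using it on a real folder.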

### Homework

1. `keras.applications` offers plenty of choice for pre-trained models, beyond VGG16. `https://keras.io/api/applications` can help you to choose. For instance, you can try **Xception**, which uses some additional tricks proposed by F Chollet.

2. Transfer learning is the broad concept of reusing a pre-trained model for a new, related task, while **fine-tuning** is a specific, advanced type of transfer learning where you unfreeze some or all layers of the pre-trained model and retrain them on new data for better adaptation. In this example, you can easily unfreeze some of the last layers of the pre-trained model. For instance, after freezing all the layers of the VGG16 model with `conv_base.trainable = False`, you can apply the loop `for l in conv_base.layers[-2:]: l.trainable = True` (note the slice `[-2:]`, which selects the last two layers, while `[-2]` would select a single layer, which cannot be looped over). For the training process to work, you will have to decrease the **learning rate** (the default is `learning_rate=1e-3`). To do this, import first the module `optimizers` (`from keras import optimizers`), and then compile the model using the argument `optimizer=optimizers.Adam(learning_rate=5e-5)`.

3. If you have survived the preceding exercises, you can play with the learning rate, to see how this affects the fitting process.
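As a warm-up for the fine-tuning exercise, the sketch below applies the same steps to a toy convolutional base with random weights, so nothing has to be downloaded. The toy network, its sizes and the names `toy_base` and `clf` are made up for illustration. It uses one common pattern, which is an assumption of this sketch rather than the exact recipe of the exercise: make the whole base trainable, and then freeze all the layers except the last two.

```python
import keras
from keras import layers, optimizers

# A toy convolutional base (random weights) standing in for VGG16.
toy_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(4, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(8, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
], name='toy_base')

# Make the base trainable, then freeze everything except the last two layers
# (the second Conv2D layer and the pooling layer that follows it).
toy_base.trainable = True
for l in toy_base.layers[:-2]:
    l.trainable = False

# Classifier on top, as in question Q4.
input_tensor = keras.Input(shape=(32, 32, 3))
x = toy_base(input_tensor)
x = layers.Flatten()(x)
output_tensor = layers.Dense(2, activation='softmax')(x)
clf = keras.Model(input_tensor, output_tensor)

# A learning rate well below the default 1e-3, so the unfrozen weights change slowly.
clf.compile(optimizer=optimizers.Adam(learning_rate=5e-5),
            loss='sparse_categorical_crossentropy', metrics=['acc'])

n_trainable_base = len(toy_base.trainable_weights)   # kernel and bias of the last Conv2D
n_trainable_total = len(clf.trainable_weights)       # plus kernel and bias of the Dense layer
print(n_trainable_base, n_trainable_total)
```

Counting the trainable weights before calling `.fit()` is a cheap way to confirm that the layers you meant to unfreeze are really going to be updated.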