<a href="https://colab.research.google.com/github/patbaa/demo_notebooks/blob/master/cnn_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CNN fine tuning

We have seen that convolutional neural networks can work surprisingly well on the ImageNet dataset, which consists of more than 1 million training images. So far we have attributed the deep learning revolution to increasing computational power and the availability of enormous datasets.   
**What if we do not have too many images?**
 1. go and collect much more data
 2. use the knowledge learned from the ImageNet dataset
 
A neural network that was trained on 1 million ImageNet images must have a **good inner representation** (detecting round objects, eyes, text etc.) of photos of various objects. The idea of transfer learning is to rely on these representations. Unfortunately we **cannot use the pre-trained model as it is**, because our categories differ from those of ImageNet. However, as long as we have real-world (ImageNet-like) images, we can reuse all the pre-trained CNN weights except for the last prediction layer.

So we have to replace the last dense layer of 1000 neurons with one that matches our problem. We can either train all the weights in the neural network for a short time, or freeze all the weights except for the new last layer and train only that layer. It is also common to train only the last layer first and, once its weights have converged, unfreeze the rest of the weights and train all of them for a short time.
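
The two-stage recipe described above can be sketched in Keras. This is only an illustration under stated assumptions: `toy_model` is a tiny, hypothetical stand-in for a pre-trained network (no weights are downloaded), the learning rates are arbitrary example values, and the `fit` calls are left as comments because no data is loaded here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import Adam

# Tiny stand-in for a pre-trained backbone plus a new head (hypothetical model).
inputs = tf.keras.Input(shape=(32,))
x = layers.Dense(16, activation='relu', name='backbone_1')(inputs)
x = layers.Dense(8, activation='relu', name='backbone_2')(x)
outputs = layers.Dense(1, activation='sigmoid', name='new_head')(x)
toy_model = Model(inputs, outputs)

# Stage 1: freeze everything except the new last layer.
for layer in toy_model.layers[:-1]:
    layer.trainable = False
toy_model.compile(optimizer=Adam(learning_rate=1e-3), loss='binary_crossentropy')
stage1_trainable = len(toy_model.trainable_weights)  # kernel + bias of the head
# toy_model.fit(...)  # train only the head until it converges

# Stage 2: unfreeze all layers and train briefly.
# A lower learning rate is assumed here; recompiling applies the new trainable flags.
for layer in toy_model.layers:
    layer.trainable = True
toy_model.compile(optimizer=Adam(learning_rate=1e-5), loss='binary_crossentropy')
stage2_trainable = len(toy_model.trainable_weights)  # all kernels and biases
# toy_model.fit(...)  # short fine-tuning of the whole network

print(stage1_trainable, stage2_trainable)
```

The count of trainable weight tensors is a quick way to confirm that the freezing took effect before starting a long training run.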

Sometimes the learning rate differs from layer to layer. We will see later that the first layers of a CNN usually capture general properties of images (round objects or parallel lines), while the last layers are more task-specific (dog eye, human head etc.). In general, during transfer learning we want to carry over as much information from one dataset to the other as possible. As the first layers are generic, they do not need to change much, so a much lower learning rate can be fine. The last ones are more task-specific, so we have to train them more (with a higher learning rate).
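
One possible way to assign layer-dependent learning rates is a geometric schedule, sketched below. The function name `layerwise_learning_rates`, the decay factor and the geometric form itself are assumptions made for illustration; this notebook does not use layer-wise rates, and other schedules are equally plausible.

```python
def layerwise_learning_rates(n_layers, top_lr=1e-4, decay=0.5):
    """Return one learning rate per layer, ordered from first to last layer.

    The last (most task-specific) layer gets `top_lr`; every earlier layer
    gets the rate of the layer after it multiplied by `decay`.
    """
    return [top_lr * decay ** (n_layers - 1 - i) for i in range(n_layers)]

rates = layerwise_learning_rates(n_layers=4)
for i, lr in enumerate(rates):
    print(f'layer {i}: learning rate {lr:.2e}')
```

With these rates, the generic first layers move the least and the new head moves the most, which matches the reasoning above.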

In [0]:
%tensorflow_version 2.x

In [0]:
from PIL import Image
from pathlib import Path
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dogs and cats

We will use data from the Kaggle challenge 'Dogs and cats'. To make things more difficult we restrict ourselves to a tiny subset of the training images, 300 images for dogs and 300 for cats.

<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/3362/media/woof_meow.jpg">

https://www.kaggle.com/c/dogs-vs-cats/

ImageNet pre-trained models seem to be a perfect choice as ImageNet contains 100+ different dog breeds, so the network weights should be exceptional when it comes to dog classification.

**You can download and unzip the data by uncommenting the two cells below and running them!**

In [0]:
# download the prepared dataset
#!wget http://patbaa.web.elte.hu/dogscats.zip

In [0]:
# unzip it
#!unzip -q dogscats.zip

In [0]:
train_dogs = list(Path('train/dog/').glob('*'))
train_cats = list(Path('train/cat/').glob('*'))

print(f'#Training images: {len(train_cats), len(train_dogs)}')

test_dogs = list(Path('test/dog/').glob('*'))
test_cats = list(Path('test/cat/').glob('*'))

print(f'#Test images: {len(test_cats), len(test_dogs)}')

### Most of the images are fine, but a few are tricky
We only look at the training images here, but we expect similar cases among the test images too!

In [0]:
Image.open('train/cat/cat.11100.jpg')

In [0]:
Image.open('train/cat/cat.11368.jpg')

In [0]:
Image.open('train/cat/cat.11184.jpg')

In [0]:
Image.open('train/dog/dog.11125.jpg')

In [0]:
Image.open('train/dog/dog.11350.jpg')

In [0]:
Image.open('train/dog/dog.11191.jpg')

In [0]:
Image.open('train/dog/dog.11299.jpg')

## Setting up the models

We will use ResNet50 (we will learn about it later). We use two models:
 1. randomly initialized
 2. pre-trained on ImageNet
 
As ImageNet contains 1000 categories, we have to change the last layer to have 1 neuron instead of 1000. We will train a binary classifier (0-1) to indicate whether the image shows a cat or a dog. 
 - binary crossentropy
 - sigmoid instead of softmax
   - softmax with one neuron $\to$ constant 1 prediction always
   
We could have also used 2 neurons with categorical crossentropy and softmax.
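
The remarks about softmax and sigmoid can be checked numerically. The sketch below uses plain NumPy; the helper functions `softmax` and `sigmoid` and the example logits are written here only for illustration and are not part of the notebook's models.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three samples, one output neuron each.
logits = np.array([[-3.0], [0.0], [5.0]])

# Softmax normalizes over a single value, so it outputs 1 for every sample.
softmax_single = softmax(logits)

# Sigmoid keeps the information in the logit.
sigmoid_single = sigmoid(logits)

# Two neurons with softmax: fixing the first logit at 0 reproduces the sigmoid.
two_logits = np.concatenate([np.zeros_like(logits), logits], axis=1)
softmax_two = softmax(two_logits)[:, 1]

print(softmax_single.ravel())
print(sigmoid_single.ravel())
print(softmax_two)
```

The last two printed rows agree, which is why a 2-neuron softmax head with categorical crossentropy is an equivalent alternative to the 1-neuron sigmoid head.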

In [0]:
# we could set classes=1, but the default activation is softmax
# softmax with one neuron is not the best idea...
model = ResNet50(weights=None)
pretrained_model = ResNet50(weights='imagenet')

Removing the last layer and creating a new model which has 1 neuron at the end with sigmoid activation.

In [0]:
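# take the output of the layer just before the 1000-class head
# (avoids the private `_layers` attribute, which newer TF versions do not have)
inputs = model.input
output = model.layers[-2].output
# attach a new single-neuron sigmoid head for binary classification
output = Dense(1, activation='sigmoid')(output)
model = Model(inputs, output)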

In [0]:
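# same surgery for the pre-trained network: keep everything up to
# the layer before the ImageNet head and add a new sigmoid neuron
inputs = pretrained_model.input
output = pretrained_model.layers[-2].output
output = Dense(1, activation='sigmoid')(output)
pretrained_model = Model(inputs, output)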

For the pre-trained model we freeze all the layers but the last.

In [0]:
for i in pretrained_model.layers[:-1]:
    i.trainable = False

Check the models and compare the numbers of trainable parameters! Later it is worth checking the training times too!

In [0]:
model.summary()

In [0]:
pretrained_model.summary()

## Dataloader

Previously we just loaded all the data into memory and fed it to the neural network. Larger datasets often simply do not fit into RAM. Data generators are functions that provide one batch of data at a time.

tf.keras has a built-in ImageDataGenerator, which we will use here, but it does not take much effort to write your own data loader. Image augmentation can also be done within the data loader.

The categories are matched from the folder names.
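
As a rough idea of what a hand-written data loader could look like, here is a minimal generator sketch. The names `batch_generator`, `load_fn` and `fake_loader` are hypothetical; in a real loader `load_fn` would read and resize an image file, whereas here it returns a blank array so that the example runs without any data on disk.

```python
import numpy as np

def batch_generator(paths, labels, load_fn, batch_size=16, seed=0):
    """Yield (images, labels) batches forever, reshuffling at every epoch.

    Only one batch of images is held in memory at a time.
    """
    rng = np.random.default_rng(seed)
    paths = np.asarray(paths)
    labels = np.asarray(labels)
    while True:
        order = rng.permutation(len(paths))
        for start in range(0, len(paths), batch_size):
            idx = order[start:start + batch_size]
            images = np.stack([load_fn(p) for p in paths[idx]])
            yield images, labels[idx]

def fake_loader(path):
    # Placeholder for reading and resizing an image file (assumption).
    return np.zeros((224, 224, 3), dtype=np.float32)

example_paths = [f'img_{i}.jpg' for i in range(10)]  # made-up file names
example_labels = [i % 2 for i in range(10)]
gen = batch_generator(example_paths, example_labels, fake_loader, batch_size=4)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)
```

Augmentation would fit naturally inside `load_fn`, since that is the only place where individual images are touched.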

In [0]:
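def imagenet_convert(img):
    # caffe-style preprocessing that the Keras ResNet50 ImageNet weights expect
    img  = img.astype(float)[...,::-1] # RGB --> BGR
    img -= [103.939, 116.779, 123.68]  # subtract the ImageNet channel means (B, G, R)
    return img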

In [0]:
train_datagenerator = ImageDataGenerator(preprocessing_function=imagenet_convert)
test_datagenerator  = ImageDataGenerator(preprocessing_function=imagenet_convert)

train_datagenerator = train_datagenerator.flow_from_directory(
        'train',
        target_size=(224, 224),
        batch_size=16,
        class_mode='binary')

test_datagenerator = test_datagenerator.flow_from_directory(
        'test',
        target_size=(224, 224),
        batch_size=16,
        class_mode='binary')

Compile the models with the Adam optimizer using a learning rate of $10^{-4}$.

In [0]:
# `learning_rate` replaces the deprecated `lr` argument
model.compile(optimizer=Adam(learning_rate=1e-4),loss='binary_crossentropy',metrics=['accuracy'])
pretrained_model.compile(optimizer=Adam(learning_rate=1e-4),loss='binary_crossentropy',metrics=['accuracy'])

### Fit the models for 25 epochs and run the validation after every 5th epoch

The training time for a single epoch is pretty low, but validation is much slower, because we have about 40 times as many images to validate on as to train on.
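
The difference can be made concrete by counting batches. The sketch below assumes the test set is exactly 40 times the 600 training images; the text only gives an approximate factor, so the real number of validation batches will differ somewhat.

```python
import math

batch_size = 16
n_train = 600              # 300 dogs + 300 cats
n_test = 40 * n_train      # assumption: exactly 40 times the training set

train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)

print(f'batches per training epoch:  {train_steps}')
print(f'batches per validation pass: {test_steps}')
```

A validation batch only needs a forward pass, so it is cheaper than a training batch, but with this many more batches the validation pass still dominates the runtime.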

In [0]:
model.fit(train_datagenerator, 
          validation_data=test_datagenerator, 
          validation_freq=5, epochs=25)

In [0]:
pretrained_model.fit(train_datagenerator, 
                     validation_data=test_datagenerator, 
                     validation_freq=5, epochs=25)

# Summary

Both networks improved over time, but the randomly initialized network reached only <70% accuracy (with some ideas we could surely get much better), while the pre-trained model achieved >97% accuracy.   
Also, the training time per epoch was ~50% lower for the pre-trained model.

https://www.kaggle.com/c/dogs-vs-cats/leaderboard   
6 years ago the winner achieved 98.9%, the 10th place 97.9% and the 30th place 96.7% (but on a different test set).

**We achieved our result with ~3 minutes of training (not counting the validation time).**


### Further improvements
 - clean dataset, remove mislabeled images
 - augmentation
 - test time augmentation
 - careful learning rate schedule
 - fine-tuning other pre-trained models from the [model zoo](https://www.tensorflow.org/api_docs/python/tf/keras/applications/) and averaging them (ensemble)
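
Test-time augmentation from the list above can be sketched as follows. The function `predict_with_tta` and the toy `left_half_brightness` predictor are hypothetical; in the notebook, `predict_fn` would be something like `pretrained_model.predict`, and horizontal flipping is only one of many possible augmentations.

```python
import numpy as np

def predict_with_tta(predict_fn, images):
    """Average the predictions on the original and horizontally flipped images.

    `images` is expected in (batch, height, width, channels) layout.
    """
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return (predict_fn(images) + predict_fn(flipped)) / 2.0

def left_half_brightness(images):
    # Toy stand-in for a model: the score is the mean of the left image half.
    half = images.shape[2] // 2
    return images[:, :, :half, :].mean(axis=(1, 2, 3))

# One 4x4 single-channel image: left half dark (0), right half bright (1).
example = np.zeros((1, 4, 4, 1))
example[:, :, 2:, :] = 1.0

plain = left_half_brightness(example)
averaged = predict_with_tta(left_half_brightness, example)
print(plain, averaged)
```

The toy predictor gives opposite scores for the image and its mirror, so the averaged score lands in between; for a real classifier the averaging mainly reduces the variance of the prediction.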