# Transfer Learning in Keras (VGG19)

In this notebook, we'll load a pre-trained model (VGG19) and fine-tune it for a new task: detecting hot dogs.

#### Load dependencies

In [1]:
import keras
from keras.applications.vgg19 import VGG19
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard

#### Load the pre-trained VGG19 model

In [2]:
vgg19 = VGG19(include_top=False,
              weights='imagenet',
              input_shape=(224,224,3),
              pooling=None)

#### Freeze all the layers in the base VGG19 model

In [3]:
for layer in vgg19.layers:
    layer.trainable = False
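To see how much of the network the loop above freezes, the sketch below counts the weights in VGG19's convolutional base by hand. It assumes the standard VGG19 layout: 16 convolutions of size 3x3, each with a bias, spread over five blocks with 64, 128, 256, 512 and 512 filters. The helper names are illustrative. In Keras, `vgg19.summary()` reports the same totals.

```python
# Filters per 3x3 conv layer in each of VGG19's five blocks (standard architecture).
VGG19_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256, 256],
    [512, 512, 512, 512],
    [512, 512, 512, 512],
]

def conv_params(in_channels: int, out_channels: int, kernel: int = 3) -> int:
    """Weights plus biases of one kernel x kernel Conv2D layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def vgg19_base_params(in_channels: int = 3) -> int:
    """Total parameters of VGG19 with include_top=False (pooling layers have none)."""
    total = 0
    channels = in_channels
    for block in VGG19_BLOCKS:
        for filters in block:
            total += conv_params(channels, filters)
            channels = filters  # next layer's input depth
    return total

print(f"{vgg19_base_params():,}")  # 20,024,384 parameters, all frozen by the loop
```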

#### Add custom classification layers

In [4]:
# Instantiate the sequential model and add the VGG19 model:
model = Sequential()
model.add(vgg19)

# Add the custom layers atop the VGG19 model:
model.add(Flatten(name='flattened'))
model.add(Dropout(0.5, name='dropout'))
model.add(Dense(2, activation='softmax', name='predictions'))
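The size of the `flattened` layer follows from VGG19's five 2x2 max-pooling stages: with `input_shape=(224,224,3)` and `pooling=None`, the base outputs a 7x7x512 feature map. The plain-Python arithmetic below (illustrative helper name) shows why the new `Dense(2)` layer, the only layer with trainable weights, has 50,178 parameters; `Dropout` and `Flatten` have none.

```python
def base_output_shape(height, width, n_pools=5, last_filters=512):
    # Each 2x2, stride-2 max-pool halves height and width (integer division).
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return height, width, last_filters

h, w, c = base_output_shape(224, 224)
flattened = h * w * c                      # length of the 'flattened' vector
dense_params = flattened * 2 + 2           # weights + biases of Dense(2)
print((h, w, c), flattened, dense_params)  # (7, 7, 512) 25088 50178
```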

#### Compile the model for training

In [5]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
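Because the last layer is a 2-unit softmax and the labels are one-hot vectors (`class_mode='categorical'` below), `categorical_crossentropy` reduces to the negative log of the probability the model assigns to the true class. Here is a plain-Python sketch of that formula for a single image. It is not Keras's implementation, which also clips probabilities and averages over the batch.

```python
import math

def categorical_crossentropy(one_hot, probs):
    # -sum(y_i * log(p_i)); only the true class's term is non-zero.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

probs = [0.8, 0.2]  # softmax output: 80% hot_dog, 20% not_hot_dog
print(round(categorical_crossentropy([1, 0], probs), 4))  # 0.2231 (true class: hot_dog)
print(round(categorical_crossentropy([0, 1], probs), 4))  # 1.6094 (true class: not_hot_dog)
```

A confident wrong prediction is penalized much more heavily than a confident right one is rewarded, which is what pushes the `predictions` layer toward the correct class.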

#### Prepare the data for training

The dataset is available for download [here](https://www.kaggle.com/dansbecker/hot-dog-not-hot-dog/home).

Instead of downloading the data and copying it to Google Drive, we can use the Kaggle API to download it directly.

<span style="color:magenta">[1] Sign in to your Kaggle account (or register for one).</span>

<span style="color:magenta">[2] In your account settings, under API, click "Create New Token" to download kaggle.json.</span>

<span style="color:magenta">[3] Upload kaggle.json to Colab's sample_data folder (/content/sample_data/).</span>
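The Kaggle CLI reads its credentials from a `kaggle.json` file of the form `{"username": "...", "key": "..."}` in the folder named by the `KAGGLE_CONFIG_DIR` environment variable (set to `/content/sample_data/` in the next cell). A quick check like the sketch below can catch a missing or malformed upload before the download runs. `check_kaggle_token` is a hypothetical helper, not part of the Kaggle package.

```python
import json
import os

def check_kaggle_token(config_dir: str) -> str:
    """Return the username stored in config_dir/kaggle.json (hypothetical helper).

    Raises if the file is missing or lacks the 'username'/'key' fields.
    """
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No kaggle.json found in {config_dir}")
    with open(path) as f:
        token = json.load(f)
    missing = {"username", "key"} - token.keys()
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return token["username"]

# In Colab, after step [3]:
# print(check_kaggle_token('/content/sample_data/'))
```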

In [None]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = '/content/sample_data/'

In [None]:
!kaggle datasets download -d dansbecker/hot-dog-not-hot-dog
!unzip \*.zip

In [None]:
# Define the batch size:
batch_size=32

In [None]:
# Instantiate two image generator classes:
train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    data_format='channels_last',
    rotation_range=30,
    horizontal_flip=True,
    fill_mode='reflect')

valid_datagen = ImageDataGenerator(
    rescale=1.0/255,
    data_format='channels_last')

The code above creates two instances of Keras's `ImageDataGenerator` class, which generates batches of tensor image data with real-time data augmentation. The generators loop over the data (in batches) indefinitely.

The first instance, `train_datagen`, is for training data. It has several parameters set:

- `rescale=1.0/255` multiplies every pixel value by 1/255, mapping it from the range [0, 255] to [0, 1]. This is a common preprocessing step for image models.
- `data_format='channels_last'` specifies that the images will have their color channels as the last dimension.
- `rotation_range=30` allows the images to be randomly rotated by up to 30 degrees.
- `horizontal_flip=True` allows the images to be randomly flipped horizontally.
- `fill_mode='reflect'` specifies how to fill in newly created pixels, which can appear after a rotation or width/height shift.
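The options above can be mimicked on a toy NumPy array to see what they do. This is an illustration under simplifying assumptions, not `ImageDataGenerator`'s code: it rescales by 1/255, flips horizontally with probability 0.5, and uses `np.pad(..., mode='symmetric')`, which produces the `abcddcba|abcd|dcbaabcd` pattern that the Keras documentation gives for `fill_mode='reflect'`. Rotation is omitted; Keras fills the corners a rotation exposes using that same rule.

```python
import numpy as np

def augment(image, rng, flip_prob=0.5):
    """Toy rescale + horizontal_flip for an (H, W, C) uint8 image (channels_last)."""
    x = image.astype(np.float32) / 255.0   # rescale=1.0/255 -> values in [0, 1]
    if rng.random() < flip_prob:           # horizontal_flip=True
        x = x[:, ::-1, :]                  # reverse the width axis
    return x

rng = np.random.default_rng(42)
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3) * 20
out = augment(image, rng)
print(out.min() >= 0.0, out.max() <= 1.0)  # True True

# fill_mode='reflect' on one row of pixels (NumPy calls this mode 'symmetric'):
row = np.array([1, 2, 3, 4])
print(np.pad(row, 2, mode='symmetric'))    # [2 1 1 2 3 4 4 3]
```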

The second instance, `valid_datagen`, is for validation data. It only has two parameters set:

- `rescale=1.0/255` (as above)
- `data_format='channels_last'` (as above)

The validation generator omits the augmentation parameters because we want to evaluate the model on unaltered images (apart from the shared rescaling).
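Because the generators loop over the data indefinitely, the training cell uses `steps_per_epoch` and `validation_steps` to decide how many batches make up an epoch. The pure-Python sketch below shows the looping idea with a hypothetical `batch_generator` (not Keras's internals): it yields batches forever, reshuffling with a seeded RNG at the start of each pass over the data.

```python
import itertools
import random

def batch_generator(items, batch_size, shuffle=True, seed=None):
    """Yield batches of items forever, reshuffling at the start of each pass."""
    rng = random.Random(seed)
    order = list(range(len(items)))
    while True:  # never stops on its own; the caller decides when to quit
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            # The final batch of a pass may be smaller than batch_size.
            yield [items[i] for i in order[start:start + batch_size]]

gen = batch_generator(list("abcdefg"), batch_size=3, seed=42)
for batch in itertools.islice(gen, 4):  # 4 batches is more than one pass of 7 items
    print(batch)
```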

In [None]:
# Define the train and validation generators:
train_generator = train_datagen.flow_from_directory(
    directory='./train',
    target_size=(224, 224),
    classes=['hot_dog','not_hot_dog'],
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=True,
    seed=42)

valid_generator = valid_datagen.flow_from_directory(
    directory='./test',
    target_size=(224, 224),
    classes=['hot_dog','not_hot_dog'],
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=True,
    seed=42)

The code above uses Keras to prepare the data for training and validating the model.

The `train_generator` and `valid_generator` are objects that will generate batches of image data on-the-fly during the model training and validation process.

`flow_from_directory` is a method that loads images from a directory (one subdirectory per class) and generates batches of images and their labels.

Here's what each parameter does:

- `directory`: The path to the directory containing the images. `train_generator` loads images from './train', and `valid_generator` loads them from './test'.

- `target_size`: The dimensions to which all images are resized. Here, every image is resized to 224x224 pixels, matching the `input_shape` of the VGG19 base.

- `classes`: The list of class subdirectory names, here 'hot_dog' and 'not_hot_dog'. Their order in the list sets the class indices (0 and 1).

- `class_mode`: Determines the type of label arrays that are returned. 'categorical' returns one-hot encoded labels (e.g., with two classes, an image belonging to the second class gets the label [0, 1]).

- `batch_size`: The number of images per batch, taken from the `batch_size` variable (32).

- `shuffle`: Whether to shuffle the data. If set to True, the data will be randomly shuffled at each epoch (iteration over the entire dataset).

- `seed`: The random seed used for shuffling the order of the images and for the random augmentations.
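`flow_from_directory` takes its labels from folder names: each entry in `classes` is a subdirectory of `directory`, and its position in the list becomes the class index (Keras exposes the mapping afterwards as `train_generator.class_indices`). The stdlib-only sketch below imitates that mapping with a hypothetical `list_labeled_images` helper. It is not Keras's implementation, which also resizes and batches the images as it loads them.

```python
import os

# Extensions treated as images in this sketch.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff"}

def list_labeled_images(directory, classes):
    """Return (class_indices, [(path, one_hot_label), ...]), one subfolder per class."""
    class_indices = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(directory, name)
        for fname in sorted(os.listdir(folder)):
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTENSIONS:
                one_hot = [0] * len(classes)
                one_hot[class_indices[name]] = 1
                samples.append((os.path.join(folder, fname), one_hot))
    return class_indices, samples

# e.g. with the Kaggle data unzipped into ./train:
# indices, samples = list_labeled_images('./train', ['hot_dog', 'not_hot_dog'])
# indices == {'hot_dog': 0, 'not_hot_dog': 1}; a not_hot_dog image gets [0, 1]
```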

In [None]:
tensorboard = TensorBoard('logs/transfer')

In [None]:
model.fit(train_generator,
          steps_per_epoch=15,
          epochs=20,
          validation_data=valid_generator,
          validation_steps=15,
          callbacks=[tensorboard])
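With `batch_size=32`, `steps_per_epoch=15` means each training epoch draws at most 15 × 32 = 480 images, and `validation_steps=15` likewise caps each validation pass at 480 images. If you would rather have each epoch cover every image exactly once, set the step count to the number of batches needed, as in this sketch. `steps_for` is an illustrative helper; the image counts are available as `train_generator.samples` and `valid_generator.samples`, and `len(train_generator)` gives the same batch count.

```python
import math

def steps_for(n_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)

print(15 * 32)             # 480 images per epoch with the settings above
print(steps_for(500, 32))  # 16 batches for a hypothetical 500-image split

# Assumed usage in the notebook:
# model.fit(train_generator,
#           steps_per_epoch=steps_for(train_generator.samples, batch_size), ...)
```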

In [None]:
%reload_ext tensorboard
%tensorboard --logdir logs/transfer