<a href="https://colab.research.google.com/github/nyp-sit/it3103/blob/main/week3/convnets_with_small_datasets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab Exercise: Image Classification using Convolutional Neural Network 

In this practical, we will see how we can use a Convolutional Neural Network to classify cat and dog images.

We will train the network using relatively little data (about 3000 images, of which 2400 are used for training), which is a common real-world problem in deep learning projects where data is hard to come by. We learnt in the lecture how to tackle the small-data problem with common techniques such as data augmentation and transfer learning. In this lab we will examine data augmentation; in the next lab, we will learn to use transfer learning.

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt
import os

## Downloading the data

The cats vs. dogs dataset was made available on Kaggle as part of a computer vision competition in late 2013. You can download the [original dataset](https://www.kaggle.com/c/dogs-vs-cats/data) from Kaggle (you will need to create a Kaggle account if you don't already have one).

The pictures are medium-resolution color JPEGs and are of various sizes and shapes that look like this:

<img src='https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/resources/cats_vs_dogs_samples.png' height='300'/>

This original dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543 MB compressed. To demonstrate the challenges of training with a small dataset, and to see the effect of data augmentation, we will use a smaller subset (3000 images), which you can download from [here](https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz).

In the code below, we use the Keras `get_file()` utility to download and extract the dataset.

In [None]:
dataset_URL = 'https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs_subset.tar.gz', origin=dataset_URL, extract=True, cache_dir='.')
dataset_dir = os.path.join(os.path.dirname(path_to_zip), "cats_and_dogs_subset")

In [None]:
dogs_dir = os.path.join(dataset_dir, "dogs")
cats_dir = os.path.join(dataset_dir, "cats")
print('total dog images:', len(os.listdir(dogs_dir)))
print('total cat images:', len(os.listdir(cats_dir)))

So we indeed have 3000 images, 1500 each for cats and dogs (we will split them into training and validation sets later). This is a balanced binary classification problem, which means that classification accuracy is an appropriate measure of success.

## Building our network

Our convnet will be a stack of alternating `Conv2D` (with `relu` activation) and `MaxPooling2D` layers.

**Exercise 1**: 

Write the code to implement the following:

- Input layer should be of shape (128,128,3)
- Rescaling layer to scale pixel values to the range [0, 1]
- The hidden layers consist of the following Conv2D/MaxPooling2D blocks:
  - Block 1: Conv layer with 32 filters with filter size of 3x3, followed by MaxPooling layer
  - Block 2: Conv layer with 64 filters with filter size of 3x3, followed by MaxPooling layer
  - Block 3 and 4: Conv layer with 128 filters with filter size of 3x3, followed by MaxPooling layer
  - A Layer to convert 2D to 1D
- A Dense Layer with 512 neurons
- Output layer using Dense Layer

Use ReLU as the activation function for all hidden layers.

What activation function should you use for the output layer?

<br/>

<details>
<summary>Click here for answer</summary>

```python
def make_model():
    
    model = keras.models.Sequential()
    model.add(keras.layers.Input(shape=(128, 128, 3)))
    model.add(keras.layers.Rescaling(scale=1./255))
    model.add(keras.layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(512, activation='relu'))
    model.add(keras.layers.Dense(1, activation='sigmoid'))

    return model

model = make_model()

```
</details>

In [None]:
### TODO: Write the code to build the model (we will compile it in Exercise 2)
def make_model():
    
    model = ??

    return model

model = make_model()


Let's print the model summary to show the output shape and number of parameters for each layer. Your output should look something like this:

<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/resources/convnet_summary.png" width=400/>

In [None]:
model.summary()
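
As a cross-check on the summary, the output sizes and parameter counts can be worked out by hand. The sketch below assumes the architecture from Exercise 1 with the Keras defaults (`'valid'` padding and stride 1 for `Conv2D`, 2x2 windows for `MaxPooling2D`); the helper names are made up for this illustration and are not part of Keras.

```python
# Hand calculation of layer shapes and parameter counts for the Exercise 1 model.
# Assumes Keras defaults: 'valid' padding, stride 1 for Conv2D, 2x2 MaxPooling2D.

def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling (odd sizes are rounded down)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 128, 3
total_params = 0
for filters in (32, 64, 128, 128):
    total_params += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after block with {filters} filters: ({size}, {size}, {channels})")

flat_units = size * size * channels          # output of Flatten
total_params += (flat_units + 1) * 512       # Dense(512)
total_params += (512 + 1) * 1                # Dense(1)

print("flattened units:", flat_units)
print("total parameters:", total_params)
```

Under these assumptions the feature maps shrink from 128 to 63, 30, 14 and finally 6 pixels per side, giving 6 x 6 x 128 = 4,608 flattened units. Most of the parameters sit in the first `Dense` layer.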

**Exercise 2**: 

Compile your model with an appropriate optimizer and loss function. We will use Adam with a learning rate of 1e-4 and monitor the `accuracy` metric. What should we use for the loss function?

Complete the code below. 

<br/>
<details>
<summary>Click here for answer</summary>

```python
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              metrics=['accuracy'])
```
</details>

In [None]:
### TODO: Complete the code below ####


## Data preprocessing

Image data should be formatted into appropriately pre-processed floating point tensors before being fed into our 
network. Currently, our data sits on a drive as JPEG files, so the steps for getting it into our network are roughly:

* Read the picture files.
* Decode the JPEG content to RGB grids of pixels.
* Resize the images to the same size (128 x 128 in our case).
* Convert these into floating point tensors.

It may seem a bit daunting, but `tf.keras.preprocessing.image_dataset_from_directory()` (similar to the `ImageDataGenerator` class) allows us to do all this rather painlessly. It also allows us to specify how to split the data into training and validation sets. We will use an 80-20 split.

`tf.keras.preprocessing.image_dataset_from_directory()` returns a [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) that we can iterate over batch by batch.
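
For reference, the four preprocessing steps listed above could be done by hand roughly as follows. This is only a sketch of the idea, not what `image_dataset_from_directory()` does internally; `load_image` and the synthetic demo file are hypothetical names introduced here so that the snippet runs without the dataset.

```python
import os
import tempfile

import tensorflow as tf

def load_image(path, image_size=(128, 128)):
    """Read, decode, resize and convert one JPEG file to a float tensor."""
    raw = tf.io.read_file(path)                    # 1. read the picture file
    image = tf.io.decode_jpeg(raw, channels=3)     # 2. decode to an RGB grid (uint8)
    image = tf.image.resize(image, image_size)     # 3. resize to a common size
    return tf.cast(image, tf.float32)              # 4. floating point tensor

# Demo on a synthetic JPEG (random pixels), not on a real dataset image
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jpg")
pixels = tf.cast(
    tf.random.uniform((200, 300, 3), maxval=256, dtype=tf.int32), tf.uint8
)
tf.io.write_file(demo_path, tf.io.encode_jpeg(pixels))

tensor = load_image(demo_path)
print(tensor.shape, tensor.dtype)
```

Note that the pixel values are still in the range 0 to 255 at this point; in this lab the rescaling to [0, 1] is done by the `Rescaling` layer inside the model.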

In [None]:
img_height, img_width = 128, 128
batch_size = 32

# resize all the images to the input size expected by our model
image_size = (img_height, img_width)

train_ds = keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)
val_ds = keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)

We can use `take(1)` to retrieve one batch of samples from the training dataset. Since we specified a batch size of 32, it yields batches of 128 x 128 RGB images (shape `(32, 128, 128, 3)`) and binary labels (shape `(32, 1)`). 32 is the number of samples in each batch (the batch size).
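
The split and batch sizes follow from simple arithmetic. The sketch below assumes the 3000-image subset, the 80-20 split and the batch size of 32 used in this lab, and that the last partial batch is kept rather than dropped.

```python
import math

total_images = 3000
validation_split = 0.2
batch_size = 32

val_count = int(total_images * validation_split)     # images held out for validation
train_count = total_images - val_count               # images used for training

train_batches = math.ceil(train_count / batch_size)  # batches per training epoch
val_batches = math.ceil(val_count / batch_size)
last_val_batch = val_count - (val_batches - 1) * batch_size

print("training images:", train_count, "in", train_batches, "batches")
print("validation images:", val_count, "in", val_batches, "batches")
print("size of the last validation batch:", last_val_batch)
```

So one training epoch is 75 full batches, while the validation set needs 19 batches, the last of which holds only 24 images.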

In [None]:
for images, labels in train_ds.take(1):
    print('images shape:', images.shape)
    print('labels shape:', labels.shape)
    print(tf.squeeze(labels))

How do we know which label is assigned to each class? We can use the `class_names` attribute of the dataset to show the class labels. The index position of each name is the numeric label mapped to that class. In this case, cats=0 and dogs=1. By default, `keras.preprocessing.image_dataset_from_directory` assigns the labels based on the alphanumeric order of the class directory names.

In [None]:
train_ds.class_names
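
The mapping from class name to numeric label can be reproduced in plain Python. This is a minimal sketch of the alphanumeric ordering rule described above; `class_dirs` and `label_of` are illustrative names, not part of the Keras API.

```python
# Sub-directory names as they might be listed on disk (order not guaranteed)
class_dirs = ["dogs", "cats"]

# Sorting the names gives the class order; the index is the numeric label
class_names = sorted(class_dirs)
label_of = {name: index for index, name in enumerate(class_names)}

print(class_names)
print(label_of)
```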

## Visualization using Tensorboard

Let's define a utility function that creates a TensorBoard callback for use during model training later. We will also create a `ModelCheckpoint` callback that allows us to save the best checkpoint (in terms of validation accuracy) during training.


In [None]:
def create_tb_callback(): 

    root_logdir = os.path.join(os.curdir, "tb_logs")

    def get_run_logdir():    # use a new directory for each run
        import time
        run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
        return os.path.join(root_logdir, run_id)

    run_logdir = get_run_logdir()

    tb_callback = tf.keras.callbacks.TensorBoard(run_logdir)

    return tb_callback

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="bestcheckpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)
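
To make the effect of `monitor='val_accuracy'`, `mode='max'` and `save_best_only=True` concrete, here is a plain-Python sketch of the selection rule. It is not the Keras implementation, and the accuracy values are made up purely for illustration.

```python
# Made-up validation accuracies, one per epoch (illustration only, not real results)
val_accuracy_per_epoch = [0.61, 0.68, 0.72, 0.75, 0.74, 0.73]

best_value = float("-inf")
best_epoch = None
for epoch, value in enumerate(val_accuracy_per_epoch, start=1):
    # mode='max': a checkpoint is written only when the monitored value improves
    if value > best_value:
        best_value, best_epoch = value, epoch

print("best epoch:", best_epoch, "with val_accuracy", best_value)
```

In this made-up run the last epoch is not the best one, which is why the weights saved by the callback can differ from the weights the model holds when training ends.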


## Train the Model

Let's start the training

In [None]:
model.fit(
    train_ds,
    epochs=30,
    validation_data=val_ds,
    callbacks=[create_tb_callback(), model_checkpoint_callback])

Let's restore the best checkpoint to the model (remember that the last checkpoint may not be the best one). We will then evaluate the model on our validation dataset to see the best validation accuracy we achieved.

In [None]:
model.load_weights('bestcheckpoint')
model.evaluate(val_ds)

Let's visualize our training and validation accuracy and loss using Tensorboard.

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [None]:
%tensorboard --logdir tb_logs

These plots are characteristic of overfitting. Our training accuracy increases linearly over time, until it reaches nearly 100%, while our 
validation accuracy stalls at 75%. Our validation loss reaches its minimum after only five epochs then stalls, while the training loss 
keeps decreasing linearly until it reaches nearly 0.

Because we only have relatively few training samples (2400), overfitting is going to be our number one concern. There are a 
number of techniques that can help mitigate overfitting, such as dropout and weight decay (L2 regularization). We are now going to 
use one, specific to computer vision, and used almost universally when processing images with deep learning models: *data 
augmentation*.

## Using data augmentation

Overfitting is caused by having too few samples to learn from, rendering us unable to train a model able to generalize to new data. 
Given infinite data, our model would be exposed to every possible aspect of the data distribution at hand: we would never overfit. Data 
augmentation takes the approach of generating more training data from existing training samples, by "augmenting" the samples via a number 
of random transformations that yield believable-looking images. This helps the model get exposed to more aspects of the data and generalize better. 

In the code below, we build the augmentation pipeline as a small `keras.Sequential` model. We use only one `RandomRotation` layer in this example. The value 0.3 is the rotation factor, expressed as a fraction of a full turn (2π): each image is rotated by a random amount between -0.3 × 2π and +0.3 × 2π, that is, up to 108° clockwise or anti-clockwise. You can find out more in the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation).

In [None]:
data_augmentation = keras.Sequential(
    [
        keras.layers.RandomRotation(0.3)
    ]
)
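
According to the Keras documentation, the `RandomRotation` factor is a fraction of 2π. The short calculation below converts the factor used above into an angle; the variable names are only for illustration.

```python
import math

factor = 0.3                                 # value passed to RandomRotation

max_angle_rad = factor * 2 * math.pi         # fraction of a full turn, in radians
max_angle_deg = math.degrees(max_angle_rad)  # the same limit in degrees

print(f"rotation range: +/- {max_angle_rad:.3f} rad (+/- {max_angle_deg:.0f} degrees)")
```

A factor of 0.3 therefore allows rotations of up to about 108° in either direction, which is quite aggressive; a smaller factor gives milder augmentation.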


To see the effects of data augmentation, let us apply our data_augmentation layer to a sample image.

In [None]:
import matplotlib.pyplot as plt 

images, _ = next(train_ds.take(1).as_numpy_iterator())
sample_image = images[0]/255.
plt.imshow(sample_image)
sample_image = tf.expand_dims(sample_image, 0)
print(sample_image.shape)

In [None]:
plt.figure(figsize=(20, 20))
for i in range(19):
    augmented_image = data_augmentation(sample_image)
    ax = plt.subplot(5, 4, i + 1)
    plt.imshow(augmented_image[0])
    plt.axis("off")

**Exercise 3:**

Modify `data_augmentation` above to add [Random Flipping](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip) and [Random Zoom](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom). Choose appropriate values for the flip mode and the zoom factor.

<details><summary>Click here for answer</summary>

```python

data_augmentation = keras.Sequential(
    [
        keras.layers.RandomRotation(0.3),
        keras.layers.RandomFlip(mode="horizontal"),
        keras.layers.RandomZoom(0.2)
    ]
)
```
</details>


In [None]:
data_augmentation = ??

**Exercise 4:**

Modify `make_model()` to apply the data augmentation layers you created earlier. Where should you place your augmentation layers?

To further fight overfitting, we will also add a Dropout layer to our model, right before the densely-connected classifier.

<details>
<summary>Click here for answer</summary>

```python
def make_model():

    model = keras.models.Sequential()
    model.add(keras.layers.Input(shape=(128, 128, 3)))
    model.add(data_augmentation)
    model.add(keras.layers.Rescaling(scale=1./255))
    model.add(keras.layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(512, activation='relu'))
    model.add(keras.layers.Dense(1, activation='sigmoid'))

    return model

model = make_model()

model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              metrics=['accuracy'])
```

</details>
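
To see what a Dropout layer with rate 0.5 does to its inputs, here is a plain-Python sketch of the idea (often called inverted dropout). It is not the Keras implementation, and the `dropout` function below is a hypothetical helper written for this illustration.

```python
import random

def dropout(values, rate=0.5, training=True):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    if not training:
        return list(values)          # at inference time the values pass through unchanged
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if random.random() < rate else v * keep_scale for v in values]

random.seed(0)                       # fixed seed so the demo is repeatable
activations = [1.0] * 10

print("training :", dropout(activations))
print("inference:", dropout(activations, training=False))
```

Scaling the surviving values keeps the expected sum of the activations the same during training and inference, so the following Dense layer sees inputs of a similar magnitude in both modes.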


In [None]:
def make_model():

    model = keras.models.Sequential()
    
    model.add(??)

    return model

model = make_model()

model.compile(??)


In [None]:
model.summary()

Let's train our network using data augmentation and dropout. We also need to train for more epochs, so that our network has a better chance of seeing all the original images: we can no longer guarantee that each original image is seen in every epoch, because the augmentation layers replace it with a randomly transformed version.

In [None]:
### Note: the training will take quite a while. We have previously trained the model for 100 epochs.
### You can download the checkpoints by uncommenting the following lines, and skip the next cell ("model.fit()")

# !wget https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/checkpoints/week3-1-100epochs.zip
# !unzip week3-1-100epochs.zip

In [None]:
## Comment out this if you just want to use the pretrained weights
model.fit(train_ds, validation_data=val_ds, 
          epochs=100, 
          callbacks=[create_tb_callback(), model_checkpoint_callback])

In [None]:
model.load_weights("bestcheckpoint")
model.evaluate(val_ds)

Let's visualize our training using Tensorboard.

In [None]:
%tensorboard --logdir tb_logs

Thanks to data augmentation and dropout, we are no longer overfitting: the training curves are more closely tracking the validation 
curves. We are now able to reach an accuracy of 80%, slightly better than previously.

However, it would be very difficult to improve the model any further even with data augmentation. The augmented images are still heavily correlated, since they come from a small number of original images: we cannot produce new information, we can only remix existing information. As the next step to improve our accuracy on this problem, we will have to leverage transfer learning using a pre-trained model, which will be the focus of the next lesson.