### Dataset

In this homework, we'll build a model for predicting if we have an image of a dog or a cat. For this,
we will use the "Dogs & Cats" dataset that can be downloaded from [Kaggle](https://www.kaggle.com/c/dogs-vs-cats/data). 

You need to download the `train.zip` file.

If you have troubles downloading from Kaggle, use [this link](https://github.com/alexeygrigorev/large-datasets/releases/download/dogs-cats/train.zip) instead:

```bash
wget https://github.com/alexeygrigorev/large-datasets/releases/download/dogs-cats/train.zip
```

In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch. 

**Note:** You don't need a computer with a GPU for this homework. A laptop or any personal computer should be sufficient. 


### Data Preparation

The dataset contains 12,500 images of cats and 12,500 images of dogs. 

Now we need to split this data into train and validation

* Create a `train` and `validation` folders
* In each folder, create `cats` and `dogs` folders
* Move the first 10,000 images to the train folder (from 0 to 9999) for boths cats and dogs - and put them in respective folders
* Move the remaining 2,500 images to the validation folder (from 10000 to 12499)

You can do this manually or with Python (check `os` and `shutil` packages).


### Model

For this homework we will use Convolutional Neural Network (CNN. Like in the lectures, we'll use Keras.

You need to develop the model with following structure:

* The shape for input should be `(150, 150, 3)`
* Next, create a covolutional layer ([`Conv2D`](https://keras.io/api/layers/convolution_layers/convolution2d/)):
    * Use 32 filters
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation 
* Reduce the size of the feature map with max pooling ([`MaxPooling2D`](https://keras.io/api/layers/pooling_layers/max_pooling2d/))
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using a [`Flatten`](https://keras.io/api/layers/reshaping_layers/flatten/) layer
* Next, add a `Dense` layer with 64 neurons and `'relu'` activation
* Finally, create the `Dense` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As optimizer use [`SGD`](https://keras.io/api/optimizers/sgd/) with the following parameters:

* `SGD(lr=0.002, momentum=0.8)`


For clarification about kernel size and max pooling, check [Week #11 Office Hours](https://www.youtube.com/watch?v=1WRgdBTUaAc).

In [1]:
import os
import zipfile
import random

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, classification_report 

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img, ImageDataGenerator
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, GlobalAveragePooling2D, Dropout, BatchNormalization, InputLayer, Input
from tensorflow.keras.applications.xception import Xception, preprocess_input, decode_predictions

In [15]:
# Defining Layers
Layers = [
    keras.Input(shape=(150, 150, 3), name='InputLayer'),
    keras.layers.Conv2D(filters = 32,
                        kernel_size = (3, 3),
                        activation = 'relu',
                        name = 'ConvolutionalLayer'
                        ),
    keras.layers.MaxPool2D(pool_size = (2, 2), name = 'MaxPooling'),
    keras.layers.Flatten(name='Flatten'),
    keras.layers.Dense(units = 64, activation = 'relu', name = 'Inner'),
    keras.layers.Dense(units=1, activation='sigmoid', name='Outer')
]

# Defining Model
model = Sequential(Layers)

# Defining Optimizer
optimizer = keras.optimizers.SGD(learning_rate = 0.002, momentum = 0.8)

### Question 1

Since we have a binary classification problem, what is the best loss function for us?

Note: since we specify an activation for the output layer, we don't need to set `from_logits=True`


**Binary Cross Entropy** is the best loss fuction to be used for binary classification problem.

In [None]:
# Deining Loss Function
loss = keras.losses.BinaryCrossentropy()

# Model compilation
model.compile(loss = 'binary_crossentropy',
              optimizer = optimizer,
              metrics=['accuracy']
             )

### Question 2

What's the total number of parameters of the model? You can use the `summary` method for that. 


### Generators and Training

For the next two questions, use the following data generator for both train and validation:

```python
ImageDataGenerator(rescale=1./255)
```

* We don't need to do any additional pre-processing for the images.
* When reading the data from train/val directories, check the `class_mode` parameter. Which value should it be for a binary classification problem?
* Use `batch_size=20`

For training use `.fit()` with the following params:

```python
model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=50
)
```


In [4]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
ConvolutionalLayer (Conv2D)  (None, 148, 148, 32)      896       
_________________________________________________________________
MaxPooling (MaxPooling2D)    (None, 74, 74, 32)        0         
_________________________________________________________________
Flatten (Flatten)            (None, 175232)            0         
_________________________________________________________________
Inner (Dense)                (None, 64)                11214912  
_________________________________________________________________
Outer (Dense)                (None, 1)                 65        
Total params: 11,215,873
Trainable params: 11,215,873
Non-trainable params: 0
_________________________________________________________________


In [5]:
train_datagen = ImageDataGenerator(rescale=1./255)

val_datagen = ImageDataGenerator(rescale=1./255)

In [6]:
train_generator = train_datagen.flow_from_directory(
                                    directory = "./Datasets/Session#08/train/",
                                    class_mode = "categorical",
                                    target_size = (150, 150),
                                    batch_size = 20
                                )

validation_generator = val_datagen.flow_from_directory(
                                        directory = "./Datasets/Session#08/validation/",
                                        class_mode = "categorical",
                                        target_size = (150, 150),
                                        batch_size = 20
                                    )

Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.


In [7]:
history = model.fit(
                train_generator,
                steps_per_epoch=100,
                epochs=10,
                validation_data=validation_generator,
                validation_steps=50
            )

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### Question 3

What is the median of training accuracy for this model?



In [8]:
hist = pd.DataFrame(history.history)
# Median of Training Accuracy
hist.accuracy.median()

0.5

### Question 4

What is the standard deviation of training loss for this model?

### Data Augmentation

For the next two questions, we'll generate more data using data augmentations. 

Add the following augmentations to your training data generator:

* `rotation_range=40,`
* `width_shift_range=0.2,`
* `height_shift_range=0.2,`
* `shear_range=0.2,`
* `zoom_range=0.2,`
* `horizontal_flip=True,`
* `fill_mode='nearest'`


In [9]:
# Standard Deviation of Training Loss 
hist.loss.std()

0.00010113538893088423

In [14]:
train_augen= ImageDataGenerator(rescale=1./255,
                                rotation_range = 40
                                width_shift_range=0.2,
                                height_shift_range=0.2,
                                shear_range=0.2,
                                zoom_range=0.2,
                                horizontal_flip=True,
                                fill_mode='nearest'
                                )

train_ds = train_augen.flow_from_directory(
                                directory = "./Datasets/Session#08/train/",
                                class_mode = "categorical",
                                target_size = (150, 150),
                                batch_size = 20
                            )

val_augen = ImageDataGenerator()

val_ds = val_augen.flow_from_directory(
                            directory = "./Datasets/Session#08/validation/",
                            class_mode = "categorical",
                            target_size = (150, 150),
                            batch_size = 20
                        )

Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.


### Question 5 

Let's train our model for 10 more epochs using the same code as previously.
Make sure you don't re-create the model - we want to continue training the model
we already started training.

What is the mean of validation loss for the model trained with augmentations?



In [11]:
hist_ = model.fit(
                train_ds,
                steps_per_epoch=100,
                epochs=10,
                validation_data=validation_generator,
                validation_steps=50
                )

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [12]:
hist1 = pd.DataFrame(hist_.history)
#  Mean of Validation Loss for the model trained with augmentations
hist1.val_loss.mean()

0.6931486964225769

### Question 6

What's the average of validation accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?



In [13]:
#  Average of Validation Accuracy for the last 5 epochs (from 6 to 10) for the model trained with augmentations
hist1.val_accuracy[-5:].mean()

0.5