## Homework: 08-deep-learning

> **Note**: it's very likely that in this homework your answers won't match 
> the options exactly. That's okay and expected. Select the option that's
> closest to your solution.

### Dataset

In this homework, we'll build a model for classifying various hair types. 
For this, we will use the Hair Type dataset that was obtained from 
[Kaggle](https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset) 
and slightly rebuilt. 

You can download the target dataset for this homework from 
[here](https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip):

```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
unzip data.zip
```

In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch. 

> **Note:** you will need an environment with a GPU for this homework. We recommend using [Saturn Cloud](https://bit.ly/saturn-mlzoomcamp). 
> You can also use a computer without a GPU (e.g. your laptop), but it will be slower.


### Data Preparation

The dataset contains around 1000 images of hair in separate folders 
for the training and test sets. 

### Reproducibility

Reproducibility in deep learning is a multifaceted challenge that requires attention 
to both software and hardware details. In some cases, we can't guarantee exactly 
the same results across repeated runs of the same experiment. Therefore, in this homework we suggest that you:
* install TensorFlow version 2.17.1
* seed the random number generators:

```python
import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
```

In [1]:
! pip install tensorflow==2.17.1



In [2]:
import numpy as np
import statistics

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator


In [3]:
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

In [4]:
tf.__version__

'2.17.1'

### Model

For this homework we will use a Convolutional Neural Network (CNN). Like in the lectures, we'll use Keras.

You need to develop a model with the following structure:

* The shape for input should be `(200, 200, 3)`
* Next, create a convolutional layer ([`Conv2D`](https://keras.io/api/layers/convolution_layers/convolution2d/)):
    * Use 32 filters
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation 
* Reduce the size of the feature map with max pooling ([`MaxPooling2D`](https://keras.io/api/layers/pooling_layers/max_pooling2d/))
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using a [`Flatten`](https://keras.io/api/layers/reshaping_layers/flatten/) layer
* Next, add a `Dense` layer with 64 neurons and `'relu'` activation
* Finally, create the `Dense` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As optimizer use [`SGD`](https://keras.io/api/optimizers/sgd/) with the following parameters:

* `SGD(learning_rate=0.002, momentum=0.8)`

For clarification about kernel size and max pooling, check [Office Hours](https://www.youtube.com/watch?v=1WRgdBTUaAc).


In [5]:
def create_cnn_model():
    model = models.Sequential()

    # Convolutional layers
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 3)))
    model.add(layers.MaxPooling2D((2, 2)))

    # Flatten the feature maps
    model.add(layers.Flatten())

    # Fully connected layers
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))  # Binary classification output layer

    return model

In [6]:
# Create the model
model = create_cnn_model()


In [7]:
# Compile the model

optimizer = SGD(learning_rate=0.002, momentum=0.8)

model.compile(optimizer=optimizer,
              loss='binary_crossentropy',  # Binary classification loss
              metrics=['accuracy'])

### Question 1

Since we have a binary classification problem, what is the best loss function for us?

* `mean squared error`
* `binary crossentropy`
* `categorical crossentropy`
* `cosine similarity`

**Answer: `binary crossentropy`**

> **Note:** since we specify an activation for the output layer, we don't need to set `from_logits=True`
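
To illustrate the note above, here is a minimal pure-Python sketch (not Keras code) showing that applying a sigmoid and then computing binary cross-entropy on the probability gives the same loss as computing it directly from the raw logit. The function names and the sample logit are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Binary cross-entropy when the model already outputs a probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def bce_from_logit(y_true, z):
    """Binary cross-entropy computed directly from a logit.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


logit = 0.8  # hypothetical raw output of the last layer before activation

for label in (0, 1):
    loss_prob = bce_from_probability(label, sigmoid(logit))
    loss_logit = bce_from_logit(label, logit)
    print(label, loss_prob, loss_logit)
```

Both columns agree up to floating-point error, which is why a model whose last layer already applies `'sigmoid'` can use the loss with its default setting.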


### Question 2

What's the total number of parameters of the model? You can use the `summary` method for that. 

* 896 
* 11214912
* 15896912
* 20072512

**Answer: `20072512`**

In [8]:
model.summary()
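
As a cross-check on `model.summary()`, the parameter count can be worked out by hand. The sketch below assumes the Keras defaults for this architecture: `'valid'` padding and stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


conv = conv2d_params(3, 3, 3, 32)   # 896
conv_out = 200 - 3 + 1              # 198: 'valid' padding shrinks each side by kernel - 1
pooled = conv_out // 2              # 99: (2, 2) max pooling halves each side
flat = pooled * pooled * 32         # 313632 values after Flatten
dense_hidden = dense_params(flat, 64)   # 20072512
dense_output = dense_params(64, 1)      # 65

total = conv + dense_hidden + dense_output
print(total)  # 20073473
```

Under these assumptions the total is 20,073,473, so `20072512` is the closest option; almost all of the parameters sit in the first `Dense` layer.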

### Generators and Training

For the next two questions, use the following data generator for both train and test sets:

```python
ImageDataGenerator(rescale=1./255)
```

* We don't need to do any additional pre-processing for the images.
* When reading the data from train/test directories, check the `class_mode` parameter. Which value should it be for a binary classification problem?
* Use `batch_size=20`
* Use `shuffle=True` for both training and test sets. 

For training use `.fit()` with the following params:

```python
model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)
```

In [9]:
input_size = 200

train_gen = ImageDataGenerator(
    rescale=1./255,
    # preprocessing_function=preprocess_input,
    # shear_range=10,
    # zoom_range=0.1,
    # horizontal_flip=True
)

train_ds = train_gen.flow_from_directory(
    './data/train',
    target_size=(input_size, input_size),
    class_mode='binary',
    batch_size=20,
    shuffle=True
)


test_gen = ImageDataGenerator(
    rescale=1./255,    
    # preprocessing_function=preprocess_input
)

test_ds = test_gen.flow_from_directory(
    './data/test',
    target_size=(input_size, input_size),
    class_mode='binary',
    batch_size=20,
    shuffle=True
)

Found 800 images belonging to 2 classes.
Found 204 images belonging to 2 classes.
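
The image counts above determine how many batches make up one epoch. A minimal sketch of that arithmetic, assuming the generator yields `ceil(num_images / batch_size)` batches per pass (the helper name is hypothetical):

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


train_steps = steps_per_epoch(800, 20)  # 40
test_steps = steps_per_epoch(204, 20)   # 11

# The last test batch is smaller because 204 is not a multiple of 20.
last_test_batch = 204 - (test_steps - 1) * 20  # 4

print(train_steps, test_steps, last_test_batch)
```

The 40 training steps match the `40/40` counter shown in the training log.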


In [10]:
history = model.fit(train_ds,
                    epochs=10,
                    validation_data=test_ds
                   )

Epoch 1/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 164ms/step - accuracy: 0.5153 - loss: 0.8151 - val_accuracy: 0.5147 - val_loss: 0.6923
Epoch 2/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 5s 124ms/step - accuracy: 0.5251 - loss: 0.6900 - val_accuracy: 0.5392 - val_loss: 0.6867
Epoch 3/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 5s 127ms/step - accuracy: 0.5407 - loss: 0.6801 - val_accuracy: 0.5735 - val_loss: 0.6734
Epoch 4/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 5s 125ms/step - accuracy: 0.6290 - loss: 0.6560 - val_accuracy: 0.5637 - val_loss: 0.6634
Epoch 5/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 5s 126ms/step - accuracy: 0.6628 - loss: 0.6276 - val_accuracy: 0.6176 - val_loss: 0.6571
Epoch 6/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 5s 126ms/step - accuracy: 0.6797 - loss: 0.6277 - val_accuracy: 0.6225 - val_loss: 0.6410
Epoch 7/10
40/40 ━━━━━━━━━

In [11]:
# Print all available keys in the history dictionary
print(history.history.keys())

dict_keys(['accuracy', 'loss', 'val_accuracy', 'val_loss'])


In [12]:
# Access specific metrics
print(history.history['accuracy'])  # Training accuracy
print('-' * 50)
print(history.history['loss'])  # Training loss
print('=' * 50)

print(history.history['val_accuracy'])  # Validation accuracy
print('-' * 50)
print(history.history['val_loss'])  # Validation loss
print('=' * 50)

[0.5350000262260437, 0.5287500023841858, 0.5625, 0.6312500238418579, 0.6737499833106995, 0.6762499809265137, 0.6825000047683716, 0.7049999833106995, 0.699999988079071, 0.6875]
--------------------------------------------------
[0.7380661964416504, 0.688362181186676, 0.675106942653656, 0.6527014374732971, 0.6317003965377808, 0.623220682144165, 0.6142894625663757, 0.5991724133491516, 0.6005693078041077, 0.5949133038520813]
[0.5147058963775635, 0.5392156839370728, 0.5735294222831726, 0.563725471496582, 0.6176470518112183, 0.6225489974021912, 0.6372548937797546, 0.6274510025978088, 0.5980392098426819, 0.6421568393707275]
--------------------------------------------------
[0.6923468112945557, 0.6866991519927979, 0.6733673214912415, 0.6633828282356262, 0.6570571660995483, 0.6409817337989807, 0.6423211693763733, 0.6327053308486938, 0.6853362917900085, 0.6194398403167725]


### Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.10
* 0.32
* 0.50
* 0.72

**Answer: `0.72`**

In [13]:
train_acc = history.history['accuracy']
train_acc

[0.5350000262260437,
 0.5287500023841858,
 0.5625,
 0.6312500238418579,
 0.6737499833106995,
 0.6762499809265137,
 0.6825000047683716,
 0.7049999833106995,
 0.699999988079071,
 0.6875]

In [14]:
# Calculate median
median = statistics.median(train_acc)
print("Median:", median) 

Median: 0.6749999821186066


### Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.028
* 0.068
* 0.128
* 0.168

**Answer: `0.068`** (this run gives about 0.047, which lies between the `0.028` and `0.068` options)

In [15]:
train_loss = history.history['loss']
train_loss

[0.7380661964416504,
 0.688362181186676,
 0.675106942653656,
 0.6527014374732971,
 0.6317003965377808,
 0.623220682144165,
 0.6142894625663757,
 0.5991724133491516,
 0.6005693078041077,
 0.5949133038520813]

In [16]:
# Calculate the sample standard deviation (n - 1 in the denominator)
std_dev = statistics.stdev(train_loss)
print("Standard Deviation:", std_dev)

Standard Deviation: 0.04664627664713021
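
Note that `statistics.stdev` computes the sample standard deviation, while `statistics.pstdev` (like NumPy's `np.std` with its default `ddof=0`) computes the population standard deviation. The sketch below compares the two on the training-loss values printed above; which one the question intends is not stated, so treat the comparison as illustrative only.

```python
import math
import statistics

# Training loss per epoch, copied from the output above
train_loss = [
    0.7380661964416504, 0.688362181186676, 0.675106942653656,
    0.6527014374732971, 0.6317003965377808, 0.623220682144165,
    0.6142894625663757, 0.5991724133491516, 0.6005693078041077,
    0.5949133038520813,
]

sample_std = statistics.stdev(train_loss)       # divides by n - 1
population_std = statistics.pstdev(train_loss)  # divides by n

print("Sample std:    ", sample_std)
print("Population std:", population_std)

# The two differ only by a factor of sqrt((n - 1) / n).
n = len(train_loss)
print(math.isclose(population_std, sample_std * math.sqrt((n - 1) / n)))
```

With only 10 epochs the population value is about 5% smaller than the sample value, so the choice can matter when an answer sits between two options.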


### Data Augmentation

For the next two questions, we'll generate more data using data augmentations. 

Add the following augmentations to your training data generator:

* `rotation_range=50,`
* `width_shift_range=0.1,`
* `height_shift_range=0.1,`
* `zoom_range=0.1,`
* `horizontal_flip=True,`
* `fill_mode='nearest'`

In [17]:
train_gen = ImageDataGenerator(
    rescale=1./255,
    # preprocessing_function=preprocess_input,
    rotation_range=50,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

train_ds = train_gen.flow_from_directory(
    './data/train',
    target_size=(input_size, input_size),
    class_mode='binary',
    batch_size=20,
    shuffle=True
)


test_gen = ImageDataGenerator(
    rescale=1./255,    
    # preprocessing_function=preprocess_input
)

test_ds = test_gen.flow_from_directory(
    './data/test',
    target_size=(input_size, input_size),
    class_mode='binary',
    batch_size=20,
    shuffle=True
)

Found 800 images belonging to 2 classes.
Found 204 images belonging to 2 classes.


### Question 5 

Let's train our model for 10 more epochs using the same code as previously.

> **Note:** make sure you don't re-create the model - we want to continue training the model
> we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.26
* 0.56
* 0.86
* 1.16

**Answer: `0.56`**

In [20]:
# Continue training the model
additional_history = model.fit(train_ds,
                    epochs=history.epoch[-1] + 1 + 10,  # Last epoch + 1 (to start from the next epoch) + 10 additional epochs
                    initial_epoch=history.epoch[-1] + 1, # Start from the next epoch
                    validation_data=test_ds
                   )

# Combine the histories
for key in history.history.keys():
    history.history[key].extend(additional_history.history[key])

# Update the epoch information
history.epoch.extend(additional_history.epoch)

Epoch 11/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 266ms/step - accuracy: 0.6387 - loss: 0.6391 - val_accuracy: 0.6029 - val_loss: 0.6585
Epoch 12/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 264ms/step - accuracy: 0.6001 - loss: 0.6615 - val_accuracy: 0.6471 - val_loss: 0.6481
Epoch 13/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 262ms/step - accuracy: 0.5739 - loss: 0.6544 - val_accuracy: 0.6324 - val_loss: 0.6410
Epoch 14/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 264ms/step - accuracy: 0.6030 - loss: 0.6484 - val_accuracy: 0.6471 - val_loss: 0.6199
Epoch 15/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 265ms/step - accuracy: 0.6177 - loss: 0.6475 - val_accuracy: 0.6471 - val_loss: 0.6357
Epoch 16/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 11s 264ms/step - accuracy: 0.5961 - loss: 0.6610 - val_accuracy: 0.6422 - val_loss: 0.6208
Epoch 17/20
40/40

In [21]:
# Access specific metrics
print(history.history['accuracy'])  # Training accuracy
print('-' * 50)
print(history.history['loss'])  # Training loss
print('=' * 50)

print(history.history['val_accuracy'])  # Validation accuracy
print('-' * 50)
print(history.history['val_loss'])  # Validation loss
print('=' * 50)

[0.5350000262260437, 0.5287500023841858, 0.5625, 0.6312500238418579, 0.6737499833106995, 0.6762499809265137, 0.6825000047683716, 0.7049999833106995, 0.699999988079071, 0.6875, 0.6162499785423279, 0.6137499809265137, 0.5899999737739563, 0.6187499761581421, 0.606249988079071, 0.6137499809265137, 0.6387500166893005, 0.6399999856948853, 0.6512500047683716, 0.6349999904632568]
--------------------------------------------------
[0.7380661964416504, 0.688362181186676, 0.675106942653656, 0.6527014374732971, 0.6317003965377808, 0.623220682144165, 0.6142894625663757, 0.5991724133491516, 0.6005693078041077, 0.5949133038520813, 0.6521779894828796, 0.6523903608322144, 0.6558489799499512, 0.6472030878067017, 0.646858811378479, 0.6523444652557373, 0.6349123120307922, 0.6341730356216431, 0.6251964569091797, 0.6322131156921387]
[0.5147058963775635, 0.5392156839370728, 0.5735294222831726, 0.563725471496582, 0.6176470518112183, 0.6225489974021912, 0.6372548937797546, 0.6274510025978088, 0.598039209842681

In [26]:
with_aug_test_loss = history.history['val_loss'][-10:]
with_aug_test_loss

[0.6585004329681396,
 0.648069441318512,
 0.6410242915153503,
 0.6199275851249695,
 0.6356923580169678,
 0.6208052635192871,
 0.6196398735046387,
 0.6246304512023926,
 0.6334254145622253,
 0.6175970435142517]

In [27]:
# Calculate the mean test loss over the 10 epochs trained with augmentations
mean = statistics.mean(with_aug_test_loss)
print("Mean:", mean) 

Mean: 0.6319312155246735


### Question 6

What's the average test accuracy over the last 5 epochs (epochs 6 to 10)
for the model trained with augmentations?

* 0.31
* 0.51
* 0.71
* 0.91

**Answer: `0.71`**

In [28]:
last_5_epochs_accuracy = history.history['val_accuracy'][-5:]
last_5_epochs_accuracy

[0.6421568393707275,
 0.6519607901573181,
 0.6372548937797546,
 0.6421568393707275,
 0.6715686321258545]

In [29]:
average_accuracy = sum(last_5_epochs_accuracy) / len(last_5_epochs_accuracy)

print("Last 5 epochs accuracies:", last_5_epochs_accuracy)
print("Average accuracy:", average_accuracy)

Last 5 epochs accuracies: [0.6421568393707275, 0.6519607901573181, 0.6372548937797546, 0.6421568393707275, 0.6715686321258545]
Average accuracy: 0.6490195989608765


In [30]:
# Another way
statistics.mean(last_5_epochs_accuracy)

0.6490195989608765