# Overfitting Experimentation

In model_selection.ipynb we selected convolutional neural networks as our model class and identified overfitting as a major issue with this dataset.
In this notebook we experiment with different ways of configuring our model to reduce overfitting, and with data generators that bring more variety into our data.

We do this in a series of experiments.

First, we test network configuration options that are intended to reduce overfitting. These are:
- Dropout Layers
- Batch Normalization
- Regularization
- Weight Constraints

In addition, we experiment with data generators to include more variety in our dataset.





In [1]:
# correct working directory. But only once. 
if "working_directory_corrected" not in vars():
    %cd ..
    working_directory_corrected = True

import numpy as np
import matplotlib.pyplot as plt 

import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential  # Model type to be used
from tensorflow.keras.layers import Dense, Activation # Types of layers to be used in our model
from tensorflow.keras.layers import Conv2D, Conv3D, MaxPooling2D, MaxPooling3D, ZeroPadding2D, GlobalAveragePooling2D, Flatten

from models.CNN_Builder import CNN_Builder

from evaluation.multi_run_evaluation import Multi_Run_Evaluation
from evaluation.evaluation_metrics import Evaluation_Metrics
from data.dataset import Dataset
dataset = Dataset()



## Data Loading

Before carrying out the experiments, let's load our dataset.

In addition to loading our dataset, we set up an image generator, which we use to feed training data for the remainder of the notebook. For now, we create a dummy generator that passes images through without any changes.

In [2]:
X,y = dataset.get_prepared_data()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
y_train_prob = np.array([y_train[i] for i in range(len(y_train))])

train_datagen = ImageDataGenerator(#rotation_range=20,
                                    #zoom_range = 0.15,
                                    #width_shift_range=0.2,
                                    #height_shift_range=0.2,
                                    #shear_range=0.15,
                                   horizontal_flip=False,
                                   vertical_flip=False,
                                   fill_mode="nearest")
train_generator = train_datagen.flow(X_train,y_train_prob)

## Experiment 1: Network Configuration

In this experiment we measure the influence of network configurations on generalization. For this purpose we test the following configurations:
- Dropout Layers
- Batch Normalization
- Regularization
- Weight Constraints
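
To make these four options concrete, the following NumPy sketch shows roughly what each of them does to activations or weights. It is an illustration only: the function names are hypothetical, the batch normalization omits the learnable scale and shift, and none of this is taken from `CNN_Builder` or from the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def batch_normalize(activations, eps=1e-5):
    """Normalize each feature to zero mean and (almost) unit variance over the batch axis."""
    mean = activations.mean(axis=0)
    var = activations.var(axis=0)
    return (activations - mean) / np.sqrt(var + eps)

def l2_penalty(weights, strength):
    """L2 regularization term that would be added to the training loss."""
    return strength * np.sum(weights ** 2)

def max_norm(weights, max_value):
    """Weight constraint: rescale each column so that its L2 norm is at most `max_value`."""
    norms = np.sqrt(np.sum(weights ** 2, axis=0, keepdims=True))
    scale = np.clip(norms, 0.0, max_value) / (norms + 1e-7)
    return weights * scale

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(8, 4))  # 8 samples, 4 features (made-up data)
weights = rng.normal(size=(4, 3))                     # 4 inputs, 3 units (made-up weights)

dropped = dropout(batch, rate=0.5, rng=rng)
normalized = batch_normalize(batch)
penalty = l2_penalty(weights, strength=0.01)
constrained = max_norm(weights, max_value=1.0)
```

Dropout and batch normalization act on activations during training, whereas regularization and weight constraints act on the weights, which is one reason their effects can differ so much on a small dataset.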

### Experiment Setup:
For our experiment we train the network identified in previous experiments once without any of these options and once with each option enabled, and compare the Hamming scores.
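
For readers unfamiliar with the metric, here is a minimal sketch of one common definition of the Hamming score for multi-label problems: the per-sample ratio of correctly predicted labels to all labels that are either true or predicted, averaged over samples. This definition is an assumption; the `Evaluation_Metrics` class used in this notebook may compute the score differently, and the function name below is hypothetical.

```python
import numpy as np

def hamming_score(y_true, y_pred):
    """Mean per-sample intersection over union of the true and predicted label sets."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum(axis=1)
    union = np.logical_or(y_true, y_pred).sum(axis=1)
    # Samples with no true and no predicted labels count as a perfect match.
    per_sample = np.where(union == 0, 1.0, intersection / np.maximum(union, 1))
    return float(per_sample.mean())

# Two samples with three possible labels each (made-up example data).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
score = hamming_score(y_true, y_pred)  # (1/2 + 1/1) / 2
```

Under this definition a higher score is better, which matches how the scores are interpreted in this notebook.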

As we have noted in past experiments, the variance caused by different weight initializations is a concern. For this reason, we implemented the class *Multi_Run_Evaluation*, which runs an experiment multiple times, records the evaluation metrics, and calculates their minimum, maximum, mean and standard deviation.
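
The idea behind such a multi-run evaluation can be sketched in a few lines. The helper below is hypothetical and is not the actual *Multi_Run_Evaluation* code; a random number stands in for building, training and scoring one model, and the standard deviation is the population value that `numpy` computes by default.

```python
import numpy as np

def summarize_runs(run_once, nr_runs):
    """Call `run_once` nr_runs times and summarize the scores it returns."""
    scores = np.array([run_once() for _ in range(nr_runs)], dtype=float)
    return {
        "minimum": scores.min(),
        "maximum": scores.max(),
        "mean": scores.mean(),
        "std_dev": scores.std(),
    }

# Stand-in for "build, train and evaluate one model": returns a noisy score.
rng = np.random.default_rng(42)

def fake_training_run():
    return 0.2 + rng.normal(scale=0.01)

summary = summarize_runs(fake_training_run, nr_runs=10)
```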

To make the configuration code less verbose and less error-prone, we also implemented a builder class that assembles the CNN.

During our experiment we ran the cell below with different values for *apply_regularization*, *apply_dropout*, *apply_batch_normalization* and *weight_constraints* and recorded the results in the Results section below.

In [10]:


cnn_builder = CNN_Builder(convolutional_layers=[16],
                            fully_connected_layers=[100],
                            in_shape=(X_train.shape[1],X_train.shape[2],X_train.shape[3]),
                            out_shape=y_train.shape[1])
cnn_builder.apply_regularization = True
cnn_builder.apply_dropout = True
cnn_builder.apply_batch_normalization = True
cnn_builder.weight_constraints = True



evaluator = Multi_Run_Evaluation(cnn_builder.build_model)

evaluator.evaluate( nr_runs=10, 
                    epochs=100, 
                    early_stopping_patience=5, 
                    train_generator= train_generator, 
                    X_train=X_train, 
                    y_train=y_train, 
                    X_test=X_test,
                    y_test=y_test)


evaluator.print_metrics()





 running experiment 1 of 10
Epoch 1/100


  history = model.fit_generator(train_generator, epochs=epochs, validation_data=(X_test, y_test),callbacks = [early_stopping])


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100

KeyboardInterrupt: 

### Results

Running our experiment yielded the following Hamming scores on the test set:

| Configuration          | Min  | Max  | Mean  | St. Dev |
| ---------------------- | ---- | ---- | ----- | ------- |
| No Configuration       | 0.17 | 0.21 | 0.195 | 0.012   |
| Regularization         | 0.16 | 0.21 | 0.184 | 0.017   |
| Dropout                | 0.17 | 0.21 | 0.19  | 0.011   |
| Batch Normalization    | 0.02 | 0.16 | 0.109 | 0.040   |
| Weight Constraints     | 0.11 | 0.20 | 0.163 | 0.029   |

### Interpretation
None of the configurations improved the Hamming score on the test set. Regularization and dropout are close to the baseline, so their differences could be a result of random variation, while batch normalization led to clearly worse performance.

This does not necessarily mean that these configurations are not useful. It may mean that their effect is overshadowed by the general issue of having a very small dataset. In previous experiments we have seen dropout increase the Hamming score slightly, so there seems to be some promise.

For now, this experiment is inconclusive. We will revisit it after increasing the variety of the dataset with data generators, hoping that this adds enough complexity for these techniques to make a consistent difference.
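
Whether a difference between two configurations exceeds run-to-run noise can be estimated from the table alone. The sketch below computes Welch's t statistic from the reported means and standard deviations, assuming ten runs per configuration and treating the reported standard deviations as sample standard deviations. Both are assumptions, and the function name is hypothetical.

```python
import math

def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and approximate degrees of freedom from summary statistics."""
    var_a = std_a ** 2 / n_a
    var_b = std_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

runs = 10
baseline = (0.195, 0.012)  # mean and standard deviation of "No Configuration"

# Baseline versus regularization and versus batch normalization.
t_reg, df_reg = welch_t(*baseline, runs, 0.184, 0.017, runs)
t_bn, df_bn = welch_t(*baseline, runs, 0.109, 0.040, runs)
```

With these inputs the statistic is about 1.7 for regularization and about 6.5 for batch normalization. For the roughly 10 to 16 degrees of freedom involved, the usual two-sided 5% threshold lies between 2.1 and 2.2, which is consistent with reading the regularization result as noise and the batch normalization result as a real drop.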



## Experiment 2: Image Generators

In this experiment we introduce variability into the training set by using image generators. We hope that this increases the variety of inputs seen by the model and helps it generalize.

Specifically, we try to:
- Mirror images horizontally
- Rotate the image by up to 30 degrees in either direction
- Shift the image by up to 10% of the image size in horizontal or vertical direction
- Change brightness by up to 20%
- Zoom in or out by up to 20%
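
To illustrate what two of these transformations do to an image array, here is a small NumPy sketch of mirroring and shifting. It is not the Keras implementation: the function names are hypothetical, shifts are given in whole pixels, and exposed borders are filled by repeating the nearest edge pixel, which is meant to mimic `fill_mode="nearest"`.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror an image of shape (height, width, channels) left to right."""
    return image[:, ::-1, :]

def shift_with_nearest_fill(image, dy, dx):
    """Shift an image by (dy, dx) pixels and fill exposed borders with the nearest edge pixel."""
    height, width = image.shape[:2]
    pad_y, pad_x = abs(dy), abs(dx)
    padded = np.pad(image, ((pad_y, pad_y), (pad_x, pad_x), (0, 0)), mode="edge")
    top = pad_y - dy    # positive dy moves the content down
    left = pad_x - dx   # positive dx moves the content to the right
    return padded[top:top + height, left:left + width, :]

# Tiny 3x4 single-channel "image" so that the effect is easy to read off.
image = np.arange(12).reshape(3, 4, 1)
mirrored = flip_horizontal(image)
shifted_right = shift_with_nearest_fill(image, dy=0, dx=1)
shifted_down = shift_with_nearest_fill(image, dy=1, dx=0)
```

The generator applies such transformations with randomly drawn strengths each time a batch is produced, so the model rarely sees exactly the same image twice.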

### Experiment Setup:
For our experiment we train the network identified in previous experiments once without augmentation and once with each of the transformations enabled, and compare the Hamming scores.

As in the previous experiment, we use the CNN builder and *Multi_Run_Evaluation*.
During our experiment we ran the cell below with different configurations of the train data generator and recorded the results in the Results section below.

In [24]:

train_datagen = ImageDataGenerator(
                                    #rotation_range=30,
                                    #zoom_range = 0.20,
                                    width_shift_range=0.1,
                                    height_shift_range=0.1,
                                    # brightness_range= [0.8, 1.2],
                                    #horizontal_flip=True,
                                   fill_mode="nearest")
train_generator = train_datagen.flow(X_train,y_train_prob)

cnn_builder = CNN_Builder(convolutional_layers=[16],
                            fully_connected_layers=[100],
                            in_shape=(X_train.shape[1],X_train.shape[2],X_train.shape[3]),
                            out_shape=y_train.shape[1])


evaluator = Multi_Run_Evaluation(cnn_builder.build_model)

evaluator.evaluate( nr_runs=10, 
                    epochs=100, 
                    early_stopping_patience=5, 
                    train_generator= train_generator, 
                    X_train=X_train, 
                    y_train=y_train, 
                    X_test=X_test,
                    y_test=y_test)


evaluator.print_metrics()


 running experiment 1 of 10


  history = model.fit_generator(train_generator, epochs=epochs, validation_data=(X_test, y_test),callbacks = [early_stopping])


Epoch 1/100

KeyboardInterrupt: 

### Results

Running our experiment yielded the following Hamming scores on the test set:

| Configuration          | Min  | Max  | Mean  | St. Dev |
| ---------------------- | ---- | ---- | ----- | ------- |
| No Configuration       | 0.17 | 0.21 | 0.195 | 0.012   |
| Mirroring              | 0.17 | 0.22 | 0.191 | 0.015   |
| Rotation               | 0.19 | 0.24 | 0.214 | 0.014   |
| Shift                  | 0.21 | 0.25 | 0.231 | 0.015   |
| Brightness             | 0.08 | 0.13 | 0.111 | 0.019   |
| Zoom                   | 0.17 | 0.22 | 0.198 | 0.018   |

### Interpretation

Our results indicate that both rotation and shift seem to increase the Hamming score, while changing brightness dramatically decreased it. Mirroring and zoom did not result in notable score changes.

Based on these results we identified the following follow-up questions:
* How do the results change if we activate all configurations except brightness?
* How do the results change if we activate only rotation and shift?
* Is the score increase from shift consistent? 

To answer these questions, the following additional data was recorded:

| Configuration          | Min  | Max  | Mean  | St. Dev |
| ---------------------- | ---- | ---- | ----- | ------- |
| No Configuration       | 0.17 | 0.21 | 0.195 | 0.012   |
| All except brightness  | 0.19 | 0.25 | 0.214 | 0.017   |
| Rotation + Shift       | 0.18 | 0.24 | 0.218 | 0.022   |
| Shift (rerun)          | 0.18 | 0.24 | 0.222 | 0.018   |

From these results we can see that activating all configurations except brightness yields about the same quality as activating only rotation or only shift. The same holds for activating only rotation and shift together. The effects of these preprocessing techniques do not seem to compound, or at least this is not visible in the relatively simple model we applied.

Rerunning shift also yielded a slightly lower result than before. For the future we decided to use rotation and shift for data preprocessing.

As a side note: our rerun of the shift preprocessing also shows that we should be careful when interpreting even results that are averaged over ten runs. We will take this into account in bigger experiments by using more runs.
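
How much the mean of ten runs can move by chance, and how many runs a tighter estimate would need, can be sketched with the standard error of the mean. The numbers come from the two shift rows in the tables above; the target standard error of 0.004 is an arbitrary example value, and the function names are hypothetical.

```python
import math

def standard_error(std_dev, nr_runs):
    """Standard error of a mean that was computed over nr_runs independent runs."""
    return std_dev / math.sqrt(nr_runs)

def runs_needed(std_dev, target_standard_error):
    """Smallest number of runs for which the standard error reaches the target."""
    return math.ceil((std_dev / target_standard_error) ** 2)

se_shift = standard_error(0.015, 10)  # first shift experiment
se_rerun = standard_error(0.018, 10)  # rerun of the shift experiment
difference = 0.231 - 0.222            # gap between the two means

runs_for_tighter_estimate = runs_needed(0.018, 0.004)
```

Both standard errors are roughly 0.005, so a gap of 0.009 between two ten-run means of the same configuration is not surprising. Because the standard error only shrinks with the square root of the number of runs, halving it takes about four times as many runs.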


## Experiment 3: Increasing complexity
In this experiment we test whether we can improve our scores by increasing the network complexity. 

### Experiment Setup
For our experiment we use the configuration of the image data generator that performed best in Experiment 2 and test different complexity increases. We test:
- Adding convolutional layers
- Increasing the number of convolutional filters (patterns)
- Increasing the number of fully connected neurons
- Adding more fully connected layers
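
To get a feeling for how these changes affect model size, the sketch below counts trainable parameters for a simple CNN layout. All architectural details are assumptions made for illustration and are not taken from `CNN_Builder`: 3x3 kernels with "same" padding, a 2x2 max pooling step after every convolution, a flatten step, and a hypothetical input shape of 32x32x3 with 5 outputs.

```python
def count_parameters(in_shape, convolutional_layers, fully_connected_layers, out_shape,
                     kernel_size=3, pool_size=2):
    """Count weights and biases of a conv -> pool -> ... -> flatten -> dense network."""
    height, width, channels = in_shape
    total = 0
    for filters in convolutional_layers:
        # Kernel weights plus one bias per filter; "same" padding keeps height and width.
        total += (kernel_size * kernel_size * channels + 1) * filters
        channels = filters
        # The assumed max pooling after every convolution shrinks the feature map.
        height //= pool_size
        width //= pool_size
    units_in = height * width * channels  # size of the flattened feature map
    for units in list(fully_connected_layers) + [out_shape]:
        total += (units_in + 1) * units   # dense weights plus biases
        units_in = units
    return total

in_shape, out_shape = (32, 32, 3), 5  # hypothetical shapes, not the real dataset
small = count_parameters(in_shape, [16], [100], out_shape)
wide_dense = count_parameters(in_shape, [16], [1000], out_shape)
extra_conv = count_parameters(in_shape, [32, 16], [100], out_shape)
```

Under these assumptions, widening the dense layer from 100 to 1000 neurons multiplies the parameter count by about ten, whereas adding a convolutional layer with pooling reduces it, because the flattened feature map becomes smaller. "More layers" therefore does not automatically mean "more parameters".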

Due to technical difficulties, we had to restart the notebook environment for this test. This means that the data recorded here is based on a different training / test split than the other experiments. The experiment itself was executed entirely with the same training / test split.

As in the previous experiments, we use the CNN builder and *Multi_Run_Evaluation*.
During our experiment we ran the cell below with different layer configurations of the CNN builder and recorded the results in the Results section below.

In [12]:

train_datagen = ImageDataGenerator(
                                    rotation_range=30,
                                    #zoom_range = 0.20,
                                    width_shift_range=0.1,
                                    height_shift_range=0.1,
                                    # brightness_range= [0.8, 1.2],
                                    #horizontal_flip=True,
                                   fill_mode="nearest")
train_generator = train_datagen.flow(X_train,y_train_prob)

cnn_builder = CNN_Builder(convolutional_layers=[64, 32,16],
                            fully_connected_layers=[100],
                            in_shape=(X_train.shape[1],X_train.shape[2],X_train.shape[3]),
                            out_shape=y_train.shape[1])


evaluator = Multi_Run_Evaluation(cnn_builder.build_model)

evaluator.evaluate( nr_runs=10, 
                    epochs=100, 
                    early_stopping_patience=5, 
                    train_generator= train_generator, 
                    X_train=X_train, 
                    y_train=y_train, 
                    X_test=X_test,
                    y_test=y_test)


evaluator.print_metrics()


 running experiment 1 of 10

  history = model.fit_generator(train_generator, epochs=epochs, validation_data=(X_test, y_test),callbacks = [early_stopping])

Epoch 1/100 ... Epoch 37/100

 running experiment 2 of 10
Epoch 1/100 ... Epoch 21/100

 running experiment 3 of 10
Epoch 1/100 ... Epoch 18/100

 running experiment 4 of 10
Epoch 1/100 ... Epoch 26/100

 running experiment 5 of 10
Epoch 1/100 ... Epoch 10/100

 running experiment 6 of 10
Epoch 1/100 ... Epoch 25/100

 running experiment 7 of 10
Epoch 1/100 ... Epoch 36/100

 running experiment 8 of 10
Epoch 1/100 ... Epoch 29/100

 running experiment 9 of 10
Epoch 1/100 ... Epoch 13/100

 running experiment 10 of 10
Epoch 1/100 ... Epoch 27/100
score	:	 minimum 	 maximum 	 mean 	 std. dev.
acc_train:	 1 	 9 	 5.9 	 2.183269719175042
hamming_train:	 0.11 	 0.32 	 0.233 	 0.06617317348358691
acc_test:	 0 	 6 	 3.1 	 1.9692073983655907
hamming_test:	 0.1 	 0.22 	 0.173 	 0.03683295625749672


### Results


Running our experiment yielded the following Hamming scores on the test set:

| Test | Convolutional Layers | Fully Connected Layers | Min  | Max  | Mean  | St. Dev |
| ---- | -------------------- | ---------------------- | ---- | ---- | ----- | ------- |
| 1    | [16]                 | [100]                  | 0.20 | 0.23 | 0.217 | 0.012   |
| 2    | [32]                 | [100]                  | 0.16 | 0.22 | 0.201 | 0.021   |
| 3    | [32,16]              | [100]                  | 0.20 | 0.23 | 0.215 | 0.013   |
| 4    | [16]                 | [1000]                 | 0.15 | 0.23 | 0.204 | 0.023   |
| 5    | [16]                 | [100,100]              | 0.15 | 0.24 | 0.213 | 0.025   |
| 6    | [32,16]              | [100,100]              | 0.15 | 0.24 | 0.19  | 0.031   |
| 7    | [32,16]              | [20,20]                | 0.09 | 0.22 | 0.175 | 0.038   |
| 8    | [64,32,16]           | [100,100]              | 0.11 | 0.15 | 0.125 | 0.014   |
| 9    | [64,32,16]           | [100]                  | 0.10 | 0.23 | 0.173 | 0.03    |


### Interpretation

The numbers indicate that increasing the complexity did not improve results and in some cases clearly worsened them. The best configurations we found were configurations 1, 3 and 5.
We will use these as starting points for future fine-tuning.



## Experiment 4: Network Configuration in Complex Networks

In this experiment we rerun Experiment 1 for a more complex network, to see if this changes the results and indicates that we should use any of the configurations.

### Experiment Setup:

For experimentation we used configuration 6 from the previous experiment. While this configuration yielded worse scores than others, it is a combination of the two promising configurations 3 and 5. We did not use either of these two directly as we deem them too simple for this experiment.


As in the previous experiments, we use the CNN builder and *Multi_Run_Evaluation*.
During our experiment we ran the cell below with different values for *apply_regularization*, *apply_dropout*, *apply_batch_normalization* and *weight_constraints* and recorded the results in the Results section below.

In [19]:

train_datagen = ImageDataGenerator(
                                    rotation_range=30,
                                    #zoom_range = 0.20,
                                    width_shift_range=0.1,
                                    height_shift_range=0.1,
                                    # brightness_range= [0.8, 1.2],
                                    #horizontal_flip=True,
                                   fill_mode="nearest")
train_generator = train_datagen.flow(X_train,y_train_prob)

cnn_builder = CNN_Builder(convolutional_layers=[32,16],
                            fully_connected_layers=[100,100],
                            in_shape=(X_train.shape[1],X_train.shape[2],X_train.shape[3]),
                            out_shape=y_train.shape[1])
cnn_builder.apply_regularization = False
cnn_builder.apply_dropout = False
cnn_builder.apply_batch_normalization = False
cnn_builder.weight_constraints = False


evaluator = Multi_Run_Evaluation(cnn_builder.build_model)

evaluator.evaluate( nr_runs=10, 
                    epochs=100, 
                    early_stopping_patience=5, 
                    train_generator= train_generator, 
                    X_train=X_train, 
                    y_train=y_train, 
                    X_test=X_test,
                    y_test=y_test)


evaluator.print_metrics()


 running experiment 1 of 10

  history = model.fit_generator(train_generator, epochs=epochs, validation_data=(X_test, y_test),callbacks = [early_stopping])

Epoch 1/100 ... Epoch 24/100

 running experiment 2 of 10
Epoch 1/100 ... Epoch 19/100

 running experiment 3 of 10
Epoch 1/100 ... Epoch 27/100

 running experiment 4 of 10
Epoch 1/100 ... Epoch 28/100

 running experiment 5 of 10
Epoch 1/100 ... Epoch 8/100

 running experiment 6 of 10
Epoch 1/100 ... Epoch 30/100

 running experiment 7 of 10
Epoch 1/100 ... Epoch 16/100

 running experiment 8 of 10
Epoch 1/100 ... Epoch 31/100

 running experiment 9 of 10
Epoch 1/100 ... Epoch 33/100

 running experiment 10 of 10
Epoch 1/100 ... Epoch 26/100
score	:	 minimum 	 maximum 	 mean 	 std. dev.
acc_train:	 5 	 10 	 7.2 	 1.4757295747452437
hamming_train:	 0.14 	 0.33 	 0.255 	 0.05854722690083432
acc_test:	 2 	 7 	 4.6 	 1.837873166945363
hamming_test:	 0.12 	 0.24 	 0.194 	 0.03627058802329451


### Results

Running our experiment yielded the following Hamming scores on the test set:

| Configuration          | Min  | Max  | Mean  | St. Dev |
| ---------------------- | ---- | ---- | ----- | ------- |
| No Configuration (rerun) | 0.12 | 0.24 | 0.194 | 0.036   |
| Regularization         | 0.13 | 0.19 | 0.175 | 0.018   |
| Dropout                | 0.1  | 0.23 | 0.177 | 0.047   |
| Batch Normalization    | 0.07 | 0.16 | 0.117 | 0.026   |
| Weight Constraints     | 0.16 | 0.21 | 0.184 | 0.016   |


### Interpretation

The results mirror those of Experiment 1. The configurations we tested do not seem to make a big difference to the Hamming score. Batch normalization in particular worsens the score considerably.

In the future, we will use regularization, dropout and weight constraints. We do so mainly in recognition of good practice, as this experiment gave no reason to assume that they will improve matters drastically.