<a href="https://colab.research.google.com/github/sheldonkemper/portfolio/blob/main/CAM_DS_C201_Activity_3_2_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Activity 3.2.3 Experimenting with hyperparameter tuning

## Scenario
Hopkins et al. (1999) created the Spambase data set and donated it to the UCI Machine Learning Repository. The data set contains 4,601 emails marked as spam or non-spam by a postmaster or individuals. Fifty-seven features (e.g. word frequencies and email characteristics) aid in classifying emails as spam. The Spambase data set is used for developing and benchmarking spam detection models, providing a base for analysing how effectively various machine learning techniques distinguish between spam and legitimate emails.

As a data professional, you have been tasked by your company with developing a TensorFlow neural network, based on the Spambase data set, that classifies emails as spam or non-spam.



## Objective
In this portfolio activity, you’ll continue to work with the model you created in Activity 3.1.5 by applying model tuning and grid search to classify emails as spam or non-spam.

You will complete the activity in your Notebook, where you’ll:
- add an extra four layers to the model you created previously
- create a new model pipeline
- employ different batch sizes and epochs to evaluate the impact on the accuracy
- present your insights based on the performance of the model.


## Assessment criteria
By completing this activity, you will be able to provide evidence that you can critically select appropriate strategies to demonstrate expertise in model tuning techniques.


## Activity guidance
1. Continue to work on the model you created in **Activity 3.1.5**.
2. Add 4 hidden layers with the ReLU activation and 16 neurons for the fourth layer.
3. Compile the model with `binary_crossentropy` as loss, Adam optimiser, and print the accuracy of the model.
4. Train and evaluate the model again.
5. Jot down whether the final evaluation changed. Was there any improvement in the model? If not, train and evaluate again. Does the final evaluation change? Does it improve?
6. Create a vector of different `batch_sizes=np.array([16, 32, 64])` and `loop` through it, retraining the model each time, and print the performances. Use the same model and number of epochs.
7. Create a vector of different `epochs=np.array([10, 20, 30])` and `loop` through it, retraining the model each time and using the best-performing batch size. Jot down which model gave you the highest accuracy.

> Start your activity here. Select the pen from the toolbar to add your entry.

In [3]:
import keras
from keras import layers
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

In [4]:
# Start your activity here:

# URL to import data set from GitHub.
url = 'https://raw.githubusercontent.com/fourthrevlxd/cam_dsb/main/spamdata.csv'

In [5]:
data = pd.read_csv(url, header = None)
data.head()

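|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |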


In [6]:
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

In [7]:
# Split into train and test sets (80% train, 20% test)
X_train_full, X_test, y_train_full, y_test = train_test_split(X ,y, test_size=0.2, random_state = 42)
# Further split the training set into train and validation (90% train, 10% validation)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, test_size=0.1, random_state = 42)
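
As a quick sanity check on the two splits, the arithmetic below works out how many emails land in each set and how many steps per epoch Keras should report. It is a sketch that assumes the CSV holds the 4,601 rows described in the scenario; `n_rows` is typed in by hand here, not read from `data`. The step counts can be compared with the `52/52`, `207/207` and `29/29` progress bars in the cell outputs.

```python
import math

n_rows = 4601  # assumption: one row per email, as stated in the scenario

# First split: 20% test. scikit-learn rounds the test size up.
n_test = math.ceil(0.2 * n_rows)
n_train_full = n_rows - n_test

# Second split: 10% of the remaining rows go to validation.
n_valid = math.ceil(0.1 * n_train_full)
n_train = n_train_full - n_valid

print(f"train={n_train}, valid={n_valid}, test={n_test}")

# Keras runs ceil(samples / batch_size) steps per epoch.
train_steps = {b: math.ceil(n_train / b) for b in (16, 32, 64)}
eval_steps = math.ceil(n_test / 32)  # evaluate() uses batch_size=32 by default

print("training steps per epoch:", train_steps)
print("evaluation steps:", eval_steps)
```

In the notebook itself, `len(X_train)`, `len(X_valid)` and `len(X_test)` give the same counts directly.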

In [8]:
scaler = StandardScaler()
# Fit the scaler on the training data and transform it
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

In [9]:
model =  tf.keras.Sequential()
number_neurons_1 = 64
number_neurons_2 = 32

# Hidden layers
model.add(tf.keras.layers.Dense(number_neurons_1,activation = 'relu', input_shape = (X_train.shape[1],)))
model.add(tf.keras.layers.Dense(number_neurons_2,activation = 'relu'))

# Adding the new hidden layers
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))

#output layer
model.add(tf.keras.layers.Dense(1, activation = 'sigmoid'))



In [10]:
# Compile the model using TensorFlow's alias
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

In [11]:
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_valid,y_valid))

Epoch 1/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.5665 - loss: 0.6802 - val_accuracy: 0.8995 - val_loss: 0.4260
Epoch 2/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8894 - loss: 0.3396 - val_accuracy: 0.9266 - val_loss: 0.2186
Epoch 3/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9261 - loss: 0.2012 - val_accuracy: 0.9348 - val_loss: 0.2020
Epoch 4/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9371 - loss: 0.1849 - val_accuracy: 0.9375 - val_loss: 0.1957
Epoch 5/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9508 - loss: 0.1560 - val_accuracy: 0.9375 - val_loss: 0.1919
Epoch 6/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9529 - loss: 0.1386 - val_accuracy: 0.9348 - val_loss: 0.1846
Epoch 7/10
52/52 ━━━━━━━━━━

In [12]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
# Print the loss and accuracy
print(f'Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}')

29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9364 - loss: 0.1919
Test Loss: 0.1658, Test Accuracy: 0.9435


### **My Evaluation of the Model's Performance**

After training the model with additional hidden layers, I wanted to see if the increased complexity would improve the accuracy of my spam detection model. Here’s a breakdown of how things went.

#### **Previous Model Performance**:
- **Test Accuracy**: 94.68%
- **Test Loss**: 0.1542
- **Training Accuracy**: 93.79%
- **Training Loss**: 0.1612

#### **New Model (with 4 additional layers) Performance**:
- **Test Accuracy**: 94.35%
- **Test Loss**: 0.1658
- **Accuracy shown in the `evaluate` progress bar**: 93.64%
- **Loss shown in the `evaluate` progress bar**: 0.1919

The last two figures come from the progress bar that `model.evaluate` printed for the test set, so they are not training metrics. The training log shows a training accuracy of 95.29% and a training loss of 0.1386 at epoch 6, the last complete epoch in the output. The previous model's training figures may have been recorded the same way, so the comparison below relies on the test metrics.

### **What I Observed**:

1. **Slight Drop in Accuracy**:  
   The test accuracy dropped a little after adding the 4 extra layers, from **94.68%** to **94.35%**. On a test set of about 920 emails (20% of 4,601), that is a difference of roughly three emails, which is small enough that it could come from random weight initialisation alone. The extra layers did not provide the boost I expected, but the result does not show that the simpler model is reliably better.

2. **Increase in Loss**:  
   The test loss also increased slightly, from **0.1542** to **0.1658**. A higher loss means the predicted probabilities were, on average, further from the true labels, so this is another sign that the additional layers didn’t help the model generalise better.

3. **Training Performance**:  
   By epoch 6 the **training accuracy** had reached **95.29%** with a **training loss** of **0.1386**, while validation accuracy was **93.48%** and validation loss was **0.1846**. Training metrics that are better than the validation metrics point to mild overfitting. A drop in training accuracy, by contrast, would have pointed to underfitting or a harder optimisation problem, not to overfitting.

### **My Take on the Results**:

Adding the extra layers didn’t improve the model’s performance on the test set; it was marginally worse on both accuracy and loss, although the difference is within what I would expect from run-to-run variation.

I expected that adding more layers would allow the model to learn more complex patterns in the data, but in this case it seems the additional layers added capacity that this data set does not need. The gap between training and validation metrics suggests mild overfitting, where the model fits the training data more closely than it generalises to new data.
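
One way to check the overfitting question is to compare training and validation metrics epoch by epoch. The sketch below does this for the six complete epochs in the training log above (the seventh is cut off in the output), and also converts the two test accuracies into counts of correctly classified emails. `generalisation_gap` is a hypothetical helper written for this illustration, the metric values are typed in from the log, and the test-set size of 921 is an assumption derived from a 20% split of 4,601 rows.

```python
# Values copied by hand from the six complete epochs in the training log.
logged = {
    "accuracy":     [0.5665, 0.8894, 0.9261, 0.9371, 0.9508, 0.9529],
    "val_accuracy": [0.8995, 0.9266, 0.9348, 0.9375, 0.9375, 0.9348],
    "loss":         [0.6802, 0.3396, 0.2012, 0.1849, 0.1560, 0.1386],
    "val_loss":     [0.4260, 0.2186, 0.2020, 0.1957, 0.1919, 0.1846],
}


def generalisation_gap(history):
    """Return per-epoch (accuracy gap, loss gap) between training and validation.

    Positive values mean the model does better on training data than on
    validation data, which is the usual signature of overfitting.
    """
    gaps = []
    for acc, val_acc, loss, val_loss in zip(
        history["accuracy"], history["val_accuracy"],
        history["loss"], history["val_loss"],
    ):
        gaps.append((round(acc - val_acc, 4), round(val_loss - loss, 4)))
    return gaps


gaps = generalisation_gap(logged)
for epoch, (acc_gap, loss_gap) in enumerate(gaps, start=1):
    print(f"epoch {epoch}: accuracy gap {acc_gap:+.4f}, loss gap {loss_gap:+.4f}")

# Convert the two reported test accuracies into counts of correct emails.
n_test = 921  # assumption: 20% of 4,601 rows, rounded up
correct_previous = round(0.9468 * n_test)
correct_new = round(0.9435 * n_test)
print(f"difference: {correct_previous - correct_new} emails out of {n_test}")
```

In the notebook, the same helper can be called on `history.history`, which holds the full ten epochs under the same four keys.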

In [13]:
import numpy as np

batch_sizes = np.array([16, 32, 64])

# Loop through different batch sizes
for batch_size in batch_sizes:
    print(f"\nTraining with batch size: {batch_size}")
    model.fit(X_train, y_train, epochs=10, batch_size=batch_size, validation_data=(X_valid, y_valid))
    test_loss, test_accuracy = model.evaluate(X_test, y_test)
    print(f'Batch Size: {batch_size}, Test Accuracy: {test_accuracy:.4f}, Test Loss: {test_loss:.4f}')



Training with batch size: 16
Epoch 1/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9629 - loss: 0.1215 - val_accuracy: 0.9348 - val_loss: 0.1890
Epoch 2/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9559 - loss: 0.1186 - val_accuracy: 0.9402 - val_loss: 0.2007
Epoch 3/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9662 - loss: 0.1030 - val_accuracy: 0.9348 - val_loss: 0.2290
Epoch 4/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9681 - loss: 0.0916 - val_accuracy: 0.9293 - val_loss: 0.2307
Epoch 5/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9681 - loss: 0.0928 - val_accuracy: 0.9402 - val_loss: 0.2312
Epoch 6/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9743 - loss: 0.0673 - val_accuracy: 0.9375 - val_loss: 0.239

### **Summary of Results**:
- **Batch Size 16**: Test Accuracy = 93.49%, Test Loss = 0.2407
- **Batch Size 32**: Test Accuracy = 94.35%, Test Loss = 0.2769
- **Batch Size 64**: Test Accuracy = 95.01%, Test Loss = 0.3108

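### **Final Analysis**:
1. **Best Accuracy**: The model performed best in terms of accuracy with a batch size of **64** (**95.01% test accuracy**), but this run also had the highest test loss (**0.3108**). Accuracy only counts predictions that fall on the wrong side of the 0.5 threshold, while loss also penalises how confident the wrong predictions are, so the model was making slightly fewer errors but was more confidently wrong on the ones it did make.

2. **Lowest Test Loss**: The batch size of **16** gave the **lowest test loss** of **0.2407**, but also the lowest accuracy (**93.49%**), so a lower loss here did not mean fewer mistakes.

3. **Overfitting**: Across all batch sizes, there was some level of overfitting. The training accuracy was consistently higher than both validation and test accuracy (for example, 97.43% training against 93.75% validation accuracy in epoch 6 of the batch-size-16 run), indicating that the model could be memorising the training data.

4. **Caveat**: The loop calls `fit` on the same `model` object, so each run continues from the weights left by the previous one. The batch-size-16 run starts at 96.29% training accuracy instead of starting from scratch. Each result therefore mixes the effect of batch size with the effect of 10 more epochs of training than the run before it, and the differences cannot be attributed to batch size alone.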
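
The batch-size results show test accuracy and test loss rising together, which can look contradictory. The small worked example below, using made-up probabilities that are not taken from the model, shows how that can happen: binary cross-entropy penalises a confident wrong prediction far more heavily than accuracy does. The function names and the two sets of predictions are illustrative assumptions.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Share of predictions on the correct side of the threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def binary_crossentropy(y_true, probs):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 1, 1, 0, 0, 0]

# A cautious model: two errors, but every prediction is close to 0.5.
cautious = [0.60, 0.60, 0.45, 0.40, 0.40, 0.55]

# A confident model: only one error, but that error is made with
# near-certainty (probability 0.999 of spam for a non-spam email).
confident = [0.99, 0.99, 0.99, 0.01, 0.01, 0.999]

for name, probs in [("cautious", cautious), ("confident", confident)]:
    print(f"{name}: accuracy={accuracy(y_true, probs):.3f}, "
          f"loss={binary_crossentropy(y_true, probs):.3f}")
```

The confident model scores higher on accuracy and worse on loss, the same pattern as the batch-size-64 run compared with the batch-size-16 run.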

### **Recommendations**:
- **Batch Size 64** gave the best accuracy and **Batch Size 16** the lowest loss in this experiment. Because the loop keeps training the same model, the comparison should be repeated with the model rebuilt before each run to confirm either result.
- I could try **adding regularisation** (such as dropout) to mitigate overfitting.
- It might also be worth training for fewer epochs, using early stopping, or lowering the learning rate to reduce overfitting and improve generalisation.
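
To compare batch sizes on an equal footing, each run needs to start from a freshly initialised model. The sketch below shows one way to do that. `build_model` and `compare_batch_sizes` are hypothetical helpers, not part of the notebook, and the synthetic data at the bottom is only a stand-in so that the block runs on its own; in the notebook the scaled `X_train`, `X_valid` and `X_test` arrays would be passed instead.

```python
import numpy as np
import tensorflow as tf


def build_model(n_features):
    """Return a new, untrained copy of the notebook's architecture."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def compare_batch_sizes(X_train, y_train, X_valid, y_valid, X_test, y_test,
                        batch_sizes=(16, 32, 64), epochs=10, seed=42):
    """Train a fresh model for each batch size and collect test metrics."""
    results = {}
    for batch_size in batch_sizes:
        # Same seed for every run, so each model starts from the same weights.
        tf.keras.utils.set_random_seed(seed)
        model = build_model(X_train.shape[1])
        model.fit(X_train, y_train, epochs=epochs, batch_size=int(batch_size),
                  validation_data=(X_valid, y_valid), verbose=0)
        loss, acc = model.evaluate(X_test, y_test, verbose=0)
        results[int(batch_size)] = {"test_loss": float(loss),
                                    "test_accuracy": float(acc)}
    return results


# Synthetic stand-in data (assumption: 57 features, as in Spambase).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 57)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

results = compare_batch_sizes(X[:200], y[:200], X[200:250], y[200:250],
                              X[250:], y[250:], epochs=2)
for batch_size, metrics in results.items():
    print(batch_size, metrics)
```

Fixing the seed makes the runs differ only in batch size. Repeating the comparison over several seeds would also show how much of any difference is noise.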


In [14]:
epochs = np.array([10, 20, 30])
best_batch_size = 64  # Use the batch size that worked best previously

# Loop through different epochs
for epoch in epochs:
    print(f"\nTraining with epochs: {epoch}")
    model.fit(X_train, y_train, epochs=epoch, batch_size=best_batch_size, validation_data=(X_valid, y_valid))
    test_loss, test_accuracy = model.evaluate(X_test, y_test)
    print(f'Epochs: {epoch}, Test Accuracy: {test_accuracy:.4f}, Test Loss: {test_loss:.4f}')



Training with epochs: 10
Epoch 1/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9917 - loss: 0.0281 - val_accuracy: 0.9321 - val_loss: 0.4125
Epoch 2/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9906 - loss: 0.0252 - val_accuracy: 0.9266 - val_loss: 0.4338
Epoch 3/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9917 - loss: 0.0259 - val_accuracy: 0.9266 - val_loss: 0.4351
Epoch 4/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9908 - loss: 0.0240 - val_accuracy: 0.9293 - val_loss: 0.4277
Epoch 5/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9916 - loss: 0.0244 - val_accuracy: 0.9239 - val_loss: 0.4575
Epoch 6/10
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9911 - loss: 0.0243 - val_accuracy: 0.9293 - val_loss: 0.4371
Epoch 7/10

### **Analysis by Epoch**:

These runs continue training the same model, which had already been trained in the earlier cells, so 10, 20 and 30 are the epochs added in each run and not the total training the model had received.

1. **10 Epochs**:
   - **Strength**: Lowest validation loss of the three runs.
   - **Weakness**: Lower test accuracy compared to the runs with more epochs, and the model was already overfitting: training accuracy was about 99% against validation accuracy of about 93%, with validation loss above 0.41.

2. **20 Epochs**:
   - **Strength**: Slightly improved test accuracy (94.79%).
   - **Weakness**: Validation loss and test loss both rose, a sign of increasing overfitting.

3. **30 Epochs**:
   - **Strength**: Best test accuracy (95.01%).
   - **Weakness**: Highest validation loss across the runs. Accuracy improved, so the model made slightly fewer mistakes on unseen data, but the higher loss shows that its wrong predictions were made with more confidence.
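
Because `model.fit` continues from the model's current weights and the notebook never rebuilds `model`, the epoch counts accumulate across cells. The short calculation below tallies the total under the assumption that every cell was run exactly once, in the order shown; the run labels are descriptive names chosen for this illustration.

```python
from itertools import accumulate

# (description, epochs passed to fit) for every fit call in the notebook,
# in execution order. Assumes each cell was run once.
runs = [
    ("initial fit, batch size 64", 10),
    ("batch-size loop, 16", 10),
    ("batch-size loop, 32", 10),
    ("batch-size loop, 64", 10),
    ("epoch loop, epochs=10", 10),
    ("epoch loop, epochs=20", 20),
    ("epoch loop, epochs=30", 30),
]

# Running total of epochs the single model has been trained for.
totals = list(accumulate(epochs for _, epochs in runs))

for (label, epochs), total in zip(runs, totals):
    print(f"{label:<28} +{epochs:>2} epochs -> {total:>3} in total")
```

Under that assumption, the run labelled 10 epochs is evaluated after 50 epochs of cumulative training and the run labelled 30 epochs after 100, which is consistent with the 99% training accuracy at the start of the log.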

---

### **Conclusion**:

- **Optimal Number of Epochs**: Based on this experiment, the **30 epochs** run gave the best test accuracy (**95.01%**), but at the cost of significantly higher validation loss, which indicates overfitting.
  
- **Best Trade-off**: The **10 epochs** run might offer the best balance between limiting overfitting and maintaining decent performance, given its lower validation loss, although the training log shows it was already overfitting (about 99% training accuracy against about 93% validation accuracy).

- **Next Steps**:
   - If I wanted to further optimise the model, I could try **regularisation techniques** (like dropout) or **early stopping** to prevent the model from overfitting with more epochs.
   - Another approach would be to lower the **learning rate** to see if slower learning could lead to better generalisation over more epochs.
   - Rebuilding the model before each run would let me measure each epoch count from scratch instead of on top of earlier training.
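
As a rough illustration of the first next step, the sketch below combines a dropout layer with Keras's `EarlyStopping` callback. It is not the notebook's model: the layer sizes are shortened, the dropout rate of 0.2 and the patience of 3 are arbitrary illustrative choices, and the synthetic data is only a stand-in so that the block runs on its own.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(42)

# Synthetic stand-in data (assumption: 57 features, as in Spambase).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 57)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, y_train = X[:240], y[:240]
X_valid, y_valid = X[240:], y[240:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(57,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),  # randomly zeroes 20% of activations in training
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop once validation loss has not improved for 3 epochs in a row,
# then roll the model back to the weights from its best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

max_epochs = 30
history = model.fit(X_train, y_train, epochs=max_epochs, batch_size=64,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stop], verbose=0)

epochs_run = len(history.history["val_loss"])
best_epoch = int(np.argmin(history.history["val_loss"])) + 1
print(f"ran {epochs_run} of {max_epochs} epochs; "
      f"lowest validation loss at epoch {best_epoch}")
```

With early stopping in place, the number of epochs becomes an upper bound instead of a value that has to be tuned by hand.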


# Reflect

Write a brief paragraph highlighting your process and the rationale to showcase critical thinking and problem-solving.

Throughout this process, I aimed to identify the best combination of **batch size** and **epochs** for my spam classification model using TensorFlow. I began by experimenting with different **batch sizes** (16, 32, 64), then with different numbers of **epochs** (10, 20, 30), observing how each affected test accuracy and loss. Comparing the two metrics showed that longer training improved test accuracy slightly while validation and test loss rose, which is a sign of overfitting. One limitation is that I reused the same model across runs instead of rebuilding it, so the effects of batch size and epoch count are mixed with the effect of cumulative training. This iterative approach helped me develop an understanding of the trade-offs involved in hyperparameter tuning. Ultimately, I concluded that a **batch size of 64** with **30 epochs** gave the best accuracy, while a more conservative combination of **10 epochs** with the same batch size offered a better balance between performance and overfitting.

# References

Hopkins, M., Reeber, E., Forman, G., Suermondt, J., 1999. Spambase. [online]. Available at: https://archive.ics.uci.edu/dataset/94. [Accessed 5 March 2024].