**First things first** - please go to 'File' and select 'Save a copy in Drive' so that you have your own version of this activity set up and ready to use.
Remember to update the portfolio index link to your own work once completed!

# Activity 3.3.6 Exploring evaluation metrics

## Scenario
Hopkins et al. (1999) created the Spambase data set and donated it to the UCI Machine Learning Repository. The data set contains 4,601 emails, each marked as spam or non-spam by a postmaster or by individuals. Fifty-seven features (e.g. word frequencies and email characteristics) aid in classifying emails as spam. The Spambase data set is used for developing and benchmarking spam detection models, providing a base for analysing how effectively various machine learning techniques distinguish between spam and legitimate emails.

As a data professional, you have been tasked by your company with developing a TensorFlow neural network, based on the Spambase data set, that can classify emails as spam or non-spam.



## Objective
In this portfolio activity, you’ll continue to work with the model you created in Activity 3.2.3: Experimenting with hyperparameter tuning by applying evaluation metrics and a pre-trained model to classify emails as spam or non-spam.

You will complete the activity in your Notebook, where you’ll:
- choose the best model based on model performance
- make predictions based on the chosen model
- convert probabilities to binary predictions and view accuracy, F1 score, and recall
- present your insights based on the model's performance.


## Assessment criteria
By completing this activity, you will be able to provide evidence that you can critically select appropriate strategies to demonstrate expertise in model-tuning techniques.


## Activity guidance
1. Continue to work on the model you created in **Activity 3.2.3**.
2. Select the best model you obtained through hyperparameter tuning. Substantiate your choice.
3. Run the chosen model again and save it in an `h5` file named `best_model.h5`. Remember to specify the path.
4. To check further metrics, apply the predict function to your model variable to create predictions on the `X_test` data set.
5. Convert probabilities to binary predictions and print the accuracy, F1 score, and recall. You can use the following code:
 - predictions: `y_pred = (y_pred > 0.5).astype(int)`
 - confusion matrix metrics: `accuracy_score`, `precision_score`, `recall_score` and `f1_score` functions.

> Start your activity here. Select the pen from the toolbar to add your entry.

In [1]:
# Start your activity here:
# Import necessary libraries.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import keras
from keras import layers

# URL to import data set from GitHub.
url = 'https://raw.githubusercontent.com/fourthrevlxd/cam_dsb/main/spamdata.csv'

In [2]:
# View the dataframe.
data = pd.read_csv(url, header=None)
data.head(10)

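| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 1.85 | 0.0 | 0.0 | 1.85 | 0.0 | 0.0 | ... | 0.0 | 0.223 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 15 | 54 | 1 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 1.92 | 0.0 | 0.0 | 0.0 | 0.0 | 0.64 | ... | 0.0 | 0.054 | 0.0 | 0.164 | 0.054 | 0.0 | 1.671 | 4 | 112 | 1 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 1.88 | 0.0 | 0.0 | 1.88 | 0.0 | 0.0 | ... | 0.0 | 0.206 | 0.0 | 0.0 | 0.0 | 0.0 | 2.45 | 11 | 49 | 1 |
| 8 | 0.15 | 0.0 | 0.46 | 0.0 | 0.61 | 0.0 | 0.3 | 0.0 | 0.92 | 0.76 | ... | 0.0 | 0.271 | 0.0 | 0.181 | 0.203 | 0.022 | 9.744 | 445 | 1257 | 1 |
| 9 | 0.06 | 0.12 | 0.77 | 0.0 | 0.19 | 0.32 | 0.38 | 0.0 | 0.06 | 0.0 | ... | 0.04 | 0.03 | 0.0 | 0.244 | 0.081 | 0.0 | 1.729 | 43 | 749 | 1 |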


In [3]:
# Specify input features and target variable.
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

In [4]:
# Split the data into training and test sets (20% held out for testing).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Split the training data again to create a validation set (10% of the training data).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42, stratify=y_train)

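The two stratified splits above fix how many emails end up in each subset. The following is a minimal sketch of that arithmetic, assuming the 4,601 rows stated in the scenario and assuming that `train_test_split` rounds the test share up to a whole sample. `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 4601  # number of emails stated in the scenario

# First split: 20% of all rows held out for testing.
n_train_full, n_test = split_sizes(n_total, 0.2)

# Second split: 10% of the remaining rows held out for validation.
n_train, n_val = split_sizes(n_train_full, 0.1)

# Batches per epoch for a batch size of 64 (the last batch may be partial).
steps_per_epoch = math.ceil(n_train / 64)

print("train:", n_train, "validation:", n_val, "test:", n_test)
print("steps per epoch at batch size 64:", steps_per_epoch)
```

Under these assumptions the sketch gives 3,312 training, 368 validation, and 921 test emails, and 52 steps per epoch, which agrees with the `52/52` progress counter in the training log.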
In [5]:
# Standardize the features.
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

In [6]:
# Create the sequential model.

model = keras.Sequential()
model.add(keras.Input(shape=(57,)))              # 57 input features
model.add(layers.Dense(64, activation="relu"))   # hidden layer 1
model.add(layers.Dense(32, activation="relu"))   # hidden layer 2
model.add(layers.Dense(28, activation="relu"))   # additional layer 1
model.add(layers.Dense(24, activation="relu"))   # additional layer 2
model.add(layers.Dense(20, activation="relu"))   # additional layer 3
model.add(layers.Dense(16, activation="relu"))   # additional layer 4
model.add(layers.Dense(1, activation="sigmoid")) # output layer: probability of spam

# Compile the model.
model.compile(optimizer=keras.optimizers.Adam(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Retrain the model with the best configuration from the previous activity.
model.fit(X_train_scaled, y_train,
          batch_size=64,
          epochs=20,
          validation_data=(X_val_scaled, y_val))

Epoch 1/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.6790 - loss: 0.5999 - val_accuracy: 0.9239 - val_loss: 0.2479
Epoch 2/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9190 - loss: 0.2390 - val_accuracy: 0.9239 - val_loss: 0.2215
Epoch 3/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9332 - loss: 0.1980 - val_accuracy: 0.9484 - val_loss: 0.1688
Epoch 4/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9442 - loss: 0.1645 - val_accuracy: 0.9484 - val_loss: 0.1408
Epoch 5/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9529 - loss: 0.1337 - val_accuracy: 0.9429 - val_loss: 0.1626
Epoch 6/20
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9540 - loss: 0.1276 - val_accuracy: 0.9484 - val_loss: 0.1357
Epoch 7/20
52/52 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x7ee8c10990d0>

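To get a feel for the size of the network defined above, the number of trainable parameters can be worked out by hand. This is a small sketch, assuming each `Dense` layer holds one weight per input-output pair plus one bias per output unit; the variable names are illustrative and not part of the notebook.

```python
# Layer widths from the model above: 57 inputs, six hidden layers, 1 output.
layer_widths = [57, 64, 32, 28, 24, 20, 16, 1]

# Each Dense layer has (inputs * outputs) weights plus one bias per output.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_widths, layer_widths[1:])
]
total_params = sum(params_per_layer)

print("parameters per layer:", params_per_layer)
print("total trainable parameters:", total_params)
```

The total can be compared against the figure reported by `model.summary()`.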
In [7]:
# Evaluate the model
results = model.evaluate(X_test_scaled, y_test, verbose=1)
print("Test loss, Test accuracy:", results)

29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9368 - loss: 0.2527
Test loss, Test accuracy: [0.25079187750816345, 0.9359391927719116]


Based on the hyperparameter tuning results, the model trained with a batch size of 64 for 20 epochs performed best, reaching a test accuracy of about 93.6% with a test loss of about 0.25.
Of the configurations tried, it offered the best balance between fitting the training data and generalizing to held-out data, so it is the model used for further evaluation.

In [8]:
# Saving the model
model.save('/content/best_model.h5')



In [9]:
# Generate predictions.
y_pred_prob = model.predict(X_test_scaled)

29/29 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step


In [10]:
# Convert probabilities to binary values (1 = spam, 0 = non-spam).
y_pred = (y_pred_prob > 0.5).astype(int)

In [11]:
# Compute evaluation metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Accuracy, Precision, Recall, F1
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

# Confusion Matrix
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 0.9359391965255157
Precision: 0.9293785310734464
Recall: 0.90633608815427
F1 Score: 0.9177126917712691

Confusion Matrix:
 [[533  25]
 [ 34 329]]

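Each of the four scores can be reproduced directly from the confusion matrix, which helps show what the metrics measure. The sketch below assumes scikit-learn's layout for binary labels, with rows as true classes and columns as predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`. The variable names are illustrative.

```python
# Counts read from the confusion matrix above: [[TN, FP], [FN, TP]].
tn, fp = 533, 25
fn, tp = 34, 329

total = tn + fp + fn + tp

accuracy = (tp + tn) / total      # share of all emails classified correctly
precision = tp / (tp + fp)        # share of spam flags that were truly spam
recall = tp / (tp + fn)           # share of actual spam that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Test emails: {total}")
print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 score:  {f1:.4f}")
```

The values agree with the scikit-learn output above: 25 legitimate emails were flagged as spam and 34 spam emails were missed.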

# Reflect

Write a brief paragraph highlighting your process and the rationale to showcase critical thinking and problem-solving.

> The model achieved an accuracy of approximately 93.6%, a precision of 92.9%, a recall of 90.6%, and an F1 score of 91.8% on the 921 test emails.
These results indicate that the model distinguishes spam from non-spam emails reliably, balancing the share of spam it catches (recall) against the share of its spam flags that are correct (precision).
The confusion matrix shows 59 misclassifications: 25 legitimate emails flagged as spam (false positives) and 34 spam emails missed (false negatives), which suggests the model generalizes well to unseen data.
Because recall is the lower of the two scores, further work could focus on reducing false negatives, for example by lowering the classification threshold or exploring more complex model architectures.

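The effect of changing the classification threshold, mentioned in the reflection, can be illustrated with a small sketch. The labels and probabilities below are made-up values chosen for illustration, not outputs of the notebook's model, and `precision_recall` is a hypothetical helper.

```python
import numpy as np

# Hypothetical labels and predicted spam probabilities (not from the notebook).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_prob = np.array([0.95, 0.80, 0.45, 0.35, 0.10, 0.40, 0.60, 0.05, 0.55, 0.20])


def precision_recall(y_true, y_prob, threshold):
    """Binarise probabilities at `threshold` and return (precision, recall)."""
    y_pred = (y_prob > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Compare the default threshold with a lower one.
results = {t: precision_recall(y_true, y_prob, t) for t in (0.5, 0.3)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall while precision drops slightly, which is the trade-off to weigh before changing the threshold on the real model.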
# References

Hopkins, M., Reeber, E., Forman, G., Suermondt, J., 1999. Spambase. [online]. Available at: https://archive.ics.uci.edu/dataset/94. [Accessed 5 March 2024].