## Task 3: Testing on Holdout Set

In this task, we carve a **holdout set** out of the **test dataset** and use it to simulate data shifts and performance degradation. The **holdout set** is kept separate from the **validation set** created in the same split, and it helps us evaluate how well the model generalizes to unseen data with simulated shifts.



In [22]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import load_model
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, roc_auc_score

### Steps to Create and Evaluate the Holdout Set

- Split the Test Set into Holdout and Validation Set

First, we split the **test set** into two equal parts: a **holdout set** and a **validation set**. The two halves are disjoint, so the **holdout set** shares no samples with the validation half.


In [9]:
# Load the test data
x_test = pd.read_csv("../data/processed/x_test.csv", header=None)
y_test = pd.read_csv("../data/processed/y_test.csv", header=None)

# Check the shape of the loaded data
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")

# Split the test set into holdout and validation sets (50-50 split)
x_val, x_holdout, y_val, y_holdout = train_test_split(x_test, y_test, test_size=0.5, random_state=42)

print(f"Validation set size: {x_val.shape[0]}")
print(f"Holdout set size: {x_holdout.shape[0]}")

x_test shape: (21892, 187)
y_test shape: (21892, 1)
Validation set size: 10946
Holdout set size: 10946
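The split above is purely random. As a minimal sketch of one possible variation, the example below passes `stratify` to `train_test_split` so that each class keeps roughly the same share in both halves. It runs on synthetic data: the arrays `x_demo` and `y_demo`, the five-class layout, and the class probabilities are assumptions made for illustration, not values taken from the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a test set (assumption): 1000 samples, 187 features,
# 5 classes with one dominant class.
rng = np.random.default_rng(42)
x_demo = rng.random((1000, 187))
y_demo = rng.choice(5, size=1000, p=[0.8, 0.05, 0.05, 0.05, 0.05])

# stratify=y_demo keeps each class's share nearly equal in both halves
x_val_demo, x_holdout_demo, y_val_demo, y_holdout_demo = train_test_split(
    x_demo, y_demo, test_size=0.5, random_state=42, stratify=y_demo
)

# Compare the class shares of the two halves
val_share = np.bincount(y_val_demo, minlength=5) / len(y_val_demo)
holdout_share = np.bincount(y_holdout_demo, minlength=5) / len(y_holdout_demo)
print("Validation class shares:", np.round(val_share, 3))
print("Holdout class shares:   ", np.round(holdout_share, 3))
```

Whether stratification is appropriate here depends on the goal: it removes class-proportion differences between the two halves, so any degradation measured later is easier to attribute to the simulated shift.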


- Apply Scaling to the Holdout Set

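Next, we apply Min-Max scaling to the holdout set to simulate a data shift. The scaler is fitted on the holdout set itself, so each feature is rescaled to the [0, 1] range using the holdout set's own minimum and maximum.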

In [11]:
# Initialize the Min-Max scaler and fit it on the holdout set itself
scaler = MinMaxScaler(feature_range=(0, 1))
x_holdout = scaler.fit_transform(x_holdout)
# Confirm that the shape is unchanged after scaling
print(f"Scaled holdout set shape: {x_holdout.shape}")

Scaled holdout set shape: (10946, 187)
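To make the effect of this step concrete, here is a small sketch comparing two ways of scaling a holdout set: reusing a scaler fitted on reference data, versus refitting the scaler on the holdout set itself as the cell above does. The arrays `x_train_demo` and `x_holdout_demo` and their value ranges are synthetic assumptions chosen only to make the difference visible.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic data (assumption): reference features span [0, 1],
# holdout features span a narrower range [0.2, 0.6].
x_train_demo = rng.uniform(0.0, 1.0, size=(200, 3))
x_holdout_demo = rng.uniform(0.2, 0.6, size=(100, 3))

# Option A: fit on the reference data, then only transform the holdout set
ref_scaler = MinMaxScaler(feature_range=(0, 1)).fit(x_train_demo)
holdout_ref = ref_scaler.transform(x_holdout_demo)

# Option B: refit on the holdout set itself, which stretches every
# column to cover the full [0, 1] range
holdout_refit = MinMaxScaler(feature_range=(0, 1)).fit_transform(x_holdout_demo)

# Mean absolute difference between the two scaled versions
shift = np.abs(holdout_refit - holdout_ref).mean()
print(f"Option A range: [{holdout_ref.min():.3f}, {holdout_ref.max():.3f}]")
print(f"Option B range: [{holdout_refit.min():.3f}, {holdout_refit.max():.3f}]")
print(f"Mean absolute shift introduced by refitting: {shift:.3f}")
```

Option B changes the values the model sees whenever the holdout set's per-feature range differs from the reference range, which is the kind of shift this step is meant to simulate. If the features already span [0, 1] in every column, refitting leaves them essentially unchanged.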


- Load the Best Trained Model

In [21]:
# Load the best model
model = load_model("../models/baseline_cnn_mitbih_final_model.h5")
model.summary()



- Evaluate the Model on the Holdout Set

In [23]:
# Predict on the scaled holdout set
pred_holdout = model.predict(x_holdout)
pred_holdout_class = np.argmax(pred_holdout, axis=-1)


[1m343/343[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step


In [24]:
# Macro averaging computes each metric per class and takes the unweighted mean
precision_holdout = precision_score(y_holdout, pred_holdout_class, average='macro')
recall_holdout = recall_score(y_holdout, pred_holdout_class, average='macro')
f1_holdout = f1_score(y_holdout, pred_holdout_class, average='macro')
accuracy_holdout = accuracy_score(y_holdout, pred_holdout_class)
# AUC-ROC uses the predicted class probabilities, not the argmax labels
roc_auc_holdout = roc_auc_score(y_holdout, pred_holdout, multi_class='ovr', average='macro')

# Print evaluation metrics for the holdout set
print(f"Holdout Test Precision: {precision_holdout:.4f}")
print(f"Holdout Test Recall: {recall_holdout:.4f}")
print(f"Holdout Test F1 Score: {f1_holdout:.4f}")
print(f"Holdout Test Accuracy: {accuracy_holdout:.4f}")
print(f"Holdout Test AUC-ROC: {roc_auc_holdout:.4f}")

Holdout Test Precision: 0.7354
Holdout Test Recall: 0.9176
Holdout Test F1 Score: 0.7999
Holdout Test Accuracy: 0.9461
Holdout Test AUC-ROC: 0.9907
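Accuracy (0.9461) is much higher than macro precision (0.7354) in the results above. One plausible reading, assuming the classes are imbalanced (this is not verified in the notebook), is that false positives on a small class pull the macro average down while barely affecting accuracy. The toy example below illustrates that mechanism with hand-built labels; `y_true_demo` and `y_pred_demo` are hypothetical and unrelated to the model's actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical two-class labels: 90 samples of class 0, 10 of class 1
y_true_demo = np.array([0] * 90 + [1] * 10)

# Predictions: 5 of the class-0 samples are mislabelled as class 1,
# every class-1 sample is predicted correctly
y_pred_demo = np.array([0] * 85 + [1] * 5 + [1] * 10)

accuracy_demo = accuracy_score(y_true_demo, y_pred_demo)
# average=None returns one precision value per class
per_class_precision = precision_score(y_true_demo, y_pred_demo, average=None)
macro_precision_demo = precision_score(y_true_demo, y_pred_demo, average='macro')
macro_recall_demo = recall_score(y_true_demo, y_pred_demo, average='macro')

print(f"Accuracy:            {accuracy_demo:.4f}")
print(f"Per-class precision: {np.round(per_class_precision, 4)}")
print(f"Macro precision:     {macro_precision_demo:.4f}")
print(f"Macro recall:        {macro_recall_demo:.4f}")
```

Only five errors out of 100 samples occur here, yet they all land in the minority class's predictions, so that class's precision drops sharply and the macro average follows. Inspecting per-class scores on the real holdout predictions (for example with `average=None`) would show whether the same pattern applies.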
