# Final Test-set Evaluation

We found our best model to be the Gradient Boosting model. It achieved a weighted F1 score of 0.99 on the training set. The classification report showed a precision of 98% for class 1 of the target variable (normal fetal health) and 99% for class 2 (at-risk fetal health). The confusion matrix agrees: 1319 class 1 cases and 356 class 2 cases were classified correctly, and 25 were misclassified (3 normal cases predicted as at-risk and 22 at-risk cases predicted as normal).

The best hyperparameters we found for this model were a learning rate of 0.1, a max depth of 3, and 200 estimators.
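
The search that produced these values is not shown in this section. As a minimal sketch of how such a search could look, assuming `GridSearchCV` with stratified folds and weighted F1 as the selection metric: the synthetic data, the grid values, and the names `X_demo`, `y_demo`, and `search_pipe` are illustrative assumptions, not the settings actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); a real search would use the training set
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=4,
                                     weights=[0.78, 0.22], random_state=42)

search_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("gb", GradientBoostingClassifier(random_state=42)),
])

# Hypothetical grid that happens to contain the reported values
param_grid = {
    "gb__learning_rate": [0.05, 0.1],
    "gb__max_depth": [2, 3],
    "gb__n_estimators": [100, 200],
}

# Stratified folds keep the class balance similar in every split
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
search = GridSearchCV(search_pipe, param_grid, scoring="f1_weighted", cv=cv)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 4))
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so the cross-validated score is not influenced by the validation fold's statistics.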

In [None]:
# Importing libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, cross_validate, GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             precision_score, recall_score, f1_score)

# Setting random seed for reproducibility
np.random.seed(42)

# Loading the data 
df_train = pd.read_csv("../Data/train_set.csv")

In [None]:
# Separating features and target variable
X_train = df_train.drop(columns=['fetal_health'])
y_train = df_train['fetal_health']

# Encoding the outcome variable
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
classes = label_encoder.classes_

# Verifying the encoding
print(f"\nEncoded classes: {classes}")
print(f"Encoded labels: {np.unique(y_train_encoded)}")


Encoded classes: [1. 2.]
Encoded labels: [0 1]


In [10]:
# Gradient boosting pipeline, with optimized hyperparameters included 
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('gb', GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                      n_estimators=200, random_state=42))
])

# Fitting and predicting on the training set
train_model = pipe.fit(X_train, y_train_encoded)
y_train_pred = train_model.predict(X_train)

# Calculating accuracy, precision, recall, and F1 (weighted by class support)
print(f"Training set accuracy: {round(accuracy_score(y_train_encoded, y_train_pred), 4)}")
print(f"Training set precision: {round(precision_score(y_train_encoded, y_train_pred, average='weighted'), 4)}")
print(f"Training set recall: {round(recall_score(y_train_encoded, y_train_pred, average='weighted'), 4)}")
print(f"Training set F1 score: {round(f1_score(y_train_encoded, y_train_pred, average='weighted'), 4)}")

Training set accuracy: 0.9853
Training set precision: 0.9854
Training set recall: 0.9853
Training set F1 score: 0.9852


In [11]:
# Confusion matrix
train_conf_matrix = confusion_matrix(y_train_encoded, y_train_pred)
print(f"Training set confusion matrix:\n{train_conf_matrix}")

Training set confusion matrix:
[[1319    3]
 [  22  356]]
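
The per-class and weighted figures quoted in this section can be recovered from the confusion matrix alone. The following is a small sketch in plain Python, with the matrix values copied from the output above; the helper name `per_class_metrics` is introduced here only for illustration.

```python
# Training confusion matrix copied from the output above
# (rows = true class, columns = predicted class)
conf = [[1319, 3],
        [22, 356]]

def per_class_metrics(matrix):
    """Return one (precision, recall, support) tuple per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        results.append((tp / predicted_k, tp / actual_k, actual_k))
    return results

metrics = per_class_metrics(conf)
total = sum(support for _, _, support in metrics)

# Weighted averages: each class contributes in proportion to its support
weighted_precision = sum(p * s for p, _, s in metrics) / total
weighted_recall = sum(r * s for _, r, s in metrics) / total

for label, (p, r, s) in zip(["class 1 (normal)", "class 2 (at-risk)"], metrics):
    print(f"{label}: precision={p:.4f}, recall={r:.4f}, support={s}")
print(f"weighted precision={weighted_precision:.4f}, "
      f"weighted recall={weighted_recall:.4f}")
```

Precision comes out at roughly 0.98 for class 1 and 0.99 for class 2. Recall for the at-risk class is lower, at about 0.94, because 22 of the 378 at-risk cases were predicted as normal.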


Now that we have fit our best model to the entire training set, we can evaluate its performance on the test set. 
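
For the test-set scores to be held-out estimates, everything that learns from data (the label encoder, the scaler, and the classifier) has to be fitted on the training data only and then reused unchanged on the test data. The pattern can be sketched on synthetic stand-in data as follows; the data and the names `X_demo`, `encoder`, and `demo_pipe` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in data (assumption), relabelled 1.0 / 2.0 to mimic fetal_health
X_demo, y_raw = make_classification(n_samples=400, n_features=8, n_informative=4,
                                    random_state=42)
y_demo = y_raw + 1.0
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          stratify=y_demo, random_state=42)

# Fit the encoder and the pipeline on the training split only
encoder = LabelEncoder().fit(y_tr)
demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("gb", GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                      n_estimators=200, random_state=42)),
])
demo_pipe.fit(X_tr, encoder.transform(y_tr))

# Reuse both fitted objects on the held-out split; nothing is refit here
y_te_pred = demo_pipe.predict(X_te)
holdout_f1 = f1_score(encoder.transform(y_te), y_te_pred, average="weighted")
print(round(holdout_f1, 4))
```

Calling `fit` again on the test split would replace the trained model with one that has already seen the test labels, and the resulting score would no longer measure generalization.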

In [None]:
# Loading the data 
df_test = pd.read_csv("../Data/test_set.csv")

# Separating features and target variable
X_test = df_test.drop(columns=['fetal_health'])
y_test = df_test['fetal_health']

# Encoding the outcome variable with the encoder fitted on the training labels,
# so both sets share the same class-to-integer mapping
y_test_encoded = label_encoder.transform(y_test)
classes = label_encoder.classes_

# Verifying the encoding
print(f"\nEncoded classes: {classes}")
print(f"Encoded labels: {np.unique(y_test_encoded)}")


Encoded classes: [1. 2.]
Encoded labels: [0 1]


In [13]:
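# Using the Gradient Boosting pipeline already fitted on the training set above
# Predicting on the test set without refitting, so the scores are held-out estimates
# NOTE: the outputs recorded below came from a run that refit the pipeline on the
# test set (pipe.fit(X_test, y_test_encoded)); re-run this cell for held-out scores
y_test_pred = train_model.predict(X_test)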

# Calculating accuracy, precision, recall, and F1 (weighted by class support)
print(f"Test set accuracy: {round(accuracy_score(y_test_encoded, y_test_pred), 4)}")
print(f"Test set precision: {round(precision_score(y_test_encoded, y_test_pred, average='weighted'), 4)}")
print(f"Test set recall: {round(recall_score(y_test_encoded, y_test_pred, average='weighted'), 4)}")
print(f"Test set F1 score: {round(f1_score(y_test_encoded, y_test_pred, average='weighted'), 4)}")

Test set accuracy: 0.9953
Test set precision: 0.9953
Test set recall: 0.9953
Test set F1 score: 0.9953


In [14]:
# Confusion matrix
test_conf_matrix = confusion_matrix(y_test_encoded, y_test_pred)
print(f"Test set confusion matrix:\n{test_conf_matrix}")

Test set confusion matrix:
[[332   1]
 [  1  92]]
