# Bonus Question
Fit a neural network to the Default data. Use a single hidden layer with 10 units, and dropout regularization. Have a look at Labs 10.9.1–10.9.2 for guidance. Compare the classification performance of your model with that of linear logistic regression.

## Train-Test Split and Scale

In [29]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

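# Load the Default data; the first CSV column holds the row index
df = pd.read_csv("Default.csv", index_col=0)

# Encode the Yes/No columns as 1/0
df['default'] = (df['default'] == 'Yes').astype(int)
df['student'] = (df['student'] == 'Yes').astype(int)

X = df[['student', 'balance', 'income']]
y = df['default']

# Stratify so that the train and test sets keep the same share of defaults
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)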

## Training Neural Network

In [30]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

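# Seed TensorFlow's random number generator
tf.random.set_seed(42)

# One hidden layer with 10 ReLU units, dropout, and a sigmoid output for P(default)
model = Sequential([
    Dense(10, input_shape=(X_train_scaled.shape[1],), activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# validation_split holds out the last 20% of the training rows for validation
history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, verbose=0, validation_split=0.2)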

## Model Evaluation - Neural Network

In [31]:
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred_prob_nn = model.predict(X_test_scaled).flatten()
y_pred_nn = (y_pred_prob_nn > 0.5).astype(int)

print("Neural Network Performance")
print(confusion_matrix(y_test, y_pred_nn))
print(classification_report(y_test, y_pred_nn))
print(f"AUC: {roc_auc_score(y_test, y_pred_prob_nn):.3f}")

Neural Network Performance
[[1928    5]
 [  51   16]]
              precision    recall  f1-score   support

           0       0.97      1.00      0.99      1933
           1       0.76      0.24      0.36        67

    accuracy                           0.97      2000
   macro avg       0.87      0.62      0.67      2000
weighted avg       0.97      0.97      0.96      2000

AUC: 0.948


The neural network reaches an overall accuracy of 97% and an AUC of 0.948, so it ranks defaulters above non-defaulters well. The accuracy figure is less informative than it looks, however: only 67 of the 2,000 test observations are defaults, so a model that always predicts "no default" would already be correct on 1,933 of them (about 97%). The per-class metrics show the effect of this class imbalance. Precision and recall are near-perfect for the majority class (non-default, class 0), but the model identifies only 16 of the 67 actual defaults, a recall of 24%. That is a concern in applications such as credit risk modeling, where identifying defaulters is critical. Precision for class 1 is 76%, and the resulting F1-score is just 0.36. At the 0.5 threshold, the model is therefore biased toward predicting the majority class and misses most of the rare but important default cases.
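The class-1 figures quoted above follow directly from the confusion matrix. As a minimal sketch of that arithmetic, assuming scikit-learn's layout (rows are true labels, columns are predictions), the helper below recomputes them from the printed counts. The function name `positive_class_metrics` is an illustrative choice and not part of the original notebook.

```python
import numpy as np

def positive_class_metrics(cm):
    """Precision, recall, F1 for class 1, plus overall accuracy.

    Assumes a 2x2 matrix laid out as [[TN, FP], [FN, TP]],
    which is how sklearn.metrics.confusion_matrix orders binary labels 0/1.
    """
    (tn, fp), (fn, tp) = np.asarray(cm)
    precision = tp / (tp + fp)          # share of predicted defaults that are real
    recall = tp / (tp + fn)             # share of real defaults that are caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Counts copied from the neural network's confusion matrix printed above
nn_cm = [[1928, 5], [51, 16]]
nn_metrics = positive_class_metrics(nn_cm)

for name, value in nn_metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values match the class-1 row and the accuracy line of the classification report. The same helper can be applied to any other 2x2 confusion matrix.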

## Logistic Regression

In [32]:
from sklearn.linear_model import LogisticRegression

# scikit-learn's defaults apply an L2 penalty with C=1.0
log_reg = LogisticRegression()
log_reg.fit(X_train_scaled, y_train)

y_pred_prob_lr = log_reg.predict_proba(X_test_scaled)[:, 1]
y_pred_lr = (y_pred_prob_lr > 0.5).astype(int)

print("Logistic Regression Performance")
print(confusion_matrix(y_test, y_pred_lr))
print(classification_report(y_test, y_pred_lr))
print(f"AUC: {roc_auc_score(y_test, y_pred_prob_lr):.3f}")

Logistic Regression Performance
[[1925    8]
 [  46   21]]
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      1933
           1       0.72      0.31      0.44        67

    accuracy                           0.97      2000
   macro avg       0.85      0.65      0.71      2000
weighted avg       0.97      0.97      0.97      2000

AUC: 0.948


The logistic regression model, like the neural network, achieves an overall accuracy of 97% and an identical AUC of 0.948. By the area under the ROC curve, the two models are therefore equally good at ranking defaulters above non-defaulters. The classification metrics again show the effect of class imbalance. On the majority class (non-default), precision and recall are near-perfect. On the minority class (default), the model identifies 21 of the 67 defaults, a recall of 31% compared with the neural network's 24%, at a comparable precision of 72% (versus 76%). Its F1-score for the minority class is 0.44, higher than the neural network's 0.36, which indicates a better balance between precision and recall in identifying defaulters.
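Two models can share an AUC and still differ in recall because AUC summarizes ranking quality across all thresholds, while the confusion matrix reflects the single 0.5 cutoff. The sketch below illustrates how recall and precision move as the cutoff changes. The helper `recall_precision_at` and the toy arrays are hypothetical, made up for illustration; they are not taken from the Default data or from the notebook's results.

```python
import numpy as np

def recall_precision_at(y_true, y_prob, threshold):
    """Recall and precision for class 1 when predicting 1 if y_prob > threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical toy labels and predicted probabilities (illustration only)
toy_y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
toy_prob = np.array([0.02, 0.05, 0.08, 0.15, 0.35, 0.45, 0.20, 0.40, 0.60, 0.90])

# Lowering the cutoff trades precision for recall
for t in (0.5, 0.3, 0.1):
    r, p = recall_precision_at(toy_y, toy_prob, t)
    print(f"threshold={t:.1f}  recall={r:.2f}  precision={p:.2f}")
```

In the notebook, the same function could be called with `y_test` and either `y_pred_prob_nn` or `y_pred_prob_lr` to see how each model's recall responds to a lower cutoff.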

## Model Comparison

Both models achieve the same overall accuracy of 97% and an identical AUC of 0.948, indicating a strong and comparable ability to rank defaulters above non-defaulters. The differences emerge on the minority class (defaults). The neural network correctly identifies only 24% of defaulters (recall), giving an F1-score of 0.36 despite its high precision (76%). It is conservative in predicting defaults and misses most of them. The logistic regression model achieves a recall of 31% and an F1-score of 0.44 for the same class, so it captures more of the actual default cases at a slightly lower precision (72%). Both models perform well on the majority class (non-default), but logistic regression strikes a slightly better balance between precision and recall on the minority class. That makes it the preferable choice here, especially when identifying defaults matters more than overall accuracy. One possible explanation, not tested here, is that the neural network’s added complexity led it to overfit to the dominant class. The gap is also small in absolute terms (21 versus 16 of the 67 test defaults), so it may partly reflect this particular train-test split.