# Credit Default Prediction — Business Case

**Candidate name:**  
**Date:**  
**AI tools used:** *(e.g. "GitHub Copilot for boilerplate plotting code", or "None")*

---

## Setup

In [None]:
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set seeds for reproducibility
RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.4f}'.format)
plt.rcParams['figure.figsize'] = (10, 5)
sns.set_theme(style='whitegrid')

In [None]:
# Load data
df = pd.read_csv('../data/default_of_credit_card_clients.csv', index_col='ID')

# Rename target for convenience
df = df.rename(columns={'default payment next month': 'default'})

print(f'Shape: {df.shape}')
print('\nTarget distribution:')
print(df['default'].value_counts(normalize=True).rename({0: 'No default', 1: 'Default'}))

In [None]:
# Quick sanity check
df.info()

In [None]:
df.describe()

---

## A1 — Exploratory Data Analysis

> Explore the population: class balance, feature distributions, correlations, and any data quality issues.
> Include visualisations you consider important for the business context.

In [None]:
# Your EDA code here
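
As one possible starting point for the data quality part of the prompt, the sketch below flags category codes that do not appear in a data dictionary. The helper name `undocumented_codes`, the column names `EDUCATION` and `MARRIAGE`, and the documented code sets are all illustrative assumptions; check them against the actual dataset documentation before relying on them.

```python
import pandas as pd

def undocumented_codes(frame, documented):
    """Return, per column, the count of each value not listed in `documented`."""
    report = {}
    for col, allowed in documented.items():
        counts = frame[col].value_counts()
        extra = counts[~counts.index.isin(allowed)]
        if not extra.empty:
            report[col] = extra.to_dict()
    return report

# Toy frame standing in for df; values are made up for illustration
toy = pd.DataFrame({
    'EDUCATION': [1, 2, 2, 0, 5],
    'MARRIAGE': [1, 2, 3, 1, 2],
})
# Assumed data dictionary (hypothetical, not taken from the dataset docs)
documented = {'EDUCATION': [1, 2, 3, 4], 'MARRIAGE': [1, 2, 3]}

issues = undocumented_codes(toy, documented)
print(issues)
```

Columns with no unexpected codes are left out of the report, so an empty dictionary means every value matched the assumed data dictionary.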

---

## A2 — Feature Engineering & Selection

> Select and/or engineer the features with the greatest predictive power.
> Justify your selection using appropriate techniques (e.g., IV/WoE, statistical tests, feature importance).
> Provide a per-variable report on your process.

In [None]:
# Your feature engineering & selection code here
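
The IV/WoE technique mentioned in the prompt could be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the helper `woe_iv`, the quantile binning, the smoothing constant `eps`, and the synthetic data are all assumptions, and the sign convention used here is WoE = ln(share of goods / share of bads).

```python
import numpy as np
import pandas as pd

def woe_iv(feature, target, bins=5, eps=0.5):
    """Quantile-bin a numeric feature; return (per-bin table, total IV)."""
    binned = pd.qcut(feature, q=bins, duplicates='drop')
    frame = pd.DataFrame({'bin': binned, 'bad': target})
    table = frame.groupby('bin', observed=True)['bad'].agg(total='count', bad='sum')
    table['good'] = table['total'] - table['bad']
    # eps smooths empty cells so the logarithm stays finite
    k = len(table)
    dist_good = (table['good'] + eps) / (table['good'].sum() + eps * k)
    dist_bad = (table['bad'] + eps) / (table['bad'].sum() + eps * k)
    table['woe'] = np.log(dist_good / dist_bad)
    table['iv'] = (dist_good - dist_bad) * table['woe']
    return table, table['iv'].sum()

# Synthetic data: `signal` drives the default flag, `noise` does not
rng = np.random.default_rng(42)
n = 5000
signal = pd.Series(rng.normal(size=n))
noise = pd.Series(rng.normal(size=n))
prob = 1 / (1 + np.exp(-(1.5 * signal - 1.0)))
bad = pd.Series((rng.random(n) < prob).astype(int))

table_signal, iv_signal = woe_iv(signal, bad)
table_noise, iv_noise = woe_iv(noise, bad)
print(table_signal[['total', 'bad', 'woe', 'iv']])
print(f'IV signal: {iv_signal:.3f}, IV noise: {iv_noise:.3f}')
```

Each bin's IV contribution is non-negative by construction, so the total can be compared across variables; the number of bins and the binning method affect the result and are worth justifying in the per-variable report.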

---

## A3 — Model Training & Evaluation

> Train at least two different model families and find the best-performing model.
> Justify decisions using appropriate metrics (AUC-ROC, Gini, KS, precision-recall, F1).
> Discuss your validation strategy.

In [None]:
from sklearn.model_selection import train_test_split

TARGET = 'default'
FEATURES = [c for c in df.columns if c != TARGET]

X = df[FEATURES]
y = df[TARGET]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_SEED, stratify=y
)

print(f'Train: {X_train.shape}, Test: {X_test.shape}')

In [None]:
# Your model training & evaluation code here
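
For the metrics named in the prompt, Gini and KS can both be derived from the ROC curve: Gini = 2 * AUC - 1, and KS is the largest gap between the true positive rate and the false positive rate. The sketch below assumes a hypothetical helper `gini_and_ks` and uses synthetic scores rather than a fitted model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def gini_and_ks(y_true, y_score):
    """Compute AUC, Gini and the KS statistic from labels and scores."""
    auc = roc_auc_score(y_true, y_score)
    # Keep every threshold so the maximum gap is taken over the full curve
    fpr, tpr, _ = roc_curve(y_true, y_score, drop_intermediate=False)
    return {'auc': auc, 'gini': 2 * auc - 1, 'ks': float(np.max(tpr - fpr))}

# Synthetic example: defaulters get scores shifted up by one standard deviation
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=2000)
score_demo = rng.normal(loc=y_demo * 1.0, scale=1.0)

metrics = gini_and_ks(y_demo, score_demo)
print({name: round(value, 3) for name, value in metrics.items()})
```

In the notebook itself, `y_test` and the predicted default probabilities from a fitted model would take the place of `y_demo` and `score_demo`.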

---

## A4 — Model Interpretation & Business Recommendations

> Interpret model results from a business perspective.
> Explain which factors most influence default probability.
> Provide actionable recommendations and discuss integration into the credit approval workflow.

In [None]:
# Your interpretation & recommendation code here

---

## Bonus Exercises

Complete any or all of the following bonus sections. Each is self-contained.

### A5 — Model Fairness & Bias Analysis *(bonus)*

> Analyse whether your model exhibits disparate impact across demographic groups (e.g., gender, education level).
> Discuss how you would address fairness concerns in a production credit model.

In [None]:
# Your fairness analysis code here
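
One common way to quantify disparate impact is the ratio of approval rates between each group and a reference group. The sketch below is illustrative only: the helper `disparate_impact`, the group labels, and the toy decisions are assumptions. A ratio below 0.8 is often cited as a warning sign (the "four-fifths" heuristic), though whether that threshold applies to a given credit product and jurisdiction is a separate question.

```python
import pandas as pd

def disparate_impact(approved, group, reference):
    """Approval rate of each group divided by the reference group's rate."""
    rates = approved.groupby(group).mean()
    return rates / rates[reference]

# Toy decisions: 1 = approved, 0 = rejected (made-up values)
toy = pd.DataFrame({
    'group': ['a'] * 4 + ['b'] * 4,
    'approved': [1, 1, 1, 0, 1, 1, 0, 0],
})

ratios = disparate_impact(toy['approved'], toy['group'], reference='a')
print(ratios)
```

Here group `a` is approved 75% of the time and group `b` 50% of the time, so the ratio for `b` is about 0.67.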

### A6 — Monitoring & Drift Strategy *(bonus)*

> Propose a monitoring plan for the model once deployed.
> How would you detect concept drift or data drift? What metrics and alerts would you set up?

*Your written response here (code optional)*
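
If you choose to include code, one widely used data drift measure is the Population Stability Index (PSI), which compares the binned distribution of a feature or score between a reference sample and a recent sample. The sketch below is one possible implementation under stated assumptions: the helper `psi`, the decile binning, the `eps` floor, and the synthetic samples are all illustrative.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index using quantile bins from the reference sample."""
    # Interior bin edges come from the reference (expected) distribution
    inner = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e_counts = np.bincount(np.searchsorted(inner, expected, side='right'), minlength=bins)
    a_counts = np.bincount(np.searchsorted(inner, actual, side='right'), minlength=bins)
    # Floor the shares so empty bins do not produce log(0)
    e_share = np.clip(e_counts / len(expected), eps, None)
    a_share = np.clip(a_counts / len(actual), eps, None)
    return float(np.sum((a_share - e_share) * np.log(a_share / e_share)))

rng = np.random.default_rng(1)
baseline = rng.normal(size=5000)            # e.g. scores at training time
stable = rng.normal(size=5000)              # same distribution
shifted = rng.normal(loc=0.5, size=5000)    # mean has drifted

psi_same = psi(baseline, baseline)
psi_stable = psi(baseline, stable)
psi_shifted = psi(baseline, shifted)
print(f'same: {psi_same:.4f}, stable: {psi_stable:.4f}, shifted: {psi_shifted:.4f}')
```

Rules of thumb such as "below 0.1 is stable, above 0.25 needs attention" are conventions rather than guarantees, so any alert thresholds in your plan should be justified for the portfolio at hand. PSI covers data drift only; concept drift needs outcome-based metrics once default labels mature.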

### A7 — Reject Inference *(bonus)*

> Discuss the challenge of reject inference in credit scoring.
> How might training only on approved clients bias the model, and what techniques could mitigate it?

*Your written response here*

### A8 — Business Simulation *(bonus)*

> Create a simulation showing the financial impact (expected profit/loss) of your model at different score cutoff thresholds.
> Include assumptions about loan amounts, interest rates, and recovery rates.

In [None]:
# Your business simulation code here
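
A cutoff simulation could be structured along the following lines. This is a deliberately simplified sketch: the helper `profit_by_cutoff`, the flat loan amount, the interest rate, the recovery rate, and the toy scores are all assumptions chosen for illustration, and every applicant with a predicted default probability below the cutoff is treated as approved.

```python
import numpy as np
import pandas as pd

def profit_by_cutoff(default_prob, defaulted, cutoffs,
                     loan_amount=10_000.0, interest_rate=0.15, recovery_rate=0.30):
    """Profit per cutoff: interest earned on good loans minus net loss on bad ones."""
    rows = []
    for cutoff in cutoffs:
        approved = default_prob < cutoff
        n_bad = int(np.sum(approved & (defaulted == 1)))
        n_good = int(np.sum(approved & (defaulted == 0)))
        gain = n_good * loan_amount * interest_rate
        loss = n_bad * loan_amount * (1 - recovery_rate)
        rows.append({'cutoff': cutoff, 'approved': n_good + n_bad,
                     'defaults': n_bad, 'profit': gain - loss})
    return pd.DataFrame(rows)

# Toy inputs (made up): predicted default probabilities and observed outcomes
default_prob = np.array([0.05, 0.10, 0.20, 0.40, 0.70, 0.90])
defaulted = np.array([0, 0, 0, 1, 0, 1])

table = profit_by_cutoff(default_prob, defaulted, cutoffs=[0.3, 0.5, 1.0])
best_cutoff = table.loc[table['profit'].idxmax(), 'cutoff']
print(table)
print(f'Best cutoff in this toy example: {best_cutoff}')
```

A fuller simulation would likely vary the loan amount per client, account for funding and operating costs, and test how sensitive the chosen cutoff is to the assumed rates.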

---

## Part B — Results Presentation

> Write a brief summary that a non-technical stakeholder could understand.
> Cover: key EDA findings, top drivers of default, recommended decision threshold and rationale, expected business impact and limitations.

### Key EDA Findings

*Your summary here*

### Top Drivers of Default

*Your summary here*

### Recommended Decision Threshold

*Your recommendation and rationale here*

### Expected Business Impact & Limitations

*Your summary here*