# Other Classification Models
    Beyond Linear and Tree-Based Approaches in Credit Risk
## Objective

This notebook explores alternative classification algorithms that complement linear, tree-based, and ensemble models, covering:

- Support Vector Machines (SVM)

- K-Nearest Neighbors (KNN)

- Naive Bayes

- Neural Networks (MLP)

- When not to use them in regulated environments

It answers:

    Which classification models exist beyond trees — and why are many rarely used in production banking?

## Business Context – Why This Notebook Matters

In Finance:

- Not all high-performing models are deployable

- Regulatory scrutiny prioritizes:

- Stability

- Explainability

- Auditability

These models are commonly used as:

- Benchmarks

- Challengers

- Research prototypes

- Feature signal detectors

## Imports and Dataset

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns


df = pd.read_csv("D:/GitHub/Data-Science-Techniques/datasets/Supervised-classification/synthetic_credit_default_classification.csv")

df.head()


Unnamed: 0,customer_id,age,annual_income,credit_utilization,debt_to_income,loan_amount,loan_term_months,num_past_defaults,employment_years,credit_score,default
0,1,59,23283.682822,0.187813,0.245248,20232.165654,24,0,4.575844,689.627408,1
1,2,49,61262.608063,0.291774,0.396763,26484.067591,36,0,3.317515,697.770541,1
2,3,35,60221.74316,0.230557,0.122859,27142.522594,24,1,11.871955,713.721429,0
3,4,63,93603.112731,0.157906,0.635484,1000.0,12,0,2.256651,655.306417,1
4,5,28,71674.557271,0.167549,0.422446,15254.246561,48,0,6.97127,644.247643,0


In [2]:
target = "default"

X = df.drop(columns=[target, "customer_id"])
y = df[target]


# Train/Test Split (Stratified)

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,
    random_state=42
)


# Preprocessing (Scaling Required)

Unlike trees, most models here are scale-sensitive.

## Pipeline Preprocessing

In [4]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

preprocess = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])


# MODELS

## Support Vector Machine (SVM)
#### Why Consider SVM?

- Strong theoretical foundation

- Effective in high-dimensional spaces

#### Why Rare in Banking?

- Poor scalability

- Limited interpretability

### SVM

In [5]:
from sklearn.svm import SVC

svm = Pipeline(steps=[
    ("prep", preprocess),
    ("model", SVC(
        kernel="rbf",
        C=1.0,
        gamma="scale",
        probability=True,
        class_weight="balanced"
    ))
])

svm.fit(X_train, y_train)


## K-Nearest Neighbors (KNN)
#### Why Consider KNN?

- Simple

- Non-parametric

#### Why Rare in Finance?

- No global model

- Sensitive to noise

- Hard to justify decisions

### KNN

In [6]:
from sklearn.neighbors import KNeighborsClassifier

knn = Pipeline(steps=[
    ("prep", preprocess),
    ("model", KNeighborsClassifier(
        n_neighbors=15,
        weights="distance"
    ))
])

knn.fit(X_train, y_train)


## Naive Bayes
#### Why Consider NB?

- Fast

- Stable

- Probabilistic

#### Limitations

- Strong independence assumptions

- Often underperforms in credit risk

### Gaussian Naive Bayes

In [7]:
from sklearn.naive_bayes import GaussianNB

nb = Pipeline(steps=[
    ("prep", preprocess),
    ("model", GaussianNB())
])

nb.fit(X_train, y_train)


## Neural Network (MLP)
#### Why Consider Neural Networks?

- Universal approximators

- Can capture complex interactions

#### Why Rare in Banking?

- Low interpretability

- Stability concerns

- Governance challenges

### MLP Classifier

In [8]:
from sklearn.neural_network import MLPClassifier

mlp = Pipeline(steps=[
    ("prep", preprocess),
    ("model", MLPClassifier(
        hidden_layer_sizes=(64, 32),
        activation="relu",
        max_iter=500,
        random_state=42
    ))
])

mlp.fit(X_train, y_train)




# Model Evaluation 

## Evaluation (ROC-AUC)

In [9]:
from sklearn.metrics import roc_auc_score

models = {
    "SVM": svm,
    "KNN": knn,
    "Naive Bayes": nb,
    "MLP Neural Net": mlp
}

results = {}

for name, model in models.items():
    prob = model.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, prob)

pd.Series(results).sort_values(ascending=False)


SVM               0.902066
Naive Bayes       0.898160
KNN               0.882698
MLP Neural Net    0.863234
dtype: float64

# Expected Performance Pattern

| Model       | Typical Outcome       |
| ----------- | --------------------- |
| SVM         | Strong but slow       |
| KNN         | Unstable              |
| Naive Bayes | Baseline              |
| MLP         | Competitive, volatile |


# Interpretability & Governance


| Model          | Interpretability | Regulatory Fit |
| -------------- | ---------------- | -------------- |
| SVM            | Low              | Low            |
| KNN            | Very Low         | Very Low       |
| Naive Bayes    | Medium           | Medium         |
| Neural Network | Low              | Low            |



#  When These Models Make Sense

- `ok` Research
- `ok` Feature signal validation
- `ok` Academic benchmarks
- `ok` Challengers only

- `no` Primary PD model
- `no` Policy decision engines
- `no` Regulated scorecards


# Common Mistakes (Avoided)

- `[neg] -` Deploying black-box models without governance
- `[neg] -` Ignoring scaling requirements
- `[neg] -` Over-tuning unstable models
- `[neg] -` Treating AUC as sufficient
- `[neg] -` Skipping calibration


# Summary Table
- Model	Use Case
- SVM	Benchmark
- KNN	Conceptual
- Naive Bayes	Baseline
- Neural Net	Research
- Trees / LR	Production

# Key Takeaways

Not all models belong in production

Performance ≠ deployability

Governance trumps marginal AUC gains

These models are best used as challengers

Trees + LR dominate real banking systems


# Next Notebook
04_Supervised_Learning/

└── 06_model_governance_and_champion_challenger.ipynb