# Lab 5: Alternative Finance & Credit Risk Scoring

Marketplace lending economics and credit prediction

> **Expected Time**
>
> -   Core lab: ≈ 60 minutes
> -   Directed learning extensions: +30–60 minutes

> **Sample Answers Available**
>
> This lab includes interpretation questions asking you to write 150-500
> words explaining your results. **Attempt these independently first**,
> then compare your answers to the sample responses provided in
> collapsible boxes throughout the [HTML version of this
> lab](https://quinfer.github.io/financial-data-science/labs/lab05_alt_finance.html)
> on the course website.
>
> The sample answers demonstrate the depth of analysis, evidence
> integration, and academic writing style expected in coursework
> assessments.

<figure>
<a
href="https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab05_alt_finance.ipynb"><img
src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
<figcaption>Open in Colab</figcaption>
</figure>

## Setup (Colab‑only installs)

In [None]:
try:
    import pandas
    import numpy
    import matplotlib
    import sklearn
except ImportError:
    !pip -q install pandas numpy matplotlib scikit-learn

## Before You Code: The Big Picture

Traditional banks reject 45 million Americans with “thin credit
files”—no credit history means no loan, even if they’re good risks.
Alternative finance platforms (LendingClub, Prosper, Funding Circle) use
**alternative data**: education, employment history, cash flow patterns.
This expands access—but does it work?

> **The Alternative Data Promise**
>
> **The Problem:**  
> Credit scores (FICO) are incomplete. Many creditworthy people have no
> score because they’ve never borrowed. Traditional banks reject them
> automatically.
>
> **The Solution (Alternative Data):**
>
> -   Education history (college degree = lower default risk)
> -   Employment stability (years at job = reliability signal)
> -   Cash flow patterns (consistent income = repayment capacity)
> -   Digital footprint (social media, app usage)
>
> **The Evidence:**  
> Berg et al. (2020, RFS) show alternative data reduces prediction error
> by 15-25% for thin-file borrowers. This could expand credit access to
> 20M+ Americans.
>
> **The Tradeoffs:**
>
> -   ✅ Financial inclusion: More people get loans
> -   ⚠️ Privacy: More data collection
> -   ⚠️ Fairness: Could alternative data embed bias?

### What You’ll Build Today

By the end of this lab, you will have:

-   ✅ Credit default prediction model (logistic regression)
-   ✅ Comparison: traditional features vs. alternative data
-   ✅ Performance metrics (AUC, precision, recall)
-   ✅ Economic analysis of investor returns across risk grades
-   ✅ Framework for evaluating inclusion-fairness tradeoffs

**Time estimate:** ≈ 60 minutes (plus optional extensions)

> **Why This Matters**
>
> If you evaluate marketplace lending or BNPL (Buy-Now-Pay-Later), this
> lab gives you the tools: how do you measure credit risk? What’s the
> investor value proposition? What are the fairness implications?

## Objectives

By the end of this lab, you will be able to:

-   Implement logistic regression for credit default prediction
-   Compare traditional credit features vs. adding alternative data
-   Evaluate model performance using AUC, precision, and recall
-   Analyze marketplace lending economics (risk-return tradeoffs)
-   Calculate investor returns across different loan grades
-   Reflect on inclusion benefits and fairness tradeoffs

## Session Flow (≈ 60 minutes)

> **Suggested Timing**
>
> -   Setup and data exploration (10 minutes)
> -   Task 1: Baseline credit scoring model (15 minutes)
> -   Task 2: Alternative data enhancement (15 minutes)
> -   Task 3: Marketplace lending economics (15 minutes)
> -   Interpretation and reflection (5 minutes)

This plan moves from credit risk modeling to economic analysis to policy
implications.

## Understanding Credit Risk in Marketplace Lending

Before we code, let’s understand the economic and statistical problem
we’re solving.

### The Credit Scoring Problem

Marketplace lending platforms (LendingClub, Prosper, Funding Circle)
face a fundamental challenge: **which borrowers will repay their
loans?** If a platform could predict perfectly, it would:

-   Approve all good borrowers (maximize volume and investor returns)
-   Reject all bad borrowers (minimize defaults and investor losses)
-   Charge interest rates perfectly matched to risk (risk-based pricing)

But prediction is imperfect. The platform must balance two errors:

**Type I Error (False Positive)**: Approve a bad borrower who defaults  
→ **Cost**: Investor loses principal (~100% loss on that loan)  
→ **Platform impact**: Investor returns fall, investors leave, platform
fails

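**Type II Error (False Negative)**: Reject a good borrower who would
have repaid  
→ **Cost**: Foregone interest income (~5-15% annually)  
→ **Platform impact**: Lost revenue, borrower excluded (no credit
access)

Note on terminology: in the two definitions above, “positive” means
“approved”. The code in this lab treats *default* as the positive class,
so in its classification reports an approved defaulter appears as a
false negative and a wrongly rejected good borrower as a false positive.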

Traditional banks minimize Type I errors (protect against defaults) at
the cost of Type II errors (exclude many creditworthy borrowers).
Marketplace lenders try to balance both, using data-driven models to
expand access whilst managing risk.
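
The asymmetry between the two error costs implies a break-even default probability above which approving a loan loses money on average. The sketch below is a single-period simplification using the illustrative figures from the text (100% loss on default, roughly 10% interest on a repaid loan); the helper names `break_even_default_prob` and `expected_profit` are hypothetical, and fees and recoveries are ignored.

```python
def break_even_default_prob(interest_rate: float, loss_given_default: float = 1.0) -> float:
    """Default probability at which approving a loan has zero expected profit.

    Expected profit per unit lent = (1 - p) * interest_rate - p * loss_given_default.
    Setting this to zero and solving for p gives the break-even probability.
    """
    return interest_rate / (interest_rate + loss_given_default)


def expected_profit(p_default: float, interest_rate: float, loss_given_default: float = 1.0) -> float:
    """Expected single-period profit per unit lent for a given default probability."""
    return (1 - p_default) * interest_rate - p_default * loss_given_default


# Illustrative inputs taken from the error costs above (assumed, not estimated)
p_star = break_even_default_prob(interest_rate=0.10)
print(f"Break-even default probability: {p_star:.3f}")
for p in (0.05, 0.15):
    print(f"p = {p:.2f}: expected profit per unit lent = {expected_profit(p, 0.10):+.3f}")
```

Under these assumptions one default wipes out the interest earned on about ten repaid loans, so the break-even probability sits near 9%. This is why lenders weigh approved defaulters so much more heavily than rejected good borrowers.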

### Traditional vs. Alternative Credit Scoring

**Traditional credit scoring (FICO, Experian, Equifax, TransUnion)**
uses:

-   **Payment history**: Did you repay previous loans on time?
-   **Credit utilization**: How much of your credit limit do you use?
-   **Credit age**: How long have you had credit accounts?
-   **Credit mix**: Do you have diverse credit types (cards, mortgage,
    etc.)?
-   **New credit inquiries**: Are you shopping for credit (desperation
    signal)?

This works well for people with credit history. But **45 million adults
in the US** (and millions in the UK) have no credit file (“credit
invisible”) or insufficient history (“thin file”). Traditional scoring
automatically rejects them.

**Alternative data** adds new signals:

-   **Education**: College graduates default less (higher lifetime
    earnings, better financial literacy)
-   **Employment stability**: Years at current job predicts income
    stability
-   **Cash flow patterns**: Bank account data shows real-time ability to
    repay
-   **Digital footprint**: Device type, email provider, social media
    (controversial)

Berg et al. (2020) show that alternative data reduces credit default
prediction error by **15-25% for thin-file borrowers**. This could
expand credit access to **20M+ people** in the US alone.

### The Economics: Who Benefits?

**Borrowers**:

-   Thin-file borrowers gain access (banks would reject them)
-   Interest rates 5-10 percentage points lower than payday loans (400%
    APR) or credit cards (15-20% APR)
-   Build credit history (loans reported to bureaus)

**Investors**:

-   Earn 3-7% returns on low-risk loans (better than savings accounts at
    1%)
-   Diversification (can spread £10K across 100 loans)
-   But bear default risk (8-12% of borrowers default, lose 100% on
    those loans)
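
The diversification point for investors can be illustrated with a small simulation. This is a sketch under stated assumptions: a 10% default probability (inside the 8-12% range quoted above), a hypothetical 12% one-period return on repaid loans, independent defaults, and total loss on default. None of these numbers come from platform data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions, not platform data
p_default = 0.10      # within the 8-12% range quoted above
net_interest = 0.12   # hypothetical one-period return on a repaid loan
n_sims = 20_000


def simulate_portfolio_returns(n_loans: int) -> np.ndarray:
    """One-period return of an equally weighted portfolio of n_loans loans."""
    # Number of defaults in each simulated portfolio (defaults assumed independent)
    defaults = rng.binomial(n_loans, p_default, size=n_sims)
    repaid = n_loans - defaults
    # Repaid loans earn net_interest; defaulted loans lose 100% of principal
    return (repaid * net_interest - defaults * 1.0) / n_loans


results = {n: simulate_portfolio_returns(n) for n in (1, 10, 100)}
for n, r in results.items():
    print(f"{n:>3} loans: mean return = {r.mean():+.3f}, std = {r.std():.3f}")
```

The mean return is about the same for every portfolio size, while the standard deviation shrinks roughly with the square root of the number of loans. Correlated defaults (for example in a recession) would weaken this effect.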

**Platforms**:

-   Earn origination fees (~2% of loan amount) + servicing fees (~1%
    annual)
-   Scale economics: automate underwriting, reduce costs vs. traditional
    banks
-   But face adverse selection (bad borrowers attracted to marketplace
    lending)

**Tradeoffs**:

-   ✅ Financial inclusion: More people get loans  
-   ⚠️ Privacy: More data collection (education, employment, bank
    accounts)  
-   ⚠️ Fairness: Using education means college graduates get better
    rates (perpetuates socioeconomic inequality?)  
-   ⚠️ Default harms: 8-12% of borrowers default and face worse
    financial situations

### What You’ll Build in This Lab

We’ll implement a credit default prediction model using both traditional
and alternative data, then analyze the marketplace lending economics
from an investor’s perspective. This demonstrates the quantitative
foundations of alternative finance whilst forcing engagement with
inclusion-fairness tradeoffs.

By the end, you’ll understand why Berg et al. (2020)’s findings matter:
alternative data doesn’t just improve prediction accuracy—it
fundamentally changes who has access to credit and on what terms.

------------------------------------------------------------------------

## Task 1 — Baseline Credit Scoring Model

Let’s start with a traditional credit scoring approach using only
standard features, then measure performance.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve, classification_report

# Create synthetic marketplace lending dataset
# (In practice, you'd use real data from LendingClub, Prosper, or Funding Circle)
np.random.seed(42)
n_samples = 5000

# Generate features
data = pd.DataFrame({
    # Traditional features
    'credit_score': np.random.normal(680, 80, n_samples).clip(300, 850),
    'annual_income': np.random.lognormal(10.8, 0.6, n_samples).clip(20000, 200000),
    'debt_to_income': np.random.gamma(2, 0.15, n_samples).clip(0, 0.8),
    'loan_amount': np.random.choice([5000, 10000, 15000, 20000, 25000], n_samples),
    
    # Alternative data features (we'll use these in Task 2)
    'has_college_degree': np.random.binomial(1, 0.35, n_samples),
    'employment_years': np.random.exponential(3, n_samples).clip(0, 20),
    'monthly_cashflow': np.random.normal(500, 800, n_samples),
})

# Generate default outcome (probability depends on features)
default_prob = (
    0.30  # baseline
    - 0.0015 * (data['credit_score'] - 680)  # credit score effect
    - 0.000005 * (data['annual_income'] - 55000)  # income effect
    + 0.40 * data['debt_to_income']  # debt ratio effect
    + 0.00001 * (data['loan_amount'] - 15000)  # loan size effect
    # Alternative data effects (in reality, but we won't use these in baseline)
    - 0.08 * data['has_college_degree']
    - 0.008 * data['employment_years']
    - 0.0001 * data['monthly_cashflow']
)

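# Treat the linear score as log-odds and map it to a probability with the
# logistic function, then draw each binary outcome from that probability.
# Note: the 0.30 baseline is therefore on the log-odds scale, so the simulated
# default rate is well above the ~10% typical of real platforms (see printed rate).
default_prob = 1 / (1 + np.exp(-default_prob))  # logistic transform
data['defaulted'] = (np.random.random(n_samples) < default_prob).astype(int)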

# Show basic statistics
print("Dataset Overview:")
print("=" * 60)
print(f"Total loans: {len(data):,}")
print(f"Default rate: {data['defaulted'].mean():.1%}")
print(f"\nFeature ranges:")
print(data[['credit_score', 'annual_income', 'debt_to_income', 'loan_amount']].describe())

# Traditional model: use only credit score, income, DTI, loan amount
X_traditional = data[['credit_score', 'annual_income', 'debt_to_income', 'loan_amount']]
y = data['defaulted']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_traditional, y, test_size=0.3, random_state=42, stratify=y
)

# Train logistic regression
lr_traditional = LogisticRegression(random_state=42, max_iter=1000)
lr_traditional.fit(X_train, y_train)

# Predictions
y_pred_proba = lr_traditional.predict_proba(X_test)[:, 1]
y_pred = lr_traditional.predict(X_test)

# Evaluate
auc_traditional = roc_auc_score(y_test, y_pred_proba)

print(f"\n✅ Traditional Model Performance:")
print(f"   AUC-ROC: {auc_traditional:.3f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Repaid', 'Defaulted']))

# Visualize ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.figure(figsize=(12, 5))

# ROC curve
plt.subplot(1, 2, 1)
plt.plot(fpr, tpr, linewidth=2, label=f'Traditional Model (AUC={auc_traditional:.3f})')
plt.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Random Classifier')
plt.xlabel('False Positive Rate', fontsize=11)
plt.ylabel('True Positive Rate (Recall)', fontsize=11)
plt.title('ROC Curve: Credit Default Prediction', fontsize=12)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)

# Precision-Recall curve
precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)
plt.subplot(1, 2, 2)
plt.plot(recall, precision, linewidth=2, color='green')
plt.xlabel('Recall', fontsize=11)
plt.ylabel('Precision', fontsize=11)
plt.title('Precision-Recall Curve', fontsize=12)
plt.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✔ Traditional credit scoring model complete")

### Interpretation Guide

1.  **AUC interpretation**: An AUC of ~0.70 is typical for traditional
    credit models. What does this mean? (70% of the time, the model
    ranks a random defaulter as higher risk than a random
    non-defaulter.)

2.  **Precision vs. Recall tradeoff**: Look at the classification
    report. Which matters more—catching all defaults (high recall) or
    avoiding false alarms (high precision)? For a lending platform,
    what’s the cost of each error type?

3.  **Feature importance**: Which traditional feature is most
    predictive? (Check model coefficients.) Does this match your
    intuition?

Write 150–200 words interpreting the baseline model’s performance and
discussing its limitations for thin-file borrowers.
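
The ranking interpretation of AUC in point 1 can be checked numerically. The sketch below uses hypothetical risk scores (not the lab's model output) and computes AUC two ways: as the share of correctly ordered defaulter/non-defaulter pairs, and via the equivalent rank-based (Mann-Whitney U) formula.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical risk scores: defaulters score higher on average
scores_default = rng.normal(loc=0.6, scale=0.5, size=300)
scores_repaid = rng.normal(loc=0.0, scale=0.5, size=700)

# Pairwise definition: share of (defaulter, non-defaulter) pairs ranked correctly
diff = scores_default[:, None] - scores_repaid[None, :]
auc_pairwise = (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Rank-based (Mann-Whitney U) form of the same quantity
all_scores = np.concatenate([scores_default, scores_repaid])
ranks = all_scores.argsort().argsort() + 1  # ranks 1..n (continuous scores, so no ties)
n_pos, n_neg = len(scores_default), len(scores_repaid)
auc_rank = (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(f"AUC (pairwise):   {auc_pairwise:.4f}")
print(f"AUC (rank-based): {auc_rank:.4f}")
```

Both numbers agree, and the same value is what `roc_auc_score` reports when scores have no ties. AUC depends only on the ordering of scores, not on whether the probabilities themselves are accurate.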

## Task 2 — Enhancing with Alternative Data

Now let’s add alternative data features and measure the improvement.
This demonstrates Berg et al. (2020)’s finding that alternative data
reduces prediction error.

In [None]:
# Alternative data model: add education, employment, cashflow
X_alternative = data[['credit_score', 'annual_income', 'debt_to_income', 'loan_amount',
                       'has_college_degree', 'employment_years', 'monthly_cashflow']]

# Train-test split
X_train_alt, X_test_alt, y_train_alt, y_test_alt = train_test_split(
    X_alternative, y, test_size=0.3, random_state=42, stratify=y
)

# Train enhanced model
lr_alternative = LogisticRegression(random_state=42, max_iter=1000)
lr_alternative.fit(X_train_alt, y_train_alt)

# Predictions
y_pred_proba_alt = lr_alternative.predict_proba(X_test_alt)[:, 1]
y_pred_alt = lr_alternative.predict(X_test_alt)

# Evaluate
auc_alternative = roc_auc_score(y_test_alt, y_pred_proba_alt)

print("✅ Alternative Data Model Performance:")
print(f"   AUC-ROC: {auc_alternative:.3f}")
print(f"   Improvement: {(auc_alternative - auc_traditional):+.3f} ({(auc_alternative/auc_traditional - 1)*100:+.1f}%)")
print(f"\nClassification Report:")
print(classification_report(y_test_alt, y_pred_alt, target_names=['Repaid', 'Defaulted']))

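# Feature importance analysis
# Caveat: the features are on very different scales (income in dollars, DTI as
# a 0-1 ratio), so raw coefficient magnitudes are not directly comparable
# across features. Standardize the features first for a like-for-like ranking.
feature_names = X_alternative.columns
coefficients = lr_alternative.coef_[0]
feature_importance = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': coefficients,
    'Abs_Coefficient': np.abs(coefficients)
}).sort_values('Abs_Coefficient', ascending=False)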

print("\nFeature Importance (by absolute coefficient):")
print("=" * 60)
for _, row in feature_importance.iterrows():
    print(f"  {row['Feature']:<25} {row['Coefficient']:>8.4f}")

# Compare ROC curves
fpr_alt, tpr_alt, _ = roc_curve(y_test_alt, y_pred_proba_alt)

plt.figure(figsize=(12, 5))

# Panel 1: ROC comparison
plt.subplot(1, 2, 1)
plt.plot(fpr, tpr, linewidth=2, label=f'Traditional (AUC={auc_traditional:.3f})', color='blue')
plt.plot(fpr_alt, tpr_alt, linewidth=2, label=f'+ Alternative Data (AUC={auc_alternative:.3f})', color='red')
plt.plot([0, 1], [0, 1], 'k--', alpha=0.3)
plt.xlabel('False Positive Rate', fontsize=11)
plt.ylabel('True Positive Rate', fontsize=11)
plt.title('ROC Curve Comparison', fontsize=12)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)

# Panel 2: Feature importance
plt.subplot(1, 2, 2)
colors = ['red' if 'college' in feat or 'employment' in feat or 'cashflow' in feat else 'blue' 
          for feat in feature_importance['Feature']]
plt.barh(range(len(feature_importance)), feature_importance['Abs_Coefficient'], color=colors, alpha=0.7)
plt.yticks(range(len(feature_importance)), feature_importance['Feature'])
plt.xlabel('Absolute Coefficient (Importance)', fontsize=11)
plt.title('Feature Importance Comparison', fontsize=12)
plt.grid(alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("\n✔ Alternative data model complete")

### Interpretation Guide

1.  **Improvement magnitude**: How much did AUC improve? Is this
    consistent with Berg et al. (2020)’s 15-25% error reduction finding?

2.  **Which alternative features matter most**: Look at the feature
    importance plot. Are education, employment, or cashflow the
    strongest predictors?

3.  **Thin-file benefit**: This dataset has credit scores for everyone.
    In reality, thin-file borrowers have no credit score. How would
    alternative data help them specifically?

4.  **Fairness concerns**: Using education as a credit feature means
    college graduates get better rates. Is this fair? It correlates with
    socioeconomic status and race. Discuss tradeoffs.

Write 200–250 words analyzing the alternative data model’s performance
and discussing fairness implications.
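
When judging which features matter most, keep in mind that raw logistic-regression coefficients depend on each feature's units. The sketch below uses a hypothetical two-feature dataset (not the lab's data) in which both features shift the log-odds by a similar amount per standard deviation. It shows how per-unit coefficients can make one feature look far more important than the other.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 4000

# Hypothetical features on very different scales
X = pd.DataFrame({
    "annual_income": rng.normal(55_000, 20_000, n),   # dollars
    "debt_to_income": rng.uniform(0.0, 0.8, n),       # ratio
})
# Assumed data-generating process: each feature moves the log-odds by about
# 0.5 per standard deviation (0.23 is roughly the SD of the uniform DTI)
logit = (-1.0
         - 0.5 * (X["annual_income"] - 55_000) / 20_000
         + 0.5 * (X["debt_to_income"] - 0.4) / 0.23)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit on standardized features so each coefficient is "log-odds change per 1 SD"
X_std = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_std, y)

std_coefs = pd.Series(model.coef_[0], index=X.columns)
raw_scale_coefs = std_coefs / X.std(ddof=0)  # equivalent per-unit coefficients

print("Per-SD coefficients (comparable across features):")
print(std_coefs.round(3))
print("\nPer-unit coefficients (not comparable across features):")
print(raw_scale_coefs)
```

The per-SD coefficients are similar in magnitude, while the per-unit coefficient on income is tiny simply because income is measured in dollars. Ranking features by raw absolute coefficients would therefore understate income's role.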

------------------------------------------------------------------------

## Task 2 Extensions: Statistical Validation & Diagnostics

The alternative data model improves AUC—but how reliable is that
improvement? Let’s apply **Week 1 statistical foundations** to validate
properly.

### Extension A: Cross-Validation (5-Fold Stratified)

**Problem:** Single train/test split (used above) is unreliable—results
vary by random chance.

**Solution:** 5-fold stratified cross-validation for stable estimates +
uncertainty quantification.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

# 5-fold stratified cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Traditional model CV
cv_scores_trad = cross_val_score(lr_traditional, X_traditional, y, cv=cv, scoring='roc_auc')
print("Traditional Model (5-fold CV):")
print(f"  AUC: {cv_scores_trad.mean():.3f} ± {cv_scores_trad.std():.3f}")
print(f"  Individual folds: {cv_scores_trad}")

# Alternative data model CV
cv_scores_alt = cross_val_score(lr_alternative, X_alternative, y, cv=cv, scoring='roc_auc')
print("\nAlternative Data Model (5-fold CV):")
print(f"  AUC: {cv_scores_alt.mean():.3f} ± {cv_scores_alt.std():.3f}")
print(f"  Individual folds: {cv_scores_alt}")

print(f"\nImprovement: {(cv_scores_alt.mean() - cv_scores_trad.mean()):+.3f} AUC points")
print(f"Improvement is {(cv_scores_alt.mean() - cv_scores_trad.mean()) / cv_scores_trad.std():.1f}× larger than traditional model's standard deviation")

# Visualize fold-by-fold comparison
plt.figure(figsize=(10, 5))
folds = np.arange(1, 6)
width = 0.35

plt.bar(folds - width/2, cv_scores_trad, width, label='Traditional', alpha=0.7, color='blue')
plt.bar(folds + width/2, cv_scores_alt, width, label='+ Alternative Data', alpha=0.7, color='red')
plt.axhline(cv_scores_trad.mean(), color='blue', linestyle='--', alpha=0.5, label=f'Traditional mean: {cv_scores_trad.mean():.3f}')
plt.axhline(cv_scores_alt.mean(), color='red', linestyle='--', alpha=0.5, label=f'Alt data mean: {cv_scores_alt.mean():.3f}')

plt.xlabel('Fold', fontsize=11)
plt.ylabel('AUC Score', fontsize=11)
plt.title('Cross-Validation: Fold-by-Fold Performance', fontsize=12)
plt.legend(fontsize=9)
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\n✔ Cross-validation comparison complete")

> **Connection to [Week 1, §0.6:
> Cross-Validation](../chapters/01_foundations.qmd#sec-model-selection)**
>
> Single train/test split: **one realization** of random data splitting
> → unreliable  
> 5-fold CV: **five estimates** on non-overlapping test folds → mean ±
> std → more reliable (the folds share training data, so the estimates
> are not fully independent)
>
> **Stratified** CV keeps the default rate roughly equal in each fold,
> which is critical for rare events.

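**Interpretation:** Compare the ± uncertainty between the traditional
and alternative data models (for example, 0.70 ± 0.02 vs. 0.75 ± 0.02;
your numbers will differ). Is the improvement meaningful? As a rough
rule of thumb, treat it as meaningful if the improvement exceeds 2× the
fold-to-fold std.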
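
Because both models are scored on the same five folds, the comparison can also be made fold by fold. The sketch below uses placeholder fold scores (hypothetical numbers; substitute `cv_scores_trad` and `cv_scores_alt` from the cell above) and computes the paired differences with a rough t-statistic.

```python
import numpy as np

# Hypothetical fold-level AUCs (placeholders; substitute cv_scores_trad / cv_scores_alt)
auc_trad = np.array([0.69, 0.71, 0.70, 0.68, 0.72])
auc_alt = np.array([0.74, 0.76, 0.75, 0.73, 0.76])

diff = auc_alt - auc_trad                         # paired: both models saw the same folds
mean_diff = diff.mean()
se_diff = diff.std(ddof=1) / np.sqrt(len(diff))   # standard error of the mean difference
t_stat = mean_diff / se_diff

print(f"Mean improvement: {mean_diff:.3f}")
print(f"Std. error:       {se_diff:.4f}")
print(f"Paired t-stat:    {t_stat:.1f}  (rough guide only: folds share training data)")
```

Pairing removes fold-to-fold variation that affects both models equally, so it is usually a sharper comparison than looking at the two ± ranges separately. The t-statistic is still only indicative, since cross-validation folds are not independent samples.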

------------------------------------------------------------------------

### Extension B: Regularization (L1 Lasso for Feature Selection)

**Problem:** We have 7 features now. More features = higher variance
(overfitting risk).

**Solution:** L1 regularization (Lasso) shrinks weak coefficients to
zero—automatic feature selection.

In [None]:
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

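# Standardize features (required for regularization)
# Fit the scaler and Lasso on the training split only, so the test-set AUC
# below is a genuine out-of-sample figure, comparable with lr_alternative
scaler = StandardScaler()
X_train_alt_scaled = scaler.fit_transform(X_train_alt)
X_alt_scaled = scaler.transform(X_alternative)  # full dataset on the same scale

# L1 Lasso with cross-validated regularization strength
lasso = LogisticRegressionCV(penalty='l1', solver='saga', cv=5, random_state=42, max_iter=5000)
lasso.fit(X_train_alt_scaled, y_train_alt)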

# Which features did Lasso keep?
feature_names = X_alternative.columns
lasso_coefs = lasso.coef_[0]
selected_features = feature_names[lasso_coefs != 0]

print("L1 Lasso Feature Selection:")
print("=" * 60)
print(f"Selected {len(selected_features)} of {len(feature_names)} features:")
for feat, coef in zip(feature_names, lasso_coefs):
    if coef != 0:
        print(f"  ✓ {feat:<30} coefficient: {coef:>8.4f}")
    else:
        print(f"  ✗ {feat:<30} coefficient: {coef:>8.4f} (dropped)")

# Compare performance: Unregularized vs Lasso
y_pred_lasso = lasso.predict_proba(scaler.transform(X_test_alt))[:, 1]
auc_lasso = roc_auc_score(y_test_alt, y_pred_lasso)

print(f"\nPerformance Comparison:")
print(f"  Unregularized (all 7 features): AUC = {auc_alternative:.3f}")
print(f"  L1 Lasso (selected features):  AUC = {auc_lasso:.3f}")
print(f"  Difference: {auc_lasso - auc_alternative:+.3f}")

# Visualize feature selection
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

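# Panel 1: Coefficient comparison
# Note: the unregularized model was fit on raw features and Lasso on
# standardized ones, so compare which bars vanish rather than bar lengths.
x_pos = np.arange(len(feature_names))
ax1.barh(x_pos, np.abs(lr_alternative.coef_[0]), alpha=0.5, label='Unregularized', color='blue')
ax1.barh(x_pos, np.abs(lasso_coefs), alpha=0.7, label='L1 Lasso', color='red')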
ax1.set_yticks(x_pos)
ax1.set_yticklabels(feature_names)
ax1.set_xlabel('Absolute Coefficient', fontsize=11)
ax1.set_title('Feature Importance: Lasso Shrinks Weak Features', fontsize=12)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3, axis='x')

# Panel 2: Selected vs dropped
selected_mask = lasso_coefs != 0
colors = ['green' if sel else 'gray' for sel in selected_mask]
ax2.barh(x_pos, np.abs(lasso_coefs), color=colors, alpha=0.7)
ax2.set_yticks(x_pos)
ax2.set_yticklabels(feature_names)
ax2.set_xlabel('Lasso Coefficient (Absolute)', fontsize=11)
ax2.set_title('Lasso Feature Selection (Green = Kept)', fontsize=12)
ax2.grid(alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("\n✔ L1 Lasso regularization complete")

> **Connection to [Week 1, §0.2: Bias-Variance
> Tradeoff](../chapters/01_foundations.qmd#sec-bias-variance)**
>
> **L1 (Lasso)**: Sets weak feature coefficients to exactly zero →
> **feature selection**  
> **Benefit**: Reduces variance (less overfitting), improves
> interpretability  
> **Cost**: Slightly increases bias (if dropped feature was truly
> predictive)
>
> Lasso manages complexity—keeps predictive features, drops noise.

**Interpretation:** Which features did Lasso drop? Does this match your
intuition about which features matter most? Is the AUC similar or
better?
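
Why does L1 produce exact zeros while L2 only shrinks? For a single coefficient with a simple quadratic loss, the two penalties have closed-form solutions. The sketch below shows that intuition. It is not how `LogisticRegressionCV` is solved internally, and the coefficient values and penalty strength are hypothetical.

```python
import numpy as np


def soft_threshold(w, lam):
    """L1 solution for 0.5*(x - w)^2 + lam*|x|: shrink toward zero, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def ridge_shrink(w, lam):
    """L2 solution for 0.5*(x - w)^2 + 0.5*lam*x^2: rescales but never reaches zero."""
    return w / (1.0 + lam)


w = np.array([0.80, -0.45, 0.10, -0.03])  # hypothetical unpenalized coefficients
lam = 0.15                                # hypothetical penalty strength

l1_coefs = soft_threshold(w, lam)
l2_coefs = ridge_shrink(w, lam)

print("unpenalized:", w)
print("L1 (lasso): ", l1_coefs)
print("L2 (ridge): ", l2_coefs.round(3))
```

Coefficients smaller in magnitude than the penalty are set exactly to zero under L1, which is the feature-selection behaviour described above. Under L2 every coefficient stays non-zero.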

------------------------------------------------------------------------

### Extension C: Calibration Plot (Predicted vs Observed)

**Question:** If model says “20% default probability,” do 20% of those
borrowers actually default?

**Calibration check:** Compare predicted probabilities to observed
default frequencies.

In [None]:
from sklearn.calibration import calibration_curve

# Calculate calibration curve
prob_true, prob_pred = calibration_curve(y_test_alt, y_pred_proba_alt, n_bins=10, strategy='quantile')

# Plot
plt.figure(figsize=(8, 6))
plt.plot(prob_pred, prob_true, marker='o', linewidth=2, markersize=8, label='Model Calibration')
plt.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Perfectly Calibrated')
plt.xlabel('Predicted Default Probability', fontsize=12)
plt.ylabel('Observed Default Frequency', fontsize=12)
plt.title('Calibration Plot: Are Predicted Probabilities Accurate?', fontsize=13)
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate calibration error (mean absolute difference)
calibration_error = np.mean(np.abs(prob_true - prob_pred))
print(f"Mean Calibration Error: {calibration_error:.3f}")
print(f"Interpretation: Predicted probabilities off by {calibration_error*100:.1f} percentage points on average")

print("\n✔ Calibration plot complete")

**Why calibration matters:** Platforms use predicted probabilities to
**set interest rates**. If the model predicts 15% default risk but the
true risk is 25%, investors lose money!

**Interpretation:** Are points close to the diagonal? If the model
**under-predicts** risk (predicts 10% but observes 20%), it underprices
risk. If it **over-predicts** risk (predicts 30% but observes 20%), it
overprices and loses borrowers.
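
The pricing consequence of miscalibration can be worked through with the 15% vs. 25% example above. This sketch assumes a single-period loan with total loss on default and no fees; the helper names `break_even_rate` and `expected_return` are hypothetical.

```python
def break_even_rate(p_default: float, loss_given_default: float = 1.0) -> float:
    """Interest rate at which expected return per unit lent is zero (single period)."""
    return p_default * loss_given_default / (1.0 - p_default)


def expected_return(rate: float, p_default: float, loss_given_default: float = 1.0) -> float:
    """Expected single-period return per unit lent at a given rate and default risk."""
    return (1.0 - p_default) * rate - p_default * loss_given_default


predicted_p, true_p = 0.15, 0.25             # the mispricing example from the text
rate_charged = break_even_rate(predicted_p)  # priced off the model's estimate
rate_needed = break_even_rate(true_p)        # what the true risk would require

print(f"Rate charged (model says 15%): {rate_charged:.1%}")
print(f"Rate needed (true risk 25%):   {rate_needed:.1%}")
print(f"Expected return at true risk:  {expected_return(rate_charged, true_p):+.1%}")
```

Under these assumptions a ten-point underestimate of default risk turns a break-even loan into an expected loss of roughly 12% of principal. This is why calibration matters for pricing even when AUC looks good.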

------------------------------------------------------------------------

### Extension D: ROC Curve with Bootstrap Confidence Intervals

**Problem:** Single ROC curve hides uncertainty from finite sample.

**Solution:** Bootstrap resampling to quantify uncertainty.

In [None]:
# Bootstrap ROC curves
n_bootstraps = 100
auc_scores_boot = []
tpr_interp = []

np.random.seed(42)
for i in range(n_bootstraps):
    # Resample test set with replacement
    indices = np.random.choice(len(y_test_alt), len(y_test_alt), replace=True)
    y_boot = y_test_alt.iloc[indices]
    proba_boot = y_pred_proba_alt[indices]
    
    # Calculate ROC
    fpr_boot, tpr_boot, _ = roc_curve(y_boot, proba_boot)
    auc_scores_boot.append(roc_auc_score(y_boot, proba_boot))
    
    # Interpolate TPR at standard FPR points
    tpr_interp.append(np.interp(np.linspace(0, 1, 100), fpr_boot, tpr_boot))

# Calculate mean and 95% CI
tpr_mean = np.mean(tpr_interp, axis=0)
tpr_lower = np.percentile(tpr_interp, 2.5, axis=0)
tpr_upper = np.percentile(tpr_interp, 97.5, axis=0)
fpr_standard = np.linspace(0, 1, 100)

auc_mean = np.mean(auc_scores_boot)
auc_ci_lower = np.percentile(auc_scores_boot, 2.5)
auc_ci_upper = np.percentile(auc_scores_boot, 97.5)

# Plot ROC with confidence interval
plt.figure(figsize=(8, 6))
plt.plot(fpr_standard, tpr_mean, 'b-', linewidth=2, 
         label=f'Mean ROC (AUC={auc_mean:.3f})')
plt.fill_between(fpr_standard, tpr_lower, tpr_upper, alpha=0.2, color='blue',
                 label=f'95% CI [{auc_ci_lower:.3f}, {auc_ci_upper:.3f}]')
plt.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Random Classifier')
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('ROC Curve with Bootstrap 95% Confidence Interval', fontsize=13)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Bootstrap Results ({n_bootstraps} resamples):")
print(f"  AUC: {auc_mean:.3f} ± {np.std(auc_scores_boot):.3f}")
print(f"  95% CI: [{auc_ci_lower:.3f}, {auc_ci_upper:.3f}]")
print(f"  CI width: {auc_ci_upper - auc_ci_lower:.3f}")

print("\n✔ Bootstrap ROC analysis complete")

> **Connection to [Week 1, §0.2: Bootstrap
> Uncertainty](../chapters/01_foundations.qmd#sec-bootstrap)**
>
> Bootstrap creates 100 “plausible” test sets → 100 AUC estimates →
> **confidence interval**
>
> **Narrow CI** (e.g., 0.74-0.76): Reliable estimate, deploy
> confidently  
> **Wide CI** (e.g., 0.65-0.80): High uncertainty, need more data or
> better features

**Interpretation:** How wide is the confidence interval? AUC = 0.75 ±
0.01 is much more reliable than 0.75 ± 0.05.
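
The “need more data” remedy for a wide interval can be illustrated directly. The sketch below generates synthetic test sets of increasing size (hypothetical scores with an assumed 30% default rate, not the lab's data) and reports the width of the 95% percentile bootstrap interval for AUC. The helper `auc_from_scores` is a NumPy-only stand-in for `roc_auc_score`.

```python
import numpy as np

rng = np.random.default_rng(7)


def auc_from_scores(y_true, scores):
    """Rank-based AUC (Mann-Whitney U).

    Ties here only arise between bootstrap copies of the same observation,
    which share a label, so they do not change the result.
    """
    ranks = scores.argsort().argsort() + 1
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci_width(n_test, n_boot=500):
    """Width of the 95% percentile bootstrap CI for AUC on a synthetic test set."""
    y = (rng.random(n_test) < 0.3).astype(int)     # assumed 30% default rate
    scores = rng.normal(loc=0.8 * y, scale=1.0)    # defaulters score higher on average
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_test, n_test)      # resample with replacement
        if y[idx].min() == y[idx].max():           # skip one-class resamples
            continue
        aucs.append(auc_from_scores(y[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return hi - lo


widths = {n: bootstrap_ci_width(n) for n in (200, 800, 3200)}
for n, w in widths.items():
    print(f"test size {n:>5}: 95% CI width = {w:.3f}")
```

The interval narrows roughly with the square root of the test-set size, so quadrupling the data approximately halves the width.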

------------------------------------------------------------------------

### Extension E: Precision-Recall Curve (Rare Event Focus)

**Problem:** Defaults are rare (~10% of borrowers). ROC can be
misleading for imbalanced data.

**Solution:** Precision-Recall curve focuses on positive class
(defaults).

In [None]:
from sklearn.metrics import average_precision_score

# Calculate precision-recall curve
precision, recall, pr_thresholds = precision_recall_curve(y_test_alt, y_pred_proba_alt)
ap_score = average_precision_score(y_test_alt, y_pred_proba_alt)

# No-skill baseline: a classifier with no ranking ability has precision equal to the default rate
no_skill = y_test_alt.mean()

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Panel 1: Precision-Recall curve
ax1.plot(recall, precision, linewidth=2, label=f'Model (AP={ap_score:.3f})', color='blue')
ax1.axhline(no_skill, color='red', linestyle='--', linewidth=2, 
            label=f'No Skill (default rate = {no_skill:.2f})')
ax1.set_xlabel('Recall (% of defaults caught)', fontsize=12)
ax1.set_ylabel('Precision (% flagged who default)', fontsize=12)
ax1.set_title('Precision-Recall Curve', fontsize=13)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3)

# Panel 2: Precision vs Recall tradeoff
threshold_idx = [len(pr_thresholds)//4, len(pr_thresholds)//2, 3*len(pr_thresholds)//4]
for idx in threshold_idx:
    ax2.scatter(recall[idx], precision[idx], s=100, alpha=0.7, 
                label=f'Threshold={pr_thresholds[idx]:.2f}')

ax2.plot(recall, precision, 'k-', alpha=0.3, linewidth=1)
ax2.set_xlabel('Recall', fontsize=12)
ax2.set_ylabel('Precision', fontsize=12)
ax2.set_title('Precision-Recall Tradeoff at Different Thresholds', fontsize=13)
ax2.legend(fontsize=9)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Average Precision Score: {ap_score:.3f}")
print(f"No-skill baseline: {no_skill:.3f}")
print(f"Improvement: {ap_score - no_skill:.3f} ({(ap_score/no_skill - 1)*100:.1f}% better than random)")

print("\n✔ Precision-recall analysis complete")

> **Connection to [Week 1, §0.8.3: Base Rate
> Fallacy](../chapters/01_foundations.qmd#sec-base-rate) & [Ch 05: Type
> I/II
> Errors](../chapters/05_alt_finance_marketplace_lending.qmd#sec-credit-type-errors)**
>
> For **rare events** (10% defaults), ROC treats both classes equally.
> Precision-Recall focuses on positive class:
>
> -   **High precision**: Few false alarms (few good borrowers wrongly
>     rejected) → inclusion
> -   **High recall**: Catch most defaults → protect investors
>
> Platform chooses threshold based on **cost-sensitive** decision:
> losing principal (100%) vs. foregone interest (~10%).

**Interpretation:** Where on the curve would you operate? High precision
(0.8) + low recall (0.4) = lenient (flag few borrowers: few good
borrowers are rejected, but most defaults slip through). Low precision
(0.3) + high recall (0.8) = cautious (flag many borrowers: most defaults
are caught, but many good borrowers are rejected).
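
One way to pick an operating point is to sweep the threshold and total up the cost of each error type. The sketch below assumes the illustrative costs from the callout above (100% of principal for an approved defaulter, about 10% foregone interest for a rejected good borrower). It uses synthetic, well-calibrated probabilities rather than the lab's model output.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic, well-calibrated default probabilities (illustrative only)
p_default = rng.beta(2, 8, size=n)                  # mean around 0.2
defaulted = (rng.random(n) < p_default).astype(int)

COST_MISSED_DEFAULT = 1.00   # approve a borrower who defaults: lose principal
COST_WRONG_REJECT = 0.10     # reject a good borrower: forgo ~10% interest

thresholds = np.linspace(0.01, 0.99, 99)
total_cost = []
for t in thresholds:
    flagged = p_default >= t                         # flagged as likely default, so rejected
    missed = np.sum(~flagged & (defaulted == 1))     # approved but defaulted
    wrong = np.sum(flagged & (defaulted == 0))       # rejected but would have repaid
    total_cost.append(missed * COST_MISSED_DEFAULT + wrong * COST_WRONG_REJECT)

total_cost = np.array(total_cost)
best = thresholds[total_cost.argmin()]
cost_at_half = total_cost[np.argmin(np.abs(thresholds - 0.5))]

print(f"Cost-minimizing threshold: {best:.2f}")
print(f"Total cost at that threshold: {total_cost.min():.0f}")
print(f"Total cost at threshold 0.50: {cost_at_half:.0f}")
```

With a default costing ten times as much as a wrongful rejection, the cost-minimizing threshold lands far below the default 0.5 cut-off. That means operating toward the high-recall, lower-precision end of the curve.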

------------------------------------------------------------------------

### Statistical Validation Summary

You’ve now applied **5 statistical foundations** from Week 1 to credit
scoring:

1.  ✅ **Cross-validation** (§0.6): Stable estimates + uncertainty (0.75
    ± 0.02)
2.  ✅ **Regularization** (§0.2): Feature selection via Lasso (manage
    bias-variance)
3.  ✅ **Calibration**: Predicted probabilities match observed
    frequencies (critical for pricing)
4.  ✅ **Bootstrap** (§0.2): Confidence intervals for AUC (quantify
    uncertainty)
5.  ✅ **Precision-Recall** (§0.8.3): Focus on rare events (base rate
    fallacy)

**Key lesson:** Alternative data improves prediction (Berg et al.
(2020)), but **rigorous validation** separates good models from
dangerous ones.

**Next:** Task 3 analyzes marketplace lending economics from investor
perspective.

------------------------------------------------------------------------

## Task 3 — Marketplace Lending Economics

Now let’s analyze the economics from an investor’s perspective. How do
returns vary across risk grades? Is the risk-return tradeoff fair?

In [None]:
# Assign risk grades based on predicted default probability
data['default_prob'] = lr_alternative.predict_proba(X_alternative)[:, 1]

# Define risk grades (A = safest, F = riskiest)
data['risk_grade'] = pd.cut(
    data['default_prob'],
    bins=[0, 0.05, 0.10, 0.15, 0.20, 0.30, 1.0],
    labels=['A', 'B', 'C', 'D', 'E', 'F']
)

# Typical marketplace lending interest rates by grade
interest_rates = {'A': 0.06, 'B': 0.08, 'C': 0.10, 'D': 0.13, 'E': 0.16, 'F': 0.20}
data['interest_rate'] = data['risk_grade'].map(interest_rates).astype(float)

# Platform fees (simplified)
origination_fee_rate = 0.02  # 2% to platform from borrower
servicing_fee_rate = 0.01    # 1% annual to platform from investor

# Calculate investor returns
# If loan repays: investor gets interest - servicing fee
# If loan defaults: investor loses principal

def calculate_investor_return(row, loan_term=3):
    """
    Calculate annualized return for marketplace lending investor.
    
    Models investor economics: earn interest minus platform fees if loan repays,
    lose principal if loan defaults. This is the core risk-return tradeoff in
    marketplace lending platforms (LendingClub, Prosper, Funding Circle).
    
    Parameters
    ----------
    row : pd.Series
        Loan record with fields:
        - 'defaulted' : int, 0 if repaid, 1 if defaulted
        - 'interest_rate' : float, annual interest rate (e.g., 0.08 = 8%)
    loan_term : int, default=3
        Loan duration in years (typical: 3 or 5 years for personal loans)
        
    Returns
    -------
    float
        Annualized return for investor (can be negative if default)
        
    Notes
    -----
    Return calculation:
    - **If repaid**: return = interest_rate - servicing_fee_rate
      Example: 8% interest - 1% platform fee = 7% net return
    - **If defaulted**: return = -1.0 / loan_term (annualized loss)
      Example: Lose principal over 3 years = -33% annualized
      The full principal loss is spread evenly over the loan term
    
    Key assumptions:
    - Principal loss is total (no recovery value)
    - Default timing is ignored, as is any interest received before default
    - Servicing fee is deducted from the annual interest rate
    - No prepayment (simplification)
    
    Examples
    --------
    >>> # Repaid loan at 10% interest
    >>> repaid_loan = pd.Series({'defaulted': 0, 'interest_rate': 0.10})
    >>> round(calculate_investor_return(repaid_loan), 4)
    0.09  # 10% - 1% platform fee
    
    >>> # Defaulted loan (3-year term)
    >>> defaulted_loan = pd.Series({'defaulted': 1, 'interest_rate': 0.15})
    >>> round(calculate_investor_return(defaulted_loan, loan_term=3), 3)
    -0.333  # -100% / 3 years
    
    See Also
    --------
    Expected return = (1 - default_prob) * net_return + default_prob * loss
    Sharpe ratio = (expected_return - risk_free) / std(returns)
    """
    if row['defaulted'] == 0:
        # Loan repaid: investor earns interest minus fees
        gross_return = row['interest_rate']
        net_return = gross_return - servicing_fee_rate
        return net_return
    else:
        # Loan defaulted: investor loses the full principal.
        # Simplification: the loss is spread evenly over the loan term;
        # default timing is not modelled.
        loss = -1.0 / loan_term  # Annualized loss
        return loss

data['investor_return'] = data.apply(calculate_investor_return, axis=1)

# Analyze returns by risk grade
returns_by_grade = data.groupby('risk_grade', observed=True).agg({
    'investor_return': 'mean',
    'defaulted': 'mean',
    'interest_rate': 'mean',
    'loan_amount': 'count'
}).rename(columns={'loan_amount': 'n_loans'})

returns_by_grade['expected_return'] = (
    (1 - returns_by_grade['defaulted']) * (returns_by_grade['interest_rate'] - servicing_fee_rate)
    + returns_by_grade['defaulted'] * (-1.0 / 3)  # same annualized loss as calculate_investor_return (3-year term)
)

# Create comprehensive summary table
economics_table = pd.DataFrame({
    'Risk Grade': returns_by_grade.index,
    'Interest Rate': (returns_by_grade['interest_rate'] * 100).round(1),
    'Default Rate': (returns_by_grade['defaulted'] * 100).round(1),
    'Investor Return': (returns_by_grade['investor_return'] * 100).round(1),
    'Expected Return': (returns_by_grade['expected_return'] * 100).round(1),
    'N Loans': returns_by_grade['n_loans']
})

print("\n" + "=" * 90)
print("MARKETPLACE LENDING ECONOMICS BY RISK GRADE")
print("=" * 90)
print(economics_table.to_string(index=False))
print("=" * 90)

# Calculate key platform metrics
total_loans = returns_by_grade['n_loans'].sum()
weighted_default_rate = (returns_by_grade['defaulted'] * returns_by_grade['n_loans']).sum() / total_loans
weighted_return = (returns_by_grade['investor_return'] * returns_by_grade['n_loans']).sum() / total_loans

print(f"\nPORTFOLIO SUMMARY:")
print(f"  Total loans in sample: {total_loans:,}")
print(f"  Overall default rate: {weighted_default_rate*100:.1f}%")
print(f"  Average investor return: {weighted_return*100:.1f}%")
print(f"\nPLATFORM REVENUE (per £1,000 loan):")
print(f"  Origination fee (2%): £20.00")
print(f"  Servicing fee (1% annual × 3 years): £30.00")
print(f"  Total platform revenue: £50.00 (5% of loan amount)")

# Add simple bar chart showing risk-return relationship
plt.figure(figsize=(12, 5))

# Panel 1: Interest rate vs default rate comparison
plt.subplot(1, 2, 1)
x = np.arange(len(returns_by_grade))
width = 0.35
plt.bar(x - width/2, returns_by_grade['interest_rate'] * 100, width, 
        label='Interest Rate', color='blue', alpha=0.7)
plt.bar(x + width/2, returns_by_grade['defaulted'] * 100, width,
        label='Default Rate', color='red', alpha=0.7)
plt.xlabel('Risk Grade', fontsize=11)
plt.ylabel('Rate (%)', fontsize=11)
plt.title('Interest Rates vs. Default Rates by Grade', fontsize=12)
plt.xticks(x, returns_by_grade.index)
plt.legend(fontsize=10)
plt.grid(alpha=0.3, axis='y')

# Panel 2: Net investor returns
plt.subplot(1, 2, 2)
colors = ['green' if r > 0 else 'red' for r in returns_by_grade['investor_return']]
plt.bar(returns_by_grade.index, returns_by_grade['investor_return'] * 100,
        color=colors, alpha=0.7)
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.xlabel('Risk Grade', fontsize=11)
plt.ylabel('Net Return (%)', fontsize=11)
plt.title('Realized Investor Returns by Grade', fontsize=12)
plt.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n✔ Marketplace lending economics analysis complete")

### Interpretation Guide

1.  **Risk-return relationship**: Do higher-risk loans offer higher
    returns? Or do defaults eat away the extra interest?

2.  **Grade D-F analysis**: Look at the riskiest grades. Are investor
    returns positive or negative? Would you invest in these?

3.  **Platform profitability**: The platform earns origination fees
    (2%) + servicing fees (1% annual). Calculate total platform revenue.
    Is this sustainable?

4.  **Comparison to alternatives**: Investor returns are 3-7% for
    low-risk grades. Compare to: savings accounts (1%), bonds (3-4%),
    stocks (8-10%). Is the risk-reward attractive?

Write 200–250 words analyzing the marketplace lending economics and
discussing whether the risk-return tradeoff is fair for investors.
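One way to ground the discussion is to ask how high the default rate in each grade can go before investors lose money. The sketch below sets the Task 3 expected-return formula to zero and solves for the break-even default probability. It reuses the interest rates, servicing fee, 3-year term and grade bin edges from the Task 3 code; `break_even_default_prob` is a hypothetical helper, and the result inherits the simplified loss assumption (a default costs 1/term per year).

```python
# Break-even default probability per grade under the Task 3 return model:
#   expected = (1 - p) * (rate - fee) + p * (-1 / term)
# Setting expected = 0 gives p* = (rate - fee) / ((rate - fee) + 1 / term).
interest_rates = {'A': 0.06, 'B': 0.08, 'C': 0.10, 'D': 0.13, 'E': 0.16, 'F': 0.20}
grade_upper_bound = {'A': 0.05, 'B': 0.10, 'C': 0.15, 'D': 0.20, 'E': 0.30, 'F': 1.00}
servicing_fee_rate = 0.01
loan_term = 3

def break_even_default_prob(rate, fee=servicing_fee_rate, term=loan_term):
    """Default probability at which the expected annualized return is zero."""
    net = rate - fee
    return net / (net + 1.0 / term)

for grade, rate in interest_rates.items():
    p_star = break_even_default_prob(rate)
    print(f"Grade {grade}: rate {rate:.0%}, "
          f"break-even default prob {p_star:.1%}, "
          f"bin upper bound {grade_upper_bound[grade]:.0%}")
```

Under these assumptions the break-even default probabilities are roughly 13%, 17%, 21%, 26%, 31% and 36% for grades A to F. For grades A to E the break-even sits above the upper edge of the grade's probability bin, so expected returns stay positive as long as realized default rates fall inside the bin. The conclusion is sensitive to the loss assumption: a harsher treatment of defaults would lower every break-even.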

## Task 4 — Inclusion and Fairness Reflection (Directed Learning)

This is an extended reflection task for directed learning time. Connect
your quantitative findings to the evidence and policy questions from the
lecture.

### Deliverable

Write 400–500 words addressing:

1.  **Who benefits from alternative data credit scoring?** Using your
    Task 2 results, explain which borrowers gain access. Connect to Berg
    et al. (2020)’s finding that alternative data most helps thin-file
    borrowers. But who’s still excluded?

2.  **Fairness tradeoffs:** Your model uses education as a feature.
    College graduates get better rates. Is this fair? Consider:

    -   Education predicts default (legitimate risk signal)
    -   But education correlates with socioeconomic status, race (proxy
        discrimination)
    -   Alternative: exclude education (lose prediction power, fewer
        people get loans) or include it (more loans, but perpetuates
        inequality)

3.  **Marketplace lending inclusion claims:** Your Task 3 shows investor
    returns are 3-7% for low-risk loans, negative for high-risk loans.
    How does this affect who gets funded? Do platforms cherry-pick prime
    borrowers (like banks) or genuinely expand access?

4.  **Regulatory approach:** Should platforms be required to explain why
    borrowers are rejected (algorithmic transparency)? Or is disclosure
    of aggregate default rates sufficient? Discuss UK FCA’s approach
    vs. US patchwork.

5.  **Policy recommendation:** Should alternative data be regulated?
    Require: (a) Explainability (borrowers can see why rejected), (b)
    Auditability (regulators can check for discrimination), (c) Opt-in
    (borrowers choose whether to share education/employment), or (d)
    Laissez-faire (let platforms decide)?

Use at least two citations (e.g., Berg et al. (2020), Mollick (2014), or
lecture references).

## Quality Gate for Credit Models (5 minutes)

Before finalizing your interpretation, validate your model results:

In [None]:
# Check 1: AUC improvement is positive
improvement = auc_alternative - auc_traditional
assert improvement > 0, f"Alternative data should improve AUC, got {improvement:.4f}"

# Check 2: AUC values reasonable (relaxed for synthetic data)
assert 0.50 < auc_traditional < 0.90, f"Traditional AUC should be between 0.50 and 0.90, got {auc_traditional:.3f}"
assert 0.50 < auc_alternative < 0.95, f"Alternative data AUC should be between 0.50 and 0.95, got {auc_alternative:.3f}"

# Check 3: Default rate plausible (relaxed for synthetic data)
default_rate = data['defaulted'].mean()
assert 0.01 < default_rate < 0.80, f"Default rate should be 1-80%, got {default_rate:.1%}"

# Check 4: Risk grades ordered by default rate
grade_defaults = data.groupby('risk_grade', observed=True)['defaulted'].mean()
assert grade_defaults.is_monotonic_increasing, "Default rate should increase with risk grade"

# Check 5: Interest rates ordered by risk
grade_rates = data.groupby('risk_grade', observed=True)['interest_rate'].mean()
assert grade_rates.is_monotonic_increasing, "Interest rate should increase with risk grade"

print("✔ All quality gate checks passed")
print("Your credit models and economics analysis are valid.")

## Directed Learning Extensions

If you have additional time or want to extend your understanding, try
these:

### Extension 1: Precision-Recall Threshold Optimization

The model predicts default probability, but platforms must choose a
cutoff threshold (e.g., reject if \>15% default probability). Plot
precision and recall vs. threshold. Find the optimal threshold that
balances catching defaults (recall) vs. avoiding false rejections
(precision).
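A possible starting point is a simple threshold sweep, sketched below with NumPy only. The scores are synthetic (drawn from a Beta distribution and treated as perfectly calibrated), so they stand in for your model's predicted probabilities rather than reproduce them. Maximizing F1 is one assumed definition of "optimal", not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for model output (assumption, not the lab's data):
# default probabilities with a mean of about 12%.
n = 5000
scores = rng.beta(1.2, 8.8, size=n)
y = rng.binomial(1, scores)            # 1 = defaulted

thresholds = np.linspace(0.02, 0.60, 59)
precision, recall = [], []
for t in thresholds:
    flagged = scores >= t              # borrowers the platform would reject
    tp = np.sum(flagged & (y == 1))    # defaulters correctly flagged
    precision.append(tp / flagged.sum() if flagged.sum() > 0 else np.nan)
    recall.append(tp / max(y.sum(), 1))
precision, recall = np.array(precision), np.array(recall)

# One possible notion of "optimal": the threshold that maximizes F1
with np.errstate(invalid="ignore", divide="ignore"):
    f1 = 2 * precision * recall / (precision + recall)
best = np.nanargmax(f1)
print(f"Best F1 threshold: {thresholds[best]:.2f} "
      f"(precision {precision[best]:.2f}, recall {recall[best]:.2f})")
```

Plotting `precision` and `recall` against `thresholds` gives the curves the extension asks for. F1 weights the two error types equally, which is rarely the right trade-off in lending; Extension 2 replaces it with explicit costs.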

### Extension 2: Cost-Sensitive Learning

Not all errors cost the same. Rejecting a good borrower costs
opportunity (foregone interest). Accepting a bad borrower costs
principal (100% loss). Modify the model to minimize expected cost rather
than maximize AUC.
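The simplest version of this idea keeps the model unchanged and moves the decision threshold. The sketch below is a minimal illustration on synthetic, perfectly calibrated scores (an assumption). The cost figures echo the text above: a full principal loss for an approved defaulter, and roughly 10% foregone interest for a rejected good borrower. It compares a grid search over thresholds with the closed-form threshold implied by those costs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed costs per unit of principal (illustrative)
cost_false_negative = 1.00   # approve a defaulter: lose principal
cost_false_positive = 0.10   # reject a good borrower: forgo ~10% interest

# Synthetic, perfectly calibrated scores (assumption, not the lab's model)
n = 20000
p_default = rng.beta(1.2, 8.8, size=n)
y = rng.binomial(1, p_default)         # 1 = defaulted

def total_cost(threshold):
    """Total misclassification cost when rejecting scores >= threshold."""
    reject = p_default >= threshold
    false_neg = np.sum(~reject & (y == 1))   # defaulters approved
    false_pos = np.sum(reject & (y == 0))    # good borrowers rejected
    return cost_false_negative * false_neg + cost_false_positive * false_pos

thresholds = np.linspace(0.01, 0.50, 50)
costs = np.array([total_cost(t) for t in thresholds])
empirical_best = thresholds[np.argmin(costs)]

# With calibrated probabilities, rejecting is cheaper whenever
# p * cost_FN > (1 - p) * cost_FP, i.e. p > cost_FP / (cost_FP + cost_FN).
theoretical_best = cost_false_positive / (cost_false_positive + cost_false_negative)

print(f"Cost-minimizing threshold (grid search): {empirical_best:.2f}")
print(f"Threshold implied by the cost ratio:     {theoretical_best:.3f}")
```

With a 10:1 cost ratio the implied threshold is about 9%, far below the 50% default used by most classifiers. The closed-form result relies on calibrated probabilities, which is one reason calibration matters for pricing and approval decisions.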

### Extension 3: Fairness Metrics

Calculate demographic parity (equal approval rates across groups) and
equalized odds (equal false positive/negative rates across groups) if
you have borrower demographics. Explore fairness-accuracy tradeoffs.
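As a rough template, both metrics reduce to comparing a few rates across groups. In the sketch below everything is synthetic: `group` is a hypothetical binary demographic attribute, group 1 is deliberately given slightly higher predicted risk, and the 15% approval cutoff is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic example (all values are assumptions for illustration)
n = 10000
group = rng.integers(0, 2, size=n)     # hypothetical demographic attribute
p_default = np.clip(rng.beta(1.2, 8.8, size=n) + 0.03 * group, 0, 1)
defaulted = rng.binomial(1, p_default)
approved = p_default < 0.15            # approve if predicted risk is below 15%

def group_rates(g):
    """Approval and error rates for one demographic group."""
    mask = group == g
    good = mask & (defaulted == 0)
    bad = mask & (defaulted == 1)
    return {
        "approval_rate": approved[mask].mean(),
        "false_rejection_rate": (~approved[good]).mean(),  # good borrowers rejected
        "false_approval_rate": approved[bad].mean(),       # defaulters approved
    }

rates = {g: group_rates(g) for g in (0, 1)}

# Demographic parity: approval rates should match across groups
demographic_parity_gap = abs(rates[0]["approval_rate"] - rates[1]["approval_rate"])

# Equalized odds: both error rates should match across groups
equalized_odds_gap = max(
    abs(rates[0]["false_rejection_rate"] - rates[1]["false_rejection_rate"]),
    abs(rates[0]["false_approval_rate"] - rates[1]["false_approval_rate"]),
)
print(f"Demographic parity gap: {demographic_parity_gap:.3f}")
print(f"Equalized odds gap:     {equalized_odds_gap:.3f}")
```

A gap of zero means the criterion is met exactly. To explore the fairness-accuracy tradeoff, vary the cutoff (or use group-specific cutoffs) and track how the gaps and the overall default rate among approved loans move together.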

### Extension 4: Marketplace Lending Business Model

Calculate platform revenue: origination fees (2% of \$X billion
originated annually) + servicing fees (1% of outstanding loans).
Estimate costs: underwriting (\$50/application), servicing
(\$20/loan/year), marketing (\$100/funded loan). What’s the break-even
volume?
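A back-of-the-envelope version of this calculation is sketched below. The fee rates and unit costs come from the extension text. The average loan size, loan term, share of applications that get funded, and fixed costs are hypothetical inputs to replace with your own estimates. As in the Task 3 revenue summary, the servicing fee is charged on the full principal for the whole term, which overstates revenue for amortizing loans.

```python
import math

# From the extension text
origination_fee_rate = 0.02
servicing_fee_rate = 0.01
underwriting_cost_per_application = 50.0
servicing_cost_per_loan_year = 20.0
marketing_cost_per_funded_loan = 100.0

# Hypothetical inputs (assumptions; replace with your own estimates)
average_loan_size = 10_000.0
loan_term_years = 3
funded_share_of_applications = 0.25   # 1 in 4 applications is funded
fixed_costs = 5_000_000.0             # fixed costs one loan cohort must cover

# Lifetime revenue per funded loan
revenue_per_loan = average_loan_size * (
    origination_fee_rate + servicing_fee_rate * loan_term_years
)

# Lifetime variable cost per funded loan; underwriting is paid on every
# application, so it is scaled up by the funding rate.
cost_per_loan = (
    underwriting_cost_per_application / funded_share_of_applications
    + servicing_cost_per_loan_year * loan_term_years
    + marketing_cost_per_funded_loan
)

margin_per_loan = revenue_per_loan - cost_per_loan
break_even_loans = math.ceil(fixed_costs / margin_per_loan)

print(f"Revenue per funded loan: ${revenue_per_loan:,.0f}")
print(f"Variable cost per funded loan: ${cost_per_loan:,.0f}")
print(f"Break-even volume: {break_even_loans:,} loans "
      f"(${break_even_loans * average_loan_size / 1e6:,.0f}M originated)")
```

With these assumed inputs each funded loan contributes about $140, so the platform needs roughly 35,715 loans to cover its fixed costs. The answer is most sensitive to the funding rate, because underwriting is paid on rejected applications too.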

## Assessment integration (optional)

If your module includes written or short-answer assessments, you may be
asked to:

-   Explain how marketplace lending platforms address information
    asymmetry
-   Calculate investor returns given default rates and interest rates
-   Interpret AUC and related model performance metrics
-   Discuss inclusion benefits, privacy, and fairness risks in
    alternative finance

> **Troubleshooting**
>
> **Issue**: AUC very low (\<0.60) or very high (\>0.95)  
> **Solution**: Check data generation and make sure the default
> probability function uses features correctly. A very low AUC suggests
> the model is not learning; a very high AUC suggests overfitting or
> data leakage.
>
> **Issue**: All loans assigned same risk grade  
> **Solution**: Check that default probability has sufficient variance.
> Adjust binning thresholds if needed.
>
> **Issue**: Negative returns for all grades  
> **Solution**: Check default rate isn’t too high (should be 8-15%
> overall). Adjust interest rates or default loss assumption.

> **Further Reading (Hilpisch 2019)**
>
> -   **Chapter 11** (Statistics): Logistic regression,
>     cross-validation, model evaluation
> -   **Chapter 15** (Trading Strategies): Risk-return analysis,
>     portfolio construction
> -   **Chapter 17** (Machine Learning): Classification models,
>     feature engineering, hyperparameter tuning
>
> See: [Hilpisch Code Resources](../resources/hilpisch-code.qmd)

## Summary and Next Steps

You’ve now:

-   ✔ Implemented credit default prediction using logistic regression
-   ✔ Demonstrated how alternative data improves model performance (AUC
    gain)
-   ✔ Analyzed marketplace lending economics and risk-return tradeoffs
-   ✔ Reflected on inclusion benefits and fairness concerns

Next steps:

1.  Complete your Task 4 reflection (400-500 words) connecting to theory
    and evidence
2.  Choose 1-2 directed learning extensions to explore further
3.  Read Berg et al. (2020) and Mollick (2014) with your lab insights in
    mind
4.  Bring questions to next week’s seminar

**Well done! You’ve built hands-on understanding of credit risk modeling
and marketplace lending economics.**

Berg, Tobias, Valentin Burg, Ana Gombović, and Manju Puri. 2020. “On the
Rise of FinTechs: Credit Scoring Using Digital Footprints.” *Review of
Financial Studies* 33 (7): 2845–97.
<https://doi.org/10.1093/rfs/hhz099>.

Mollick, Ethan. 2014. “The Dynamics of Crowdfunding: An Exploratory
Study.” *Journal of Business Venturing* 29 (1): 1–16.
<https://doi.org/10.1016/j.jbusvent.2013.06.005>.