# **Notebook 02 - Credit Risk Modeling under Autonomous Constraints**

## **Section 1 - Objective and Scientific Positioning**

### **1.1 Purpose of This Notebook**

The objective of this notebook is to establish a **baseline credit risk modeling pipeline** and use it as a controlled environment to investigate the emergence of **autonomous risk** under realistic conditions.

Rather than introducing novel prediction tasks or unconventional labels, this notebook deliberately focuses on a **standard and widely accepted problem**:

> **Binary classification of credit default risk (`label_default`).**

This choice is intentional and methodological. By grounding the analysis in a canonical task, we ensure that any observed risk amplification, instability, or governance failure cannot be dismissed as an artifact of problem construction.


### **1.2 Why Credit Default as the Reference Task**

Credit default prediction is one of the most mature application domains in machine learning. It is characterized by:

* well-defined labels;
* extensive regulatory oversight;
* a large body of academic and industrial benchmarks;
* and clear economic consequences.

As such, it provides an ideal testbed to examine whether **standard performance-oriented modeling practices are sufficient** to capture all relevant forms of risk.

The central question guiding this notebook is therefore:

> *Can a model be statistically accurate and economically useful, yet structurally risky under autonomous deployment conditions?*


### **1.3 From Predictive Risk to Autonomous Risk**

Traditional credit risk models focus on minimizing prediction error under assumptions of:

* static data distributions;
* passive model deployment;
* and strong human oversight.

However, modern credit systems increasingly involve:

* automated decision pipelines;
* limited human review;
* feedback effects between model decisions and future data;
* and opaque model architectures.

In such contexts, **risk is no longer fully characterized by misclassification rates**. Instead, it emerges from the interaction between:

* **Autonomy (A):** how independently the model drives decisions;
* **Opacity (O):** how difficult its reasoning is to interpret;
* **Human Oversight (H):** how effectively humans can intervene;
* and **systemic impact**, even when predictions are “correct.”

This notebook operationalizes these ideas within a familiar credit risk setting.


### **1.4 Scope and Boundaries**

It is important to clarify what this notebook **does and does not claim**.

#### **This notebook does:**

* Train standard supervised models for default prediction;
* Evaluate classical performance metrics (AUC, accuracy, etc.);
* Introduce structural indicators related to autonomy and opacity;
* Prepare the ground for later analysis of instability and feedback loops.

#### **This notebook does not:**

* Claim the existence of intent, awareness, or consciousness;
* Attribute moral agency to the model;
* Replace regulatory or legal definitions of creditworthiness.

The focus remains strictly on **observable, measurable system behavior**.



### **1.5 Relationship to the Other Notebooks**

This notebook plays a **foundational role** in the project:

| Notebook | Description |
|--------|-------------|
| 01 | Introduced the synthetic environment and feature construction. |
| 02 | Establishes a classical supervised learning baseline. |
| 03 | Will extend the analysis to fraud and anomaly-driven risk. |
| 04 | Will examine governance, opacity, and auditability. |
| 05 | Will explore feedback loops and temporal risk dynamics. |
| 06 | Will generalize the framework toward AI Safety and AGI-relevant concerns. |


In other words:

> *Notebook 02 answers the question:
> “What does autonomous risk look like when nothing appears wrong?”*


### **1.6 Expected Outcome**

By the end of this notebook, we expect to show that:

* High predictive performance does not imply low autonomous risk;
* Structural properties of models matter independently of accuracy;
* Standard evaluation protocols leave critical blind spots.

These results will serve as a baseline reference for all subsequent analyses.


## **Section 2 - Dataset and Label Construction**

### **2.1 Dataset Overview**

This notebook uses a **synthetic financial dataset** designed to emulate realistic credit decision environments while allowing full experimental control and reproducibility.

The dataset, generated and validated in **Notebook 01**, contains **10,000 individual records**, each representing a hypothetical customer profile with demographic, financial, behavioral, and transactional attributes.

The use of a synthetic dataset serves three methodological purposes:

1. **Ethical control:** no real individuals are affected;
2. **Structural transparency:** all data-generating mechanisms are known;
3. **Experimental flexibility:** labels and regimes can be adjusted without bias leakage.

Importantly, the dataset is *not* intended to perfectly replicate any specific real-world population, but to capture **structural regularities** commonly present in credit systems.


### **2.2 Feature Categories**

The dataset contains multiple classes of variables, grouped conceptually as follows:

#### **(a) Financial Capacity and Exposure**

Examples include:

* income estimates;
* total credit limits;
* debt-to-income ratios;
* utilization metrics.

These variables approximate the **economic capacity** of an individual.

#### **(b) Behavioral and Transactional Signals**

Examples include:

* transaction frequency;
* average transaction value;
* unusual transaction indicators;
* international purchase flags.

These variables capture **behavioral patterns** relevant to credit and fraud risk.

#### **(c) Structural and Derived Features**

In addition to raw variables, the dataset includes engineered features that represent **structural properties** of system interaction, such as:

* interaction terms (e.g., nonlinear combinations);
* normalized risk signals;
* autonomy- and opacity-related constructs introduced in later notebooks.

The separation between raw and derived features is crucial to avoid **label leakage** and to maintain interpretability.


### **2.3 Target Variable: `label_default`**

The target variable for this notebook is **credit default risk**, operationalized as a binary label:

$
\texttt{label\_default} \in \{0, 1\}
$


where:

* `0` indicates non-default;
* `1` indicates default.

Rather than using an externally provided ground-truth label, `label_default` is **constructed from an underlying continuous risk score**, ensuring both realism and control.

### **Note:**

$
y = \texttt{label\_default}
$

> `label_default` represents a synthetic proxy for credit default risk, not an observed ground-truth event.



### **2.4 Label Construction Procedure**

The label is derived from the variable `prob_inadimplencia` (default probability proxy), generated during the synthetic data process.

The construction follows a **quantile-based thresholding approach**:

* The top **10%** of observations by `prob_inadimplencia` are labeled as defaults;
* The remaining **90%** are labeled as non-defaults.

Formally:


$
\texttt{label\_default}_i =
\begin{cases}
1 & \text{if } p_i > Q_{0.90}(p) \\
0 & \text{otherwise}
\end{cases}
$

Equivalently, written with the dataset column name:

$
\texttt{label\_default}_i =
\begin{cases}
1 & \text{if } \texttt{prob\_inadimplencia}_i > Q_{0.90}(\texttt{prob\_inadimplencia}) \\
0 & \text{otherwise}
\end{cases}
$


where $p_i$ is the default probability proxy for observation $i$.

This approach ensures:

* class imbalance consistent with real credit datasets;
* sufficient positive examples for robust modeling;
* and stable class distributions under stratified sampling.
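
The thresholding rule can be sketched in a few lines. This is a minimal illustration, not the Notebook 01 code: the Beta-distributed score and the name `demo_df` are assumptions standing in for the real `prob_inadimplencia` column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Notebook 01 output: a Beta-distributed risk score.
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({"prob_inadimplencia": rng.beta(2.0, 8.0, size=10_000)})

# 90th-percentile threshold, then a strict ">" comparison as in the formal definition.
q90 = demo_df["prob_inadimplencia"].quantile(0.90)
demo_df["label_default"] = (demo_df["prob_inadimplencia"] > q90).astype(int)

demo_default_rate = demo_df["label_default"].mean()
print(f"Threshold Q0.90 = {q90:.4f}")
print(f"Share labeled as default = {demo_default_rate:.3f}")
```

With a continuous score and no ties, the strict inequality labels approximately 10% of the rows as defaults, which is the class imbalance targeted above.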

### **Note:**

* the label results from quantile-based thresholding;
* it belongs to a synthetic stress-test design;
* it is not a claim about real default behavior.


### **2.5 Why a Quantile-Based Label**

The use of a quantile threshold is intentional and methodologically justified:

* It avoids arbitrary absolute cutoffs;
* It adapts naturally to distributional changes;
* It preserves ordinal risk structure;
* It facilitates controlled experiments across notebooks.

Most importantly, it ensures that **the label encodes risk, not behavior induced by the model itself**, preventing circularity.


### **2.6 Class Balance and Validation Checks**

Before any modeling, the dataset is validated to ensure:

* both classes are present;
* the class ratio is within expected bounds;
* no degenerate label distributions occur after splitting.

These checks are essential to prevent **silent modeling failures** that could invalidate downstream risk analysis.
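
One way to express these checks is a small validation helper. The function name `validate_labels`, the expected rate, and the tolerance are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def validate_labels(y, expected_rate=0.10, tol=0.02):
    """Raise ValueError if labels are degenerate or the positive rate is off.

    `expected_rate` and `tol` are illustrative defaults, not study parameters.
    """
    y_arr = np.asarray(y)
    found = set(np.unique(y_arr).tolist())
    if found != {0, 1}:
        raise ValueError(f"Expected both classes 0 and 1, found {sorted(found)}")
    rate = float(y_arr.mean())
    if abs(rate - expected_rate) > tol:
        raise ValueError(f"Positive rate {rate:.3f} outside {expected_rate} +/- {tol}")
    return rate

demo_rate = validate_labels([0] * 90 + [1] * 10)
print(f"Positive rate: {demo_rate:.2f}")
```

The same helper could be called on the full label vector and again on each split, so that a degenerate split fails loudly rather than silently.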


### **2.7 Scope Limitation**

At this stage:

* the label captures **credit default risk only;**
* no fraud or anomaly labels are introduced;
* no feedback effects are applied.

This controlled setup allows us to establish a **clean supervised baseline** before introducing autonomy-driven complexity in later notebooks.


### **2.8 Transition to Modeling**

With the dataset and label defined, the next step is to:

* select theory-consistent feature sets;
* construct training and test splits;
* and train baseline supervised models.

These steps are addressed in the next section.


## **Section 3 - Feature Selection and Train/Test Split**

### **3.1 Objective of Feature Selection**

The purpose of this section is to define the set of explanatory variables used for supervised credit risk modeling, while ensuring:

* Absence of prohibited or sensitive attributes (e.g., gender, race);
* Consistency with the theoretical framework introduced in Notebook 00;
* Reproducibility and interpretability of the modeling pipeline.

Feature selection is not treated as a purely statistical optimization step, but as a **governance-aware design decision**.

It's worth noting here:

> All features used in this notebook are derived from behavioral signals or from model structure. No protected attributes or demographic proxies are included.


### **3.2 Theoretical Feature Set (A–O–S–H Framework)**

In alignment with the Autonomous Risk framework, we restrict the model inputs to **theoretical features** derived from system structure rather than raw socio-demographic attributes.

The selected features are:

$
X = \{A, A^2, O, H, \log(1 + S), A \times O, A^2 \times \log(1 + S)\}
$

where:

* $A$ denotes **Autonomy;**
* $O$ denotes **Opacity;**
* $H$ denotes **Human Oversight;**
* $S$ denotes **Behavioral Instability.**

These features capture **non-linear interactions** and **second-order effects** that are central to autonomous risk emergence.
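
As a sketch of how the seven inputs follow from the four base quantities, the helper below derives them from columns `A`, `O`, `H`, and `S`. The function name `build_theory_features` and the toy values are assumptions; the output column names mirror those used in Section 3.4, but the actual construction lives in Notebook 01 and may differ.

```python
import numpy as np
import pandas as pd

def build_theory_features(base: pd.DataFrame) -> pd.DataFrame:
    """Derive the seven theoretical features from base columns A, O, H, S."""
    out = pd.DataFrame(index=base.index)
    out["A"] = base["A"]
    out["A_sq"] = base["A"] ** 2                  # second-order autonomy term
    out["O"] = base["O"]
    out["H"] = base["H"]
    out["log1pS"] = np.log1p(base["S"])           # log(1 + S), defined for S >= 0
    out["A_times_O"] = base["A"] * base["O"]      # autonomy-opacity interaction
    out["A2_logS"] = out["A_sq"] * out["log1pS"]  # A^2 * log(1 + S)
    return out

# Toy base values, purely illustrative.
demo_base = pd.DataFrame({
    "A": [0.5, 1.0],
    "O": [0.2, 0.4],
    "H": [0.9, 0.1],
    "S": [0.0, np.e - 1.0],
})
demo_features = build_theory_features(demo_base)
print(demo_features.round(3))
```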


### **3.3 Feature Matrix and Target Vector**

Formally, we define:

$
X \in \mathbb{R}^{n \times p}, \quad y \in \{0,1\}^n
$

where:

* $n$ is the number of observations;
* $p = 7$ is the number of selected features;
* $y$ corresponds to the target variable `label_default` defined in Section 2.


### **Scope and Relationship to the Empirical Notebook**


> This notebook documents the canonical implementation of the credit risk modeling pipeline used throughout the study, with an emphasis on interpretable baseline models and their role in the construction of autonomy, opacity, supervision, and instability proxies. While the accompanying empirical notebook extends these analyses to higher-capacity models and richer interaction regimes for stress-testing autonomous risk under increased nonlinearity, the methodological foundations, feature definitions, evaluation metrics, and diagnostic signals remain identical. This separation reflects a deliberate design choice: the present notebook prioritizes conceptual clarity and traceability, while the empirical notebook explores robustness and regime behavior under expanded modeling assumptions.


### **3.4 Implementation**

```python

# Section 3 - Feature Selection and Train/Test Split

print(">>> SECTION 3 - Feature Selection Initiated\n")

# 3.1 Theoretical Features (A-O-S-H)

features_theory = [
    "A",
    "A_sq",
    "O",
    "H",
    "log1pS",
    "A_times_O",
    "A2_logS"
]

# Check feature availability
for f in features_theory:
    if f not in df.columns:
        raise KeyError(f"Missing required feature: {f}")

print("Selected theoretical features:")
print(features_theory, "\n")

# 3.2 Feature matrix X and target vector y

X = df[features_theory].copy()
y = df["label_default"].copy()

print("Shape of X:", X.shape)
print("Shape of y:", y.shape, "\n")

# 3.3 Stratified Train/Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    random_state=42,
    stratify=y
)

# 3.4 Class Distribution Check

print("Class distribution in TRAIN set:")
print(y_train.value_counts(normalize=True), "\n")

print("Class distribution in TEST set:")
print(y_test.value_counts(normalize=True), "\n")

assert y_train.nunique() == 2, "ERROR: Training set contains only one class!"
assert y_test.nunique() == 2, "ERROR: Test set contains only one class!"

print(">>> Section 3 completed successfully. Dataset ready for modeling.\n")


```

### **3.5 Governance and Scientific Rationale**

Key design decisions in this section include:

* **Exclusion of socio-demographic variables**, preventing direct or proxy discrimination;
* **Use of interaction terms**, enabling detection of nonlinear risk amplification;
* **Stratified splitting**, preserving class balance and statistical validity.

This ensures that any observed risk patterns emerge from **systemic behavior**, not from sensitive personal attributes.
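
The exclusion rule can also be enforced mechanically. The guard below is a minimal sketch: the deny-list `PROTECTED_COLUMNS` is hypothetical, since the real sensitive column names depend on the Notebook 01 schema, and a name-based check cannot detect indirect proxies.

```python
# Hypothetical deny-list; actual sensitive column names depend on the dataset schema.
PROTECTED_COLUMNS = {"gender", "race", "age", "marital_status"}

def assert_no_protected(feature_names, protected=PROTECTED_COLUMNS):
    """Raise ValueError if any model input appears on the deny-list."""
    overlap = sorted(set(feature_names) & set(protected))
    if overlap:
        raise ValueError(f"Protected attributes in feature set: {overlap}")
    return True

demo_feature_names = ["A", "A_sq", "O", "H", "log1pS", "A_times_O", "A2_logS"]
print(assert_no_protected(demo_feature_names))
```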


## **Section 4 - Supervised Models for Credit Risk**

### **4.1 Objective**

This section implements baseline **supervised learning models** to estimate credit default risk using the theoretical feature set defined previously.

The goals are to:

* Establish predictive baselines under controlled conditions;
* Compare linear and non-linear decision mechanisms;
* Evaluate whether autonomous risk signals emerge **despite acceptable predictive performance.**

Importantly, these models are **not optimized for maximum accuracy**, but for **structural interpretability and governance analysis**.



### **4.2 Models Considered**

Two standard and widely adopted classifiers are used:

#### **Logistic Regression (LR)**

* Linear decision boundary;
* High interpretability;
* Serves as a transparent baseline.

#### **Random Forest (RF)**

* Non-linear ensemble model;
* Captures interaction effects implicitly;
* Serves as a proxy for higher opacity systems.

This contrast mirrors a core theme of the project: **performance vs. opacity trade-offs**.


### **4.3 Model Training**

```python

# Section 4 - Supervised Models for Credit Risk

print(">>> SECTION 4 - Model Training Initiated\n")

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# 4.3.1 Logistic Regression

log_model = LogisticRegression(
    max_iter=3000,
    solver="lbfgs",
    random_state=42
)

log_model.fit(X_train, y_train)

print("Logistic Regression trained successfully.")

# 4.3.2 Random Forest

rf_model = RandomForestClassifier(
    n_estimators=400,
    max_depth=None,
    min_samples_split=5,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

print("Random Forest trained successfully.\n")

```


### **4.4 Prediction and Probability Estimates**

```python

# 4.4 Probability Predictions

y_proba_log = log_model.predict_proba(X_test)[:, 1]
y_proba_rf  = rf_model.predict_proba(X_test)[:, 1]

print("Prediction probabilities generated.")


```

### **4.5 Evaluation Metrics**

We evaluate model performance using:

* **ROC-AUC:** discrimination ability;
* **PR-AUC:** robustness under class imbalance.

```python

# 4.5 Model Evaluation

from sklearn.metrics import roc_auc_score, average_precision_score

roc_log = roc_auc_score(y_test, y_proba_log)
roc_rf  = roc_auc_score(y_test, y_proba_rf)

pr_log = average_precision_score(y_test, y_proba_log)
pr_rf  = average_precision_score(y_test, y_proba_rf)

print("ROC-AUC:")
print(f"Logistic Regression: {roc_log:.4f}")
print(f"Random Forest:      {roc_rf:.4f}\n")

print("PR-AUC:")
print(f"Logistic Regression: {pr_log:.4f}")
print(f"Random Forest:      {pr_rf:.4f}")

```

### **4.6 Interpretation**

Several observations are expected at this stage:

1. **Random Forest typically outperforms Logistic Regression** in ROC and PR metrics due to its ability to model non-linearities;
2. Improved performance comes at the cost of **reduced interpretability and increased opacity;**
3. Even when both models achieve acceptable predictive scores, this does **not guarantee governance safety.**

This reinforces the central claim of the project:

> *Predictive success is neither necessary nor sufficient to ensure low autonomous risk.*


### **4.7 Connection to Autonomous Risk Theory**

This section provides the **empirical baseline** upon which autonomous risk indicators will later be layered:

* Model confidence → Autonomy (A);
* Model complexity → Opacity (O);
* Human review mechanisms → Oversight (H).

Subsequent notebooks will demonstrate how **risk can increase even as predictive metrics improve**.


## **Section 5 - Model Interpretability and Opacity Analysis**

### **5.1 Objective**

This section evaluates **how interpretable the trained models are**, and how interpretability (or lack thereof) relates to **opacity (O),** a core component of autonomous risk.

Rather than treating interpretability as a binary property, we adopt a **graded, operational view**, where opacity increases as:

* decision logic becomes less transparent;
* feature interactions become harder to disentangle;
* explanations become unstable or model-dependent.



### **5.2 Why Interpretability Matters for Autonomous Risk**

A model can be:

* highly accurate;
* statistically stable;
* and still **unsafe from a governance perspective**.

Opacity amplifies autonomous risk because:

* it weakens human oversight;
* it delays detection of failure modes;
* it obscures feedback-loop effects.

Thus, opacity is treated as a **risk multiplier**, not merely an inconvenience.



### **5.3 Global Feature Importance (Baseline Opacity Signal)**

We begin with **global feature importance**, which answers:

> *Which features matter most on average?*


#### **5.3.1 Logistic Regression - Coefficient Magnitudes**

```python

# 5.3 - Global Interpretability

import pandas as pd
import numpy as np

# Logistic Regression coefficients
coef_df = pd.DataFrame({
    "feature": X_train.columns,
    "coefficient": log_model.coef_[0]
})

coef_df["abs_coef"] = coef_df["coefficient"].abs()
coef_df = coef_df.sort_values("abs_coef", ascending=False)

coef_df
```

**Interpretation**:

* Coefficients are **directly interpretable;**
* Sign and magnitude have clear semantic meaning;
* This model exhibits **low opacity**.
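
To make the semantic meaning concrete, a logistic coefficient can be read as an odds ratio. The coefficient values below are illustrative assumptions, not fitted values from this notebook.

```python
import numpy as np
import pandas as pd

# Illustrative coefficients only; these are not fitted values from this notebook.
demo_coefs = pd.Series({"A": 0.80, "O": 0.00, "H": -0.50})

# exp(beta) is the multiplicative change in the odds of default per one-unit
# increase in the feature, holding the other inputs fixed.
demo_odds_ratios = np.exp(demo_coefs)
print(demo_odds_ratios.round(3))
```

Two caveats apply when reading the real coefficients this way. The inputs are not standardized, so magnitudes are not directly comparable across features. In addition, $A$ enters through `A_sq`, `A_times_O`, and `A2_logS` as well, so the coefficient on `A` alone is not the full marginal effect of autonomy.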



#### **5.3.2 Random Forest - Gini Importance**

```python

# Random Forest feature importance
rf_importance = pd.DataFrame({
    "feature": X_train.columns,
    "importance": rf_model.feature_importances_
}).sort_values("importance", ascending=False)

rf_importance

```

**Interpretation**:

* Importance is aggregated across trees;
* No directionality (positive/negative);
* Interactions are implicit;
* Opacity is **moderate to high**.
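
Because impurity-based importance is computed from the training data and carries no sign, a common cross-check is permutation importance on held-out data. This is a swapped-in technique rather than part of the original pipeline, and the sketch below runs on synthetic stand-in data (all `demo_` names are assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the notebook's dataset).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X_demo_tr, X_demo_te, y_demo_tr, y_demo_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

demo_rf = RandomForestClassifier(n_estimators=50, random_state=0)
demo_rf.fit(X_demo_tr, y_demo_tr)

# Drop in held-out ROC-AUC when each feature is shuffled, averaged over repeats.
demo_perm = permutation_importance(
    demo_rf, X_demo_te, y_demo_te, scoring="roc_auc", n_repeats=5, random_state=0
)

print("Gini importance:       ", demo_rf.feature_importances_.round(3))
print("Permutation importance:", demo_perm.importances_mean.round(3))
```

Large disagreements between the two rankings would themselves be a sign that global importance is an unstable explanation for the forest.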



### **5.4 Local Interpretability via SHAP**

To evaluate **local explanations**, we use **SHAP (SHapley Additive exPlanations)**.

SHAP allows us to:

* decompose individual predictions;
* compare explanation stability across models;
* quantify opacity empirically.


#### **5.4.1 SHAP for Logistic Regression**

```python

import shap

# Use a subset for efficiency
X_shap = X_test.sample(500, random_state=42)

explainer_log = shap.Explainer(log_model, X_train)
shap_values_log = explainer_log(X_shap)

shap.summary_plot(
    shap_values_log,
    X_shap,
    show=False
)

```

**Observation**:

* Explanations align closely with coefficients;
* Feature effects are stable;
* Low variance across samples → **low opacity**.



#### **5.4.2 SHAP for Random Forest**

```python

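explainer_rf = shap.Explainer(rf_model, X_train)
shap_values_rf = explainer_rf(X_shap)

# For classifiers, SHAP may return one attribution slice per class
# (samples x features x classes); keep the positive class (index 1).
if shap_values_rf.values.ndim == 3:
    shap_values_rf = shap_values_rf[:, :, 1]

shap.summary_plot(
    shap_values_rf,
    X_shap,
    show=False
)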

```

**Observation**:

* Nonlinear patterns emerge;
* Feature interactions dominate;
* Contribution signs may flip across contexts;
* Opacity is **significantly higher**.



### **5.5 Opacity Proxy Construction**

To operationalize opacity, we define a **model-level opacity proxy**:

$$O \propto \text{Variance of local explanations across samples}$$


#### **5.5.1 Empirical Opacity Score (SHAP Variance)**

```python

# 5.5 - Opacity Proxy

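# Extract SHAP values for the positive class (Random Forest)
shap_vals_rf = np.asarray(shap_values_rf.values)
if shap_vals_rf.ndim == 3:  # (samples, features, classes) -> keep class 1
    shap_vals_rf = shap_vals_rf[:, :, 1]

# Mean over features of the across-sample variance of attributions
opacity_proxy_rf = np.mean(np.var(shap_vals_rf, axis=0))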

opacity_proxy_rf

```

This scalar captures:

* instability of explanations;
* sensitivity to local context;
* difficulty of governance.

Higher values ⇒ higher opacity.
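
The arithmetic behind the proxy can be checked on a toy example. The attribution matrices below are illustrative numbers, not SHAP output, and `opacity_proxy` is a hypothetical helper that wraps the same computation.

```python
import numpy as np

# Toy attribution matrices (rows = samples, columns = features); values are illustrative.
stable_attr = np.array([[0.10, -0.20], [0.10, -0.20], [0.10, -0.20]])
unstable_attr = np.array([[0.30, -0.40], [-0.30, 0.40], [0.00, 0.00]])

def opacity_proxy(attributions):
    """Mean over features of the across-sample variance of attributions."""
    return float(np.mean(np.var(attributions, axis=0)))

print("Stable explanations:  ", opacity_proxy(stable_attr))
print("Unstable explanations:", opacity_proxy(unstable_attr))
```

Identical explanations across samples give a proxy of essentially zero, while sign-flipping attributions give a clearly positive value. Since the proxy scales with attribution magnitude, comparisons across models are most meaningful when both sets of SHAP values are expressed on the same output scale (for example, both in probability or both in log-odds).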



### **5.6 Interpretation and Theoretical Link**

The results of this section are consistent with a central theoretical claim:

> **Opacity is not binary. It increases smoothly with model complexity and interaction depth.**

Key insights:

* Logistic Regression → low opacity, high auditability;
* Random Forest → higher opacity, even when performance improves;
* SHAP variance provides a **measurable governance-relevant signal.**

Opacity thus becomes a **first-class variable** in the autonomous risk function.



### **5.7 Transition to Autonomous Risk Modeling**

At this point, we have:

| Component     | Operationalized              |
| ------------- | ---------------------------- |
| Autonomy (A)  | via confidence & persistence |
| Opacity (O)   | via SHAP variance            |
| Oversight (H) | via external constraints     |
| Performance   | via ROC / PR                 |

<br>

The next sections will show how **risk escalates when these components interact**, even without catastrophic prediction errors.

### **Status Summary**

* Interpretability analyzed;  
* Opacity quantified;  
* Model confidence measured;  
* Theoretical linkage established.  

## **Section 6 - Model Evaluation, Calibration, and Risk Signals**

### **6.1 Evaluation Metrics**

To ensure robustness beyond accuracy, we evaluate supervised credit risk models using complementary metrics:

* **ROC-AUC:** Discriminative capacity across thresholds;
* **PR-AUC:** Performance under class imbalance;
* **Brier Score:** Calibration-sensitive probabilistic error.

These metrics jointly assess predictive quality and probabilistic reliability.
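
Of the three metrics, the Brier score is the simplest to verify by hand: it is the mean squared difference between predicted probability and observed outcome, $\frac{1}{n}\sum_i (p_i - y_i)^2$. A small worked example with made-up values:

```python
import numpy as np

# Four illustrative cases: true outcomes and predicted default probabilities.
brier_y = np.array([0, 0, 1, 1])
brier_p = np.array([0.1, 0.4, 0.8, 0.3])

# Squared errors are 0.01, 0.16, 0.04, 0.49; their mean is the Brier score.
brier_demo = np.mean((brier_p - brier_y) ** 2)
print(f"Brier score: {brier_demo:.3f}")
```

Lower is better. The last case (a default predicted at 0.3) contributes most of the error, which shows how the score penalizes confident mistakes.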


```python
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss

# Predicted probabilities (models trained in Section 4.3)
y_proba_log = log_model.predict_proba(X_test)[:, 1]
y_proba_rf  = rf_model.predict_proba(X_test)[:, 1]

print("Logistic ROC-AUC:", roc_auc_score(y_test, y_proba_log))
print("RandomForest ROC-AUC:", roc_auc_score(y_test, y_proba_rf))

print("Logistic PR-AUC:", average_precision_score(y_test, y_proba_log))
print("RandomForest PR-AUC:", average_precision_score(y_test, y_proba_rf))

print("Logistic Brier:", brier_score_loss(y_test, y_proba_log))
print("RandomForest Brier:", brier_score_loss(y_test, y_proba_rf))

```

### **6.2 Calibration Analysis**

Well-calibrated models are critical in high-stakes decision-making. Even accurate models may induce systemic risk if their confidence estimates are misaligned.

```python
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

prob_true_log, prob_pred_log = calibration_curve(y_test, y_proba_log, n_bins=10)
prob_true_rf, prob_pred_rf = calibration_curve(y_test, y_proba_rf, n_bins=10)

plt.figure(figsize=(7,6))
plt.plot(prob_pred_log, prob_true_log, marker='o', label='Logistic')
plt.plot(prob_pred_rf, prob_true_rf, marker='s', label='Random Forest')
plt.plot([0,1],[0,1],'--', color='gray')
plt.xlabel('Predicted Probability')
plt.ylabel('Observed Frequency')
plt.legend()
plt.title('Calibration Curves')
plt.show()

```

Calibration gaps are interpreted as **latent autonomy signals,** where model confidence may exceed epistemic reliability.
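
A calibration gap can be summarized as a single number. One common choice, used here as an assumption rather than as the study's own metric, is the expected calibration error (ECE): the bin-weighted average of the absolute difference between observed frequency and mean predicted probability.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Bin-weighted mean of |observed frequency - mean predicted probability|."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so that p = 1.0 falls into the last bin.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap  # weight by the share of samples in the bin
    return float(ece)

ece_y = np.array([0, 1, 0, 1])
ece_calibrated = expected_calibration_error(ece_y, np.full(4, 0.5))
ece_overconfident = expected_calibration_error(ece_y, np.full(4, 0.9))
print(f"ECE, calibrated:    {ece_calibrated:.2f}")
print(f"ECE, overconfident: {ece_overconfident:.2f}")
```

In the second case the model predicts 0.9 while only half of the cases default, so confidence exceeds observed reliability by 0.4.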


### **6.3 Instability and Drift Signals**

Beyond point estimates, we evaluate behavioral stability across perturbations using:

* Variance of predicted probabilities;
* Sensitivity to feature noise;
* Divergence between models.

```python

import numpy as np

instability_log = np.var(y_proba_log)
instability_rf  = np.var(y_proba_rf)

print("Prediction variance - Logistic:", instability_log)
print("Prediction variance - RF:", instability_rf)

```

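The code above computes only the first of these signals, namely the dispersion of predicted probabilities across the test set, which is a coarse proxy. Higher variance under similar inputs is treated as an empirical proxy for **autonomous behavior amplification.**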
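
The other two signals in the list, sensitivity to feature noise and divergence between models, can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, and `noise_sensitivity`, its `noise_scale`, and all `_syn` names are hypothetical choices, not the study's protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with roughly 10% positives (not the notebook's dataset).
X_syn, y_syn = make_classification(
    n_samples=800, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0
)
lr_syn = LogisticRegression(max_iter=1000).fit(X_syn[:600], y_syn[:600])
rf_syn = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_syn[:600], y_syn[:600])
X_eval = X_syn[600:]

def noise_sensitivity(model, X, noise_scale=0.05, n_repeats=20, seed=0):
    """Mean absolute change in P(default) under small Gaussian feature noise."""
    rng = np.random.default_rng(seed)
    base = model.predict_proba(X)[:, 1]
    sigma = noise_scale * X.std(axis=0)  # noise relative to each feature's spread
    shifts = []
    for _ in range(n_repeats):
        X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
        shifts.append(np.abs(model.predict_proba(X_noisy)[:, 1] - base).mean())
    return float(np.mean(shifts))

sens_lr = noise_sensitivity(lr_syn, X_eval)
sens_rf = noise_sensitivity(rf_syn, X_eval)

# Divergence between models: mean absolute gap in predicted probabilities.
divergence = float(np.mean(np.abs(
    lr_syn.predict_proba(X_eval)[:, 1] - rf_syn.predict_proba(X_eval)[:, 1]
)))

print(f"Noise sensitivity - Logistic: {sens_lr:.4f}")
print(f"Noise sensitivity - RF:       {sens_rf:.4f}")
print(f"Mean |p_LR - p_RF|:           {divergence:.4f}")
```

Unlike the variance of predictions across the whole test set, these quantities compare predictions on nearly identical inputs, which is closer to the notion of instability described above.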


### **6.4 Connection to Autonomous Risk Theory**

This section operationalizes key components of the Autonomous Risk framework:

* **Autonomy (A):** Captured through confidence concentration and decision persistence;
* **Opacity (O):** Reflected by model complexity and interpretability gaps;
* **Human Oversight (H):** Assumed fixed but limited in deployment.

Empirically, we observe that models with higher capacity exhibit:

* Stronger confidence polarization;
* Increased calibration drift;
* Greater instability under perturbation.

These signals precede classical performance degradation and therefore serve as **early warnings of autonomous risk.**
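
Confidence polarization can be made operational in several ways. One minimal sketch, offered as an assumption rather than as the study's definition, is the share of predictions that lie within a small margin of 0 or 1.

```python
import numpy as np

def confidence_polarization(y_prob, margin=0.10):
    """Share of predictions within `margin` of 0 or 1 (`margin` is illustrative)."""
    p = np.asarray(y_prob, dtype=float)
    return float(np.mean((p <= margin) | (p >= 1.0 - margin)))

# Three of these five illustrative predictions are near-certain.
polar_demo = confidence_polarization([0.02, 0.05, 0.50, 0.97, 0.60])
print(f"Polarized share: {polar_demo:.2f}")
```

Applied to the predicted probabilities of two models on the same test set, a higher share would indicate that a model commits to near-certain decisions more often, independently of whether those decisions are correct.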



### **6.5 Section Summary (Internal)**

* Predictive performance evaluated beyond accuracy;
* Calibration and probabilistic reliability assessed;
* Instability quantified as a precursor to risk;
* Empirical results aligned with autonomous risk theory.

*This section completes the supervised risk evaluation pipeline and prepares the ground for feedback, opacity, and governance analysis in subsequent notebooks.*
