# Adaptive Bayesian Network Survival Modeling Demo

**Goal:** Demonstrate an end-to-end causal survival modeling pipeline in a high‑p / low‑n genomics setting.

This notebook explains *what* each step does, *why* it is needed, and *what the model learns*.


## 1. Setup
We import the core modules from the project. The pipeline is fully CPU‑based and runs in minutes.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from prad_bn.simulate import SimConfig, simulate_prad_like_dataset, inject_survival_signal
from prad_bn.discretize import survival_to_km_groups, maybe_supervised_bncuts
from prad_bn.iterative import IterConfig, run_iterative_bn


## 2. Simulate PRAD‑like Data

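The simulated cohort mimics prostate adenocarcinoma (PRAD) data:

- 120 patients (`n_samples`)
- 300 genes with right‑skewed expression (`n_genes`)
- 8 block‑correlated gene modules with within‑block correlation 0.6 (`n_blocks`, `block_corr`)
- Censored survival outcomes, about 35% censored (`censoring_rate`)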


In [None]:
cfg = SimConfig(
    n_samples=120,
    n_genes=300,
    n_blocks=8,
    block_corr=0.6,
    censoring_rate=0.35,
    signal_genes=10,
    signal_strength=3.0,
)

sim = simulate_prad_like_dataset(cfg, seed=7)
expr, time_days, event = sim['expr'], sim['time_days'], sim['event']
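
For readers without `prad_bn`, the sketch below shows one way a dataset with these properties could be generated in plain NumPy. It is not the package's implementation: the function name `simulate_sketch`, the log‑normal expression model, the one‑latent‑factor‑per‑block construction of correlation, and the exponential event and censoring times are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_sketch(n_samples=120, n_genes=300, n_blocks=8, block_corr=0.6,
                    censoring_rate=0.35, seed=7):
    """Hypothetical simulator: block-correlated, right-skewed expression plus censored survival."""
    rng = np.random.default_rng(seed)
    # Assign genes to blocks round-robin, giving blocks of (nearly) equal size.
    block_of_gene = np.arange(n_genes) % n_blocks
    # One shared latent factor per block: on the log scale, genes in the same block
    # have correlation block_corr and genes in different blocks are independent.
    factors = rng.standard_normal((n_samples, n_blocks))
    noise = rng.standard_normal((n_samples, n_genes))
    log_expr = (np.sqrt(block_corr) * factors[:, block_of_gene]
                + np.sqrt(1.0 - block_corr) * noise)
    # Exponentiating a Gaussian gives right-skewed (log-normal) expression.
    expr = np.exp(log_expr)
    # Exponential event and censoring times. For two independent exponentials,
    # P(censoring first) = rate_c / (rate_e + rate_c), so this scale targets censoring_rate.
    event_time = rng.exponential(scale=1000.0, size=n_samples)
    censor_time = rng.exponential(
        scale=1000.0 * (1.0 - censoring_rate) / censoring_rate, size=n_samples)
    time_days = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)  # 1 = event observed, 0 = censored
    return {"expr": expr, "time_days": time_days, "event": event}

sim_demo = simulate_sketch()
# Matrix shape and the realized censoring fraction (close to, not exactly, 0.35).
print(sim_demo["expr"].shape, 1 - sim_demo["event"].mean())
```

Because the correlation is imposed on the log scale, the correlation between the exponentiated expression values is somewhat lower than `block_corr`.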

## 3. Survival Outcome Engineering

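We discretize survival time into Kaplan–Meier (KM) risk groups (`q=5` groups here). Ordered risk tiers mirror how clinicians stratify patients, and a small number of outcome states keeps the outcome node of the Bayesian network learnable from 120 patients. Note that only `time_days` is passed to `survival_to_km_groups`, so censoring status (`event`) cannot influence the grouping; patients censored early are placed by their follow‑up time alone.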

In [None]:
km_groups = survival_to_km_groups(time_days, q=5)

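# km_groups holds a few discrete labels, so plot the patient count per group as bars
groups, counts = np.unique(km_groups, return_counts=True)
plt.bar(groups.astype(str), counts)
plt.xlabel('KM risk group')
plt.ylabel('Number of patients')
plt.title('Discrete Survival Risk Groups')
plt.show()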
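
The internals of `survival_to_km_groups` are not shown here. As a reference point, the sketch below implements the Kaplan–Meier product‑limit estimator, $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ with $d_i$ events and $n_i$ patients at risk at event time $t_i$, which does use censoring, next to a plain quantile binning of follow‑up time into `q` ordered groups. Both function names are hypothetical, and numbering groups so that 0 holds the shortest times is an assumption.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier (product-limit) estimate of S(t) at each distinct observed event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                  # still under observation just before t
        deaths = np.sum((time == t) & (event == 1))  # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def quantile_groups(time, q=5):
    """Hypothetical: bin follow-up time into q ordered groups, 0 = shortest times."""
    cuts = np.quantile(time, np.linspace(0, 1, q + 1)[1:-1])  # q - 1 interior cut points
    return np.digitize(time, cuts)

# Toy cohort: censored patients (event = 0) leave the risk set without lowering S(t).
t_toy = np.array([5, 8, 8, 12, 15, 20])
e_toy = np.array([1, 1, 0, 1, 0, 1])
times, surv = kaplan_meier(t_toy, e_toy)
print(times, surv)  # S(t) is 5/6, 2/3, 4/9, 0 at t = 5, 8, 12, 20
print(quantile_groups(np.arange(10), q=5))  # [0 0 1 1 2 2 3 3 4 4]
```

The contrast is the point: `quantile_groups` sees only follow‑up time, while `kaplan_meier` treats a censored patient as information up to the censoring time rather than as an event.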

## 4. Inject Survival‑Associated Signal

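We inject monotonic expression shifts into a small subset of genes (`cfg.signal_genes = 10`, effect size `cfg.signal_strength = 3.0`) so that their expression tracks the risk groups, mimicking real prognostic pathways. The printed indices are the ground truth that the iterative learner should recover.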

In [None]:
rng = np.random.default_rng(7)
expr_signal, signal_genes = inject_survival_signal(expr, km_groups, cfg, rng)

print('True signal genes:', sorted(signal_genes))
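
A minimal sketch of what a monotonic injection could look like, assuming the shift is applied on the log scale and grows linearly with the risk‑group index. The function `inject_signal_sketch`, its parameters `n_signal` and `strength` (presumably playing the roles of `cfg.signal_genes` and `cfg.signal_strength`), and the random up/down direction per gene are assumptions, not the behavior of `inject_survival_signal`.

```python
import numpy as np

def inject_signal_sketch(expr, groups, n_signal=10, strength=3.0, seed=7):
    """Hypothetical: shift log-expression of n_signal random genes monotonically with risk group."""
    rng = np.random.default_rng(seed)
    expr = np.array(expr, dtype=float, copy=True)  # leave the caller's matrix untouched
    groups = np.asarray(groups)
    chosen = rng.choice(expr.shape[1], size=n_signal, replace=False)
    # Scale the group index to [0, 1] so the largest log-scale shift equals `strength`.
    g = (groups - groups.min()) / max(groups.max() - groups.min(), 1)
    for j in chosen:
        sign = rng.choice([-1.0, 1.0])  # some genes rise with risk group, some fall
        expr[:, j] *= np.exp(sign * strength * g)
    return expr, chosen

# Toy usage: constant expression, five equally sized groups.
base = np.ones((50, 20))
grp = np.repeat(np.arange(5), 10)
shifted, chosen = inject_signal_sketch(base, grp, n_signal=4, strength=2.0)
print(sorted(chosen))
```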

## 5. Discretize Gene Expression

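Expression is discretized into a small number of bins (`n_bins=3`) to keep the conditional probability tables learnable: with 5 risk groups, each gene's table P(gene | outcome) has 5 × (3 − 1) = 10 free parameters, estimated from about 120 / 5 = 24 patients per group if the groups are balanced. The name `maybe_supervised_bncuts` suggests the cut points may be chosen using `km_groups`; if so, computing them on all samples before evaluation can make the reported AUC optimistic.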

In [None]:
X_disc = maybe_supervised_bncuts(expr_signal, km_groups, n_bins=3)
X_disc.shape
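
For comparison, an unsupervised baseline is equal‑frequency binning per gene, sketched below together with the CPT parameter count from the paragraph above. Both function names are hypothetical; this is not what `maybe_supervised_bncuts` does, only a label‑free alternative that avoids using the outcome when choosing cut points.

```python
import numpy as np

def quantile_discretize(expr, n_bins=3):
    """Unsupervised equal-frequency binning per gene; returns integer codes 0..n_bins-1."""
    expr = np.asarray(expr, dtype=float)
    out = np.empty(expr.shape, dtype=int)
    probs = np.linspace(0, 1, n_bins + 1)[1:-1]  # e.g. [1/3, 2/3] for tertiles
    for j in range(expr.shape[1]):
        cuts = np.quantile(expr[:, j], probs)
        out[:, j] = np.digitize(expr[:, j], cuts)
    return out

def naive_bayes_cpt_params(n_outcome_states, n_bins):
    """Free parameters in one gene's CPT P(gene | outcome): each outcome row sums to 1."""
    return n_outcome_states * (n_bins - 1)

codes = quantile_discretize(np.arange(12, dtype=float).reshape(12, 1), n_bins=3)
print(np.bincount(codes[:, 0]))      # [4 4 4]
print(naive_bayes_cpt_params(5, 3))  # 10
```

With heavily tied expression values (for example many zeros), equal‑frequency bins can come out unequal, since tied values always fall into the same bin.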

## 6. Iterative Bayesian Network Learning

Each iteration:
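1. Samples a small gene subset (`subset_size=20`) using the current sampling probabilities
2. Learns a Naive‑Bayes BN (Outcome → genes)
3. Performs probabilistic inference, giving each patient a posterior probability for every risk group
4. Updates the feature sampling probabilities based on AUC feedback (worst‑risk group vs rest)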

In [None]:
iter_cfg = IterConfig(n_iters=25, subset_size=20)
result = run_iterative_bn(X_disc, km_groups, iter_cfg, seed=7)

print('Best AUC:', result.best_auc)
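
Steps 2 and 3 can be sketched as a categorical Naive Bayes: the outcome is the only parent of every gene, so the network is fully specified by P(Y) and one table P(gene | Y) per gene, and inference is $P(Y=c \mid x) \propto P(Y=c)\prod_j P(x_j \mid Y=c)$. The function names, the Laplace smoothing constant `alpha`, and the integer coding (bins `0..n_bins-1`, groups `0..n_classes-1`) are assumptions rather than `prad_bn` internals.

```python
import numpy as np

def fit_naive_bayes(X, y, n_bins, n_classes, alpha=1.0):
    """Estimate P(Y) and P(X_j | Y) from discrete data with add-alpha (Laplace) smoothing."""
    X = np.asarray(X, dtype=int)
    y = np.asarray(y)
    prior = np.array([(y == c).sum() + alpha for c in range(n_classes)], dtype=float)
    prior /= prior.sum()
    cpt = np.zeros((X.shape[1], n_classes, n_bins))  # cpt[j, c, b] = P(X_j = b | Y = c)
    for c in range(n_classes):
        Xc = X[y == c]
        for j in range(X.shape[1]):
            counts = np.bincount(Xc[:, j], minlength=n_bins) + alpha
            cpt[j, c] = counts / counts.sum()
    return prior, cpt

def posterior(X, prior, cpt):
    """P(Y = c | x) for each row, accumulated in log space for numerical stability."""
    X = np.asarray(X, dtype=int)
    log_post = np.log(prior)[None, :] + sum(
        np.log(cpt[j][:, X[:, j]]).T for j in range(X.shape[1]))
    log_post -= log_post.max(axis=1, keepdims=True)
    p = np.exp(log_post)
    return p / p.sum(axis=1, keepdims=True)

rng_toy = np.random.default_rng(0)
y_toy = np.repeat([0, 1, 2], 20)
# Gene 0 copies the outcome exactly; gene 1 is pure noise.
X_toy = np.column_stack([y_toy, rng_toy.integers(0, 3, size=60)])
prior, cpt = fit_naive_bayes(X_toy, y_toy, n_bins=3, n_classes=3)
P_toy = posterior(X_toy, prior, cpt)
print((P_toy.argmax(axis=1) == y_toy).mean())  # fraction of patients assigned their true group
```

The smoothing constant matters in this setting: with about 24 patients per group, some gene‑bin combinations are never observed, and without `alpha` a single zero count would force a posterior of exactly 0.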

## 7. Learning Dynamics

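Because each iteration samples a different gene subset, the per‑iteration AUC is noisy. As sampling shifts toward informative genes, the AUC should trend upward, though not necessarily monotonically.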

In [None]:
plt.plot(result.aucs, marker='o')
plt.xlabel('Iteration')
plt.ylabel('AUC (worst‑risk vs rest)')
plt.title('Adaptive Learning Curve')
plt.show()
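
Step 4 can be sketched as follows, assuming the score for each patient is the posterior probability of the worst‑risk group and that sampling weights are updated multiplicatively. The AUC is the standard rank definition (probability that a random worst‑risk patient outscores a random other patient); the update rule, learning rate `lr`, and uniform mixing `floor` are hypothetical choices, not the rule used by `run_iterative_bn`.

```python
import numpy as np

def auc_worst_vs_rest(p_worst, y, worst_class):
    """AUC = P(score of random worst-risk patient > score of random other patient); ties count 1/2."""
    p_worst = np.asarray(p_worst, dtype=float)
    y = np.asarray(y)
    pos = p_worst[y == worst_class]
    neg = p_worst[y != worst_class]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs (fine for ~100 patients)
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def update_weights(weights, subset, auc, lr=2.0, floor=0.05):
    """Hypothetical multiplicative update of gene-sampling probabilities.

    Genes in the evaluated subset are up-weighted when AUC > 0.5 and down-weighted when
    AUC < 0.5 (exploitation); mixing with a uniform distribution keeps every gene's
    probability at least floor / n_genes (exploration).
    """
    w = np.array(weights, dtype=float, copy=True)
    w[np.asarray(subset)] *= np.exp(lr * (auc - 0.5))
    w /= w.sum()
    return (1.0 - floor) * w + floor / w.size

rng_demo = np.random.default_rng(0)
w_demo = np.full(300, 1 / 300)
subset_demo = rng_demo.choice(300, size=20, replace=False, p=w_demo)
w_demo = update_weights(w_demo, subset_demo, auc=0.8)  # next draw: rng_demo.choice(300, 20, replace=False, p=w_demo)
```

The `floor` term is what balances exploration against exploitation: without it, a few early lucky subsets could starve the remaining genes of any chance to be sampled.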

## 8. What the Model Learns

- Stable gene subsets repeatedly connected to survival
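- Interpretable network structure showing which genes depend on the survival outcome (the Naive‑Bayes edges encode statistical dependence, not tested causal effects)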
- Personalized risk probabilities with uncertainty
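
One simple way to quantify "stable gene subsets" is selection frequency: how often each gene appears among the best‑scoring iterations. The sketch assumes the per‑iteration gene subsets and AUCs are available as parallel lists (`result.aucs` exists, but whether `result` also stores the subsets is not shown here); `selection_frequency` and `top_frac` are hypothetical.

```python
from collections import Counter
import numpy as np

def selection_frequency(subsets, aucs, top_frac=0.2):
    """Hypothetical: fraction of top-AUC iterations whose gene subset contains each gene."""
    aucs = np.asarray(aucs, dtype=float)
    k = max(1, int(round(top_frac * len(aucs))))  # number of top iterations to keep
    top = np.argsort(aucs)[::-1][:k]               # indices of the k highest AUCs
    counts = Counter(g for i in top for g in subsets[i])
    return {g: c / k for g, c in counts.most_common()}  # most frequent genes first

subsets_toy = [[1, 2, 3], [1, 4, 5], [6, 7, 8], [1, 2, 9], [0, 2, 3]]
aucs_toy = [0.70, 0.72, 0.50, 0.75, 0.55]
freq = selection_frequency(subsets_toy, aucs_toy, top_frac=0.6)
print(freq)  # gene 1 appears in all 3 top subsets, gene 2 in 2 of them
```

In the simulated setting, comparing the most frequent genes against the printed `signal_genes` gives a direct check of whether the learner recovered the injected signal.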


## 9. Key Takeaways

- Small, discretized Bayesian networks stay learnable in low‑n, high‑p biology because each gene's conditional probability table has only a few parameters
- Discretized survival aligns modeling with clinical intuition
- Adaptive sampling balances exploration and exploitation

**This pipeline is designed around real‑world precision‑oncology constraints: few patients, many genes, and censored follow‑up.**