# Slope vs No-Slope: Training → Predicting → Testing

This notebook documents the **full pipeline** for comparing:
- **No-slope**: nokappa v3 reparam (shown to outperform nolr in `nokappa_v3_pipeline.ipynb`)
- **Slope**: 1-phase slope model (`slope_model_nokappa_v3_single_phase`)

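Both are evaluated with **40-batch LOO** (leave-one-batch-out): for each held-out batch, parameters are pooled from the other 39 batches and used to predict on the held-out batch. With 40 batches of 10k individuals this yields 400k out-of-batch predictions, from which AUCs and value figures are computed.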

**Sequence:**
1. **Training**: fit models per batch (no-slope already done; slope via `train_slopes_single_phase.py`)
2. **Predicting**: LOO; for each batch, pool parameters from the other 39, fit δ on the held-out batch (with φ held fixed for no-slope), and save π
3. **Testing**: pool the 40 π files into FULL.pt and compute AUCs; optionally run value figures (gamma_slope viz, age-stratified 1yr AUC, individual trajectories)
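
The leave-one-batch-out loop can be sketched as follows. This is a minimal illustration rather than the pipeline's own code: the helper `loo_folds` is hypothetical, and it assumes the batches are contiguous index ranges of 10,000 rows starting at 0, which the `{start}_{stop}` file-name pattern suggests but does not confirm.

```python
# Hypothetical sketch: enumerate leave-one-batch-out folds.
# Assumes 40 contiguous batches of 10,000 individuals each, indexed from 0.
N_BATCHES = 40
BATCH_SIZE = 10_000


def loo_folds(n_batches=N_BATCHES, batch_size=BATCH_SIZE):
    """Yield (held_out, training), where each batch is a (start, stop) range."""
    batches = [(i * batch_size, (i + 1) * batch_size) for i in range(n_batches)]
    for i, held_out in enumerate(batches):
        training = batches[:i] + batches[i + 1:]  # the other 39 batches
        yield held_out, training


folds = list(loo_folds())
n_predictions = sum(stop - start for (start, stop), _ in folds)
print(f"{len(folds)} folds, {len(folds[0][1])} training batches each, "
      f"{n_predictions} held-out predictions in total")
```

Every individual is predicted exactly once, by parameters that were pooled without that individual's batch.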

## Paths and run mode

Set `RUN_SCRIPTS = True` to actually execute the pipeline steps (training/prediction are long-running). Default is `False` so the notebook only shows commands and checks.

In [None]:
import subprocess
import sys
from pathlib import Path

CLAUDE_DIR = Path('/Users/sarahurbut/aladynoulli2/claudefile')
RUN_SCRIPTS = False  # set True to execute pipeline steps

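def run_cmd(cmd, description):
    """Print a pipeline command and, if RUN_SCRIPTS is set, run it from CLAUDE_DIR."""
    print(f"\n{'='*60}")
    print(description)
    print(f"  {cmd}")
    print('='*60)
    if RUN_SCRIPTS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step is not silently followed by the next one.
        subprocess.run(cmd, shell=True, cwd=str(CLAUDE_DIR), check=True)
    else:
        print("(skipped; set RUN_SCRIPTS=True to run)")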

---
## Part A: No-slope (nokappa v3) pipeline

| Step | Script / Output |
|------|-----------------|
| **1. Training** | Already done. Checkpoints: `censor_e_batchrun_vectorized_REPARAM_v3_nokappa/` (40 batches, 10k each) |
| **2. LOO predict** | `run_loo_prediction_all40.py` → `enrollment_predictions_nokappa_v3_loo_all40/pi_enroll_fixedphi_sex_{start}_{stop}.pt` (40 files) |
| **3. Pool + test** | Concatenate the 40 π files into `pi_enroll_fixedphi_sex_FULL.pt` (manually or with a script); then `nokappa_v3_auc_evaluation.ipynb` loads FULL.pt and computes AUCs → `results_feb18/` |

### A1. No-slope training

Training was done elsewhere (batchrun). Checkpoints live in Dropbox. No cell to run here.

In [None]:
# No-slope checkpoints (reference only)
NOSLOPE_CKPT = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox/censor_e_batchrun_vectorized_REPARAM_v3_nokappa/')
noslope_files = list(NOSLOPE_CKPT.glob('enrollment_model_REPARAM_NOKAPPA_*.pt')) if NOSLOPE_CKPT.exists() else []
print(f'No-slope checkpoints: {len(noslope_files)} files')
if noslope_files:
    print(f'  Example: {noslope_files[0].name}')

### A2. No-slope LOO prediction

For each batch *i*: pool φ, ψ, γ, κ from the other 39 checkpoints; predict on batch *i* with enrollment E; save π.
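
One way to read "pool" here is an element-wise average of each shared parameter over the 39 training checkpoints. The sketch below shows that idea on plain Python lists. The averaging rule, the function `pool_params`, and the toy checkpoint dictionaries are all assumptions for illustration, since this notebook does not show how `run_loo_prediction_all40.py` combines checkpoints.

```python
from statistics import fmean


def pool_params(checkpoints, held_out, keys):
    """Average each named parameter over all checkpoints except `held_out`.

    `checkpoints` is a list of dicts mapping a parameter name to a flat list
    of floats (a stand-in for tensors); `held_out` is the index to exclude.
    """
    training = [c for i, c in enumerate(checkpoints) if i != held_out]
    pooled = {}
    for key in keys:
        columns = zip(*(c[key] for c in training))  # align entries across batches
        pooled[key] = [fmean(col) for col in columns]
    return pooled


# Toy data: 4 "batches", each with a 2-element phi and a 1-element gamma.
toy = [{"phi": [float(b), 2.0 * b], "gamma": [10.0 + b]} for b in range(4)]
pooled = pool_params(toy, held_out=0, keys=["phi", "gamma"])
print(pooled)
```

Because batch 0 is excluded, its values never influence the parameters used to predict on batch 0.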

In [None]:
run_cmd(
    'python run_loo_prediction_all40.py',
    'No-slope LOO prediction (40 folds; use nohup for long run)'
)

### A3. No-slope testing (AUCs)

After LOO prediction, concatenate the 40 π files into `pi_enroll_fixedphi_sex_FULL.pt`, then run `nokappa_v3_auc_evaluation.ipynb` to compute the static 10yr, dynamic 10yr, static 1yr, and rolling 1yr AUCs, which are written to `results_feb18/`.
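
When concatenating, the order of the 40 files matters: a plain lexicographic sort places `..._100000_110000.pt` before `..._80000_90000.pt`. The sketch below orders file names by their numeric start index and checks that consecutive ranges join without gaps. The helper `order_pi_files` is hypothetical, and treating `{start}` and `{stop}` as integer row offsets is an inference from the file-name pattern, not something the notebook states.

```python
import re

PI_PATTERN = re.compile(r"pi_enroll_fixedphi_sex_(\d+)_(\d+)\.pt$")


def order_pi_files(names):
    """Return per-batch file names sorted by numeric start, verifying contiguity."""
    spans = []
    for name in names:
        m = PI_PATTERN.search(name)
        if m:  # skips FULL.pt and anything else that does not match
            spans.append((int(m.group(1)), int(m.group(2)), name))
    spans.sort()
    for (_, prev_stop, _), (start, _, name) in zip(spans, spans[1:]):
        if start != prev_stop:
            raise ValueError(f"gap or overlap before {name}")
    return [name for _, _, name in spans]


names = [
    "pi_enroll_fixedphi_sex_100000_110000.pt",
    "pi_enroll_fixedphi_sex_FULL.pt",
    "pi_enroll_fixedphi_sex_80000_90000.pt",
    "pi_enroll_fixedphi_sex_90000_100000.pt",
]
ordered = order_pi_files(names)
print(ordered)
```

Tensors loaded in this order could then be joined with `torch.cat(..., dim=0)` before saving FULL.pt, assuming the first axis of each π tensor indexes individuals.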

In [None]:
NOSLOPE_PI_DIR = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_nokappa_v3_loo_all40/')
full_pt = NOSLOPE_PI_DIR / 'pi_enroll_fixedphi_sex_FULL.pt'
print(f'No-slope LOO FULL.pt exists: {full_pt.exists()}')
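if full_pt.exists():
    import torch
    # map_location='cpu' avoids needing the device the file was saved from;
    # weights_only=False unpickles arbitrary objects, so only load trusted files.
    pi = torch.load(full_pt, map_location='cpu', weights_only=False)
    print(f'  Shape: {pi.shape}')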

---
## Part B: Slope (1-phase) pipeline

| Step | Script / Output |
|------|-----------------|
| **1. Training** | `train_slopes_single_phase.py` → `slope_model_nokappa_v3_single_phase/slope_model_batch_{start}_{stop}.pt` (40 batches) |
| **2. LOO predict** | `run_loo_slope_1phase_all40.py` → `enrollment_predictions_slope_1phase_loo_all40/pi_enroll_fixedphi_sex_{start}_{stop}.pt` (40 files) |
| **3. Pool + test** | `pool_and_evaluate_slope_1phase_loo.py` → FULL.pt + AUCs in `results_slope_1phase_loo/` |
| **4. Value figures** | `slope_value_analyses.py --loo` → `results_holdout_auc/*_loo.pdf` (gamma_slope viz, age-stratified 1yr AUC, individual trajectories) |

### B1. Slope training (single-phase, all 40 batches)

In [None]:
run_cmd(
    'python train_slopes_single_phase.py --start_batch 0 --end_batch 40',
    'Slope 1-phase training on batches 0–39 (output: slope_model_nokappa_v3_single_phase/)'
)

In [None]:
# Check slope checkpoints
SLOPE_CKPT = Path('/Users/sarahurbut/Library/CloudStorage/Dropbox/slope_model_nokappa_v3_single_phase/')
slope_files = sorted(SLOPE_CKPT.glob('slope_model_batch_*.pt')) if SLOPE_CKPT.exists() else []
print(f'Slope single-phase checkpoints: {len(slope_files)} files')
if slope_files:
    print(f'  Example: {slope_files[0].name}')

### B2. Slope LOO prediction

For each batch *i*: pool γ_level, γ_slope, ψ, ε from the other 39 slope checkpoints; fit δ only on batch *i*; save π (same layout as no-slope LOO).

In [None]:
run_cmd(
    'python run_loo_slope_1phase_all40.py',
    'Slope LOO prediction (40 folds; use nohup for long run)'
)

### B3. Pool slope LOO π and compute AUCs

In [None]:
run_cmd(
    'python pool_and_evaluate_slope_1phase_loo.py',
    'Pool 40 slope π → FULL.pt and compute static/dynamic 10yr, static 1yr, rolling 1yr AUCs → results_slope_1phase_loo/'
)

In [None]:
# Optional: pool only (no AUC), or eval only if FULL.pt already exists
# run_cmd('python pool_and_evaluate_slope_1phase_loo.py --pool-only', 'Pool only')
# run_cmd('python pool_and_evaluate_slope_1phase_loo.py --eval-only --n_bootstraps 100', 'Eval only')
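
For orientation, an AUC with a bootstrap interval can be computed as in the sketch below. It is a generic standard-library illustration (pairwise AUC plus a percentile bootstrap), not the evaluation code in `pool_and_evaluate_slope_1phase_loo.py`. The function names, the toy data, and the correspondence between `n_bootstraps` here and the script's `--n_bootstraps` flag are assumptions.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_bootstraps=100, seed=0):
    """Percentile 95% interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_bootstraps:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that are missing a class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    last = len(stats) - 1
    return stats[int(0.025 * last)], stats[int(0.975 * last)]


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
point = auc(scores, labels)
low, high = bootstrap_auc(scores, labels, n_bootstraps=100)
print(f"AUC = {point:.3f}, 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

The pairwise loop is quadratic in the number of individuals, so it suits small checks like this one; on the 400k pooled predictions a rank-based implementation would be the practical choice.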

### B4. Value figures (slope vs no-slope on LOO 400k)

Generates the illustrative slope vs no-slope figures from the **LOO** predictions:
- `gamma_slope_visualization_loo.pdf`
- `age_stratified_1yr_auc_loo.pdf` (+ CSV)
- `individual_trajectories_loo.pdf`

In [None]:
run_cmd(
    'python slope_value_analyses.py --loo',
    'Slope value analyses (LOO): gamma_slope viz, age-stratified 1yr AUC, individual trajectories'
)

---
## Summary: one-shot order

Starting from the existing no-slope checkpoints (no-slope training already done), run the steps in this order:

1. **No-slope LOO**: `python run_loo_prediction_all40.py` → then pool to FULL.pt and run `nokappa_v3_auc_evaluation.ipynb`.
2. **Slope training**: `python train_slopes_single_phase.py --start_batch 0 --end_batch 40`.
3. **Slope LOO**: `python run_loo_slope_1phase_all40.py`.
4. **Slope pool + AUC**: `python pool_and_evaluate_slope_1phase_loo.py`.
5. **Value figures**: `python slope_value_analyses.py --loo`.

Outputs:
- No-slope AUCs: `results_feb18/`
- Slope AUCs: `results_slope_1phase_loo/`
- Slope vs no-slope figures: `results_holdout_auc/*_loo.pdf`

In [None]:
# Quick check: key output dirs
for name, p in [
    ('results_feb18', CLAUDE_DIR / 'results_feb18'),
    ('results_slope_1phase_loo', CLAUDE_DIR / 'results_slope_1phase_loo'),
    ('results_holdout_auc', CLAUDE_DIR / 'results_holdout_auc'),
]:
    exists = p.exists()
    files = list(p.glob('*')) if exists else []
    print(f'{name}: {"exists" if exists else "missing"} ({len(files)} items)')