# Aladynoulli Performance Evaluation Notebook

This notebook orchestrates all **performance evaluation** analyses for Aladynoulli (AUC, calibration, comparisons with external scores).

**Note**: Model validity and learning analyses (prediction drops, washout validation, signature dynamics) have been moved to `reviewer_responses/notebooks/R2_R3_Model_Validity_Learning.ipynb`.

## Execution Order

1. **Setup** (Run Once): Assemble PI tensors
2. **Generate Predictions** (Run Once Each): Time horizons, washout, age offsets
3. **Load Results** (Safe to Run Multiple Times): Load CSV files
4. **Comparisons** (Run Once Each): External scores, Delphi, Cox baseline
5. **Visualizations** (Safe to Run Multiple Times): Create plots
6. **Summary** (Read-Only): Key findings

## Main Approach: Pooled Retrospective

All comparisons use the `pooled_retrospective` approach by default, which:
- Uses phi trained externally and validated with LOO tests
- Represents clinically implementable performance
- Uses pi from `/Users/sarahurbut/Downloads/pi_full_400k.pt` directly (the assembled `enrollment_predictions_fixedphi_RETROSPECTIVE_pooled/pi_enroll_fixedphi_sex_FULL.pt` is not needed)

## Note on Re-running

- Jupyter itself does not skip executed cells; the generation scripts check for existing result files and skip regeneration when they are present
- Generation cells should be marked as "not evaluated" after the first run
- Loading and visualization cells can be run multiple times safely
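The skip behavior the generation scripts report ("RESULTS ALREADY EXIST - SKIPPING REGENERATION") amounts to a file-existence check. A minimal sketch, assuming a script regenerates only when at least one expected CSV is missing; `missing_results` and `should_regenerate` are hypothetical helpers, not functions from the pipeline scripts.

```python
from pathlib import Path
import tempfile


def missing_results(output_dir, expected_files):
    """Return the expected result files that are not present in output_dir."""
    output_dir = Path(output_dir)
    return [name for name in expected_files if not (output_dir / name).exists()]


def should_regenerate(output_dir, expected_files):
    """Regenerate only if at least one expected result file is missing."""
    return len(missing_results(output_dir, expected_files)) > 0


# Example with the washout file names printed by the washout step
expected = ['washout_0yr_results.csv', 'washout_1yr_results.csv',
            'washout_2yr_results.csv', 'washout_comparison_all_offsets.csv']
with tempfile.TemporaryDirectory() as d:
    print(should_regenerate(d, expected))   # True: nothing has been written yet
    for name in expected:
        (Path(d) / name).write_text('placeholder\n')
    print(should_regenerate(d, expected))   # False: every expected file exists
```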


---

## SECTION 1: SETUP & DATA PREPARATION

**Note**: For `pooled_retrospective`, we use `pi_full_400k.pt` from Downloads directly.
Assembly step is NOT needed - the downloaded file is the trusted source.


In [1]:
# ============================================================================
# NOTE: ASSEMBLY NOT NEEDED FOR POOLED_RETROSPECTIVE
# ============================================================================
"""
SKIP THIS CELL - Assembly is NOT needed for pooled_retrospective approach.

For pooled_retrospective, we use pi_full_400k.pt from Downloads directly.
This is the trusted, cached version from Dropbox - do NOT reassemble.

The assembly step would create:
  - enrollment_predictions_fixedphi_RETROSPECTIVE_pooled/pi_enroll_fixedphi_sex_FULL.pt
  
But the scripts use:
  - /Users/sarahurbut/Downloads/pi_full_400k.pt (directly)

So assembly is unnecessary and should NOT be run.
"""

print("="*80)
print("ASSEMBLY STEP SKIPPED")
print("="*80)
print("\nFor pooled_retrospective, using pi_full_400k.pt from Downloads directly.")
print("This is the trusted, cached version - no assembly needed.")
print("\nProceeding to generation steps...")
print("="*80)


ASSEMBLY STEP SKIPPED

For pooled_retrospective, using pi_full_400k.pt from Downloads directly.
This is the trusted, cached version - no assembly needed.

Proceeding to generation steps...


---

## SECTION 2: GENERATE PREDICTIONS

**Run Once Each**: Generate predictions for different analyses

**Note**: These cells reuse existing results when the result files are present. A script regenerates only after its existing result files are deleted.
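Each generation cell repeats the same `subprocess.run` pattern. A sketch of factoring it into one helper; `run_script` is a hypothetical name, and it keeps the cells' behavior: print stdout, print stderr under `STDERR:`, and warn on a non-zero return code.

```python
import subprocess
import sys


def run_script(script_path, *args):
    """Run a Python script in a subprocess and report its output like the generation cells."""
    result = subprocess.run(
        [sys.executable, str(script_path), *[str(a) for a in args]],
        capture_output=True, text=True,
    )
    print(result.stdout)
    if result.stderr:
        print("STDERR:", result.stderr)
    if result.returncode != 0:
        print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")
    return result


# Hypothetical usage, equivalent to Step 1:
# from pathlib import Path
# script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/pythonscripts')
# run_script(script_dir / 'generate_time_horizon_predictions.py', '--approach', 'pooled_retrospective')
```

Because the helper returns the `CompletedProcess`, a cell can still inspect `returncode` or `stdout` after the call.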


In [2]:
# ============================================================================
# STEP 1: GENERATE TIME HORIZON PREDICTIONS
# ============================================================================
"""
Generates predictions for multiple time horizons: 5yr, 10yr, 30yr, static 10yr
Results saved to: results/time_horizons/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING TIME HORIZON PREDICTIONS")
print("="*80)

# Generate for pooled_retrospective (main approach)
print("\n1. Generating pooled_retrospective time horizon predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_time_horizon_predictions.py'),
    '--approach', 'pooled_retrospective'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

print("\n✓ Time horizon predictions complete")


GENERATING TIME HORIZON PREDICTIONS

1. Generating pooled_retrospective time horizon predictions...
GENERATING TIME HORIZON PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Downloads/pi_full_400k.pt
Horizons: ['5', '10', '30', 'static10']
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/time_horizons/pooled_retrospective

Loading data...
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded E tensor: torch.Size([407878, 348])
Loaded pce_df: 400000 patients

Subsetting to first 400000 patients...
After subsetting: pi: 400000, Y: 400000, E: 400000, pce_df: 400000

RESULTS ALREADY EXIST - SKIPPING REGENERATION
Found existing results in: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/time_horizons/pooled_retrospective
  ✓ 5yr_results.csv
  ✓ 10yr_results.csv
  ✓ 30yr_results.csv
  ✓ static_10yr_results.csv
  ✓ comparison_all_horizons.csv

To regene

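The horizons differ in how the pi trajectory is turned into a score. A toy illustration, under an assumption the notebook does not state: the dynamic N-year risk aggregates yearly probabilities as 1 - ∏(1 - π_t) over the horizon. The static score follows the notebook's own description of static 10-year as a 1-year score used for a 10-year outcome. The 52 time points match the last dimension of the pi tensor; `dynamic_risk` and `static_score` are hypothetical names.

```python
import numpy as np


def dynamic_risk(pi_traj, t0, horizon):
    """Cumulative risk over [t0, t0 + horizon): 1 - prod(1 - pi_t) (assumed aggregation)."""
    window = np.asarray(pi_traj)[t0:t0 + horizon]
    return 1.0 - np.prod(1.0 - window)


def static_score(pi_traj, t0):
    """Static score: the 1-year risk at t0, used unchanged for a longer-horizon outcome."""
    return np.asarray(pi_traj)[t0]


pi_traj = np.full(52, 0.01)    # toy trajectory: constant 1% yearly risk over 52 time points
print(round(float(dynamic_risk(pi_traj, t0=10, horizon=10)), 4))   # 0.0956  (1 - 0.99**10)
print(float(static_score(pi_traj, t0=10)))                         # 0.01
```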
In [6]:
# ============================================================================
# GENERATE 30-YEAR PREDICTIONS WITH VARIABLE FOLLOW-UP
# ============================================================================
"""
Generates 30-year predictions using variable follow-up (uses max available for each patient)
Results saved to: results/time_horizons/pooled_retrospective/30yr_variable_followup_results.csv
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING 30-YEAR PREDICTIONS WITH VARIABLE FOLLOW-UP")
print("="*80)

result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_time_horizon_predictions.py'),
    '--approach', 'pooled_retrospective',
    '--horizons', '30',
    '--use_variable_followup',
    '--n_bootstraps', '10'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

print("\n✓ 30-year variable follow-up predictions complete")

GENERATING 30-YEAR PREDICTIONS WITH VARIABLE FOLLOW-UP
GENERATING TIME HORIZON PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Downloads/pi_full_400k.pt
Horizons: ['30']
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/time_horizons/pooled_retrospective

Loading data...
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded E tensor: torch.Size([407878, 348])
Loaded pce_df: 400000 patients

Subsetting to first 400000 patients...
After subsetting: pi: 400000, Y: 400000, E: 400000, pce_df: 400000

PROCESSING HORIZON: 30
Evaluating dynamic 30-year predictions...
  Using variable follow-up (max available follow-up per patient)...

Evaluating ASCVD (Dynamic 30-Year Risk, Variable Follow-up)...
AUC: 0.532 (0.531-0.533) (calculated on 400000 individuals)
Events (30-Year in Eval Cohort): 48207 (12.1%) (from 400000 individuals)
Follow-up length: mean=22.5yr, median=22.0yr, range=[8.0, 

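A minimal sketch of variable follow-up, assuming it means the 30-year outcome window is cut off at each patient's last observed time point (consistent with the reported follow-up range starting at 8 years). `variable_followup_labels` and `t_max` are hypothetical; the script's actual censoring logic is not shown here.

```python
import numpy as np


def variable_followup_labels(Y, t0, horizon, t_max):
    """Event labels over a horizon window cut short at each patient's last observed time.

    Y:     (n_patients, n_timepoints) 0/1 event matrix for one disease or disease group
    t0:    (n_patients,) prediction time index per patient
    t_max: (n_patients,) end of available follow-up per patient (exclusive index)
    """
    end = np.minimum(t0 + horizon, t_max)   # per-patient window end
    followup = end - t0                     # years actually observed
    labels = np.array([Y[i, t0[i]:end[i]].any() for i in range(Y.shape[0])], dtype=bool)
    return labels, followup


Y = np.zeros((3, 52), dtype=int)
Y[0, 20] = 1    # event inside the 30-year window
Y[1, 45] = 1    # event after this patient's follow-up has ended
t0 = np.array([10, 10, 10])
t_max = np.array([52, 30, 52])
labels, followup = variable_followup_labels(Y, t0, horizon=30, t_max=t_max)
print(labels.tolist(), followup.tolist())   # [True, False, False] [30, 20, 30]
```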
In [3]:
# ============================================================================
# STEP 2: GENERATE WASHOUT PREDICTIONS
# ============================================================================
"""
Generates 1-year predictions with different washout periods: 0yr, 1yr, 2yr
Results saved to: results/washout/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks')

print("="*80)
print("GENERATING WASHOUT PREDICTIONS")
print("="*80)

# Generate for pooled_retrospective (main approach)
print("\n1. Generating pooled_retrospective washout predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_washout_predictions.py'),
    '--approach', 'pooled_retrospective'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

print("\n✓ Washout predictions complete")


GENERATING WASHOUT PREDICTIONS

1. Generating pooled_retrospective washout predictions...


GENERATING WASHOUT PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Downloads/pi_full_400k.pt
Washout periods: 0yr, 1yr, 2yr
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/washout/pooled_retrospective

Loading data...
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded E tensor: torch.Size([407878, 348])
Loaded pce_df: 400000 patients

Subsetting to first 400000 patients...
After subsetting: pi: 400000, Y: 400000, E: 400000, pce_df: 400000

RESULTS ALREADY EXIST - SKIPPING REGENERATION
Found existing results in: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/washout/pooled_retrospective
  ✓ washout_0yr_results.csv
  ✓ washout_1yr_results.csv
  ✓ washout_2yr_results.csv
  ✓ washout_comparison_all_offsets.csv

To regenerate, delete the existing result files first.


✓ Washout predictions complete

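Washout separates the time a score is computed from the year whose outcome it is scored against. A sketch, assuming each time index is one year and the score stays fixed at t0 while the 1-year outcome window shifts by the washout period; `washout_outcome` is hypothetical, and whether the scripts also exclude patients with events during the gap is not shown here.

```python
import numpy as np


def washout_outcome(Y, t0, washout):
    """1-year outcome observed `washout` years after the prediction time t0.

    Y:  (n_patients, n_timepoints) 0/1 event matrix for one disease
    t0: (n_patients,) time index at which the score is computed
    Only events at index t0 + washout count; events during the gap are ignored here.
    """
    idx = np.asarray(t0) + washout
    return Y[np.arange(Y.shape[0]), idx].astype(bool)


Y = np.zeros((2, 52), dtype=int)
Y[0, 10] = 1    # event in the year of prediction
Y[1, 11] = 1    # event one year later
t0 = np.array([10, 10])
for w in (0, 1, 2):
    print(f"washout {w}yr:", washout_outcome(Y, t0, w).tolist())
# washout 0yr: [True, False]
# washout 1yr: [False, True]
# washout 2yr: [False, False]
```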

In [4]:
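# ============================================================================
# COMPARE AGE OFFSET vs WASHOUT (0-YEAR PREDICTIONS)
# ============================================================================
"""
Checks whether the age-offset pi (offset 0, batch 0-10000) matches the same slice
of pi_full_400k.pt, then compares the evaluation results from both pipelines.
"""

%run compare_age_offset_washout.py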

COMPARING AGE OFFSET vs WASHOUT (0-year predictions)
COMPARING PI TENSORS
Loading age_offset pi: /Users/sarahurbut/Library/CloudStorage/Dropbox/age_offset_files/pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun.pt
Loading washout pi: /Users/sarahurbut/Downloads/pi_full_400k.pt

Age offset pi shape: torch.Size([10000, 348, 52])
Washout pi (batch) shape: torch.Size([10000, 348, 52])
❌ Pi tensors differ!
   Max difference: 3.670473e-04
   Mean difference: 1.848679e-08

COMPARING EVALUATION RESULTS
Loading full data tensors...
Subsetting to batch 0-10000...



Running age_offset evaluation with batch-specific pi (offset 0)...
PCE available: True, PREVENT available: True
Patients with valid PCE scores: 10000/10000 (100.0%)
Patients with valid PREVENT scores: 10000/10000 (100.0%)
Offset 0: Included patients: 10000, 1-year Events: 78
  PCE patients: 10000, PREVENT patients: 10000, 10-year Events: 831

Running age_offset evaluation with full pi tensor subsetted (offset 0)...
PCE available: True, PREVENT available: True
Patients with valid PCE scores: 10000/10000 (100.0%)
Patients with valid PREVENT scores: 10000/10000 (100.0%)
Offset 0: Included patients: 10000, 1-year Events: 78
  PCE patients: 10000, PREVENT patients: 10000, 10-year Events: 831

Running washout evaluation (washout 0yr) - different eval function...
Filtering for 1: Found 4591 individuals in cohort

Summary of Results (1-Year Risk, Sex-Adjusted, Offset=0):
--------------------------------------------------------------------------------
Disease Group        AUC                  

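A large gap between the maximum difference (3.67e-04) and the mean difference (1.85e-08) above is consistent with the two pi tensors agreeing almost everywhere, with a few isolated discrepancies. A NumPy sketch of this kind of summary (torch offers `torch.equal`, `torch.allclose`, and `(a - b).abs()` for the same checks); `compare_arrays` and the `atol` default are assumptions, not the comparison script's code.

```python
import numpy as np


def compare_arrays(a, b, atol=1e-6):
    """Summarize elementwise differences between two prediction arrays of the same shape."""
    if a.shape != b.shape:
        raise ValueError(f"Shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    return {
        'identical': bool(np.array_equal(a, b)),
        'close': bool(np.allclose(a, b, rtol=0.0, atol=atol)),
        'max_diff': float(diff.max()),
        'mean_diff': float(diff.mean()),
    }


a = np.zeros((2, 3, 4))
b = a.copy()
b[0, 0, 0] = 3.67e-4    # one isolated discrepancy, similar in size to the max difference above
print(compare_arrays(a, b))
```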
In [5]:
# ============================================================================
# STEP 2B: GENERATE RETROSPECTIVE WASHOUT ALL HORIZONS (RUN ONCE, THEN MARK AS "NOT EVALUATED")
# ============================================================================
"""
IMPORTANT: This cell generates 10-year, 30-year, and static 10-year predictions with washout for pooled_retrospective.
- Generates dynamic 10-year and 30-year predictions with 1-year washout
- Generates static 10-year predictions (1-year score) with 1-year washout
- Processes ALL 400K patients at once using pre-computed pi tensors
- Uses _from_pi versions: evaluate_major_diseases_wsex_with_bootstrap_dynamic_withwashout_from_pi() and 
  evaluate_major_diseases_wsex_with_bootstrap_withwashout_from_pi()
- Only runs for pooled_retrospective approach (main clinically implementable approach)
- Run once, then mark as "not evaluated"
- Script: generate_washout_time_horizons.py
- Results saved to: results/washout_time_horizons/pooled_retrospective/
"""

import subprocess
import sys
from pathlib import Path

# Set script directory
script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks')

print("="*80)
print("GENERATING RETROSPECTIVE WASHOUT ALL HORIZONS")
print("="*80)
print("\nThis will generate:")
print("  - 10-year dynamic predictions with 1-year washout")
print("  - 30-year dynamic predictions with 1-year washout")
print("  - Static 10-year predictions (1-year score) with 1-year washout")
print("Approach: pooled_retrospective only")
print("Processing: ALL 400K patients at once")
print("\nNOTE: Run once, then mark this cell as 'not evaluated'.")
print("="*80)

# Generate washout predictions for all horizons
print("\nGenerating pooled_retrospective washout predictions for all horizons...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_washout_time_horizons.py'),
    '--n_bootstraps', '100',
    '--washout_years', '1'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")
else:
    print("✓ pooled_retrospective washout predictions completed successfully")

print("\n" + "="*80)
print("RETROSPECTIVE WASHOUT PREDICTIONS COMPLETE")
print("="*80)
print("\nResults saved to: results/washout_time_horizons/pooled_retrospective/")
print("  - washout_1yr_10yr_dynamic_results.csv")
print("  - washout_1yr_30yr_dynamic_results.csv")
print("  - washout_1yr_10yr_static_results.csv")
print("  - washout_1yr_comparison_all_horizons.csv")


GENERATING RETROSPECTIVE WASHOUT ALL HORIZONS

This will generate:
  - 10-year dynamic predictions with 1-year washout
  - 30-year dynamic predictions with 1-year washout
  - Static 10-year predictions (1-year score) with 1-year washout
Approach: pooled_retrospective only
Processing: ALL 400K patients at once

NOTE: Run once, then mark this cell as 'not evaluated'.

Generating pooled_retrospective washout predictions for all horizons...


GENERATING WASHOUT TIME HORIZON PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Downloads/pi_full_400k.pt
Washout period: 1 years
Horizons: 10yr, 30yr
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/washout_time_horizons/pooled_retrospective

Will process 400,000 patients (standard size for pi_full_400k.pt)

RESULTS ALREADY EXIST - SKIPPING REGENERATION
Found existing results in: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/washout_time_horizons/pooled_retrospective
  ✓ washout_1yr_10yr_dynamic_results.csv
  ✓ washout_1yr_30yr_dynamic_results.csv
  ✓ washout_1yr_10yr_static_results.csv
  ✓ washout_1yr_comparison_all_horizons.csv

To regenerate, delete the existing result files first.

✓ pooled_retrospective washout predictions completed successfully

RETROSPECTIVE WASHOUT PREDICTIONS COMPLETE

Results saved to: results/washout_time_horizons/pooled_retrospective/
  - washout_1yr_10yr_dyna

In [6]:
# ============================================================================
# STEP 3: GENERATE AGE OFFSET PREDICTIONS (OPTIONAL)
# ============================================================================
"""
Generates rolling 1-year predictions using models trained at different time offsets
Evaluates models trained at enrollment + 0, 1, 2, ..., 9 years (offsets 0-9)
Uses pre-computed pi batches from AWS run (downloaded to ~/Dropbox/age_offset_files/)
Currently evaluates on batch 0-10000 (first 10K patients)
Results saved to: results/age_offset/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks')

print("="*80)
print("GENERATING AGE OFFSET PREDICTIONS")
print("="*80)
print("\nThis will generate rolling 1-year predictions using models trained at")
print("enrollment + 0, 1, 2, ..., 9 years (offsets 0-9).")
print("\nUses pre-computed pi batches from AWS run.")
print("Currently evaluates on batch 0-10000 (first 10K patients).")
print("="*80)

# Generate age offset predictions for pooled retrospective (main approach)
print("\n1. Generating pooled_retrospective age offset predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_age_offset_predictions.py'),
    '--approach', 'pooled_retrospective',
    '--max_offset', '9',
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")
else:
    print("✓ Age offset predictions complete")

print("\n" + "="*80)
print("AGE OFFSET PREDICTIONS COMPLETE")
print("="*80)
print("\nResults saved to: results/age_offset/pooled_retrospective/")
print("  - age_offset_aucs_batch_0_10000.csv (AUCs by disease and offset)")
print("  - age_offset_aucs_pivot_batch_0_10000.csv (pivot table for easy viewing)")
print("  - age_offset_aucs_summary_batch_0_10000.csv (summary stats: mean, median, std, min, max, count across offsets)")
print("\nROC curves are plotted for ASCVD by default (saved as PDF).")


GENERATING AGE OFFSET PREDICTIONS

This will generate rolling 1-year predictions using models trained at
enrollment + 0, 1, 2, ..., 9 years (offsets 0-9).

Uses pre-computed pi batches from AWS run.
Currently evaluates on batch 0-10000 (first 10K patients).

1. Generating pooled_retrospective age offset predictions...


GENERATING AGE OFFSET PREDICTIONS
Approach: pooled_retrospective
Batch: 0-10000
Max offset: 9
Loading full data tensors...
Subsetting to batch 0-10000...

Loading pi batches for offsets 0-9...
  Loading offset 0: pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 1: pi_enroll_fixedphi_age_offset_1_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 2: pi_enroll_fixedphi_age_offset_2_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 3: pi_enroll_fixedphi_age_offset_3_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 4: pi_enroll_fixedphi_age_offset_4_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 5: pi_enroll_fixedphi_age_offset_5_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 6: pi_enroll_fixedphi_age_offset_6_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 7: pi_enroll_fixedphi_age_offset_7_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 8: pi_enroll_fixedphi_age_offset_8_sex_0_10000_try2_withpcs_newrun.pt
  Loading offset 9: pi_e

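The pivot and summary files listed above can be derived from a long-format AUC table. A pandas sketch, assuming columns named `disease`, `offset`, and `auc` (the real column names in `age_offset_aucs_batch_0_10000.csv` are not shown here); the AUC values are made up for illustration.

```python
import pandas as pd

# Made-up illustrative AUCs in long format: one row per (disease, offset)
aucs = pd.DataFrame({
    'disease': ['ASCVD', 'ASCVD', 'ASCVD', 'CKD', 'CKD', 'CKD'],
    'offset':  [0, 1, 2, 0, 1, 2],
    'auc':     [0.90, 0.86, 0.84, 0.80, 0.78, 0.77],
})

# Pivot: one row per disease, one column per offset
pivot = aucs.pivot(index='disease', columns='offset', values='auc')

# Summary statistics across offsets for each disease
summary = pd.DataFrame({
    'mean': pivot.mean(axis=1),
    'median': pivot.median(axis=1),
    'std': pivot.std(axis=1),
    'min': pivot.min(axis=1),
    'max': pivot.max(axis=1),
    'count': pivot.count(axis=1),
})
print(pivot)
print(summary.round(3))
```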
---

## SECTION 3: LOAD RESULTS

**Safe to Run Multiple Times**: Loads CSV files without regenerating


In [7]:
# ============================================================================
# LOAD GENERATED RESULTS
# ============================================================================
"""
Loads all generated CSV files into dictionaries for analysis.
Safe to run multiple times - reloads results without regenerating.
"""

import pandas as pd
from pathlib import Path

results_base = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results')
approach = 'pooled_retrospective'

print("="*80)
print("LOADING GENERATED RESULTS")
print("="*80)

# Load time horizon results
time_horizon_results = {}
time_horizon_dir = results_base / 'time_horizons' / approach
if time_horizon_dir.exists():
    for horizon in ['5yr', '10yr', '30yr', 'static_10yr']:
        file_path = time_horizon_dir / f'{horizon}_results.csv'
        if file_path.exists():
            time_horizon_results[horizon] = pd.read_csv(file_path)
            print(f"✓ Loaded {horizon} results: {len(time_horizon_results[horizon])} diseases")
        else:
            print(f"⚠️  {horizon} results not found: {file_path}")
else:
    print(f"⚠️  Time horizon directory not found: {time_horizon_dir}")

# Load washout results
washout_results = {}
washout_dir = results_base / 'washout' / approach
if washout_dir.exists():
    for offset in ['0yr', '1yr', '2yr']:
        file_path = washout_dir / f'washout_{offset}_results.csv'
        if file_path.exists():
            washout_results[offset] = pd.read_csv(file_path)
            print(f"✓ Loaded washout {offset} results: {len(washout_results[offset])} diseases")
        else:
            print(f"⚠️  Washout {offset} results not found: {file_path}")
else:
    print(f"⚠️  Washout directory not found: {washout_dir}")

# Load age offset results (optional)
age_offset_results = {}
age_offset_dir = results_base / 'age_offset' / approach
if age_offset_dir.exists():
    pivot_file = age_offset_dir / 'age_offset_aucs_pivot_batch_0_10000.csv'
    if pivot_file.exists():
        age_offset_results['pivot'] = pd.read_csv(pivot_file)
        print(f"✓ Loaded age offset pivot results: {len(age_offset_results['pivot'])} diseases")
    else:
        print(f"⚠️  Age offset pivot results not found: {pivot_file}")

    aucs_file = age_offset_dir / 'age_offset_aucs_batch_0_10000.csv'
    if aucs_file.exists():
        age_offset_results['aucs'] = pd.read_csv(aucs_file)
        print("✓ Loaded age offset AUCs results")
    else:
        print(f"⚠️  Age offset AUCs results not found: {aucs_file}")
else:
    print(f"⚠️  Age offset directory not found: {age_offset_dir} (optional)")

print("\n" + "="*80)
print("RESULTS LOADED")
print("="*80)


LOADING GENERATED RESULTS
✓ Loaded 5yr results: 28 diseases
✓ Loaded 10yr results: 28 diseases
✓ Loaded 30yr results: 28 diseases
✓ Loaded static_10yr results: 28 diseases
✓ Loaded washout 0yr results: 28 diseases
✓ Loaded washout 1yr results: 28 diseases
✓ Loaded washout 2yr results: 28 diseases
✓ Loaded age offset pivot results: 28 diseases
✓ Loaded age offset AUCs results

RESULTS LOADED

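A natural next step is to line the loaded horizons up side by side. A sketch, assuming each CSV is read with `pd.read_csv(path, index_col=0)` so diseases form the index, and that the AUC column is named `AUC`; both are assumptions, and `combine_auc_tables` is a hypothetical helper. The toy tables stand in for `time_horizon_results`.

```python
import pandas as pd


def combine_auc_tables(results, auc_col='AUC'):
    """Stack per-horizon tables (indexed by disease) into one disease-by-horizon AUC table."""
    return pd.DataFrame({name: df[auc_col] for name, df in results.items()})


# Made-up illustrative tables standing in for the loaded CSVs
toy_results = {
    '5yr': pd.DataFrame({'AUC': [0.75, 0.70]}, index=['ASCVD', 'CKD']),
    '10yr': pd.DataFrame({'AUC': [0.72, 0.69]}, index=['ASCVD', 'CKD']),
}
wide = combine_auc_tables(toy_results)
print(wide)
```

With the real files, `combine_auc_tables(time_horizon_results)` works only if both assumptions above hold.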

---

## SECTION 4: COMPARISONS & VALIDATIONS

**Run Once Each**: Compare with external benchmarks and analyze results


In [8]:
# ============================================================================
# COMPARE WITH EXTERNAL SCORES
# ============================================================================
"""
Compares Aladynoulli with PCE, PREVENT, Gail, QRISK3
Results saved to: results/comparisons/pooled_retrospective/external_scores_comparison.csv
"""

%run compare_with_external_scores.py --approach pooled_retrospective --n_bootstraps 100


Set random seed to 42 for reproducibility
RESULTS ALREADY EXIST - SKIPPING REGENERATION
Found existing results: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/comparisons/pooled_retrospective/external_scores_comparison.csv

To regenerate, delete the existing result file first.

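The external-score comparison is run with `--n_bootstraps 100`, and its CSV stores AUCs with CI bounds (for example `Aladynoulli_CI_lower`, `PCE_CI_upper`, and `Difference`). A sketch of a paired bootstrap for an AUC difference, assuming a rank-based (Mann-Whitney) AUC and percentile intervals; the script's actual resampling and interval method are not shown in this notebook, and both function names are hypothetical.

```python
import numpy as np


def auc(y, s):
    """Rank-based (Mann-Whitney) ROC AUC; tied scores receive average ranks."""
    y = np.asarray(y, dtype=bool)
    s = np.asarray(s, dtype=float)
    order = np.argsort(s, kind='mergesort')
    _, first, counts = np.unique(s[order], return_index=True, return_counts=True)
    ranks = np.empty(len(s))
    ranks[order] = np.repeat(first + (counts + 1) / 2.0, counts)   # 1-based average ranks
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_diff(y, s_a, s_b, n_bootstraps=100, seed=42):
    """Paired bootstrap: resample patients, recompute both AUCs, take a percentile 95% CI."""
    y, s_a, s_b = np.asarray(y, dtype=bool), np.asarray(s_a), np.asarray(s_b)
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_bootstraps):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].all() or not y[idx].any():
            continue    # a resample with only one class has no defined AUC
        diffs.append(auc(y[idx], s_a[idx]) - auc(y[idx], s_b[idx]))
    lower, upper = np.percentile(diffs, [2.5, 97.5])
    return auc(y, s_a) - auc(y, s_b), (lower, upper)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

The seed of 42 mirrors the seed the script reports. The rank-based AUC costs one sort per call, so it remains practical at the cohort sizes used here.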


In [9]:
# ============================================================================
# COMPARE WITH DELPHI (1-YEAR)
# ============================================================================
"""
Compares Aladynoulli 1-year predictions with Delphi-2M 1-year predictions
Results saved to: results/comparisons/pooled_retrospective/delphi_comparison_*.csv
"""

%run compare_delphi_1yr_import.py


ALADYNOULLI vs DELPHI: 1-YEAR PREDICTION COMPARISON

Loading Aladynoulli results...
Loaded Aladynoulli results for 28 diseases

Extracting Delphi results from supplementary table...
Extracted Delphi results for 28 diseases

Creating comparison...

ALADYNOULLI vs DELPHI: DISEASES WHERE ALADYNOULLI WINS (1-YEAR, 0-YEAR GAP)

Total wins: 15 out of 28 diseases
Win rate: 53.6%

Disease                    Aladynoulli       Delphi    Advantage    Percent
----------------------------------------------------------------------------------------------------
Parkinsons                      0.8265       0.6108       0.2157      35.3%
Multiple_Sclerosis              0.8363       0.6545       0.1818      27.8%
Prostate_Cancer                 0.8434       0.6636       0.1798      27.1%
ASCVD                           0.8961       0.7370       0.1591      21.6%
Atrial_Fib                      0.8177       0.6721       0.1456      21.7%
Breast_Cancer                   0.8212       0.6985       0.1227   

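The Percent column is consistent with the advantage expressed relative to Delphi's AUC, (Aladynoulli - Delphi) / Delphi; for Parkinsons, 0.2157 / 0.6108 ≈ 35.3%. A pandas reconstruction using three rows copied from the table above (the script's own computation is not shown, so this is an inference from the printed numbers):

```python
import pandas as pd

# AUCs copied from the first three rows of the 1-year, 0-gap table above
df = pd.DataFrame(
    {'Aladynoulli': [0.8265, 0.8363, 0.8434], 'Delphi': [0.6108, 0.6545, 0.6636]},
    index=['Parkinsons', 'Multiple_Sclerosis', 'Prostate_Cancer'],
)
df['Advantage'] = df['Aladynoulli'] - df['Delphi']
df['Percent'] = 100 * df['Advantage'] / df['Delphi']   # improvement relative to Delphi's AUC
df['Wins'] = df['Advantage'] > 0
print(df.round({'Advantage': 4, 'Percent': 1}))

# Win rate as reported above: 15 wins out of 28 diseases
print(f"Win rate: {100 * 15 / 28:.1f}%")   # Win rate: 53.6%
```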
In [10]:
# ============================================================================
# COMPARE MULTI-HORIZON WITH DELPHI
# ============================================================================
"""
Compares Aladynoulli multi-horizon predictions (5yr, 10yr, 30yr) with Delphi 1-year
Results saved to: results/comparisons/pooled_retrospective/delphi_comparison_multihorizon.csv
"""

%run compare_delphi_multihorizon.py


ALADYNOULLI vs DELPHI: MULTI-HORIZON COMPARISON

Loading Aladynoulli multi-horizon results...
Loaded Aladynoulli results for 28 diseases
Horizons available: 1yr (0gap), 1yr (1gap), 5yr, 10yr, 30yr, static10yr

Extracting Delphi results from supplementary table...
Extracted Delphi results for 28 diseases

Creating multi-horizon comparison...

ALADYNOULLI PERFORMANCE ACROSS HORIZONS vs DELPHI (1-YEAR, 0-GAP)

Disease                       Delphi    Ala_1yr    Ala_5yr   Ala_10yr   Ala_30yr Ala_st10yr
----------------------------------------------------------------------------------------------------
Parkinsons                    0.6108     0.8265     0.7320     0.7216     0.6222     0.7220
Secondary_Cancer              0.5691     0.6076     0.6112     0.6068     0.5664     0.6110
Prostate_Cancer               0.6636     0.8434     0.7305     0.6794     0.6741     0.6844
Osteoporosis                  0.7037     0.7728     0.7018     0.6891     0.6454     0.6763
ASCVD                       

In [11]:
# ============================================================================
# COMPARE WITH COX BASELINE
# ============================================================================
"""
Compares Aladynoulli static 10-year predictions with Cox baseline (age + sex only)
Results saved to: results/comparisons/pooled_retrospective/cox_baseline_comparison_*.csv
"""

%run compare_with_cox_baseline.py


ALADYNOULLI vs COX BASELINE (AGE + SEX ONLY, NO NOULLI)
Cox model: 10-year follow-up (trained on 20001-30000, tested on 0-10000)
Aladynoulli: Static 10-year (1-year score for 10-year outcome)

Loading Cox baseline results...
✓ Loaded Cox baseline results: 28 diseases
  Columns: ['Unnamed: 0', 'disease_group', 'auc']

First few rows:
    Unnamed: 0      Disease  Cox_AUC
0        ASCVD        ASCVD   0.6338
1     Diabetes     Diabetes   0.6003
2   Atrial_Fib   Atrial_Fib   0.5883
3          CKD          CKD   0.5292
4  All_Cancers  All_Cancers   0.5411

Loading Aladynoulli static 10-year results...
Note: Cox model uses 10-year follow-up, comparing with static 10-year (1-year score for 10-year outcome)
✓ Loaded Aladynoulli static 10-year results: 28 diseases

Merging results...
✓ Merged results: 28 diseases
  Diseases in both: 28
  Only in Cox: 0
  Only in Aladynoulli: 0

ALADYNOULLI vs COX BASELINE: STATIC 10-YEAR PREDICTIONS

Disease                      Cox_AUC Aladynoulli_AUC  Improve

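The merge diagnostics above (diseases in both, only in Cox, only in Aladynoulli) can be reproduced with pandas `merge(..., how='outer', indicator=True)`. A sketch using the Cox AUCs printed above and made-up Aladynoulli placeholders; the `Improvement` column name is an assumption. The `Unnamed: 0` column in the Cox file is what `pd.read_csv` produces when a CSV was written together with its index; reading with `index_col=0` avoids it.

```python
import pandas as pd

# Cox AUCs copied from the output above; Aladynoulli values are made-up placeholders
cox = pd.DataFrame({'Disease': ['ASCVD', 'Diabetes', 'CKD'],
                    'Cox_AUC': [0.6338, 0.6003, 0.5292]})
ala = pd.DataFrame({'Disease': ['ASCVD', 'Diabetes', 'Parkinsons'],
                    'Aladynoulli_AUC': [0.70, 0.68, 0.72]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = cox.merge(ala, on='Disease', how='outer', indicator=True)
print("Diseases in both:", (merged['_merge'] == 'both').sum())
print("Only in Cox:", (merged['_merge'] == 'left_only').sum())
print("Only in Aladynoulli:", (merged['_merge'] == 'right_only').sum())

both = merged[merged['_merge'] == 'both'].copy()
both['Improvement'] = both['Aladynoulli_AUC'] - both['Cox_AUC']
print(both[['Disease', 'Cox_AUC', 'Aladynoulli_AUC', 'Improvement']])
```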
In [12]:
# ============================================================================
# ANALYZE PREDICTION DROPS
# ============================================================================
"""
Analyzes why predictions drop between 0-year and 1-year washout
Focuses on hypercholesterolemia and other precursor diseases
Results saved to: results/analysis/prediction_drops_*.csv
"""

%run analyze_prediction_drops.py --disease ASCVD


Loading data...
ANALYZING PREDICTION DROPS FOR: ASCVD
'ASCVD' is a disease group. Finding individual diseases...
  Found: Myocardial infarction at index 112
  Found: Coronary atherosclerosis at index 114
  Found: Other acute and subacute forms of ischemic heart disease at index 116
  Found: Unstable angina (intermediate coronary syndrome) at index 111
  Found: Angina pectoris at index 113
  Found: Other chronic ischemic heart disease, unspecified at index 115
Found 6 disease(s) for 'ASCVD'

Analyzing 400000 patients...

Collecting predictions and outcomes...
Collected 400000 patients with both 0yr and 1yr predictions

NOTE: Prevalent case exclusion (matches evaluation function logic):
  - For single diseases: Patients with that disease before prediction time are excluded
  - For disease groups (like ASCVD): Prevalent cases are NOT excluded
    (patients can have multiple events in the group, e.g., CAD then MI)
  - This matches the evaluation function's approach for disease groups

Pred

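The prevalent-case rule printed above can be written as an eligibility mask. A sketch, assuming a single prediction time index shared by all patients (in practice it varies per patient) and taking the ASCVD disease indices from the output above; `eligible_for_evaluation` is hypothetical, as is the definition of a drop as the 0-year score minus the 1-year washout score.

```python
import numpy as np


def eligible_for_evaluation(Y, disease_idx, t_pred, is_group):
    """Mask of patients kept when evaluating a prediction made at time index t_pred.

    Y: (n_patients, n_diseases, n_timepoints) 0/1 event tensor
    Single disease: exclude patients already diagnosed before t_pred.
    Disease group (e.g. ASCVD): keep everyone, following the rule printed above.
    """
    if is_group:
        return np.ones(Y.shape[0], dtype=bool)
    prior = Y[:, disease_idx, :t_pred].any(axis=1)
    return ~prior


Y = np.zeros((3, 348, 52), dtype=int)
Y[0, 112, 5] = 1    # patient 0: myocardial infarction (index 112) before t_pred = 10
print(eligible_for_evaluation(Y, 112, 10, is_group=False).tolist())        # [False, True, True]
ascvd_idx = [111, 112, 113, 114, 115, 116]
print(eligible_for_evaluation(Y, ascvd_idx, 10, is_group=True).tolist())   # [True, True, True]

# Assumed definition of a drop: 0-year score minus 1-year washout score
pred_0yr = np.array([0.10, 0.08, 0.05])
pred_1yr = np.array([0.04, 0.07, 0.05])
drop = pred_0yr - pred_1yr
```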
---

## SECTION 5: VISUALIZATIONS

**Safe to Run Multiple Times**: Creates plots from existing results


In [20]:
# ============================================================================
# VISUALIZE PREDICTION DROPS
# ============================================================================
"""
Creates plots for prediction drops analysis
Plots saved to: results/analysis/plots/
"""

%run visualize_prediction_drops.py --disease ASCVD


VISUALIZING PREDICTION DROPS ANALYSIS: ASCVD

Loading results...
✓ Loaded 3 result files

Creating plots...
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/hyperchol_comparison_ASCVD.png
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/precursor_comparison_ASCVD.png
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/precursor_ratios_ASCVD.png

VISUALIZATION COMPLETE

Plots saved to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots


In [21]:
# ============================================================================
# VISUALIZE MODEL LEARNING (KEY INSIGHT FIGURE)
# ============================================================================
"""
Creates a figure showing the key insight: Model learns to distinguish between
high-risk and lower-risk hypercholesterolemia patients.

Non-droppers (predictions stay high) have HIGHER event rates → Model correctly
identifies high-risk patients. This shows the model is learning and calibrating.
"""
%run visualize_model_learning.py --disease ASCVD


CREATING MODEL LEARNING FIGURES
✓ Saved figure to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/model_learning_hyperchol_ASCVD.png
✓ Saved full comparison figure to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/model_learning_full_comparison_ASCVD.png
✓ Saved multiple precursors figure to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/model_learning_multiple_precursors_ASCVD.png

COMPLETE

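The docstring's claim can be checked by comparing event rates between patients whose predictions drop and those whose predictions stay high. A sketch; the rule that defines a "dropper" (the 1-year washout prediction falling below half the 0-year prediction) is a hypothetical choice, not necessarily the one used in `visualize_model_learning.py`, and `event_rate_by_drop` is a hypothetical name.

```python
import numpy as np


def event_rate_by_drop(pred_0yr, pred_1yr, events, ratio=0.5):
    """Event rates for 'droppers' vs 'non-droppers'.

    Hypothetical rule: a patient is a dropper if the 1-year washout prediction
    falls below `ratio` times the 0-year prediction.
    """
    pred_0yr, pred_1yr = np.asarray(pred_0yr), np.asarray(pred_1yr)
    events = np.asarray(events, dtype=float)
    dropper = pred_1yr < ratio * pred_0yr

    def rate(mask):
        # Mean event rate in the group, or NaN if the group is empty
        return float(events[mask].mean()) if mask.any() else float('nan')

    return {'n_droppers': int(dropper.sum()),
            'dropper_rate': rate(dropper),
            'non_dropper_rate': rate(~dropper)}


print(event_rate_by_drop([0.2, 0.2, 0.2, 0.2], [0.05, 0.05, 0.18, 0.19], [0, 0, 1, 1]))
# {'n_droppers': 2, 'dropper_rate': 0.0, 'non_dropper_rate': 1.0}
```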

In [22]:
# ============================================================================
# VISUALIZE ALL COMPARISONS
# ============================================================================
"""
Creates plots for all comparisons (external scores, Delphi, etc.)
Plots saved to: results/comparisons/plots/
"""

%run visualize_all_comparisons.py


VISUALIZING ALL COMPARISONS

1. Loading external scores comparison...
   Columns in CSV: ['Aladynoulli_AUC', 'Aladynoulli_CI_lower', 'Aladynoulli_CI_upper', 'PCE_AUC', 'PCE_CI_lower', 'PCE_CI_upper', 'Difference', 'N_patients', 'N_events', 'QRISK3_AUC', 'QRISK3_CI_lower', 'QRISK3_CI_upper', 'QRISK3_Difference', 'PREVENT_AUC', 'PREVENT_CI_lower', 'PREVENT_CI_upper', 'Gail_AUC', 'Gail_CI_lower', 'Gail_CI_upper']
   Index: ['ASCVD_10yr', 'ASCVD_30yr', 'Breast_Cancer_10yr_Female', 'Breast_Cancer_10yr_Male', 'Breast_Cancer_10yr_All']
   Creating external scores comparison plot...
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/comparisons/plots/external_scores_comparison.png

2. Creating Delphi comparison plot...
   Columns in Delphi file: ['Aladynoulli_1yr_0gap', 'Delphi_1yr_0gap', 'Diff_0gap', 'Aladynoulli_1yr_1gap', 'Delphi_1yr_1gap', 'Diff_1gap']
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/resu
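
To read the external-score comparison beyond the plot, one could tabulate AUC differences per outcome and comparator. A sketch, assuming the comparison CSV has been loaded into a DataFrame indexed by outcome with the column names printed above; the toy frame stands in for the real file, and `summarize_vs_external` is a hypothetical helper. Non-overlapping 95% CIs are a conservative check; a paired test (e.g. DeLong or a paired bootstrap) is the stronger comparison.

```python
import numpy as np
import pandas as pd

# Comparator prefixes taken from the CSV columns printed above
COMPARATORS = ["PCE", "QRISK3", "PREVENT", "Gail"]


def summarize_vs_external(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (outcome, comparator) pair that has a comparator AUC."""
    rows = []
    for outcome, rec in df.iterrows():
        for comp in COMPARATORS:
            comp_auc = rec.get(f"{comp}_AUC")  # None if the column is absent
            if comp_auc is None or pd.isna(comp_auc):
                continue  # comparator not evaluated for this outcome
            rows.append({
                "outcome": outcome,
                "comparator": comp,
                "auc_diff": rec["Aladynoulli_AUC"] - comp_auc,
                # True only when the two 95% CIs do not overlap
                "ci_separated": bool(rec["Aladynoulli_CI_lower"] > rec[f"{comp}_CI_upper"]),
            })
    return pd.DataFrame(rows)


# Toy frame with made-up values; for real data, something like
# pd.read_csv(<comparison CSV>, index_col=0) is assumed instead.
toy = pd.DataFrame(
    {
        "Aladynoulli_AUC": [0.75], "Aladynoulli_CI_lower": [0.74], "Aladynoulli_CI_upper": [0.76],
        "PCE_AUC": [0.70], "PCE_CI_lower": [0.69], "PCE_CI_upper": [0.71],
        "QRISK3_AUC": [np.nan], "QRISK3_CI_lower": [np.nan], "QRISK3_CI_upper": [np.nan],
    },
    index=["ASCVD_10yr"],
)
print(summarize_vs_external(toy))
```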

---

## SECTION 7: AGE OFFSET SIGNATURE ANALYSIS (OPTIONAL)

**Advanced Analysis**: How predictions and signatures change across training time points (t0-t9)
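
As a rough illustration of "which signatures change most across offsets", the sketch below compares patient-specific lambda between two age offsets for patients carrying a given precursor. It assumes lambda has already been reduced to one value per offset, patient, and signature, stored as an array of shape `(n_offsets, n_patients, n_signatures)`. That layout, the helper `rank_signature_changes`, and the choice of offset 0 vs. the last offset are assumptions, not necessarily how `analyze_age_offset_signatures.py` works.

```python
import numpy as np


def rank_signature_changes(lam, has_precursor, ref_offset=0, cmp_offset=-1):
    """Rank signatures by mean lambda change between two offsets (hypothetical helper).

    lam: array of shape (n_offsets, n_patients, n_signatures) (assumed layout).
    has_precursor: boolean mask of length n_patients selecting the precursor group.
    Returns (signature indices sorted by |mean change|, matching mean changes).
    """
    lam = np.asarray(lam, dtype=float)
    mask = np.asarray(has_precursor, dtype=bool)

    # Per-patient change for the selected patients: shape (n_selected, n_signatures)
    delta = lam[cmp_offset][mask] - lam[ref_offset][mask]
    mean_delta = delta.mean(axis=0)

    order = np.argsort(-np.abs(mean_delta))  # largest absolute change first
    return order, mean_delta[order]


# Toy example: 10 offsets, 3 patients, 4 signatures (values are made up)
lam = np.zeros((10, 3, 4))
lam[9, 0, 2], lam[9, 1, 2] = 0.6, 0.4  # signature 2 rises for the precursor patients
lam[9, :2, 1] = -0.2                   # signature 1 falls slightly
lam[9, 2, 0] = 5.0                     # large change, but patient 2 lacks the precursor
order, changes = rank_signature_changes(lam, has_precursor=[True, True, False])
print(order[:2], changes[:2])
```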


In [23]:
# ============================================================================
# MI WASHOUT ANALYSIS: SIGNATURE-BASED LEARNING
# ============================================================================
"""
Analyzes MI (Myocardial Infarction) washout with signature-based learning.

For each patient, tracks:
- 3 models: m0t9, m5t9, m9t9 (trained through t0, t5 and t9, respectively;
  all predict at t9)
- 3 time periods:
  1) Baseline (before t0/enrollment)
  2) Interval t0-t5
  3) Interval t5-t9
- For each period: MI status and Signature 5 precursor diseases

Washout is categorized by what developed during the intervals (not at baseline).
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks')

result = subprocess.run([
    sys.executable,
    str(script_dir / 'analyze_mi_washout_signature.py'),
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")



MI WASHOUT ANALYSIS WITH SIGNATURE-BASED LEARNING
Batch: 0-10000

Loading essentials...
✓ Found MI at index 112: Myocardial infarction

Loading cluster assignments...
✓ Loaded clusters: 348 diseases

✓ MI belongs to Signature 5
✓ Found 7 diseases in Signature 5
  Examples: ['Hypercholesterolemia', 'Unstable angina (intermediate coronary syndrome)', 'Myocardial infarction', 'Angina pectoris', 'Coronary atherosclerosis']

Loading data batch 0-10000...

Loading pi batches for offsets 0-9...
✓ Loaded 10 pi batches

Loading model checkpoints to extract lambda...

ANALYZING MI WASHOUT

Analyzing 10000 patients...
MI index: 112
Signature 5 has 7 diseases

✓ Saved results to: results/analysis/mi_washout_analysis_batch_0_10000.csv

SUMMARY STATISTICS
Total patients analyzed: 10000

Washout categories:
washout_category
neither         8694
accurate        1028
conservative     278
Name: count, dtype: int64

MI status at t9: 435 patients (4.3%)

Patients with Signature 5 precursors at t9: 1600 (1
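
The three washout categories can be reproduced from two per-patient flags. A minimal sketch, assuming boolean indicators for "MI developed during the interval" and "a Signature 5 precursor developed during the interval" (baseline conditions excluded, as the cell's docstring states). The precedence rule (an interval MI makes a patient `conservative` even if a precursor also appeared) and the helper name `categorize_washout` are assumptions. For the batch above, the three counts sum to the 10,000 patients analyzed (8694 + 1028 + 278).

```python
import pandas as pd


def categorize_washout(mi_in_interval, precursor_in_interval) -> pd.Series:
    """Assign each patient to 'conservative', 'accurate', or 'neither' (hypothetical rule)."""
    mi = pd.Series(mi_in_interval, dtype=bool)
    pre = pd.Series(precursor_in_interval, dtype=bool, index=mi.index)

    category = pd.Series("neither", index=mi.index, name="washout_category")
    category.loc[pre & ~mi] = "accurate"  # precursor developed, no MI
    category.loc[mi] = "conservative"     # MI developed in the interval (takes precedence)
    return category


cats = categorize_washout(
    mi_in_interval=[True, False, False, True],
    precursor_in_interval=[False, True, False, True],
)
print(cats.value_counts())                # counts per category
print(cats.value_counts(normalize=True))  # fractions of patients analyzed
```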

In [24]:

%run visualize_mi_washout_signature.py

VISUALIZING MI WASHOUT ANALYSIS
✓ Saved figure to: /Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks/results/analysis/plots/mi_washout_signature_analysis.png

VISUALIZATION COMPLETE


In [None]:
# ============================================================================
# ANALYZE AGE OFFSET SIGNATURE CHANGES
# ============================================================================
"""
Analyzes how predictions and signature loadings change across age offsets (t0-t9).

For patients with specific precursor diseases, tracks:
1. How their predictions change across offsets 0-9
2. Which signatures/clusters are most impacted
3. Which precursor diseases drive which signature changes

This shows how the model learns and adapts as more data becomes available.
"""

# Run analysis for key precursor diseases
import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/new_oct_revision/new_notebooks')

result = subprocess.run([
    sys.executable,
    str(script_dir / 'analyze_age_offset_signatures.py'),
    '--approach', 'pooled_retrospective',
    '--target_disease', 'ASCVD',
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")




ANALYZING AGE OFFSET SIGNATURE CHANGES

Approach: pooled_retrospective
Batch: 0-10000
Target disease: ASCVD
Precursor diseases: ['Hypercholesterolemia', 'Essential hypertension', 'Type 2 diabetes', 'Atrial fibrillation and flutter', 'Obesity', 'Chronic Kidney Disease, Stage III', 'Rheumatoid arthritis', 'Sleep apnea', 'Peripheral vascular disease, unspecified']

Loading essentials...
Loading cluster assignments...
  ✓ Loaded clusters as numpy array from: /Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/initial_clusters_400k.pt
  Cluster shape: (348,)
✓ Loaded clusters: 348 diseases, 20 clusters

Loading data batch 0-10000...

Loading pi batches for offsets 0-9...
  Loading offset 0...
  Loading offset 1...
  Loading offset 2...
  Loading offset 3...
  Loading offset 4...
  Loading offset 5...
  Loading offset 6...
  Loading offset 7...
  Loading offset 8...
  Loading offset 9...
✓ Loaded 10 pi batches

Loading model checkpoints to extract lambda (patient-specifi


## Summary: Age Offset Signature Analysis

**Question:** When models are trained with different amounts of data (washout periods), how do patient-specific parameters (lambda) change, and does this reflect conservative vs. accurate washout?

**Findings:**

1. **Conservative washout (with outcome events):**
   - Patients who had ASCVD events during washout
   - Signature 5 (cardiovascular cluster) shows large positive lambda changes (+0.587 for hypercholesterolemia)
   - The model updates on outcomes that were already observed during the washout window

2. **Accurate washout (with precursor only):**
   - Patients with precursors (e.g., hypercholesterolemia) but no ASCVD outcome during washout
   - Signature 5 shows moderate positive lambda changes (+0.305)
   - Model learns from pre-clinical signals (risk factors before outcomes)

3. **Model refinement (without either):**
   - Patients with neither precursor nor outcome
   - Small negative lambda changes (-0.053)
   - Signature 5 loadings shift slightly downward: a small refinement rather than a substantive change

**Interpretation:**
- The model distinguishes between:
  - Observed outcomes → large changes
  - Pre-clinical signals (precursors) → moderate changes
  - Neither → small or negative changes
- This supports washout accuracy: the model updates on legitimate risk factors, not only on future outcomes
- Signature 5 responds to cardiovascular precursors even before any outcome has occurred

**Conclusion:** This pattern supports model validity and washout accuracy. The model learns appropriately from pre-clinical signals, which is the intended behavior for accurate washout.
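
Group-wise mean lambda changes of the kind quoted in the findings could be computed along these lines. A sketch, assuming a per-patient table with a `washout_category` column (as produced by the MI washout analysis) and two hypothetical columns holding Signature 5 lambda from an earlier-trained and a later-trained model. The column names and the helper `mean_lambda_change` are illustrative, and the toy values are not study results.

```python
import pandas as pd


def mean_lambda_change(df: pd.DataFrame, early_col: str, late_col: str) -> pd.Series:
    """Mean (late - early) lambda per washout category (hypothetical column names)."""
    delta = df[late_col] - df[early_col]
    return delta.groupby(df["washout_category"]).mean()


# Toy table with made-up values
toy = pd.DataFrame({
    "washout_category": ["conservative", "conservative", "accurate", "neither"],
    "lambda_sig5_early": [0.0, 0.0, 0.0, 0.25],
    "lambda_sig5_late": [1.0, 2.0, 0.5, 0.0],
})
print(mean_lambda_change(toy, "lambda_sig5_early", "lambda_sig5_late"))
```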

---

## SECTION 6: SUMMARY & KEY FINDINGS

**Read-Only**: Summary of results and interpretations


### Key Findings

**Performance Summary:**
- Aladynoulli outperforms external risk scores (PCE, PREVENT, QRISK3) for ASCVD
- Comparable to the Gail model for breast cancer (women only)
- Outperforms Delphi-2M for most diseases across multiple time horizons
- Significantly better than the Cox baseline (age + sex only)

**Prediction Drops Analysis:**
- Prediction drops between 0yr and 1yr washout are **expected behavior**
- The model refines its calibration based on observed outcomes
- Hypercholesterolemia patients show the largest drops (the model learns to distinguish higher-risk from lower-risk patients)
- Delphi behaves similarly: both models learn and recalibrate over time

**Reproducibility:**
- AWS vs Local differences are negligible when aggregated
- Per-batch differences average out over large populations
- Results are reproducible across computational environments
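
Why per-batch AWS-vs-local differences wash out when aggregated can be illustrated with a batch-size-weighted mean. A sketch, assuming each batch contributes the mean per-patient difference in predicted risk (AWS minus local) and its patient count; for a per-patient quantity, this weighted mean equals the overall mean difference. It does not apply directly to non-additive metrics such as AUC, which should be recomputed on the pooled data. `pooled_difference` is a hypothetical helper.

```python
import numpy as np


def pooled_difference(batch_mean_diffs, batch_sizes):
    """Overall mean per-patient difference from per-batch means, weighted by batch size."""
    d = np.asarray(batch_mean_diffs, dtype=float)
    n = np.asarray(batch_sizes, dtype=float)
    pooled = float(np.sum(d * n) / np.sum(n))
    largest_batch = float(np.max(np.abs(d)))  # worst single-batch discrepancy
    return pooled, largest_batch


# Toy per-batch differences (made up): individually nonzero, opposite in sign
pooled, worst = pooled_difference([0.02, -0.02, 0.01, -0.01], [10000, 10000, 10000, 10000])
print(pooled, worst)
```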

**Model Strengths:**
- Multi-horizon predictions (1yr, 5yr, 10yr, 30yr)
- Dynamic risk assessment (updates with new information)
- Works across both sexes (unlike Gail, which is women-only)
- Handles multiple diseases simultaneously
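
One common way to turn yearly event probabilities into multi-horizon risks is the discrete-time survival identity: risk over h years = 1 - prod(1 - p_t) for t = 1..h. The sketch below assumes each yearly value is the probability of a first diagnosis in that year given none earlier, and it ignores competing risks such as death. Whether Aladynoulli's horizon-specific predictions are computed exactly this way is an assumption here, and `cumulative_risk` is a hypothetical helper.

```python
import numpy as np


def cumulative_risk(yearly_probs, horizon_years):
    """Risk of at least one event within `horizon_years`, from yearly conditional probabilities.

    yearly_probs: array (..., n_years) of per-year event probabilities, assumed to be
    conditional on no earlier event. Competing risks are ignored.
    """
    p = np.asarray(yearly_probs, dtype=float)[..., :horizon_years]
    return 1.0 - np.prod(1.0 - p, axis=-1)


# Toy example: a constant 1% yearly probability (made up)
yearly = np.full(30, 0.01)
for h in (1, 5, 10, 30):
    print(f"{h:>2}yr risk: {cumulative_risk(yearly, h):.4f}")
```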
