# Aladynoulli Performance Evaluation Notebook

This notebook orchestrates all **performance evaluation** analyses for Aladynoulli (AUC, calibration, comparisons with external scores).

**Note**: Model validity and learning analyses (prediction drops, washout validation, signature dynamics) have been moved to `reviewer_responses/notebooks/R2_R3_Model_Validity_Learning.ipynb`.

## Execution Order

1. **Setup** (Run Once): Assemble PI tensors
2. **Generate Predictions** (Run Once Each): Time horizons, washout, age offsets
3. **Load Results** (Safe to Run Multiple Times): Load CSV files
4. **Comparisons** (Run Once Each): External scores, Delphi, Cox baseline
5. **Visualizations** (Safe to Run Multiple Times): Create plots
6. **Summary** (Read-Only): Key findings

## Main Approach: Pooled Retrospective

All comparisons use the `pooled_retrospective` approach by default, which:
- Uses phi trained externally and validated with LOO tests
- Represents clinically implementable performance
- Uses pi from: `enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt`

## Note on Re-running

- Jupyter will skip cells that have already been executed
- Generation cells should be marked as "not evaluated" after first run
- Loading and visualization cells can be run multiple times safely


---

## SECTION 1: SETUP & DATA PREPARATION

**Note**: For `pooled_retrospective`, we use `pi_full_400k.pt` from Downloads directly.
Assembly step is NOT needed - the downloaded file is the trusted source.


In [9]:
%run /Users/sarahurbut/aladynoulli2/verify_age_stratified_table.py

VERIFYING AGE-STRATIFIED TABLE VALUES

1-YEAR MEDIANS (from pivot table, offsets 0-9):
--------------------------------------------------------------------------------

ASCVD:
  39-50: median = 0.925 (from 10 values)
  50-60: median = 0.909 (from 10 values)
  60-71: median = 0.906 (from 10 values)

Heart Failure:
  39-50: median = 0.827 (from 8 values)
  50-60: median = 0.780 (from 10 values)
  60-71: median = 0.890 (from 10 values)

Breast Cancer:
  39-50: median = 0.920 (from 8 values)
  50-60: median = 0.953 (from 10 values)
  60-71: median = 0.951 (from 10 values)

Colorectal Cancer:
  39-50: median = 0.584 (from 5 values)
  50-60: median = 0.928 (from 10 values)
  60-71: median = 0.881 (from 10 values)

Atrial Fibrillation:
  39-50: median = 0.638 (from 6 values)
  50-60: median = 0.754 (from 10 values)
  60-71: median = 0.847 (from 10 values)

Stroke:
  39-50: median = 0.356 (from 6 values)
  50-60: median = 0.597 (from 10 values)
  60-71: median = 0.730 (from 10 values)

CKD:
  

KeyError: '60-71'

In [5]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/verify_pi_full_correctedE.py

VERIFYING pi_enroll_fixedphi_sex_FULL.pt ASSEMBLY
FULL file: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt
Batch directory: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized

Loading FULL file: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt
  Shape: torch.Size([400000, 348, 52])
Found 40 batch files

Loading and concatenating batches...
  Batch 1/40: pi_enroll_fixedphi_sex_0_10000.pt (indices 0-10000)
    Shape: torch.Size([10000, 348, 52])
  Batch 2/40: pi_enroll_fixedphi_sex_10000_20000.pt (indices 10000-20000)
    Shape: torch.Size([10000, 348, 52])
  Batch 3/40: pi_enroll_fixedphi_sex_20000_30000.pt (indices 20000-30000)
    Shape: torch.Size([10000, 348, 52])
  Batch 4/40: pi_enroll_fixedphi_sex_30000_40000.pt (indices 30000-40000)
    Shape: torch.Size([10000, 3

---

## SECTION 2: GENERATE PREDICTIONS

**Run Once Each**: Generate predictions for different analyses

**Note**: These cells will use existing results if available. They won't regenerate unless needed.


In [8]:
# ============================================================================
# STEP 1: GENERATE TIME HORIZON PREDICTIONS
# ============================================================================
"""
Generates predictions for multiple time horizons: 5yr, 10yr, 30yr, static 10yr
Results saved to: results/time_horizons/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING TIME HORIZON PREDICTIONS")
print("="*80)

# Generate for pooled_retrospective (main approach)
print("\n1. Generating pooled_retrospective time horizon predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_time_horizon_predictions.py'),
    '--approach', 'pooled_retrospective'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

print("\n✓ Time horizon predictions complete")


GENERATING TIME HORIZON PREDICTIONS

1. Generating pooled_retrospective time horizon predictions...
GENERATING TIME HORIZON PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt
Horizons: ['5', '10', '30', 'static10']
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/time_horizons/pooled_retrospective

Loading data...
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded E tensor: torch.Size([407878, 348])
Loaded pce_df: 400000 patients

Subsetting to first 400000 patients...
After subsetting: pi: 400000, Y: 400000, E: 400000, pce_df: 400000

PROCESSING HORIZON: 5
Evaluating dynamic 5-year predictions...

Evaluating ASCVD (Dynamic 10-Year Risk)...
AUC: 0.757 (0.754-0.761) (calculated on 400000 individuals)
Events (10-Year in Eval Cohort): 17167 (4.3%) (from 400000 individu

In [21]:


%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_pi_tensors.py

COMPARING PI TENSORS

1. Loading data for batch 0-10000...
Loading full data tensors...
Subsetting to batch 0-10000...
  Also loading E_enrollment_full for comparison...

2. Loading pi tensors...
  Loading pi_1: pi_enroll_fixedphi_sex_FULL.pt
    Shape: torch.Size([10000, 348, 52])
  Loading pi_2: pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun_pooledall.pt
    Shape: torch.Size([10000, 348, 52])

3. Comparing pi tensors directly...
  Are identical (atol=1e-6): True
  Max difference: 0.000000e+00
  Mean difference: 0.000000e+00

4. Evaluating both pi tensors with 0-year washout...
   Using: evaluate_major_diseases_wsex_with_bootstrap_dynamic_1year_different_start_end_numeric_sex

   Evaluating pi_1 (from big run) with E_enrollment...
Filtering for 1: Found 4591 individuals in cohort

Summary of Results (1-Year Risk, Sex-Adjusted, Offset=0):
--------------------------------------------------------------------------------
Disease Group        AUC                       Eve

In [11]:
# ============================================================================
# STEP 2: GENERATE WASHOUT PREDICTIONS
# ============================================================================
"""
Generates 1-year predictions with different washout periods: 0yr, 1yr, 2yr
Results saved to: results/washout/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING WASHOUT PREDICTIONS")
print("="*80)

# Generate for pooled_retrospective (main approach)
print("\n1. Generating pooled_retrospective washout predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_washout_predictions.py'),
    '--approach', 'pooled_retrospective'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

print("\n✓ Washout predictions complete")


GENERATING WASHOUT PREDICTIONS

1. Generating pooled_retrospective washout predictions...
GENERATING WASHOUT PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt
Washout periods: 0yr, 1yr, 2yr
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout/pooled_retrospective

Loading data...
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded E tensor: torch.Size([407878, 348])
Loaded pce_df: 400000 patients

Subsetting to first 400000 patients...
After subsetting: pi: 400000, Y: 400000, E: 400000, pce_df: 400000

PROCESSING WASHOUT: 0yr
Evaluating 1-year predictions with 0-year washout...
Filtering for 1: Found 182542 individuals in cohort

Summary of Results (1-Year Risk, Sex-Adjusted, Offset=0):
----------------------------------------------------------------------------

In [None]:
%run compare_age_offset_washout

In [2]:
# ============================================================================
# STEP 2B: GENERATE RETROSPECTIVE WASHOUT ALL HORIZONS (RUN ONCE, THEN MARK AS "NOT EVALUATED")
# ============================================================================
"""
IMPORTANT: This cell generates 10-year, 30-year, and static 10-year predictions with washout for pooled_retrospective.
- Generates dynamic 10-year and 30-year predictions with 1-year washout
- Generates static 10-year predictions (1-year score) with 1-year washout
- Processes ALL 400K patients at once using pre-computed pi tensors
- Uses _from_pi versions: evaluate_major_diseases_wsex_with_bootstrap_dynamic_withwashout_from_pi() and 
  evaluate_major_diseases_wsex_with_bootstrap_withwashout_from_pi()
- Only runs for pooled_retrospective approach (main clinically implementable approach)
- Run once, then mark as "not evaluated"
- Script: generate_washout_time_horizons.py
- Results saved to: results/washout_time_horizons/pooled_retrospective/
"""

import subprocess
import sys
from pathlib import Path

# Set script directory
script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING RETROSPECTIVE WASHOUT ALL HORIZONS")
print("="*80)
print("\nThis will generate:")
print("  - 10-year dynamic predictions with 1-year washout")
print("  - 30-year dynamic predictions with 1-year washout")
print("  - Static 10-year predictions (1-year score) with 1-year washout")
print("Approach: pooled_retrospective only")
print("Processing: ALL 400K patients at once")
print("\nNOTE: Run once, then mark this cell as 'not evaluated'.")
print("="*80)

# Generate washout predictions for all horizons
print("\nGenerating pooled_retrospective washout predictions for all horizons...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_washout_time_horizons.py'),
    '--n_bootstraps', '10',
    '--washout_years', '1'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")
else:
    print("✓ pooled_retrospective washout predictions completed successfully")

print("\n" + "="*80)
print("RETROSPECTIVE WASHOUT PREDICTIONS COMPLETE")
print("="*80)
print("\nResults saved to: results/washout_time_horizons/pooled_retrospective/")
print("  - washout_1yr_10yr_dynamic_results.csv")
print("  - washout_1yr_30yr_dynamic_results.csv")
print("  - washout_1yr_10yr_static_results.csv")
print("  - washout_1yr_comparison_all_horizons.csv")


GENERATING RETROSPECTIVE WASHOUT ALL HORIZONS

This will generate:
  - 10-year dynamic predictions with 1-year washout
  - 30-year dynamic predictions with 1-year washout
  - Static 10-year predictions (1-year score) with 1-year washout
Approach: pooled_retrospective only
Processing: ALL 400K patients at once

NOTE: Run once, then mark this cell as 'not evaluated'.

Generating pooled_retrospective washout predictions for all horizons...
GENERATING WASHOUT TIME HORIZON PREDICTIONS: POOLED_RETROSPECTIVE
Pi tensor: /Users/sarahurbut/Library/CloudStorage/Dropbox/enrollment_predictions_fixedphi_correctedE_vectorized/pi_enroll_fixedphi_sex_FULL.pt
Washout period: 1 years
Horizons: 10yr, 30yr
Bootstrap iterations: 10
Output directory: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/washout_time_horizons/pooled_retrospective

NOTE: Processing 400,000 patients with 10 bootstraps.
      For 30-year horizon, this processes ~400,000 patients × 30 years = ~12,000,000 r

In [2]:
# ============================================================================
# STEP 3: GENERATE AGE OFFSET PREDICTIONS (OPTIONAL)
# ============================================================================
"""
Generates rolling 1-year predictions using models trained at different time offsets
Evaluates models trained at enrollment + 0, 1, 2, ..., 9 years (offsets 0-9)
Uses pre-computed pi batches from AWS run (downloaded to ~/Dropbox/age_offset_files/)
Currently evaluates on batch 0-10000 (first 10K patients)
Results saved to: results/age_offset/{approach}/
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

print("="*80)
print("GENERATING AGE OFFSET PREDICTIONS")
print("="*80)
print("\nThis will generate rolling 1-year predictions using models trained at")
print("enrollment + 0, 1, 2, ..., 9 years (offsets 0-9).")
print("\nUses pre-computed pi batches from AWS run.")
print("Currently evaluates on batch 0-10000 (first 10K patients).")
print("="*80)

# Generate age offset predictions for pooled retrospective (main approach)
print("\n1. Generating pooled_retrospective age offset predictions...")
result = subprocess.run([
    sys.executable,
    str(script_dir / 'generate_age_offset_predictions.py'),
    '--approach', 'pooled_retrospective',
    '--max_offset', '9',
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")
else:
    print("✓ Age offset predictions complete")

print("\n" + "="*80)
print("AGE OFFSET PREDICTIONS COMPLETE")
print("="*80)
print("\nResults saved to: results/age_offset/{approach}/")
print("  - age_offset_aucs_batch_0_10000.csv (AUCs by disease and offset)")
print("  - age_offset_aucs_pivot_batch_0_10000.csv (pivot table for easy viewing)")
print("  - age_offset_aucs_summary_batch_0_10000.csv (summary stats: mean, median, std, min, max, count across offsets)")
print("\nROC curves are plotted for ASCVD by default (saved as PDF).")


GENERATING AGE OFFSET PREDICTIONS

This will generate rolling 1-year predictions using models trained at
enrollment + 0, 1, 2, ..., 9 years (offsets 0-9).

Uses pre-computed pi batches from AWS run.
Currently evaluates on batch 0-10000 (first 10K patients).

1. Generating pooled_retrospective age offset predictions...
GENERATING AGE OFFSET PREDICTIONS
Approach: pooled_retrospective
Batch: 0-10000
Max offset: 9
Loading full data tensors...
Subsetting to batch 0-10000...

Loading pi batches for offsets 0-9...
  Loading offset 0: pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun_pooledall.pt
  Loading offset 1: pi_enroll_fixedphi_age_offset_1_sex_0_10000_try2_withpcs_newrun_pooledall.pt
  Loading offset 2: pi_enroll_fixedphi_age_offset_2_sex_0_10000_try2_withpcs_newrun_pooledall.pt
  Loading offset 3: pi_enroll_fixedphi_age_offset_3_sex_0_10000_try2_withpcs_newrun_pooledall.pt
  Loading offset 4: pi_enroll_fixedphi_age_offset_4_sex_0_10000_try2_withpcs_newrun_pooledall.pt
  

---

## SECTION 3: LOAD RESULTS

**Safe to Run Multiple Times**: Loads CSV files without regenerating


In [3]:
# ============================================================================
# LOAD GENERATED RESULTS
# ============================================================================
"""
Loads all generated CSV files into dictionaries for analysis.
Safe to run multiple times - reloads results without regenerating.
"""

import pandas as pd
from pathlib import Path

results_base = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results')
approach = 'pooled_retrospective'

print("="*80)
print("LOADING GENERATED RESULTS")
print("="*80)

# Load time horizon results
time_horizon_results = {}
time_horizon_dir = results_base / 'time_horizons' / approach
if time_horizon_dir.exists():
    for horizon in ['5yr', '10yr', '30yr', 'static_10yr']:
        file_path = time_horizon_dir / f'{horizon}_results.csv'
        if file_path.exists():
            time_horizon_results[horizon] = pd.read_csv(file_path)
            print(f"✓ Loaded {horizon} results: {len(time_horizon_results[horizon])} diseases")
        else:
            print(f"⚠️  {horizon} results not found: {file_path}")

# Load washout results
washout_results = {}
washout_dir = results_base / 'washout' / approach
if washout_dir.exists():
    for offset in ['0yr', '1yr', '2yr']:
        file_path = washout_dir / f'washout_{offset}_results.csv'
        if file_path.exists():
            washout_results[offset] = pd.read_csv(file_path)
            print(f"✓ Loaded washout {offset} results: {len(washout_results[offset])} diseases")
        else:
            print(f"⚠️  Washout {offset} results not found: {file_path}")

# Load age offset results (optional)
age_offset_results = {}
age_offset_dir = results_base / 'age_offset' / approach
if age_offset_dir.exists():
    pivot_file = age_offset_dir / 'age_offset_aucs_pivot_batch_0_10000.csv'
    if pivot_file.exists():
        age_offset_results['pivot'] = pd.read_csv(pivot_file)
        print(f"✓ Loaded age offset pivot results: {len(age_offset_results['pivot'])} diseases")
    else:
        print(f"⚠️  Age offset pivot results not found: {pivot_file}")
    
    aucs_file = age_offset_dir / 'age_offset_aucs_batch_0_10000.csv'
    if aucs_file.exists():
        age_offset_results['aucs'] = pd.read_csv(aucs_file)
        print(f"✓ Loaded age offset AUCs results")
else:
    print(f"⚠️  Age offset directory not found: {age_offset_dir} (optional)")

print("\n" + "="*80)
print("RESULTS LOADED")
print("="*80)


LOADING GENERATED RESULTS
✓ Loaded 5yr results: 28 diseases
✓ Loaded 10yr results: 28 diseases
✓ Loaded 30yr results: 28 diseases
✓ Loaded static_10yr results: 28 diseases
✓ Loaded washout 0yr results: 28 diseases
✓ Loaded washout 1yr results: 28 diseases
✓ Loaded washout 2yr results: 28 diseases
✓ Loaded age offset pivot results: 28 diseases
✓ Loaded age offset AUCs results

RESULTS LOADED


---

## SECTION 4: COMPARISONS & VALIDATIONS

**Run Once Each**: Compare with external benchmarks and analyze results


In [1]:
# ============================================================================
# COMPARE WITH EXTERNAL SCORES
# ============================================================================
"""
Compares Aladynoulli with PCE, PREVENT, Gail, QRISK3
Results saved to: results/comparisons/pooled_retrospective/external_scores_comparison.csv
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_with_external_scores.py --approach pooled_retrospective --n_bootstraps 100


Set random seed to 42 for reproducibility
COMPARING WITH EXTERNAL SCORES: POOLED_RETROSPECTIVE

Loading data...
Loaded external scores file: 400000 patients
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded pce_df: 400000 patients

ASCVD COMPARISON: Aladynoulli vs PCE (10yr), PREVENT (10yr), and PREVENT (30yr)

Applying LOESS calibration to predictions...
  Calibrating predictions in batches...
✓ Calibration applied

Computing Aladynoulli predictions...
Processing 400000 patients (this may take 10-20 minutes)...
  Processed 50000/400000 patients (12.5%)...
  Processed 100000/400000 patients (25.0%)...
  Processed 150000/400000 patients (37.5%)...
  Processed 200000/400000 patients (50.0%)...
  Processed 250000/400000 patients (62.5%)...
  Processed 300000/400000 patients (75.0%)...
  Processed 350000/400000 patients (87.5%)...
  ✓ Completed processing 399996 valid patients

Patients with valid PCE scores: 399996/399996 (100.0%)
Patien

In [None]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_with_external_scores.py --approach pooled_retrospective

Set random seed to 42 for reproducibility
COMPARING WITH EXTERNAL SCORES: POOLED_RETROSPECTIVE

Loading data...
Loaded external scores file: 400000 patients
Found Gail_absRisk_oneyr column - will perform 1-year comparison
Loaded pi tensor: torch.Size([400000, 348, 52])
Loaded Y tensor: torch.Size([407878, 348, 52])
Loaded pce_df: 400000 patients

ASCVD COMPARISON: Aladynoulli vs PCE (10yr), PREVENT (10yr), and QRISK3 (10yr)

Applying LOESS calibration to predictions...
  Calibrating predictions in batches...


In [30]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_pi_tensors.py

COMPARING PI TENSORS

1. Loading data for batch 0-10000...
Loading full data tensors...
Subsetting to batch 0-10000...
  Also loading E_enrollment_full for comparison...

2. Loading pi tensors...
  Loading pi_1: pi_enroll_fixedphi_sex_FULL.pt
    Shape: torch.Size([10000, 348, 52])
  Loading pi_2: pi_enroll_fixedphi_age_offset_0_sex_0_10000_try2_withpcs_newrun_pooledall.pt
    Shape: torch.Size([10000, 348, 52])

3. Comparing pi tensors directly...
  Are identical (atol=1e-6): True
  Max difference: 0.000000e+00
  Mean difference: 0.000000e+00

4. Evaluating both pi tensors with 0-year washout...
   Using: evaluate_major_diseases_wsex_with_bootstrap_dynamic_1year_different_start_end_numeric_sex

   Evaluating pi_1 (from big run) with E_enrollment...
Filtering for 1: Found 4591 individuals in cohort

Summary of Results (1-Year Risk, Sex-Adjusted, Offset=0):
--------------------------------------------------------------------------------
Disease Group        AUC                       Eve

In [20]:
# Compare patient filtering and calculations for Breast Cancer
# After running both evaluations:

# 1. Check how many patients each function processes
print("Patient filtering comparison:")
print(f"Washout: Filtering for sex=0 (Female)")
print(f"Rolling: Filtering for Sex='Female'")

# 2. Check if they're using the same pi values
# Extract the actual patients and risks used by each function
# You'll need to add debug prints inside the functions or modify them temporarily

# 3. Check if t_start calculations match
# Both use: t_enroll = int(age - 30), t_start = t_enroll + offset
# But verify they're getting the same age values

# 4. Most importantly - check if they're filtering prevalent cases the same way
# Both check: torch.any(Y_100k[i, d_idx, :t_start] > 0)
# But are they using the same Y tensor and same indices?

# Quick check - are the Y tensors the same?
print(f"\nY tensor check:")
print(f"  Y_batch shape: {Y_batch.shape}")
print(f"  Are Y_batch and Y_100k the same object?: {Y_batch is Y_100k}")

# Check if breast cancer disease indices match
breast_diseases = ['Breast cancer [female]', 'Malignant neoplasm of female breast']
breast_indices_washout = []
breast_indices_rolling = []
for disease in breast_diseases:
    indices = [i for i, name in enumerate(disease_names) if disease.lower() in name.lower()]
    breast_indices_washout.extend(indices)
    breast_indices_rolling.extend(indices)
breast_indices_washout = list(set(breast_indices_washout))
breast_indices_rolling = list(set(breast_indices_rolling))
print(f"\nBreast cancer disease indices:")
print(f"  Washout: {breast_indices_washout}")
print(f"  Rolling: {breast_indices_rolling}")
print(f"  Match?: {breast_indices_washout == breast_indices_rolling}")

Patient filtering comparison:
Washout: Filtering for sex=0 (Female)
Rolling: Filtering for Sex='Female'

Y tensor check:


NameError: name 'Y_batch' is not defined

In [5]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/calculate_corrected_medians.py

CALCULATING CORRECTED MEDIANS
ASCVD                          Median: 0.8795 (from 10 values)
All_Cancers                    Median: 0.7565 (from 10 values)
Anemia                         Median: 0.6905 (from 10 values)
Anxiety                        Median: 0.6390 (from 10 values)
Asthma                         Median: 0.7020 (from 10 values)
Atrial_Fib                     Median: 0.8008 (from 10 values)
Bipolar_Disorder               Median: 0.7579 (from 10 values)
Bladder_Cancer                 Median: 0.8909 (from 10 values)
Breast_Cancer                  Median: 0.8674 (from 10 values)
CKD                            Median: 0.7597 (from 10 values)
COPD                           Median: 0.7375 (from 10 values)
Colorectal_Cancer              Median: 0.8481 (from 10 values)
Crohns_Disease                 Median: 0.9299 (from 9 values)
Depression                     Median: 0.6466 (from 10 values)
Diabetes                       Median: 0.7874 (from 10 values)
Heart_Failure             

In [17]:
pce_df_full_original = pd.read_csv('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/pce_prevent_full.csv')
pce_df_full_original.keys()
pce_df_full = pce_df_full_original.copy()
if 'Sex' in pce_df_full.columns and pce_df_full['Sex'].dtype == 'object':
    pce_df_full['sex'] = pce_df_full['Sex'].map({'Female': 0, 'Male': 1}).astype(int)
    # Keep 'Sex' column for rolling function
elif 'sex' in pce_df_full.columns:
    # Convert numeric sex back to 'Sex' string for rolling function
    pce_df_full['Sex'] = pce_df_full['sex'].map({0: 'Female', 1: 'Male'}).astype(str)
else:
    raise ValueError("Need 'Sex' or 'sex' column in pce_df")
pce_df_full.keys()

Index(['eid', 'SexNumeric', 'Dm_Any', 'Dm_censor_age', 'Ht_Any',
       'Ht_censor_age', 'Cad_Any', 'Cad_censor_age', 'HyperLip_Any',
       'HyperLip_censor_age', 'DmT1_Any', 'DmT1_censor_age', 'Enrollment_Date',
       'age_enrolled', 'birth_year', 'Birthdate', 'SBP', 'tchol', 'hdl',
       'SmokingStatusv2', 'pce_goff', 'race', 'antihtnbase', 'enrollment',
       'age_at_enroll', 'prev_dm', 'prev_dm1', 'prev_ht', 'prev_hl',
       'prev_cad', 'bmi', 'CAD', 'LDL_SF', 'BMI', 'T2D', 'pce',
       'prevent_base_ascvd_risk', 'enrollment_year', 'pce_goff_fuull', 'age',
       'Sex', 'sex'],
      dtype='object')

In [15]:
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

readRDS = robjects.r['readRDS']
pce_data = readRDS('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/pce_df_prevent.rds')
pce_df = pandas2ri.rpy2py(pce_data)
pce_df.head()

Unnamed: 0,id,age,race,Sex,SmokingStatusv2,tchol,hdl,SBP,antihtnbase,dm2_prev,dm1_prev,pce_goff,pce_goff_fuull,pce,prevent_base_ascvd_risk,prevent_impute
1,1000015,69,white,Male,Previous,160.440834,38.59242,146,0,0,0,0.208619,0.208619,0.236008,0.088804,0.088804
2,1000023,44,white,Male,Current,238.631089,35.46017,145,0,0,0,0.128707,0.128707,0.133392,0.031418,0.031418
3,1000037,69,white,Female,Previous,265.003865,54.988399,152,0,0,0,0.128959,0.128959,0.126497,0.073537,0.073537
4,1000042,66,white,Male,Never,187.277648,47.911833,130,0,0,0,0.137629,0.137629,0.132626,0.062329,0.062329
5,1000059,54,white,Female,Previous,203.750965,48.221191,152,0,0,0,0.029323,0.029323,0.029881,0.032188,0.032188


In [19]:
# After loading both dataframes
#pce_df_full_original = pd.read_csv('/Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/pce_prevent_full.csv')
pce_df_full_original = pce_df_full
# Check first 10k
n_check = 10000

# Check what sex columns exist
print("pce_df columns with 'sex':", [c for c in pce_df.columns if 'sex' in c.lower() or 'Sex' in c])
print("pce_df_full_original columns with 'sex':", [c for c in pce_df_full_original.columns if 'sex' in c.lower() or 'Sex' in c])

# Get Sex values from both (handle different column names)
if 'Sex' in pce_df.columns:
    sex_rds = pce_df['Sex'].iloc[:n_check].values
elif 'sex' in pce_df.columns:
    sex_rds = pce_df['sex'].map({0: 'Female', 1: 'Male'}).iloc[:n_check].values
elif 'SexNumeric' in pce_df.columns:
    sex_rds = pce_df['SexNumeric'].map({0: 'Female', 1: 'Male'}).iloc[:n_check].values
else:
    print("No sex column found in pce_df!")

if 'Sex' in pce_df_full_original.columns:
    sex_csv = pce_df_full_original['Sex'].iloc[:n_check].values
elif 'sex' in pce_df_full_original.columns:
    sex_csv = pce_df_full_original['sex'].map({0: 'Female', 1: 'Male'}).iloc[:n_check].values
elif 'SexNumeric' in pce_df_full_original.columns:
    sex_csv = pce_df_full_original['SexNumeric'].map({0: 'Female', 1: 'Male'}).iloc[:n_check].values
else:
    print("No sex column found in pce_df_full_original!")

# Compare
mismatches = (sex_rds != sex_csv).sum()
print(f"\nMismatches in first {n_check} rows: {mismatches}")
if mismatches > 0:
    mismatch_indices = np.where(sex_rds != sex_csv)[0]
    print(f"First 10 mismatch indices: {mismatch_indices[:10]}")
    print(f"Example mismatches:")
    for idx in mismatch_indices[:5]:
        print(f"  Row {idx}: RDS={sex_rds[idx]}, CSV={sex_csv[idx]}")
else:
    print("✓ All Sex values match!")

pce_df columns with 'sex': ['Sex']
pce_df_full_original columns with 'sex': ['SexNumeric', 'Sex', 'sex']

Mismatches in first 10000 rows: 0
✓ All Sex values match!


In [7]:
# ============================================================================
# COMPARE WITH DELPHI (1-YEAR)
# ============================================================================
"""
Compares Aladynoulli 1-year predictions with Delphi-2M 1-year predictions
Results saved to: results/comparisons/pooled_retrospective/delphi_comparison_*.csv
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_delphi_1yr_import.py


ALADYNOULLI vs DELPHI: 1-YEAR PREDICTION COMPARISON

Loading Aladynoulli results...
Loaded Aladynoulli results for 28 diseases

Extracting Delphi results from supplementary table...
Extracted Delphi results for 28 diseases

Creating comparison...

ALADYNOULLI vs DELPHI: DISEASES WHERE ALADYNOULLI WINS (1-YEAR, 0-YEAR GAP)

Total wins: 15 out of 28 diseases
Win rate: 53.6%

Disease                    Aladynoulli       Delphi    Advantage    Percent
----------------------------------------------------------------------------------------------------
Parkinsons                      0.8091       0.6108       0.1983      32.5%
Multiple_Sclerosis              0.8395       0.6545       0.1850      28.3%
Prostate_Cancer                 0.8312       0.6636       0.1676      25.2%
ASCVD                           0.8809       0.7370       0.1439      19.5%
Atrial_Fib                      0.7966       0.6721       0.1245      18.5%
Breast_Cancer                   0.7818       0.6985       0.0833   

In [8]:
# ============================================================================
# COMPARE MULTI-HORIZON WITH DELPHI
# ============================================================================
"""
Compares Aladynoulli multi-horizon predictions (5yr, 10yr, 30yr) with Delphi 1-year
Results saved to: results/comparisons/pooled_retrospective/delphi_comparison_multihorizon.csv
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_delphi_multihorizon.py


ALADYNOULLI vs DELPHI: MULTI-HORIZON COMPARISON

Loading Aladynoulli multi-horizon results...
Loaded Aladynoulli results for 28 diseases
Horizons available: 1yr (0gap), 1yr (1gap), 5yr, 10yr, 30yr, static10yr

Extracting Delphi results from supplementary table...
Extracted Delphi results for 28 diseases

Creating multi-horizon comparison...

ALADYNOULLI PERFORMANCE ACROSS HORIZONS vs DELPHI (1-YEAR, 0-GAP)

Disease                       Delphi    Ala_1yr    Ala_5yr   Ala_10yr   Ala_30yr Ala_st10yr
----------------------------------------------------------------------------------------------------
Parkinsons                    0.6108     0.8091     0.7306     0.7237     0.6219     0.7231
Secondary_Cancer              0.5691     0.6001     0.6093     0.6104     0.5615     0.6103
Prostate_Cancer               0.6636     0.8312     0.7266     0.6873     0.6773     0.6828
ASCVD                         0.7370     0.8809     0.7575     0.7299     0.7047     0.7329
Osteoporosis                

In [9]:
# ============================================================================
# COMPARE WITH COX BASELINE
# ============================================================================
"""
Compares Aladynoulli static 10-year predictions with Cox baseline (age + sex only)
Results saved to: results/comparisons/pooled_retrospective/cox_baseline_comparison_*.csv
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/compare_with_cox_baseline.py


ALADYNOULLI vs COX BASELINE (AGE + SEX ONLY, NO NOULLI)
Cox model: 10-year follow-up (trained on 20001-30000, tested on 0-10000)
Aladynoulli: Static 10-year (1-year score for 10-year outcome)

Loading Cox baseline results...
✓ Loaded Cox baseline results: 28 diseases
  Columns: ['Unnamed: 0', 'disease_group', 'auc']

First few rows:
    Unnamed: 0      Disease  Cox_AUC
0        ASCVD        ASCVD   0.6338
1     Diabetes     Diabetes   0.6003
2   Atrial_Fib   Atrial_Fib   0.5883
3          CKD          CKD   0.5292
4  All_Cancers  All_Cancers   0.5411

Loading Aladynoulli static 10-year results...
Note: Cox model uses 10-year follow-up, comparing with static 10-year (1-year score for 10-year outcome)
✓ Loaded Aladynoulli static 10-year results: 28 diseases

Merging results...
✓ Merged results: 28 diseases
  Diseases in both: 28
  Only in Cox: 0
  Only in Aladynoulli: 0

ALADYNOULLI vs COX BASELINE: STATIC 10-YEAR PREDICTIONS

Disease                      Cox_AUC Aladynoulli_AUC  Improve

In [10]:
# ============================================================================
# ANALYZE PREDICTION DROPS
# ============================================================================
"""
Analyzes why predictions drop between 0-year and 1-year washout
Focuses on hypercholesterolemia and other precursor diseases
Results saved to: results/analysis/prediction_drops_*.csv
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/analyze_prediction_drops.py --disease ASCVD


Loading data...
ANALYZING PREDICTION DROPS FOR: ASCVD
'ASCVD' is a disease group. Finding individual diseases...
  Found: Myocardial infarction at index 112
  Found: Coronary atherosclerosis at index 114
  Found: Other acute and subacute forms of ischemic heart disease at index 116
  Found: Unstable angina (intermediate coronary syndrome) at index 111
  Found: Angina pectoris at index 113
  Found: Other chronic ischemic heart disease, unspecified at index 115
Found 6 disease(s) for 'ASCVD'

Analyzing 400000 patients...

Collecting predictions and outcomes...
Collected 400000 patients with both 0yr and 1yr predictions

NOTE: Prevalent case exclusion (matches evaluation function logic):
  - For single diseases: Patients with that disease before prediction time are excluded
  - For disease groups (like ASCVD): Prevalent cases are NOT excluded
    (patients can have multiple events in the group, e.g., CAD then MI)
  - This matches the evaluation function's approach for disease groups

Pred

---

## SECTION 5: VISUALIZATIONS

**Safe to Run Multiple Times**: Creates plots from existing results


In [11]:
# ============================================================================
# VISUALIZE PREDICTION DROPS
# ============================================================================
"""
Creates plots for prediction drops analysis
Plots saved to: results/analysis/plots/
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/visualize_prediction_drops.py --disease ASCVD


VISUALIZING PREDICTION DROPS ANALYSIS: ASCVD

Loading results...
✓ Loaded 3 result files

Creating plots...
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/hyperchol_comparison_ASCVD.png


  return points[1, 0] - points[0, 0]
posx and posy should be finite values
posx and posy should be finite values


✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/precursor_comparison_ASCVD.png


posx and posy should be finite values


✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/precursor_ratios_ASCVD.png

VISUALIZATION COMPLETE

Plots saved to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots


In [12]:
# ============================================================================
# VISUALIZE MODEL LEARNING (KEY INSIGHT FIGURE)
# ============================================================================
"""
Creates a figure showing the key insight: Model learns to distinguish between
high-risk and lower-risk hypercholesterolemia patients.

Non-droppers (predictions stay high) have HIGHER event rates → Model correctly
identifies high-risk patients. This shows the model is learning and calibrating.
"""
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/visualize_model_learning.py --disease ASCVD


CREATING MODEL LEARNING FIGURES
✓ Saved figure to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/model_learning_hyperchol_ASCVD.png
✓ Saved full comparison figure to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/model_learning_full_comparison_ASCVD.png
✓ Saved multiple precursors figure to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/model_learning_multiple_precursors_ASCVD.png

COMPLETE


In [13]:
# ============================================================================
# VISUALIZE ALL COMPARISONS
# ============================================================================
"""
Creates plots for all comparisons (external scores, Delphi, etc.)
Plots saved to: results/comparisons/plots/
"""

%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/visualize_all_comparisons.py


VISUALIZING ALL COMPARISONS

1. Loading external scores comparison...
   Columns in CSV: ['Aladynoulli_AUC', 'Aladynoulli_CI_lower', 'Aladynoulli_CI_upper', 'PCE_AUC', 'PCE_CI_lower', 'PCE_CI_upper', 'Difference', 'N_patients', 'N_events', 'QRISK3_AUC', 'QRISK3_CI_lower', 'QRISK3_CI_upper', 'QRISK3_Difference', 'PREVENT_AUC', 'PREVENT_CI_lower', 'PREVENT_CI_upper', 'Gail_AUC', 'Gail_CI_lower', 'Gail_CI_upper']
   Index: ['ASCVD_10yr', 'ASCVD_30yr', 'Breast_Cancer_10yr_Female', 'Breast_Cancer_10yr_Male', 'Breast_Cancer_10yr_All']
   Creating external scores comparison plot...
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/comparisons/plots/external_scores_comparison.png

2. Creating Delphi comparison plot...
   Columns in Delphi file: ['Aladynoulli_1yr_0gap', 'Delphi_1yr_0gap', 'Diff_0gap', 'Aladynoulli_1yr_1gap', 'Delphi_1yr_1gap', 'Diff_1gap']
✓ Saved plot to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/

---

## SECTION 7: AGE OFFSET SIGNATURE ANALYSIS (OPTIONAL)

**Advanced Analysis**: How predictions and signatures change across training time points (t0-t9)


In [19]:
# ============================================================================
# MI WASHOUT ANALYSIS: SIGNATURE-BASED LEARNING
# ============================================================================
"""
Analyzes MI (Myocardial Infarction) washout with signature-based learning.

For each patient, tracks:
- 3 MODELS: m0t9, m5t9, m9t9 (all predict at t9, trained to t0, t5, t9)
- 3 TIME PERIODS: 
  1) Baseline (before t0/enrollment)
  2) Interval t0-t5
  3) Interval t5-t9
- For each period: MI status and Signature 5 precursor diseases

Categorizes washout based on what developed in intervals (not baseline).
"""

import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

result = subprocess.run([
    sys.executable,
    str(script_dir / 'analyze_mi_washout_signature.py'),
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")

MI WASHOUT ANALYSIS WITH SIGNATURE-BASED LEARNING
Batch: 0-10000

Loading essentials...
✓ Found MI at index 112: Myocardial infarction

Loading cluster assignments...
✓ Loaded clusters: 348 diseases

✓ MI belongs to Signature 5
✓ Found 7 diseases in Signature 5
  Examples: ['Hypercholesterolemia', 'Unstable angina (intermediate coronary syndrome)', 'Myocardial infarction', 'Angina pectoris', 'Coronary atherosclerosis']

Loading data batch 0-10000...

Loading pi batches for offsets 0-9...
✓ Loaded 10 pi batches

Loading model checkpoints to extract lambda...

ANALYZING MI WASHOUT

Analyzing 10000 patients...
MI index: 112
Signature 5 has 7 diseases

✓ Saved results to: results/analysis/mi_washout_analysis_batch_0_10000.csv

SUMMARY STATISTICS
Total patients analyzed: 10000

Washout categories:
washout_category
neither         8694
accurate        1028
conservative     278
Name: count, dtype: int64

MI status at t9: 435 patients (4.3%)

Patients with Signature 5 precursors at t9: 1600 (1

In [20]:
%run /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts/visualize_mi_washout_signature.py

VISUALIZING MI WASHOUT ANALYSIS
✓ Saved figure to: /Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/results/analysis/plots/mi_washout_signature_analysis.png

VISUALIZATION COMPLETE


In [18]:
# ============================================================================
# ANALYZE AGE OFFSET SIGNATURE CHANGES
# ============================================================================
"""
Analyzes how predictions and signature loadings change across age offsets (t0-t9).

For patients with specific precursor diseases, tracks:
1. How their predictions change across offsets 0-9
2. Which signatures/clusters are most impacted
3. Which precursor diseases drive which signature changes

This shows how the model learns and adapts as more data becomes available.
"""

# Run analysis for key precursor diseases
import subprocess
import sys
from pathlib import Path

script_dir = Path('/Users/sarahurbut/aladynoulli2/pyScripts/dec_6_revision/new_notebooks/pythonscripts')

result = subprocess.run([
    sys.executable,
    str(script_dir / 'analyze_age_offset_signatures.py'),
    '--approach', 'pooled_retrospective',
    '--target_disease', 'ASCVD',
    '--start_idx', '0',
    '--end_idx', '10000'
], capture_output=True, text=True)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
if result.returncode != 0:
    print(f"\n⚠️  WARNING: Script exited with return code {result.returncode}")


ANALYZING AGE OFFSET SIGNATURE CHANGES

Approach: pooled_retrospective
Batch: 0-10000
Target disease: ASCVD
Precursor diseases: ['Hypercholesterolemia', 'Essential hypertension', 'Type 2 diabetes', 'Atrial fibrillation and flutter', 'Obesity', 'Chronic Kidney Disease, Stage III', 'Rheumatoid arthritis', 'Sleep apnea', 'Peripheral vascular disease, unspecified']

Loading essentials...
Loading cluster assignments...
  ✓ Loaded clusters as numpy array from: /Users/sarahurbut/Library/CloudStorage/Dropbox-Personal/data_for_running/initial_clusters_400k.pt
  Cluster shape: (348,)
✓ Loaded clusters: 348 diseases, 20 clusters

Loading data batch 0-10000...

Loading pi batches for offsets 0-9...
  Loading offset 0...
  Loading offset 1...
  Loading offset 2...
  Loading offset 3...
  Loading offset 4...
  Loading offset 5...
  Loading offset 6...
  Loading offset 7...
  Loading offset 8...
  Loading offset 9...
✓ Loaded 10 pi batches

Loading model checkpoints to extract lambda (patient-specifi

## Summary: Age Offset Signature Analysis

**Question:** When models are trained with different amounts of data (washout periods), how do patient-specific parameters (lambda) change, and does this reflect conservative vs. accurate washout?

**Findings:**

1. **Conservative washout (with outcome events):**
   - Patients who had ASCVD events during washout
   - Signature 5 (cardiovascular cluster) shows large positive lambda changes (+0.587 for hypercholesterolemia)
   - Model learns from patients who already had outcomes

2. **Accurate washout (with precursor only):**
   - Patients with precursors (e.g., hypercholesterolemia) but no ASCVD outcome during washout
   - Signature 5 shows moderate positive lambda changes (+0.305)
   - Model learns from pre-clinical signals (risk factors before outcomes)

3. **Model refinement (without either):**
   - Patients with neither precursor nor outcome
   - Small negative lambda changes (-0.053)
   - Model becomes more conservative/refined

**Interpretation:**
- The model distinguishes between:
  - Real conditions (outcomes) → large changes
  - Pre-clinical signals (precursors) → moderate changes
  - Neither → small/negative changes
- This validates washout accuracy: the model learns from legitimate risk factors, not just future outcomes
- Signature 5 correctly responds to cardiovascular precursors even when outcomes haven't occurred yet

**Conclusion:** This pattern supports model validity and washout accuracy. The model learns appropriately from pre-clinical signals, which is the intended behavior for accurate washout.

---

## SECTION 6: SUMMARY & KEY FINDINGS

**Read-Only**: Summary of results and interpretations


### Key Findings

**Performance Summary:**
- Aladynoulli outperforms external risk scores (PCE, PREVENT, QRISK3) for ASCVD
- Comparable to Gail model for breast cancer (women only)
- Outperforms Delphi-2M for most diseases across multiple time horizons
- Significantly better than Cox baseline (age + sex only)

**Prediction Drops Analysis:**
- Prediction drops between 0yr and 1yr washout are **expected behavior**
- Model refines/calibrates predictions based on observed outcomes
- Hypercholesterolemia patients show largest drops (model learns to distinguish high-risk vs lower-risk)
- Similar behavior to Delphi - both models learn and calibrate over time

**Reproducibility:**
- AWS vs Local differences are negligible when aggregated
- Per-batch differences average out over large populations
- Results are reproducible across computational environments

**Model Strengths:**
- Multi-horizon predictions (1yr, 5yr, 10yr, 30yr)
- Dynamic risk assessment (updates with new information)
- Works across both sexes (unlike Gail which is women-only)
- Handles multiple diseases simultaneously
