# R3 Q8: Heterogeneity Definition

## Reviewer Question

**Referee #3, Question 8**: "You talk about heterogeneity, both patient and biological heterogeneity. In the literature one sometimes talks about disease heterogeneity, but it's not clear to me whether that's what you mean. Please clarify what you mean by heterogeneity."

## Why This Matters

Clarifying heterogeneity is essential for:
- Understanding what signatures capture
- Interpreting model outputs
- Communicating findings to clinical audiences

## Our Approach

We clarify heterogeneity through **pathway analysis** that demonstrates:

1. **Patient Heterogeneity**: Different individuals with the same diagnosis have different signature profiles
2. **Biological Heterogeneity**: The same phenotype can arise from different biological pathways
3. **Disease Heterogeneity**: An umbrella term capturing that clinical diagnoses are not single entities

---

## Key Findings

✅ **4 distinct pathways** to myocardial infarction identified  
✅ **44.8% of MI patients** follow the "Hidden Risk" pathway (minimal pre-existing disease)  
✅ **Different pathways** show different signature patterns and genetic risk profiles

---


## Pathway Analysis: 4 Distinct Pathways to Myocardial Infarction

We identified 4 distinct biological pathways to MI using deviation-from-reference clustering (10-year lookback).
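
To make the idea of deviation-from-reference clustering concrete, here is a minimal sketch on synthetic data. It is not the code in `run_complete_pathway_analysis_deviation_only.py`, which may differ. The array layout, the choice of the cohort mean as the reference, the use of k-means, and the helper names `deviation_features` and `kmeans` are all assumptions made for illustration.

```python
import numpy as np

def deviation_features(loadings, reference):
    """Subtract the reference trajectory and flatten each patient to one vector.

    loadings:  (n_patients, n_signatures, n_years) loadings over the lookback window
    reference: (n_signatures, n_years) reference trajectory
    """
    deviations = loadings - reference[None, :, :]
    return deviations.reshape(loadings.shape[0], -1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each patient to the nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in data: 200 patients, 5 signatures, 10-year lookback.
rng = np.random.default_rng(42)
n_patients, n_signatures, n_years = 200, 5, 10
loadings = rng.normal(size=(n_patients, n_signatures, n_years))
reference = loadings.mean(axis=0)  # assumption: reference = cohort-mean trajectory

features = deviation_features(loadings, reference)
pathway_labels = kmeans(features, k=4)
print(np.bincount(pathway_labels, minlength=4))  # patients per cluster
```

Clustering on deviations rather than raw loadings means each cluster describes how a group of patients departs from the typical trajectory, not the shared baseline.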


```python
import pandas as pd
from IPython.display import display  # implicit in notebooks; imported so the cell also runs as a script

# Pathway summary (from heterogeneity analysis)
pathways = pd.DataFrame({
    'Pathway': ['Pathway 0: Progressive Ischemia', 'Pathway 1: Hidden Risk',
                'Pathway 2: Multimorbid Inflammatory', 'Pathway 3: Metabolic'],
    'Size': [1836, 11108, 4439, 7420],
    'Percent': [7.4, 44.8, 17.9, 29.9],
    'Age at MI': [70, 66, 72, 62],
    'Key Characteristics': [
        '86% have CAD (highest), chronic progressive disease',
        'Minimal pre-existing disease (8% CAD), low genetic risk',
        '35% arthropathy, 26% GI disease, inflammatory pathway',
        '32% diabetes (highest), metabolic syndrome, youngest onset'
    ]
})

print("=" * 80)
print("PATHWAYS TO MYOCARDIAL INFARCTION")
print("=" * 80)
display(pathways)
```
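
As a quick consistency check on the summary table, the reported percentages can be recomputed from the pathway sizes. This sketch uses only the numbers listed above; the variable names are illustrative.

```python
import pandas as pd

sizes = pd.Series(
    [1836, 11108, 4439, 7420],
    index=['Progressive Ischemia', 'Hidden Risk',
           'Multimorbid Inflammatory', 'Metabolic'],
)
reported_percent = pd.Series([7.4, 44.8, 17.9, 29.9], index=sizes.index)

total = sizes.sum()  # 24803 MI patients across the four pathways
computed_percent = (100 * sizes / total).round(1)

# Compare recomputed shares with the reported column, allowing for float error.
mismatch = (computed_percent - reported_percent).abs() > 1e-9
print(total)
print(computed_percent)
print("Any mismatch:", bool(mismatch.any()))
```

The recomputed shares match the `Percent` column, so the 44.8% figure for the "Hidden Risk" pathway corresponds to 11,108 of 24,803 patients.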


## Definitions of Heterogeneity

### 1. Patient Heterogeneity
- **Definition**: Different individuals with the same disease diagnosis have different signature profiles
- **Example**: Two patients with coronary artery disease (CAD) may differ in the balance of their metabolic and inflammatory signatures
- **Measurement**: Average pairwise distance in signature space
- **Finding**: Average within-disease signature distance = 3.87 (substantial heterogeneity)
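
The measurement above can be illustrated with a small sketch of an average pairwise distance. The Euclidean metric, the toy profiles, and the helper name `mean_pairwise_distance` are assumptions for illustration; the metric behind the reported 3.87 is not stated here.

```python
import numpy as np

def mean_pairwise_distance(profiles):
    """Average Euclidean distance over all unordered pairs of rows."""
    n = len(profiles)
    diffs = profiles[:, None, :] - profiles[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Take the upper triangle so each pair is counted once and self-distances are excluded.
    upper = np.triu_indices(n, k=1)
    return dists[upper].mean()

# Toy data: three patients with the same diagnosis, two signature dimensions.
profiles = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [0.0, 4.0],
])

avg_distance = mean_pairwise_distance(profiles)
print(avg_distance)  # pair distances are 5, 4 and 3, so the mean is 4.0
```

A larger value means patients sharing a diagnosis sit further apart in signature space, which is the sense of patient heterogeneity used here.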

### 2. Biological Heterogeneity
- **Definition**: The same clinical phenotype can arise from different biological pathways
- **Example**: CAD can be driven by metabolic dysfunction, inflammation, or genetic factors
- **Measurement**: Signature diversity within disease groups
- **Finding**: 4 distinct pathways to MI demonstrate biological heterogeneity

### 3. Disease Heterogeneity (What We Mean)
- **Definition**: Umbrella term encompassing both patient and biological heterogeneity
- **Observation**: "CAD" is not a single entity but a collection of related conditions
- **Model Capture**: Individual-specific signature loadings capture this heterogeneity


## Summary & Response Text

### Key Findings

1. **Patient Heterogeneity**: Different individuals with the same diagnosis show different signature profiles (average distance = 3.87)
2. **Biological Heterogeneity**: The same phenotype (MI) arises from 4 distinct pathways
3. **Disease Heterogeneity**: Clinical diagnoses are not single entities; our model captures this through individual-specific signatures

### Response to Reviewer

> "We clarify our use of 'heterogeneity' as follows: (1) **Patient Heterogeneity**: Different individuals with the same disease diagnosis have different signature profiles (measured by average pairwise distance in signature space = 3.87). For example, two patients with CAD may have different metabolic vs. inflammatory signatures. (2) **Biological Heterogeneity**: The same clinical phenotype can arise from different biological pathways. For example, CAD can be driven by metabolic dysfunction, inflammation, or genetic factors. We demonstrate this through pathway analysis identifying 4 distinct pathways to myocardial infarction (progressive ischemia, hidden risk, multimorbid inflammatory, metabolic). (3) **Disease Heterogeneity**: An umbrella term encompassing both patient and biological heterogeneity, reflecting the observation that 'CAD' is not a single entity but a collection of related conditions. Our model captures this through individual-specific signature loadings. We now use these terms consistently throughout the manuscript and define them in Methods."

### References

- Pathway analysis: `heterogeneity_analysis_summary.ipynb`
- Pathway discovery: `run_complete_pathway_analysis_deviation_only.py`
