# Methylation Clocks Analysis

This notebook explores epigenetic (DNA methylation) clocks and their validation.

## Background
DNA methylation clocks are predictive models that estimate age from DNA methylation levels at specific CpG sites. The gap between predicted and chronological age is widely used as a biomarker of aging and age-related disease.
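
To make the idea concrete: first-generation clocks were fitted with penalized linear regression, so in the simplest case a prediction is an intercept plus a weighted sum of methylation beta values. The sketch below illustrates that form only. The three sites, their weights, the intercept, and the names `toy_weights`, `toy_intercept`, and `predict_dnam_age` are all made up for illustration and do not come from any published clock.

```python
import numpy as np

# Hypothetical coefficients for three placeholder CpG sites
# (illustrative values only, not taken from any published clock).
toy_intercept = 30.0
toy_weights = np.array([40.0, -25.0, 15.0])


def predict_dnam_age(beta_values):
    """Return intercept + weighted sum of beta values (each beta lies in [0, 1])."""
    beta_values = np.asarray(beta_values, dtype=float)
    return toy_intercept + beta_values @ toy_weights


# Two made-up samples: one row per sample, one column per CpG site
toy_betas = np.array([
    [0.50, 0.20, 0.10],
    [0.80, 0.10, 0.60],
])
toy_ages = predict_dnam_age(toy_betas)
print(toy_ages)
```

Real clocks use tens to hundreds of sites, and some apply a transformation to age before fitting, so this should be read as a schematic rather than a usable predictor.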


In [None]:
# Import required libraries
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from pathlib import Path

# Set up output directory
output_dir = Path('../analysis')
output_dir.mkdir(exist_ok=True)


## Epigenetic Clock Validation Plot

This section creates a validation plot showing the correlation between chronological age and predicted DNAm (DNA methylation) age.

Key metrics for clock validation:
- **MAE (Mean Absolute Error)**: Average absolute difference between predicted and actual age
- **RMSE (Root Mean Square Error)**: Square root of average squared differences
- **R² (Coefficient of Determination)**: Proportion of variance in age explained by the model (this notebook reports the square of the Pearson correlation)
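
The three definitions above can be sketched as a single helper. This is a minimal version with a hypothetical function name (`clock_metrics`) and made-up example ages. Here R² is computed as 1 - SS_res / SS_tot against the actual ages, which penalizes systematic offset and can therefore differ from the squared Pearson correlation.

```python
import numpy as np


def clock_metrics(actual_age, predicted_age):
    """Compute MAE, RMSE and R² for predicted vs. actual age (hypothetical helper)."""
    actual = np.asarray(actual_age, dtype=float)
    predicted = np.asarray(predicted_age, dtype=float)
    errors = predicted - actual

    mae = np.mean(np.abs(errors))            # average absolute error, in years
    rmse = np.sqrt(np.mean(errors ** 2))     # penalizes large errors more heavily
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up ages for four samples
example_metrics = clock_metrics([30, 40, 50, 60], [32, 39, 53, 58])
print(example_metrics)
```

RMSE is always at least as large as MAE, and a wide gap between the two suggests a few samples with large errors.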


In [None]:
# Simulate validation data (n=100)
np.random.seed(42)
chronological_age = np.linspace(20, 90, 100)
# Add some noise to simulate biological variation
dnam_age = chronological_age + np.random.normal(0, 4, 100) 

plt.figure(figsize=(8, 6))
sns.regplot(x=chronological_age, y=dnam_age, color="#2b5c8a", scatter_kws={'alpha':0.6})
plt.plot([20, 90], [20, 90], ls="--", c="red", label="Perfect Prediction (y = x)")
plt.xlabel("Chronological Age (Years)")
plt.ylabel("DNAm Age (Predicted)")
plt.title("Benchmarking ROGEN Epigenetic Clock (Validation Set)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig(output_dir / "Clock_Validation_Nov.png", dpi=300, bbox_inches='tight')
plt.show()

print("Plot saved to:", output_dir / "Clock_Validation_Nov.png")


## Calculate Validation Metrics


In [None]:
# Calculate performance metrics
from scipy.stats import pearsonr

# Mean Absolute Error
mae = np.mean(np.abs(dnam_age - chronological_age))

# Root Mean Square Error
rmse = np.sqrt(np.mean((dnam_age - chronological_age)**2))

# Pearson correlation coefficient; its square is reported as R² below
r, p_value = pearsonr(chronological_age, dnam_age)
r_squared = r**2

print("="*50)
print("ROGEN Epigenetic Clock - Validation Metrics")
print("="*50)
print(f"Mean Absolute Error (MAE): {mae:.2f} years")
print(f"Root Mean Square Error (RMSE): {rmse:.2f} years")
print(f"Pearson Correlation (r): {r:.4f}")
print(f"R² (squared Pearson r): {r_squared:.4f}")
print(f"P-value: {p_value:.2e}")
print("="*50)


## Interpretation

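The validation plot shows:
- Strong correlation between chronological and DNAm age, as expected because the data are simulated as chronological age plus noise
- Scatter around the diagonal from the simulated Gaussian noise (SD = 4 years), which stands in for biological variation
- A red dashed line (y = x) marking perfect age prediction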

### Future Directions
1. Integration with real methylation data (450K/EPIC arrays)
2. Cross-validation across different cohorts
3. Analysis of age acceleration (difference between DNAm age and chronological age)
4. Association with longevity genes and pathways
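
As a starting point for item 3, age acceleration can be sketched in two common forms: the raw difference between DNAm age and chronological age, and the residual from regressing DNAm age on chronological age, which is uncorrelated with chronological age by construction. The function name `age_acceleration` and the example ages below are hypothetical, and the straight-line fit via `np.polyfit` is one simple choice among several.

```python
import numpy as np


def age_acceleration(chronological, predicted):
    """Return (difference, residual) age acceleration (hypothetical helper).

    difference: predicted age minus chronological age
    residual:   predicted age minus the least-squares line fitted on chronological age
    """
    chronological = np.asarray(chronological, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    difference = predicted - chronological

    # Fit predicted = slope * chronological + intercept by ordinary least squares
    slope, intercept = np.polyfit(chronological, predicted, deg=1)
    residual = predicted - (slope * chronological + intercept)

    return difference, residual


# Made-up ages for four samples
example_chron = np.array([20.0, 40.0, 60.0, 80.0])
example_pred = np.array([25.0, 43.0, 66.0, 84.0])
accel_diff, accel_resid = age_acceleration(example_chron, example_pred)
print(accel_diff)
print(accel_resid)
```

The residual form removes any overall offset or slope bias in the clock, so a positive value means a sample looks older than others of the same chronological age in this dataset.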


## Notable Epigenetic Clocks

### First Generation Clocks
- **Horvath Clock (2013)**: 353 CpG sites, multi-tissue predictor
- **Hannum Clock (2013)**: 71 CpG sites, blood-specific

### Second Generation Clocks
- **PhenoAge (2018)**: Predicts phenotypic age incorporating clinical biomarkers
- **GrimAge (2019)**: Predicts mortality and healthspan

### Third Generation Clocks
- **DunedinPACE (2022)**: Measures pace of aging
- Deep learning-based clocks using neural networks
