## 01-02-2025

# Module 4: PCA & Feature Selection

**Purpose:**
Run PCA separately on the fragmentomics and methylation feature sets, then select the most discriminative features from each modality for classification.

**Strategy:**
1. **Fragmentomics PCA:** Fragment size + motifs + top 100 coverage bins
2. **Methylation PCA:** Global stats + top 100 regional bins
3. **Feature Selection:** Top 50 features from each modality
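
The strategy above fits one PCA per modality instead of a single PCA on the concatenated matrix. The following is a minimal sketch of that idea using plain NumPy and synthetic data. The helper `pca_scores`, the matrix shapes, and the per-column standardization are illustrative assumptions, not the implementation inside `run_module_4_supervised`.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column; constant columns keep a unit scale
    # so that we never divide by zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # SVD of the standardized matrix: U * S gives the sample scores,
    # and S**2 is proportional to the variance along each component.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
n_samples = 40

# Synthetic stand-ins for the two modalities (sizes are assumptions).
frag_X = rng.normal(size=(n_samples, 120))
meth_X = rng.normal(size=(n_samples, 105))

# One PCA per modality, so neither feature block dominates the other.
frag_scores, frag_var = pca_scores(frag_X)
meth_scores, meth_var = pca_scores(meth_X)

print("Fragmentomics scores:", frag_scores.shape)
print("Methylation scores:", meth_scores.shape)
```

Keeping the two fits separate means each PCA plot reflects the variance structure of a single modality, which is what makes the two figures comparable.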

**Outputs:**
- PCA plots (2 figures)
- Selected features (50 per modality, 100 total)
- Feature importance rankings

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image, display

# Import Module 4
from src.feature_selection import run_module_4_supervised

## Option 1: Quick Run - Execute Complete Module 4

In [4]:
# Run complete Module 4 pipeline
selected_disc, selected_val = run_module_4_supervised()

In [None]:
print("PCA Visualizations:")
print("=" * 70)

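# Display PCA plots.
# Note: PCA_FIGURES_DIR is not defined in the cells above. It is assumed to be a
# pathlib.Path pointing at the directory where Module 4 writes its PCA figures.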
for plot_file in sorted(PCA_FIGURES_DIR.glob('*.png')):
    print(f"\n{plot_file.stem.replace('_', ' ').title()}")
    display(Image(filename=str(plot_file)))

In [None]:
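# Note: frag_importance and meth_importance are not defined in the cells above.
# They are assumed to be DataFrames of ranked features (best first) with
# 'feature', 'p_value' and 'effect_size' columns.
print("Top 10 Fragmentomics Features:")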
print("=" * 70)
print(frag_importance.head(10)[['feature', 'p_value', 'effect_size']].to_string(index=False))

print("\n\nTop 10 Methylation Features:")
print("=" * 70)
print(meth_importance.head(10)[['feature', 'p_value', 'effect_size']].to_string(index=False))
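
The cells above assume a ranking table with `feature`, `p_value`, and `effect_size` columns. The notebook does not say which statistical test produces those values, so the sketch below swaps in a two-sided permutation test on the difference in class means, with Cohen's d as the effect size. The function `rank_features`, the synthetic data, and the choice of test are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def rank_features(X, y, n_perm=500, seed=0):
    """Rank the columns of X by permutation p-value, then by |Cohen's d|.

    Hypothetical helper; the real pipeline may use a different test.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y).astype(bool)
    values = X.to_numpy(dtype=float)
    a, b = values[y], values[~y]

    # Cohen's d using the pooled standard deviation of the two classes.
    pooled = np.sqrt(
        ((len(a) - 1) * a.var(axis=0, ddof=1)
         + (len(b) - 1) * b.var(axis=0, ddof=1))
        / (len(a) + len(b) - 2)
    )
    pooled[pooled == 0] = np.nan
    observed = a.mean(axis=0) - b.mean(axis=0)
    effect = observed / pooled

    # Permutation test: shuffle the labels and count how often the
    # shuffled mean difference is at least as extreme as the observed one.
    exceed = np.zeros(values.shape[1])
    for _ in range(n_perm):
        perm = rng.permutation(y)
        diff = values[perm].mean(axis=0) - values[~perm].mean(axis=0)
        exceed += np.abs(diff) >= np.abs(observed)
    p_value = (exceed + 1) / (n_perm + 1)

    table = pd.DataFrame(
        {"feature": X.columns, "p_value": p_value, "effect_size": effect}
    )
    table["abs_effect"] = table["effect_size"].abs()
    table = table.sort_values(["p_value", "abs_effect"], ascending=[True, False])
    return table.drop(columns="abs_effect").reset_index(drop=True)


# Synthetic example: 30 samples, 20 features, only "f0" differs by class.
rng = np.random.default_rng(1)
y = np.array([1] * 15 + [0] * 15)
data = rng.normal(size=(30, 20))
data[y == 1, 0] += 3.0
X = pd.DataFrame(data, columns=[f"f{i}" for i in range(20)])

importance = rank_features(X, y)
print(importance.head(10)[["feature", "p_value", "effect_size"]].to_string(index=False))
```

Under these assumptions, taking `importance.head(50)["feature"]` for each modality would give the 50 + 50 selection described in the strategy.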