## 01-02-2026

# Module 3: Exploratory Analysis & Visualization

**Purpose:**
Generate the plots required for the assignment and perform exploratory data analysis (EDA).

**Required Assignment Plots:**
1. ✅ Fragment length distribution
2. ✅ Start and end position distributions
3. ✅ End motif distribution
4. ✅ Methylation analysis

**Additional Analysis:**
- PCA and clustering
- Univariate statistical tests
- Feature importance identification
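As a rough illustration of the PCA step listed above, the sketch below projects a samples × features table onto its top principal components using a plain NumPy SVD. It is not the implementation behind `perform_pca` (which is not shown here); the function name `pca_via_svd`, the synthetic `demo` table, and the `feat_*` column names are all hypothetical stand-ins.

```python
import numpy as np
import pandas as pd


def pca_via_svd(features: pd.DataFrame, n_components: int = 2):
    """Hypothetical helper: PCA on standardized features via SVD."""
    X = features.to_numpy(dtype=float)

    # Standardize each column; constant columns keep std = 1 to avoid dividing by zero
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std

    # SVD of the centered, scaled matrix: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)

    # Sample coordinates on the leading components
    scores = U[:, :n_components] * S[:n_components]

    # Fraction of total variance carried by each component
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in for the real feature table (assumption: numeric columns only)
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(22, 10)),
    columns=[f"feat_{i}" for i in range(10)],
)

scores, explained = pca_via_svd(demo, n_components=2)
print(scores.shape)
print(explained)
```

With far more columns than samples, as in a 22 × 395 table, at most 21 components can carry variance after centering, so the first few components are usually the only ones worth plotting.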

**Inputs:**
- `data/processed/all_features.csv` (from Module 2)

**Outputs:**
- `results/figures/visualization/` - Required plots
- `results/figures/eda/` - Exploratory plots
- `results/tables/univariate_comparisons.csv` - Statistical tests
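To make the idea of a univariate comparison table concrete, here is a minimal sketch that computes a per-feature Mann-Whitney U statistic and an AUC-style effect size between two groups using only pandas. Which tests `univariate_tests` actually runs and which columns `univariate_comparisons.csv` holds are not stated here, so the function name `mann_whitney_u_table`, the `label` column, and the `case`/`control` group names are assumptions made for illustration.

```python
import pandas as pd


def mann_whitney_u_table(df: pd.DataFrame, label_col: str, group_a, group_b) -> pd.DataFrame:
    """Hypothetical helper: U statistic and U / (n_a * n_b) for every feature column."""
    # Keep only the two groups being compared
    sub = df[df[label_col].isin([group_a, group_b])]
    is_a = sub[label_col] == group_a
    n_a = int(is_a.sum())
    n_b = len(sub) - n_a

    rows = []
    for col in sub.columns.drop(label_col):
        # Rank all values jointly; ties receive the average rank
        ranks = sub[col].rank(method="average")
        # U for group A = rank sum of A minus its minimum possible rank sum
        u_a = ranks[is_a].sum() - n_a * (n_a + 1) / 2
        rows.append({"feature": col, "U": u_a, "auc": u_a / (n_a * n_b)})

    return pd.DataFrame(rows).sort_values("auc", ascending=False, ignore_index=True)


# Tiny synthetic example (not real data)
demo = pd.DataFrame({
    "label": ["case"] * 3 + ["control"] * 3,
    "feat_shifted": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "feat_mixed": [1.0, 4.0, 5.0, 2.0, 3.0, 6.0],
})

table = mann_whitney_u_table(demo, "label", "case", "control")
print(table)
```

The sketch stops at the statistic and effect size. A p-value would normally come from a statistics library such as `scipy.stats.mannwhitneyu`, and with hundreds of features some multiple-testing correction would be needed before reading much into any single row.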

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import Module 3 functions
from src.visualization import (
    plot_fragment_distribution,
    plot_position_distributions,
    plot_motif_distribution,
    plot_methylation_analysis,
    perform_pca,
    univariate_tests,
    run_module_3
)

# Import config
from src.config import (
    ALL_FEATURES,
    VIZ_FIGURES_DIR,
    EDA_FIGURES_DIR,
    UNIVARIATE_TESTS
)

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

## Option 1: Quick Run - Generate All Plots

Run the complete Module 3 pipeline:

In [3]:
# Run the complete Module 3 pipeline.
# This generates all required plots and performs the EDA steps.
results = run_module_3()

# Unpack the returned dictionary: the feature table and the univariate test results
features_df = results['features']
univariate_results = results['univariate_results']


MODULE 3: Exploratory Analysis & Visualization

Loading features from: /Users/maggiebrown/Desktop/PrimaMente/wgbs_classifier/data/processed/all_features.csv
✓ Loaded features: 22 samples × 395 columns

GENERATING REQUIRED PLOTS FOR ASSIGNMENT

1. Fragment Length Distribution (REQUIRED)
----------------------------------------------------------------------
  ✓ Saved: fragment_length_distribution.png
  ✓ Saved: fragment_size_detailed.png
  ✓ Saved: fragment_size_bins.png

2. Start and End Position Distributions (REQUIRED)
----------------------------------------------------------------------
  ✓ Saved: position_distributions.png
  ✓ Saved: position_statistics.png

3. End Motif Distribution (REQUIRED)
----------------------------------------------------------------------
  ✓ Saved: end_motif_distribution.png
  ✓ Saved: motif_diversity.png
  ✓ Saved: motif_heatmap.png

4. Methylation Analysis (REQUIRED)
----------------------------------------------------------------------
  ✓ Saved: meth