# 🧬 FragMentor Quickstart

**From BAM to biological insight in minutes.**

This notebook demonstrates the core functionality of FragMentor for cell-free DNA (cfDNA) fragmentomics analysis.

## Installation

```bash
pip install fragmentomics
```

In [None]:
# Import FragMentor
import fragmentomics
from fragmentomics import FragMentor, analyze_sizes, plot_size_distribution

print(f"FragMentor v{fragmentomics.__version__}")

## 1. Fragment Size Analysis

The most fundamental fragmentomics analysis is extracting and summarizing the distribution of cfDNA fragment sizes.

In [None]:
# Option 1: Using the high-level FragMentor interface
# fm = FragMentor("path/to/sample.bam")
# dist = fm.sizes()

# Option 2: For this demo, we'll create synthetic data
import numpy as np
np.random.seed(42)

# Simulate healthy cfDNA fragment sizes
healthy_sizes = np.concatenate([
    np.random.normal(167, 20, size=8000),   # Mononucleosome peak
    np.random.normal(334, 30, size=1500),   # Dinucleosome
    np.random.normal(120, 15, size=300),    # Short fragments
]).astype(np.int32)
healthy_sizes = healthy_sizes[(healthy_sizes >= 50) & (healthy_sizes <= 500)]

print(f"Generated {len(healthy_sizes):,} synthetic fragments")

In [None]:
# Analyze the size distribution
dist = analyze_sizes(healthy_sizes)

# Print summary
print(dist.summary())
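
To make the idea of a size-distribution summary concrete, here is a minimal NumPy-only sketch of the kind of descriptive statistics such a summary might contain. The helper `summarize_sizes` and its output keys are hypothetical names chosen for illustration. They are not FragMentor's API and may not match what `dist.summary()` reports.

```python
import numpy as np

def summarize_sizes(sizes):
    """Return basic descriptive statistics for integer fragment lengths (bp).

    Hypothetical helper for illustration; not part of FragMentor.
    """
    sizes = np.asarray(sizes, dtype=np.int64)
    if sizes.size == 0:
        raise ValueError("no fragments to summarize")
    counts = np.bincount(sizes)  # counts[i] = number of fragments of length i
    return {
        "n_fragments": int(sizes.size),
        "mean": float(sizes.mean()),
        "median": float(np.median(sizes)),
        "std": float(sizes.std()),
        "mode": int(counts.argmax()),  # most frequent fragment length
    }

# Synthetic single-peak data, filtered to the same 50-500 bp window as above
rng = np.random.default_rng(0)
demo_sizes = rng.normal(167, 20, size=5000).astype(np.int64)
demo_sizes = demo_sizes[(demo_sizes >= 50) & (demo_sizes <= 500)]

stats = summarize_sizes(demo_sizes)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The mode is taken from a per-base-pair histogram, which is a simple stand-in for peak detection and can be noisy on small samples.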

In [None]:
# Visualize the distribution
import matplotlib.pyplot as plt

fig, ax = plot_size_distribution(dist, title="Healthy cfDNA Sample")
plt.show()

## 2. Key Fragmentomics Features

FragMentor extracts several size-based features that have been reported as informative for cancer detection:

In [None]:
# Access individual features
print("=== Fragmentomics Features ===")
print(f"Short fragment ratio (<150bp): {dist.ratio_short:.1%}")
print(f"Mononucleosome ratio (140-180bp): {dist.ratio_mono:.1%}")
print(f"Dinucleosome ratio (280-360bp): {dist.ratio_di:.1%}")
print(f"Mononucleosome peak: {dist.peak_mono} bp")
print(f"Dinucleosome peak: {dist.peak_di} bp")
print(f"10bp periodicity score: {dist.periodicity_10bp:.3f}")
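
The three ratios above are fractions of fragments falling in fixed size windows. The sketch below recomputes them with plain NumPy from the windows named in the labels (<150 bp, 140-180 bp, 280-360 bp). Whether FragMentor treats the window edges as inclusive or exclusive is an assumption here, and `size_ratios` is a hypothetical helper, not a library function.

```python
import numpy as np

def size_ratios(sizes):
    """Fraction of fragments in each size window (edges assumed inclusive).

    Hypothetical helper for illustration; not part of FragMentor.
    """
    sizes = np.asarray(sizes)
    n = sizes.size
    if n == 0:
        raise ValueError("no fragments to analyze")
    return {
        "ratio_short": np.count_nonzero(sizes < 150) / n,
        "ratio_mono": np.count_nonzero((sizes >= 140) & (sizes <= 180)) / n,
        "ratio_di": np.count_nonzero((sizes >= 280) & (sizes <= 360)) / n,
    }

# Tiny hand-checkable example: six fragment lengths in bp
toy = np.array([100, 145, 160, 170, 300, 400])
ratios = size_ratios(toy)
for name, value in ratios.items():
    print(f"{name}: {value:.1%}")
```

Note that the short and mononucleosome windows overlap between 140 and 149 bp, so a fragment of 145 bp counts toward both and the ratios need not sum to one.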

## 3. Comparing Healthy vs Cancer Samples

Cancer samples typically show an elevated proportion of short fragments (<150 bp), which is commonly attributed to increased apoptosis and altered nucleosome positioning.

In [None]:
# Simulate cancer cfDNA (more short fragments)
np.random.seed(43)
cancer_sizes = np.concatenate([
    np.random.normal(165, 25, size=6000),   # Broader mono peak
    np.random.normal(330, 35, size=1000),   # Dinucleosome
    np.random.normal(115, 20, size=2500),   # Elevated short fragments!
]).astype(np.int32)
cancer_sizes = cancer_sizes[(cancer_sizes >= 50) & (cancer_sizes <= 500)]

cancer_dist = analyze_sizes(cancer_sizes)

# Compare
print("Feature Comparison:")
print(f"{'Feature':<30} {'Healthy':>10} {'Cancer':>10}")
print("-" * 52)
print(f"{'Short ratio (<150bp)':<30} {dist.ratio_short:>10.1%} {cancer_dist.ratio_short:>10.1%}")
print(f"{'Mono ratio (140-180bp)':<30} {dist.ratio_mono:>10.1%} {cancer_dist.ratio_mono:>10.1%}")
print(f"{'Median size':<30} {dist.median:>10.0f} {cancer_dist.median:>10.0f}")

In [None]:
# Visual comparison
from fragmentomics.viz.plots import plot_size_comparison

fig, ax = plot_size_comparison(
    [dist, cancer_dist],
    ["Healthy", "Cancer"],
    title="Fragment Size Distribution: Healthy vs Cancer"
)
plt.show()
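
When comparing two samples, it can help to check whether a difference in the short-fragment ratio exceeds sampling noise. The following is a minimal sketch of a percentile bootstrap confidence interval for that difference, using NumPy only. The function names, the 150 bp cutoff default, and the synthetic mixtures are assumptions for illustration and are not part of FragMentor.

```python
import numpy as np

def short_fraction(sizes, cutoff=150):
    """Fraction of fragments shorter than `cutoff` bp."""
    sizes = np.asarray(sizes)
    return np.count_nonzero(sizes < cutoff) / sizes.size

def bootstrap_diff_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for short_fraction(b) - short_fraction(a)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    b = np.asarray(b)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample fragments with replacement within each sample
        ra = rng.choice(a, size=a.size, replace=True)
        rb = rng.choice(b, size=b.size, replace=True)
        diffs[i] = short_fraction(rb) - short_fraction(ra)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Two synthetic samples; the second has a larger short-fragment component
rng = np.random.default_rng(1)
group_a = np.concatenate(
    [rng.normal(167, 20, 4000), rng.normal(120, 15, 150)]
).astype(np.int64)
group_b = np.concatenate(
    [rng.normal(165, 25, 3000), rng.normal(115, 20, 1250)]
).astype(np.int64)

lo, hi = bootstrap_diff_ci(group_a, group_b)
print(f"95% bootstrap CI for the difference in short ratio: [{lo:.3f}, {hi:.3f}]")
```

This interval reflects only fragment-level sampling noise within two samples. It says nothing about variation between individuals, which requires multiple samples per group.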

## 4. Exporting Features for Machine Learning

Each distribution object can be exported as a flat dictionary of features, which makes it straightforward to build a feature table for downstream ML analysis.

In [None]:
# Export as dictionary (for pandas/JSON)
features = dist.to_dict()

# Show available features
print("Available features:")
for key, value in features.items():
    if not key.startswith('motif'):
        print(f"  {key}: {value}")

In [None]:
# Create a simple feature matrix
import pandas as pd

samples = [
    {"sample": "healthy_1", **dist.to_dict()},
    {"sample": "cancer_1", **cancer_dist.to_dict()},
]

df = pd.DataFrame(samples)
feature_cols = ['ratio_short', 'ratio_mono', 'ratio_di', 'median', 'peak_mono']
df[['sample'] + feature_cols]
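
Most ML libraries expect a numeric feature matrix and a separate label vector. The sketch below shows one way to get from per-sample feature dictionaries to those two arrays with pandas and NumPy. It computes two toy features directly from simulated sizes rather than calling FragMentor, and `simulate_sizes`, `toy_features`, and the 0/1 labels are hypothetical names and choices made for illustration.

```python
import numpy as np
import pandas as pd

def simulate_sizes(rng, short_weight, n=5000):
    """Toy mixture: a mononucleosome peak plus a chosen share of short fragments."""
    n_short = int(n * short_weight)
    sizes = np.concatenate([
        rng.normal(167, 20, size=n - n_short),
        rng.normal(120, 15, size=n_short),
    ]).astype(np.int64)
    return sizes[(sizes >= 50) & (sizes <= 500)]

def toy_features(sizes):
    """Two illustrative features computed with NumPy (not FragMentor's output)."""
    return {
        "ratio_short": np.count_nonzero(sizes < 150) / sizes.size,
        "median": float(np.median(sizes)),
    }

rng = np.random.default_rng(7)
rows = []
for i in range(3):
    rows.append({"sample": f"low_short_{i}", "label": 0,
                 **toy_features(simulate_sizes(rng, 0.05))})
    rows.append({"sample": f"high_short_{i}", "label": 1,
                 **toy_features(simulate_sizes(rng, 0.30))})

df_demo = pd.DataFrame(rows).set_index("sample")
feature_names = ["ratio_short", "median"]

X = df_demo[feature_names].to_numpy(dtype=float)  # shape: (n_samples, n_features)
y = df_demo["label"].to_numpy()                   # shape: (n_samples,)

# Standardize each feature column to zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(df_demo)
```

Standardization matters here because the ratio lies between 0 and 1 while the median is on the order of 100 bp or more. With real data, the scaling parameters should be fit on the training split only.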

## 5. Command Line Interface

FragMentor also provides a command-line interface (CLI) for the same analyses:

```bash
# Analyze fragment sizes
fragmentomics sizes sample.bam -o results/

# Analyze end motifs (requires reference)
fragmentomics motifs sample.bam -r hg38.fa -o results/

# Extract all features
fragmentomics extract sample.bam -r hg38.fa -f all -o results/

# Get BAM info
fragmentomics info sample.bam
```
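
To run the `sizes` subcommand over several BAM files, the commands can be assembled from Python. This is a minimal sketch that only builds and prints the command lines, using the `fragmentomics sizes <bam> -o <dir>` form shown above. The file names and the one-output-directory-per-sample layout are assumptions for illustration.

```python
import shlex
from pathlib import Path

def build_sizes_command(bam_path, out_dir):
    """Build an argument list mirroring `fragmentomics sizes <bam> -o <dir>`."""
    return ["fragmentomics", "sizes", str(bam_path), "-o", str(out_dir)]

# Hypothetical input files; replace with real paths
bam_files = [Path("sample_a.bam"), Path("sample_b.bam")]

# Assumed layout: one output directory per sample under results/
commands = [
    build_sizes_command(bam, Path("results") / bam.stem) for bam in bam_files
]

for cmd in commands:
    print(shlex.join(cmd))
    # To execute instead of printing:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.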

## Next Steps

- **Tutorial 02**: End motif analysis
- **Tutorial 03**: Building a cancer classifier
- **Tutorial 04**: Working with real data

---

**FragMentor**: See what others miss. 🧬