# SPATCH Workflow Example

This notebook walks through the full SPATCH analysis pipeline, built on SpatialData and Sopa, from raw platform output to saved results.

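## Stages:
1. List available SPATCH modules
2. Load spatial transcriptomics data (spatialdata-io)
3. Segmentation and aggregation (Sopa)
4. Preprocessing and QC (scanpy)
5. Clustering (scanpy Leiden, plus a squidpy spatial graph)
6. Custom SPATCH analysis modules
7. Visualization
8. Save results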

In [None]:
# Core libraries
import spatialdata as sd
import spatialdata_io as sdio
import sopa
import scanpy as sc
import squidpy as sq

# Custom SPATCH modules
from spatch_modules import get_module, list_modules, run_single_module

# Visualization
import matplotlib.pyplot as plt

# Configure
sc.settings.verbosity = 2
sc.logging.print_header()

## 1. List Available Modules

In [None]:
# Show all registered SPATCH modules
for module in list_modules():
    print(f"{module['name']:30} v{module['version']:8} [{module['category']:10}] {module['description']}")
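
The loop above expects each entry to be a dict with `name`, `version`, `category`, and `description` keys. As a rough illustration of how such a registry could be organized, here is a minimal sketch. It is not the actual `spatch_modules` implementation: `ModuleInfo`, `register`, `list_registered`, and `format_row` are hypothetical names, and the versions, categories, and descriptions are placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModuleInfo:
    """Metadata for one analysis module (hypothetical structure)."""
    name: str
    version: str
    category: str
    description: str


# Hypothetical in-memory registry; the real spatch_modules registry may differ.
_REGISTRY = {}


def register(info):
    """Add a module to the registry, rejecting duplicate names."""
    if info.name in _REGISTRY:
        raise ValueError(f"module already registered: {info.name}")
    _REGISTRY[info.name] = info


def list_registered():
    """Return module metadata as dicts, sorted by name."""
    ordered = sorted(_REGISTRY.values(), key=lambda m: m.name)
    return [asdict(info) for info in ordered]


def format_row(module):
    """Format one entry with the same column widths as the notebook cell."""
    return (
        f"{module['name']:30} v{module['version']:8} "
        f"[{module['category']:10}] {module['description']}"
    )


# Placeholder entries for illustration only.
register(ModuleInfo("diffusion_analysis", "0.1.0", "qc", "In-tissue vs out-of-tissue signal"))
register(ModuleInfo("cell_shape_metrics", "0.1.0", "morphology", "Shape measurements from boundaries"))

for row in map(format_row, list_registered()):
    print(row)
```

Keeping the metadata in a frozen dataclass and converting to dicts only at listing time makes each entry hard to mutate by accident while preserving the dict-style access used in the cell above.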

## 2. Load Spatial Data

spatialdata-io provides readers for common spatial omics platforms, including Xenium, Visium, MERSCOPE, and CosMx. Each reader returns a `SpatialData` object.

In [None]:
# Load Xenium data (example)
DATA_PATH = "/path/to/xenium_output/"  # Update this path

sdata = sdio.xenium(DATA_PATH)
print(sdata)

## 3. Segmentation and Aggregation (Sopa)

Sopa handles cell segmentation, boundary resolution, and transcript aggregation.

In [None]:
# Patch-based segmentation for memory efficiency
sopa.make_image_patches(sdata)

# Cellpose segmentation on DAPI channel
sopa.segmentation.cellpose(
    sdata,
    channels="DAPI",
    diameter=30,
    model_type="cyto2"
)

# Resolve conflicts at patch boundaries
sopa.resolve_conflicts(sdata)

# Aggregate transcripts per cell
sopa.aggregate(sdata)
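
To see why conflicts arise at patch boundaries, it helps to look at how an image can be tiled with overlapping patches. The sketch below is not Sopa's implementation; `patch_bounds` and its parameters are hypothetical, and the image size, patch width, and overlap are arbitrary example values.

```python
def patch_bounds(width, height, patch_width, overlap):
    """Return (x0, y0, x1, y1) boxes that cover the image with overlapping square patches."""
    if not 0 <= overlap < patch_width:
        raise ValueError("overlap must be non-negative and smaller than patch_width")
    step = patch_width - overlap

    def starts(length):
        # Tile along one axis and stop once a patch reaches the image edge.
        out, pos = [], 0
        while True:
            out.append(pos)
            if pos + patch_width >= length:
                break
            pos += step
        return out

    return [
        (x, y, min(x + patch_width, width), min(y + patch_width, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = patch_bounds(width=5000, height=3000, patch_width=2000, overlap=200)
print(len(boxes))
print(boxes[:2])
```

Neighboring patches share a strip of `overlap` pixels, so a cell lying in that strip can be segmented once per patch. Those duplicate or partial outlines are what the conflict-resolution step has to merge before aggregation.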

## 4. Preprocessing and QC (scanpy)

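A standard scanpy preprocessing workflow: QC metrics, filtering, normalization, highly variable gene selection, PCA, a neighborhood graph, and UMAP. The thresholds below are common defaults for whole-transcriptome data. Targeted panels often measure only a few hundred genes, so `min_genes=200` and `n_top_genes=2000` may need to be lowered for such data.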

In [None]:
# Get the cell x gene table
adata = sdata.tables["table"]

# Basic QC
sc.pp.calculate_qc_metrics(adata, inplace=True)

# Filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Normalize
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# HVG selection
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# PCA
sc.pp.pca(adata, n_comps=50)

# Neighborhood graph
sc.pp.neighbors(adata, n_neighbors=15)

# UMAP
sc.tl.umap(adata)

print(f"Preprocessed: {adata.n_obs} cells, {adata.n_vars} genes")
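
The normalization step is easier to reason about on a single cell. The sketch below reproduces the arithmetic behind library-size normalization followed by a `log(1 + x)` transform in plain Python. It is a simplified illustration of the idea, not scanpy's code, and the count vector is made up.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Scale one cell's counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; keep it at zero.
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


def log1p_transform(values):
    """Apply log(1 + x) elementwise, which maps zero counts to zero."""
    return [math.log1p(v) for v in values]


cell = [5, 0, 15, 30]           # made-up raw counts for four genes
scaled = normalize_total(cell)  # counts rescaled to a total of 10,000
logged = log1p_transform(scaled)

print(scaled)
print([round(v, 3) for v in logged])
```

After scaling, cells with different total counts become comparable, and the log transform compresses the dynamic range so that a few highly expressed genes do not dominate PCA.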

## 5. Clustering

In [None]:
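# Spatial neighbors graph built from adata.obsm["spatial"].
# It is stored separately and used by spatial analyses, not by Leiden below.
sq.gr.spatial_neighbors(adata, coord_type="generic", n_neighs=15)

# Leiden clustering on the expression neighbor graph from sc.pp.neighbors
sc.tl.leiden(adata, resolution=0.5, key_added="leiden_0.5")
sc.tl.leiden(adata, resolution=1.0, key_added="leiden_1.0")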

print(f"Found {adata.obs['leiden_0.5'].nunique()} clusters at resolution 0.5")
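
The cell above builds two different graphs: one over expression (PCA) space, which Leiden clusters, and one over physical coordinates. As a small illustration of the second kind, the sketch below finds the k nearest spatial neighbors of each cell by brute force. `knn_indices` is a hypothetical helper, the coordinates are made up, and squidpy uses faster methods in practice.

```python
import math


def knn_indices(coords, k):
    """For each point, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(coords):
        # Sort by distance, breaking ties by index.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Two small groups of cells, far apart in space (made-up coordinates).
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
graph = knn_indices(coords, k=2)
print(graph)
```

Two cells can be neighbors in space while sitting in different expression clusters, which is why the spatial graph is kept separate and reserved for analyses such as neighborhood enrichment.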

## 6. SPATCH Custom Modules

Run SPATCH-specific analyses using custom modules.

### 6.1 Diffusion Analysis

Compare in-tissue vs out-of-tissue signal to quantify transcript diffusion.

In [None]:
# Run diffusion analysis
sdata = run_single_module(
    sdata,
    "diffusion_analysis",
    table_key="table",
    in_tissue_col="in_tissue",
    compute_distances=True
)

# View diffusion metrics
diffusion_metrics = sdata.tables["diffusion_metrics"]
print(diffusion_metrics.obs.head())
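
One simple way to compare in-tissue and out-of-tissue signal is the fraction of a gene's counts that fall outside the tissue mask. The sketch below computes that fraction. It is an assumed, illustrative metric and not necessarily what the `diffusion_analysis` module reports; `out_of_tissue_fraction` and the example values are hypothetical.

```python
def out_of_tissue_fraction(counts, in_tissue):
    """Fraction of total counts observed at locations flagged as outside the tissue."""
    if len(counts) != len(in_tissue):
        raise ValueError("counts and in_tissue must have the same length")
    total = sum(counts)
    if total == 0:
        return 0.0
    outside = sum(c for c, inside in zip(counts, in_tissue) if not inside)
    return outside / total


# Made-up counts for one gene across five spatial locations.
counts = [40, 35, 5, 0, 20]
in_tissue = [True, True, False, False, True]

fraction = out_of_tissue_fraction(counts, in_tissue)
print(f"{fraction:.2%} of counts fall outside the tissue")
```

Under this assumption, a value near zero suggests transcripts stay where the tissue is, while larger values point to signal spreading beyond the tissue boundary.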

### 6.2 Cell Shape Metrics

Compute morphological measurements from cell boundaries.

In [None]:
# Run cell shape analysis
sdata = run_single_module(
    sdata,
    "cell_shape_metrics",
    boundaries_key="cell_boundaries",
    table_key="table"
)

# View shape metrics
adata = sdata.tables["table"]
print(adata.obs[["area_um2", "circularity", "eccentricity", "solidity"]].describe())
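
Two of these measurements follow directly from the boundary polygon. The sketch below computes area with the shoelace formula and circularity as 4πA/P², which equals 1 for a circle and decreases for irregular shapes. It is a standalone illustration, not the `cell_shape_metrics` module's code, and the helper names are hypothetical.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as ordered (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(vertices):
    """Circularity 4*pi*A / P**2: 1.0 for a circle, lower for less compact shapes."""
    area, perimeter = polygon_area_perimeter(vertices)
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * area / perimeter**2


# A 10 x 10 square, with coordinates assumed to be in micrometers.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
area, perimeter = polygon_area_perimeter(square)
print(area, perimeter, round(circularity(square), 4))
```

For the square, the area is 100 and the perimeter is 40, so the circularity is π/4, or about 0.785. Solidity and eccentricity need a convex hull and shape moments respectively and are not covered by this sketch.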

### 6.3 Gene-Protein Correlation (Multi-modal)

If you have paired CODEX data, run cross-modal correlation analysis.

In [None]:
# Load CODEX data (if available)
CODEX_PATH = "/path/to/codex_output/"  # Update this path

codex_loader = get_module("codex_loader", data_path=CODEX_PATH)
sdata_codex = codex_loader.run(None).sdata

# Merge into single SpatialData object (after registration)
# sdata.tables["codex_table"] = sdata_codex.tables["codex_table"]

# Run correlation analysis
# sdata = run_single_module(
#     sdata,
#     "gene_protein_correlation",
#     gene_table_key="table",
#     protein_table_key="codex_table",
#     resolution_um=[100, 200, 300, 400, 500]
# )
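
The list of resolutions suggests that the two modalities are compared after binning onto grids of different sizes. The sketch below shows one way to do that: sum each signal within square bins of a given size, then compute the Pearson correlation across bins. It is an assumption about the general approach, not the `gene_protein_correlation` module itself. It also assumes both modalities already share coordinates after registration, and all names and values are hypothetical.

```python
import math
from collections import defaultdict


def bin_sum(xs, ys, values, resolution_um):
    """Sum values within square grid bins of side resolution_um."""
    bins = defaultdict(float)
    for x, y, v in zip(xs, ys, values):
        bins[(int(x // resolution_um), int(y // resolution_um))] += v
    return bins


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def binned_correlation(xs, ys, gene, protein, resolution_um):
    """Correlate gene and protein signal after binning both on the same grid."""
    g = bin_sum(xs, ys, gene, resolution_um)
    p = bin_sum(xs, ys, protein, resolution_um)
    keys = sorted(g)  # both signals share coordinates, so the bins match
    return pearson([g[k] for k in keys], [p[k] for k in keys])


# Made-up, already registered measurements along one row of tissue.
xs = [10, 60, 110, 160, 210, 260]
ys = [10, 10, 10, 10, 10, 10]
gene = [1, 2, 3, 4, 5, 6]
protein = [2, 4, 6, 8, 10, 12]

for res in (50, 100):
    print(res, binned_correlation(xs, ys, gene, protein, res))
```

Coarser bins average out local noise and small registration errors but leave fewer data points, so a sweep over resolutions shows how stable the gene-protein relationship is across scales.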

## 7. Visualization

In [None]:
# UMAP colored by cluster
sc.pl.umap(adata, color=["leiden_0.5", "leiden_1.0"], wspace=0.4)

In [None]:
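# Spatial plot (using squidpy)
# shape=None plots cells as points; the default expects Visium-style spot and image metadata
sq.pl.spatial_scatter(adata, color="leiden_0.5", shape=None, size=1)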

In [None]:
# Cell shape distributions
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

adata.obs["area_um2"].hist(ax=axes[0], bins=50)
axes[0].set_xlabel("Area (µm²)")
axes[0].set_title("Cell Area Distribution")

adata.obs["circularity"].hist(ax=axes[1], bins=50)
axes[1].set_xlabel("Circularity")
axes[1].set_title("Cell Circularity Distribution")

adata.obs["eccentricity"].hist(ax=axes[2], bins=50)
axes[2].set_xlabel("Eccentricity")
axes[2].set_title("Cell Eccentricity Distribution")

plt.tight_layout()
plt.show()

## 8. Save Results

In [None]:
# Save SpatialData object
sdata.write("results/processed.zarr")

# Export AnnData table separately
adata.write("results/processed_table.h5ad")

print("Results saved!")