# Spatial Transcriptomics Analysis Pipeline and Cell2location Mapping

This notebook performs quality control, normalization, and visualization of the **merged Visium** spatial transcriptomics data (GSE235315). It then runs Cell2location deconvolution of the Visium data against a combined single-cell reference, followed by neighborhood enrichment analysis and cell-cell interaction graph construction. Cell2location estimates the abundance of each cell type at every spatial spot.
Paper reference: PMC10832111

In [None]:
# =============================================================================
# Setup and Imports
# =============================================================================

import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import squidpy as sq
import cell2location
import warnings
warnings.filterwarnings('ignore')

# Additional packages for spatial analysis
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import NearestNeighbors
import networkx as nx
from scipy import stats

# Configure settings
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=80, facecolor='white')

print("All packages loaded successfully!")
print(f"Scanpy version: {sc.__version__}")
print(f"Squidpy version: {sq.__version__}")
print(f"Cell2location version: {cell2location.__version__}")

In [None]:
"""
SPATIAL TRANSCRIPTOMICS BIOLOGICAL CONTEXT:

Spatial transcriptomics preserves the spatial organization of cells within tissue,
allowing us to study:
1. Cell type spatial distribution and organization
2. Cell-cell interactions and communication
3. Tissue architecture and microenvironments
4. Disease-related spatial patterns

Your dataset focuses on:
- Endothelial cells: vascular organization and angiogenesis
- Epithelial cells: tissue barriers and functional units
- Fibroblasts: stromal organization and ECM deposition
- Schwann cells: neural innervation patterns
- Stellate cells: activation foci and fibrotic progression

This spatial context is crucial for understanding organ function and pathology.
"""

# =============================================================================
# Step 1: Load Reference Single-Cell Data
# =============================================================================

"""
Data Loading Strategy:
1. Load CD45-negative data (your processed data)
2. Load CD45-positive data (immune cells)
3. Concatenate for comprehensive reference
4. Load spatial transcriptomics data

BIOLOGICAL RATIONALE:
- CD45-negative: stromal, epithelial, endothelial populations
- CD45-positive: immune cell populations (T cells, B cells, macrophages, etc.)
- Combined reference captures complete cellular ecosystem
- Essential for accurate spatial deconvolution
"""

def load_reference_data(cd45_neg_path, cd45_pos_path):
    """
    Load and concatenate CD45-negative and CD45-positive reference data
    
    Parameters:
    -----------
    cd45_neg_path : str
        Path to CD45-negative processed data (your data)
    cd45_pos_path : str
        Path to CD45-positive data
    
    Returns:
    --------
    adata_ref : AnnData
        Concatenated reference dataset
    """
    print("Loading reference single-cell datasets...")
    
    # Load CD45-negative data (your five cell types)
    adata_cd45_neg = sc.read_h5ad(cd45_neg_path)
    adata_cd45_neg.obs['CD45_status'] = 'CD45_negative'
    print(f"CD45-negative data: {adata_cd45_neg.n_obs} cells × {adata_cd45_neg.n_vars} genes")
    
    # Load CD45-positive data
    adata_cd45_pos = sc.read_h5ad(cd45_pos_path)
    adata_cd45_pos.obs['CD45_status'] = 'CD45_positive'
    print(f"CD45-positive data: {adata_cd45_pos.n_obs} cells × {adata_cd45_pos.n_vars} genes")
    
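    # Concatenate datasets (inner join on genes); 'dataset_origin' records the source.
    # anndata.concat replaces the deprecated AnnData.concatenate.
    import anndata as ad
    adata_ref = ad.concat(
        {'CD45_negative': adata_cd45_neg, 'CD45_positive': adata_cd45_pos},
        label='dataset_origin', join='inner', index_unique='-'
    )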
    
    print(f"Combined reference: {adata_ref.n_obs} cells × {adata_ref.n_vars} genes")
    
    # Update cell type annotations to avoid conflicts
    # Add CD45 status to cell type names for clarity.
    # Cast to str first: '+' is not defined for categorical columns.
    adata_ref.obs['cell_type_detailed'] = (
        adata_ref.obs['cell_type'].astype(str) + '_' + adata_ref.obs['CD45_status'].astype(str)
    )
    
    return adata_ref

def load_spatial_data(spatial_path):
    """
    Load spatial transcriptomics data
    
    Parameters:
    -----------
    spatial_path : str
        Path to spatial data (10X Visium or similar format)
    
    Returns:
    --------
    adata_spatial : AnnData
        Spatial transcriptomics data
    """
    print("Loading spatial transcriptomics data...")
    
    # Load spatial data - adjust based on your data format
    # For 10X Visium data:
    # adata_spatial = sc.read_visium(spatial_path)
    
    # Alternative loading methods:
    adata_spatial = sc.read_h5ad(spatial_path)  # If already processed
    # adata_spatial = sq.read.xenium(spatial_path)  # For Xenium data
    
    # Make variable names unique
    adata_spatial.var_names_make_unique()
    
    print(f"Spatial data: {adata_spatial.n_obs} spots × {adata_spatial.n_vars} genes")
    
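    # Calculate basic QC metrics for spatial data
    # ('MT-' is the human mitochondrial prefix; mouse genes use 'mt-')
    adata_spatial.var['mt'] = adata_spatial.var_names.str.startswith('MT-')
    # qc_vars=['mt'] is required for pct_counts_mt to be computed
    sc.pp.calculate_qc_metrics(adata_spatial, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)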
    
    return adata_spatial

# Load your data - modify these paths
CD45_NEG_PATH = "./data_and_results/scRNA_seq/GSE194247/adata_all_preprocessed_annotated.h5ad"  # CD45- data
CD45_POS_PATH = "./data_and_results/scRNA_seq/GSE235449/adata_processed_immune_cells.h5ad"  # CD45+ data 
SPATIAL_PATH = "./data_and_results/spatial/GSE235315/adata_spatial_merged.h5ad"  # Spatial data path

# Load reference data
adata_ref = load_reference_data(CD45_NEG_PATH, CD45_POS_PATH)
adata_spatial = load_spatial_data(SPATIAL_PATH)

BIOLOGICAL SIGNIFICANCE OF COMBINED REFERENCE:

CD45-NEGATIVE POPULATIONS (Your data):
- Endothelial: vascular structure and function
- Epithelial: tissue barriers and secretory function
- Fibroblasts: ECM production and tissue support
- Schwann cells: peripheral nerve support
- Stellate cells: tissue-specific stromal regulation

CD45-POSITIVE POPULATIONS (Immune cells):
- T cells: adaptive immunity and tissue surveillance
- B cells: antibody production and immune memory
- Macrophages: tissue homeostasis and immune response
- Dendritic cells: antigen presentation
- Neutrophils: acute inflammatory response

SPATIAL INTEGRATION IMPORTANCE:
- Vascular-immune interactions at vessel walls
- Epithelial-immune crosstalk at barriers
- Stromal-immune interactions in tissue remodeling
- Neural-immune communication in innervation
- Stellate cell activation by immune signals

This comprehensive reference enables accurate deconvolution of complex
tissue architecture with both resident and immune cell populations.

In [None]:
# =============================================================================
# Step 2: Spatial Data Quality Control and Preprocessing
# =============================================================================

"""
Spatial QC considerations:
1. Spot quality (similar to cell QC in scRNA-seq)
2. Spatial integrity (tissue morphology preservation)
3. Gene detection efficiency across tissue regions
4. Background signal assessment
"""

def spatial_qc_analysis(adata_spatial):
    """
    Perform quality control analysis for spatial data
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial transcriptomics data
    """
    print("Performing spatial quality control analysis...")
    
    # Calculate QC metrics if not already done
    if 'total_counts' not in adata_spatial.obs.columns:
        qc_vars = ['mt'] if 'mt' in adata_spatial.var.columns else ()
        sc.pp.calculate_qc_metrics(adata_spatial, qc_vars=qc_vars, percent_top=None, log1p=False, inplace=True)
    
    # Spatial-specific QC visualizations
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle('Spatial Transcriptomics Quality Control', fontsize=16)
    
    # Plot 1: Total UMI counts spatial distribution
    sq.pl.spatial_scatter(adata_spatial, color='total_counts', size=1.5, ax=axes[0, 0])
    axes[0, 0].set_title('Total UMI Counts\n(Sequencing Depth)')
    
    # Plot 2: Number of genes spatial distribution
    sq.pl.spatial_scatter(adata_spatial, color='n_genes_by_counts', size=1.5, ax=axes[0, 1])
    axes[0, 1].set_title('Number of Genes\n(Transcriptional Complexity)')
    
    # Plot 3: MT percentage spatial distribution
    if 'pct_counts_mt' in adata_spatial.obs.columns:
        sq.pl.spatial_scatter(adata_spatial, color='pct_counts_mt', size=1.5, ax=axes[0, 2])
        axes[0, 2].set_title('Mitochondrial Gene %\n(Tissue Quality)')
    else:
        axes[0, 2].text(0.5, 0.5, 'No MT data\navailable', ha='center', va='center', 
                       transform=axes[0, 2].transAxes, fontsize=12)
        axes[0, 2].set_title('Mitochondrial Gene %')
    
    # Plot 4: Histogram of UMI counts
    axes[1, 0].hist(adata_spatial.obs['total_counts'], bins=50, alpha=0.7, color='skyblue')
    axes[1, 0].set_xlabel('Total UMI counts')
    axes[1, 0].set_ylabel('Number of spots')
    axes[1, 0].set_title('UMI Count Distribution')
    
    # Plot 5: Histogram of gene counts
    axes[1, 1].hist(adata_spatial.obs['n_genes_by_counts'], bins=50, alpha=0.7, color='lightgreen')
    axes[1, 1].set_xlabel('Number of genes')
    axes[1, 1].set_ylabel('Number of spots')
    axes[1, 1].set_title('Gene Count Distribution')
    
    # Plot 6: UMI vs Gene scatter
    axes[1, 2].scatter(adata_spatial.obs['total_counts'], adata_spatial.obs['n_genes_by_counts'],
                      alpha=0.6, s=1)
    axes[1, 2].set_xlabel('Total UMI counts')
    axes[1, 2].set_ylabel('Number of genes')
    axes[1, 2].set_title('UMI vs Gene Complexity')
    
    plt.tight_layout()
    plt.show()
    
    # Print QC statistics
    print(f"\nSpatial QC Statistics:")
    print(f"Mean UMI per spot: {adata_spatial.obs['total_counts'].mean():.0f}")
    print(f"Mean genes per spot: {adata_spatial.obs['n_genes_by_counts'].mean():.0f}")
    if 'pct_counts_mt' in adata_spatial.obs.columns:
        print(f"Mean MT%: {adata_spatial.obs['pct_counts_mt'].mean():.1f}%")

def filter_spatial_data(adata_spatial, min_counts=500, min_genes=200):
    """
    Apply quality filters to spatial data
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data
    min_counts : int
        Minimum UMI counts per spot
    min_genes : int
        Minimum genes per spot
    
    Returns:
    --------
    adata_filtered : AnnData
        Filtered spatial data
    """
    print(f"Filtering spatial data...")
    print(f"Before filtering: {adata_spatial.n_obs} spots")
    
    # Filter spots
    sc.pp.filter_cells(adata_spatial, min_counts=min_counts)
    sc.pp.filter_cells(adata_spatial, min_genes=min_genes)
    
    # Filter genes (remove genes expressed in very few spots)
    sc.pp.filter_genes(adata_spatial, min_cells=3)
    
    print(f"After filtering: {adata_spatial.n_obs} spots × {adata_spatial.n_vars} genes")
    
    return adata_spatial

# Perform spatial QC
spatial_qc_analysis(adata_spatial)
adata_spatial = filter_spatial_data(adata_spatial)

BIOLOGICAL INTERPRETATION OF SPATIAL QC:

SPATIAL PATTERNS TO OBSERVE:
1. UMI count gradients may reflect tissue regions (e.g., core vs periphery)
2. Gene complexity variations indicate different tissue compartments
3. MT% patterns can reveal tissue damage or fixation artifacts
4. Uniform high quality suggests good tissue preservation

TISSUE-SPECIFIC EXPECTATIONS:
- Vascular regions: high endothelial marker expression
- Epithelial regions: organized structures with high epithelial markers
- Stromal areas: diffuse fibroblast and ECM signatures
- Neural regions: Schwann cell marker localization
- Activated zones: stellate cell activation markers

QUALITY INDICATORS:
- Consistent UMI counts across tissue regions
- Preserved tissue morphology in spatial plots
- Expected gene expression gradients
- Minimal edge effects or technical artifacts

Poor spatial quality can result from:
- Tissue damage during processing
- Incomplete permeabilization
- Technical variability in sequencing
- Fixation artifacts
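The NumPy sketch below makes the QC quantities concrete. It shows how total counts, detected genes, and mitochondrial percentage can be derived from a spots × genes count matrix, and which spots the default thresholds in `filter_spatial_data` (`min_counts=500`, `min_genes=200`) would keep. The helper names and the toy matrix are hypothetical. In the notebook itself these values come from `sc.pp.calculate_qc_metrics`.

```python
import numpy as np

def spot_qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-spot QC metrics from a dense spots x genes count matrix (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([g.startswith(mt_prefix) for g in gene_names])
    total_counts = counts.sum(axis=1)               # UMIs per spot
    n_genes_by_counts = (counts > 0).sum(axis=1)    # genes with at least one UMI
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Avoid division by zero for empty spots (their MT% is reported as 0)
    pct_counts_mt = np.divide(100 * mt_counts, total_counts,
                              out=np.zeros_like(total_counts), where=total_counts > 0)
    return total_counts, n_genes_by_counts, pct_counts_mt

def passing_spots(total_counts, n_genes, min_counts=500, min_genes=200):
    """Boolean mask of spots meeting both minimums (inclusive, like sc.pp.filter_cells)."""
    return (np.asarray(total_counts) >= min_counts) & (np.asarray(n_genes) >= min_genes)
```

On a real dataset, `passing_spots(...).mean()` gives the fraction of spots retained. This is a quick way to see whether the fixed thresholds suit the tissue before filtering.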

In [None]:
# =============================================================================
# Step 3: Reference Data Preparation for Cell2location
# =============================================================================

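"""
Cell2location requires properly formatted reference data with:
1. Raw (unnormalized) counts: its negative binomial models handle normalization internally
2. Informative genes identified (here, highly variable genes from a normalized copy)
3. Clean cell type annotations
4. Batch information if needed (e.g., dataset of origin)
"""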

def prepare_reference_for_cell2location(adata_ref):
    """
    Prepare reference data for Cell2location analysis
    
    Parameters:
    -----------
    adata_ref : AnnData
        Combined reference dataset
    
    Returns:
    --------
    adata_ref_prepared : AnnData
        Reference data ready for Cell2location
    """
    print("Preparing reference data for Cell2location...")
    
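    # Keep raw counts in .X: Cell2location's models expect unnormalized counts
    # (assumes the loaded .X holds raw counts)
    
    # Normalize and log-transform a copy, used only to select highly variable genes
    adata_norm = adata_ref.copy()
    sc.pp.normalize_total(adata_norm, target_sum=1e4)
    sc.pp.log1p(adata_norm)
    sc.pp.highly_variable_genes(adata_norm, min_mean=0.0125, max_mean=3, min_disp=0.5)
    adata_ref.var['highly_variable'] = adata_norm.var['highly_variable']
    
    print(f"Found {adata_ref.var['highly_variable'].sum()} highly variable genes in reference")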
    
    # Check cell type distribution
    print("\nReference cell type distribution:")
    cell_type_counts = adata_ref.obs['cell_type_detailed'].value_counts()
    for cell_type, count in cell_type_counts.items():
        percentage = (count / adata_ref.n_obs) * 100
        print(f"  {cell_type}: {count:,} cells ({percentage:.1f}%)")
    
    return adata_ref

adata_ref = prepare_reference_for_cell2location(adata_ref)

BIOLOGICAL SIGNIFICANCE OF REFERENCE PREPARATION:

CELL TYPE DIVERSITY IMPORTANCE:
- Comprehensive cell type representation ensures accurate deconvolution
- Each major cell type should have sufficient cells (>100) for robust signatures
- Rare cell types may need special handling or grouping

EXPECTED REFERENCE COMPOSITION:
CD45-negative populations:
- Endothelial: 5-15% (varies by tissue vascularity)
- Epithelial: 20-50% (organ-dependent)
- Fibroblasts: 10-30% (stromal content)
- Schwann: 1-5% (neural innervation)
- Stellate: 2-10% (activation state dependent)

CD45-positive populations:
- T cells: 30-60% of immune cells
- Macrophages: 20-40% of immune cells
- B cells: 5-20% of immune cells
- Other immune: 5-15% of immune cells

TECHNICAL CONSIDERATIONS:
- Balanced representation prevents deconvolution bias
- High-quality signatures from each cell type
- Consistent gene expression measurements
- Proper normalization for cross-dataset integration
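Two of these considerations can be checked programmatically before training. The sketch below defines two hypothetical helpers that are not part of Cell2location:

- The first tests heuristically whether a matrix still looks like raw counts (non-negative integers), which Cell2location's negative binomial models expect.
- The second lists cell types below the ~100-cell guideline above.

```python
import numpy as np
import pandas as pd

def looks_like_raw_counts(X, n_check=1000):
    """Heuristic: True if the first n_check rows hold only non-negative integers.

    Works for dense arrays and CSR/CSC sparse matrices (hypothetical helper).
    """
    sub = X[:n_check]
    # Sparse matrices store their non-zero entries in .data; dense arrays are flattened
    values = sub.data if hasattr(sub, "tocsr") else np.asarray(sub).ravel()
    return bool(np.all(values >= 0) and np.allclose(values, np.round(values)))

def underrepresented_cell_types(labels, min_cells=100):
    """Cell types with fewer than min_cells cells, smallest first."""
    counts = pd.Series(labels).value_counts()
    return counts[counts < min_cells].sort_values()
```

In the notebook these might be called as `looks_like_raw_counts(adata_ref.X)` and `underrepresented_cell_types(adata_ref.obs['cell_type_detailed'])`. Flagged types are candidates for grouping or for extra caution when reading their maps.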

In [None]:
# =============================================================================
# Step 4: Cell2location Spatial Deconvolution
# =============================================================================

"""
Cell2location methodology (from paper):
1. Uses Bayesian inference for spatial deconvolution
2. Estimates cellular abundance in each spatial spot
3. Uses 5% quantile estimates as final abundance values
4. Accounts for technical variability and spatial context
"""

def run_cell2location(adata_spatial, adata_ref, cell_type_col='cell_type_detailed'):
    """
    Run Cell2location spatial deconvolution
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial transcriptomics data
    adata_ref : AnnData
        Reference single-cell data
    cell_type_col : str
        Column containing cell type annotations
    
    Returns:
    --------
    adata_spatial : AnnData
        Spatial data with deconvolution results
    """
    print("Running Cell2location spatial deconvolution...")
    
    # Find shared genes between spatial and reference data (sorted for reproducibility)
    shared_genes = sorted(set(adata_spatial.var_names) & set(adata_ref.var_names))
    print(f"Found {len(shared_genes)} shared genes for deconvolution")
    
    # Subset both datasets to shared genes
    adata_spatial_subset = adata_spatial[:, shared_genes].copy()
    adata_ref_subset = adata_ref[:, shared_genes].copy()
    
    # Register the reference with the regression (signature) model
    cell2location.models.RegressionModel.setup_anndata(
        adata=adata_ref_subset,
        batch_key='dataset_origin',  # the two reference datasets as batches; set None to disable
        labels_key=cell_type_col
    )
    
    # Train reference model
    print("Training reference signature model...")
    mod_ref = cell2location.models.RegressionModel(adata_ref_subset)
    # Set use_gpu=False if no GPU; newer scvi-tools releases replace use_gpu with accelerator='gpu'
    mod_ref.train(max_epochs=250, use_gpu=True)
    
    # Export estimated cell abundance (summary of the posterior distribution)
    adata_ref_subset = mod_ref.export_posterior(
        adata_ref_subset, sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True}
    )
    
    # Save reference signatures
    ref_cell_type_df = adata_ref_subset.varm['means_per_cluster_mu_fg'][[f'means_per_cluster_mu_fg_{i}' 
                                                                        for i in adata_ref_subset.uns['mod']['factor_names']]].copy()
    ref_cell_type_df.columns = adata_ref_subset.uns['mod']['factor_names']
    
    # Setup spatial data for Cell2location
    # For merged sections, consider passing batch_key=<per-section sample column>
    cell2location.models.Cell2location.setup_anndata(adata=adata_spatial_subset)
    
    # Train spatial deconvolution model
    print("Training spatial deconvolution model...")
    mod_spatial = cell2location.models.Cell2location(
        adata_spatial_subset, 
        cell_state_df=ref_cell_type_df,
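        N_cells_per_location=30,  # Prior on the expected average number of cells per spot (tissue-dependent)
        detection_alpha=20  # Regularizes per-spot RNA detection sensitivity; 20 for high technical variability, 200 for low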
    )
    
    mod_spatial.train(max_epochs=30000, use_gpu=True)
    
    # Export spatial deconvolution results
    adata_spatial_subset = mod_spatial.export_posterior(
        adata_spatial_subset, 
        sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True}
    )
    
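    # Extract 5% quantile estimates (as specified in paper). Cell2location prefixes the
    # columns with 'q05cell_abundance_w_sf_', so rename them to the plain cell type names.
    cell_abundance_df = adata_spatial_subset.obsm['q05_cell_abundance_w_sf'].copy()
    cell_abundance_df.columns = adata_spatial_subset.uns['mod']['factor_names']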
    
    # Add results to original spatial data
    adata_spatial.obsm['cell_abundance'] = cell_abundance_df
    adata_spatial.uns['cell_types'] = list(cell_abundance_df.columns)
    
    print("Cell2location deconvolution completed!")
    print(f"Deconvolved {len(cell_abundance_df.columns)} cell types across {adata_spatial.n_obs} spots")
    
    return adata_spatial

# Run Cell2location deconvolution
adata_spatial = run_cell2location(adata_spatial, adata_ref)

BIOLOGICAL INTERPRETATION OF CELL2LOCATION RESULTS:

DECONVOLUTION ADVANTAGES:
- Estimates cell type abundance in each spatial spot
- Accounts for technical noise and biological variability
- Preserves spatial organization information
- Enables quantitative analysis of cell type distributions

EXPECTED PATTERNS:
- Endothelial cells: enriched along vessel-like structures
- Epithelial cells: organized in tissue-specific patterns (ducts, acini)
- Fibroblasts: distributed throughout stromal regions
- Schwann cells: associated with nerve tracks
- Stellate cells: may show focal activation patterns
- Immune cells: varied distributions based on tissue state

QUALITY INDICATORS:
- Biologically plausible cell type distributions
- Spatial coherence (similar cell types in neighboring spots)
- Expected cell type co-localizations
- Reasonable total cell abundance per spot

QUANTITATIVE INSIGHTS:
- Cell type density gradients across tissue regions
- Spatial heterogeneity within cell populations
- Microenvironment composition analysis
- Disease-associated spatial reorganization

In [None]:
# =============================================================================
# Step 5: Visualize Spatial Deconvolution Results
# =============================================================================

def visualize_spatial_deconvolution(adata_spatial):
    """
    Visualize Cell2location deconvolution results
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data with deconvolution results
    """
    print("Visualizing spatial deconvolution results...")
    
    # Get cell type names
    cell_types = adata_spatial.uns['cell_types']
    n_cell_types = len(cell_types)
    
    # Create spatial abundance plots
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle('Spatial Cell Type Abundance (Cell2location Deconvolution)', fontsize=16)
    
    # Plot first 6 cell types (adjust as needed)
    for i, cell_type in enumerate(cell_types[:6]):
        row, col = i // 3, i % 3
        
        # Add abundance to obs for plotting
        adata_spatial.obs[f'{cell_type}_abundance'] = adata_spatial.obsm['cell_abundance'][cell_type]
        
        sq.pl.spatial_scatter(adata_spatial, 
                             color=f'{cell_type}_abundance', 
                             size=1.5, 
                             ax=axes[row, col], 
                             cmap='viridis')
        axes[row, col].set_title(f'{cell_type}\nAbundance')
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print("\nCell type abundance summary:")
    abundance_df = adata_spatial.obsm['cell_abundance']
    for cell_type in abundance_df.columns:
        mean_abundance = abundance_df[cell_type].mean()
        max_abundance = abundance_df[cell_type].max()
        spots_present = (abundance_df[cell_type] > 0.1).sum()
        print(f"{cell_type}: Mean={mean_abundance:.2f}, Max={max_abundance:.2f}, "
              f"Present in {spots_present} spots ({spots_present/len(abundance_df)*100:.1f}%)")

visualize_spatial_deconvolution(adata_spatial)

BIOLOGICAL INSIGHTS FROM SPATIAL ABUNDANCE MAPS:

ENDOTHELIAL SPATIAL PATTERNS:
- Should form vessel-like structures and networks
- Higher abundance at tissue borders (vascular supply)
- Continuous patterns indicating vascular connectivity

EPITHELIAL SPATIAL ORGANIZATION:
- Organized structures reflecting tissue architecture
- May show polarity and directional organization
- Higher abundance in functional tissue regions

FIBROBLAST DISTRIBUTION:
- Distributed throughout stromal regions
- May show activation gradients around damage/inflammation
- Higher abundance in fibrous/scarred regions

SCHWANN CELL LOCALIZATION:
- Associated with neural pathways
- May form linear/track-like patterns
- Lower overall abundance but spatially restricted

STELLATE CELL ACTIVATION:
- May show focal activation around injury/inflammation
- Gradient patterns from activation sources
- Correlation with fibroblast activation

IMMUNE CELL INFILTRATION:
- Varied patterns based on tissue immune status
- Focal infiltrates in inflammatory regions
- Perivascular accumulation patterns

These patterns provide insights into tissue organization,
disease progression, and cellular interactions in their
native spatial context.
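Some of these visual impressions can be quantified with a spot-level rank correlation between abundance columns, for example the expected link between stellate and fibroblast abundance. This is a simpler technique than the neighborhood enrichment in Step 6, because it only captures co-occurrence within the same spot. The pandas sketch below uses hypothetical helper names.

```python
import numpy as np
import pandas as pd

def abundance_correlation(abundance_df, method="spearman"):
    """Spot-level correlation between cell type abundances, with the diagonal masked."""
    corr = abundance_df.corr(method=method)
    # Self-correlation is trivially 1, so replace it with NaN
    return corr.mask(np.eye(len(corr), dtype=bool))

def top_pairs(corr, n=5):
    """Most positively correlated cell type pairs, each unordered pair listed once."""
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.stack().dropna().sort_values(ascending=False).head(n)
```

For example, `top_pairs(abundance_correlation(adata_spatial.obsm['cell_abundance']))` returns the cell type pairs that most often rise and fall together across spots.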

In [None]:
# =============================================================================
# Step 6: Neighborhood Enrichment Analysis (Paper Methodology)
# =============================================================================

"""
Paper methodology for neighborhood analysis:
1. Identify "high spots" for each cell type (5% quantile abundance > 3)
2. Sum abundance profiles of neighboring spots (up to 3rd proximal spots)
3. Compare observed vs expected abundance profiles
4. Calculate enrichment as observed/expected ratio
5. Build interaction graph based on mutual enrichment
"""

def identify_high_spots(adata_spatial, abundance_threshold=3):
    """
    Identify high abundance spots for each cell type
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data with abundance estimates
    abundance_threshold : float
        Threshold for defining high spots (paper uses 3)
    
    Returns:
    --------
    high_spots_dict : dict
        Dictionary of high spots for each cell type
    """
    print(f"Identifying high spots with abundance > {abundance_threshold}...")
    
    abundance_df = adata_spatial.obsm['cell_abundance']
    high_spots_dict = {}
    
    for cell_type in abundance_df.columns:
        high_spots = abundance_df[abundance_df[cell_type] > abundance_threshold].index
        high_spots_dict[cell_type] = high_spots
        print(f"{cell_type}: {len(high_spots)} high spots ({len(high_spots)/len(abundance_df)*100:.1f}%)")
    
    return high_spots_dict

def calculate_spatial_distances(adata_spatial):
    """
    Calculate spatial distances between all spots
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data
    
    Returns:
    --------
    distance_matrix : np.array
        Distance matrix between all spots
    """
    print("Calculating spatial distances...")
    
    # Get spatial coordinates
    if 'spatial' in adata_spatial.obsm.keys():
        coords = adata_spatial.obsm['spatial']
    elif 'X_spatial' in adata_spatial.obsm.keys():
        coords = adata_spatial.obsm['X_spatial']
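    elif {'array_row', 'array_col'}.issubset(adata_spatial.obs.columns):
        # Fall back to Visium array indices (grid units, not pixels)
        coords = adata_spatial.obs[['array_row', 'array_col']].values
    else:
        raise ValueError("No spatial coordinates found in adata.obsm['spatial'] or similar")
    
    # Calculate pairwise distances.
    # Note: this is a dense n x n matrix, and in a merged object spots from
    # different sections can have overlapping coordinates.
    distance_matrix = squareform(pdist(coords))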
    
    return distance_matrix

def find_neighboring_spots(distance_matrix, k_neighbors=3):
    """
    Find k nearest neighbors for each spot
    
    Parameters:
    -----------
    distance_matrix : np.array
        Distance matrix
    k_neighbors : int
        Number of neighbors to find (paper uses 3)
    
    Returns:
    --------
    neighbors_dict : dict
        Dictionary mapping each spot to its neighbors
    """
    print(f"Finding {k_neighbors} nearest neighbors for each spot...")
    
    neighbors_dict = {}
    
    for i in range(len(distance_matrix)):
        # Get indices of k+1 closest spots (including self)
        neighbor_indices = np.argsort(distance_matrix[i])[:k_neighbors+1]
        # Remove self from neighbors
        neighbor_indices = neighbor_indices[neighbor_indices != i][:k_neighbors]
        neighbors_dict[i] = neighbor_indices
    
    return neighbors_dict

def calculate_neighborhood_enrichment(adata_spatial, high_spots_dict, neighbors_dict):
    """
    Calculate neighborhood enrichment as described in the paper
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data
    high_spots_dict : dict
        High spots for each cell type
    neighbors_dict : dict
        Neighbor relationships
    
    Returns:
    --------
    enrichment_results : dict
        Enrichment profiles for each cell type
    """
    print("Calculating neighborhood enrichment profiles...")
    
    abundance_df = adata_spatial.obsm['cell_abundance']
    cell_types = abundance_df.columns
    enrichment_results = {}
    
    # Calculate expected abundance (average across all spots)
    expected_abundance = abundance_df.mean()
    
    for cell_type in cell_types:
        print(f"Processing {cell_type}...")
        
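        high_spots = high_spots_dict[cell_type]
        enrichment_profiles = []
        
        # high_spots holds obs labels, but neighbors_dict is keyed by positional index
        high_positions = abundance_df.index.get_indexer(high_spots)
        
        for spot_idx in high_positions:
            if spot_idx in neighbors_dict: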
                neighbor_indices = neighbors_dict[spot_idx]
                
                # Sum abundance profiles of neighboring spots
                neighbor_abundance = abundance_df.iloc[neighbor_indices].sum()
                
                # Calculate expected abundance (number of neighbors × average abundance)
                expected_neighbor_abundance = len(neighbor_indices) * expected_abundance
                
                # Calculate enrichment ratio (observed/expected)
                enrichment_ratio = neighbor_abundance / expected_neighbor_abundance
                enrichment_profiles.append(enrichment_ratio)
        
        # Store results
        if enrichment_profiles:
            enrichment_results[cell_type] = pd.DataFrame(enrichment_profiles)
        else:
            print(f"No valid enrichment profiles for {cell_type}")
    
    return enrichment_results

def build_interaction_graph(enrichment_results, mutual_enrichment_threshold=1.0):
    """
    Build cell-cell interaction graph based on mutual enrichment
    
    Parameters:
    -----------
    enrichment_results : dict
        Enrichment profiles from calculate_neighborhood_enrichment
    mutual_enrichment_threshold : float
        Threshold for mutual enrichment (paper uses 1.0)
    
    Returns:
    --------
    interaction_graph : networkx.Graph
        Cell-cell interaction network
    interaction_matrix : pd.DataFrame
        Quantitative interaction matrix
    """
    print("Building cell-cell interaction graph...")
    
    cell_types = list(enrichment_results.keys())
    n_cell_types = len(cell_types)
    
    # Initialize interaction matrix
    interaction_matrix = pd.DataFrame(
        np.zeros((n_cell_types, n_cell_types)),
        index=cell_types,
        columns=cell_types
    )
    
    # Calculate pairwise enrichments
    for i, cell_type_a in enumerate(cell_types):
        for j, cell_type_b in enumerate(cell_types):
            if i != j and cell_type_a in enrichment_results and cell_type_b in enrichment_results:
                
                # Get enrichment of cell_type_b in neighborhoods of cell_type_a
                enrichment_a_to_b = enrichment_results[cell_type_a][cell_type_b].mean()
                
                # Get enrichment of cell_type_a in neighborhoods of cell_type_b  
                enrichment_b_to_a = enrichment_results[cell_type_b][cell_type_a].mean()
                
                # Check for mutual enrichment
                if (enrichment_a_to_b > mutual_enrichment_threshold and 
                    enrichment_b_to_a > mutual_enrichment_threshold):
                    
                    # Use geometric mean of mutual enrichments
                    mutual_score = np.sqrt(enrichment_a_to_b * enrichment_b_to_a)
                    interaction_matrix.loc[cell_type_a, cell_type_b] = mutual_score
    
    # Build NetworkX graph
    interaction_graph = nx.Graph()
    
    # Add nodes
    for cell_type in cell_types:
        interaction_graph.add_node(cell_type)
    
    # Add edges for significant interactions
    for i, cell_type_a in enumerate(cell_types):
        for j, cell_type_b in enumerate(cell_types):
            if i < j:  # Avoid duplicate edges
                score = interaction_matrix.loc[cell_type_a, cell_type_b]
                if score > 0:
                    interaction_graph.add_edge(cell_type_a, cell_type_b, weight=score)
    
    print(f"Interaction graph built with {interaction_graph.number_of_nodes()} nodes and "
          f"{interaction_graph.number_of_edges()} edges")
    
    return interaction_graph, interaction_matrix

# Run neighborhood enrichment analysis
print("\nRunning neighborhood enrichment analysis...")

# Step 1: Identify high spots
high_spots_dict = identify_high_spots(adata_spatial, abundance_threshold=3)

# Step 2: Calculate spatial distances and find neighbors
distance_matrix = calculate_spatial_distances(adata_spatial)
neighbors_dict = find_neighboring_spots(distance_matrix, k_neighbors=3)

# Step 3: Calculate enrichment profiles
enrichment_results = calculate_neighborhood_enrichment(adata_spatial, high_spots_dict, neighbors_dict)

# Step 4: Build interaction graph
interaction_graph, interaction_matrix = build_interaction_graph(enrichment_results)
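The dense `squareform(pdist(...))` matrix needs O(n²) memory, about 3.2 GB of float64 for 20,000 spots. On a *merged* Visium object it can also pair spots from different sections whose coordinates overlap.

The sketch below avoids both problems. It assumes a per-spot section label, such as a hypothetical `adata_spatial.obs['library_id']` column. It queries SciPy's `cKDTree` separately within each section and returns neighbors as global positional indices, in the same format as `neighbors_dict` from `find_neighboring_spots`.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_within_libraries(coords, library_labels, k_neighbors=3):
    """Map each spot position to its k nearest neighbours within the same library."""
    coords = np.asarray(coords, dtype=float)
    library_labels = np.asarray(library_labels)
    neighbors_dict = {}
    for lib in np.unique(library_labels):
        idx = np.flatnonzero(library_labels == lib)   # global positions in this library
        k = min(k_neighbors + 1, len(idx))            # +1 because each query returns the spot itself
        tree = cKDTree(coords[idx])
        _, local = tree.query(coords[idx], k=k)
        local = np.asarray(local).reshape(len(idx), k)
        for row, i in enumerate(idx):
            nbrs = idx[local[row]]                    # back to global positions
            neighbors_dict[int(i)] = nbrs[nbrs != i][:k_neighbors]
    return neighbors_dict
```

Under that assumption it would replace the two distance and neighbor calls above:

`neighbors_dict = knn_within_libraries(adata_spatial.obsm['spatial'], adata_spatial.obs['library_id'], k_neighbors=3)`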

BIOLOGICAL INTERPRETATION OF NEIGHBORHOOD ENRICHMENT:

MUTUAL ENRICHMENT SIGNIFICANCE:
- Indicates spatial co-localization beyond random chance
- Suggests functional cell-cell interactions
- May reflect developmental relationships or functional dependencies

EXPECTED INTERACTIONS FOR YOUR CELL TYPES:

ENDOTHELIAL-EPITHELIAL:
- Basement membrane interactions
- Angiocrine signaling to epithelium
- Barrier function coordination

FIBROBLAST-STELLATE:
- Shared ECM production
- Coordinated tissue remodeling
- Similar activation triggers

ENDOTHELIAL-IMMUNE:
- Vascular immune surveillance
- Extravasation sites
- Inflammatory recruitment

SCHWANN-ALL TYPES:
- Neural regulation of tissue function
- Neurovascular interactions
- Neural control of metabolism

STELLATE ACTIVATION ZONES:
- Co-localization with inflammatory cells
- Fibroblast recruitment
- Tissue remodeling coordination

PATHOLOGICAL INTERACTIONS:
- Stellate-immune activation in fibrosis
- Endothelial-immune in inflammation
- Epithelial-stromal in tissue damage

The interaction graph summarizes which cell types co-localize in your tissue
and highlights candidate cell-cell communication hubs; co-localization alone
does not demonstrate signaling.

In [None]:
# =============================================================================
# Step 7: Visualize Neighborhood Analysis Results
# =============================================================================

def visualize_interaction_results(interaction_graph, interaction_matrix, adata_spatial, high_spots_dict):
    """
    Visualize neighborhood enrichment and interaction results
    
    Parameters:
    -----------
    interaction_graph : networkx.Graph
        Cell-cell interaction network
    interaction_matrix : pd.DataFrame
        Quantitative interaction matrix
    adata_spatial : AnnData
        Spatial data
    high_spots_dict : dict
        High spots for each cell type
    """
    print("Visualizing interaction analysis results...")
    
    # Create comprehensive visualization
    fig = plt.figure(figsize=(20, 15))
    
    # Plot 1: Interaction network graph
    ax1 = plt.subplot(2, 3, 1)
    pos = nx.spring_layout(interaction_graph, k=3, iterations=50)
    
    # Draw nodes
    nx.draw_networkx_nodes(interaction_graph, pos, 
                          node_color='lightblue', 
                          node_size=3000, 
                          alpha=0.8, ax=ax1)
    
    # Draw edges with thickness proportional to interaction strength
    edges = interaction_graph.edges(data=True)
    edge_weights = [edge[2]['weight'] for edge in edges]
    
    if edge_weights:
        max_weight = max(edge_weights)
        edge_widths = [5 * weight / max_weight for weight in edge_weights]
        
        nx.draw_networkx_edges(interaction_graph, pos, 
                              width=edge_widths, 
                              alpha=0.6, 
                              edge_color='gray', ax=ax1)
    
    # Draw labels
    nx.draw_networkx_labels(interaction_graph, pos, 
                           font_size=8, 
                           font_weight='bold', ax=ax1)
    
    ax1.set_title('Cell-Cell Interaction Network\n(Mutual Spatial Enrichment)', 
                 fontsize=12, fontweight='bold')
    ax1.axis('off')
    
    # Plot 2: Interaction matrix heatmap
    ax2 = plt.subplot(2, 3, 2)
    sns.heatmap(interaction_matrix, 
                annot=True, 
                fmt='.2f', 
                cmap='viridis', 
                cbar_kws={'label': 'Enrichment Score'},
                ax=ax2)
    ax2.set_title('Interaction Matrix\n(Quantitative Enrichment)', 
                 fontsize=12, fontweight='bold')
    
    # Plot 3: High spots distribution
    ax3 = plt.subplot(2, 3, 3)
    high_spots_summary = {ct: len(spots) for ct, spots in high_spots_dict.items()}
    cell_types = list(high_spots_summary.keys())
    counts = list(high_spots_summary.values())
    
    bars = ax3.bar(range(len(cell_types)), counts, color='skyblue', alpha=0.8)
    ax3.set_xlabel('Cell Type')
    ax3.set_ylabel('Number of High Spots')
    ax3.set_title('High Abundance Spots\n(Threshold > 3)', fontsize=12, fontweight='bold')
    ax3.set_xticks(range(len(cell_types)))
    ax3.set_xticklabels(cell_types, rotation=45, ha='right')
    
    # Add value labels on bars
    for i, bar in enumerate(bars):
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height + 0.5,
                f'{int(height)}', ha='center', va='bottom', fontsize=8)
    
    # Plots 4-6: Spatial distribution of high spots for top 3 cell types
    top_cell_types = sorted(high_spots_summary.items(), key=lambda x: x[1], reverse=True)[:3]
    
    for i, (cell_type, count) in enumerate(top_cell_types):
        ax = plt.subplot(2, 3, 4 + i)
        
        # Create binary mask for high spots
        high_spot_mask = np.zeros(adata_spatial.n_obs)
        high_spot_indices = high_spots_dict[cell_type]
        high_spot_mask[high_spot_indices] = 1
        
        adata_spatial.obs[f'{cell_type}_high_spots'] = high_spot_mask
        
        # No `show` argument here: spatial_scatter forwards unknown kwargs to matplotlib
        sq.pl.spatial_scatter(adata_spatial, 
                             color=f'{cell_type}_high_spots',
                             size=2.0,
                             cmap='Reds',
                             ax=ax)
        ax.set_title(f'{cell_type}\nHigh Spots (n={count})', fontsize=10, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Print interaction summary
    print("\nSpatial Interaction Summary:")
    print(f"Total possible interactions: {len(interaction_matrix)**2 - len(interaction_matrix)}")
    significant_interactions = (interaction_matrix > 0).sum().sum()
    print(f"Significant interactions detected: {significant_interactions}")
    
    print("\nTop spatial interactions:")
    # Get top interactions
    interaction_pairs = []
    for i in range(len(interaction_matrix)):
        for j in range(i+1, len(interaction_matrix)):
            cell_a = interaction_matrix.index[i]
            cell_b = interaction_matrix.columns[j]
            score = interaction_matrix.iloc[i, j]
            if score > 0:
                interaction_pairs.append((cell_a, cell_b, score))
    
    # Sort by score
    interaction_pairs.sort(key=lambda x: x[2], reverse=True)
    
    for cell_a, cell_b, score in interaction_pairs[:10]:
        print(f"  {cell_a} ↔ {cell_b}: {score:.3f}")

visualize_interaction_results(interaction_graph, interaction_matrix, adata_spatial, high_spots_dict)

BIOLOGICAL INSIGHTS FROM INTERACTION ANALYSIS:

NETWORK TOPOLOGY INTERPRETATION:
- Hub nodes: cell types co-localizing with many others (candidate coordinators)
- Isolated nodes: spatially segregated cell types
- Edge weights: strength of spatial association (geometric mean of the two directional enrichment scores)
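
The hub and isolated-node readings can be made concrete with networkx. Weighted degree (the sum of the weights of all edges touching a node) ranks candidate hubs, and `nx.isolates` lists cell types with no mutual enrichment. This is a minimal sketch on a hand-built toy graph. The edge weights are made-up illustrative values, not results from this dataset, and `summarize_topology` is a hypothetical helper name. The same function could be applied to the `interaction_graph` built earlier.

```python
import networkx as nx


def summarize_topology(graph):
    """Hypothetical helper: rank nodes by weighted degree and list isolated nodes."""
    # Weighted degree = sum of 'weight' over all edges touching the node
    weighted_degree = dict(graph.degree(weight="weight"))
    hubs = sorted(weighted_degree, key=weighted_degree.get, reverse=True)
    isolated = sorted(nx.isolates(graph))
    return hubs, weighted_degree, isolated


# Toy graph with illustrative (not measured) mutual-enrichment weights
toy = nx.Graph()
toy.add_nodes_from(["Endothelial", "Epithelial", "Fibroblast", "Schwann", "Stellate"])
toy.add_edge("Endothelial", "Epithelial", weight=1.8)
toy.add_edge("Fibroblast", "Stellate", weight=2.4)
toy.add_edge("Endothelial", "Stellate", weight=1.3)

hubs, weighted_degree, isolated = summarize_topology(toy)
print("Ranked by weighted degree:", hubs)
print("Isolated cell types:", isolated)
```

In this toy graph, Stellate ranks first (weighted degree 3.7) and Schwann is the only isolated node. On the real graph, the ranking depends entirely on the measured enrichments.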

EXPECTED BIOLOGICAL INTERACTIONS:

LIKELY INTERACTIONS (hypotheses to check against the graph):
1. Endothelial-Epithelial: vascular-parenchymal interface
2. Fibroblast-Stellate: shared stromal/ECM functions
3. Endothelial-Immune: vascular immune surveillance
4. Stellate-Immune: activation by inflammatory signals

TISSUE-SPECIFIC PATTERNS:
- Liver: Hepatocyte-stellate, stellate-immune interactions
- Pancreas: Acinar-stellate, islet-endothelial interactions
- Complex organs: Multi-way interactions reflecting tissue complexity

PATHOLOGICAL IMPLICATIONS:
- Increased stellate-immune interactions: fibrosis/inflammation
- Disrupted epithelial-endothelial: barrier dysfunction
- Enhanced fibroblast-stellate: tissue remodeling/scarring

FUNCTIONAL INSIGHTS:
- Spatial co-localization suggests:
  * Paracrine signaling
  * Shared microenvironmental requirements
  * Coordinated functional responses
  * Developmental relationships

The interaction network summarizes how cellular neighborhoods are structured
in this tissue and provides a starting point for hypotheses about how they
differ between health and disease.

In [None]:
# =============================================================================
# Step 8: Advanced Spatial Analysis
# =============================================================================

"""
Additional spatial analyses to extract deeper biological insights
"""

def calculate_spatial_autocorrelation(adata_spatial):
    """
    Calculate spatial autocorrelation for cell type abundances
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data with abundance estimates
    """
    print("Calculating spatial autocorrelation...")
    
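    # Cell2location abundances are stored in .obsm, not in var_names, so copy
    # them into .obs and tell squidpy to read them from there (attr='obs')
    abundance_df = adata_spatial.obsm['cell_abundance']
    cell_types = abundance_df.columns.tolist()
    for ct in cell_types:
        adata_spatial.obs[ct] = abundance_df[ct].values
    
    # Moran's I requires a spatial connectivity graph; build one if missing
    if 'spatial_connectivities' not in adata_spatial.obsp:
        sq.gr.spatial_neighbors(adata_spatial)
    
    # Use squidpy for spatial autocorrelation analysis
    sq.gr.spatial_autocorr(adata_spatial, 
                          mode='moran',
                          attr='obs',
                          genes=cell_types)
    
    # Visualize results: squidpy stores Moran's I per feature in .uns['moranI']
    moran_df = adata_spatial.uns['moranI'].sort_values('I', ascending=False)
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(moran_df.index.astype(str), moran_df['I'], color='steelblue')
    ax.set_ylabel("Moran's I")
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right')
    ax.set_title('Spatial Autocorrelation (Moran\'s I)\nCell Type Abundance')
    plt.tight_layout()
    plt.show()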
    
    print("Spatial autocorrelation analysis completed!")

def identify_spatial_domains(adata_spatial, resolution=0.5):
    """
    Identify spatial domains based on cell type composition
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data
    resolution : float
        Clustering resolution
    """
    print("Identifying spatial domains...")
    
    # Use cell abundance matrix for domain identification
    abundance_df = adata_spatial.obsm['cell_abundance']
    
    # Normalize abundance profiles
    abundance_normalized = abundance_df.div(abundance_df.sum(axis=1), axis=0)
    
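    # Wrap the composition profiles in a temporary AnnData for clustering
    adata_temp = sc.AnnData(abundance_normalized)
    adata_temp.obsm['spatial'] = adata_spatial.obsm['spatial']
    
    # Build a kNN graph on the composition profiles. sc.tl.leiden reads the graph
    # from sc.pp.neighbors; a squidpy spatial graph (obsp['spatial_connectivities'])
    # is not used by default and would ignore composition altogether.
    sc.pp.neighbors(adata_temp, use_rep='X')
    
    # Cluster spots with similar composition into domains
    sc.tl.leiden(adata_temp, resolution=resolution, key_added='spatial_domains')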
    
    # Add results back to original data
    adata_spatial.obs['spatial_domains'] = adata_temp.obs['spatial_domains']
    
    # Visualize spatial domains
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # No `show` argument here: spatial_scatter forwards unknown kwargs to matplotlib
    sq.pl.spatial_scatter(adata_spatial, 
                         color='spatial_domains',
                         size=2.0,
                         ax=axes[0])
    axes[0].set_title('Spatial Domains\n(Cell Type Composition)')
    
    # Show domain composition
    domain_composition = pd.crosstab(adata_spatial.obs['spatial_domains'], 
                                   adata_spatial.obsm['cell_abundance'].idxmax(axis=1))
    
    sns.heatmap(domain_composition, 
                annot=True, 
                fmt='d', 
                cmap='viridis',
                ax=axes[1])
    axes[1].set_title('Domain Composition\n(Dominant Cell Types)')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Identified {adata_spatial.obs['spatial_domains'].nunique()} spatial domains")

# Run advanced spatial analyses
calculate_spatial_autocorrelation(adata_spatial)
identify_spatial_domains(adata_spatial)
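
Moran's I, the statistic requested with `mode='moran'`, compares each spot's deviation from the mean with its neighbors' deviations. Values near +1 mean neighboring spots have similar abundances, values near -1 mean neighbors alternate, and values near 0 suggest little spatial structure. The NumPy sketch below implements the textbook formula directly on toy grids with rook (edge-sharing) adjacency. `morans_i` and `grid_rook_weights` are hypothetical helpers, not squidpy functions. squidpy row-normalizes the connectivity weights by default (`transformation=True`), so its values can differ in scale from this unnormalized version.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I: (N / W) * sum_ij w_ij z_i z_j / sum_i z_i**2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    n = x.size
    total_weight = w.sum()  # W = sum of all spatial weights
    return (n / total_weight) * (z @ w @ z) / (z @ z)


def grid_rook_weights(n):
    """Hypothetical helper: binary adjacency for an n x n grid (edge-sharing neighbors)."""
    w = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:  # right neighbor
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < n:  # neighbor below
                w[i, i + n] = w[i + n, i] = 1.0
    return w


n = 6
w = grid_rook_weights(n)
rows, cols = np.divmod(np.arange(n * n), n)

checkerboard = np.where((rows + cols) % 2 == 0, 1.0, -1.0)  # neighbors always differ
two_blocks = np.where(cols < n // 2, 1.0, 0.0)              # left half high, right half low

print(f"checkerboard: I = {morans_i(checkerboard, w):.3f}")
print(f"two blocks:   I = {morans_i(two_blocks, w):.3f}")
```

On this 6 x 6 grid, the checkerboard gives I = -1 and the two-block pattern gives I = 0.8. This matches the intuition that spatially clustered abundance produces positive autocorrelation.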

In [None]:
# =============================================================================
# Step 9: Save Results and Generate Summary
# =============================================================================

def save_spatial_results(adata_spatial, interaction_graph, interaction_matrix, 
                        enrichment_results, high_spots_dict):
    """
    Save all spatial analysis results
    
    Parameters:
    -----------
    adata_spatial : AnnData
        Spatial data with all results
    interaction_graph : networkx.Graph
        Cell-cell interaction network
    interaction_matrix : pd.DataFrame
        Interaction matrix
    enrichment_results : dict
        Enrichment analysis results
    high_spots_dict : dict
        High abundance spots
    """
    print("Saving spatial analysis results...")
    
    # Save spatial data with all results
    adata_spatial.write('spatial_data_with_deconvolution.h5ad', compression='gzip')
    
    # Save interaction results
    interaction_matrix.to_csv('spatial_interaction_matrix.csv')
    
    # Save network as GraphML
    nx.write_graphml(interaction_graph, 'spatial_interaction_network.graphml')
    
    # Save high spots information
    high_spots_summary = pd.DataFrame([
        {'cell_type': ct, 'n_high_spots': len(spots), 
         'percentage': len(spots)/adata_spatial.n_obs*100}
        for ct, spots in high_spots_dict.items()
    ])
    high_spots_summary.to_csv('spatial_high_spots_summary.csv', index=False)
    
    # Generate comprehensive summary
    with open('spatial_analysis_summary.txt', 'w') as f:
        f.write("SPATIAL TRANSCRIPTOMICS ANALYSIS SUMMARY\n")
        f.write("="*50 + "\n\n")
        
        f.write(f"Dataset Information:\n")
        f.write(f"- Spatial spots: {adata_spatial.n_obs:,}\n")
        f.write(f"- Genes analyzed: {adata_spatial.n_vars:,}\n")
        f.write(f"- Cell types deconvolved: {adata_spatial.obsm['cell_abundance'].shape[1]}\n\n")
        
        f.write("Cell Type Abundance Summary:\n")
        abundance_df = adata_spatial.obsm['cell_abundance']
        for cell_type in abundance_df.columns:
            mean_abundance = abundance_df[cell_type].mean()
            spots_present = (abundance_df[cell_type] > 0.1).sum()
            f.write(f"- {cell_type}: Mean={mean_abundance:.3f}, "
                   f"Present in {spots_present} spots ({spots_present/len(abundance_df)*100:.1f}%)\n")
        
        f.write(f"\nSpatial Interaction Analysis:\n")
        f.write(f"- Significant interactions: {(interaction_matrix > 0).sum().sum()}\n")
        f.write(f"- Network connectivity: {nx.number_of_edges(interaction_graph)} edges\n")
        
        if nx.number_of_edges(interaction_graph) > 0:
            f.write(f"- Average clustering coefficient: {nx.average_clustering(interaction_graph):.3f}\n")
        
        f.write(f"\nSpatial Domains:\n")
        if 'spatial_domains' in adata_spatial.obs.columns:
            f.write(f"- Number of domains identified: {adata_spatial.obs['spatial_domains'].nunique()}\n")
        
        f.write(f"\nFiles Generated:\n")
        f.write(f"- spatial_data_with_deconvolution.h5ad\n")
        f.write(f"- spatial_interaction_matrix.csv\n")
        f.write(f"- spatial_interaction_network.graphml\n")
        f.write(f"- spatial_high_spots_summary.csv\n")
        f.write(f"- spatial_analysis_summary.txt\n")
    
    print("All results saved successfully!")

# Save all results
save_spatial_results(adata_spatial, interaction_graph, interaction_matrix, 
                    enrichment_results, high_spots_dict)

print("\n" + "="*80)
print("SPATIAL TRANSCRIPTOMICS ANALYSIS COMPLETE!")
print("="*80)
print("\nKey Outputs:")
print("1. Cell2location deconvolution results")
print("2. Spatial interaction network")
print("3. Neighborhood enrichment profiles")
print("4. Spatial domain identification")
print("5. Comprehensive visualization suite")
print("\nFiles ready for downstream analysis!")
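
A quick round trip confirms that the saved table and network can be reloaded for downstream work. `pd.read_csv(..., index_col=0)` restores the interaction matrix with its row labels, and `nx.read_graphml` restores the network with its edge weights. The sketch writes a small toy matrix and graph to a temporary directory; the values are illustrative, not results. In practice, the paths would be the `spatial_interaction_matrix.csv` and `spatial_interaction_network.graphml` files written by `save_spatial_results`.

```python
import os
import tempfile

import networkx as nx
import numpy as np
import pandas as pd

# Toy symmetric interaction matrix (illustrative values only)
cell_types = ["Endothelial", "Fibroblast", "Stellate"]
matrix = pd.DataFrame(
    [[0.0, 0.0, 1.3],
     [0.0, 0.0, 2.4],
     [1.3, 2.4, 0.0]],
    index=cell_types, columns=cell_types,
)

# Mirrors the notebook's edge rule: one undirected edge per positive upper-triangle entry
graph = nx.Graph()
graph.add_nodes_from(cell_types)
for i, a in enumerate(cell_types):
    for b in cell_types[i + 1:]:
        if matrix.loc[a, b] > 0:
            graph.add_edge(a, b, weight=float(matrix.loc[a, b]))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "spatial_interaction_matrix.csv")
    graphml_path = os.path.join(tmp, "spatial_interaction_network.graphml")
    matrix.to_csv(csv_path)
    nx.write_graphml(graph, graphml_path)

    # Reload: index_col=0 turns the first CSV column back into row labels
    matrix_back = pd.read_csv(csv_path, index_col=0)
    graph_back = nx.read_graphml(graphml_path)

print(np.allclose(matrix_back.values, matrix.values))
print(sorted(graph_back.edges(data="weight")))
```

Note that `read_graphml` returns node labels as strings, which matches the cell-type names used here.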

FINAL BIOLOGICAL CONCLUSIONS:

SPATIAL ORGANIZATION INSIGHTS:
1. Cell type spatial distributions reveal tissue architecture
2. Interaction networks identify functional cellular neighborhoods
3. Enrichment analysis quantifies spatial associations
4. Domain analysis reveals tissue microenvironments

CLINICAL/RESEARCH APPLICATIONS:
1. Disease progression mapping
2. Treatment response monitoring
3. Biomarker spatial validation
4. Therapeutic target identification

INTEGRATION OPPORTUNITIES:
1. Multi-omics spatial integration
2. Temporal spatial dynamics
3. Cross-tissue comparisons
4. Computational model validation

Your spatial analysis provides a view of tissue organization at
spot resolution (each 55 µm Visium spot typically captures several
cells, which is why Cell2location estimates abundances per spot),
showing how your five major cell types (endothelial, epithelial,
fibroblast, Schwann, stellate) organize and co-localize within
their native tissue environment.

Next steps could include:
- Integration with additional spatial datasets
- Pathway enrichment in spatial domains
- Ligand-receptor analysis in interacting regions
- Comparison with disease/control conditions

In [None]:
# =============================================================================
# Optional: Integration with Additional Datasets
# =============================================================================

"""
Code templates for integrating additional datasets:

# For multi-sample integration:
def integrate_multiple_spatial_samples(spatial_paths, reference_path):
    # Load and process multiple spatial samples
    # Use consistent reference for deconvolution
    # Compare spatial patterns across samples
    pass

# For temporal analysis:
def temporal_spatial_analysis(spatial_timepoints):
    # Analyze spatial changes over time
    # Track cell type dynamics
    # Identify temporal interaction changes
    pass

# For disease comparison:
def disease_control_spatial_comparison(disease_spatial, control_spatial):
    # Compare spatial organization
    # Identify disease-specific interactions
    # Map pathological spatial changes
    pass
"""

print("\nSpatial transcriptomics analysis pipeline completed!")
print("Ready for advanced integrative analyses and biological interpretation.")