# Spatial analysis across multiple regions

We'll now think about how to break down large samples into multiple regions, and how to aggregate statistics between multiple regions. As we've seen in the talk, these pose very similar statistical problems. We'll start by considering a large sample, here using an example of an adenoma stained for immune cells using the Vectra IF platform (from the Leedham lab, https://doi.org/10.1101/2024.06.02.597010).

In [None]:
# Import necessary libraries
import muspan as ms
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt 

# Load immune cell data from a CSV file into a DataFrame - we'll grab the csv from our website
colon_ad_immune = pd.read_csv('https://www.docs.muspan.co.uk/workshops/data_for_workshops/Adenoma_Immune.csv')
colon_ad_immune

Let's first convert this into a single giant domain. We'll add the cell centres, some associated labels, and we'll update the domain boundary.

In [None]:
# Get the list of columns in the DataFrame
cols = colon_ad_immune.columns.tolist()

# Create a new DataFrame with the same columns
colon_ad_immune_out = colon_ad_immune[cols]

# Extract coordinates for the points (XMin, YMin) and create a mask for cell types
pts = np.array([colon_ad_immune['XMin'], colon_ad_immune['YMin']]).T
mask = np.zeros(len(pts), dtype=int)  # integer codes, so they match the integer keys of ctdict below

# Dictionary to map cell type indices to their names
ctdict = {
    0: 'None',
    1: 'T Helper Cell',
    2: 'Treg Cell', 
    3: 'Cytotoxic T Cell', 
    4: 'Macrophage',
    5: 'Neutrophil', 
    6: 'Epithelium'
}

# Assign indices to the mask based on the presence of each cell type
for i, ct in enumerate(['T Helper Cell', 'Treg Cell', 'Cytotoxic T Cell', 'Macrophage', 'Neutrophil', 'Epithelium']):
    mask[colon_ad_immune[ct] == 1] = i + 1

# Create labels for each point based on the mask
labels = [ctdict[v] for v in mask]

# Create a muspan domain object
domain = ms.domain('adcar')

# Add points and labels to the domain
domain.add_points(pts, collection_name='Immune cells')
domain.add_labels('Celltype', labels)

# Define colors for each cell type
newcolors = {
    'None': [0.9, 0.9, 0.9, 1],
    'T Helper Cell': plt.cm.tab10(0),
    'Treg Cell': plt.cm.tab10(1), 
    'Cytotoxic T Cell': plt.cm.tab10(2), 
    'Macrophage': plt.cm.tab10(3),
    'Neutrophil': plt.cm.tab10(4), 
    'Epithelium': [0.7, 0.7, 0.7, 1]
}

# Update the domain colors based on the defined color mapping
domain.update_colors(newcolors, label_name='Celltype')

# Estimate the boundary of the domain using the alpha shape method
domain.estimate_boundary(method='alpha shape', alpha_shape_kwargs=dict(alpha=200))    

# Visualize the domain with the cell type labels and boundary
ms.visualise.visualise(domain, 'Celltype', marker_size=1, show_boundary=True)
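
One detail of the labelling loop above is worth spelling out: because the mask is overwritten on each pass, a cell that is flagged for more than one cell type ends up with whichever type comes last in the list. Here's a minimal sketch of that behaviour on a toy table. The `toy` DataFrame and its values are made up for illustration and are not taken from the adenoma data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the marker columns (values are invented for illustration)
toy = pd.DataFrame({
    'T Helper Cell': [1, 0, 0, 1],
    'Macrophage':    [0, 0, 1, 0],
    'Epithelium':    [0, 0, 0, 1],
})
cell_types = ['T Helper Cell', 'Macrophage', 'Epithelium']

# Same overwrite logic as the loop above: later entries in cell_types win
codes = np.zeros(len(toy), dtype=int)
for i, ct in enumerate(cell_types):
    codes[(toy[ct] == 1).to_numpy()] = i + 1

# Code 0 means no marker was positive
names = np.array(['None'] + cell_types)
toy_labels = names[codes].tolist()
print(toy_labels)

# Count rows flagged for more than one cell type
n_flags = toy[cell_types].sum(axis=1)
print('Rows with multiple positive markers:', int((n_flags > 1).sum()))
```

If many cells in a real dataset carry multiple flags, the order of the list effectively sets a priority, so it is worth checking the count before trusting the labels.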

This is a massive sample! Let's generate a lattice to break the domain down into smaller regions. We'll do this by generating a hexagonal lattice and assigning every cell centre to a hexagon.

In [None]:
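# Tile the domain with hexagons (side length 500) and label each cell with the hexagon that contains it
ms.region_based.generate_hexgrid(domain, side_length=500, region_label_name='subregion', regions_collection_name='subregions')

# Colour each cell by its subregion label
ms.visualise.visualise(domain, 'subregion', marker_size=1)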
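
To get a feel for what assigning points to hexagons involves, here's a generic NumPy sketch based on axial coordinates and cube rounding. It is not MuSpAn's implementation: the function name `assign_to_hexagons`, the pointy-top orientation, the hexagon centred on the origin, and the `(q, r)` indexing are all assumptions made for illustration, and `generate_hexgrid` may orient, offset, and number its hexagons differently.

```python
import numpy as np

def assign_to_hexagons(points, side_length):
    """Return integer axial (q, r) hexagon indices for each (x, y) point.

    Assumes pointy-top hexagons, with one hexagon centred on the origin.
    """
    x, y = points[:, 0], points[:, 1]

    # Fractional axial coordinates of each point
    q = (np.sqrt(3) / 3 * x - y / 3) / side_length
    r = (2 / 3 * y) / side_length

    # Convert to cube coordinates (cx + cy + cz = 0) and round each one
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = np.round(cx), np.round(cy), np.round(cz)

    # Rounding can break cx + cy + cz = 0, so recompute the coordinate
    # with the largest rounding error from the other two
    dx, dy, dz = np.abs(rx - cx), np.abs(ry - cy), np.abs(rz - cz)
    fix_x = (dx > dy) & (dx > dz)
    fix_z = ~fix_x & ~(dy > dz)
    rx = np.where(fix_x, -ry - rz, rx)
    rz = np.where(fix_z, -rx - ry, rz)

    return np.column_stack([rx, rz]).astype(int)

# Two example points: the origin, and the centre of the hexagon one step along q
demo_points = np.array([[0.0, 0.0], [np.sqrt(3) * 500, 0.0]])
hex_ids = assign_to_hexagons(demo_points, side_length=500)
print(hex_ids)
```

Each unique `(q, r)` pair plays the same role as a `subregion` label: every point sharing a pair belongs to the same hexagon.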

Every object now carries a label assigning it to a subregion (hexagon) in space. We have two main options for analysing the subregions:
1. Use the generated lattice shapes as boundaries, and compute statistics within each region.
2. Generate a new domain for each subregion using the `ms.helpers.crop_domains` function in MuSpAn.


We'll go with option 1 for now, and aim to visualise the spatial heterogeneity of our data. To compute statistics within each region, we first need to know which region label categories we have. We can grab these using the `ms.query.get_labels()` function:

In [None]:
# Retrieve labels for subregions from the domain
subregions_labels, _ = ms.query.get_labels(domain, 'subregion')

# Get unique categories of subregions
subregions_labels_categories = np.unique(subregions_labels)

# Print the unique subregion categories
print(subregions_labels_categories)


For now, we'll keep it simple and just compute the density of cells in each region of our sample. We'll add the resultant densities as labels to the region shapes and also store the raw numeric outputs in a list.

In [None]:
# make some empty lists to hold results
Neutrophil_density = []    

for subregion_id in subregions_labels_categories:
    
    # get the boundary shape of the subregion
    this_shape_boundary = ms.query.query_container(('collection','subregions'),'AND',('subregion',subregion_id),domain)
    
    # get the cell population within the subregion
    this_sub_population = ms.query.query_container(('collection','Immune cells'),'AND',('subregion',subregion_id),domain)
    
    # run a statistic (here we'll be really bold and just calculate cell density)
    cell_density,_,label_categories = ms.summary_statistics.label_density(domain,label_name='Celltype',population=this_sub_population,include_boundaries=this_shape_boundary)
    
    # add result to a list (just Neutrophils for now)
    this_Neutrophil_density = 0
    if 'Neutrophil' in label_categories:
        neutrophil_index = np.where(label_categories=='Neutrophil')[0][0]
        this_Neutrophil_density = cell_density[neutrophil_index]

    Neutrophil_density.append(this_Neutrophil_density)
    
    # add result to the region shape as a label (just Neutrophils for now)
    domain.add_labels('neutrophil density', [this_Neutrophil_density], add_labels_to=this_shape_boundary,label_type='continuous')
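
Conceptually, the density in each region is just a count divided by an area. As a sanity check on that idea, here's a minimal pandas sketch on a toy table. The `cells` table is invented for illustration, and the sketch assumes every region is a complete regular hexagon of side length 500; it doesn't try to reproduce how `label_density` handles the boundary shapes we pass it.

```python
import numpy as np
import pandas as pd

# Area of a regular hexagon with the side length used for the lattice above
side_length = 500.0
hex_area = 3 * np.sqrt(3) / 2 * side_length ** 2

# Toy table of cells with a subregion label and a cell type (invented values)
cells = pd.DataFrame({
    'subregion': [0, 0, 0, 1, 1, 2],
    'Celltype': ['Neutrophil', 'Macrophage', 'Neutrophil',
                 'Macrophage', 'Macrophage', 'Neutrophil'],
})

# Count neutrophils per subregion, keeping regions with none as zeros
all_regions = sorted(cells['subregion'].unique())
counts = (cells[cells['Celltype'] == 'Neutrophil']
          .groupby('subregion')
          .size()
          .reindex(all_regions, fill_value=0))

# Density = count / area
toy_density = counts / hex_area
print(toy_density)
```

Note the `reindex` step: without it, regions containing no neutrophils would silently drop out of the result, which is the same situation the `if 'Neutrophil' in label_categories` check guards against in the loop above.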


Now that we've computed our neutrophil densities, let's visualise them on our tissue.

In [None]:
# Colour each hexagon by its neutrophil density (colour scale clipped to the range 0 to 5e-4)
ms.visualise.visualise(domain, 'neutrophil density', objects_to_plot=('collection', 'subregions'), vmin=0, vmax=5e-4, shape_kwargs=dict(alpha=1))

We can also plot the raw densities as a histogram. We don't need MuSpAn for this because we already have the numbers in a list, so we'll just use Matplotlib's pyplot.

In [None]:
fig,ax=plt.subplots(1,1,figsize=(8,6))
ax.hist(Neutrophil_density, bins=20, color='blue', alpha=0.7)
ax.set_title('Histogram of Neutrophil Density')
ax.set_xlabel('Density (µm⁻²)')
ax.set_ylabel('Frequency')


### Spatial analysis in multiple regions

Now that we understand how to compute a non-spatial metric within subregions of the tissue, we can move on to more informative spatial metrics within each of these regions:

1. Run the quadrat correlation matrix (QCM) to identify pairs of cell types that are co-localised or mutually excluded (make sure to note the chosen region size).
2. Divide the tissue domain into regions of that same size.
3. For each region, compute the cross pair correlation function (cross-PCF) for the cell-type pairs of interest and extract a scalar summary statistic (e.g. g(r = 0)).
4. Attach these summary values to the corresponding region shapes for visualisation.
5. Finally, apply the Getis-Ord statistic to the region shapes to identify hotspots, i.e. areas where regions with similar summary statistics cluster together.
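
To make steps 3 and 5 more concrete, here's a bare-bones NumPy sketch of the two statistics. Both functions are illustrative stand-ins and not MuSpAn's implementations: `naive_cross_pcf` applies no edge correction (so it underestimates g(r) near region boundaries), and `getis_ord_gi_star` uses the standard Gi* z-score with a weights matrix we build by hand. The toy data and the 1D chain of "regions" are assumptions chosen to keep the example small; on a hexagonal lattice the weights would come from which hexagons touch.

```python
import numpy as np

def naive_cross_pcf(points_a, points_b, area, r_edges):
    """Cross-PCF estimate from A to B with no edge correction."""
    # All pairwise A-to-B distances
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    counts, _ = np.histogram(d.ravel(), bins=r_edges)
    # Pair counts expected in each annulus if B were spread uniformly
    annulus_area = np.pi * (r_edges[1:] ** 2 - r_edges[:-1] ** 2)
    expected = len(points_a) * (len(points_b) / area) * annulus_area
    return counts / expected

def getis_ord_gi_star(values, weights):
    """Gi* z-scores; weights[i, j] > 0 marks j as a neighbour of i (i included)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * (w ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Step 3 on toy data: B points sit right next to A points, so g is well above 1 at short range
rng = np.random.default_rng(0)
a = rng.random((200, 2))
b = a + rng.normal(scale=0.002, size=a.shape)
g = naive_cross_pcf(a, b, area=1.0, r_edges=np.array([0.0, 0.02, 0.04]))
print('g in the first radial bin:', g[0])

# Step 5 on toy data: a chain of 11 regions with a block of high summary values in the middle
summary = np.array([0, 0, 0, 0, 10, 10, 10, 0, 0, 0, 0], dtype=float)
idx = np.arange(summary.size)
w = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)  # self plus adjacent regions
z = getis_ord_gi_star(summary, w)
print('Region with the strongest hotspot signal:', int(np.argmax(z)))
```

Large positive z-scores flag regions whose neighbourhood has unusually high summary values, and large negative ones flag cold spots. In the workflow above, `summary` would hold the per-region cross-PCF value from step 3 rather than invented numbers.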
