# Practical session: neighbourhoods


### Synthetic Nolan example

Before we tackle the practical, we'll quickly review and reproduce the Nolan pipeline from Schürch et al., who used this approach to understand immune spatial organisation in CRC (https://doi.org/10.1016/j.cell.2020.07.005). Here, we'll stick with synthetic data for now.

In [None]:
# Import necessary libraries
import muspan as ms
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load an example domain dataset
pc = ms.datasets.load_example_domain('Synthetic-Points-Architecture')

# Visualize the dataset, coloring by 'Celltype'
ms.visualise.visualise(pc, color_by='Celltype')

In MuSpAn, we leverage the flexibility of our `muspan.networks` submodule and the sklearn clustering packages (via `muspan.helpers.cluster_data`) to allow any combination of spatial relation between points and clustering method. Here, we will explore how we might use our `cluster_neighbourhoods` functionality, ranging from methods commonly used in the literature to our own recommendations. Critically, this tutorial should also serve as a demonstration of how sensitive the resultant neighbourhoods are to both the spatial network and the clustering method used. Before using this function, we also recommend reading our documentation on `cluster_neighbourhoods` to get an understanding of the associated parameters.

We can reproduce their implementation by using our `KNN` spatial network generation with 10 neighbours and selecting the `minibatchkmeans` clustering method with 4 clusters. Below, we apply this clustering procedure to our synthetic data using the 'Celltype' label.

In [3]:
# Perform neighbourhood clustering on the dataset using KNN and minibatchkmeans
neighbourhood_enrichment_matrix, consistent_global_labels, unique_cluster_labels = ms.networks.cluster_neighbourhoods(
    pc,  # The domain dataset
    label_name='Celltype',  # The label to use for clustering
    network_kwargs=dict(network_type='KNN', max_edge_distance=np.inf, min_edge_distance=0, number_of_nearest_neighbours=10),  # The network parameters
    k_hops=1,  # The number of hops to consider for the neighbourhood
    neighbourhood_label_name='Neighbourhood ID',  # Name for the neighbourhood label
    cluster_method='minibatchkmeans',  # Clustering method
    cluster_parameters={'n_clusters': 4},  # Parameters for the clustering method
    neighbourhood_enrichment_as='log-fold' # Neighbourhood enrichment as log-fold 
)
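To make the network step more concrete, here is a minimal sketch of what a KNN neighbourhood composition looks like, written with plain NumPy on toy data. It is not MuSpAn's implementation: the helper `knn_composition` is hypothetical, and excluding each point from its own neighbourhood is an assumption made for this illustration.

```python
import numpy as np

def knn_composition(points, labels, k):
    """Count the labels among each point's k nearest neighbours.

    Hypothetical helper for illustration only (not part of MuSpAn).
    Assumption: a point is not counted as its own neighbour.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    unique_labels, label_idx = np.unique(labels, return_inverse=True)

    # Pairwise Euclidean distances; fine for toy data (O(n^2) memory)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches

    # Indices of the k nearest points for every point
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # One row per point, one column per label: neighbour counts
    composition = np.zeros((len(points), len(unique_labels)), dtype=int)
    for col in range(len(unique_labels)):
        composition[:, col] = (label_idx[neighbours] == col).sum(axis=1)
    return unique_labels, composition

rng = np.random.default_rng(0)
# Two well-separated toy blobs, each made of a single cell type
points = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (30, 2))])
labels = np.array(['A'] * 30 + ['B'] * 30)
unique_labels, composition = knn_composition(points, labels, k=10)
```

Each row of `composition` sums to `k`, and it is rows like these that a clustering method such as mini-batch k-means would then group into neighbourhoods.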

The function has three outputs, plus an optional fourth:
1. A differential expression matrix where rows represent cluster IDs and columns are the labels used to cluster. Here this is presented as log-fold change to match the Schürch et al. pipeline, but our default is a z-score.

2. The labels used to cluster the neighbourhoods

3. A list of the resultant cluster labels

4. (Optional) The observation matrix of neighbourhood compositions used to cluster. This is not output here but can be, see our documentation on `cluster_neighbourhoods`.
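As a rough intuition for the first output, the sketch below computes a log-fold and a z-score style enrichment from a toy observation matrix and a set of cluster assignments. The formulas are assumptions chosen for illustration (cluster-mean label frequency compared against the global mean); the exact definitions used by `cluster_neighbourhoods` and by Schürch et al. may differ, and `cluster_enrichment` is a hypothetical helper.

```python
import numpy as np

def cluster_enrichment(observation_matrix, cluster_ids, pseudocount=1e-9):
    """Illustrative enrichment of each label within each cluster.

    Hypothetical helper; the formulas are assumptions, not MuSpAn's.
    Every row of the observation matrix must contain at least one count.
    """
    obs = np.asarray(observation_matrix, dtype=float)
    cluster_ids = np.asarray(cluster_ids)

    # Convert neighbour counts to per-neighbourhood label frequencies
    freqs = obs / obs.sum(axis=1, keepdims=True)
    global_mean = freqs.mean(axis=0)
    global_std = freqs.std(axis=0)

    clusters = np.unique(cluster_ids)
    cluster_means = np.vstack(
        [freqs[cluster_ids == c].mean(axis=0) for c in clusters]
    )

    # Log2 ratio of cluster-mean frequency to global-mean frequency
    log_fold = np.log2((cluster_means + pseudocount) / (global_mean + pseudocount))
    # Cluster-mean deviation from the global mean, in global standard deviations
    z_score = (cluster_means - global_mean) / (global_std + pseudocount)
    return clusters, log_fold, z_score

# Toy observation matrix: rows are neighbourhoods, columns are two labels
toy_obs = [[8, 2], [9, 1], [2, 8], [1, 9]]
toy_clusters = [0, 0, 1, 1]
clusters, log_fold, z_score = cluster_enrichment(toy_obs, toy_clusters)
```

In both representations, positive entries mark labels that are over-represented in a cluster relative to the whole dataset and negative entries mark depleted labels; only the scale differs.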

We can visualise the differential expression matrix using `ms.visualise.heatmap` or, if we want to show similarities between clusters, we can use the `sns.clustermap` function as shown below.

In [None]:
# Create a DataFrame from the neighbourhood enrichment matrix
df_ME_id = pd.DataFrame(data=neighbourhood_enrichment_matrix, index=unique_cluster_labels, columns=consistent_global_labels)
df_ME_id.index.name = 'Neighbourhood ID'
df_ME_id.columns.name = 'Celltype ID'

# Visualize the neighbourhood enrichment matrix using a clustermap
sns.clustermap(
    df_ME_id,
    xticklabels=consistent_global_labels,
    yticklabels=unique_cluster_labels,
    figsize=(7.5, 3.5),
    cmap='RdBu_r',
    dendrogram_ratio=(.05, .3),
    col_cluster=False,
    row_cluster=False,
    square=True,
    linewidths=0.5,
    linecolor='black',
    cbar_kws=dict(use_gridspec=False, location="top", label='Neighbourhood enrichment (log-fold)', ticks=[-2, 0, 2]),
    cbar_pos=(0.12, 0.75, 0.72, 0.08),
    vmin=-2,
    vmax=2,
    tree_kws={'linewidths': 0, 'color': 'white'}
)

We can see this neighbourhood clustering method has found two very similar clusters (4 and 0), with the rest each picking out a dominant celltype. This makes sense as there are only 4 celltypes in our dataset. The `cluster_neighbourhoods` function automatically adds these cluster labels to the objects that were used in the computation, under the label name given by the `neighbourhood_label_name` parameter.

Using this label, let's see the result of the clustering spatially-resolved on our dataset.

In [None]:
# Visualize the dataset with the neighbourhood clustering results
ms.visualise.visualise(pc, color_by='Neighbourhood ID')


Now that we have an idea of how to identify neighbourhoods using MuSpAn, let's jump back to our practical exercise.

--- 

### Free play: Contact-based Neighbourhoods 


In complex tissues like the colon, cells do not act in isolation: their behavior and identity are strongly influenced by their immediate neighbours. Cellular neighbourhoods, spatially organized groups of interacting cells, can reveal key aspects of tissue organization, homeostasis, and disease progression. For example, in the colon, epithelial cells, immune cells, and stromal cells form structured microenvironments that regulate barrier integrity and immune surveillance.

Spatial transcriptomics technologies such as Xenium allow us to not only identify which transcripts are expressed in each cell, but also where these cells sit relative to each other. By reconstructing cell–cell contact networks, we can start to ask biologically meaningful questions such as:
- How do epithelial and immune cells organize into distinct microenvironments in healthy colon tissue?
- Are there characteristic “neighbourhoods” defined by contact-based interactions that may underlie functional tissue domains?
- Can we identify transcriptional signatures unique to specific spatial niches?


In this practical, we’ll explore these questions in the `Xenium-Mouse-Colon` dataset by:
1. Building a contact-based cell network from cell boundaries,
2. Identifying cellular neighbourhoods using clustering of the contact graph,
3. Converting spatially contiguous neighbourhoods into shape objects, and
4. Characterising the transcriptomic profiles of these spatial domains.

Together, this workflow provides a way to move from single-cell identities to tissue-level organization, uncovering the rules by which cells assemble into functional units.
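For step 3, the idea of a "spatially contiguous neighbourhood" can be sketched as a connected component of the contact graph in which all cells share a neighbourhood ID. The example below uses plain Python on a toy graph; `contiguous_regions` is a hypothetical helper and does not use or represent any MuSpAn function.

```python
from collections import deque

def contiguous_regions(edges, neighbourhood_ids):
    """Group cells into connected components of the contact graph in which
    every cell has the same neighbourhood ID.

    Hypothetical helper for illustration only (not part of MuSpAn).
    """
    adjacency = {cell: set() for cell in neighbourhood_ids}
    for a, b in edges:
        # Keep only contacts between cells assigned to the same neighbourhood
        if neighbourhood_ids[a] == neighbourhood_ids[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, regions = set(), []
    for start in neighbourhood_ids:
        if start in seen:
            continue
        # Breadth-first search from each unvisited cell
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.append(cell)
            for nxt in adjacency[cell]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        regions.append((neighbourhood_ids[start], sorted(members)))
    return regions

# Toy contact graph: a chain of six cells with two neighbourhood IDs
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
neighbourhood_ids = {0: 'N1', 1: 'N1', 2: 'N2', 3: 'N2', 4: 'N1', 5: 'N1'}
regions = contiguous_regions(edges, neighbourhood_ids)
```

Note that neighbourhood `N1` yields two separate regions here because its cells are split by `N2`; each such region is a candidate for conversion into its own shape object.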

TIP: to check whether the clustering parameters in `cluster_neighbourhoods` are appropriate, return the `observation_matrix` used to cluster the objects by setting `return_observation_matrix=True`. This matrix can be projected and clustered independently to check the quality of the resultant classification. See https://docs.muspan.co.uk/latest/generated/muspan.networks.cluster_neighbourhoods.html#muspan.networks.cluster_neighbourhoods
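One possible way to inspect such a matrix is to project it onto its first principal components and see whether the clusters separate. The sketch below does this with NumPy's SVD on a stand-in matrix; how the observation matrix is positioned in the function's return value is not covered here, so `toy_obs` and `toy_clusters` are placeholders and `pca_project` is a hypothetical helper.

```python
import numpy as np

def pca_project(observation_matrix, n_components=2):
    """Project rows onto the leading principal components via SVD.

    Hypothetical helper for illustration only (not part of MuSpAn).
    """
    X = np.asarray(observation_matrix, dtype=float)
    X_centred = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    scores = X_centred @ Vt[:n_components].T
    # Fraction of total variance captured by each retained component
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:n_components]

# Stand-in for a returned observation matrix (rows: objects, columns: labels)
toy_obs = np.array([
    [8, 2, 0], [9, 1, 0], [7, 2, 1],   # neighbourhoods rich in label 0
    [1, 8, 1], [2, 7, 1], [1, 9, 0],   # neighbourhoods rich in label 1
])
toy_clusters = np.array([0, 0, 0, 1, 1, 1])

scores, explained = pca_project(toy_obs, n_components=2)
# Mean position of each cluster along the first principal component
pc1_means = [scores[toy_clusters == c, 0].mean() for c in np.unique(toy_clusters)]
```

Plotting `scores[:, 0]` against `scores[:, 1]` coloured by cluster ID (for example with `plt.scatter`) gives a quick visual check: heavily overlapping clusters suggest the number of clusters or the network parameters may need revisiting.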

