# Practical session: spatial networks and neighbourhoods

Before moving to neighbourhood analysis, we'll first introduce how to construct networks using MuSpAn objects and how to quickly visualise them. 

The network infrastructure in MuSpAn is designed to be highly flexible, accommodating a wide variety of analyses that use spatial networks. In particular, MuSpAn translates the spatial data contained within the domain into a network that is constructed using [NetworkX](https://networkx.org). This means we can leverage the extensive set of network tools provided by NetworkX in the context of spatial biology and multiscale spatial analysis.

MuSpAn allows the construction of three types of networks: Delaunay, K-Nearest Neighbour (KNN) and Proximity. Each network type has specific uses and varied interpretations dependent on the context of the data on which they are being constructed. For example, Delaunay networks generate a triangulated mesh of the data, which can be used to describe both local and global questions about the structure of our data. Alternatively, KNN networks are typically used to only understand local interactions between data.
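To make the difference between network types concrete, here is a minimal sketch in plain NetworkX (not MuSpAn's implementation) that builds a proximity network and a KNN network on the same four toy points. The helper names `proximity_graph` and `knn_graph`, the coordinates and the object IDs are all hypothetical, and MuSpAn may handle details such as ties or edge direction differently.

```python
import itertools
import math
import networkx as nx

# Toy 2D coordinates keyed by hypothetical object IDs; point 3 sits far from the rest
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.5, 0.0), 3: (10.0, 0.0)}

def proximity_graph(points, max_distance):
    """Connect every pair of points that lie within max_distance of each other."""
    G = nx.Graph()
    G.add_nodes_from(points)
    for u, v in itertools.combinations(points, 2):
        d = math.dist(points[u], points[v])
        if d <= max_distance:
            G.add_edge(u, v, Distance=d)
    return G

def knn_graph(points, k):
    """Connect every point to its k nearest neighbours (edges stored undirected)."""
    G = nx.Graph()
    G.add_nodes_from(points)
    for u in points:
        others = sorted((v for v in points if v != u),
                        key=lambda v: math.dist(points[u], points[v]))
        for v in others[:k]:
            G.add_edge(u, v, Distance=math.dist(points[u], points[v]))
    return G

prox = proximity_graph(points, max_distance=2.0)
knn = knn_graph(points, k=1)

print('Proximity edges:', sorted(prox.edges()))
print('KNN edges:', sorted(knn.edges()))
```

In this sketch the distant point 3 stays isolated in the proximity network, whereas the KNN network always links it to its nearest neighbour, however far away that neighbour is. This is one reason the two network types can lead to different interpretations on the same data.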

In general, a MuSpAn network describes pair-wise interactions between objects stored within a domain. These pair-wise interactions are represented using edges and, in the context of spatial data, the pair-wise relationship typically depends on the distance between any two objects. Such networks are known as **Spatial Networks** (for more information, see [Spatial Networks](https://arxiv.org/abs/1010.0302)). Critically, all nodes in a network are indexed using object IDs, which can be useful for integrating your networks with other analysis and querying. The MuSpAn tutorials cover this in more detail.

In [None]:
# Import necessary libraries
import muspan as ms
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load an example domain dataset
pc = ms.datasets.load_example_domain('Synthetic-Points-Architecture')

# Visualize the dataset, coloring by 'Celltype'
ms.visualise.visualise(pc, color_by='Celltype')

If our dataset contains no boundary information about our spatial objects (i.e., no segmentation mask), then Delaunay networks on point-like data are a common approximation of the local connectivity of the data. Their construction is based on an area (volume) filling process between all points, such that edges are generated to produce triangles whose circumcircles contain no other point. An edge between two points therefore represents the adjacency of their Voronoi cells. For more information, see the [Delaunay triangulation article](https://en.wikipedia.org/wiki/Delaunay_triangulation).
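As an illustration of the idea (a sketch using SciPy and NetworkX directly, not MuSpAn's internal code), the snippet below triangulates five toy points with `scipy.spatial.Delaunay` and turns each triangle side into a network edge. The coordinates are made up, and the `Distance` attribute name is chosen here only to mirror the edge weight discussed later in this session.

```python
import itertools
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

# Toy data: the four corners of a unit square plus its centre
coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

# Each row of tri.simplices holds the three point indices of one triangle
tri = Delaunay(coords)

G = nx.Graph()
G.add_nodes_from(range(len(coords)))
for simplex in tri.simplices:
    # Every pair of vertices in a triangle shares an edge
    for u, v in itertools.combinations(simplex, 2):
        u, v = int(u), int(v)
        G.add_edge(u, v, Distance=float(np.linalg.norm(coords[u] - coords[v])))

print('Number of triangles:', len(tri.simplices))
print('Edges:', sorted(G.edges()))
```

Because the centre point lies inside the square, every triangle includes it, so the corners connect to the centre and to their adjacent corners, but never diagonally across the square.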

Applying this procedure to our cell centroids, we are approximating cell-cell connectivity such that an edge represents approximate cell-cell boundary contact. Let's use the `ms.networks.generate_network` function to generate a Delaunay network on our 'Cell centres' data by using the shorthand query syntax.

In [2]:
# Generate a Delaunay network on the 'Cell centres' data
# The network will be stored in the domain.networks dictionary with the name 'Delaunay CC'
del_network = ms.networks.generate_network(
    pc,
    network_name='Delaunay CC',
    network_type='Delaunay',
    objects_as_nodes=('collection', 'Cell centres')
)

Before we take a look at our network, we should mention what MuSpAn is doing when generating a network. The output of this function is a NetworkX object, which means we can treat it as any other NetworkX network if we'd like. Also, we named our network 'Delaunay CC'. This name is the identifier given to the network, which is stored in the `domain.networks` dictionary. This allows us to reuse the same network for analysis later on. Let's check this is the case.

In [None]:
print('Type of del_network =',type(del_network))
print(pc)
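Because the returned object is an ordinary NetworkX graph, standard NetworkX functions apply to it directly. The sketch below uses a small hand-built graph as a stand-in for `del_network`; the node IDs and distances are hypothetical and only the NetworkX calls themselves carry over.

```python
import networkx as nx

# Stand-in for a MuSpAn network: nodes are (hypothetical) object IDs
G = nx.Graph()
G.add_edge(101, 102, Distance=8.0)
G.add_edge(102, 103, Distance=12.0)
G.add_edge(201, 202, Distance=5.0)

# Number of neighbours of each node
degrees = dict(G.degree())

# Groups of nodes that are mutually reachable through edges
components = sorted(sorted(c) for c in nx.connected_components(G))

# Shortest path length between two nodes, summing the 'Distance' edge weight
path_length = nx.shortest_path_length(G, source=101, target=103, weight='Distance')

print(degrees)
print(components)
print(path_length)
```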

Using the name of the network is the typical way we call our networks for analysis, but we can always retrieve our stored networks using the dictionary syntax as follows:

In [4]:
this_network=pc.networks['Delaunay CC']

Now, let's take a look at the network we generated using our helper visualise function `ms.visualise.visualise_network`. This provides the same functionality as `ms.visualise.visualise` but adds some arguments to modify the look of our networks on the domain. Check out our documentation for more details on the arguments of `ms.visualise.visualise_network`.

In [None]:
# Create a figure and axis for plotting the Delaunay network
fig, ax = plt.subplots(1, 1, figsize=(8, 6))

# Visualise the Delaunay network on the domain
# The visualise_kwargs argument allows us to specify additional visualization parameters
ms.visualise.visualise_network(
    pc,
    network_name='Delaunay CC',
    ax=ax,
    visualise_kwargs=dict(objects_to_plot=('collection', 'Cell centres'),color_by='Celltype', marker_size=10)
)

#### Controlling Edge Distance

Great, this looks like a Delaunay network of our Cell centres point data. Behind the scenes, MuSpAn has calculated the distance between adjacent nodes and added this as an edge weight. In fact, a distance similarity 'Inverse distance' has also been calculated (but let's ignore this for now). Edge weights are values that can be used to describe the connectivity between nodes. We can check the different edge weights in the network by using the edge view (`edges`) functionality from NetworkX:

In [None]:
list(del_network.edges(data=True))[0]

Here, we retrieved just the first edge stored in our edge list, which is defined between the objects with IDs 74174 and 74823. Knowing now that an edge exists between these two nodes, we can explicitly call these edge weight values using the dictionary structure of the network:

In [None]:
u, v = list(del_network.edges())[0]  # endpoints (object IDs) of the first stored edge
del_network[u][v]['Distance']
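The same access patterns can be tried on a toy graph. In this sketch the node IDs and weight values are made up, and `nx.get_edge_attributes` is used to collect every `Distance` value at once, which is handy for inspecting the spread of edge lengths before choosing a distance threshold.

```python
import networkx as nx

# Toy network with a hypothetical 'Distance' weight on each edge
G = nx.Graph()
G.add_edge(1, 2, Distance=10.0)
G.add_edge(2, 3, Distance=24.0)
G.add_edge(3, 4, Distance=40.0)

# Each entry of the edge view is a (node, node, attribute dict) tuple
u, v, attrs = list(G.edges(data=True))[0]

# The same attribute via dictionary-style indexing on the two endpoints
distance_uv = G[u][v]['Distance']

# All edge distances in one dictionary keyed by (node, node)
all_distances = nx.get_edge_attributes(G, 'Distance')
longest = max(all_distances.values())

print(u, v, attrs)
print(distance_uv, longest)
```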

Depending on the context of our data and/or analysis, we may want to filter our edges depending on the distance between nodes. In our case, if we want an edge to represent two nodes in direct contact, then we'd like the distance between them to be less than roughly 25μm. We can do this by using the `max_edge_distance` and `min_edge_distance` parameters. Let's make the same network as above but now with filtered edge lengths:

In [None]:
ms.networks.generate_network(pc,network_name='Delaunay CC filtered',network_type='Delaunay',
                             objects_as_nodes=('collection','Cell centres'),min_edge_distance=0,max_edge_distance=25)

This should reduce the number of edges compared to our first network 'Delaunay CC'. Let's check this is the case and have a look at the resultant networks.

In [None]:
print('Delaunay CC:',pc.networks['Delaunay CC'])
print('Delaunay CC filtered:',pc.networks['Delaunay CC filtered'])

In [None]:
# Create a 1x2 subplot for visualizing the original and filtered Delaunay networks
fig, ax = plt.subplots(1, 2, figsize=(13, 6))

# Plot the original Delaunay network
ax[0].set_title('Delaunay CC')
ms.visualise.visualise_network(
    pc,
    network_name='Delaunay CC',
    ax=ax[0],
    edge_weight_name=None,
    visualise_kwargs=dict(objects_to_plot=('collection', 'Cell centres'),color_by='Celltype', marker_size=10, add_cbar=False)
)

# Plot the filtered Delaunay network
ax[1].set_title('Delaunay CC filtered')
ms.visualise.visualise_network(
    pc,
    network_name='Delaunay CC filtered',
    ax=ax[1],
    edge_weight_name=None,
    visualise_kwargs=dict(objects_to_plot=('collection', 'Cell centres'),color_by='Celltype', marker_size=10, add_cbar=False)
)

We can see that more of the tissue architectural features are visible following the distance filtration. We recommend using the `max_edge_distance` and `min_edge_distance` parameters to prevent any unphysical edges in spatial networks.
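The effect of a distance filter can be sketched in plain NetworkX as follows. The helper `filter_edges_by_distance` is hypothetical (it is not a MuSpAn function), the toy distances are made up, and treating both bounds as inclusive is an assumption that may not match MuSpAn's behaviour.

```python
import networkx as nx

def filter_edges_by_distance(G, min_edge_distance=0.0,
                             max_edge_distance=float('inf'), weight='Distance'):
    """Return a new graph keeping only edges whose weight lies within the bounds."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))  # keep every node, even if it loses all edges
    for u, v, attrs in G.edges(data=True):
        if min_edge_distance <= attrs[weight] <= max_edge_distance:
            H.add_edge(u, v, **attrs)
    return H

# Toy network: the last edge is too long to represent direct contact
G = nx.Graph()
G.add_edge(1, 2, Distance=10.0)
G.add_edge(2, 3, Distance=24.0)
G.add_edge(3, 4, Distance=40.0)

filtered = filter_edges_by_distance(G, min_edge_distance=0, max_edge_distance=25)

print('Before:', G.number_of_edges(), 'edges')
print('After:', filtered.number_of_edges(), 'edges')
```

Nodes are kept even when all of their edges are removed, so isolated objects remain visible in downstream summaries rather than silently disappearing.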

Now we have the hang of generating and visualising our networks, try:
1. change the network type (knn, proximity, rng)
2. change the distance filters on the edges

Visualise these networks to get an intuition of what they're describing in terms of biological processes.


### Neighbourhoods: Synthetic Nolan example

Before we get started on tackling the practical, we'll quickly review and reproduce the Nolan pipeline from Schürch et al., who used this approach to understand immune spatial organisation in CRC (https://doi.org/10.1016/j.cell.2020.07.005). Here, we'll stick with synthetic data for now.

In [None]:
# Import necessary libraries
import muspan as ms
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load an example domain dataset
pc = ms.datasets.load_example_domain('Synthetic-Points-Architecture')

# Visualize the dataset, coloring by 'Celltype'
ms.visualise.visualise(pc, color_by='Celltype')

In MuSpAn, we leverage the flexibility of our `muspan.networks` submodule and the sklearn clustering packages (via `muspan.helpers.cluster_data`) to allow any combination of spatial relation between points and clustering method. Here, we will explore how we might use our `cluster_neighbourhoods` functionality, ranging from methods commonly used in the literature to our own recommendations. Critically, this tutorial should also serve as a demonstration of the sensitivity of resultant neighbourhoods to both the spatial network and the clustering method used. Before using this function, we also recommend reading our documentation on `cluster_neighbourhoods` to get an understanding of the associated parameters.

We can reproduce their implementation by using our `KNN` spatial network generation with 10 neighbours and selecting the `minibatchkmeans` clustering method with 4 clusters. Below, we apply this clustering procedure to our synthetic data using the 'Celltype' label.

In [12]:
# Perform neighbourhood clustering on the dataset using KNN and minibatchkmeans
neighbourhood_enrichment_matrix, consistent_global_labels, unique_cluster_labels = ms.networks.cluster_neighbourhoods(
    pc,  # The domain dataset
    label_name='Celltype',  # The label to use for clustering
    network_kwargs=dict(network_type='KNN', max_edge_distance=np.inf, min_edge_distance=0, number_of_nearest_neighbours=10),  # The network parameters
    k_hops=1,  # The number of hops to consider for the neighbourhood
    neighbourhood_label_name='Neighbourhood ID',  # Name for the neighbourhood label
    cluster_method='minibatchkmeans',  # Clustering method
    cluster_parameters={'n_clusters': 4},  # Parameters for the clustering method
    neighbourhood_enrichment_as='log-fold' # Neighbourhood enrichment as log-fold 
)
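Conceptually, neighbourhood clustering of this kind has two steps: summarise the labels found around each object, then cluster those summaries. The sketch below illustrates this on a toy network with NetworkX and scikit-learn's `MiniBatchKMeans`. It is not MuSpAn's implementation: the graph, the labels and the choice to exclude each node from its own neighbourhood are all assumptions made for illustration.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import MiniBatchKMeans

# Toy network: two fully connected groups of four nodes, one per hypothetical
# cell type, joined by a single edge between nodes 3 and 4
G = nx.Graph()
labels = {}
for offset, celltype in [(0, 'A'), (4, 'B')]:
    nodes = list(range(offset, offset + 4))
    G.add_edges_from((u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:])
    labels.update({n: celltype for n in nodes})
G.add_edge(3, 4)

celltypes = sorted(set(labels.values()))
node_order = sorted(G.nodes())

# Observation matrix: one row per node, one column per cell type, each entry
# the fraction of 1-hop neighbours carrying that label
obs = np.zeros((len(node_order), len(celltypes)))
for i, node in enumerate(node_order):
    for nb in G.neighbors(node):
        obs[i, celltypes.index(labels[nb])] += 1
    obs[i] /= obs[i].sum()

# Cluster the neighbourhood compositions
km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)
cluster_ids = km.fit_predict(obs)

print(obs)
print(cluster_ids)
```

Nodes 3 and 4 sit at the interface between the two groups, so their rows in the observation matrix are mixed rather than pure. With more clusters, such interface nodes are often the first to be split off into a neighbourhood of their own.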

The `cluster_neighbourhoods` function has three outputs, plus an optional fourth:
1. A differential expression matrix, where rows represent cluster IDs and columns are the labels used to cluster. Here this is presented as log-fold change to match the Schürch et al. pipeline, but our default is a z-score.
2. The labels used to cluster the neighbourhoods.
3. A list of the resultant cluster labels.
4. (Optional) The observation matrix of neighbourhood compositions used to cluster. This is not output here but can be; see our documentation on `cluster_neighbourhoods`.
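To give a feel for the two enrichment scales, the following sketch computes a log-fold change and a z-score from a toy observation matrix with NumPy. The formulas are one plausible reading (cluster mean composition compared against the global mean) and are assumptions; the exact definitions used by MuSpAn and by Schürch et al. may differ, for example in how zero counts are handled.

```python
import numpy as np

# Toy observation matrix: rows are objects, columns are two hypothetical cell types
obs = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.2, 0.8],
                [0.1, 0.9]])
cluster_ids = np.array([0, 0, 1, 1])

# Composition averaged over all objects, and its spread
global_mean = obs.mean(axis=0)
global_std = obs.std(axis=0)

# Composition averaged within each cluster (rows: clusters, columns: cell types)
clusters = np.unique(cluster_ids)
cluster_means = np.vstack([obs[cluster_ids == c].mean(axis=0) for c in clusters])

# Log-fold change: positive when a cell type is over-represented in a cluster.
# A cluster mean of exactly zero would give -inf here.
log_fold = np.log2(cluster_means / global_mean)

# Z-score: deviation from the global mean in units of the global standard deviation
z_score = (cluster_means - global_mean) / global_std

print(log_fold)
print(z_score)
```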

We can visualise the differential expression matrix using `ms.visualise.heatmap` or, if we want to show similarities of clusters, we can use the `sns.clustermap` function as shown below.

In [None]:
# Create a DataFrame from the neighbourhood enrichment matrix
df_ME_id = pd.DataFrame(data=neighbourhood_enrichment_matrix, index=unique_cluster_labels, columns=consistent_global_labels)
df_ME_id.index.name = 'Neighbourhood ID'
df_ME_id.columns.name = 'Celltype ID'

# Visualize the neighbourhood enrichment matrix using a clustermap
sns.clustermap(
    df_ME_id,
    xticklabels=consistent_global_labels,
    yticklabels=unique_cluster_labels,
    figsize=(7.5, 3.5),
    cmap='RdBu_r',
    dendrogram_ratio=(.05, .3),
    col_cluster=False,
    row_cluster=False,
    square=True,
    linewidths=0.5,
    linecolor='black',
    cbar_kws=dict(use_gridspec=False, location="top", label='Neighbourhood enrichment (log-fold)', ticks=[-2, 0, 2]),
    cbar_pos=(0.12, 0.75, 0.72, 0.08),
    vmin=-2,
    vmax=2,
    tree_kws={'linewidths': 0, 'color': 'white'}
)

We can see that this neighbourhood clustering method has found two very similar clusters (4 and 0), with the rest each picking out a dominant celltype. This makes sense as there are only 4 celltypes in our dataset. The `cluster_neighbourhoods` function automatically adds these cluster labels to the objects that were used in the computation, under the label name given by the `neighbourhood_label_name` parameter.

Using this label, let's see the result of the clustering spatially-resolved on our dataset.

In [None]:
# Visualize the dataset with the neighbourhood clustering results
ms.visualise.visualise(pc, color_by='Neighbourhood ID')


Now that we have an idea of how to identify neighbourhoods using MuSpAn, let's jump back to our practical exercise.

--- 

### Free play: Contact-based Neighbourhoods 


In complex tissues like the colon, cells do not act in isolation: their behavior and identity are strongly influenced by their immediate neighbours. Cellular neighbourhoods, which are spatially organized groups of interacting cells, can reveal key aspects of tissue organization, homeostasis, and disease progression. For example, in the colon, epithelial cells, immune cells, and stromal cells form structured microenvironments that regulate barrier integrity and immune surveillance.

Spatial transcriptomics technologies such as Xenium allow us to not only identify which transcripts are expressed in each cell, but also where these cells sit relative to each other. By reconstructing cell–cell contact networks, we can start to ask biologically meaningful questions such as:
- How do epithelial and immune cells organize into distinct microenvironments in healthy colon tissue?
- Are there characteristic “neighbourhoods” defined by contact-based interactions that may underlie functional tissue domains?
- Can we identify transcriptional signatures unique to specific spatial niches?


In this practical, we’ll explore these questions in the `Xenium-Healthy-Colon` dataset by:
1. Building a contact-based cell network from cell boundaries,
2. Identifying cellular neighbourhoods using clustering of the contact graph,
3. Converting spatially contiguous neighbourhoods into shape objects, and
4. Characterising the transcriptomic profiles of these spatial domains.

Together, this workflow provides a way to move from single-cell identities to tissue-level organization, uncovering the rules by which cells assemble into functional units.

TIP: to check if clustering parameters are correct in `cluster_neighbourhoods`, return the `observation_matrix` used to cluster the objects using the `return_observation_matrix=True` parameter. This matrix can be projected and clustered independently to check the quality of resultant classification. See https://docs.muspan.co.uk/latest/generated/muspan.networks.cluster_neighbourhoods.html#muspan.networks.cluster_neighbourhoods
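One way to act on this tip is sketched below with scikit-learn. A synthetic stand-in for the observation matrix is projected to two dimensions with PCA (for plotting) and clustered with several candidate cluster numbers, which are then compared using the silhouette score. The Dirichlet-sampled data, the range of `k` and the use of the silhouette score are illustrative assumptions rather than a MuSpAn recommendation; in practice the matrix would come from `cluster_neighbourhoods` with `return_observation_matrix=True`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-in observation matrix: two synthetic groups of neighbourhood
# compositions over three labels (each row sums to 1)
group_a = rng.dirichlet([20, 2, 2], size=50)
group_b = rng.dirichlet([2, 2, 20], size=50)
observation_matrix = np.vstack([group_a, group_b])

# 2D projection, e.g. for a scatter plot coloured by cluster label
projection = PCA(n_components=2).fit_transform(observation_matrix)

# Compare candidate cluster numbers: a higher silhouette score means
# tighter, better separated clusters
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(observation_matrix)
    scores[k] = silhouette_score(observation_matrix, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('Best number of clusters by silhouette score:', best_k)
```

If the score is low for every candidate, or changes little between them, that suggests the neighbourhood compositions do not separate cleanly, and the network or clustering parameters may need revisiting.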

