# Pipeline practical session: CODEX Colorectal Cancer Dataset

In this practical, we’ll be working with data from **Schürch et al. (Cell, 2020)**, https://www.sciencedirect.com/science/article/pii/S0092867420308709#abs0020:  
*“Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front.”*  

This dataset was generated using **CODEX (CO-Detection by Indexing)**, a highly multiplexed imaging technology that enables spatially resolved, single-cell profiling of many protein markers within intact tissue sections.

---

#### 🧫 Data summary
- **35 patients** with colorectal cancer  
- **140 tissue microarray (TMA) cores**, sampled from regions near the **invasive front** of each tumour  
- Each row in the dataset corresponds to a **single cell**, annotated with:
  - **Spatial coordinates (X, Y)**
  - **Cell type** (from multiplexed immunofluorescence clustering)
  - **TMA core / domain ID**
  - **Patient group information**

---

#### 🧠 Biological context
The study identified two key immune microenvironment patterns:
- **Crohn’s-like reaction (CLR)** — characterised by organised immune cell aggregates and a coordinated anti-tumour response.  
- **Diffuse inflammatory infiltration (DII)** — showing a more scattered, less coordinated immune cell presence.  

Patients exhibiting **CLR** have **significantly better overall survival** compared to those with **DII**.

---

We’ll use this dataset to reproduce and extend aspects of the Nolan neighbourhood analysis, exploring how spatial organisation and cellular neighbourhoods can reveal functional structure within the tumour microenvironment.
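
As a rough illustration of the idea behind a neighbourhood analysis, each cell can be described by the cell-type composition of its nearest neighbours, and those composition vectors are what later get clustered into neighbourhoods. The sketch below is a NumPy-only toy, not the MuSpAn implementation or the paper's code: the function name `knn_composition`, the window size `k`, the choice to include the cell itself in its window, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np

def knn_composition(points, labels, k=10):
    """Count cell types among each cell's k nearest cells (the cell itself included)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Map each label to an integer code; `types` holds the sorted unique labels
    types, codes = np.unique(labels, return_inverse=True)
    # Brute-force pairwise squared distances: fine for a single core,
    # too memory-hungry for the whole dataset at once
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Indices of the k closest cells for every cell
    nearest = np.argsort(d2, axis=1)[:, :k]
    # One-hot encode the labels, then sum over each window
    one_hot = np.eye(len(types), dtype=int)[codes]
    counts = one_hot[nearest].sum(axis=1)
    return types, counts

# Toy example: two well-separated groups of three cells (made-up data)
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels = ["T", "T", "T", "B", "B", "B"]
types, counts = knn_composition(toy_points, toy_labels, k=3)
print(types)
print(counts)
```

Each row of `counts` is one cell's window composition. Clustering these rows (for example with k-means) is what would assign cells to neighbourhoods. In the practical itself, this step is handled by the neighbourhood clustering tools rather than by hand.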


#### Steps
1. **Load the TMA data** into a DataFrame.  
2. **Create domains** for each TMA — be sure to include a way to identify groups (for example, using the domain name).  
3. **Run neighbourhood clustering** across all samples.
4. **Identify neighbourhood adjacency** by converting neighbourhoods to shapes.
5. **Compare groups** in terms of neighbourhood composition in each sample and neighbourhood adjacency.
6. **Save the MuSpAn domains** (either as `.muspan` files or as a `.csv`). 
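
For the group comparison step, one possible starting point is a per-core composition table: the fraction of each core's cells that falls in each neighbourhood, averaged within each patient group. The sketch below uses plain pandas on a made-up toy table. The `neighborhood name` column appears in the loaded dataset, but you could equally substitute labels from your own clustering. The toy values and variable names are assumptions.

```python
import pandas as pd

# Toy stand-in for the real table; all values are made up
cells = pd.DataFrame({
    "spots": ["A"] * 4 + ["B"] * 4,
    "groups": [1] * 4 + [2] * 4,
    "neighborhood name": [
        "Follicle", "Follicle", "Tumor boundary", "T cell enriched",
        "Tumor boundary", "Tumor boundary", "T cell enriched", "T cell enriched",
    ],
})

# Fraction of each core's cells in each neighbourhood (each row sums to 1)
composition = pd.crosstab(
    cells["spots"], cells["neighborhood name"], normalize="index"
)

# One group label per core, aligned with the rows of `composition`
core_group = cells.groupby("spots")["groups"].first()

# Mean composition across cores within each patient group
mean_by_group = composition.groupby(core_group).mean()
print(mean_by_group)
```

Normalising per core before averaging keeps large cores from dominating the group means. A formal comparison would then test each neighbourhood's per-core fractions between the two groups.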


💡 *Don’t forget to update the boundary for each core if needed.* Think loops for multiple samples!
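
The loop hinted at above might look something like the following sketch, which splits the table by core and collects what each domain would need. It deliberately stops short of any MuSpAn calls, so the domain creation and boundary update are left as a placeholder. The toy values, the spot IDs, the `per_core` dictionary, and the naming scheme are all assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the real table; the values (including the spot IDs) are made up
cells = pd.DataFrame({
    "spots": ["s1", "s1", "s1", "s2", "s2"],
    "groups": [1, 1, 1, 2, 2],
    "ClusterName": ["tumor cells", "CD8+ T cells", "tumor cells", "B cells", "tumor cells"],
    "X:X": [10.0, 12.5, 40.0, 5.0, 7.5],
    "Y:Y": [3.0, 8.0, 21.0, 6.0, 9.0],
})

group_names = {1: "CLR", 2: "DII"}
per_core = {}

for spot_id, core in cells.groupby("spots"):
    # Every cell in a core should carry the same patient group
    assert core["groups"].nunique() == 1, f"core {spot_id} has mixed groups"
    group = group_names[int(core["groups"].iloc[0])]

    points = core[["X:X", "Y:Y"]].to_numpy(dtype=float)
    # Encode the group in the name so it can be recovered later
    per_core[f"{group}_core_{spot_id}"] = {
        "points": points,
        "labels": core["ClusterName"].to_numpy(),
        # Axis-aligned extent of this core's cells: (min_x, min_y, max_x, max_y)
        "extent": (*points.min(axis=0), *points.max(axis=0)),
    }

for name, core in per_core.items():
    print(name, core["points"].shape, core["extent"])
```

Inside the loop is where each MuSpAn domain would be created, populated with points and labels, and given an updated boundary. Putting the group in the domain name is one simple way to keep track of it.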

---

#### Data Information
- Each row represents a **single cell**.  
- **`spots`** – unique ID for each TMA core.  
- **`groups`** – patient group ID (`1` for CLR or `2` for DII).  
- **`ClusterName`** – cell type, as determined by clustering of immunofluorescence per cell.  
- **`X:X`** – x-coordinate of the cell.  
- **`Y:Y`** – y-coordinate of the cell.  

---

We’re providing the **full dataset** for this analysis so you can practise **extracting only the relevant information** required for spatial analysis.
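
As a minimal sketch of that extraction, the snippet below keeps only the five columns described above and gives them friendlier names. It runs on a small made-up table rather than the real file. The toy values, the spot IDs, the renamed columns `x` and `y`, and the extra `group_name` column are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the full table, with one extra marker column to discard
raw = pd.DataFrame({
    "spots": ["s1", "s1", "s2"],
    "groups": [1, 1, 2],
    "ClusterName": ["tumor cells", "B cells", "tumor cells"],
    "X:X": [10.0, 12.5, 5.0],
    "Y:Y": [3.0, 8.0, 6.0],
    "CD8+PD-1+": [0, 0, 0],
})

# Columns needed for spatial analysis, as listed in the data description
keep = ["spots", "groups", "ClusterName", "X:X", "Y:Y"]

cells = (
    raw[keep]
    .rename(columns={"X:X": "x", "Y:Y": "y"})  # simpler coordinate names
    .dropna(subset=["x", "y"])                 # drop cells without a position
)

# Human-readable group label: 1 is CLR, 2 is DII
cells["group_name"] = cells["groups"].map({1: "CLR", 2: "DII"})
print(cells)
```

Selecting columns by an explicit list makes it obvious if an expected column is missing, since pandas raises a `KeyError` rather than silently continuing.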


```python
import pandas as pd

# Grab the CSV file from the URL and load it into a pandas DataFrame
nolan_tma_dataframe = pd.read_csv("https://docs.muspan.co.uk/workshops/data_for_workshops/Nolan_2020_data%201.csv")
nolan_tma_dataframe
```

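| Unnamed: 0.1 | Unnamed: 0 | CellID | ClusterID | EventID | File Name | Region | TMA_AB | TMA_12 | Index in File | groups | ... | CD68+Ki67+ | CD68+PD-1+ | CD8+ICOS+ | CD8+Ki67+ | CD8+PD-1+ | Treg-ICOS+ | Treg-Ki67+ | Treg-PD-1+ | neighborhood number final | neighborhood name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 10668 | 0 | reg001_A | reg001 | A | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.0 | Granulocyte enriched |
| 1 | 1 | 1 | 10668 | 4 | reg001_A | reg001 | A | 1 | 4 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.0 | Macrophage enriched |
| 2 | 2 | 2 | 10668 | 5 | reg001_A | reg001 | A | 1 | 5 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 | Immune-infiltrated stroma |
| 3 | 3 | 3 | 10668 | 6 | reg001_A | reg001 | A | 1 | 6 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 | Immune-infiltrated stroma |
| 4 | 4 | 4 | 10668 | 30 | reg001_A | reg001 | A | 1 | 30 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.0 | Macrophage enriched |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 258380 | 258380 | 258380 | 10664 | 216709 | reg057_B | reg057 | B | 2 | 1002 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.0 | Follicle |
| 258381 | 258381 | 258381 | 10664 | 222124 | reg059_A | reg059 | A | 1 | 1272 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0 | Immune-infiltrated stroma |
| 258382 | 258382 | 258382 | 10664 | 234850 | reg062_A | reg062 | A | 1 | 735 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 | T cell enriched |
| 258383 | 258383 | 258383 | 10664 | 249806 | reg067_A | reg067 | A | 1 | 174 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.0 | Tumor boundary |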
