# Pipeline practical session: CODEX Colorectal Cancer Dataset

In this practical, we’ll be working with data from [**Schürch et al. (Cell, 2020)**](https://www.sciencedirect.com/science/article/pii/S0092867420308709#abs0020):  
*“Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front.”*  

This dataset was generated using **CODEX (CO-Detection by Indexing)**, a highly multiplexed imaging technology that enables spatially resolved, single-cell profiling of many protein markers within intact tissue sections.

---

#### 🧫 Data summary
- **35 patients** with colorectal cancer  
- **140 tissue microarray (TMA) cores**, sampled from regions near the **invasive front** of each tumour  
- Each row in the dataset corresponds to a **single cell**, annotated with:
  - **Spatial coordinates (X, Y)**
  - **Cell type** (from multiplexed immunofluorescence clustering)
  - **TMA core / domain ID**
  - **Patient group information**

---

#### 🧠 Biological context
The study identified two key immune microenvironment patterns:
- **Crohn’s-like reaction (CLR)** — characterised by organised immune cell aggregates and a coordinated anti-tumour response.  
- **Diffuse inflammatory infiltration (DII)** — showing a more scattered, less coordinated immune cell presence.  

Patients exhibiting **CLR** have **significantly better overall survival** compared to those with **DII**.

---

We’ll use this dataset to reproduce and extend aspects of the neighbourhood analysis from this study (from the Nolan lab), exploring how spatial organisation and cellular neighbourhoods can reveal functional structure within the tumour microenvironment.
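
To make the idea of a cellular neighbourhood concrete, here is a minimal NumPy sketch. It is not the study's pipeline or MuSpAn's implementation. It assumes a neighbourhood is summarised by the cell-type composition of each cell's `k` nearest neighbours; the toy coordinates, the integer-coded labels and the value `k = 5` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(50, 2))  # toy cell centres (x, y)
labels = rng.integers(0, 3, size=50)        # toy cell types coded 0, 1, 2
n_types = 3                                 # number of distinct cell types
k = 5                                       # neighbours per cell (assumed value)

# Pairwise Euclidean distances between all cells (fine for small toy data)
diffs = points[:, None, :] - points[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))
np.fill_diagonal(dists, np.inf)             # exclude each cell from its own neighbours

# Indices of the k nearest neighbours of every cell, shape (n_cells, k)
neighbour_idx = np.argsort(dists, axis=1)[:, :k]

# Fraction of each cell type among those neighbours, shape (n_cells, n_types)
composition = np.zeros((len(points), n_types))
for cell_type in range(n_types):
    composition[:, cell_type] = (labels[neighbour_idx] == cell_type).mean(axis=1)
```

Clustering these composition vectors (for example with k-means) would then assign each cell to a neighbourhood.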


#### Steps
1. **Load the TMA data** into a DataFrame.  
2. **Create domains** for each TMA — be sure to include a way to identify groups (for example, using the domain name).  
3. **Run neighbourhood clustering** across all samples.
4. **Identify neighbourhood adjacency** by converting neighbourhoods to shapes.
5. **Compare groups** in terms of neighbourhood composition in each sample and neighbourhood adjacency.
6. **Save the MuSpAn domains** (either as `.muspan` files or as a `.csv`). 
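
The group comparison step can be sketched with plain pandas once every cell has a neighbourhood label. This is only an illustration under assumptions: the `neighbourhood` column is hypothetical (it would come from the clustering step), and the toy values are invented.

```python
import pandas as pd

# Toy per-cell table: sample ID, patient group and assigned neighbourhood
cells = pd.DataFrame({
    "spots": ["1A"] * 4 + ["2B"] * 4,
    "groups": [1] * 4 + [2] * 4,
    "neighbourhood": [0, 0, 1, 2, 1, 1, 2, 2],  # hypothetical column name
})

# Fraction of cells in each neighbourhood, one row per (group, sample)
composition = pd.crosstab(
    [cells["groups"], cells["spots"]],
    cells["neighbourhood"],
    normalize="index",
)

# Mean neighbourhood composition across the samples of each group
group_means = composition.groupby(level="groups").mean()
print(group_means)
```

Keeping one row per sample before averaging means each core contributes equally, regardless of how many cells it contains.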


💡 *Don’t forget to update the boundary for each core if needed.* Think loops for multiple samples!
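
One way to structure that loop, sketched with pandas only. It assumes the column names described under Data Information; the toy values and the `TMA_<spot>_group_<group>` naming scheme are made up for illustration. The MuSpAn calls themselves are left as a comment, since their exact form should be taken from the MuSpAn documentation.

```python
import pandas as pd

# Toy cell table; in the practical this would be the filtered TMA DataFrame
cells = pd.DataFrame({
    "spots": ["1A", "1A", "1A", "2B", "2B"],
    "groups": [1, 1, 1, 2, 2],
    "ClusterName": ["tumor cells", "CD8+ T cells", "tumor cells",
                    "macrophages", "tumor cells"],
    "X:X": [10.0, 12.5, 11.0, 40.0, 42.0],
    "Y:Y": [5.0, 7.5, 9.0, 22.0, 25.0],
})

core_data = {}
for (spot, group), core in cells.groupby(["spots", "groups"]):
    # Encode the patient group in the name so it can be recovered later
    domain_name = f"TMA_{spot}_group_{group}"
    core_data[domain_name] = {
        "points": core[["X:X", "Y:Y"]].to_numpy(),    # (n_cells, 2) coordinates
        "cell_types": core["ClusterName"].to_list(),  # one label per cell
    }
    # A MuSpAn domain would be created here from the points and labels,
    # and its boundary updated if the default does not fit the core.

print(list(core_data))
```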

---

#### Data Information
- Each row represents a **single cell**.  
- **`spots`** – unique ID for each TMA core.  
- **`groups`** – patient group ID (`1` for CLR or `2` for DII).  
- **`ClusterName`** – cell type, as determined by clustering of immunofluorescence per cell.  
- **`X:X`** – x-coordinate of the cell.  
- **`Y:Y`** – y-coordinate of the cell.  

---

We’re providing the **full dataset** for this analysis so you can practise **extracting only the relevant information** required for spatial analysis.
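
A minimal sketch of that extraction step, assuming the column names listed under Data Information are present in the table. The toy DataFrame, its values and the `Extra marker` column are invented for illustration and stand in for the full dataset.

```python
import pandas as pd

# Toy stand-in for the full table (values are invented for illustration only)
toy_cells = pd.DataFrame({
    "spots": ["1A", "1A", "2B"],
    "groups": [1, 1, 2],
    "ClusterName": ["tumor cells", "CD8+ T cells", "macrophages"],
    "X:X": [10.0, 12.5, 40.0],
    "Y:Y": [5.0, 7.5, 22.0],
    "Extra marker": [0.1, 0.2, 0.3],  # hypothetical column we do not need
})

# Columns needed for spatial analysis, as described under Data Information
spatial_columns = ["spots", "groups", "ClusterName", "X:X", "Y:Y"]

# Keep only those columns and give the coordinates simpler names
cells = toy_cells[spatial_columns].rename(columns={"X:X": "x", "Y:Y": "y"})
print(cells)
```

Selecting columns by an explicit list raises a `KeyError` if any name is missing, which is a quick check that the loaded table has the expected layout.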


```python
import pandas as pd

# Grab the CSV file from the URL and load it into a pandas DataFrame
nolan_tma_dataframe = pd.read_csv('https://www.docs.muspan.co.uk/workshops/data_for_workshops/Adenoma_Immune.csv')
nolan_tma_dataframe
```

| Unnamed: 0.1 | Unnamed: 0 | Object Id | XMin | YMin | T Helper Cell | Treg Cell | Cytotoxic T Cell | Macrophage | Neutrophil | Epithelium | ... | CD68 (Opal 620) Cytoplasm Intensity | CD8 (Opal 650) Cytoplasm Intensity | ECad (Opal 690) Cytoplasm Intensity | Cell Area (µm²) | Cytoplasm Area (µm²) | Membrane Area (µm²) | Nucleus Area (µm²) | Nucleus Perimeter (µm) | Nucleus Roundness | Classifier Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 21358 | 20644 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.128205 | 0.160601 | 1.252720 | 66.11719 | 53.73569 | 0 | 12.38150 | 14.43110 | 0.703656 | epi |
| 1 | 1 | 1 | 21361 | 20659 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0.209170 | 0.193062 | 1.821321 | 133.22490 | 88.89915 | 0 | 44.32576 | 30.85271 | 0.636474 | epi |
| 2 | 2 | 2 | 21377 | 20678 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0.288331 | 0.288142 | 2.769074 | 85.67995 | 74.04135 | 0 | 11.63861 | 13.93348 | 0.838873 | epi |
| 3 | 3 | 3 | 21383 | 20709 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.141584 | 0.190175 | 1.341652 | 154.76870 | 89.14677 | 0 | 65.62193 | 38.31707 | 0.646902 | epi |
| 4 | 4 | 4 | 21355 | 20711 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0.526328 | 0.317425 | 2.210780 | 245.15360 | 139.66330 | 0 | 105.49030 | 52.25055 | 0.624307 | epi |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 274677 | 274677 | 274677 | 13855 | 32236 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 2.503274 | 0.615916 | 1.547021 | 74.78424 | 55.96436 | 0 | 18.81987 | 17.91447 | 0.787028 | stroma |
| 274678 | 274678 | 274678 | 13789 | 32238 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0.130234 | 0.250628 | 0.965695 | 77.50816 | 62.15511 | 0 | 15.35306 | 16.91923 | 0.645897 | stroma |
| 274679 | 274679 | 274679 | 14050 | 32239 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.126561 | 0.199811 | 0.704792 | 57.45014 | 44.82102 | 0 | 12.62913 | 15.92398 | 0.549334 | stroma |
| 274680 | 274680 | 274680 | 14065 | 32242 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0.146052 | 0.214028 | 0.862087 | 100.04250 | 82.46076 | 0 | 17.58172 | 16.42160 | 0.813634 | stroma |
