In [None]:
import os
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns
# Set figure parameters
sc.set_figure_params(dpi=100, vector_friendly=True) 
# vector_friendly=True rasterizes large objects (such as the dots in a scatterplot) as pixels.
# More at https://scanpy.readthedocs.io/en/stable/generated/scanpy.set_figure_params.html

In [None]:
## read data
data = sc.read("/home/shared/spatial-workshop-GCB-2025/visium_v1_ustekinumab.h5ad")

ℹ️ AnnData objects are saved on disk as `.h5ad` files, which use the hierarchical HDF5 format. A brief introduction to what is usually stored and how it can be accessed is here: [AnnData](https://jupyterhub.ims.bio/user/robin/lab/tree/epyc/robin/GCB2025/spatial-workshop-GCB-2025/slides/anndata-brief.pdf).


In [None]:
## Normalize to sum to 10000 to make total RNA content comparable and then log-normalize data
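The arithmetic behind this step can be sketched without any dependencies. The toy matrix and the helper name `normalize_and_log` are made up for illustration. In scanpy, the same transformation is usually obtained with `sc.pp.normalize_total(data, target_sum=1e4)` followed by `sc.pp.log1p(data)`.

```python
import math

def normalize_and_log(counts, target_sum=1e4):
    """Scale each spot (row) to `target_sum` total counts, then apply log(1 + x)."""
    result = []
    for row in counts:
        total = sum(row)
        # Spots without any counts are left at zero to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(value * scale) for value in row])
    return result

# Toy matrix: 3 spots (rows) x 4 genes (columns); values are hypothetical.
toy_counts = [
    [10, 0, 5, 5],
    [100, 0, 50, 50],
    [0, 0, 0, 0],
]
log_norm = normalize_and_log(toy_counts)

# Spots 0 and 1 differ 10-fold in total counts but have the same composition,
# so their normalized profiles should coincide.
same_profile = log_norm[0] == log_norm[1]
```

Scaling first removes differences in total RNA content per spot; the log transform then compresses the dynamic range of highly expressed genes.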


In [None]:
## Select highly variable genes


In [None]:
## Compute principal components


In [None]:
## Produce batch effect corrected PC embeddings


In [None]:
## Compute UMAP


In [None]:
## Plot UMAP and color by batch



Several metrics for evaluating batch effect removal are available in [scib](https://scib.readthedocs.io/en/latest/). Many of them require ground-truth labels. Alternatively, cluster labels can serve as the true labels, with the batch key as the batch label. In that case, the assessment checks whether each cluster is a mixture of batches rather than being dominated by only some of them.
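As a rough illustration of the cluster-based assessment described above, the following sketch computes, for each cluster, the normalized Shannon entropy of its batch composition: 0 means the cluster consists of a single batch, 1 means all batches contribute equally. This is a simple stand-in and not one of the scib metrics; the function name and the toy labels are hypothetical. In the notebook, the two label lists would come from columns of `data.obs` (for example a Leiden column and a batch column such as `Slide_ID`).

```python
import math
from collections import Counter, defaultdict

def batch_mixing_entropy(cluster_labels, batch_labels):
    """Return {cluster: normalized entropy of its batch composition}."""
    n_batches = len(set(batch_labels))
    per_cluster = defaultdict(Counter)
    for cluster, batch in zip(cluster_labels, batch_labels):
        per_cluster[cluster][batch] += 1

    scores = {}
    for cluster, batch_counts in per_cluster.items():
        total = sum(batch_counts.values())
        # Shannon entropy written as sum of p * log(1 / p).
        entropy = sum(
            (n / total) * math.log(total / n) for n in batch_counts.values()
        )
        # Divide by the maximum possible entropy so that scores lie in [0, 1].
        scores[cluster] = entropy / math.log(n_batches) if n_batches > 1 else 0.0
    return scores

# Hypothetical labels for eight spots from two batches.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
batches = ["A", "B", "A", "B", "A", "A", "A", "A"]
mixing = batch_mixing_entropy(clusters, batches)
```

Here cluster "0" is evenly mixed while cluster "1" contains a single batch. Note that a low score is not necessarily a failure of batch correction: a cluster can be batch-specific for biological reasons.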


<u>Optional</u>: Perform clustering with sc.tl.leiden() and evaluate batch correction.

In [None]:
## Perform Leiden clustering with a resolution of 1


Other general-purpose clustering approaches (KMeans, agglomerative clustering, DBSCAN, etc.) are available from [scikit-learn](https://scikit-learn.org/stable/modules/clustering.html).

In [None]:
## show clusters on UMAP



<u>Questions</u>
1. Find optimal resolution for clustering
2. Use spatial information for clustering

## 1. Find optimal resolution for clustering

Annotations from the manuscript are available at /home/shared/spatial-workshop-GCB-2025/visium_v1_compartments_gex.csv
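One possible way to use these annotations (an assumption here, since the notebook does not prescribe a method) is to cluster at several resolutions and keep the resolution whose clusters agree best with the manuscript labels, for example as measured by the adjusted Rand index (ARI). The sketch below implements ARI with the standard library only and applies it to made-up labelings; the names `adjusted_rand_index` and `labels_by_resolution` are hypothetical. With scikit-learn installed, `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)

# Made-up reference annotation and clusterings at three resolutions.
annotation = ["A", "A", "B", "B", "C", "C"]
labels_by_resolution = {
    0.5: ["0", "0", "0", "0", "1", "1"],  # under-clustered
    1.0: ["0", "0", "1", "1", "2", "2"],  # matches the annotation
    2.0: ["0", "1", "2", "3", "4", "5"],  # over-clustered
}
ari = {
    res: adjusted_rand_index(annotation, labels)
    for res, labels in labels_by_resolution.items()
}
best_resolution = max(ari, key=ari.get)
```

In the notebook, each entry of `labels_by_resolution` would be the result of one `sc.tl.leiden` run, and the annotation would be read from the CSV file and aligned to the spots by barcode.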

<u>Optional</u>

1. A dictionary of kidney cell type marker genes is available here: /home/shared/spatial-workshop-GCB-2025/kidney_markers.json.  
Based on these lists, annotate the clusters above. Is it possible to annotate all of them cleanly?
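A minimal sketch of marker-based annotation, assuming the JSON file maps each cell type name to a list of marker genes (the actual structure of kidney_markers.json is not shown here, so this is an assumption). Each cluster is assigned the cell type whose markers have the highest mean expression, and the margin over the runner-up indicates how clean the assignment is. All gene names, values, and function names below are made up for illustration. In scanpy, per-spot marker scores are commonly computed with `sc.tl.score_genes`.

```python
import json

def annotate_clusters(cluster_means, markers):
    """Assign each cluster the cell type with the highest mean marker expression.

    cluster_means: {cluster: {gene: mean expression in that cluster}}
    markers:       {cell type: [marker genes]}
    Returns {cluster: (best cell type, margin over the second-best score)}.
    """
    annotation = {}
    for cluster, gene_means in cluster_means.items():
        scores = {}
        for cell_type, genes in markers.items():
            # Ignore marker genes that are missing from the expression data.
            present = [g for g in genes if g in gene_means]
            if present:
                scores[cell_type] = sum(gene_means[g] for g in present) / len(present)
        ranked = sorted(scores, key=scores.get, reverse=True)
        if not ranked:
            annotation[cluster] = (None, 0.0)
            continue
        margin = (
            scores[ranked[0]] - scores[ranked[1]] if len(ranked) > 1 else float("inf")
        )
        annotation[cluster] = (ranked[0], margin)
    return annotation

# Stand-in for json.load() on the marker file; the content is hypothetical.
markers = json.loads('{"type_A": ["GENE1", "GENE2"], "type_B": ["GENE3"]}')
cluster_means = {
    "0": {"GENE1": 2.0, "GENE2": 1.0, "GENE3": 0.25},
    "1": {"GENE1": 0.5, "GENE2": 0.75, "GENE3": 0.5},
}
annotation = annotate_clusters(cluster_means, markers)
```

In this toy example, cluster "0" is assigned with a large margin, whereas cluster "1" receives the same label with a small margin and would count as an ambiguous annotation.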

In [None]:
## save processed anndata object with clustering info

data.write("visium_v1_processed.h5ad")

## 2. Use spatial information for clustering

**2.1 SpatialLeiden**
 
An alternative to Leiden clustering is SpatialLeiden: [Paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03489-7), [Code](https://github.com/HiDiHlabs/SpatialLeiden), [Usage](https://spatialleiden.readthedocs.io/stable/usage.html). It incorporates spatial information at several steps of Leiden clustering. Does using it improve the clustering and the homogeneity of marker expression?

ℹ️ SpatialLeiden has two important hyperparameters:  
1. resolution: resolution for latent space (of gene expression) and spatial layer (coordinates)
2. layer_ratio: ratio of the weighting of the latent space and spatial layers. A higher ratio increases the relevance of the spatial neighbors and leads to more spatially homogeneous clusters

Also, since spot coordinates are used for clustering and each slide has its own frame of reference, the current implementation is not suitable for multi-slide datasets. The cell below therefore restricts the data to a single slide.

In [None]:
# Copy the subset so that later results are written to an independent object, not a view
sub = data[data.obs["Slide_ID"] == "V4_B"].copy()


<u>Tasks</u>

1. SpatialLeiden also provides a convenient way to select the "best" resolution and layer_ratio. However, this requires knowing the number of clusters. Run SpatialLeiden with n_clusters set to the number of clusters obtained with Leiden. Check [Usage](https://spatialleiden.readthedocs.io/stable/usage.html) for a script.

2. How does SpatialLeiden clustering compare with Leiden? To answer this, Leiden clustering must be redone on this subset of the data (object "sub"). Does SpatialLeiden lead to a more homogeneous distribution of marker expression? How does the spatial distribution of the clusters differ?

**2.2 Spatial domain identification developed specifically for spatial transcriptomics**

Several more algorithms developed specifically for spatial domain identification also work with Visium data. Continue with notebook number 3 (3-visium-domains-2.ipynb or 3-visium-domains-2_complete.ipynb).