The code for the paper "Comprehensive benchmarking of batch integration methods for spatial transcriptomics using a large-scale cancer atlas"
This benchmark uses data from MOSAIC Window, a 60-patient subset of the larger MOSAIC dataset featuring spatial omics data across four cancer indications: Bladder, Ovarian, Glioblastoma, and Mesothelioma. To access the data, researchers must request access through the European Genome-phenome Archive (EGA) portal. Once approved, data can be downloaded using the pyega3 download client. The required files are the Space Ranger count outputs for Visium data. Deconvolution outputs (cell type proportions for each spot) are already provided in data/deconvolution_outputs and are necessary to run the experiments. Additional preprocessing steps for each integration method are specified in the experiment configuration files.
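The deconvolution outputs are per-spot cell-type proportion tables. As a quick sanity check before running experiments, each spot's proportions should sum to one. The snippet below is an illustrative sketch only: the column names and file layout are assumptions, not the repository's actual schema in data/deconvolution_outputs.

```python
import csv
import io

# Hypothetical deconvolution output: one row per Visium spot, one column per
# cell type. Real files in data/deconvolution_outputs may use other columns.
raw = """spot_id,tumor,fibroblast,immune
AAACAAGTATCTCCCA-1,0.62,0.25,0.13
AAACAATCTACTAGCA-1,0.10,0.55,0.35
"""

reader = csv.DictReader(io.StringIO(raw))
proportions = {
    row["spot_id"]: {k: float(v) for k, v in row.items() if k != "spot_id"}
    for row in reader
}

# Sanity check: per-spot cell-type proportions should sum to ~1.
for spot, props in proportions.items():
    assert abs(sum(props.values()) - 1.0) < 1e-6, spot

print(len(proportions))  # number of spots parsed
```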
For detailed information on configuring data paths and environment variables, see src/st_benchmark/datasets/README.md.
```
mkdir st-benchmark-project && cd st-benchmark-project
git clone git@github.com:owkin/st-benchmark.git
cd st-benchmark
```

Note: This project uses a hybrid package management approach:
- UV manages Python packages and dependencies
- Conda/Mamba manages R packages and system dependencies
Note: This setup expects you to be using zsh.
```
make setup-environment
source ~/.zshrc
```

This will:
- Install Miniforge if not present
- Clean up old conda installations
- Create a mamba alias for easier package management
- Set path variables in your ~/.zshrc
```
conda create -n st-benchmark "python==3.11"
conda activate st-benchmark
make external-repos
make install-uv
make lock
make install-python
source ./.venv/bin/activate
make install-R
```

(`make install-R` will take a rather long time.)
Note: Make sure to activate both environments before running experiments:
```
conda activate st-benchmark
source ./.venv/bin/activate
```

or prepend your command with `uv run ...` to run with the locked Python environment.
Use uv to add a library to the project:

```
uv add numpy
```

Use uv to update the lock file:

```
uv lock
```

Experiments are run through the Hydra framework.
Make sure to `conda activate st-benchmark` and then `source ./.venv/bin/activate` before running your experiments.
An experiment call then looks like:

```
uv run python run.py experiment=st/integration_pca cohort=Bladder experiment_type=all_at_once batch_type=patient data=mosaic_window_dev
```

Experiment configs live in the experiment_configs folder as .yaml files. Experiments are highly configurable and can be specified through the command line.
- `experiment`: Determines which of 12 integration methods is used.
- `batch_type`: We investigate three types of batch effect: patient, center, and indication.
- `cohort`: Each batch effect type has different cohorts. For the patient batch effect, we have five cohorts: Glioblastoma, Lymphoma, Bladder, Mesothelioma, and Ovarian.
- `experiment_type`: Determines how train-test splitting is performed. `all_at_once` refers to all-at-once integration; `iid` and `ood` perform the split before integration, and the output metrics are averaged over 5-fold CV.
- `data`: You can use either the `full` dataset or a `dev` dataset with a small number of randomly subsampled spots.
You can also run sweeps with the following syntax:
```
uv run python run.py --multirun experiment=st/integration_pca batch_type=patient cohort=Bladder,Mesothelioma,Glioblastoma,Ovarian,Lymphoma experiment_type=all_at_once data=mosaic_window_full
```

Pre-configured experiment scripts that replicate the paper's results are available in the runners/ folder. For more details, refer to runners/README.md.
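Hydra's --multirun expands comma-separated values into one run per combination. A plain-Python sketch of that expansion (illustrative only, not Hydra's actual implementation):

```python
from itertools import product

# Sweep values from the command above: one run per cohort.
sweep = {
    "experiment": ["st/integration_pca"],
    "batch_type": ["patient"],
    "cohort": ["Bladder", "Mesothelioma", "Glioblastoma", "Ovarian", "Lymphoma"],
    "experiment_type": ["all_at_once"],
    "data": ["mosaic_window_full"],
}

# Cartesian product over all swept options, rendered as override strings.
runs = [
    [f"{key}={value}" for key, value in zip(sweep, combo)]
    for combo in product(*sweep.values())
]

print(len(runs))  # 5 runs, one per cohort
```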
Each such call to run.py will create a folder within st_benchmark/outputs with embedding plots and a metric .csv file, organized by date and time of the run.
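Because output folders are named by date and then time, the most recent run can be located with a plain lexicographic sort. A small sketch, using a temporary directory in place of st_benchmark/outputs:

```python
import tempfile
from pathlib import Path

# Stand-in for st_benchmark/outputs: date folders containing time folders.
outputs = Path(tempfile.mkdtemp())
for run in ["2025-09-24/15-33-14", "2025-09-24/09-02-51", "2025-09-23/18-45-00"]:
    (outputs / run).mkdir(parents=True)

# YYYY-MM-DD/HH-MM-SS names sort chronologically as plain strings,
# so the lexicographic maximum is the latest run.
latest = max(p for date_dir in outputs.iterdir() for p in date_dir.iterdir())
print(latest.relative_to(outputs).as_posix())  # 2025-09-24/15-33-14
```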
Many options for various sweeps are given in st_benchmark/runners, but feel free to make your own.
Experiments are composed using Hydra's config composition system. The main_config.yaml selects an experiment config via the defaults list:
```yaml
defaults:
  - experiment: st/integration_combat
  - _self_
```

Each experiment config (e.g., experiment/st/integration_pca.yaml) then composes multiple sub-configs:
```yaml
defaults:
  - /default@experiment.default: default  # Packages into 'experiment' namespace
  - /experiment_type: all_at_once         # Global config for experiment type
  - /data: mosaic_window_full             # Packages into 'data' namespace
  - /transform: st/pca                    # Packages into 'transform' namespace
  - _self_                                # Includes current config values
```

This allows modular composition: swap experiment_type, data, or transform configs independently to create new experiment variants.
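Conceptually, Hydra composes the defaults list top to bottom, with later entries (and `_self_`, the config's own values) overriding earlier ones. A minimal dict-merge sketch of that behavior, with toy stand-in configs rather than the repository's real ones (not Hydra's actual resolver):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; later values win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Toy stand-ins for the sub-configs named in the defaults list above.
experiment_type = {"experiment_type": "all_at_once"}
data = {"data": {"name": "mosaic_window_full", "subsample": None}}
transform = {"transform": {"name": "st/pca", "n_components": 50}}
self_values = {"data": {"subsample": 1000}}  # the config's own overrides

# Merge in defaults-list order; _self_ is applied last.
config = {}
for cfg in (experiment_type, data, transform, self_values):
    config = deep_merge(config, cfg)

print(config["data"])  # {'name': 'mosaic_window_full', 'subsample': 1000}
```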
To generate comparison plots and LaTeX tables from sweep experiments, you can run, for example:

```
uv run python figures/run_comparison.py --metrics_dir=outputs/2025-09-24/15-33-14
```

The resulting figures and LaTeX tables will be placed in st_benchmark/figures in a subdirectory with the same date and time.
Both scripts generate:
- Comparison plots showing performance gaps across methods and metrics
- Summary tables with statistical analysis
- LaTeX tables for publication
- CSV files with detailed results
The bash script figures/generate_figures.sh contains
