In [1]:
import os

In this tutorial, we run GHIST on an H&E image to generate single-cell spatial gene expression predictions. This requires a saved model checkpoint. We use a small demo dataset, a subset of the 10x Genomics Xenium Breast Cancer In Situ Sample 2 (https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast). The demo should complete within a few minutes.

## Download demo checkpoint

A previously saved checkpoint is provided for this demo. The cells below move up one directory and download it to ``experiments/demo``:

In [2]:
os.chdir("../")  # move up one directory so relative paths such as configs/ and experiments/ resolve

In [3]:
!gdown --folder https://drive.google.com/drive/folders/1ecTOXmSeQU9v8aKYniQQab2QqkOlIl8u?usp=drive_link

Retrieving folder contents
Retrieving folder 1HQW5oRYRf4yvFlR5ojdDaxSb0t0W2Ota demo
Retrieving folder 1PkhNZhzXQNEAHT59lgOC7qLIcchXR7cU models
Processing file 1t0eGI_QrZ2NOpauU0UiUwXJh3BzZ6B-o model.pth
Processing file 1AHi1Y5Z-4R8usQY9hb2tVlKjj4ddNJjo optim.pth
Processing file 1PmF6AqYKIre1tFJskDXPi2BkIIBV2-Yq genes.txt
Processing file 1rdqNu0fZCsXbi-PkwKspVQ24f3LrYiTo standardisation_hist_demo_predict.npy
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From (original): https://drive.google.com/uc?id=1t0eGI_QrZ2NOpauU0UiUwXJh3BzZ6B-o
From (redirected): https://drive.google.com/uc?id=1t0eGI_QrZ2NOpauU0UiUwXJh3BzZ6B-o&confirm=t&uuid=08e54649-6996-4cf4-86a3-8c1aa975db23
To: /dskh/nobackup/helenf/project_GHIST/_GITHUB_restructure_branch/experiments/demo/models/model.pth
100%|████████████████████████████████████████| 117M/117M [00:05<00:00, 23.2MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1AHi1Y
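The download log above is cut off, so it can be worth confirming that every file it lists actually arrived. The sketch below is a hypothetical helper (not part of GHIST) that searches ``experiments/demo`` recursively, so it does not depend on which subfolder each file landed in; the expected file names are copied from the log.

```python
from pathlib import Path
from typing import Iterable, List, Union

# File names copied from the gdown log above.
EXPECTED_FILES = (
    "model.pth",
    "optim.pth",
    "genes.txt",
    "standardisation_hist_demo_predict.npy",
)


def missing_checkpoint_files(
    root: Union[str, Path] = "experiments/demo",
    expected: Iterable[str] = EXPECTED_FILES,
) -> List[str]:
    """Return the expected file names that are not found anywhere under root."""
    root = Path(root)
    if not root.is_dir():
        # Nothing downloaded (or wrong working directory): everything is missing.
        return list(expected)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in expected if name not in found]
```

Because the default path is relative, run it after the ``os.chdir`` cell, e.g. ``print(missing_checkpoint_files() or "all demo files present")``.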

## Config file

Parameters are defined in a config file (``./configs/config_demo.json`` for this demo). Important parameters include:

- ``comps``: must match the settings used during training. The demo checkpoint was trained with all components.
- ``cell_types``: must also match the settings used during training.
- ``data_sources_predict``: locations of the data used for prediction. For your own data, make sure to update ``fp_hist`` and ``fp_nuc_seg``.
- ``regions_predict.divisions``: by default, the whole image is used for prediction.
- ``experiment_dirs.load_dir``: the experiment ID to load the checkpoint from (``demo`` for this demo, or ``latest`` to use the most recent experiment by timestamp).
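To point the demo config at your own data, one option is to copy it and change only the prediction inputs, leaving ``comps`` and ``cell_types`` exactly as they were at training time. This is a minimal sketch: the helper names are hypothetical, and it assumes ``fp_hist`` and ``fp_nuc_seg`` sit directly under ``data_sources_predict`` and ``load_dir`` under ``experiment_dirs``, as the dotted names above suggest. Check these against the actual layout of ``config_demo.json`` before relying on it.

```python
import json
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def get_dotted(cfg: dict, key: str) -> Any:
    """Return a nested value addressed by a dotted key, e.g. "experiment_dirs.load_dir"."""
    node = cfg
    for part in key.split("."):
        node = node[part]
    return node


def set_dotted(cfg: dict, key: str, value: Any) -> None:
    """Set a nested value addressed by a dotted key, creating missing parent dicts."""
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def make_predict_config(
    base_path: PathLike,
    out_path: PathLike,
    fp_hist: str,
    fp_nuc_seg: str,
    load_dir: str = "demo",
) -> dict:
    """Copy a config, swapping in new prediction inputs, and write it to out_path.

    comps and cell_types are deliberately left untouched: they must match training.
    ASSUMPTION: fp_hist / fp_nuc_seg live under data_sources_predict and
    load_dir lives under experiment_dirs.
    """
    cfg = json.loads(Path(base_path).read_text())
    set_dotted(cfg, "data_sources_predict.fp_hist", fp_hist)
    set_dotted(cfg, "data_sources_predict.fp_nuc_seg", fp_nuc_seg)
    set_dotted(cfg, "experiment_dirs.load_dir", load_dir)
    Path(out_path).write_text(json.dumps(cfg, indent=4))
    return cfg
```

For example, ``make_predict_config("configs/config_demo.json", "configs/config_mydata.json", fp_hist="path/to/my_he_image", fp_nuc_seg="path/to/my_nuclei_seg")`` (hypothetical output name and paths) writes a new config that can then be passed to ``--config_file``. Note that ``json.dumps`` rewrites the file's whitespace and key layout.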

## Get predictions

```sh
python inference.py --config_file configs/FILENAME.json --epoch EPOCH --mode predict --fold_id FOLD --gpu_id GPU_NUM
```

- ``--config_file``: path to the config file
- ``--epoch``: which epoch's model to use, e.g., ``10`` for the model from epoch 10, ``last`` for the most recent, or ``all`` for all epochs
- ``--mode``: ``predict`` to generate expression predictions, as in this demo
- ``--fold_id``: the cross-validation fold (1, 2, 3, ...) the model was trained on
- ``--gpu_id``: which GPU to use (0, 1, 2, ...)
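If you prefer to drive inference from Python, for instance to loop over several folds, the command can be assembled programmatically. This is a sketch only: ``build_inference_cmd`` and ``run_inference`` are hypothetical helpers, not part of GHIST, and they use only the flags documented above.

```python
import shlex
import subprocess
from typing import List, Union

# Non-numeric epoch values documented above.
SPECIAL_EPOCHS = {"last", "all"}


def build_inference_cmd(
    config_file: str,
    epoch: Union[int, str] = "last",
    fold_id: int = 1,
    gpu_id: int = 0,
    mode: str = "predict",
    python_exe: str = "python",
) -> List[str]:
    """Assemble the inference.py command line as an argument list."""
    if isinstance(epoch, str) and epoch not in SPECIAL_EPOCHS and not epoch.isdigit():
        raise ValueError(f"epoch must be an int, 'last' or 'all', got {epoch!r}")
    return [
        python_exe, "inference.py",
        "--config_file", config_file,
        "--epoch", str(epoch),
        "--mode", mode,
        "--fold_id", str(fold_id),
        "--gpu_id", str(gpu_id),
    ]


def run_inference(config_file: str, **kwargs) -> None:
    """Print the command, then run it; raises CalledProcessError if it fails."""
    cmd = build_inference_cmd(config_file, **kwargs)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

``run_inference("configs/config_demo.json")`` runs the same command as the cell below. Passing an argument list rather than a single string avoids shell quoting issues with paths that contain spaces.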

In [4]:
!python inference.py --config_file configs/config_demo.json --epoch last --mode predict --fold_id 1 --gpu_id 0

Using GPUs: 0
['B', 'Myeloid', 'Endothelial', 'Fibroblast', 'Macrophage', 'Malignant', 'Epithelial', 'Plasma', 'T']
Num cell types 9
280 genes
Avgexp shape  (63, 280)
Histology image (5120, 5120, 3), Nuclei (5120, 5120)
9517 cells
Patches min/max coords 0 5120
Getting valid patches
100%|████████████████████████████████████████| 529/529 [00:00<00:00, 752.08it/s]
Standardisation
Predict using experiments/demo/models/model.pth
100%|███████████████████████████████████████████| 67/67 [00:20<00:00,  3.34it/s]
Saved predicted expressions of 9517 cells to experiments/demo/predict_output//epoch_demo_predict_expr.csv


## Outputs

The predictions are saved to ``experiments/demo/predict_output/``. Each CSV file contains the predicted gene expression for every cell: the index is the cell ID, which matches the IDs in the nuclei segmentation image, and the columns are the genes. ``example_output.csv`` shows an example of the format.
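To inspect the predictions, the sketch below loads a prediction CSV with pandas and paints each nucleus of the segmentation image with its predicted expression of one gene. It assumes the segmentation is an integer label image with 0 as background and that the CSV index holds those same integer labels; the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def load_predictions(csv_path) -> pd.DataFrame:
    """Read a prediction CSV: rows are cell IDs, columns are genes."""
    df = pd.read_csv(csv_path, index_col=0)
    df.index = df.index.astype(int)  # cell IDs should match segmentation labels
    return df


def expression_map(pred: pd.DataFrame, nuc_seg: np.ndarray, gene: str) -> np.ndarray:
    """Return an image where every nucleus pixel holds that cell's predicted expression.

    Background (label 0) and any nucleus without a prediction are set to 0.
    """
    labels = nuc_seg.astype(np.int64, copy=False)
    ids = pred.index.to_numpy(dtype=np.int64)
    size = int(max(labels.max(), ids.max())) + 1
    lookup = np.zeros(size, dtype=np.float32)  # float32 keeps large images smaller
    lookup[ids] = pred[gene].to_numpy(dtype=np.float32)
    return lookup[labels]  # fancy indexing maps each label to its value
```

For the demo, ``load_predictions("experiments/demo/predict_output/epoch_demo_predict_expr.csv")`` should return one row per cell (9517 according to the log) and one column per gene. Load the nuclei segmentation with whatever reader matches its file format before calling ``expression_map``.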