# Analysis of Spatial Transcriptomics Data Using VGP

This tutorial shows how to load, preprocess, and analyse a spatial transcriptomics dataset with VGP.

## Import packages
Here, we’ll import scbean along with other popular packages.

In [1]:
from scbean.model import vgp
import pandas as pd
import multiprocessing as mp
import warnings
warnings.filterwarnings('ignore')

## Loading dataset
This tutorial uses spatial transcriptome data of human breast cancer (layer 2), which contains 14,789 genes measured on 251 spots. It can be downloaded from [Spatial Transcriptomics Research](https://www.spatialresearch.org/resources-published-datasets/doi-10-1126science-aaf2403/).

In [2]:
filepath = 'Users/wyr/data/Layer2_BC_count_matrix-1.tsv'
data = pd.read_csv(filepath, sep='\t')

In [3]:
data

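| | Unnamed: 0 | GAPDH | USP4 | MAPKAPK2 | CPEB1 | LANCL2 | MCL1 | TMEM109 | TMEM189 | ITPK1 | ... | TREML1 | C12orf79 | ZCCHC12 | ZNF222 | TRIM17 | RNASEK | KSR2 | PCDHGB4 | ACOXL | CASQ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.907x4.967 | 1 | 1 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 18.965x5.003 | 7 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 18.954x5.995 | 5 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 17.846x5.993 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 20.016x6.019 | 2 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 246 | 23.094x23.975 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 247 | 24.981x23.964 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 248 | 21.874x24.852 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 249 | 23.096x24.93 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |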


Each row corresponds to one spot, whose two-dimensional position is stored in the `Unnamed: 0` column as `<x>x<y>`, and each remaining column holds the counts for one gene.

## Preprocessing
Here, we separate the location information from the gene expression counts so that the data match the input format the model expects. We also filter out genes and spots with low expression levels.

In [4]:
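# Split the '<x>x<y>' spot label into numeric x and y coordinates
location = pd.DataFrame(index=data.index)
location['x'] = data['Unnamed: 0'].str.split('x').str.get(0).map(float)
location['y'] = data['Unnamed: 0'].str.split('x').str.get(1).map(float)
data.drop('Unnamed: 0', axis=1, inplace=True)
# Filter practically unobserved genes (fewer than 10 counts over all spots)
data = data.T[data.sum(0) >= 10].T
# Transpose to genes x spots
data = data.T
# Filter spots with fewer than 10 counts in total, in both location and data
spot_mask = data.sum(0) >= 10
location = location[spot_mask]
data = data.T[spot_mask].T
# Model inputs: X holds the spot coordinates, Y the genes x spots count matrix
X = location.values
Y = data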
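
To make the shapes of the model inputs easier to follow, here is a minimal sketch of the same preprocessing steps on a toy table. The counts, gene names (`geneA`, `geneB`, `geneC`), and variable names (`toy`, `coords`, `counts`, `X_toy`, `Y_toy`) are made up for illustration; only the `Unnamed: 0` label column and the threshold of 10 come from the tutorial.

```python
import pandas as pd

# Toy count table in the same layout as the tutorial data (values are made up):
# one row per spot, a label column '<x>x<y>', then one column per gene.
toy = pd.DataFrame({
    'Unnamed: 0': ['1.0x2.0', '3.5x4.0', '5.0x6.5', '7.0x8.0'],
    'geneA': [8, 7, 0, 6],
    'geneB': [1, 0, 0, 0],
    'geneC': [4, 9, 2, 5],
})

# Split the spot label into numeric x / y coordinates.
coords = toy['Unnamed: 0'].str.split('x', expand=True).astype(float)
coords.columns = ['x', 'y']
counts = toy.drop(columns='Unnamed: 0')

# Keep genes with at least 10 counts in total (drops geneB).
counts = counts.loc[:, counts.sum(axis=0) >= 10]

# Keep spots with at least 10 counts in total (drops the third spot).
keep_spots = counts.sum(axis=1) >= 10
coords = coords[keep_spots]
counts = counts[keep_spots]

X_toy = coords.values  # spots x 2 coordinate array
Y_toy = counts.T       # genes x spots count matrix, as in the tutorial
print(X_toy.shape, Y_toy.shape)
```

The number of rows in `X_toy` equals the number of columns in `Y_toy`, because both are filtered with the same spot mask.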

## Identify spatially variable (SV) genes via VGP
The result is a DataFrame that holds a p-value and an adjusted p-value (q-value) for each gene.
Genes with a q-value below 0.05 are treated as spatially variable.

In [5]:
if __name__ == '__main__':
    result = vgp.run(X, Y, mp.cpu_count())

100%|████████████████████████████████████████████████████████████████████████████| 9907/9907 [4:25:00<00:00,  1.60s/it]
  0%|                                                                                 | 1/9907 [00:00<21:47,  7.57it/s]

Results....................


100%|████████████████████████████████████████████████████████████████████████████| 9907/9907 [00:06<00:00, 1631.04it/s]


In [6]:
result

| | gene | p_value | q_value |
|---|---|---|---|
| 0 | GAPDH | 0.0 | 0.0 |
| 1 | USP4 | 0.549338 | 0.549543 |
| 2 | MAPKAPK2 | 0.000159 | 0.00016 |
| 3 | CPEB1 | 0.609691 | 0.610036 |
| 4 | LANCL2 | 0.890992 | 0.89118 |
| ... | ... | ... | ... |
| 9902 | AC245100.1 | 0.257729 | 0.257815 |
| 9903 | HSD17B6 | 0.576356 | 0.576387 |
| 9904 | ARNTL | 0.583607 | 0.583698 |
| 9905 | PAQR8 | 0.000013 | 0.000013 |
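
The q-value cutoff can be applied with an ordinary pandas filter. The sketch below rebuilds a few rows of the result table shown above so that it runs on its own. The names `example`, `Q_THRESHOLD`, and `sv_genes` are illustrative, and it assumes the real `result` exposes the same `gene` and `q_value` columns as that output.

```python
import pandas as pd

# A few rows copied from the result table above, so the sketch is self-contained.
example = pd.DataFrame({
    'gene': ['GAPDH', 'USP4', 'MAPKAPK2', 'CPEB1', 'LANCL2', 'PAQR8'],
    'p_value': [0.0, 0.549338, 0.000159, 0.609691, 0.890992, 0.000013],
    'q_value': [0.0, 0.549543, 0.00016, 0.610036, 0.89118, 0.000013],
})

Q_THRESHOLD = 0.05  # cutoff stated in this tutorial

# Keep genes below the cutoff and list the most significant ones first.
sv_genes = (
    example[example['q_value'] < Q_THRESHOLD]
    .sort_values('q_value')
    .reset_index(drop=True)
)
print(sv_genes['gene'].tolist())
```

On the full output, the same filter would be applied to `result` instead of `example`.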
