# SpaGFT Tutorial

## Outline
1. Installation
2. Import packages
3. Loading Visium data
4. QC and preprocessing
5. Find Spatially variable genes
6. Detect tissue module
7. Imputation

### 1. Installation
SpaGFT is a Python package for analyzing spatial transcriptomics data. Installing SpaGFT requires Python >= 3.7. You can check your Python version with:

In [None]:
import platform
platform.python_version()
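To fail early rather than compare the version by eye, the check can be made programmatic. This is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of SpaGFT.

```python
import sys

MIN_VERSION = (3, 7)  # minimum Python version stated above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    raise RuntimeError("SpaGFT requires Python >= 3.7")
```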

We recommend creating a virtual environment for running SpaGFT. This is easy to do with conda:
```shell
conda create -n spagft python==3.8.0
conda activate spagft
```
If you want to exit this virtual environment, just run:
```shell
conda deactivate
```
Next, install SpaGFT:
```shell
git clone url
cd dir
pip install -r requirements.txt
python3 setup.py build
python3 setup.py install
```
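After installation, it can be useful to confirm that the package is visible to the active environment before starting the analysis. The sketch below relies only on the standard library; `is_installed` is a hypothetical helper, and the module name `SpaGFT` is taken from the import statement used later in this tutorial.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


print("SpaGFT importable:", is_installed("SpaGFT"))
```

If this prints `False`, the most likely cause is that the `spagft` environment is not activated.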

### 2. Import packages

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scanpy as sc
import SpaGFT as spg
import seaborn as sns

sc.settings.verbosity = 3      
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor='white')

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.21.5 scipy==1.8.0 pandas==1.4.2 scikit-learn==1.0.2 statsmodels==0.13.2 python-igraph==0.9.10 louvain==0.7.1 pynndescent==0.5.6


Define where to save the results:

In [4]:
results_folder = './results/lymph_nodes_analysis/'
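Assigning the path does not create the folder on disk. A minimal sketch, assuming you want the folder to exist before any result is written to it:

```python
from pathlib import Path

results_folder = './results/lymph_nodes_analysis/'

# Create the folder (and any missing parents); no error if it already exists.
Path(results_folder).mkdir(parents=True, exist_ok=True)
```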

### 3. Loading Visium data

In this tutorial, we use the lymph node Visium dataset generated by 10X, which can be downloaded easily for analysis. For your own datasets, just load the data into an _AnnData_ object. Note that two parts are essential: the count matrix and the spatial information, which must be available in _adata.X_ and _adata.obs_ (or _adata.obsm_), respectively.

In [13]:
adata = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
adata.var_names_make_unique()
adata.raw = adata

reading /users/PCON0022/jxliu/scripts/python/SpaGFT/SpaGFT/source/spatial/data/V1_Human_Lymph_Node/filtered_feature_bc_matrix.h5
 (0:00:01)


### 4. QC and preprocessing

We first filter out genes that are expressed in fewer than 10 spots, then normalize the Visium count data with the built-in normalize_total method from Scanpy and log-transform the result.

In [14]:
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)

filtered out 16788 genes that are detected in less than 10 cells
normalizing counts per cell
    finished (0:00:00)
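To make the three preprocessing steps concrete, here is a NumPy-only sketch of roughly what they compute on a tiny matrix. It assumes the default behaviour of `normalize_total` (scaling every spot to the median of the per-spot totals) and is an illustration, not a replacement for the Scanpy calls. The toy counts and the threshold of 2 spots are hypothetical.

```python
import numpy as np

# Hypothetical toy count matrix: 4 spots (rows) x 3 genes (columns).
counts = np.array([[4, 0, 0],
                   [2, 2, 0],
                   [0, 6, 2],
                   [1, 1, 0]], dtype=float)

# Step 1: keep genes detected (count > 0) in at least `min_spots` spots.
min_spots = 2
detected_in = (counts > 0).sum(axis=0)
kept = counts[:, detected_in >= min_spots]

# Step 2: scale every spot so its total equals the median total count.
totals = kept.sum(axis=1)
normalized = kept / totals[:, None] * np.median(totals)

# Step 3: apply log(1 + x) to compress the dynamic range.
logged = np.log1p(normalized)
```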


### 5. Find Spatially variable genes

In [16]:
gene_df = spg.rank_gene_smooth(adata,
                                ratio_low_freq=0.5,
                                ratio_high_freq=3,
                                ratio_neighbors=1,
                                filter_peaks=True,
                                spatial_info=['array_row', 'array_col'],
                                normalize_lap=False)

Graph Fourier Transform finished!
SVG ranking could be found in adata.obs['Rank']
The spatially variable genes judged by gft_score could be found in adata.obs['cutoff_gft_score']
Gene signals in frequency domain could be found in adata.varm['freq_domain']
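The idea behind a graph Fourier transform can be illustrated without SpaGFT. The sketch below is not SpaGFT's implementation: it builds a simple grid graph, uses the eigenvectors of the graph Laplacian as Fourier modes, and measures how much of a gene's signal energy falls into the lowest frequencies. The grid layout, the neighbour rule, the helper `low_freq_energy_fraction`, and the choice of five modes are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy layout: a 5 x 5 grid of spots.
rows, cols = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
n_spots = coords.shape[0]

# Adjacency: connect spots that are direct grid neighbours (distance 1).
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
adjacency = np.isclose(dist, 1.0).astype(float)

# Combinatorial Laplacian L = D - A; eigenvalues act as graph frequencies.
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
eigvals, eigvecs = np.linalg.eigh(laplacian)  # sorted in ascending order

# Two toy genes: a smooth gradient across rows and pure noise.
rng = np.random.default_rng(0)
smooth_gene = coords[:, 0] / coords[:, 0].max()
noisy_gene = rng.normal(size=n_spots)


def low_freq_energy_fraction(signal, n_low=5):
    """Share of mean-removed signal energy in the n_low lowest nonzero frequencies."""
    spectrum = eigvecs.T @ (signal - signal.mean())  # graph Fourier transform
    energy = spectrum ** 2
    return energy[1:n_low + 1].sum() / energy.sum()


print("smooth gene:", low_freq_energy_fraction(smooth_gene))
print("noisy gene: ", low_freq_energy_fraction(noisy_gene))
```

A spatially structured gene concentrates its energy in the low-frequency modes, while noise spreads it across the whole spectrum. This contrast is the intuition behind ranking genes by a frequency-domain score.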


In [17]:
gene_df

| | gene_ids | feature_types | genome | n_cells | gft_score | svg_rank | cutoff_gft_score | pvalue | qvalue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MT-CO2 | ENSG00000198712 | Gene Expression | GRCh38 | 4030 | 3.135962 | 1 | True | 1.978833e-12 | 9.751678e-10 |
| EEF1A1 | ENSG00000156508 | Gene Expression | GRCh38 | 4031 | 3.092944 | 2 | True | 1.615654e-12 | 8.215622e-10 |
| TMSB4X | ENSG00000205542 | Gene Expression | GRCh38 | 4033 | 3.081130 | 3 | True | 5.497518e-11 | 1.584114e-08 |
| MT-CO1 | ENSG00000198804 | Gene Expression | GRCh38 | 4024 | 3.014566 | 4 | True | 1.849670e-12 | 9.180583e-10 |
| ACTG1 | ENSG00000184009 | Gene Expression | GRCh38 | 4031 | 2.987519 | 5 | True | 6.712261e-09 | 9.597403e-07 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Z69720.1 | ENSG00000269482 | Gene Expression | GRCh38 | 22 | 0.389564 | 19809 | False | 1.334593e-02 | 2.798807e-01 |
| CYP1B1-AS1 | ENSG00000232973 | Gene Expression | GRCh38 | 42 | 0.378407 | 19810 | False | 7.306433e-01 | 1.000000e+00 |
| LYRM1 | ENSG00000102897 | Gene Expression | GRCh38 | 1720 | 0.375874 | 19811 | False | 6.543386e-02 | 1.000000e+00 |
| MYO15A | ENSG00000091536 | Gene Expression | GRCh38 | 49 | 0.369546 | 19812 | False | 1.737962e-02 | 3.496291e-01 |


In [19]:
# Keep genes that pass the gft_score cutoff and have a q-value below 0.05.
svg_list = gene_df[gene_df.cutoff_gft_score & (gene_df.qvalue < 0.05)].index.tolist()
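The two filters can be combined into a single boolean mask, which avoids chained indexing. The sketch below demonstrates this on a toy frame whose three rows reuse values from the ranking table above; `toy_df` and `toy_svgs` are hypothetical names.

```python
import pandas as pd

# Three rows with values copied from the ranking table above.
toy_df = pd.DataFrame(
    {
        "cutoff_gft_score": [True, True, False],
        "qvalue": [9.751678e-10, 9.597403e-07, 1.0],
    },
    index=["MT-CO2", "ACTG1", "LYRM1"],
)

# Combine both conditions into one boolean mask with `&`.
mask = toy_df["cutoff_gft_score"] & (toy_df["qvalue"] < 0.05)
toy_svgs = toy_df.index[mask].tolist()
```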

In [26]:
svg_list

['MT-CO2',
 'EEF1A1',
 'TMSB4X',
 'MT-CO1',
 'ACTG1',
 'IFI44L',
 'PKM',
 'ISG15',
 'LBP',
 'GAPDH',
 'RPL32',
 'CXCL12',
 'IFI6',
 'NR4A1',
 'MX1',
 'OASL',
 'FBLN1',
 'SFRP2',
 'IFIT1',
 'AL627171.2',
 'MT-CO3',
 'MT-ATP6',
 'C7',
 'RPS11',
 'CCL2',
 'RPS2',
 'SLC40A1',
 'MT-ND2',
 'IFI44',
 'HSP90AB1',
 'HSPE1',
 'RPS19',
 'RPL31',
 'RPL13A',
 'MT-CYB',
 'RPS18',
 'ACTB',
 'RPS20',
 'TPT1',
 'SAMD9L',
 'MARCO',
 'RPL37',
 'AL357556.4',
 'THBS1',
 'XAF1',
 'RPL13',
 'RPLP2',
 'STAT1',
 'MT-ND1',
 'MT-ND4',
 'CD209',
 'OAS1',
 'IGHM',
 'USP18',
 'JUNB',
 'LTF',
 'RPL7',
 'CCL14',
 'DIO2',
 'RPL27A',
 'TIMP3',
 'MT-ND3',
 'RPS29',
 'HSPA8',
 'TMSB10',
 'RPL38',
 'HP',
 'RPL39',
 'ISM1',
 'PSME2',
 'MGP',
 'MT-ND5',
 'OAS2',
 'RPS26',
 'MALAT1',
 'AC092053.2',
 'IGHG4',
 'EIF2AK2',
 'KPNB1',
 'RNASE1',
 'EIF4A1',
 'RPS6',
 'IGHG2',
 'RPL37A',
 'COX5A',
 'CMPK2',
 'IFIT3',
 'NPM1',
 'ADAMTS4',
 'RSAD2',
 'OVCH1-AS1',
 'OAS3',
 'AC092902.4',
 'PPA1',
 'RPL23',
 'UBA52',
 'RPS25',
 'RPS4X'

In [None]:
sc.pl.spatial(adata, color=svg_list[:5])

### 6. Detect tissue module

In [22]:
spg.gft.find_tissue_module(adata)

Gene signals in frequency domain could be found in  adata.varm['freq_domain']
computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:17)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:20)
computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 9 clusters and added
    'louvain', the cluster labels (adata.obs, categorical) (0:00:00)
computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (

In [23]:
adata

AnnData object with n_obs × n_vars = 4035 × 19813
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'gft_score', 'svg_rank', 'cutoff_gft_score', 'pvalue', 'qvalue', 'tm_genes'
    uns: 'spatial', 'log1p', 'frequencies_svg', 'fms_low', 'fms_high'
    obsm: 'spatial', 'tm_expression', 'tm_region', 'subTm_expression', 'subTm_region'
    varm: 'freq_domain_svg', 'gft_umap'

In [25]:
tm_expression_df = adata.obsm['tm_expression']
tm_region_df = adata.obsm['tm_region']

In [28]:
tm_expression_df

| | tm_0 | tm_1 | tm_2 | tm_3 | tm_4 | tm_5 | tm_6 | tm_7 | tm_8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AAACAAGTATCTCCCA-1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| AAACAATCTACTAGCA-1 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| AAACACCAATAACTGC-1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| AAACAGAGCGACTCCT-1 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| AAACAGCTTTCAGAAG-1 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TTGTTTCACATCCAGG-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| TTGTTTCATTAGTCTA-1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| TTGTTTCCATACAACT-1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| TTGTTTGTATTACACG-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
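A table like this can be summarised with ordinary pandas operations. The sketch below rebuilds the first three rows shown above (restricted to `tm_0` to `tm_3`) and assumes that a value of 1.0 marks a spot as belonging to a tissue module; that interpretation, and the names `toy_tm`, `spots_per_module`, and `tm_2_spots`, are assumptions for illustration.

```python
import pandas as pd

# First three rows of the table above, restricted to tm_0..tm_3.
toy_tm = pd.DataFrame(
    [[0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]],
    index=["AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1", "AAACACCAATAACTGC-1"],
    columns=["tm_0", "tm_1", "tm_2", "tm_3"],
)

# Number of spots flagged for each tissue module.
spots_per_module = toy_tm.sum(axis=0).astype(int)

# Barcodes of the spots flagged for one module.
tm_2_spots = toy_tm.index[toy_tm["tm_2"] == 1].tolist()
```

The same two lines apply unchanged to the full `adata.obsm['tm_expression']` frame.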


### 7. Imputation

In [29]:
new_count = spg.low_pass_imputation(adata)

In [30]:
new_count.iloc[:50, :10]

| | AL627309.1 | AL627309.5 | LINC01409 | LINC01128 | LINC00115 | FAM41C | AL645608.2 | LINC02593 | SAMD11 | NOC2L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AAACAAGTATCTCCCA-1 | 0.036549 | 0.0 | 0.131275 | 0.15092 | 0.158153 | 0.0 | 0.0 | 0.0 | 0.0 | 0.941929 |
| AAACAATCTACTAGCA-1 | 0.004474 | 0.0 | 0.091009 | 0.346172 | 0.100323 | 0.0 | 0.0 | 0.0 | 0.013964 | 0.495774 |
| AAACACCAATAACTGC-1 | 0.067605 | 0.018366 | 0.088792 | 0.25585 | 0.037115 | 0.097764 | 0.0 | 0.108837 | 0.354739 | 0.602693 |
| AAACAGAGCGACTCCT-1 | 0.029849 | 0.0 | 0.038388 | 0.207653 | 0.130876 | 0.0 | 0.024258 | 0.099174 | 0.037837 | 0.646222 |
| AAACAGCTTTCAGAAG-1 | 0.0 | 0.0 | 0.034172 | 0.204094 | 0.0 | 0.0 | 0.0 | 0.014275 | 0.099182 | 0.283672 |
| AAACAGGGTCTATATT-1 | 0.0 | 0.053021 | 0.097055 | 0.323413 | 0.114272 | 0.0 | 0.000262 | 0.023805 | 0.133081 | 0.722971 |
| AAACAGTGTTCCTGGG-1 | 0.003403 | 0.0 | 0.090111 | 0.248749 | 0.0 | 0.059784 | 0.0 | 0.010839 | 0.066807 | 0.525474 |
| AAACATTTCCCGGATT-1 | 0.0 | 0.000265 | 0.041697 | 0.248721 | 0.021183 | 0.0 | 0.014332 | 0.01626 | 0.161709 | 0.705501 |
| AAACCCGAACGAAATC-1 | 0.035312 | 0.064094 | 0.017209 | 0.112767 | 0.0 | 0.0 | 0.000252 | 0.0 | 0.000299 | 0.775006 |
| AAACCGGGTAGGTACC-1 | 0.105811 | 0.035272 | 0.027631 | 0.532343 | 0.088621 | 0.057544 | 0.015388 | 0.0 | 0.101351 | 0.582037 |
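The general principle of low-pass filtering on a graph can be sketched in a few lines of NumPy. This is not the implementation behind `spg.low_pass_imputation`: it uses a hypothetical chain of 20 spots, keeps only the lowest-frequency Laplacian modes of a noisy signal, and transforms back. The helper `low_pass` and the cutoff of four modes are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy graph: 20 spots on a line, each linked to its neighbours.
n_spots = 20
adjacency = np.zeros((n_spots, n_spots))
idx = np.arange(n_spots - 1)
adjacency[idx, idx + 1] = adjacency[idx + 1, idx] = 1.0

laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
eigvals, eigvecs = np.linalg.eigh(laplacian)  # columns sorted by frequency

# A smooth expression profile corrupted by noise and dropouts.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, np.pi, n_spots))
observed = truth + rng.normal(scale=0.2, size=n_spots)
observed[rng.random(n_spots) < 0.2] = 0.0  # simulated dropouts


def low_pass(signal, n_keep=4):
    """Keep the n_keep lowest-frequency components and transform back."""
    spectrum = eigvecs.T @ signal   # forward graph Fourier transform
    spectrum[n_keep:] = 0.0         # discard high-frequency components
    return eigvecs @ spectrum       # inverse transform


smoothed = low_pass(observed)
```

Zeroed entries in `observed` receive nonzero values in `smoothed` because each spot borrows information from its neighbours through the low-frequency modes, which is how imputed tables can contain small positive values where raw counts were zero.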
