Generate `sc.AnnData` from the gene expression file and the spatial coordinates
=====

stCluster requires its input to be a `sc.AnnData` object.
In this section, we show how to generate a `sc.AnnData` from csv files.

## Data generation
First, we save the gene expression, spatial coordinates, and spot metadata of the DLPFC 151507 slice to csv files.
To keep the example small, we only save the 300 highly variable genes (HVGs) for each spot.

In [1]:
import scanpy as sc
import pandas as pd
from st_datasets.dataset import get_data, get_dlpfc_data

adata, n_cluster = get_data(dataset_func=get_dlpfc_data, id='151507', top_genes=300)
adata = adata[:, adata.var.highly_variable]
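# Write the expression matrix (spots x genes) with spot barcodes as the index.
pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index).to_csv('gene_exp.csv')
# Write the coordinates without an index, so the file holds only the two axis columns.
pd.DataFrame(adata.obsm['spatial']).to_csv('coors.csv', index=False)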
adata.obs.to_csv('metadata.csv')

>>> INFO: Use local data.
>>> INFO: dataset name: dorsolateral prefrontal cortex (DLPFC), slice: 151507, size: (4226, 33538), cluster: 7.(0.381s)


## Load the files
Then, we can load the data from these files.  
In the gene expression file, each row is a spot and each column is a gene.

In [2]:
gene_exp_file = pd.read_csv('gene_exp.csv', index_col=0)
gene_exp_file

Unnamed: 0,AL357140.1,EPHA2,C1QC,AL009181.1,TEKT2,NT5C1A,FAM183A,KDM4A-AS1,AL158840.1,AC092813.2,...,YWHAH,C22orf42,RFPL2,PVALB,Z82188.2,FBLN1,CPT1B,PCP4,TFF1,LINC01678
AAACAACGAATAGTTC-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.446557,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.0
AAACAAGTATCTCCCA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,3.450282,0.0,0.0,0.000000,0.0,1.208025,0.0,0.000000,1.739366,0.0
AAACAATCTACTAGCA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.0
AAACACCAATAACTGC-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.937048,0.0,0.0,0.000000,0.0,0.000000,0.0,1.378545,0.000000,0.0
AAACAGCTTTCAGAAG-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.403673,0.0,0.0,1.471228,0.0,0.000000,0.0,0.000000,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTGTTGTGTGTCAAGA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.896793,0.0,0.0,2.257376,0.0,0.000000,0.0,0.000000,0.000000,0.0
TTGTTTCACATCCAGG-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.259678,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.0
TTGTTTCATTAGTCTA-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.580975,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.0
TTGTTTCCATACAACT-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,3.162901,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.0


In the spatial coordinates file, each row is a spot and each column is an axis.

In [3]:
coors_file = pd.read_csv('coors.csv')
coors_file

Unnamed: 0,0,1
0,3276,2514
1,9178,8520
2,5133,2878
3,3462,9581
4,2779,7663
...,...,...
4221,7464,6239
4222,5045,9466
4223,4218,9703
4224,4017,7906
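
The difference between the two files comes from how they were written: the gene expression file keeps the spot barcodes as its first column, while the coordinates file was written without an index. The round trip below is a minimal sketch of that behaviour using in-memory buffers and made-up toy values (the spot and gene names are placeholders, not taken from the dataset).

```python
import io

import pandas as pd

# Toy stand-ins for the expression matrix and the coordinates (placeholder names).
spots = ["spot_a", "spot_b"]
expr = pd.DataFrame([[0.0, 2.5], [1.5, 0.0]], index=spots, columns=["gene_1", "gene_2"])
coords = pd.DataFrame([[3276, 2514], [9178, 8520]])

# Write to in-memory buffers instead of files on disk.
expr_buf, coord_buf = io.StringIO(), io.StringIO()
expr.to_csv(expr_buf)                   # the index (spot names) becomes the first column
coords.to_csv(coord_buf, index=False)   # no index column is written
expr_buf.seek(0)
coord_buf.seek(0)

# index_col=0 restores the spot names as the index.
expr_back = pd.read_csv(expr_buf, index_col=0)
# Without index_col, pandas assigns a fresh positional index 0..n-1.
coords_back = pd.read_csv(coord_buf)

print(expr_back.index.tolist())
print(coords_back.columns.tolist())
```

After reading, the coordinate columns are labelled with the strings `'0'` and `'1'`, and the rows are identified only by their position, so the row order must match the row order of the gene expression file.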


In the metadata file, each row is a spot and each column is a metadata field.

In [4]:
spot_metadata_file = pd.read_csv('metadata.csv', index_col=0)
spot_metadata_file

Unnamed: 0,in_tissue,array_row,array_col,cluster
AAACAACGAATAGTTC-1,1,0,16,Layer_1
AAACAAGTATCTCCCA-1,1,50,102,Layer_3
AAACAATCTACTAGCA-1,1,3,43,Layer_1
AAACACCAATAACTGC-1,1,59,19,WM
AAACAGCTTTCAGAAG-1,1,43,9,Layer_6
...,...,...,...,...
TTGTTGTGTGTCAAGA-1,1,31,77,Layer_3
TTGTTTCACATCCAGG-1,1,58,42,Layer_6
TTGTTTCATTAGTCTA-1,1,60,30,WM
TTGTTTCCATACAACT-1,1,45,27,Layer_6
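
Since the coordinates carry no barcodes, the three tables presumably have to describe the same spots in the same order. Before building the `sc.AnnData`, a quick consistency check can catch mismatched files early. The sketch below uses small made-up tables in place of the loaded files; `check_alignment` is a hypothetical helper, not part of stCluster.

```python
import pandas as pd


def check_alignment(gene_exp, coors, metadata):
    """Hypothetical helper: raise if the tables do not describe the same spots."""
    n_exp, n_coor, n_meta = len(gene_exp), len(coors), len(metadata)
    if not (n_exp == n_coor == n_meta):
        raise ValueError(f"row counts differ: {n_exp}, {n_coor}, {n_meta}")
    # Expression and metadata are both indexed by barcode, so compare the indices.
    if not gene_exp.index.equals(metadata.index):
        raise ValueError("spot barcodes differ between expression and metadata")


# Toy stand-ins for gene_exp_file, coors_file and spot_metadata_file.
barcodes = ["spot_a", "spot_b", "spot_c"]
gene_exp = pd.DataFrame([[0.0, 2.4], [1.2, 0.0], [0.0, 0.0]],
                        index=barcodes, columns=["gene_1", "gene_2"])
coors = pd.DataFrame([[3276, 2514], [9178, 8520], [5133, 2878]])
metadata = pd.DataFrame({"cluster": ["Layer_1", "Layer_3", "WM"]}, index=barcodes)

check_alignment(gene_exp, coors, metadata)  # passes silently when everything lines up
```

With the real data, the call would be `check_alignment(gene_exp_file, coors_file, spot_metadata_file)`.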


## Generate adata
Next, we can generate the `sc.AnnData` object with the `gen_adata` function provided by stCluster.

In [5]:
from stCluster.utils import gen_adata

adata = gen_adata(gene_exp_file, coors_file, spot_metadata_file, gene_exp_file.columns.to_list(), gene_exp_file.index.to_list())
adata

AnnData object with n_obs × n_vars = 4226 × 300
    obs: 'in_tissue', 'array_row', 'array_col', 'cluster'
    obsm: 'spatial'

The gene expression matrix is stored in `adata.X` as a sparse matrix.
The spatial coordinates can be accessed via `adata.obsm['spatial']`.