In [1]:
import os
import torch
import numpy as np
import scanpy as sc
from sklearn import metrics

from STING.STING import STING

## Running STING to obtain embeddings for each spot/cell in the ST data

In [2]:
# Change filepath to refer to the data file you wish to use
filepath = './Data/MERA1C1.h5ad'
adata = sc.read_h5ad(filepath)
adata.var_names_make_unique()

# We recommend using the GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [3]:
model = STING(adata, device=device, epochs = 600)

# train model
adata = model.train()



Begin to train ST data...


100%|██████████| 600/600 [13:12<00:00,  1.32s/it]


Optimization finished for ST data!


In [4]:
embed = adata.obsm['emb']

print("Embeddings shape - ", embed.shape)

Embeddings shape -  (6000, 64)


### Accessing embeddings
`embed` is an $n \times d$ array that holds a $d$-dimensional embedding for each of the $n$ spots. These embeddings can be passed to any clustering algorithm to generate clusters.
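As a minimal sketch of that clustering step, the snippet below runs k-means from scikit-learn on a synthetic array that stands in for `adata.obsm['emb']`. The helper name `cluster_embeddings`, the choice of k-means, and the number of clusters are illustrative assumptions, not part of STING; any other clustering method could be substituted.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_embeddings(embed, n_clusters, seed=0):
    """Assign each spot to one of n_clusters groups with k-means.

    Hypothetical helper for illustration; not part of the STING package.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(np.asarray(embed))


# Synthetic stand-in for adata.obsm['emb']: 300 spots, 64 dimensions,
# drawn around three well-separated centres.
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(3, 64))
demo_embed = np.vstack([c + rng.normal(size=(100, 64)) for c in centres])

labels = cluster_embeddings(demo_embed, n_clusters=3)
print("Labels shape -", labels.shape)
print("Number of clusters -", np.unique(labels).size)
```

With real data, the same call would take `embed` in place of `demo_embed`, and the resulting labels could be stored as strings in `adata.obs['clusters']`, the column that the cluster-wide attention step reads.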

## Obtaining Attention Scores

Once the STING model is trained, we can use the returned `adata` object to obtain attention scores for either the entire slice or each cluster.

For a slice-wide attention score matrix, no clustering is needed. Set `clustered = False` when calling `get_attention_matrices`. The function returns two objects. The first is the attention matrix, a NumPy array of shape $g \times g$, where $g$ is the number of highly variable genes (HVGs). The second is the HVG list, which gives the order of the HVGs in the matrix. Together, the two outputs let you look up the edge score between any gene pair.

For a cluster-wide attention score matrix, you must cluster the spots first. Store the cluster labels in `adata.obs['clusters']` and set `clustered = True` when calling `get_attention_matrices`. The function returns three objects. The first is the attention matrix, a NumPy array of shape $c \times g \times g$, where $c$ is the number of clusters and $g$ is the number of highly variable genes (HVGs). The second is the HVG list. The third is the cluster order, a NumPy array giving the order of the clusters along the first axis of the attention matrix. Together, the three outputs let you look up the edge score between any gene pair in any cluster.

In [5]:
from STING.attention import get_attention_matrices
att_matrix, hvg_list = get_attention_matrices(adata, clustered = False)



In [6]:
print("Attention matrix shape -", att_matrix.shape)
print("HVG list length -", len(hvg_list))

Attention matrix shape - (1122, 1122)
HVG list length - 1122
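
The edge-score lookup described above can be sketched as follows. The helper `edge_score`, the gene names, and the toy matrices are hypothetical and only illustrate the indexing; the sketch assumes the array shapes stated earlier ($g \times g$ for a slice-wide result, $c \times g \times g$ plus a cluster-order array for a cluster-wide result).

```python
import numpy as np


def edge_score(att_matrix, hvg_list, gene_a, gene_b,
               cluster=None, cluster_order=None):
    """Return the attention score for one gene pair.

    Hypothetical helper, not part of STING. Expects att_matrix of shape
    (g, g), or (c, g, g) together with a cluster_order of length c.
    """
    # Map each gene name to its row/column position in the matrix.
    index = {gene: i for i, gene in enumerate(hvg_list)}
    i, j = index[gene_a], index[gene_b]
    if cluster is None:
        return float(att_matrix[i, j])
    # Find which slab of the 3-D array belongs to the requested cluster.
    c = list(cluster_order).index(cluster)
    return float(att_matrix[c, i, j])


# Toy data standing in for the outputs of get_attention_matrices.
toy_hvgs = ["GeneA", "GeneB", "GeneC"]
base = np.arange(9).reshape(3, 3)
toy_slice_att = base + base.T                      # symmetric (g, g) matrix
toy_cluster_att = np.stack([toy_slice_att, 2 * toy_slice_att])
toy_cluster_order = np.array(["0", "1"])

slice_score = edge_score(toy_slice_att, toy_hvgs, "GeneA", "GeneC")
cluster_score = edge_score(toy_cluster_att, toy_hvgs, "GeneA", "GeneC",
                           cluster="1", cluster_order=toy_cluster_order)
print("Slice-wide score -", slice_score)
print("Cluster '1' score -", cluster_score)
```

With real outputs, `att_matrix` and `hvg_list` from the cell above would replace the toy arrays, and the gene names would be entries of `hvg_list`.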
