# Load the packages

In [1]:
import numpy as np
import scanpy as sc

# Load the data

In [2]:
adata = sc.read_h5ad('/Users/junjietang/Desktop/GeneLLM/Heimdall-main/data/adata_C57BL6J-638850.01.h5ad')

### This dataset is a MERFISH RNA dataset containing gene expression and spatial information for 23389 cells and 550 genes. The cell-cell communication (CCC) inference task for this dataset is to infer whether two spatially neighboring cells communicate through a specific ligand-receptor pair, based on the expression levels of the two cells.

In [3]:
print(adata)

AnnData object with n_obs × n_vars = 23389 × 550
    obs: 'cells', 'Tissue', 'Technology', 'xcoord', 'ycoord', 'donor_label', 'donor_genotype', 'donor_sex', 'neurotransmitter', 'class', 'subclass', 'supertype', 'cluster'
    var: 'genes'
    obsm: 'spatial'
    obsp: 'Agt -> Agtr1a', 'Calcb -> Calcr', 'Gal -> Galr1', 'Grp -> Grpr', 'Hgf -> Met', 'Kitl -> Kit', 'Lama1 -> Sv2b', 'Lama1 -> Sv2c', 'Pdyn -> Oprd1', 'Pdyn -> Oprk1', 'Penk -> Oprd1', 'Penk -> Oprk1', 'Ptprm -> Ptprm', 'Tac2 -> Tacr3'


### The classification labels, which indicate whether two spatially neighboring cells communicate through a specific ligand-receptor pair, are stored in the dict-like "adata.obsp". Each key names a ligand-receptor pair (e.g., Agt (ligand) -> Agtr1a (receptor)), and each value holds the silver-standard cell-to-cell interaction labels induced by that pair.

In [4]:
print(adata.obsp.keys())
print("The number of used Ligand-receptor pairs in this dataset: " + str(len(adata.obsp.keys())))

KeysView(PairwiseArrays with keys: Agt -> Agtr1a, Calcb -> Calcr, Gal -> Galr1, Grp -> Grpr, Hgf -> Met, Kitl -> Kit, Lama1 -> Sv2b, Lama1 -> Sv2c, Pdyn -> Oprd1, Pdyn -> Oprk1, Penk -> Oprd1, Penk -> Oprk1, Ptprm -> Ptprm, Tac2 -> Tacr3)
The number of used Ligand-receptor pairs in this dataset: 14
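Because each key follows the "Ligand -> Receptor" pattern, the gene names can be recovered by splitting on the arrow. The sketch below is a minimal illustration: the key list is copied from the printed output above so the snippet runs on its own, and `split_lr_key` is a hypothetical helper, not part of scanpy or anndata. In the notebook, `list(adata.obsp.keys())` would supply the same list.

```python
# Keys copied from the printed output above; in the notebook,
# list(adata.obsp.keys()) would provide the same strings.
lr_keys = [
    'Agt -> Agtr1a', 'Calcb -> Calcr', 'Gal -> Galr1', 'Grp -> Grpr',
    'Hgf -> Met', 'Kitl -> Kit', 'Lama1 -> Sv2b', 'Lama1 -> Sv2c',
    'Pdyn -> Oprd1', 'Pdyn -> Oprk1', 'Penk -> Oprd1', 'Penk -> Oprk1',
    'Ptprm -> Ptprm', 'Tac2 -> Tacr3',
]


def split_lr_key(key):
    """Split a 'Ligand -> Receptor' key into a (ligand, receptor) tuple.

    Hypothetical helper written for this illustration.
    """
    ligand, receptor = (part.strip() for part in key.split("->"))
    return ligand, receptor


lr_pairs = [split_lr_key(k) for k in lr_keys]

# Some ligands and receptors appear in more than one pair,
# so the unique gene sets are smaller than the number of pairs.
ligands = sorted({lig for lig, _ in lr_pairs})
receptors = sorted({rec for _, rec in lr_pairs})

print(lr_pairs[0])
print(len(lr_pairs), len(ligands), len(receptors))
```

The ligand and receptor names could then be matched against the gene names in `adata.var` to look up the relevant expression columns, assuming the names agree.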


### The value for a given key (i.e., ligand-receptor pair) is a cell-by-cell matrix. An element of "1" means that two spatially neighboring cells interact through that ligand-receptor pair, and an element of "-1" means that they do not. Elements equal to "0" are not considered in the task.


In [5]:
adata.obsp['Agt -> Agtr1a']

<23389x23389 sparse matrix of type '<class 'numpy.int32'>'
	with 233890 stored elements in Compressed Sparse Row format>

In [6]:
print("The unique elements in this matrix: " + str(np.unique(np.array(adata.obsp['Agt -> Agtr1a'].todense())).tolist()))

The unique elements in this matrix: [-1, 0, 1]
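Calling `.todense()` on a 23389 x 23389 matrix allocates the full dense array, which can be costly in memory. As a hedged alternative, the sketch below counts label values directly from the stored entries of a SciPy sparse matrix and adds the implicit zeros arithmetically. It uses a small toy matrix as a stand-in for `adata.obsp['Agt -> Agtr1a']`, and `label_counts` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for adata.obsp['Agt -> Agtr1a']: 4 cells, labels in {-1, 0, 1}.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 2, 0, 3, 2])
vals = np.array([1, -1, 1, -1, 1], dtype=np.int32)
toy = sparse.csr_matrix((vals, (rows, cols)), shape=(4, 4))


def label_counts(mat):
    """Count each label value without building the dense matrix.

    Hypothetical helper: stored entries are counted from `.data`;
    entries that are not stored are implicit zeros.
    """
    mat = sparse.csr_matrix(mat)
    values, counts = np.unique(mat.data, return_counts=True)
    out = {int(v): int(c) for v, c in zip(values, counts)}
    n_total = mat.shape[0] * mat.shape[1]
    # mat.nnz is the number of stored entries (explicit zeros included),
    # so the remainder are implicit zeros.
    out[0] = out.get(0, 0) + (n_total - mat.nnz)
    return out


counts = label_counts(toy)
print(counts)
```

Applied to the real matrix, the same function would report how many cell pairs carry the labels "1" and "-1", which is useful for checking class balance before training.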


### In summary, the task is as follows. Given the expression values of two cells and a specific ligand-receptor pair, train a model with binary output to determine whether the two cells interact through that ligand-receptor pair (i.e., label "1") or not (label "-1").
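
One possible way to turn a label matrix into training examples is sketched below. This is an illustration under stated assumptions, not the method used by the dataset authors: it assumes the expression matrix is available as a dense NumPy array (`adata.X` may be sparse and would need converting), it represents each cell pair by concatenating the two expression vectors, and it maps the labels "-1" and "1" to 0 and 1. `build_pair_dataset` is a hypothetical helper, and the toy arrays stand in for the real data.

```python
import numpy as np
from scipy import sparse


def build_pair_dataset(X, labels):
    """Build (features, targets, pairs) from a cell-by-cell label matrix.

    X      : (n_cells, n_genes) dense expression array (assumption).
    labels : sparse cell-by-cell matrix with values in {-1, 0, 1}.
    Pairs labelled 0 are dropped, since they are not part of the task.
    """
    coo = sparse.coo_matrix(labels)
    keep = coo.data != 0                      # discard explicit zeros
    cell_i, cell_j = coo.row[keep], coo.col[keep]
    # Feature choice is an assumption: concatenate both expression vectors.
    features = np.hstack([X[cell_i], X[cell_j]])
    # Map label 1 -> 1 and label -1 -> 0 for a binary classifier.
    targets = (coo.data[keep] == 1).astype(np.int64)
    pairs = np.column_stack([cell_i, cell_j])
    return features, targets, pairs


# Toy data: 4 cells x 3 genes, with one explicit zero label at (3, 0).
X = np.arange(12, dtype=float).reshape(4, 3)
rows = np.array([0, 0, 1, 2, 3, 3])
cols = np.array([1, 2, 0, 3, 0, 2])
vals = np.array([1, -1, 1, -1, 0, 1], dtype=np.int32)
toy_labels = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4))

features, targets, pairs = build_pair_dataset(X, toy_labels)
print(features.shape, targets.tolist())
```

On the real data, the call would look like `build_pair_dataset(X, adata.obsp['Agt -> Agtr1a'])` with `X` a dense copy of the expression matrix, repeated once per ligand-receptor key. Whether a row index denotes the ligand-expressing cell is not stated in this notebook, so the pair order is kept neutral here.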