# In-silico perturbation for customized pathway databases
UNAGI lets users supply their own pathway databases or systematically perturb specific pathways of interest. This guide describes the pathway database format that UNAGI recognizes.

## Built-in pathway databases 
We downloaded the pathway database from [GSEA](https://www.gsea-msigdb.org/gsea/index.jsp); it includes pathways from REACTOME, MatrisomeDB, and KEGG. The database was then parsed into a `.npy` file. The format of the built-in pathway database is shown below:

In [6]:
import numpy as np
# The file stores a pickled dict, so allow_pickle=True and .item() are needed to recover it
built_in_pathway_data = np.load('../data/gesa_pathways.npy', allow_pickle=True).item()

# The keys are the pathway names
print(list(built_in_pathway_data.keys())[:10])  # show the first 10 pathways
# The values are the gene sets (lists of gene symbols)
print(built_in_pathway_data['BIOCARTA_GRANULOCYTES_PATHWAY'])

['BIOCARTA_GRANULOCYTES_PATHWAY', 'BIOCARTA_LYM_PATHWAY', 'BIOCARTA_BLYMPHOCYTE_PATHWAY', 'BIOCARTA_CARM_ER_PATHWAY', 'BIOCARTA_LAIR_PATHWAY', 'BIOCARTA_VDR_PATHWAY', 'BIOCARTA_MTA3_PATHWAY', 'BIOCARTA_GABA_PATHWAY', 'BIOCARTA_EGFR_SMRTE_PATHWAY', 'BIOCARTA_MONOCYTE_PATHWAY']
['CXCL8', 'IFNG', 'IL1A', 'CSF3', 'SELP', 'ITGAM', 'ITGAL', 'TNF', 'ITGB2', 'PECAM1', 'ICAM2', 'C5', 'SELPLG', 'ICAM1', 'SELL']
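
MSigDB distributes gene sets as tab-separated GMT files, where each line holds a pathway name, a description field, and then the member genes. The sketch below shows one way such a file could be converted into the dictionary layout printed above. It is not necessarily how the built-in file was produced; the function name `parse_gmt` and the example pathway and gene names are hypothetical.

```python
import io


def parse_gmt(handle):
    """Parse GMT-formatted lines into {pathway_name: [gene, ...]}."""
    pathways = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # GMT layout: name, description, then one gene per remaining field
        name, _description, *genes = fields
        pathways[name] = [g for g in genes if g]
    return pathways


# Tiny in-memory stand-in for a downloaded .gmt file (illustrative content only)
example_gmt = io.StringIO(
    "PATHWAY_X\thttps://example.org/x\tGENE1\tGENE2\tGENE3\n"
    "PATHWAY_Y\thttps://example.org/y\tGENE2\tGENE4\n"
)
database = parse_gmt(example_gmt)
print(database)
```

The resulting dictionary has the same shape as `built_in_pathway_data`: pathway names as keys and lists of gene symbols as values.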


## Customize pathway database
UNAGI recognizes any customized database that follows the `.npy` format described above: a dictionary that maps pathway names to lists of gene symbols, saved with `np.save`. Here is an example:

In [4]:
import numpy as np
customized_pathway_database = {}
customized_pathway_database['Pathway_A'] = ['COL6A3', 'MET', 'COL7A1', 'MMP1', 'COL11A1', 'COL1A2', 'COL5A2', 'COL4A3', 'COL12A1', 'COL10A1', 'COL5A1', 'COL3A1', 'COL4A4', 'COL14A1', 'COL8A1', 'MMP9', 'COL4A1', 'MMP7', 'COL15A1', 'COL1A1', 'COL17A1', 'COL4A6']
customized_pathway_database['Pathway_B'] = ['MAP2', 'THBS1']
# np.save pickles the dict; reload it with np.load(..., allow_pickle=True).item()
np.save('customized_pathway_database.npy', customized_pathway_database)
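
Before saving a hand-built database, it can help to check that it has the expected layout. The following is a minimal sketch of such a check. The helper `validate_pathway_database` is hypothetical and not part of UNAGI, and the rules on empty gene sets and duplicate symbols are assumptions that go beyond the format shown above.

```python
def validate_pathway_database(database):
    """Return a list of problems found; an empty list means the layout looks right."""
    if not isinstance(database, dict):
        return ["database must be a dict of pathway name -> gene list"]
    problems = []
    for name, genes in database.items():
        if not isinstance(name, str) or not name:
            problems.append(f"pathway name {name!r} is not a non-empty string")
        if not isinstance(genes, (list, tuple)) or len(genes) == 0:
            problems.append(f"{name!r}: gene set must be a non-empty list")
            continue
        if not all(isinstance(g, str) and g for g in genes):
            problems.append(f"{name!r}: every gene must be a non-empty string")
        elif len(set(genes)) != len(genes):
            problems.append(f"{name!r}: duplicate gene symbols")
    return problems


good = {"Pathway_B": ["MAP2", "THBS1"]}
# A comma-joined string and a duplicated symbol are both flagged
bad = {"Pathway_C": "MAP2,THBS1", "Pathway_D": ["MAP2", "MAP2"]}
print(validate_pathway_database(good))
print(validate_pathway_database(bad))
```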

## Perform in-silico pathway perturbation on the customized pathway database
Genes in the customized pathways that are not present in the input single-cell data are ignored.
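
To preview which genes would be ignored, the overlap can be computed by hand. This is only an illustrative sketch, not UNAGI's internal routine: `overlap_with_dataset` is a hypothetical helper, and `dataset_genes` stands in for the gene names of the input dataset (for an `.h5ad` file these are typically found in `adata.var_names`).

```python
def overlap_with_dataset(database, dataset_genes):
    """Split each pathway's genes into those present in the dataset and those absent."""
    dataset_genes = set(dataset_genes)
    kept, dropped = {}, {}
    for name, genes in database.items():
        kept[name] = [g for g in genes if g in dataset_genes]
        dropped[name] = [g for g in genes if g not in dataset_genes]
    return kept, dropped


database = {"Pathway_B": ["MAP2", "THBS1"]}
dataset_genes = ["THBS1", "COL1A1", "MMP9"]  # hypothetical gene list of the dataset
kept, dropped = overlap_with_dataset(database, dataset_genes)
print(kept)     # genes that can be perturbed
print(dropped)  # genes that would be ignored
```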

In [1]:
from UNAGI import UNAGI
import warnings
warnings.filterwarnings("ignore")
unagi = UNAGI()
data_path = 'PATH_TO_TARGET/dataset.h5ad'
iteration = 0  # which iteration of the model to use
change_level = 0.5  # reduce the expression to 50% of the original value
customized_pathway = 'PATH_TO/customized_pathway_database.npy'
results = unagi.customize_pathway_perturbation(
    data_path, iteration, customized_pathway, change_level,
    target_dir='PATH_TO_TARGET', device='cuda:0')

calculateDataPathwayOverlapGene done
Start perturbation....
track: 10
processing....
track: 11
processing....
track: 0
processing....
track: 1
processing....
track: 4
processing....
track: 8
processing....
track: 9
processing....
track: 3
processing....
track: 5
processing....
track: 6
processing....
track: 2
processing....
track: 7
processing....
random background done
Finish results analysis
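
To make the meaning of `change_level` concrete, the toy example below scales the expression of a pathway's genes by that factor while leaving all other genes untouched. It only illustrates the "reduce to 50% of the original value" idea on a plain dictionary; UNAGI applies the perturbation through its trained model, and the helper `scale_expression` and the expression values are hypothetical.

```python
def scale_expression(expression, target_genes, change_level):
    """Multiply the expression of target genes by change_level; leave other genes as-is."""
    targets = set(target_genes)
    return {
        gene: value * change_level if gene in targets else value
        for gene, value in expression.items()
    }


cell = {"MAP2": 4.0, "THBS1": 2.0, "COL1A1": 10.0}  # hypothetical expression values
perturbed = scale_expression(cell, ["MAP2", "THBS1"], change_level=0.5)
print(perturbed)  # {'MAP2': 2.0, 'THBS1': 1.0, 'COL1A1': 10.0}
```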


In [2]:
from UNAGI.perturbations import get_top_pathways
get_top_pathways(results, change_level, top_n=10)

| | pathways | perturbation score | pval_adjusted | regulated genes | idrem_suggestion |
|---|---|---|---|---|---|
| 0 | Pathway_A | 0.473684 | 0.000713 | [COL6A3, MET, COL7A1, MMP1, COL11A1, COL1A2, C... | [COL6A3:-, MET:-, COL7A1:-, MMP1:-, COL11A1:-,... |
