## DAISY: the DAta-mIning SYnthetic-lethality-identification pipeline
```
Title:   Data Mining Synthetic Lethality Identification Pipeline (DAISY)
Author:  Bahar Tercan
Created: 02-07-2022
Purpose: Retrieve synthetic lethal partners of the genes in a given list using the DAISY algorithm
Notes: Runs in MyBinder 
```


If any part of this analysis is used in a publication, please cite the following:

For the DAISY algorithm:  
Jerby-Arnon, L., Pfetzer, N., Waldman, Y. Y., McGarry, L., James, D., Shanks, E., ... & Gottlieb, E. (2014). Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell, 158(5), 1199-1209.

For the CCLE Omics data:  
Ghandi, M., Huang, F. W., Jané-Valbuena, J., et al. (2019). Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature, 569, 503–508. https://doi.org/10.1038/s41586-019-1186-3

For the CRISPR Data:  
Meyers, R. M., Bryan, J. G., McFarland, J. M., Weir, B. A., ... Root, D. E., Hahn, W. C., & Tsherniak, A. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics, 49, 1779–1784. https://doi.org/10.1038/ng.3984  
Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting biological insights from the Project Achilles genome-scale CRISPR screens in cancer cell lines. bioRxiv, 720243.

For the shRNA Data:  
McFarland, J. M., Ho, Z. V., Kugener, G., Dempster, J. M., Montgomery, P. G., Bryan, J. G., Krill-Burger, J. M., Green, T. M., Vazquez, F., Boehm, J. S., Golub, T. R., Hahn, W. C., Root, D. E., & Tsherniak, A. (2018). Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nature Communications, 9(1). https://doi.org/10.1038/s41467-018-06916-5

For ISB-CGC:  
Reynolds, S. M., Miller, M., Lee, P., Leinonen, K., Paquette, S. M., Rodebaugh, Z., ... & Shmulevich, I. (2017). The ISB Cancer Genomics Cloud: a flexible cloud-based platform for cancer genomics research. Cancer Research, 77(21), e7–e10.

For PanCancer Atlas Data:  
Hutter, C., & Zenklusen, J. C. (2018). The Cancer Genome Atlas: Creating lasting value beyond its data. Cell, 173, 283–285.

This notebook is a reimplementation of the DAISY synthetic lethal (SL) pair prediction algorithm. It consists of three modules whose results are combined into one list of candidate SL pairs:
1. SL candidate determination using gene co-expression
2. SL candidate determination using genomic survival of the fittest
3. SL candidate determination using CRISPR and shRNA experiments

Input Parameters
* Cancer type 
* Genes whose SL partners are desired

Input Data (available in BigQuery tables)
* Gene expression data 
* Gene mutation data
* Copy number variation data
* Gene effect data (CRISPR)
* Gene dependency scores data (shRNA)

Output
* List of candidate SL pairs

Please contact Bahar Tercan (btercan@systemsbiology.org) with questions or for more detailed information.

```
# Install the dependencies. Run this cell only once, the first time you run this notebook.
# importlib is part of the Python standard library, so it is not installed here.
!pip3 install google-cloud-bigquery pandas numpy ipywidgets statsmodels
```


### 1. Import the python libraries


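```python
import sys
sys.path.append('../Scripts/')  # make the .py files in the ../Scripts folder importable
from google.cloud import bigquery
import importlib
import pandas as pd
import DAISY_operations
importlib.reload(DAISY_operations)  # pick up edits to DAISY_operations.py without restarting the kernel
from DAISY_operations import *  # GetTCGASubtypes, CoexpressionAnalysis, SurvivalOfFittest, ...
import ipywidgets as widgets
from IPython.display import display
```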

```python
# Suppress warnings unless warning options were passed to the interpreter (-W)
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")
```

### 2. Google Authentication
Running the BigQuery cells in this notebook requires a Google Cloud project; instructions for creating one can be found in the [Google Documentation](https://cloud.google.com/resource-manager/docs/creating-managing-projects#console). The instance needs to be authorized to bill the project for queries.
For more information on getting started in the cloud, see the ['Quick Start Guide to ISB-CGC'](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowToGetStartedonISB-CGC.html); alternative authentication methods are described in the [Google Documentation](https://googleapis.dev/python/google-api-core/latest/auth.html).

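```
# Please make sure that you have installed the Cloud SDK.
# See https://cloud.google.com/sdk/docs/install
!gcloud auth application-default login

# The analysis cells below pass a BigQuery client named `client` to the DAISY functions.
# If it is not already defined (for example by DAISY_operations), one can be created with:
# client = bigquery.Client(project='<your-project-id>')
```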


### 3. Prediction of synthetic lethal partners using different modules in DAISY


DAISY infers synthetic lethal pairs with three modules:
1. Pairwise gene co-expression
2. Genomic survival of the fittest
3. shRNA- or CRISPR-based functional examination

More information is available in the original paper: https://www.sciencedirect.com/science/article/pii/S0092867414009775.

The pairwise gene co-expression and genomic survival of the fittest modules use PanCancer Atlas and CCLE data.  
The functional examination module uses CRISPR and shRNA data together with CCLE data. <br>

The required Python code is in the `../Scripts/` folder and is imported at the beginning of the notebook.


#### 3.0. Runtime parameters for DAISY

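For synthetic dosage lethality (SDL) prediction, replace 'SL' with 'SDL' and 'Inactive' with 'Overactive' in the cells below, and use the SDL parameter values given in the comments.  
For the survival of the fittest (SOF) and functional examination procedures, `input_mutations` is optional; if you do not want mutations to be used when calling a gene inactive, you can omit it.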

```python
# Mutation classes that mark the gene of interest as inactive
# (optional for the survival of the fittest and functional examination modules)
input_mutations = ['Nonsense_Mutation', 'Frame_Shift_Ins', 'Frame_Shift_Del']

# DAISY default parameters for SL prediction
percentile_threshold = 10  # expression percentile used to decide whether the gene is inactive
cn_threshold = -0.3        # copy number threshold used to decide whether the gene is inactive
cor_threshold = 0.5        # minimum co-expression correlation for inferring an SL relationship
p_threshold = 0.05         # significance cutoff applied in the report cells
pval_correction = 'Bonferroni'  # multiple-testing correction method
fdr_level = 'gene_level'   # either 'gene_level' or 'analysis_level'

# DAISY parameters for SDL prediction:
# percentile_threshold = 90
# cn_threshold = 0.3
# cor_threshold = 0.5
# p_threshold = 0.05
```
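Because the SL and SDL runs differ only in a few values, it can help to keep both parameter sets together. The sketch below is a hypothetical helper (the names `DAISY_PARAMETERS` and `daisy_parameters` are not part of `DAISY_operations`); it only stores the values listed in the cell above and returns a copy for the chosen mode, without changing the notebook's variables.

```python
# Hypothetical helper, not part of DAISY_operations: it only collects the SL and
# SDL parameter values quoted in this notebook so they can be looked up by mode.
DAISY_PARAMETERS = {
    "SL": {"gene_column": "Inactive", "percentile_threshold": 10,
           "cn_threshold": -0.3, "cor_threshold": 0.5, "p_threshold": 0.05},
    "SDL": {"gene_column": "Overactive", "percentile_threshold": 90,
            "cn_threshold": 0.3, "cor_threshold": 0.5, "p_threshold": 0.05},
}


def daisy_parameters(mode):
    """Return a copy of the parameter set for mode 'SL' or 'SDL'."""
    if mode not in DAISY_PARAMETERS:
        raise ValueError(f"mode must be 'SL' or 'SDL', got {mode!r}")
    return dict(DAISY_PARAMETERS[mode])


# Look up the SDL thresholds without overwriting the variables defined above
sdl_params = daisy_parameters("SDL")
print(sdl_params["gene_column"], sdl_params["percentile_threshold"], sdl_params["cn_threshold"])
```

Returning a copy means that editing the returned dictionary cannot silently change the stored defaults.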


The tumor types are TCGA cancer types; only the types that have corresponding cell lines are listed in the selection box. Select the tissue(s) on which you want to run the analyses.

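```python
# Tumor types that have corresponding cell lines; empty entries are dropped
TCGA_list = GetTCGASubtypes(client)
TCGA_list = [i for i in TCGA_list if i]

# Multi-select box; the selection is read later through tumor_type.value
tumor_type = widgets.SelectMultiple(
    options=['pancancer'] + TCGA_list,
    value=[],
    description='Tumor type',
    disabled=False
)
display(tumor_type)
```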


The list of genes for which we would like to find SL partners.

```python
gene_list = ["BRCA1", "BRCA2", "ARID1A"]  # any number of genes, in list format
```

#### 3.1. Pairwise gene coexpression module

In the pairwise co-expression module, DAISY assumes that synthetic lethal gene pairs take part in related biological processes and are therefore co-expressed. Gene expression was measured in TCGA patient-derived tumor samples and in CCLE cancer cell lines. Co-expression is estimated by the Spearman correlation between each gene of interest and every other gene in the dataset. By default, candidate SL pairs are those with a correlation coefficient greater than 0.5 and a Bonferroni-corrected p value smaller than 0.05; these settings are controlled by `cor_threshold`, `p_threshold` and `pval_correction` in Section 3.0.
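A minimal sketch of the co-expression filter, assuming an expression matrix with genes as rows and samples as columns. `coexpression_candidates` is a hypothetical function, not the notebook's `CoexpressionAnalysis`; it uses `scipy.stats.spearmanr` and applies a Bonferroni correction over the tests run for one gene of interest, which may differ from how `fdr_level` groups the tests in the real implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def coexpression_candidates(expr, gene, cor_threshold=0.5, p_threshold=0.05):
    """Spearman co-expression screen for one gene of interest (hypothetical sketch).

    expr: DataFrame with genes as rows and samples as columns (assumed layout).
    Keeps candidates with rho > cor_threshold and Bonferroni-corrected p < p_threshold.
    """
    target = expr.loc[gene]
    rows = []
    for other, values in expr.drop(index=gene).iterrows():
        rho, pval = spearmanr(target, values)
        rows.append((gene, other, rho, pval))
    result = pd.DataFrame(rows, columns=["Inactive", "SL_Candidate", "Correlation", "PValue"])
    # Bonferroni correction over the tests run for this gene, capped at 1
    result["FDR"] = np.minimum(result["PValue"] * len(result), 1.0)
    keep = (result["FDR"] < p_threshold) & (result["Correlation"] > cor_threshold)
    return result[keep].sort_values("FDR").reset_index(drop=True)


# Toy data (made up): GENE_B tracks GENE_A, GENE_C is anti-correlated with it
rng = np.random.default_rng(0)
base = rng.normal(size=50)
expr = pd.DataFrame(
    {
        "GENE_A": base,
        "GENE_B": base + rng.normal(scale=0.1, size=50),
        "GENE_C": -base + rng.normal(scale=0.1, size=50),
    }
).T
print(coexpression_candidates(expr, "GENE_A"))
```

GENE_C is strongly correlated in absolute terms but is excluded, because only positive co-expression above the threshold counts as evidence here.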


3.1.1. Pairwise gene coexpression module on PanCancer Atlas data

```python
# Co-expression module on PanCancer Atlas data for the selected tumor types
coexp_pancancer = CoexpressionAnalysis(client, 'SL', "PanCancerAtlas", gene_list, pval_correction, fdr_level, list(tumor_type.value))
try:
    # Keep significant, positively co-expressed pairs and sort each gene's candidates by FDR
    coex_pan_intermediate_report = coexp_pancancer.loc[(coexp_pancancer['FDR'] < p_threshold) & (coexp_pancancer['Correlation'] > cor_threshold)]
    coexp_pancancer_report = coex_pan_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except Exception:  # e.g. an empty result without the expected columns
    coexp_pancancer_report = pd.DataFrame()
    print("No results returned.")

coexp_pancancer_report
```

| Inactive | Index | InactiveDB | SL_Candidate | #Samples | Correlation | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|---|
| ARID1A | 0 | ARID1A | SPEN | 9953 | 0.726948 | 0.0 | 0.0 | ['pancancer'] |
| ARID1A | 101 | ARID1A | ZNF407 | 9953 | 0.525198 | 0.0 | 0.0 | ['pancancer'] |
| ARID1A | 102 | ARID1A | BOD1L | 9953 | 0.525071 | 0.0 | 0.0 | ['pancancer'] |
| ARID1A | 103 | ARID1A | C2CD3 | 9953 | 0.524484 | 0.0 | 0.0 | ['pancancer'] |
| ARID1A | 104 | ARID1A | ZNF644 | 9953 | 0.524323 | 0.0 | 0.0 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| BRCA2 | 41196 | BRCA2 | MLF1IP | 9953 | 0.649312 | 0.0 | 0.0 | ['pancancer'] |
| BRCA2 | 41197 | BRCA2 | DHX15 | 9953 | 0.649128 | 0.0 | 0.0 | ['pancancer'] |
| BRCA2 | 41198 | BRCA2 | RAD54B | 9953 | 0.649032 | 0.0 | 0.0 | ['pancancer'] |
| BRCA2 | 41218 | BRCA2 | CCNB1 | 9953 | 0.638053 | 0.0 | 0.0 | ['pancancer'] |


**Inactive/Overactive** (for SL and SDL, respectively): The gene name from the input list.  
**InactiveDB/OveractiveDB**: The corresponding gene name in the dataset (PanCancer Atlas).  
**SL_Candidate**: The candidate SL partner of the gene in the same row.  
**#Samples**: The number of samples used in the correlation calculation.  
**Correlation**: The Spearman correlation coefficient (rho) of gene expression.  
**PValue**: The p value of the Spearman correlation.  
**FDR**: The p value corrected for multiple testing (method set by `pval_correction`).  
**Tissue**: The tissue(s) on which the analysis was performed.



<br>
3.1.2. Pairwise gene coexpression module on CCLE data

```python
# Same filtering and sorting as above, on CCLE expression data
coexp_CCLE = CoexpressionAnalysis(client, 'SL', 'CCLE', gene_list, pval_correction, fdr_level, list(tumor_type.value))
try:
    coex_ccle_intermediate_report = coexp_CCLE.loc[(coexp_CCLE['FDR'] < p_threshold) & (coexp_CCLE['Correlation'] > cor_threshold)]
    coexp_CCLE_report = coex_ccle_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except Exception:
    coexp_CCLE_report = pd.DataFrame()
    print("No results returned.")
coexp_CCLE_report
```

| Index | Inactive | InactiveDB | SL_Candidate | #Samples | Correlation | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|---|
| 0 | ARID1A | ARID1A | HCFC1 | 1297 | 0.779066 | 8.589353e-265 | 1.642198e-260 | ['pancancer'] |
| 1 | ARID1A | ARID1A | SPEN | 1297 | 0.765432 | 4.079862e-250 | 7.800289e-246 | ['pancancer'] |
| 2 | ARID1A | ARID1A | SF1 | 1297 | 0.763544 | 3.668113e-248 | 7.013066e-244 | ['pancancer'] |
| 3 | ARID1A | ARID1A | UBTF | 1297 | 0.748176 | 6.490310e-233 | 1.240882e-228 | ['pancancer'] |
| 4 | ARID1A | ARID1A | KMT2D | 1297 | 0.733084 | 5.556467e-219 | 1.062341e-214 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38488 | BRCA2 | BRCA2 | SUV39H2 | 1297 | 0.501409 | 1.681160e-83 | 3.213873e-79 | ['pancancer'] |
| 38489 | BRCA2 | BRCA2 | RRM1 | 1297 | 0.501355 | 1.760925e-83 | 3.366359e-79 | ['pancancer'] |
| 38490 | BRCA2 | BRCA2 | SMARCA5 | 1297 | 0.501238 | 1.949504e-83 | 3.726867e-79 | ['pancancer'] |
| 38491 | BRCA2 | BRCA2 | CEP295 | 1297 | 0.501160 | 2.086518e-83 | 3.988797e-79 | ['pancancer'] |


Same analysis and output as the PanCancer Atlas co-expression analysis, except that CCLE data are used.

#### 3.2. Genomic survival of the fittest module
The genomic survival of the fittest module compares the copy number of each candidate partner gene between samples in which the gene of interest is inactive and samples in which it is not. In a sample, the gene of interest is considered inactive if its expression is at or below the 10th percentile across all samples and its copy number alteration is less than -0.3, or if it carries a nonsense, frameshift insertion or frameshift deletion mutation (the mutation classes in `input_mutations`). For synthetic dosage lethality (SDL) prediction, the gene of interest is considered overactive if its expression is at or above the 90th percentile across all samples and its copy number alteration is greater than 0.3.

A one-sided Wilcoxon rank-sum (Mann-Whitney U) test is applied to the copy number of each candidate partner. Because cells that lose both members of an SL pair should not survive, a higher copy number of the candidate in samples where the gene of interest is inactive (overactive) indicates a possible SL (SDL) relationship. SL/SDL pairs with Bonferroni-corrected p values below 0.05 are returned. This procedure is applied to PanCancer Atlas and CCLE data separately.
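The two steps above, calling the gene of interest inactive (or overactive) and testing a candidate's copy number, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `DAISY_operations`: the function names are hypothetical, each input is a pandas Series indexed by sample ID, the percentile cutoff is treated as inclusive, and the test uses `scipy.stats.mannwhitneyu` with `alternative="greater"`.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Ins", "Frame_Shift_Del"}


def activity_mask(expression, copy_number, mutations=None, mode="SL",
                  percentile_threshold=10, cn_threshold=-0.3):
    """True where the gene of interest is inactive (SL) or overactive (SDL).

    expression, copy_number: Series indexed by sample ID for the gene of interest.
    mutations: optional Series of mutation classes indexed by sample ID (SL only).
    """
    cutoff = np.percentile(expression, percentile_threshold)
    if mode == "SL":
        mask = (expression <= cutoff) & (copy_number < cn_threshold)
        if mutations is not None:
            # Samples without a listed mutation become NaN, which isin() treats as False
            mask = mask | mutations.reindex(expression.index).isin(TRUNCATING)
    else:  # "SDL": call with percentile_threshold=90, cn_threshold=0.3
        mask = (expression >= cutoff) & (copy_number > cn_threshold)
    return mask


def survival_of_fittest_pvalue(candidate_cn, mask):
    """One-sided rank-sum test: is the candidate's copy number higher where mask is True?"""
    return mannwhitneyu(candidate_cn[mask], candidate_cn[~mask], alternative="greater").pvalue


# Toy data (made up) for 40 samples
samples = [f"S{i}" for i in range(40)]
expression = pd.Series(np.arange(40, dtype=float), index=samples)
copy_number = pd.Series(np.where(np.arange(40) < 8, -0.5, 0.0), index=samples)
mutations = pd.Series({"S20": "Nonsense_Mutation", "S5": "Missense_Mutation"})

mask = activity_mask(expression, copy_number, mutations)
candidate_cn = pd.Series(np.where(mask, 0.4, 0.0), index=samples)
print(mask.sum(), survival_of_fittest_pvalue(candidate_cn, mask))
```

In the toy data, samples S0 to S3 are low in both expression and copy number, and S20 carries a nonsense mutation, so five samples are called inactive; the missense mutation in S5 is ignored. In a real run, one such p value would be computed per candidate and then corrected across candidates.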

3.2.1. Genomic survival of the fittest module on PanCancer Atlas data

```python
# Survival of the fittest on PanCancer Atlas data; keep pairs whose corrected p value is below p_threshold
sof_pancancer = SurvivalOfFittest(client, 'SL', "PanCancerAtlas", gene_list, percentile_threshold, cn_threshold,
                                  pval_correction, fdr_level, list(tumor_type.value), input_mutations)
try:
    sof_pancancer_intermediate_report = sof_pancancer.loc[sof_pancancer['FDR'] < p_threshold]
    sof_pancancer_report = sof_pancancer_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except Exception:
    sof_pancancer_report = pd.DataFrame()
    print("No results returned.")
sof_pancancer_report
```

| Inactive | Index | InactiveDB | SL_Candidate | #InactiveSamples | #Samples | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|---|
| ARID1A | 0 | ARID1A | ENTPD7 | 879 | 8930 | 2.053913e-14 | 5.161072e-10 | ['pancancer'] |
| ARID1A | 1 | ARID1A | SLC25A28 | 879 | 8930 | 2.436940e-14 | 6.123542e-10 | ['pancancer'] |
| ARID1A | 2 | ARID1A | MYOF | 879 | 8930 | 2.692291e-14 | 6.765188e-10 | ['pancancer'] |
| ARID1A | 3 | ARID1A | RNLS | 879 | 8930 | 2.969847e-14 | 7.462631e-10 | ['pancancer'] |
| ARID1A | 4 | ARID1A | ANKRD1 | 879 | 8930 | 3.075318e-14 | 7.727659e-10 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| BRCA1 | 2813 | BRCA1 | OR2A2 | 229 | 8930 | 1.968141e-06 | 4.945543e-02 | ['pancancer'] |
| BRCA1 | 2815 | BRCA1 | LRRC28 | 229 | 8930 | 1.976030e-06 | 4.965368e-02 | ['pancancer'] |
| BRCA1 | 2816 | BRCA1 | FAM177A1 | 229 | 8930 | 1.976277e-06 | 4.965989e-02 | ['pancancer'] |
| BRCA1 | 2818 | BRCA1 | ZFP82 | 229 | 8930 | 1.983702e-06 | 4.984646e-02 | ['pancancer'] |


**Inactive/Overactive** (for SL and SDL, respectively): The gene name from the input list.  
**InactiveDB/OveractiveDB**: The corresponding gene name in the dataset (PanCancer Atlas).  
**SL_Candidate**: The candidate SL partner of the gene in the same row.  
**#InactiveSamples:** The number of samples in which the gene of interest is inactive.  
**#Samples:** The total number of samples in the analysis.  
**PValue:** The p value of the one-sided unpaired Wilcoxon rank-sum test on somatic copy number alteration data.  
**FDR:** The p value corrected for multiple testing (method set by `pval_correction`).  
**Tissue:** The tissue(s) on which the analysis was performed.

3.2.2. Genomic survival of the fittest module on CCLE data

```python
# Same analysis as above, on CCLE data
sof_CCLE = SurvivalOfFittest(client, 'SL', "CCLE", gene_list, percentile_threshold, cn_threshold,
                             pval_correction, fdr_level, list(tumor_type.value), input_mutations)
try:
    sof_ccle_intermediate_report = sof_CCLE.loc[sof_CCLE['FDR'] < p_threshold]
    sof_ccle_report = sof_ccle_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except Exception:
    sof_ccle_report = pd.DataFrame()
    print("No results returned.")
sof_ccle_report
```

| Inactive | InactiveDB | SL_Candidate | #InactiveSamples | #Samples | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|


Same analysis and output as the PanCancer Atlas survival of the fittest analysis, except that CCLE data are used. In this run no CCLE pair passed the threshold, so the table above is empty.

#### 3.3. Functional examination inference module

The rationale for the functional examination module is that if the synthetic lethal partner of a gene is inactive in a given sample, subsequent inactivation of that gene will be lethal. Therefore, for each gene of interest, we first define two groups of cell lines: one in which the gene is inactive and one in which it is not. We then perform a one-sided Wilcoxon rank-sum (Mann-Whitney U) test on the knockout/knockdown sensitivity of each candidate partner. Lower viability after knockout/knockdown of the candidate in the inactive group (that is, higher sensitivity) indicates a potential synthetic lethal interaction (SLI). Pairs with an uncorrected p value below 0.05 are returned. This procedure is applied separately to the CRISPR gene effect scores and the shRNA gene dependency scores.
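A minimal sketch of the functional examination test for one gene of interest, assuming a gene effect matrix with candidate genes as rows and cell lines as columns, where more negative scores mean stronger dependency (as in CRISPR gene effect data). `functional_examination` is a hypothetical name, not the notebook's `FunctionalExamination`; it takes a boolean inactivity mask over cell lines and calls `scipy.stats.mannwhitneyu` with `alternative="less"`. Dropping missing scores per candidate is an assumption.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def functional_examination(gene_effect, mask, p_threshold=0.05):
    """Per candidate, test whether knockout/knockdown scores are lower where mask is True.

    gene_effect: DataFrame with candidate genes as rows and cell lines as columns;
                 more negative scores are assumed to mean stronger dependency.
    mask: boolean Series over cell lines (True = gene of interest inactive).
    """
    rows = []
    for candidate, scores in gene_effect.iterrows():
        scores = scores.dropna()  # a screen may not cover every cell line
        in_group = mask.reindex(scores.index, fill_value=False).astype(bool)
        pval = mannwhitneyu(scores[in_group], scores[~in_group], alternative="less").pvalue
        rows.append((candidate, int(in_group.sum()), len(scores), pval))
    out = pd.DataFrame(rows, columns=["SL_Candidate", "#InactiveSamples", "#Samples", "PValue"])
    # As in the notebook's report cells, filter on the uncorrected p value
    return out[out["PValue"] < p_threshold].sort_values("PValue").reset_index(drop=True)


# Toy data (made up): 20 cell lines, the gene of interest is inactive in the first 5
lines = [f"L{i}" for i in range(20)]
mask = pd.Series([True] * 5 + [False] * 15, index=lines)
gene_effect = pd.DataFrame(
    {
        "CAND_DEPENDENT": [-1.0] * 5 + [0.0] * 15,   # lower scores in the inactive group
        "CAND_UNRELATED": np.tile([0.0, -1.0], 10),  # no consistent group difference
    },
    index=lines,
).T
print(functional_examination(gene_effect, mask))
```

The test direction is the reverse of the survival of the fittest sketch: here the evidence is a lower score (stronger dependency) in the inactive group rather than a higher copy number.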

3.3.1. CRISPR based functional examination inference module

```python
# CRISPR-based functional examination; this report filters on the uncorrected PValue
crispr_result = FunctionalExamination(client, 'SL', "CRISPR", gene_list, percentile_threshold,
                                      cn_threshold, pval_correction, fdr_level, list(tumor_type.value), input_mutations)
try:
    crispr_intermediate_report = crispr_result.loc[crispr_result['PValue'] < p_threshold]
    crispr_report = crispr_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
except Exception:
    crispr_report = pd.DataFrame()
    print("No results returned.")
crispr_report
```

| Inactive | Index | InactiveDB | SL_Candidate | #InactiveSamples | #Samples | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|---|
| ARID1A | 0 | ARID1A | ARID1B | 100 | 771 | 1.114187e-09 | 0.000020 | ['pancancer'] |
| ARID1A | 1 | ARID1A | CCDC102A | 101 | 782 | 6.694568e-07 | 0.012130 | ['pancancer'] |
| ARID1A | 2 | ARID1A | ARHGEF7 | 101 | 782 | 1.747027e-06 | 0.031654 | ['pancancer'] |
| ARID1A | 3 | ARID1A | RPL22L1 | 101 | 782 | 2.011955e-06 | 0.036455 | ['pancancer'] |
| ARID1A | 5 | ARID1A | PAK2 | 101 | 782 | 4.170099e-06 | 0.075558 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| BRCA2 | 3873 | BRCA2 | IGLON5 | 71 | 782 | 4.970267e-02 | 1.000000 | ['pancancer'] |
| BRCA2 | 3880 | BRCA2 | FRMD4A | 71 | 782 | 4.981589e-02 | 1.000000 | ['pancancer'] |
| BRCA2 | 3883 | BRCA2 | DSTYK | 71 | 782 | 4.987257e-02 | 1.000000 | ['pancancer'] |
| BRCA2 | 3886 | BRCA2 | CCDC170 | 71 | 782 | 4.992931e-02 | 1.000000 | ['pancancer'] |


**Inactive/Overactive** (for SL and SDL, respectively): The gene name from the input list.  
**InactiveDB/OveractiveDB**: The corresponding gene name in the dataset used.  
**SL_Candidate**: The candidate SL partner of the gene in the same row.  
**#InactiveSamples:** The number of samples in which the gene of interest is inactive.  
**#Samples:** The total number of samples in the analysis.  
**PValue:** The p value of the one-sided unpaired Wilcoxon rank-sum test on CRISPR gene essentiality (gene effect) data.  
**FDR:** The p value corrected for multiple testing (method set by `pval_correction`).  
**Tissue:** The tissue(s) on which the analysis was performed.

<br>
3.3.2. shRNA based functional examination inference module

```python
# shRNA-based functional examination; this report filters on the uncorrected PValue
shRNA_result = FunctionalExamination(client, 'SL', "shRNA", gene_list, percentile_threshold,
                                     cn_threshold, pval_correction, fdr_level, list(tumor_type.value), input_mutations)
try:
    shRNA_intermediate_report = shRNA_result.loc[shRNA_result['PValue'] < p_threshold]
    shRNA_report = shRNA_intermediate_report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
except Exception:
    shRNA_report = pd.DataFrame()
    print("No results returned.")
shRNA_report
```

| Inactive | Index | InactiveDB | SL_Candidate | #InactiveSamples | #Samples | PValue | FDR | Tissue |
|---|---|---|---|---|---|---|---|---|
| ARID1A | 0 | ARID1A | SIRPA | 109 | 651 | 0.000006 | 0.108836 | ['pancancer'] |
| ARID1A | 1 | ARID1A | BTF3 | 110 | 651 | 0.000008 | 0.145394 | ['pancancer'] |
| ARID1A | 3 | ARID1A | KCNK1 | 109 | 649 | 0.000010 | 0.169046 | ['pancancer'] |
| ARID1A | 4 | ARID1A | VAV2 | 110 | 651 | 0.000022 | 0.380350 | ['pancancer'] |
| ARID1A | 5 | ARID1A | RPS6KC1 | 109 | 649 | 0.000031 | 0.527259 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| BRCA2 | 3227 | BRCA2 | PPP1R2P3 | 57 | 506 | 0.049892 | 1.000000 | ['pancancer'] |
| BRCA2 | 3228 | BRCA2 | SLC43A3 | 57 | 506 | 0.049892 | 1.000000 | ['pancancer'] |
| BRCA2 | 3229 | BRCA2 | SGO2 | 57 | 506 | 0.049892 | 1.000000 | ['pancancer'] |
| BRCA2 | 3230 | BRCA2 | HAND2 | 76 | 651 | 0.049903 | 1.000000 | ['pancancer'] |


Same analysis and output as the CRISPR functional examination analysis, except that shRNA gene dependency data are used.

### 4. Integration of results

4.1. Integration of the pairwise gene co-expression results on PanCancer Atlas and CCLE

The union of results from PanCancer Atlas and CCLE was used. 
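The displayed union (separate `FDR0` and `FDR1` columns, with empty cells for pairs found in only one dataset) is consistent with an outer join on the gene pair. The sketch below illustrates that idea with a hypothetical `union_results` function built on `pandas.DataFrame.merge(how="outer")`; it is not the notebook's `UnionResults`, numbering the score columns after skipping empty reports is an assumption, and the `Tissue` column is dropped for brevity.

```python
from functools import reduce

import pandas as pd


def union_results(reports, score_cols, keys=("Inactive", "SL_Candidate")):
    """Outer-join per-dataset reports on the gene pair (hypothetical sketch).

    Empty reports are skipped; the score column of the i-th remaining report is
    renamed to f"{score_col}{i}", so a pair missing from a report shows up as NaN.
    """
    keys = list(keys)
    non_empty = [(r, c) for r, c in zip(reports, score_cols) if not r.empty]
    if not non_empty:
        return pd.DataFrame(columns=keys)
    tables = [
        # reset_index(drop=True) discards a groupby index level that may also be named "Inactive"
        r.reset_index(drop=True)[keys + [c]].rename(columns={c: f"{c}{i}"})
        for i, (r, c) in enumerate(non_empty)
    ]
    return reduce(lambda left, right: left.merge(right, on=keys, how="outer"), tables)


# Toy reports (made up)
pancancer = pd.DataFrame({"Inactive": ["G1", "G1"], "SL_Candidate": ["X", "Y"], "FDR": [0.001, 0.02]})
ccle = pd.DataFrame({"Inactive": ["G1", "G2"], "SL_Candidate": ["X", "Z"], "FDR": [0.01, 0.03]})
u = union_results([pancancer, ccle], ["FDR", "FDR"])
print(u)
```

Pair (G1, X) appears in both reports and gets two scores; (G1, Y) and (G2, Z) each get a single score and an empty cell.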


```python
# Union of the PanCancer Atlas and CCLE co-expression results
try:
    coexpression_result = UnionResults([coexp_pancancer_report, coexp_CCLE_report], 'SL', ['FDR', 'FDR'], list(tumor_type.value))
    coexpression_result = coexpression_result.sort_values('Inactive')
except Exception:
    coexpression_result = pd.DataFrame()
    print("No Result From Pairwise Co-expression Inference Procedure")

coexpression_result
```

| Index | Inactive | SL_Candidate | FDR0 | FDR1 | Tissue |
|---|---|---|---|---|---|
| 0 | ARID1A | SPEN | 0.0 | 7.800289e-246 | ['pancancer'] |
| 1477 | ARID1A | MCM4 |  | 6.419231e-112 | ['pancancer'] |
| 1476 | ARID1A | ZNF346 |  | 4.861848e-112 | ['pancancer'] |
| 1475 | ARID1A | AKAP8L |  | 3.725635e-112 | ['pancancer'] |
| 1474 | ARID1A | MDM4 |  | 3.436705e-112 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... |
| 779 | BRCA2 | FAM102B | 0.0 |  | ['pancancer'] |
| 780 | BRCA2 | RPAP2 | 0.0 |  | ['pancancer'] |
| 781 | BRCA2 | STAG1 | 0.0 |  | ['pancancer'] |
| 765 | BRCA2 | C9orf41 | 0.0 |  | ['pancancer'] |


**Inactive:** The gene from the input list.  
**SL_Candidate/SDL_Candidate:** The candidate SL or SDL partner.  
**FDR0, FDR1:** The FDR of the pair in the PanCancer Atlas and CCLE results, respectively; empty when the pair was not found in that result.  
**Tissue:** The tissue(s) on which the analysis was performed.

    

<br>
4.2. Integration of the genomic survival of the fittest results on PanCancer Atlas and CCLE

To integrate the results, we take the union of the PanCancer Atlas and CCLE results.

```python
# Union of the CCLE and PanCancer Atlas survival of the fittest results
try:
    sof_result = UnionResults([sof_ccle_report, sof_pancancer_report], 'SL', ['FDR', 'FDR'], list(tumor_type.value))
    sof_result = sof_result.sort_values('Inactive')
except Exception:
    sof_result = pd.DataFrame()
    print("No Result From Survival of Fittest Inference Procedure")
sof_result
```

| Index | Inactive | SL_Candidate | FDR0 | Tissue |
|---|---|---|---|---|
| 0 | ARID1A | ENTPD7 | 5.161072e-10 | ['pancancer'] |
| 649 | ARID1A | PTPN20A | 6.916944e-04 | ['pancancer'] |
| 650 | ARID1A | PTPN20B | 6.916944e-04 | ['pancancer'] |
| 651 | ARID1A | IPMK | 7.616928e-04 | ['pancancer'] |
| 652 | ARID1A | TFAM | 7.727423e-04 | ['pancancer'] |
| ... | ... | ... | ... | ... |
| 1590 | BRCA1 | ACAN | 1.153027e-04 | ['pancancer'] |
| 1589 | BRCA1 | KIF23 | 1.148171e-04 | ['pancancer'] |
| 1588 | BRCA1 | MFGE8 | 1.141844e-04 | ['pancancer'] |
| 1586 | BRCA1 | AEN | 1.085066e-04 | ['pancancer'] |


**Inactive:** The gene from the input list.  
**SL_Candidate/SDL_Candidate:** The candidate SL or SDL partner.  
**FDR0:** The FDR of the pair. Only one FDR column appears here because the CCLE survival of the fittest analysis returned no results.  
**Tissue:** The tissue(s) on which the analysis was performed.

    

<br>
4.3. Integration of shRNA and CRISPR based functional examination inference module.

We report the union of results from shRNA and CRISPR-based datasets. 

```python
# Union of the CRISPR and shRNA functional examination results
try:
    functional_screening_result = UnionResults([crispr_report, shRNA_report], 'SL', ['PValue', 'PValue'], list(tumor_type.value))
    functional_screening_result = functional_screening_result.sort_values('Inactive')
except Exception:
    functional_screening_result = pd.DataFrame()
    print("No Result From Functional Examination Inference Procedure")
functional_screening_result
```

| Index | Inactive | SL_Candidate | PValue0 | PValue1 | Tissue |
|---|---|---|---|---|---|
| 0 | ARID1A | ARID1B | 1.114187e-09 | 0.000350 | ['pancancer'] |
| 4850 | ARID1A | GALNT16 |  | 0.029362 | ['pancancer'] |
| 4849 | ARID1A | KAT14 |  | 0.029348 | ['pancancer'] |
| 4848 | ARID1A | TTC23L |  | 0.029338 | ['pancancer'] |
| 4847 | ARID1A | TMEM259 |  | 0.029251 | ['pancancer'] |
| ... | ... | ... | ... | ... | ... |
| 3426 | BRCA2 | C17orf78 | 3.083288e-02 |  | ['pancancer'] |
| 3427 | BRCA2 | C9orf3 | 3.087125e-02 |  | ['pancancer'] |
| 3428 | BRCA2 | PFDN4 | 3.087125e-02 | 0.031764 | ['pancancer'] |
| 3420 | BRCA2 | DYNC1I2 | 3.064162e-02 | 0.032130 | ['pancancer'] |


**Inactive:** The gene from the input list.  
**SL_Candidate/SDL_Candidate:** The candidate SL or SDL partner.  
**PValue0, PValue1:** The uncorrected p value of the pair in the CRISPR and shRNA results, respectively; empty when the pair was not found in that result.  
**Tissue:** The tissue(s) on which the analysis was performed.

    

<br>
4.4. Merging the results from all three inference procedures

The final list is the intersection of the SL pairs returned by the three inference procedures.
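Taking this intersection amounts to an inner join on the (`Inactive`, `SL_Candidate`) pair across the three module results. The sketch below uses a hypothetical `intersect_results` function, not the notebook's `MergeResults`, and assumes each input has those two columns; returning an empty result when any module found nothing follows from the definition of an intersection.

```python
from functools import reduce

import pandas as pd


def intersect_results(results, keys=("Inactive", "SL_Candidate")):
    """Gene pairs present in every result table (inner join on the pair)."""
    keys = list(keys)
    if not results or any(r.empty for r in results):
        return pd.DataFrame(columns=keys)  # an empty module leaves no shared pairs
    pairs = [r.reset_index(drop=True)[keys].drop_duplicates() for r in results]
    shared = reduce(lambda left, right: left.merge(right, on=keys, how="inner"), pairs)
    return shared.reset_index(drop=True)


# Toy module outputs (made up)
coexpression = pd.DataFrame({"Inactive": ["G1", "G1", "G2"], "SL_Candidate": ["X", "Y", "Z"]})
sof = pd.DataFrame({"Inactive": ["G1", "G2", "G1"], "SL_Candidate": ["X", "Z", "W"]})
functional = pd.DataFrame({"Inactive": ["G2", "G1", "G2"], "SL_Candidate": ["Z", "X", "Q"]})
final = intersect_results([coexpression, sof, functional])
print(final)
```

Only (G1, X) and (G2, Z) are reported by all three modules, so only those two pairs survive.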


```python
# Intersection of the co-expression, survival of the fittest and functional examination results
try:
    all_merged_results = MergeResults([coexpression_result, sof_result, functional_screening_result], 'SL', list(tumor_type.value))
    all_merged_results = all_merged_results.sort_values('Inactive')
except Exception:
    all_merged_results = pd.DataFrame()
    print("No results found")
all_merged_results
```

| Index | Inactive | SL_Candidate |
|---|---|---|
| 0 | ARID1A | SUZ12 |
| 1 | ARID1A | LCOR |
| 2 | ARID1A | PHF12 |
| 3 | BRCA1 | HNRNPM |
| 4 | BRCA1 | ANP32A |


**Inactive/Overactive**: The gene from the user input.  
**SL_Candidate/SDL_Candidate**: The inferred SL/SDL partner of the gene in the same row.

The results can also be saved to an Excel file.

```python
# Save the final list of candidate SL pairs to an Excel file
WriteToExcel("DAISY_SL_results.xlsx", [all_merged_results], ["final results"])
```