# DAISY: the DAta-mIning SYnthetic-lethality-identification pipeline

Please cite: 
For the implementation: 

Our paper,

For the DAISY algorithm: 

Jerby-Arnon, L., Pfetzer, N., Waldman, Y. Y., McGarry, L., James, D., Shanks, E., ... & Gottlieb, E. (2014). Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell, 158(5), 1199-1209.

For CCLE Omics data:

Ghandi, M., Huang, F. W., Jané-Valbuena, J., et al. (2019). Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature, 569, 503–508. https://doi.org/10.1038/s41586-019-1186-3

For CRISPR Data: 

Meyers, R. M., Bryan, J. G., McFarland, J. M., Weir, B. A., ... Root, D. E., Hahn, W. C., & Tsherniak, A. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics, 49, 1779–1784. https://doi.org/10.1038/ng.3984

Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. BioRxiv, 720243.

For shRNA Data:

McFarland, J. M., Ho, Z. V., Kugener, G., Dempster, J. M., Montgomery, P. G., Bryan, J. G., Krill-Burger, J. M., Green, T. M., Vazquez, F., Boehm, J. S., Golub, T. R., Hahn, W. C., Root, D. E., & Tsherniak, A. (2018). Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nature Communications, 9(1). https://doi.org/10.1038/s41467-018-06916-5

For ISB-CGC:

Reynolds, S. M., Miller, M., Lee, P., Leinonen, K., Paquette, S. M., Rodebaugh, Z., ... & Shmulevich, I. (2017). The ISB Cancer Genomics Cloud: a flexible cloud-based platform for cancer genomics research. Cancer Research, 77(21), e7-e10.

For PanCancer Atlas data:

Hutter, C., and Zenklusen, J.C. (2018). The Cancer Genome Atlas: Creating Lasting Value beyond Its Data. Cell 173, 283–285.

This notebook is a reimplementation of the DAISY synthetic lethal (SL) pair prediction algorithm.

It consists of three modules: 

1. SL candidate determination using gene coexpression
2. SL candidate determination using genomic survival of the fittest
3. SL candidate determination using CRISPR and shRNA experiments (functional examination)

The results from the three modules are then aggregated into a single ranked list of candidate SL pairs.

Input parameters
* Cancer type 
* The genes whose SL partners are sought

Input data (available in BigQuery tables)
* Gene expression data 
* Gene mutation data
* Copy number variation data
* Gene effect data (CRISPR)
* Gene dependency scores data (shRNA)

Output
* Ranked list of candidate SL pairs

Please contact Bahar Tercan (btercan@systemsbiology.org) with questions or for more detailed information. 

In [None]:
# Clear all variables from the namespace without a confirmation prompt
%reset -f

In [None]:
pwd

### 1. Import the required Python libraries

In [None]:
import sys
sys.path.append('../scripts')  # make the .py modules in the ../scripts folder importable
import importlib
import pandas as pd
import ipywidgets as widgets
from google.cloud import bigquery

import DAISY_operations
importlib.reload(DAISY_operations)  # pick up any edits made to DAISY_operations.py
from DAISY_operations import *

In [None]:
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

### 2. Sign in to Google BigQuery with your project ID

To connect to BigQuery, replace 'syntheticlethality' with your own Google Cloud project ID.

In [None]:
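# Make sure the Google Cloud SDK is installed.
# For more detailed information: https://cloud.google.com/sdk/docs/install
# Authenticate before creating the client: the BigQuery Python client reads
# Application Default Credentials, which this command sets up.
!gcloud auth application-default login

# Replace 'syntheticlethality' with your own project ID
project_id = 'syntheticlethality'
client = bigquery.Client(project=project_id)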

### 4. Prediction of synthetic lethal partners using the DAISY modules


DAISY infers synthetic lethal pairs with three modules: (1) pairwise gene coexpression, (2) genomic survival of the fittest, and (3) shRNA- or CRISPR-based functional examination. More information is available in the original paper: https://www.sciencedirect.com/science/article/pii/S0092867414009775.

The pairwise gene coexpression and genomic survival of the fittest modules use PanCancer Atlas and CCLE data.<br>
The functional examination module uses CRISPR and shRNA data. <br>

The required Python code is in the ../scripts/ folder and is imported at the beginning of the notebook. 


#### 4.0. Default DAISY parameters (you can edit them)

In [None]:
# Mutation classes treated as inactivating (truncating) mutations
input_mutations = ['Nonsense_Mutation', 'Frame_Shift_Ins', 'Frame_Shift_Del']
# DAISY default parameters for SL prediction
percentile_threshold = 10       # expression percentile cutoff
cn_threshold = -0.3             # copy-number cutoff
cor_threshold = 0.5             # minimum coexpression correlation
p_threshold = 0.05              # significance cutoff
pval_correction = 'Bonferroni'  # multiple-testing correction method

# For synthetic dosage lethality (SDL) prediction, the DAISY parameters are:
#percentile_threshold = 90
#cn_threshold = 0.3
#cor_threshold = 0.5
#p_threshold = 0.05

# For SDL prediction, replace 'SL' with 'SDL' and 'Inactive' with 'Overactive' in the code cells below
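The thresholds above decide when a gene counts as inactive in a sample. The exact rule is implemented in DAISY_operations and is not shown in this notebook. The sketch below assumes a gene is called inactive when its expression is at or below the percentile_threshold-th percentile across samples, its copy number is below cn_threshold, or it carries one of the input_mutations, with the three criteria combined by OR. The functions `percentile_cutoff` and `inactive_samples` and their dictionary inputs are hypothetical.

```python
from statistics import quantiles

# Assumption: mutation classes counted as inactivating, mirroring input_mutations above
TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Ins", "Frame_Shift_Del"}


def percentile_cutoff(values, pct):
    """Return the pct-th percentile (1..99) of values, interpolating linearly."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def inactive_samples(expr, cna, mutations, percentile_threshold=10,
                     cn_threshold=-0.3, inactivating=TRUNCATING):
    """Hypothetical rule: return the samples in which one gene is called inactive.

    expr:      {sample: expression value of the gene}
    cna:       {sample: copy-number value of the gene}
    mutations: {sample: set of mutation classes observed in the gene}
    A sample counts if ANY criterion holds (the OR is an assumption).
    """
    cutoff = percentile_cutoff(list(expr.values()), percentile_threshold)
    inactive = set()
    for sample, value in expr.items():
        low_expression = value <= cutoff
        deleted = cna.get(sample, 0.0) < cn_threshold
        truncated = bool(mutations.get(sample, set()) & inactivating)
        if low_expression or deleted or truncated:
            inactive.add(sample)
    return inactive


# Toy example: ten samples with expression values 1..10
expr = {f"s{i}": float(i) for i in range(1, 11)}
cna = {"s5": -0.5}
mutations = {"s7": {"Nonsense_Mutation"}, "s8": {"Missense_Mutation"}}
print(sorted(inactive_samples(expr, cna, mutations)))  # ['s1', 's5', 's7']
```

Here s1 falls below the 10th-percentile cutoff (1.9), s5 is below the copy-number cutoff, and s7 carries a nonsense mutation; the missense mutation in s8 is not counted.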

In [None]:
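# Tumor types returned by GetTCGASubtypes, with empty entries removed
TCGA_list = GetTCGASubtypes(client)
TCGA_list = [i for i in TCGA_list if i]

# Select one or more tumor types (Shift/Ctrl-click to select several)
tumor_type = widgets.SelectMultiple(
    options=['pancancer'] + TCGA_list,
    value=[],
    description='Tumor type',
    disabled=False
)
display(tumor_type)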

In [None]:
gene_list = ["BRCA1", "BRCA2", "ARID1A"]  # genes whose SL partners are sought; any number of genes

#### 4.1. Pairwise gene coexpression module

4.1.1. Pairwise gene coexpression module on PanCancer Atlas data

In [None]:
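coexp_pancancer = CoexpressionAnalysis(client, 'SL', "PanCancerAtlas", gene_list, pval_correction, list(tumor_type.value))
try:
    # Keep significant (FDR < p_threshold), strongly coexpressed (Correlation > cor_threshold) pairs
    report = coexp_pancancer.loc[(coexp_pancancer['FDR'] < p_threshold) & (coexp_pancancer['Correlation'] > cor_threshold)]
    if report.shape[0] > 0:
        # Sort the candidate partners of each inactive gene by FDR
        coexp_pancancer_report = report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
    else:
        print("No pairs passed the thresholds.")
except Exception as e:  # unlike a bare except, this does not swallow KeyboardInterrupt
    print("No results returned:", e)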

<br>
4.1.2. Pairwise gene coexpression module on CCLE data

In [None]:
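coexp_CCLE = CoexpressionAnalysis(client, 'SL', 'CCLE', gene_list, pval_correction, list(tumor_type.value))
try:
    # Keep significant (FDR < p_threshold), strongly coexpressed (Correlation > cor_threshold) pairs
    report = coexp_CCLE.loc[(coexp_CCLE['FDR'] < p_threshold) & (coexp_CCLE['Correlation'] > cor_threshold)]
    if report.shape[0] > 0:
        # Sort the candidate partners of each inactive gene by FDR
        coexp_CCLE_report = report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
    else:
        print("No pairs passed the thresholds.")
except Exception as e:  # unlike a bare except, this does not swallow KeyboardInterrupt
    print("No results returned:", e)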
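Both coexpression cells keep pairs whose corrected p-value is below p_threshold and whose correlation exceeds cor_threshold. As an illustration only, the sketch below computes a Spearman rank correlation and applies a Bonferroni correction before filtering. Whether CoexpressionAnalysis uses Spearman or another correlation measure is an assumption here, and the helper names, partner names, and p-values are hypothetical toy values.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def filter_pairs(results, p_threshold=0.05, cor_threshold=0.5):
    """results: list of (inactive_gene, partner, correlation, raw_p)."""
    adjusted = bonferroni([raw_p for *_, raw_p in results])
    return [(gene, partner, rho, q)
            for (gene, partner, rho, _), q in zip(results, adjusted)
            if q < p_threshold and rho > cor_threshold]


# Toy values: partner names and raw p-values are made up
toy = [("BRCA1", "GENE_A", 0.7, 0.001),
       ("BRCA1", "GENE_B", 0.3, 0.001),
       ("BRCA1", "GENE_C", 0.8, 0.04)]
print([partner for _, partner, _, _ in filter_pairs(toy)])  # ['GENE_A']
```

GENE_B fails the correlation cutoff, and GENE_C fails once its p-value is multiplied by the three tests performed.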

#### 4.2. Genomic survival of the fittest module

4.2.1. Genomic survival of the fittest module on CCLE data

In [None]:
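sof_CCLE = SurvivalOfFittest(client, 'SL', "CCLE", gene_list, percentile_threshold, cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try:
    # Keep significant pairs (FDR < p_threshold)
    report = sof_CCLE.loc[sof_CCLE['FDR'] < p_threshold]
    if report.shape[0] > 0:
        sof_ccle_report = report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
    else:
        print("No pairs passed the threshold.")
except Exception as e:
    print("No results returned:", e)

<br>
4.2.2. Genomic survival of the fittest module on PanCancer Atlas data
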
In [None]:
sof_pancancer = SurvivalOfFittest(client, 'SL', "PanCancerAtlas", gene_list, percentile_threshold, cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try:
    # Keep significant pairs (FDR < p_threshold)
    report = sof_pancancer.loc[sof_pancancer['FDR'] < p_threshold]
    if report.shape[0] > 0:
        sof_pancancer_report = report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
    else:
        print("No pairs passed the threshold.")
except Exception as e:
    print("No results returned:", e)
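The survival of the fittest module rests on the idea that tumor samples in which both genes of an SL pair are inactive should be rare, because cells that lose both genes do not survive. One way to test for this depletion is the lower tail of a hypergeometric distribution on the number of co-inactive samples. This is a sketch of that idea, not necessarily the test SurvivalOfFittest implements; the function name and its set-based inputs are hypothetical.

```python
from math import comb


def coinactivation_depletion_p(a_inactive, b_inactive, samples):
    """One-sided hypergeometric p-value that genes A and B are co-inactive
    in fewer samples than expected by chance: P(X <= observed overlap).

    a_inactive, b_inactive: sets of sample IDs where each gene is inactive
    samples: set of all sample IDs considered
    """
    a = a_inactive & samples
    b = b_inactive & samples
    n_total = len(samples)
    k_observed = len(a & b)
    # Ways to place i of B's inactive samples among A's inactive samples
    # and the remaining len(b) - i among the other samples
    favourable = sum(comb(len(a), i) * comb(n_total - len(a), len(b) - i)
                     for i in range(k_observed + 1))
    return favourable / comb(n_total, len(b))


samples = set(range(20))
a_inactive = set(range(10))      # gene A inactive in samples 0..9
b_inactive = set(range(10, 20))  # gene B inactive only where A is active
print(coinactivation_depletion_p(a_inactive, b_inactive, samples))  # 1/184756, about 5.4e-06
```

A small p-value means co-inactivation is rarer than chance, which is the pattern expected for an SL pair; a p-value near 1 means the two genes are often inactive together.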

#### 4.3. Functional examination inference module

4.3.1. CRISPR based functional examination inference module

In [None]:
crispr_result = FunctionalExamination(client, 'SL', "CRISPR", gene_list, percentile_threshold, cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try:
    # Keep pairs with PValue < p_threshold
    report = crispr_result.loc[crispr_result['PValue'] < p_threshold]
    if report.shape[0] > 0:
        crispr_report = report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
    else:
        print("No pairs passed the threshold.")
except Exception as e:
    print("No results returned:", e)


<br>
4.3.2. shRNA based functional examination inference module

In [None]:
shRNA_result = FunctionalExamination(client, 'SL', "shRNA", gene_list, percentile_threshold,
                                     cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try:
    # Keep pairs with PValue < p_threshold
    report = shRNA_result.loc[shRNA_result['PValue'] < p_threshold]
    if report.shape[0] > 0:
        shRNA_report = report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
    else:
        print("No pairs passed the threshold.")
except Exception as e:
    print("No results returned:", e)
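The functional examination module asks whether knocking out (CRISPR) or knocking down (shRNA) a candidate partner is more harmful in cell lines where the query gene is inactive. A minimal sketch, assuming a one-sided Mann-Whitney U (Wilcoxon rank-sum) test with a normal approximation, and assuming that lower gene effect or dependency scores mean a stronger dependency. FunctionalExamination may use a different test; `mann_whitney_less`, `functional_p`, and the toy scores are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test of H1: values in x tend to be lower than in y.

    Uses the normal approximation with a continuity correction and no tie
    correction, so it is only a rough guide for small samples.
    """
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value (ties count 1/2)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u + 0.5) / sd_u
    # Lower-tail standard normal probability P(Z <= z)
    return 0.5 * erfc(-z / sqrt(2))


def functional_p(scores, inactive_lines):
    """scores: {cell_line: dependency score of partner gene B}
    inactive_lines: cell lines in which query gene A is inactive."""
    with_a_inactive = [s for line, s in scores.items() if line in inactive_lines]
    others = [s for line, s in scores.items() if line not in inactive_lines]
    return mann_whitney_less(with_a_inactive, others)


# Toy scores: gene B looks more essential (more negative) where A is inactive
scores = {f"c{i}": -1.0 - 0.1 * i for i in range(5)}
scores.update({f"d{i}": 0.1 * i for i in range(5)})
inactive = {f"c{i}" for i in range(5)}
print(functional_p(scores, inactive))  # well below 0.01
```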


### 5. Integration of results

5.1. Integration of the pairwise gene coexpression results from PanCancer Atlas and CCLE

In [None]:
coexpression_result = UnionResults([coexp_pancancer_report, coexp_CCLE_report],'SL', ['FDR', 'FDR'],  list(tumor_type.value))
coexpression_result=coexpression_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))

<br>
5.2. Integration of the genomic survival of the fittest results from PanCancer Atlas and CCLE

In [None]:
sof_result = UnionResults([sof_ccle_report, sof_pancancer_report],  'SL', ['FDR', 'FDR'], list(tumor_type.value))
sof_result=sof_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))

<br>
5.3. Integration of the CRISPR- and shRNA-based functional examination results

In [None]:
functional_screening_result = UnionResults([crispr_report, shRNA_report],'SL', ['PValue', 'PValue'], list(tumor_type.value))
functional_screening_result=functional_screening_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))
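UnionResults receives one report per data source together with the name of the p-value column in each. Its internals live in DAISY_operations; going by its name, a plausible reading is that it takes the union of candidate pairs across sources. The hypothetical sketch below shows that idea with plain dictionaries: every pair found in any source is kept, along with all of its p-values.

```python
from collections import defaultdict


def union_results(reports):
    """Hypothetical union of candidate pairs across data sources.

    reports: list of dicts, one per source, mapping
             (inactive_gene, partner_gene) -> p-value
    Returns {pair: [p-values from every source that reported the pair]}.
    """
    merged = defaultdict(list)
    for report in reports:
        for pair, p in report.items():
            merged[pair].append(p)
    return dict(merged)


# Toy inputs; GENE_X and GENE_Y are placeholder partner names
pancancer = {("BRCA1", "GENE_X"): 0.01, ("BRCA1", "GENE_Y"): 0.02}
ccle = {("BRCA1", "GENE_X"): 0.03}
print(union_results([pancancer, ccle]))
# {('BRCA1', 'GENE_X'): [0.01, 0.03], ('BRCA1', 'GENE_Y'): [0.02]}
```

An intersection would instead keep only pairs reported by every source; the union keeps pairs supported by a single source and leaves their ranking to the p-value aggregation step.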


<br>
5.4. Merging the results from all three inference procedures

In [None]:
all_merged_results = MergeResults([coexpression_result, sof_result, functional_screening_result], 'SL', list(tumor_type.value))
all_merged_results=all_merged_results.groupby('Inactive').apply(lambda x: x.sort_values('FinalP'))
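MergeResults reports a FinalP for each pair, but how it combines p-values from the three modules is not shown in this notebook. Fisher's method is one standard way to combine independent p-values; the sketch below implements it with the standard library only. Treating it as DAISY's aggregation rule would be an assumption, and `fisher_combine` and the candidate values are hypothetical.

```python
from math import exp, log


def fisher_combine(pvalues):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return exp(-half_x) * total


# Rank toy candidate partners by their combined p-value (values are made up)
candidates = {"GENE_A": [0.01, 0.04, 0.2],
              "GENE_B": [0.3, 0.5, 0.02],
              "GENE_C": [0.001, 0.01, 0.05]}
ranked = sorted(candidates, key=lambda gene: fisher_combine(candidates[gene]))
print(ranked)  # ['GENE_C', 'GENE_A', 'GENE_B']
```

Fisher's method rewards consistent evidence across modules, but a single very small p-value can also dominate the combined result; that trade-off is worth keeping in mind when reading a FinalP ranking.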

The results are saved to an Excel file.

In [None]:
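# to_csv would write plain CSV despite the .xlsx extension; to_excel writes a real
# Excel file but needs an engine such as openpyxl to be installed
all_merged_results.to_excel("DAISY_SL_results.xlsx")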