# DAISY: the DAta-mIning SYnthetic-lethality-identification pipeline

Please cite: 
For Implementation: 

Our paper,

For DAISY algorithm: 

Jerby-Arnon, L., Pfetzer, N., Waldman, Y. Y., McGarry, L., James, D., Shanks, E., ... & Gottlieb, E. (2014). Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell, 158(5), 1199-1209.

For CCLE Omics data:

Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3

For CRISPR Data: 

Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984

Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. BioRxiv, 720243.

For shRNA Data:


This notebook is a reimplementation of the DAISY synthetic lethal (SL) pair prediction algorithm.

It consists of three modules:

1. SL candidate determination using gene co-expression
2. SL candidate determination using genomic survival of the fittest
3. SL candidate determination using CRISPR and shRNA experiments

The results from the three modules are then aggregated into one ranked list of candidate SL pairs.

Input Parameters
* Cancer type 
* The genes whose SL partners are sought

Input Data (available in BigQuery tables)
* Gene expression data 
* Gene mutation data
* Copy number variation data
* Gene effect data (CRISPR)
* Gene Dependency scores data (shRNA)

Output
* Ranked list of candidate SL pairs

Please contact Bahar Tercan (btercan@systemsbiology.org) with questions or for more detailed information.

In [79]:
reset 

Once deleted, variables cannot be recovered. Proceed (y/[n])? y



### 1. Import the required Python libraries
The cell below imports the libraries used throughout the notebook. The project modules are loaded from the `../scripts/` directory.

In [81]:
from datetime import datetime
import sys
sys.path.append('../scripts/')  # make the project modules in ../scripts/ importable
from google.cloud import bigquery
import importlib
import pandas as pd
import DAISY_operations
importlib.reload(DAISY_operations)
from DAISY_operations import *
from helper_functions import *
from BIGQUERY_operations import *
import ipywidgets as widgets

In [82]:
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

### 2. Sign in to Google BigQuery with the project ID

Connect to BigQuery.
Please replace `syntheticlethality` with your own project ID.

In [83]:
project_id='syntheticlethality'
client = bigquery.Client(project_id)
#client = bigquery.Client(credentials=credentials, project=credentials.project_id)

!gcloud auth login


### 4. Prediction of synthetic lethal partners using the DAISY modules


DAISY infers synthetic lethal pairs with three modules: (1) pairwise gene coexpression, (2) genomic survival of the fittest, and (3) shRNA- or CRISPR-based functional examination. More information is available in the original paper: https://www.sciencedirect.com/science/article/pii/S0092867414009775.

The pairwise gene coexpression and genomic survival of the fittest modules use PanCancerAtlas and CCLE data.<br>
The functional examination module uses CRISPR and shRNA data.<br>

The Python code for each module is in our internal library (../scripts/SL_library.py), which was already imported at the beginning.


#### 4.0. Default parameters for DAISY (edit as needed)

In [84]:
input_mutations = ['Nonsense_Mutation', 'Frame_Shift_Ins', 'Frame_Shift_Del'] # DAISY default parameters
percentile_threshold = 10
cn_threshold = -0.3 
cor_threshold = 0.4
p_threshold = 0.05
pval_correction = 'Bonferroni'
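
How these defaults are applied is not visible in this notebook, because the logic lives inside the library functions. As a rough sketch, assume a rule along the lines of the original DAISY paper: a gene counts as inactive in a sample when it carries one of the listed mutation types, or when its copy-number value is below `cn_threshold` and its expression is below the `percentile_threshold`-th percentile across samples. The function names `expression_cutoff` and `is_inactive`, the sample fields, and the toy values are all hypothetical.

```python
from statistics import quantiles

# Defaults copied from the cell above.
input_mutations = ['Nonsense_Mutation', 'Frame_Shift_Ins', 'Frame_Shift_Del']
percentile_threshold = 10
cn_threshold = -0.3


def expression_cutoff(expression_values, percentile):
    """Return the expression value at the given percentile (1 to 99)."""
    cut_points = quantiles(expression_values, n=100, method='inclusive')
    return cut_points[percentile - 1]


def is_inactive(sample, cutoff):
    """Assumed rule: deleterious mutation, or low copy number plus low expression."""
    has_deleterious_mutation = sample['mutation'] in input_mutations
    is_deleted = sample['copy_number'] < cn_threshold
    is_underexpressed = sample['expression'] < cutoff
    return has_deleterious_mutation or (is_deleted and is_underexpressed)


# Toy measurements of one gene across four samples (made-up values).
samples = [
    {'id': 'S1', 'mutation': 'Nonsense_Mutation', 'copy_number': 0.1, 'expression': 8.0},
    {'id': 'S2', 'mutation': None, 'copy_number': -0.8, 'expression': 0.5},
    {'id': 'S3', 'mutation': 'Missense_Mutation', 'copy_number': -0.5, 'expression': 7.5},
    {'id': 'S4', 'mutation': None, 'copy_number': 0.0, 'expression': 9.0},
]

cutoff = expression_cutoff([s['expression'] for s in samples], percentile_threshold)
inactive_ids = [s['id'] for s in samples if is_inactive(s, cutoff)]
print(inactive_ids)
```

In this toy data, S1 qualifies through its mutation type and S2 through the combination of low copy number and low expression; S3 has a low copy number but is not underexpressed.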

In [85]:

TCGA_list=GetTCGASubtypes(client)
TCGA_list = [i for i in TCGA_list if i]

tumor_type = widgets.SelectMultiple(
    options=['pancancer'] + TCGA_list  ,
    value=[],
    description='Tumor type',
    disabled=False
)
display(tumor_type)

SelectMultiple(description='Tumor type', options=('pancancer', 'CHOL', 'BLCA', 'GBM', 'BRCA', 'CESC', 'COAD', …

#### 4.1. Pairwise gene coexpression module

4.1.1. Pairwise gene coexpression module on PancancerAtlas.

In [86]:
coexp_pancancer = CoexpressionAnalysis(client, 'SL', "PanCancerAtlas", ['BRCA1'] , pval_correction, list(tumor_type.value))
try:
    report=coexp_pancancer.loc[(coexp_pancancer['FDR'] < p_threshold)&(coexp_pancancer['Correlation'] > cor_threshold)]
    if report.shape[0]>0:
        coexp_pancancer_report=report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except:
    print("No results returned.")
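
The internals of `CoexpressionAnalysis` are not shown here. For intuition, the sketch below computes a Spearman rank correlation between two expression vectors using only the standard library, then applies the same `cor_threshold` cut as the cell above. The choice of Spearman correlation is an assumption based on the coexpression idea in the DAISY paper, and the function names and toy expression values are hypothetical.

```python
from statistics import mean

cor_threshold = 0.4  # same default as in section 4.0


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values across five samples.
gene_a = [2.1, 3.5, 1.0, 4.2, 5.8]
gene_b = [1.9, 3.0, 1.2, 4.9, 5.1]  # rises together with gene_a
gene_c = [5.0, 2.0, 4.5, 3.0, 1.0]  # mostly falls as gene_a rises

rho_ab = spearman(gene_a, gene_b)
rho_ac = spearman(gene_a, gene_c)
print(rho_ab > cor_threshold, rho_ac > cor_threshold)
```

Only the positively correlated pair passes the `Correlation > cor_threshold` filter, which mirrors how the report above keeps positively coexpressed candidates.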
    

In [87]:
coexp_pancancer

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
0,BRCA1,BRCA1,ATAD5,9953,0.827015,0.0,0.0,['pancancer']
1,BRCA1,BRCA1,TOP2A,9953,0.819636,0.0,0.0,['pancancer']
2,BRCA1,BRCA1,CDC6,9953,0.807944,0.0,0.0,['pancancer']
3,BRCA1,BRCA1,DTL,9953,0.802882,0.0,0.0,['pancancer']
4,BRCA1,BRCA1,BRIP1,9953,0.802565,0.0,0.0,['pancancer']
...,...,...,...,...,...,...,...,...
20277,BRCA1,BRCA1,UBXN6,9953,-0.548051,0.0,0.0,['pancancer']
20278,BRCA1,BRCA1,TENC1,9953,-0.559599,0.0,0.0,['pancancer']
20279,BRCA1,BRCA1,MXD4,9953,-0.562911,0.0,0.0,['pancancer']
20280,BRCA1,BRCA1,CRY2,9953,-0.571943,0.0,0.0,['pancancer']


In [88]:
coexp_pancancer.loc[coexp_pancancer['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
587,BRCA1,BRCA1,PARP1,9953,0.452299,0.0,0.0,['pancancer']


In [89]:
coexp_pancancer.loc[coexp_pancancer['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
2092,BRCA1,BRCA1,PARP2,9953,0.266241,4.0676009999999996e-161,8.249908e-157,['pancancer']
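
With `pval_correction = 'Bonferroni'`, the values in the `FDR` column appear to be Bonferroni-adjusted p-values: in the PARP2 row above, FDR divided by PValue is roughly 20,282, which is on the order of the number of candidate genes tested. A minimal sketch of that adjustment, assuming the standard definition (multiply each p-value by the number of tests and cap the result at 1). The function name `bonferroni_adjust` and the test count passed as `n_tests` are assumptions, not taken from the library.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    if n_tests is None:
        n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Toy example with three tests.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])

# PARP2 p-value from the table above. n_tests=20282 is inferred from the
# FDR/PValue ratio in that row; it is not read from the library.
parp2_adjusted = bonferroni_adjust([4.0676009999999996e-161], n_tests=20282)[0]
print(adjusted, parp2_adjusted)
```

Because the adjustment is a simple multiplication, rows whose raw p-value underflows to 0.0 keep an adjusted value of 0.0, as seen in the top rows of the table.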



<br>
4.1.2. Pairwise gene coexpression module on CCLE data

In [91]:
coexp_CCLE=CoexpressionAnalysis(client, 'SL', 'CCLE', ['BRCA1'], pval_correction, list(tumor_type.value))
try: 
    report=coexp_CCLE.loc[(coexp_CCLE['FDR'] < p_threshold)&(coexp_CCLE['Correlation'] > cor_threshold)]
    if report.shape[0]>0:
        coexp_CCLE_report=report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except:
    print("No results returned.")

In [92]:
coexp_CCLE.loc[coexp_CCLE['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
135,BRCA1,BRCA1,PARP1,1297,0.591197,4.59964e-123,8.794052000000001e-119,['pancancer']


In [93]:
coexp_CCLE.loc[coexp_CCLE['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
1123,BRCA1,BRCA1,PARP2,1297,0.434211,9.154777e-61,1.750302e-56,['pancancer']


In [94]:
coexp_CCLE

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#Samples,Correlation,PValue,FDR,Tissue
0,BRCA1,BRCA1,ATAD5,1297,0.746083,6.359384e-231,1.215851e-226,['pancancer']
1,BRCA1,BRCA1,FANCI,1297,0.718720,1.404211e-206,2.684711e-202,['pancancer']
2,BRCA1,BRCA1,C17orf53,1297,0.711617,9.940688e-201,1.900560e-196,['pancancer']
3,BRCA1,BRCA1,CHAF1A,1297,0.704579,4.185098e-195,8.001489e-191,['pancancer']
4,BRCA1,BRCA1,TOPBP1,1297,0.702716,1.209211e-193,2.311890e-189,['pancancer']
...,...,...,...,...,...,...,...,...
19114,BRCA1,BRCA1,LAMB2,1297,-0.319869,3.065237e-32,5.860427e-28,['pancancer']
19115,BRCA1,BRCA1,IL1R1,1297,-0.323175,6.536468e-33,1.249707e-28,['pancancer']
19116,BRCA1,BRCA1,KDELR3,1297,-0.325537,2.141650e-33,4.094620e-29,['pancancer']
19117,BRCA1,BRCA1,S100A11,1297,-0.361802,2.160366e-41,4.130403e-37,['pancancer']


#### 4.2. Genomic survival of the fittest module

4.2.1. Genomic survival of the fittest module on CCLE data

In [95]:
sof_CCLE = SurvivalOfFittest(client, 'SL', "CCLE", ['BRCA1'],  percentile_threshold, cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try: 
    report=sof_CCLE.loc[(sof_CCLE['PValue'] < p_threshold),]
    sof_ccle_report=report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
except:
    print("No results returned.")
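
The `U1` and `PValue` columns in the outputs of this module suggest a rank-based two-sample test, such as the Wilcoxon rank-sum (Mann-Whitney U) test, comparing samples in which the input gene is inactive against all other samples. The sketch below illustrates that kind of test with the standard library only. It assumes the tested direction is "higher copy number of the candidate gene when the input gene is inactive" and uses a plain normal approximation, so it is not a substitute for `SurvivalOfFittest`. All names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(group_x, group_y):
    """U statistic for group_x: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for x in group_x:
        for y in group_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def one_sided_p_value(u, n_x, n_y):
    """Normal approximation of P(U >= u), without tie or continuity correction."""
    mean_u = n_x * n_y / 2
    sd_u = sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    return 1.0 - NormalDist().cdf((u - mean_u) / sd_u)


# Made-up copy-number values of a candidate partner gene.
cn_when_inactive = [0.4, 0.6, 0.5]  # samples where the input gene is inactive
cn_otherwise = [0.1, -0.2]          # all other samples

u1 = mann_whitney_u(cn_when_inactive, cn_otherwise)
p = one_sided_p_value(u1, len(cn_when_inactive), len(cn_otherwise))
print(u1, p)
```

The normal approximation is unreliable for groups this small; the toy sizes only keep the arithmetic easy to follow. With the dozens of inactive samples reported in the tables, the approximation is more reasonable.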

In [96]:
sof_CCLE.loc[sof_CCLE['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
1400,BRCA1,BRCA1,PARP1,38,1284,29766.0,0.00341,1.0,['pancancer']


In [97]:
sof_CCLE.loc[sof_CCLE['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
5332,BRCA1,BRCA1,PARP2,38,1284,27416.0,0.048271,1.0,['pancancer']

<br>
4.2.2. Genomic survival of the fittest module on PanCancerAtlas data


In [98]:
sof_pancancer = SurvivalOfFittest(client, 'SL', "PanCancerAtlas", ['BRCA1'], percentile_threshold, cn_threshold, pval_correction, list(tumor_type.value), input_mutations)
try:
    report=sof_pancancer.loc[(sof_pancancer['FDR'] < p_threshold),]                
    if report.shape[0]>0:
        sof_pancancer_report=report.groupby('Inactive').apply(lambda x: x.sort_values('FDR'))
        sof_pancancer_report  
except:
    print("No results returned.") 

In [99]:
sof_pancancer

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
0,BRCA1,BRCA1,NWD1,234,8930,1305850.5,6.250556e-14,1.570640e-09,['pancancer']
1,BRCA1,BRCA1,KIAA1683,234,8930,1305687.5,6.450396e-14,1.620855e-09,['pancancer']
2,BRCA1,BRCA1,MIR3188,234,8930,1305525.5,6.655787e-14,1.672466e-09,['pancancer']
3,BRCA1,BRCA1,JUND,234,8930,1305525.5,6.655787e-14,1.672466e-09,['pancancer']
4,BRCA1,BRCA1,F2RL3,234,8930,1305301.0,6.955547e-14,1.747790e-09,['pancancer']
...,...,...,...,...,...,...,...,...,...
25123,BRCA1,BRCA1,SNORD10,234,8930,525158.0,1.000000e+00,1.000000e+00,['pancancer']
25124,BRCA1,BRCA1,TBX21,234,8930,474835.0,1.000000e+00,1.000000e+00,['pancancer']
25125,BRCA1,BRCA1,YWHAEP7,234,8930,367733.0,1.000000e+00,1.000000e+00,['pancancer']
25126,BRCA1,BRCA1,C1QBP,234,8930,548335.0,1.000000e+00,1.000000e+00,['pancancer']


In [101]:
sof_pancancer.loc[sof_pancancer['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
749,BRCA1,BRCA1,PARP2,234,8930,1234410.5,1.233575e-08,0.00031,['pancancer']


In [103]:
sof_pancancer.loc[sof_pancancer['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
18913,BRCA1,BRCA1,PARP1,234,8930,831956.0,0.999999,1.0,['pancancer']


#### 4.3. Functional examination inference module

4.3.1. CRISPR based functional examination inference module

In [104]:
crispr_result = FunctionalExamination(client,'SL', "CRISPR", ['BRCA1'], percentile_threshold, cn_threshold, pval_correction,list(tumor_type.value), input_mutations )
try:
    report=crispr_result.loc[(crispr_result['PValue'] < p_threshold),]
                      
    if report.shape[0]>0:
        crispr_report=report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
        crispr_report
except:
    print("No results returned.")
    


In [105]:
crispr_result.loc[crispr_result['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
2,BRCA1,BRCA1,PARP1,26,771,5418.0,6.6e-05,1.0,['pancancer']


In [106]:
crispr_result.loc[crispr_result['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
5430,BRCA1,BRCA1,PARP2,26,771,9080.0,0.293921,1.0,['pancancer']


<br>
4.3.2. shRNA based functional examination inference module

In [107]:
shRNA_result = FunctionalExamination(client, 'SL', "shRNA", ['BRCA1'] , percentile_threshold, \
                                     cn_threshold, pval_correction, list(tumor_type.value),input_mutations)
try:
    report=shRNA_result.loc[(shRNA_result['PValue'] < p_threshold),]
                      
    if report.shape[0]>0:
        shRNA_report=report.groupby('Inactive').apply(lambda x: x.sort_values('PValue'))
    shRNA_report
except:
    print("No results returned.")


In [108]:
shRNA_result.loc[shRNA_result['SL_Candidate']=="PARP1",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
7573,BRCA1,BRCA1,PARP1,27,651,8429.0,0.502085,1.0,['pancancer']


In [109]:
shRNA_result.loc[shRNA_result['SL_Candidate']=="PARP2",]

Unnamed: 0,Inactive,InactiveDB,SL_Candidate,#InactiveSamples,#Samples,U1,PValue,FDR,Tissue
5317,BRCA1,BRCA1,PARP2,15,394,2687.0,0.359623,1.0,['pancancer']


### 5. Integration of results

5.1. Integration of the pairwise gene coexpression results on PanCancerAtlas and CCLE

In [None]:
coexpression_result = UnionResults([coexp_pancancer_report, coexp_CCLE_report],'SL', ['FDR', 'FDR'],  list(tumor_type.value))
coexpression_result=coexpression_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))
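
How `UnionResults` derives the `AggregatedP` column is not shown in this notebook. One common way to combine p-values for the same gene pair from independent datasets is Fisher's method, sketched below purely as an illustration; the library may do something different. The function name `fisher_combined_p` is hypothetical.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form.
    """
    k = len(p_values)
    # Guard against log(0) for p-values that underflowed to zero.
    half_statistic = -sum(log(max(p, 1e-300)) for p in p_values)
    tail = sum(half_statistic ** i / factorial(i) for i in range(k))
    return min(1.0, exp(-half_statistic) * tail)


# Two datasets that each report p = 0.05 for the same gene pair.
combined = fisher_combined_p([0.05, 0.05])
print(combined)
```

Two moderately small p-values combine into a smaller one, which is the behaviour one would want when both PanCancerAtlas and CCLE support the same candidate pair.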


<br>
5.2. Integration of the survival of the fittest results on PanCancerAtlas and CCLE

In [303]:
sof_result = UnionResults([sof_ccle_report, sof_pancancer_report],  'SL', ['FDR', 'FDR'], list(tumor_type.value))
sof_result=sof_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))


In [302]:
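# Alternative: use only the PanCancerAtlas results. Running this cell after the one above overwrites sof_result.
sof_result=sof_pancancer_report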

<br>
5.3. Integration of the shRNA- and CRISPR-based functional examination results

In [None]:
functional_screening_result = UnionResults([crispr_report, shRNA_report],'SL', ['PValue', 'PValue'], list(tumor_type.value))
functional_screening_result=functional_screening_result.groupby('Inactive').apply(lambda x: x.sort_values('AggregatedP'))
functional_screening_result

<br>
5.4. Merging the results from all three inference procedures

In [None]:
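# Note: as written, only the coexpression and survival of the fittest results are merged; functional_screening_result is not passed in.
all_merged_results = MergeResults([coexpression_result, sof_result], 'SL', list(tumor_type.value))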
all_merged_results=all_merged_results.groupby('Inactive').apply(lambda x: x.sort_values('FinalP'))
all_merged_results

The results are saved to an Excel file.

In [241]:

WriteToExcel("DAISY_SL_results.xlsx", [ all_merged_results],[ "SL_results"])
