# DAISY: the DAta-mIning SYnthetic-lethality-identification pipeline

Please cite: 

For Implementation: 

Our paper,

For DAISY algorithm: 

Jerby-Arnon, L., Pfetzer, N., Waldman, Y. Y., McGarry, L., James, D., Shanks, E., ... & Gottlieb, E. (2014). Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell, 158(5), 1199-1209.

For CCLE Omics data:

Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3

For CRISPR Data: 

Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984

Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. BioRxiv, 720243.


This notebook is a reimplementation of the DAISY synthetic lethal (SL) pair prediction algorithm.

Please run the table_creation notebook before running the DAISY notebook.

The pipeline consists of three modules:

1. SL candidate determination using gene co-expression
2. SL candidate determination using genomic survival of the fittest
3. SL candidate determination using CRISPR and shRNA experiments


* The results from the three modules are then aggregated into one ranked list of candidate SL pairs


Input Parameters
* Cancer type 
* The genes whose SL partners are sought


Input Data
* Gene expression data 
* Gene mutation data
* Copy number variation data
* Gene effect data (CRISPR)
* Gene Dependency scores data (shRNA)

Output
* Ranked list of candidate SL pairs
![../../figures/daisy_pipeline.png](attachment:dene.png)

In [46]:
reset 

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


In [47]:
pwd

'/Users/bahar/Downloads/SyntheticLethality-master_16/Notebooks/DAISY_pipeline'

The required libraries are imported. 

In [66]:
import importlib
import sys
from datetime import datetime

import ipywidgets as widgets
import pandas as pd
from google.cloud import bigquery
from ipywidgets import interact, interactive, fixed, interact_manual

# Make the project's helper scripts importable before importing SL_Library
sys.path.append('../../scripts/')
import SL_Library
#from SL_Library import *


In [58]:
importlib.reload(SL_Library)

<module 'SL_Library' from '../../scripts/SL_Library/__init__.py'>

In [67]:
import sys

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

BigQuery connection
Replace syntheticlethality with your own Google Cloud project ID.

In [60]:
project_id='syntheticlethality'
client = bigquery.Client(project_id)
#client = bigquery.Client(credentials=credentials, project=credentials.project_id)

!gcloud auth login


This query retrieves tumor suppressor genes (TSGs) from the COSMIC Cancer Gene Census that are also listed in the project's cancer driver gene table

In [61]:
query='''SELECT Gene_Symbol
FROM `isb-cgc.COSMIC_v90_grch38.Cancer_Gene_Census`  
WHERE Role_in_Cancer LIKE '%TSG%'

INTERSECT DISTINCT

SELECT HGNC_gene_symbol FROM `syntheticlethality.gene_information.cancer_driver_genes`
'''
driver_tsg_genes= client.query(query).result().to_dataframe()

Conversion from HUGO gene symbols to Entrez IDs

In [68]:
input_genes=driver_tsg_genes["Gene_Symbol"].to_list()
input_entrez_ids=SL_Library.ConvertGene(client, input_genes, 'Gene', ['EntrezID'])

AttributeError: module 'SL_Library' has no attribute 'ConvertGene'

In [32]:
input_entrez_ids

Unnamed: 0,Gene,EntrezID
0,APC,324
1,ATM,472
2,ATR,545
3,B2M,567
4,CIC,23152
...,...,...
108,PRKAR1A,5573
109,SMARCA4,6597
110,SMARCB1,6598
111,TBL1XR1,79718


Default parameters for DAISY; edit them as needed

In [33]:
input_mutations = ['Nonsense_Mutation', 'Frame_Shift_Ins', 'Frame_Shift_Del'] 
percentile_threshold=10
cn_threshold=-0.3 
cor_threshold=0.5
p_threshold=0.05
pval_correction='Bonferroni'
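
The notebook does not show how `p_threshold` and `pval_correction` interact inside the library. As a minimal sketch, assuming `'Bonferroni'` refers to the standard correction (each raw p-value multiplied by the number of tests and capped at 1), the filtering step could look like the following. The function name `bonferroni_filter` and the example p-values are hypothetical and are not part of `SL_Library`.

```python
def bonferroni_filter(p_values, p_threshold=0.05):
    """Return (index, adjusted_p) for tests that remain significant after Bonferroni."""
    n_tests = len(p_values)
    kept = []
    for i, p in enumerate(p_values):
        adjusted = min(1.0, p * n_tests)  # standard Bonferroni adjustment
        if adjusted < p_threshold:
            kept.append((i, adjusted))
    return kept


raw = [0.001, 0.02, 0.04, 0.30]  # made-up raw p-values, one per gene pair
significant = bonferroni_filter(raw)
print(significant)
```

With thousands of candidate partners per gene, this correction is strict, which is one reason to treat `pval_correction` as a tunable parameter.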

Pairwise gene co-expression module on PanCancer Atlas data


In [37]:
coexp_pancancer=CoexpressionAnalysis(client, "PanCancerAtlas", input_entrez_ids['EntrezID'], cor_threshold, p_threshold, pval_correction) 

In [38]:
coexp_pancancer

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,Correlation,PValue
AMER1,3006,139285,AMER1,254065,BRWD3,0.513545,0.0
AMER1,3405,139285,AMER1,64682,ANAPC1,0.512611,0.0
AMER1,6722,139285,AMER1,6502,SKP2,0.513338,0.0
AMER1,10116,139285,AMER1,55609,ZNF280C,0.504315,0.0
AMER1,10422,139285,AMER1,9880,ZBTB39,0.510557,0.0
...,...,...,...,...,...,...,...
ZMYM3,12104,9203,ZMYM3,162963,ZNF610,0.530604,0.0
ZMYM3,11044,9203,ZMYM3,11188,NISCH,0.503019,0.0
ZMYM3,11037,9203,ZMYM3,23373,CRTC1,0.539256,0.0
ZMYM3,13085,9203,ZMYM3,84838,ZNF496,0.504975,0.0
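
The co-expression module keeps partner genes whose expression correlates with the input gene above `cor_threshold`. The notebook does not state which correlation coefficient `CoexpressionAnalysis` computes; the sketch below assumes a Spearman rank correlation and uses made-up expression values. The helper names (`average_ranks`, `pearson`, `spearman`) and the gene labels are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values across six samples
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [1.1, 1.9, 3.5, 3.9, 5.2, 7.0],  # rises with GENE_A
    "GENE_C": [4.0, 1.0, 6.0, 2.0, 5.0, 3.0],  # no clear trend
}
cor_threshold = 0.5

candidates = []
for gene, values in expr.items():
    if gene == "GENE_A":
        continue
    rho = spearman(expr["GENE_A"], values)
    if rho > cor_threshold:
        candidates.append((gene, rho))
print(candidates)
```

In the real pipeline the correlation p-values would additionally be corrected and compared with `p_threshold`; that step is omitted here.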


The results are saved to a BigQuery table

In [23]:
SL.CreateTable(client, coexp_pancancer, 'pipeline_results', 'DAISY_coexpression_pancancer_sl_pairs', project_id, "")

1it [00:05,  5.16s/it]


Table created successfully


Pairwise gene co-expression module on CCLE data


In [36]:
coexp_CCLE=CoexpressionAnalysis(client, "CCLE", input_entrez_ids['EntrezID'], cor_threshold, p_threshold, pval_correction) 

In [39]:
coexp_CCLE

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,Correlation,PValue
AMER1,6937,139285,AMER1,6872,TAF1,0.618656,2.231030e-134
AMER1,19610,139285,AMER1,81887,LAS1L,0.606935,7.315441e-128
AMER1,6593,139285,AMER1,9880,ZBTB39,0.598761,1.775621e-123
AMER1,9278,139285,AMER1,23133,PHF8,0.598430,2.656653e-123
AMER1,10455,139285,AMER1,4841,NONO,0.592015,5.948916e-120
...,...,...,...,...,...,...,...
ZMYM3,8408,9203,ZMYM3,84950,PRPF38A,0.501011,1.425741e-79
ZMYM3,10790,9203,ZMYM3,8880,FUBP1,0.500516,2.197475e-79
ZMYM3,15581,9203,ZMYM3,146059,CDAN1,0.500465,2.296914e-79
ZMYM3,233,9203,ZMYM3,5469,MED1,0.500437,2.353243e-79


The results are saved to a BigQuery table

In [29]:
SL.CreateTable(client, coexp_CCLE, 'pipeline_results', 'DAISY_coexpression_CCLE_sl_pairs', project_id, "")

1it [00:03,  3.29s/it]


Table created successfully


Genomic survival of the fittest module on CCLE data

In [39]:
sof_CCLE=SurvivalOfFittest(client, "CCLE", p_threshold, input_entrez_ids['EntrezID'], input_mutations, percentile_threshold, cn_threshold, pval_correction)


In [40]:
sof_CCLE

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue
ACVR2A,446,92,ACVR2A,1030,CDKN2B,0.044765
AMER1,11323,139285,AMER1,283598,C14orf177,0.000081
AMER1,11421,139285,AMER1,388011,LINC01550,0.000093
AMER1,11314,139285,AMER1,64919,BCL11B,0.000125
AMER1,11394,139285,AMER1,84439,HHIPL1,0.000128
...,...,...,...,...,...,...
ZMYM3,7761,9203,ZMYM3,91754,NEK9,0.041885
ZMYM3,8520,9203,ZMYM3,6252,RTN1,0.045588
ZMYM3,5073,9203,ZMYM3,9495,AKAP5,0.046226
ZMYM3,9425,9203,ZMYM3,22890,ZBTB1,0.046424
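
The survival of the fittest module needs to decide in which samples the input gene counts as inactive. How `SurvivalOfFittest` applies `input_mutations`, `percentile_threshold` and `cn_threshold` is not shown in this notebook. The sketch below assumes one plausible rule: a gene is inactive in a sample if it carries one of the listed loss-of-function mutations, or if its expression is at or below the given percentile and its copy-number value is below `cn_threshold`. The sample records, the nearest-rank percentile and the helper names are all hypothetical.

```python
input_mutations = {"Nonsense_Mutation", "Frame_Shift_Ins", "Frame_Shift_Del"}
percentile_threshold = 10
cn_threshold = -0.3


def percentile_cutoff(values, percentile):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * percentile // 100))  # ceil(n * percentile / 100)
    return ordered[rank - 1]


def is_inactive(sample, expr_cutoff):
    """Assumed rule: loss-of-function mutation, or low expression with copy-number loss."""
    has_lof = bool(input_mutations & set(sample["mutations"]))
    low_expr_and_cn = (
        sample["expression"] <= expr_cutoff and sample["copy_number"] < cn_threshold
    )
    return has_lof or low_expr_and_cn


# Made-up per-sample data for a single input gene
samples = [
    {"id": "S1", "mutations": ["Nonsense_Mutation"], "expression": 5.0, "copy_number": 0.0},
    {"id": "S2", "mutations": [], "expression": 0.2, "copy_number": -0.8},
    {"id": "S3", "mutations": ["Missense_Mutation"], "expression": 4.1, "copy_number": 0.1},
    {"id": "S4", "mutations": [], "expression": 3.0, "copy_number": -0.5},
    {"id": "S5", "mutations": [], "expression": 6.3, "copy_number": 0.2},
]

expr_cutoff = percentile_cutoff([s["expression"] for s in samples], percentile_threshold)
inactive_ids = [s["id"] for s in samples if is_inactive(s, expr_cutoff)]
print(expr_cutoff, inactive_ids)
```

Once samples are split this way, the copy-number values of each candidate partner can be compared between the inactive and the remaining samples with a one-sided test.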


The results are saved to a BigQuery table

In [82]:
CreateTable(client, sof_CCLE, 'pipeline_results', 'DAISY_sof_CCLE_sl_pairs', project_id, "")

1it [00:05,  5.37s/it]


Table created successfully


Genomic survival of the fittest module on PanCancer Atlas data

In [42]:
sof_pancancer=SurvivalOfFittest(client, "PanCancerAtlas", p_threshold, input_entrez_ids['EntrezID'], input_mutations, percentile_threshold, cn_threshold,pval_correction)


In [43]:
sof_pancancer

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue
ACVR2A,85281,92,ACVR2A,3612,IMPA1,0.021901
ACVR2A,88643,92,ACVR2A,79752,ZFAND1,0.021944
ACVR2A,87542,92,ACVR2A,347051,SLC10A5,0.021971
ACVR2A,84416,92,ACVR2A,92421,CHMP4C,0.022888
ACVR2A,84890,92,ACVR2A,646486,FABP12,0.026031
...,...,...,...,...,...,...
ZFHX3,167155,463,ZFHX3,23181,DIP2A,0.029849
ZFHX3,167597,463,ZFHX3,54059,YBEY,0.029849
ZFHX3,167157,463,ZFHX3,100862692,DIP2A-IT1,0.029849
ZFHX3,167471,463,ZFHX3,5116,PCNT,0.029849


The results are saved to a BigQuery table

In [81]:
SL.CreateTable(client, sof_pancancer, 'pipeline_results', 'DAISY_sof_pancancer_sl_pairs', project_id, "")

1it [00:23, 23.09s/it]


Table created successfully


CRISPR functional examination inference procedure

In [54]:
crispr_result=FunctionalExamination(client, "CRISPR", p_threshold, input_entrez_ids['EntrezID'], percentile_threshold, cn_threshold, 'none')


In [55]:
crispr_result

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue
ACVR2A,74664,92,ACVR2A,5695,PSMB7,0.001069
ACVR2A,32771,92,ACVR2A,100507588,TGFBR3L,0.001167
ACVR2A,101453,92,ACVR2A,23030,KDM4B,0.001211
ACVR2A,96010,92,ACVR2A,7535,ZAP70,0.001340
ACVR2A,97288,92,ACVR2A,130340,AP1S3,0.001340
...,...,...,...,...,...,...
ZMYM3,112231,9203,ZMYM3,5507,PPP1R3C,0.049735
ZMYM3,18793,9203,ZMYM3,10313,RTN3,0.049735
ZMYM3,1358,9203,ZMYM3,3460,IFNGR2,0.049798
ZMYM3,86987,9203,ZMYM3,2738,GLI4,0.049862
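
The functional examination asks whether a candidate partner is more essential in cell lines where the input gene is inactive. The statistical test used by `FunctionalExamination` is not shown here; as a sketch, assuming a one-sided rank-sum comparison of gene effect scores (where more negative means more essential), an exact p-value for small groups can be computed by enumeration. The function name and the scores below are hypothetical.

```python
from itertools import combinations


def rank_sum_less_pvalue(group_a, group_b):
    """Exact one-sided p-value that group_a tends to have lower values than group_b.

    Enumerates every way of choosing len(group_a) ranks from the pooled data,
    so it is only practical for small groups. Ties receive average ranks.
    """
    pooled = list(group_a) + list(group_b)
    ordered = sorted(pooled)
    rank_of = {}
    for v in set(pooled):
        first = ordered.index(v)   # 0-based position of the first occurrence
        count = ordered.count(v)
        rank_of[v] = first + (count + 1) / 2  # average 1-based rank
    ranks = [rank_of[v] for v in pooled]
    observed = sum(ranks[: len(group_a)])
    sums = [sum(c) for c in combinations(ranks, len(group_a))]
    return sum(s <= observed + 1e-12 for s in sums) / len(sums)


p_threshold = 0.05
# Made-up gene effect scores of a candidate partner gene
scores_when_inactive = [-1.2, -0.9, -1.0]     # input gene inactive
scores_when_active = [-0.1, -0.3, 0.0, -0.2]  # input gene active

p_value = rank_sum_less_pvalue(scores_when_inactive, scores_when_active)
print(p_value, p_value < p_threshold)
```

For realistic group sizes the enumeration becomes infeasible, and a library implementation of the rank-sum test with a normal approximation would be the practical choice.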


The results are saved to a BigQuery table

In [64]:
CreateTable(client, crispr_result, 'pipeline_results', 'DAISY_func_ex_crispr_sl_pairs', project_id, "")

1it [00:11, 11.45s/it]


Table created successfully


shRNA functional examination inference procedure

In [62]:
siRNA_result=FunctionalExamination(client, "siRNA", p_threshold, input_entrez_ids['EntrezID'], percentile_threshold, cn_threshold, 'none')


In [63]:
siRNA_result

Gene_Inactive (index),(row index),EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue
ACVR2A,36174,92,ACVR2A,8396,PIP4K2B,0.001404
ACVR2A,56449,92,ACVR2A,3768,KCNJ12,0.001687
ACVR2A,42907,92,ACVR2A,8482,SEMA7A,0.001806
ACVR2A,58122,92,ACVR2A,3082,HGF,0.003357
ACVR2A,56428,92,ACVR2A,307,ANXA4,0.003671
...,...,...,...,...,...,...
ZMYM3,23480,9203,ZMYM3,51094,ADIPOR1,0.047138
ZMYM3,2518,9203,ZMYM3,79772,MCTP1,0.047526
ZMYM3,59466,9203,ZMYM3,57758,SCUBE2,0.048311
ZMYM3,19880,9203,ZMYM3,55362,TMEM63B,0.049508


The results are saved to a BigQuery table

In [66]:
CreateTable(client, siRNA_result, 'pipeline_results', 'DAISY_func_ex_siRNA_sl_pairs', project_id, "")

1it [00:04,  4.73s/it]


Table created successfully


Pairwise gene co-expression results on PanCancer Atlas and CCLE are integrated

In [67]:
coexpression_result=UnionResults([coexp_pancancer, coexp_CCLE])

In [68]:
coexpression_result

Unnamed: 0,EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,Correlation_0,PValue_0,Correlation_1,PValue_1,PValue
0,139285,AMER1,254065,BRWD3,0.513545,0.0,0.512958,3.399950e-84,0.000000e+00
1,139285,AMER1,64682,ANAPC1,0.512611,0.0,,,0.000000e+00
2,139285,AMER1,6502,SKP2,0.513338,0.0,,,0.000000e+00
3,139285,AMER1,55609,ZNF280C,0.504315,0.0,,,0.000000e+00
4,139285,AMER1,9880,ZBTB39,0.510557,0.0,0.598761,1.775621e-123,0.000000e+00
...,...,...,...,...,...,...,...,...,...
29915,9203,ZMYM3,84950,PRPF38A,,,0.501011,1.425741e-79,1.425741e-79
29916,9203,ZMYM3,8880,FUBP1,,,0.500516,2.197475e-79,2.197475e-79
29917,9203,ZMYM3,146059,CDAN1,,,0.500465,2.296914e-79,2.296914e-79
29918,9203,ZMYM3,5469,MED1,,,0.500437,2.353243e-79,2.353243e-79
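
The table above keeps pairs found in either data set, with per-source columns suffixed `_0` and `_1` and a combined `PValue`. The internals of `UnionResults` are not shown in this notebook; the sketch below reproduces that shape with plain dictionaries and assumes the combined value is the smallest available p-value, which is consistent with the rows displayed but is not confirmed by the source. The function name `union_results` and the gene pairs are hypothetical.

```python
def union_results(result_sets):
    """Outer-join result sets on the gene pair, keeping each source p-value."""
    merged = {}
    for i, results in enumerate(result_sets):
        for pair, p in results.items():
            merged.setdefault(pair, {})[f"PValue_{i}"] = p
    for row in merged.values():
        # Assumption: the combined value is the smallest available p-value
        row["PValue"] = min(v for k, v in row.items() if k.startswith("PValue_"))
    return merged


# Made-up results keyed by (inactive gene, SL candidate)
set_0 = {("GENE_A", "GENE_X"): 1e-5, ("GENE_A", "GENE_Y"): 0.01}
set_1 = {("GENE_A", "GENE_X"): 3e-4, ("GENE_B", "GENE_Z"): 0.02}

union = union_results([set_0, set_1])
for pair, row in union.items():
    print(pair, row)
```

Pairs seen in only one source simply lack the other source's column, which corresponds to the empty cells in the table above.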


In [84]:
#SL.CreateTable(client, coexpression_result, 'DAISY_RESULTS', 'Coexpression_Union_Pancancer_CCLE', project_id)

1it [00:13, 13.96s/it]

Table created successfully





Survival of the fittest results on PanCancer Atlas and CCLE are integrated

In [69]:
sof_result=UnionResults([sof_CCLE, sof_pancancer])

In [71]:
sof_result

Unnamed: 0,EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue_0,PValue_1,PValue
0,92,ACVR2A,1030,CDKN2B,0.044765,,0.044765
1,139285,AMER1,283598,C14orf177,0.000081,,0.000081
2,139285,AMER1,388011,LINC01550,0.000093,,0.000093
3,139285,AMER1,64919,BCL11B,0.000125,,0.000125
4,139285,AMER1,84439,HHIPL1,0.000128,,0.000128
...,...,...,...,...,...,...,...
177798,463,ZFHX3,23181,DIP2A,,0.029849,0.029849
177799,463,ZFHX3,54059,YBEY,,0.029849,0.029849
177800,463,ZFHX3,100862692,DIP2A-IT1,,0.029849,0.029849
177801,463,ZFHX3,5116,PCNT,,0.029849,0.029849



The results are saved to a BigQuery table

In [87]:
SL.CreateTable(client, sof_result, 'DAISY_RESULTS', 'SOF_Union_Pancancer_CCLE', project_id)

1it [00:37, 37.60s/it]

Table created successfully





Functional examination results from the CRISPR and siRNA screens are integrated

In [72]:
functional_screening_result=UnionResults([crispr_result, siRNA_result])

In [89]:
functional_screening_result.loc[(functional_screening_result['Gene_Inactive']=='BRCA1')& (functional_screening_result['Gene_SL_Candidate']=='PARP1'), ]

Unnamed: 0,EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue_0,PValue_1,PValue
17424,672,BRCA1,142,PARP1,0.00152,,0.00152


In [90]:
#SL.CreateTable(client, functional_screening_result, 'DAISY_RESULTS', 'FuncEx_Union_CRISPR_siRNA', project_id)

1it [00:40, 40.61s/it]

Table created successfully





The results from the three inference procedures are merged

In [74]:
all_merged_results=MergeResults([coexpression_result, sof_result, functional_screening_result])

In [75]:
all_merged_results

Unnamed: 0,EntrezID_Inactive,Gene_Inactive,EntrezID_SL_Candidate,Gene_SL_Candidate,PValue
0,324,APC,257218,SHPRH,0.000000e+00
1,8289,ARID1A,57649,PHF12,0.000000e+00
2,8289,ARID1A,284058,KANSL1,0.000000e+00
3,546,ATRX,3708,ITPR1,0.000000e+00
4,546,ATRX,5108,PCM1,0.000000e+00
...,...,...,...,...,...
145,7428,VHL,7014,TERF2,1.800103e-225
146,7428,VHL,23019,CNOT1,1.818288e-197
147,7428,VHL,30827,CXXC1,8.511482e-185
148,7428,VHL,9794,MAML1,0.000000e+00
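
The merged table has only 149 rows, far fewer than any of its inputs, which is consistent with keeping only pairs supported by all three procedures. That reading, and the way the final `PValue` is derived, are assumptions; `MergeResults` itself is not shown in this notebook. A minimal sketch of such an intersection, ranking pairs by their smallest p-value, with a hypothetical function name and made-up values:

```python
def merge_results(result_sets):
    """Keep gene pairs present in every result set, ranked by smallest p-value."""
    shared = set(result_sets[0])
    for results in result_sets[1:]:
        shared &= set(results)
    ranked = [(pair, min(results[pair] for results in result_sets)) for pair in shared]
    # Sort by p-value, then by pair name so the order is deterministic
    return sorted(ranked, key=lambda item: (item[1], item[0]))


# Made-up p-values keyed by (inactive gene, SL candidate)
coexp = {("A", "X"): 1e-80, ("A", "Y"): 1e-90, ("B", "Z"): 1e-70}
sof = {("A", "X"): 0.01, ("B", "Z"): 0.03}
func = {("A", "X"): 0.002, ("B", "Z"): 0.04, ("A", "Y"): 0.01}

final_pairs = merge_results([coexp, sof, func])
print(final_pairs)
```

Here the pair `("A", "Y")` is dropped because the survival of the fittest results do not support it.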


The results are saved to a BigQuery table

In [80]:
CreateTable(client, all_merged_results, 'pipeline_results', 'DAISY_final_sl_pairs', project_id, "")

1it [00:03,  3.39s/it]


Table created successfully


The results are saved to an Excel file

In [100]:
WriteToExcel(
    "tsg_driver.results.xlsx",
    [coexp_pancancer, coexp_CCLE, sof_CCLE, sof_pancancer, crispr_result, siRNA_result,
     coexpression_result, sof_result, functional_screening_result,
     co_ex_func_merged_results, co_ex_sof_merged_results, sof_func_merged_results,
     all_merged_results],
    ["Co-exp_Pancancer", "Co-exp_CCLE", "SOF_CCLE", "SOF_Pancancer", "CRISPR", "siRNA",
     "Coexp_Union", "Sof_Union", "Func_Sc_Union",
     "Coexp_Func_Merged", "Coexp_Sof_Merged", "Sof_Fun_Merged",
     "All"],
)


In [101]:
end_time= datetime.now()