### About this notebook
This notebook answers the question: which gene knockouts or knockdowns show sensitivity to a mutation in a given gene or in a group of genes? <br/>

Please cite the following papers when using this notebook.

<font color='blue'>The functional screening data and omics data for cell lines come from the DepMap and CCLE projects of the Broad Institute (DepMap Public 20Q3). If you use this Jupyter notebook or the data it relies on, please cite the papers below.</font> <br/>

....our paper

For this DepMap release:
DepMap, Broad (2020): DepMap 20Q3 Public. figshare. Dataset doi:10.6084/m9.figshare.11791698.v2.

For CRISPR datasets:
Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984

Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. BioRxiv, 720243.

For omics datasets:
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. 2019. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).

For more detailed information, please contact gqin@systemsbiology.org


In [None]:
# Make sure the required libraries are installed.
from google.cloud import bigquery
import pandas as pd
import numpy as np
import ipywidgets as widgets
from scipy import stats 
import statsmodels.stats.multitest as multi
import sys
sys.path.append('../scripts/')
import mutation_dependent_PP as mdpp

In [None]:
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 

In [None]:
# Users need a Google Cloud project to query the data in the BigQuery tables.
project_id='syntheticlethality'
client = bigquery.Client(project_id)

!gcloud auth login

#### Get mutation data from CCLE, CRISPR gene knockout effects from DepMap, and shRNA gene knockdown dependency data from DEMETER

In [None]:
Mut_mat = mdpp.get_ccle_mutation_data()
Demeter_data = mdpp.get_demeter_shRNA_data()
Depmap_matrix = mdpp.get_depmap_crispr_data()

#### Set user input:
###### 1. Data_source: only two options are available, "shRNA" or "Crispr" (data type: string).
###### 2. Mutated genes of interest: a list of gene symbols. A single gene must also be given in list format.
###### 3. Tumor types to include in the analysis: select 'pancancer', or select one or more tumor types of interest.

In [None]:
# User input
Data_source = "shRNA" # only two options are available, "shRNA" or "Crispr"; data type: string
Gene_list = ['BRCA2'] # data type: list of gene symbols
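
The two inputs above can be checked before any data is queried. The following is a minimal sketch, not part of the `mutation_dependent_PP` module: the helper name `validate_user_input` and its cleaning rules are assumptions made for illustration only.

```python
# Hypothetical helper (not provided by mutation_dependent_PP): validates the
# Data_source string and normalizes Gene_list into a clean list of symbols.
VALID_SOURCES = ("shRNA", "Crispr")


def validate_user_input(data_source, gene_list):
    if data_source not in VALID_SOURCES:
        raise ValueError(
            f"Data_source must be one of {VALID_SOURCES}, got {data_source!r}"
        )
    # A single gene passed as a bare string is wrapped into a list.
    if isinstance(gene_list, str):
        gene_list = [gene_list]
    # Strip whitespace and drop empty entries and duplicates, keeping order.
    cleaned = []
    for gene in gene_list:
        symbol = str(gene).strip()
        if symbol and symbol not in cleaned:
            cleaned.append(symbol)
    if not cleaned:
        raise ValueError("Gene_list must contain at least one gene symbol")
    return data_source, cleaned


checked_source, checked_genes = validate_user_input("shRNA", ["BRCA2"])
```

Gene symbols are deliberately not upper-cased here, since some valid symbols contain lower-case letters.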

In [None]:
#All DDR genes
#Gene_list = pd.read_csv("/Users/gloria/Documents/ISB/Synthetic_lethality/Manuscript/scripts/Synthetic_lethylity/Data/DDRGenes.csv")
#Gene_list = list(Gene_list['HGNC_gene_symbol'])

In [None]:
# ID mapping between the CCLE annotation and input gene symbols
id_mapping, Gene_list_matched = mdpp.GeneSymbol_standardization(Gene_list)


#### Select tumor types

In [None]:
query = ''' 
SELECT DepMap_ID, primary_disease,TCGA_subtype
FROM `syntheticlethality.DepMap_public_20Q3.sample_info_Depmap_withTCGA_labels` 
'''
sample_info = client.query(query).result().to_dataframe()

pancancer_cls = sample_info.loc[~sample_info['primary_disease'].isin(['Non-Cancerous','Unknown','Engineered','Immortalized'])]
pancancer_cls = pancancer_cls.loc[~(pancancer_cls['primary_disease'].isna())]

# Unique TCGA subtypes, with missing values (None/NaN) removed
TCGA_list = sorted(pancancer_cls['TCGA_subtype'].dropna().unique())

tumor_type = widgets.SelectMultiple(
    options=['pancancer'] + TCGA_list  ,
    value=[],
    description='Tumor type',
    disabled=False
)
display(tumor_type)
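
To illustrate what a selection in this widget might mean in terms of cell lines, here is a small sketch on made-up data. The helper `select_cell_lines` is hypothetical, the `DepMap_ID` values are toy placeholders, and the pipeline in `mdpp` may resolve tumor types differently. The sketch assumes that 'pancancer' means all cancer cell lines and that any other selection filters on `TCGA_subtype`.

```python
import pandas as pd

# Same exclusion list as in the cell above.
EXCLUDED = ['Non-Cancerous', 'Unknown', 'Engineered', 'Immortalized']


def select_cell_lines(sample_info, selected):
    """Hypothetical helper: map selected tumor types to DepMap_ID values."""
    cancer = sample_info.loc[
        sample_info['primary_disease'].notna()
        & ~sample_info['primary_disease'].isin(EXCLUDED)
    ]
    if 'pancancer' in selected:
        # Assumption: 'pancancer' keeps every cancer cell line.
        return sorted(cancer['DepMap_ID'])
    return sorted(cancer.loc[cancer['TCGA_subtype'].isin(selected), 'DepMap_ID'])


# Toy table with the same columns as the BigQuery result (values are made up).
toy_info = pd.DataFrame({
    'DepMap_ID': ['ACH-000001', 'ACH-000002', 'ACH-000003', 'ACH-000004'],
    'primary_disease': ['Ovarian Cancer', 'Breast Cancer', 'Engineered', None],
    'TCGA_subtype': ['OV', 'BRCA', None, 'BRCA'],
})

brca_lines = select_cell_lines(toy_info, ['BRCA'])
all_lines = select_cell_lines(toy_info, ['pancancer'])
```

In the notebook itself, `list(tumor_type.value)` would take the place of the hard-coded selection lists.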

#### Select the shRNA or CRISPR dataset to infer synthetic lethal pairs for the mutated genes

In [None]:
Data_source = "shRNA"
if Data_source == "shRNA":
    # Run the pipeline on the DEMETER shRNA knockdown data for the selected tumor types.
    result_shRNA = mdpp.Mutational_based_SL_pipeline(list(tumor_type.value), Gene_list_matched, Mut_mat, Demeter_data, Data_source)
    if result_shRNA.shape[0] > 0:
        # Keep pairs with FDR < 0.05 and a negative effect size.
        result_shRNA_sig = result_shRNA.loc[result_shRNA['FDR_all_exp'] < 0.05]
        result_shRNA_sig = result_shRNA_sig.loc[result_shRNA_sig['ES'] < 0]

In [None]:
result_shRNA_sig.to_csv("DDR_shRNA_sig.csv")

In [None]:
Data_source = "Crispr"
if Data_source == "Crispr":
    # Run the pipeline on the DepMap CRISPR knockout data for the selected tumor types.
    result_Crispr = mdpp.Mutational_based_SL_pipeline(list(tumor_type.value), Gene_list_matched, Mut_mat, Depmap_matrix, Data_source)
    if result_Crispr.shape[0] > 0:
        # Keep pairs with FDR < 0.05 and a negative effect size.
        result_Crispr_sig = result_Crispr.loc[result_Crispr['FDR_all_exp'] < 0.05]
        result_Crispr_sig = result_Crispr_sig.loc[result_Crispr_sig['ES'] < 0]

In [None]:
result_Crispr_sig.to_csv("DDR_Crispr_sig.csv")

###### Result interpretation
`result_Crispr_sig` and `result_shRNA_sig` contain the synthetic lethal gene pairs predicted by this pipeline.<br/>
###### Table annotation:
Gene_mut: mutated gene<br/>
Gene_kd: gene knocked down or knocked out<br/>
Mutated_samples: number of mutated cell lines in the selected tumor type<br/>
pvalue: p-value from the t-test<br/>
ES: effect size of the gene effect between the mutated group and the wild-type group<br/>
FDR_all_exp: <br/>
FDR_by_gene: FDR computed over the p-values for a single mutated gene<br/>
Tumor_type: tumor types included in the analysis
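
As a rough illustration of how columns such as `pvalue`, `ES` and an FDR can arise, the sketch below runs a t-test on simulated data, computes an effect size, and applies a multiple-testing correction. It is not the implementation in `mdpp`: the choice of Welch's t-test, Cohen's d as the effect size, and Benjamini-Hochberg as the correction are all assumptions, and the gene names and values are simulated.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.stats.multitest as multi

rng = np.random.default_rng(0)
rows = []
for gene_kd in ['GENE_A', 'GENE_B', 'GENE_C']:  # placeholder gene names
    # Simulated gene effects: GENE_A is more detrimental in mutated lines.
    shift = -1.5 if gene_kd == 'GENE_A' else 0.0
    mutated = rng.normal(loc=shift, scale=0.5, size=12)
    wild_type = rng.normal(loc=0.0, scale=0.5, size=60)

    # Assumption: Welch's t-test (unequal variances) between the two groups.
    _, pvalue = stats.ttest_ind(mutated, wild_type, equal_var=False)

    # Assumption: effect size as Cohen's d with a pooled standard deviation.
    n1, n2 = len(mutated), len(wild_type)
    pooled_sd = np.sqrt(
        ((n1 - 1) * mutated.var(ddof=1) + (n2 - 1) * wild_type.var(ddof=1))
        / (n1 + n2 - 2)
    )
    es = (mutated.mean() - wild_type.mean()) / pooled_sd

    rows.append({'Gene_mut': 'BRCA2', 'Gene_kd': gene_kd,
                 'Mutated_samples': n1, 'pvalue': pvalue, 'ES': es})

toy_result = pd.DataFrame(rows)

# Assumption: Benjamini-Hochberg FDR across all tests in the table.
toy_result['FDR'] = multi.multipletests(toy_result['pvalue'], method='fdr_bh')[1]

# Same filter as in the notebook: significant FDR and negative effect size.
toy_sig = toy_result.loc[(toy_result['FDR'] < 0.05) & (toy_result['ES'] < 0)]
```

Under these assumptions, a negative `ES` means the knockdown or knockout lowers the gene effect score more in the mutated group than in the wild-type group, which is the direction the notebook's `ES < 0` filter retains.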

In [None]:
# User defined analysis