# Dependency Map Data Analysis

First, we load the CERES scores for the Avana CRISPR screen data. CERES scores correct the viability effects measured in CRISPR screens for the effect of gene copy number. The data can be downloaded at: https://depmap.org/portal/dataset/Avana.

For information on the CERES algorithm, see: Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984


In [143]:
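import pandas as pd
from indra.databases import hgnc_client
# Load the CERES score table (first column as index, first row as header)
data = pd.read_csv('portal-Avana-2018-06-08.csv', index_col=0, header=0)
# Transpose so that genes are columns; later cells select and correlate by gene
data = data.transpose()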

In [3]:
# The correlations take a long time (> 1 hr) to calculate, so cache them
# and don't recalculate unless desired
recalculate = False
if recalculate:
    corr = data.corr()
    corr.to_hdf('correlations.h5', 'correlations')
else:
    corr = pd.read_hdf('correlations.h5', 'correlations')
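
The cache-or-recompute pattern above can also be keyed on whether the cache file exists, so the flag only needs to be set when a refresh is wanted. The following is a minimal sketch under stated assumptions: `load_or_compute_corr` is a hypothetical helper, the toy data frame stands in for the CERES matrix, and pickle is used instead of HDF5 only to keep the example free of extra dependencies.

```python
import os
import tempfile

import pandas as pd


def load_or_compute_corr(df, cache_path, recalculate=False):
    """Hypothetical helper: return df.corr(), reusing a cached copy if one exists."""
    if not recalculate and os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    result = df.corr()
    result.to_pickle(cache_path)
    return result


# Toy stand-in for the CERES matrix: rows are cell lines, columns are genes.
toy = pd.DataFrame({
    'GENE_A': [0.1, -0.5, -1.0, 0.2],
    'GENE_B': [0.2, -0.4, -1.1, 0.3],
})

cache_file = os.path.join(tempfile.mkdtemp(), 'toy_correlations.pkl')
first = load_or_compute_corr(toy, cache_file)   # computes and writes the cache
second = load_or_compute_corr(toy, cache_file)  # reads the cached copy
```

Passing `recalculate=True` forces a fresh computation and overwrites the cached file.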

In [89]:
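#labels_to_drop = get_redundant_pairs(corr)
#au_corr = corr_list.drop(labels=labels_to_drop).sort_values(ascending=False)
# Flatten the correlation matrix into a Series indexed by (gene, gene) pairs
corr_list = corr.unstack()
# Drop entries equal to 1.0 (the self-correlations on the diagonal)
large_corr = corr_list[corr_list != 1.0]
# Keep only pairs with |r| > 0.5; each pair still appears twice, as (A, B) and (B, A)
large_corr = large_corr[large_corr.abs() > 0.5]
# Rank the remaining pairs by absolute correlation
sort_corrs = large_corr.abs().sort_values(ascending=False)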
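
Because a correlation matrix is symmetric, unstacking it lists every gene pair twice, and filtering on `!= 1.0` relies on the diagonal being exactly 1.0 in floating point. One way to avoid both issues is to mask everything except the strict upper triangle before flattening. The sketch below illustrates this on toy data; the gene names and values are made up for illustration, and this is not necessarily what the commented-out `get_redundant_pairs` was meant to do.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the CERES matrix: rows are cell lines, columns are genes.
toy = pd.DataFrame({
    'GENE_A': [0.1, -0.5, -1.0, 0.2, -0.3],
    'GENE_B': [0.2, -0.4, -1.1, 0.3, -0.2],
    'GENE_C': [-0.9, 0.1, 0.4, -0.8, 0.0],
})
toy_corr = toy.corr()

# Boolean mask that is True only strictly above the diagonal (k=1), which
# excludes self-correlations and one copy of each symmetric pair.
upper = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)

# Masked-out entries become NaN; stack() flattens to (gene, gene) pairs and
# dropna() removes the masked entries regardless of the pandas version.
pairs = toy_corr.where(upper).stack().dropna()

# Keep strong correlations and order them by absolute value, preserving sign.
strong = pairs[pairs.abs() > 0.5]
strong = strong.reindex(strong.abs().sort_values(ascending=False).index)
```

With this approach the signed correlation values are kept in `strong`, while the ordering still reflects their magnitude.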

In [154]:
# Read the prior gene list, one gene name per line
with open('prior_genes.txt', 'rt') as f:
    prior_genes = [line.strip() for line in f]
# Read the metabolic gene list, keeping only genes present as columns in the data
metab_genes = []
with open('metabolic_genes.txt', 'rt') as f:
    for line in f:
        gene_name = line.strip().upper()
        if gene_name in data:
            metab_genes.append(gene_name)

In [145]:
prior_corrs = large_corr[metab_genes]

In [157]:
metab_data = data[metab_genes]

In [166]:
metab_corr = metab_data.corr()
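
Since `DataFrame.corr()` computes each pairwise correlation independently of the other columns, correlating a subset of columns gives the same values as slicing the corresponding block out of the full correlation matrix. In this notebook that would correspond to `corr.loc[metab_genes, metab_genes]`, assuming `corr` was computed from the same `data`. A small sketch on toy data (gene names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the CERES matrix: rows are cell lines, columns are genes.
toy = pd.DataFrame({
    'GENE_A': [0.1, -0.5, -1.0, 0.2, -0.3],
    'GENE_B': [0.2, -0.4, -1.1, 0.3, -0.2],
    'GENE_C': [-0.9, 0.1, 0.4, -0.8, 0.0],
    'GENE_D': [0.0, -0.2, 0.3, -0.6, 0.5],
})
subset = ['GENE_A', 'GENE_C']

full_corr = toy.corr()

# Option 1: recompute the correlations on the subset of columns.
recomputed = toy[subset].corr()

# Option 2: slice the matching block out of the full correlation matrix.
sliced = full_corr.loc[subset, subset]

# The two agree up to floating-point rounding.
same = np.allclose(recomputed.values, sliced.values)
```

Slicing is usually cheaper when the full matrix is already cached, while recomputing avoids loading the full matrix at all.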

In [168]:
mcorr_list = metab_corr.unstack()
mlarge_corr = mcorr_list[mcorr_list != 1.0]
mlarge_corr = mlarge_corr[mlarge_corr.abs() > 0.5]
msort_corrs = mlarge_corr.abs().sort_values(ascending=False)

In [176]:
metab_data.mean().sort_values(ascending=False)

GSTT2       0.348826
ATAD1       0.341677
MGST1       0.314010
AKR1B1      0.264788
ARSG        0.254816
STEAP1      0.250087
POLI        0.248122
HSD17B7     0.237662
CASP2       0.212710
B3GALT5     0.211714
RDH12       0.207986
ACSS2       0.200598
UGT2A1      0.196365
PTS         0.189176
LPIN1       0.188622
ADH6        0.183579
ATP6V1G3    0.180598
AKR1C2      0.180061
ACAT2       0.179501
IDI1        0.178993
ACAD8       0.178447
PUSL1       0.174587
FKBP14      0.170670
GLB1L2      0.170357
BBOX1       0.167658
STEAP4      0.166218
COX7A1      0.164698
HSD17B2     0.162923
OBSCN       0.161878
GAD2        0.160797
              ...   
POLR2C     -1.139142
NFS1       -1.142792
ATP6V0B    -1.154462
TRNT1      -1.155042
POLR2D     -1.156831
ATP6V1A    -1.169582
DUT        -1.178837
POLR3B     -1.186540
GUK1       -1.187185
RPA1       -1.191544
DTYMK      -1.208647
SOD1       -1.215013
KARS       -1.222383
POLD3      -1.228439
POLR2A     -1.231820
POLR2B     -1.257460
TARS       -1

In [97]:
# NOTE: `kras` is not defined in the cells above; it is presumably the KRAS column of `corr`
kras.PTPN11


-0.023686431352091052
