<img src='figures/Cover.png'>

In [None]:
# Set autoreload module for dev
%load_ext autoreload
%autoreload 2
%aimport rnaseq_lib

In [79]:
# Imports
import pandas as pd
import rnaseq_lib as r
import holoviews as hv
import numpy as np
hv.extension('bokeh', logo=False)

In [None]:
## Synapse ID: syn11515015
df_path = '/mnt/rnaseq-cancer/Objects/tcga-gtex-metadata-expression.tsv'
df = pd.read_csv(df_path, sep='\t', index_col=0, dtype=r.tissues.dtype)

In [8]:
h = r.plot.Holoview(df)

# Review
1. The Cancer Genome Atlas (TCGA) has collected mutation and expression data for over 20,000 tumor samples, but most subtypes of cancer have few normal tissue samples to compare against. 

2. We uniformly computed expression data for both TCGA and the Genotype-Tissue Expression (GTEx) Consortium, which collected expression data from thousands of normal tissue samples, to create a large repository of cancer and normal expression data free of computational batch effects.

3. Combined expression data was validated by identifying known cancer phenotypes for several antineoplastic drug targets and finding similar expression patterns in both TCGA and GTEx. Repositioning candidates were found by identifying cancer subtypes that share phenotypes with the positively validated targets.

<table>
<tr align='center'>
    <td> <img src="figures/toil-rnaseq.png" width=700> </td>
    <td> <img src="figures/Datasets.png" width=400> </td>
</tr>
</table>

## GTEx as a Prior

In [92]:
tissues = ['Breast', 'Colon', 'Kidney', 'Liver', 'Lung', 'Prostate', 'Stomach', 'Thyroid', 'Uterus']
de_gtex = h.differential_expression_tissue_concordance(tissue_subset=tissues, tcga=False).relabel('GTEx')
de_tcga = h.differential_expression_tissue_concordance(tissue_subset=tissues, gtex=False).relabel('TCGA')

### Differential Expression Gene Concordance (Pearson r)

In [93]:
%%opts HeatMap [width=450 height=425]
(de_gtex + de_tcga)
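
As a rough illustration of what a concordance heatmap of this kind summarizes: for each tissue, take a per-gene log2 fold change vector, then compute the Pearson correlation between every pair of tissues. The sketch below uses plain Python and made-up fold change values so it runs without the dataset. It is not the implementation behind `differential_expression_tissue_concordance`, whose internals are not shown in this notebook; `pearson_r` and the `l2fc` dictionary are hypothetical names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene log2 fold changes (same gene order in every tissue)
l2fc = {
    'Kidney': [2.0, -1.0, 0.5, 3.0],
    'Liver': [1.5, -0.5, 0.0, 2.5],
    'Lung': [-2.0, 1.0, -0.5, -3.0],
}

# Pairwise tissue-by-tissue concordance, keyed by (tissue_a, tissue_b)
concordance = {
    (a, b): pearson_r(l2fc[a], l2fc[b])
    for a in l2fc for b in l2fc
}
```

With a gene-by-tissue `DataFrame` of fold changes, `DataFrame.corr()` (Pearson by default) gives the same matrix in one call.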

# Update

<img src='figures/Expression_Discovery_Methods.png' width=700>

## Antineoplastic Biomarkers and Cancer Subtypes

In [81]:
# Read in link data
biomarker_path = '/mnt/rnaseq-cancer/Metadata/Drug_Biomarker_Table.tsv'
bio = pd.read_csv(biomarker_path, sep='\t', index_col=0)
pd.options.display.max_columns=len(bio.columns)
print('Number of Drugs: {}'.format(len(bio)))
bio.head()

Number of Drugs: 54


Drug,Approved Biomarker(s) [RESPONSIVE],Potential Biomarker(s) [RESPONSIVE],Approved Biomarker(s) [RESISTANT],Potential Biomarker(s) [RESISTANT],Main Target(s),Other Target(s),Gene(s) Involved,Primary Cancer Type(s),Secondary Cancer Type(s),Clinical Trial Cancer Type(s),Main Tissue(s),Other Tissue(s),Papers,Biomarker Paper,Pharmocology Synopsis,Grouping of drugs →,Drug Type,Subtype/Target Class,Class/Family (if applicable),Mechanism of Action,Pathway(s) Affected,Notes,Biomarker(s) for Sankey Graph
ERLOTINIB HYDROCHLORIDE,"EGFR (L858R,L861,G719,S768I), exon 19 deletion...",,EGFR (T790M),,EGFR (mutant),,ERBB1,EGFR+ non-small cell lung cancer (NSCLC); loca...,,,"Lung, pancreatic",,"Shepherd et al. 2005, (Grunwald et al. 2003, P...",,Erlotinib binds to EGFR and inhibits EGFR homo...,,PKI,RTK,I,ErbB1-inhibition,"ErbB signaling, MAPK signaling, cytokine-cytok...",,"EGFR (L858R,L861,G719,S768I), exon 19 deletion..."
ABIRATERONE ACETATE,,"AR (L702H,T878A), amplification; CYP17 expression",,,"CYP17, AR, PSA",,"CYP17A1, AR, KLK3",Metastatic castration-resistant prostate cance...,,,Prostate,,"Romanel et al. 2017, Efstathiou et al. 2012, A...",,CYP17 catalyses the rate limiting step in the ...,,Sex hormone antagonist,Antiandrogen,,CYP17-inhition,"Steroid hormone biosynthesis, cytochrome P450",,"AR (L702H,T878A), amplification; CYP17 expression"
ADO-TRASTUZUMAB EMTANSINE,"ERBB2 amplificaition, overexpression","ERBB2 (V659E,S310F), inframe insertion (A775YV...",,,HER-2,,ERBB2,"HER2+ metastatic breast cancer, HER2+ lung cancer",,,Breast,"Lung, colon",Haslem et al. 2017,Lambert et al. 2014,Ado-trastuzumab emtansine is a HER2-antibody d...,,Monoclonal antibody/PKI (ADC),RTK,I,ErbB2-inhibition,"ErbB signaling, focal adhesion, adherens junct...",,"ERBB2 amplificaition, overexpression"


In [52]:
from rnaseq_lib.plot.sankey import Sankey, make_links
# Read in drug classification data used to build the Sankey links
drug_class_path = '../1-Data-Collection-and-Processing/KEGG/tables/drug-classification-tissue.tsv'
drug_df = pd.read_csv(drug_class_path, sep='\t', index_col=0)

links = make_links(drug_df, ['Class', 'Subgroup', 'Specification', 'Tissue'])
drug_sankey = Sankey(links).redim(source='Source', target='Target', value='Count')
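
`make_links` belongs to `rnaseq_lib` and its implementation is not shown here. One plausible reading, sketched below purely as an assumption, is that it counts how often the values of each adjacent pair of columns co-occur, producing (source, target, count) rows for the Sankey diagram. The function name `count_links` and the toy rows are hypothetical and are not taken from the real classification table.

```python
from collections import Counter


def count_links(rows, columns):
    """Count co-occurrences between values of each adjacent column pair.

    rows: list of dicts, one per drug; columns: ordered column names.
    Returns a sorted list of (source, target, count) tuples.
    """
    counts = Counter()
    for row in rows:
        # Pair each column with the next one: (Class, Subgroup), (Subgroup, Tissue), ...
        for left, right in zip(columns, columns[1:]):
            counts[(row[left], row[right])] += 1
    return sorted((s, t, c) for (s, t), c in counts.items())


# Hypothetical classification rows for illustration only
toy_rows = [
    {'Class': 'PKI', 'Subgroup': 'RTK', 'Tissue': 'Lung'},
    {'Class': 'PKI', 'Subgroup': 'RTK', 'Tissue': 'Breast'},
    {'Class': 'Antibody', 'Subgroup': 'RTK', 'Tissue': 'Breast'},
]
toy_links = count_links(toy_rows, ['Class', 'Subgroup', 'Tissue'])
```

Each tuple corresponds to one ribbon in the diagram, with the count setting its width.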

<img src='figures/drugs.png'>

## CA9

Tumor hypoxia is associated clinically with therapeutic resistance and poor patient outcomes. One feature of tumor hypoxia is activated expression of carbonic anhydrase IX (CA9), a regulator of pH and tumor growth. Disruption of the downstream bicarbonate products can acidify tumor cells and suppress tumor growth [[McIntyre]](http://cancerres.aacrjournals.org/content/76/13/3744.short). Hypoxia also promotes tumor heterogeneity through the epigenetic regulation of CA9 [[Ledaki]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4637295/). CA9 is also a _transmembrane protein_ and is stained for use as an endogenous marker for investigating hypoxia [[Newbold]](http://www.sciencedirect.com/science/article/pii/S0360301608031799).
<img src='figures/hypoxia.png'>

CA9 belongs to the carbonic anhydrase family of zinc metalloenzymes, which catalyze the reversible hydration of carbon dioxide to form carbonic acid. Girentuximab (trade name Rencarex) is a chimeric IgG1 monoclonal antibody against carbonic anhydrase IX that was granted fast track status and orphan drug designation by the FDA for renal cancer \cite{girentuximab}. In January 2017, Telix Pharmaceuticals Limited, an Australian biotechnology company, announced that it had in-licensed girentuximab for use as a radioimmunoconjugate, iodine (124I) girentuximab, called Redectane \cite{wilexwilex}.

CA9 is reported as a ubiquitous marker in renal cell carcinoma (RCC) \cite{chen_expression_2005,turner_hypoxia-inducible_2002,kim_using_2005,wykoff_hypoxia-inducible_2000}, so it should be simple to validate by comparing the expression distributions of CA9 in kidney tumor and normal samples.

<img src='figures/ca9-timeline.png' width=800>

### CA9 Expression in Kidney

In [82]:
h.gene_kde(gene='CA9', tissue_subset=['Kidney'])

In [83]:
sequence = ['blue', 'red']

In [84]:
%%opts Scatter [color_index='label' size_index='size' width=450 height=450] (cmap=sequence)
label = {'CA9': ['CA9']}
extents = (0, -12, 20, 12)
ca9_gtex = h.tissue_de('Kidney', gene_labels=label, extents=extents).relabel('GTEx') 
ca9_tcga = h.tissue_de('Kidney', gene_labels=label, tcga_normal=True, extents=extents).relabel('TCGA')
hv.Layout([ca9_gtex, ca9_tcga]).relabel('CA9 Overexpression in TCGA and GTEx')

### CA9 Expression Across Tissues

In [85]:
%%opts Bars [xrotation=80]
h.sample_counts()

In [86]:
h.gene_distribution(gene='CA9')

In [87]:
path = [(4, 4), (9.5, 4), (9.5, 8), (4, 8), (4, 4)]
de = h.gene_de('CA9', extents=(1.3, -4.3, 14, 12)) * hv.Path([path])
de

Kidney is an extreme outlier, but several other tissues show both high CA9 expression and a significant log2 fold change (L2FC) for CA9 (blue boxed area).
- **Bladder**
    - Very few normal samples
    - "CA9 are differentially regulated in superficial vs invasive bladder cancer" [[Turner]](https://www.nature.com/articles/6600215)
    - "Carbonic Anhydrase  [...] as Urinary Biomarkers for Bladder Cancer Detection" [[Urquidi]](http://www.sciencedirect.com/science/article/pii/S0090429512000428)
- **Pancreas**
    - _Very few TCGA normals_
    - "Hypoxia activates the hedgehog signaling pathway in a ligand-independent manner by upregulation of Smo transcription in pancreatic cancer" [[Onishi]](http://onlinelibrary.wiley.com/doi/10.1111/j.1349-7006.2011.01912.x/full)
- **Uterus**
    - Few TCGA normals
    - "Tumor carbonic anhydrase 9 expression is associated with the presence of lymph node metastases in uterine cervical cancer" [[Lee]](http://onlinelibrary.wiley.com/doi/10.1111/j.1349-7006.2007.00396.x/full)
- **Colon**
    - "Stromal expression of hypoxia regulated proteins is an adverse prognostic factor in colorectal carcinomas." [[Cleven]](https://www.ncbi.nlm.nih.gov/pubmed/17452775?dopt=Abstract&holding=npg)
- **Lung**
    - "Expression of Hypoxia-inducible Carbonic Anhydrase-9 Relates to Angiogenic Pathways and Independently to Poor Outcome in Non-Small Cell Lung Cancer" [[Giatromanolaki]](http://cancerres.aacrjournals.org/content/61/21/7992.short)
- **Esophagus**
    - _Very few TCGA normals_
    - "We also observed higher frequency gains at 9p (13% versus 4%; p = 0.04) containing putative cancer loci such as CA9" - Comparative Genomics of Esophageal Adenocarcinoma and Squamous Cell Carcinoma. [[Bandla]](http://www.sciencedirect.com/science/article/pii/S0003497512001762)

# Future Work

## Data Improvements

### Integrating Pathway Data

- Talked to Vlado about getting Kinase/TF/Target pathway tables
- Olena suggested doing pathway enrichment
- Find tumor hypoxia database that includes CA9
- Identify genes co-regulated with CA9 via regression or other modeling
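
One simple starting point for the co-regulation item is to regress each candidate gene's expression on CA9 across samples and rank genes by R². The sketch below does this with an ordinary least-squares fit in plain Python. The names `GENE_A` and `GENE_B` and all expression values are placeholders, and a real analysis would likely also need to handle tissue and dataset as covariates.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder expression values across five samples
ca9 = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    'GENE_A': [2.1, 3.9, 6.2, 7.8, 10.1],  # tracks CA9 closely
    'GENE_B': [5.0, 1.0, 4.0, 2.0, 3.0],   # little relationship to CA9
}

# Rank candidate genes by how much of their variance CA9 explains
ranked = sorted(genes, key=lambda g: fit_line(ca9, genes[g])[2], reverse=True)
```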

### Subpopulations

- Start identifying subpopulations to compare, instead of comparing globally
    - Example: HER2 ER+/ER- populations
    - Olena recommends exploring GD2 (neuroblastoma biomarker) in tissue set and against Treehouse cohort

## Collaborators

Olena knows a few people who may be interested in downstream wet-lab validation.

## Biomarker Validation and Discovery

- Build tumor biomarker dashboard for undergrad (almost done)
- Have undergrad validate known tumor biomarkers by cross-referencing literature
- Identify candidate novel tumor biomarkers

## Talks
- Submit ISMB abstract
- Treehouse talk

## Timeline

<img src='figures/timeline.png' width=800>

# Fin

<img src='figures/xkcd.png' width=700>