# OncoKB to Drug Schema

- Identify best source of high-quality drug information:
    - FDA API (https://open.fda.gov/api/reference/)
    - CIViC API (http://griffithlab.org/civic-api-docs/)
    - **OncoKB** (http://oncokb.org/#/)  
- Identify Information to Save
- Export as Drug Schema (YAML format)

## Imports and Functions

In [2]:
import requests
import pandas as pd
import yaml
from collections import defaultdict

## OncoKB Table

Information obtained from <cite data-cite="5029384/ZKNZBXU2"></cite>.

- Levels
    - 1 - Standard therapeutic implications include Food and Drug Administration (FDA)–recognized biomarkers that are predictive of response to an FDA-approved drug in a specific indication
    - 2A - Standard care biomarkers that are predictive of response to an FDA-approved drug in a specific indication
    - 2B - Investigational therapeutic implications include FDA-approved biomarkers predictive of response to an FDA-approved drug detected in an off-label indication
    - 3A/B - FDA- or non–FDA-recognized biomarkers that are predictive of response to novel targeted agents that have shown promising results in clinical trials
    - 4 - Non–FDA-recognized biomarkers that are predictive of response to novel targeted agents on the basis of compelling biologic data
    - R1 - Standard care biomarker predictive of resistance to an FDA-approved drug in this indication
    - R2 - Compelling clinical evidence as a resistance biomarker, but neither drug nor biomarker is standard care
    - R3 - Compelling biological evidence as a resistance biomarker, but neither drug nor biomarker is standard care
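
The level codes above can be kept as a small lookup so that later steps can tell response levels from resistance levels. This is only a sketch: the summaries are short paraphrases of the list above, and the names `LEVEL_SUMMARY`, `normalize_level`, `is_resistance`, and `describe_level` are hypothetical helpers, not part of OncoKB.

```python
# Short paraphrases of the level list above. The dictionary and helper
# names are hypothetical, chosen for this sketch only.
LEVEL_SUMMARY = {
    "1": "FDA-recognized biomarker, FDA-approved drug, this indication",
    "2A": "Standard care biomarker, FDA-approved drug, this indication",
    "2B": "FDA-approved biomarker and drug, detected in an off-label indication",
    "3A": "Promising clinical trial results for a novel targeted agent",
    "3B": "Promising clinical trial results for a novel targeted agent",
    "4": "Compelling biologic data for a novel targeted agent",
    "R1": "Standard care resistance biomarker, FDA-approved drug",
    "R2": "Compelling clinical evidence of resistance",
    "R3": "Compelling biological evidence of resistance",
}


def normalize_level(level):
    """Return the level as an upper-case string, e.g. 1 -> '1', '2a' -> '2A'."""
    return str(level).strip().upper()


def is_resistance(level):
    """True for the R1-R3 levels, which predict resistance rather than response."""
    return normalize_level(level).startswith("R")


def describe_level(level):
    """Look up the short summary; raises KeyError for an unknown level."""
    return LEVEL_SUMMARY[normalize_level(level)]
```

For example, `is_resistance("R1")` is true for the PDGFRA D842V row, while `is_resistance("2A")` is false.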

In [21]:
oncokb_url = 'http://oncokb.org/api/v1/utils/allActionableVariants.txt'
# Downloaded locally
df = pd.read_csv('./data/allActionableVariants.txt', sep='\t')
df.tail(2)

| | Isoform | RefSeq | Entrez Gene ID | Gene | Alteration | Cancer Type | Level | Drugs(s) | PMIDs for drug | Abstracts for drug |
|---|---|---|---|---|---|---|---|---|---|---|
| 262 | ENST00000257290 | NM_006206.4 | 5156 | PDGFRA | Oncogenic Mutations | Gastrointestinal Stromal Tumor | 2A | Imatinib | 24963404, 15928335 | |
| 263 | ENST00000257290 | NM_006206.4 | 5156 | PDGFRA | D842V | Gastrointestinal Stromal Tumor | R1 | Imatinib | 25905001, 17087936, 12949711, 24963404, 227188... | |
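
The cell above reads a local copy, and `requests` is imported but never used. One way the file could have been fetched is sketched below. This assumes the URL in `oncokb_url` still serves the tab-separated file without authentication, which may no longer be true. The function name `download_actionable_variants` and its `get` parameter are hypothetical. `get` exists only so that the HTTP call can be swapped out.

```python
from pathlib import Path

import requests

ONCOKB_URL = "http://oncokb.org/api/v1/utils/allActionableVariants.txt"


def download_actionable_variants(dest, url=ONCOKB_URL, get=requests.get, timeout=30):
    """Fetch the variant table from `url` and write it to `dest` as text."""
    response = get(url, timeout=timeout)
    # Fail on 4xx/5xx responses instead of silently saving an error page
    response.raise_for_status()
    dest = Path(dest)
    # Create ./data (or any other parent directory) if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(response.text, encoding="utf-8")
    return dest
```

Calling `download_actionable_variants('./data/allActionableVariants.txt')` would then produce the file that `pd.read_csv` reads above.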


Process the table into a nested data structure for YAML export:

- Drug:
    - List
        - Gene
        - Alteration
        - Cancer Type (saved under the key `Subtype`)
        - Level
        - [Samples]
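
As a concrete instance of this layout, the two PDGFRA rows shown in the table output above would end up under a single drug key. The sketch below writes that out by hand and adds a small check that every entry carries the four keys the notebook saves. `REQUIRED_KEYS` and `find_schema_problems` are hypothetical names. The optional `[Samples]` field is left out because the notebook does not populate it.

```python
REQUIRED_KEYS = {"Gene", "Alteration", "Subtype", "Level"}

# Target layout for one drug, filled in from the two PDGFRA rows shown above.
example = {
    "Imatinib": [
        {"Gene": "PDGFRA", "Alteration": "Oncogenic Mutations",
         "Subtype": "Gastrointestinal Stromal Tumor", "Level": "2A"},
        {"Gene": "PDGFRA", "Alteration": "D842V",
         "Subtype": "Gastrointestinal Stromal Tumor", "Level": "R1"},
    ]
}


def find_schema_problems(nested):
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    for drug, entries in nested.items():
        if not isinstance(entries, list):
            problems.append(f"{drug}: expected a list of entries")
            continue
        for i, entry in enumerate(entries):
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                problems.append(f"{drug}[{i}]: missing {sorted(missing)}")
    return problems


print(find_schema_problems(example))
```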

In [4]:
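oncokb = defaultdict(list)
for _, row in df.iterrows():
    # Save desired values for a given alteration
    v = {'Gene': row['Gene'],
         'Alteration': row['Alteration'],
         'Subtype': row['Cancer Type'],
         'Level': row['Level']}
    # Skip rows with no drug listed (a missing value has no .split)
    if pd.isnull(row['Drugs(s)']):
        continue
    # Save a separate copy for each drug listed for an alteration,
    # so the YAML dump does not emit anchors/aliases for a shared dict
    for drug in row['Drugs(s)'].split(', '):
        oncokb[drug].append(dict(v))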

Save output as YAML

In [None]:
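yaml_path = './drug-schema/oncokb.yaml'
# Text mode: on Python 3, yaml.dump writes str unless an encoding is given
with open(yaml_path, 'w') as f:
    # Plain dict, so the file carries no Python-specific defaultdict tag
    yaml.dump(dict(oncokb), f)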
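
A quick round trip is one way to check that the exported file can be read by any YAML parser, not just by Python. The sketch below uses `yaml.safe_dump` and `yaml.safe_load` on a single entry taken from the PDGFRA D842V row. `safe_dump` accepts only plain types, so the `defaultdict` is converted to a `dict` first. The choice of the safe variants is a suggestion, not something the notebook itself does.

```python
from collections import defaultdict

import yaml

oncokb = defaultdict(list)
oncokb["Imatinib"].append({"Gene": "PDGFRA",
                           "Alteration": "D842V",
                           "Subtype": "Gastrointestinal Stromal Tumor",
                           "Level": "R1"})

# safe_dump only represents plain Python types, so convert the defaultdict first
text = yaml.safe_dump(dict(oncokb), default_flow_style=False)

# Reading the text back should reproduce the same nested structure
loaded = yaml.safe_load(text)
print(text)
```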

## Convert to Table

In [22]:
sub = df[['Drugs(s)', 'Level', 'Cancer Type', 'Alteration']].rename(
    columns={'Drugs(s)': 'Drug', 'Cancer Type': 'Subtype'})
sub.head()

| | Drug | Level | Subtype | Alteration |
|---|---|---|---|---|
| 0 | Pembrolizumab | 1 | All Solid Tumors | Microsatellite Instability-High |
| 1 | Nivolumab | 1 | Colorectal Cancer | Microsatellite Instability-High |
| 2 | Tazemetostat | 4 | Diffuse Large B-Cell Lymphoma | Oncogenic Mutations |
| 3 | Dasatinib, Imatinib | 1 | Acute Lymphoid Leukemia | BCR-ABL1 Fusion |
| 4 | Nilotinib, Dasatinib, Imatinib | 1 | Chronic Myelogenous Leukemia | BCR-ABL1 Fusion |


In [23]:
drug_tsv = './drug-schema/oncokb.tsv'
sub.to_csv(drug_tsv, sep='\t')
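
Unlike the YAML export, this table keeps multi-drug entries such as "Dasatinib, Imatinib" in a single cell. If one row per drug is preferred, the column can be split and expanded with `DataFrame.explode` (pandas 0.25 or later). The sketch below is an optional variant, not what the notebook writes to `oncokb.tsv`. It uses two rows copied from the output above, and the name `per_drug` is hypothetical.

```python
import pandas as pd

# Two rows copied from the table output above
sub = pd.DataFrame({
    "Drug": ["Pembrolizumab", "Dasatinib, Imatinib"],
    "Level": ["1", "1"],
    "Subtype": ["All Solid Tumors", "Acute Lymphoid Leukemia"],
    "Alteration": ["Microsatellite Instability-High", "BCR-ABL1 Fusion"],
})

# Split the comma-separated drug names into lists, then emit one row per drug
per_drug = (sub.assign(Drug=sub["Drug"].str.split(", "))
               .explode("Drug")
               .reset_index(drop=True))
print(per_drug)
```

Passing `index=False` to `to_csv` would additionally keep the row index out of the saved file.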