*This notebook is an artifact from the [Biomedical Translator Project](https://ncats.nih.gov/translator/about) hackathon hosted by NIH NCATS on May 9-11th, 2017 (followed by minimal post-hackathon activities through May 25th, 2017)*

### Clinical query workflow question: Can we identify novel off-label use for early-onset and late-onset disease respectively?

**Contributors**
* Kenneth Roe, Richard Zhu, Casey Overby (Hopkins)
* Rajarshi Guha (NCATS)
* Chunhua Weng, Chi Yuan (Columbia)
* James Champion (UNC)
* Marcin von Grotthuss, Oliver Ruebanacker (Broad)

**Overview of steps**
* Find age of diagnosis for Asthma patients
* Find medications (and RxNorm codes for meds) taken by Asthma patients
* Compute drug status for drugs taken by Asthma patients
* (Compare on-/off-label status of drugs taken by patients diagnosed with Asthma during childhood vs. during adulthood)

**Data Sources**
* [HUSH+ synthetic data resource](https://github.com/NCATS-Tangerine/cq-notebooks/tree/master/GreenTeam_Data_Documentation)
* [NCBO annotator](https://www.bioontology.org/)
* [NDK (NCATS Disease Knowledgebase)](https://tripod.nih.gov/)

In [1]:
## All the imports we need
from urllib2 import Request, urlopen, URLError
from urllib import quote_plus

import mysql.connector

import pprint, json, requests

from datetime import datetime, timedelta
from dateutil.parser import parse as parse_date
from greentranslator.api import GreenTranslator

import dateutil

### (1) Find age of diagnosis for Asthma patients

* NOTE: because of the current query API specification, we choose demographic criteria that return a small number of patients

**Find patients diagnosed with Asthma as a child (Group 1)**

In [12]:
## Find HUSH+ patients for the child group (Group 1); the call is
## parameterized by demographics (age, sex, race, location)

query = GreenTranslator().get_query()

HUSHplusChildren = query.clinical_get_patients(age='15', sex='male',
                                               race='white', location='EMERGENCY')
#pprint.pprint(HUSHplusChildren)

**Find patients diagnosed with Asthma as an adult (Group 2)**

In [15]:
## Find HUSH+ patients for the adult group (Group 2); the call is
## parameterized by demographics (age, sex, race, location)

query = GreenTranslator().get_query()

HUSHplusAdults = query.clinical_get_patients(age='50', sex='male',
                                             race='white', location='EMERGENCY')
#pprint.pprint(HUSHplusAdults)
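The two queries above define the groups through the `age` argument rather than through a computed age of diagnosis. If a patient record carried a birth date and a diagnosis date, the age at diagnosis and an early-/late-onset label could be derived as in this minimal sketch. The helper names `age_at` and `onset_group`, the example dates, and the 18-year cut-off are all assumptions for illustration; they are not part of the HUSH+ record format or the GreenTranslator API.

```python
from datetime import date


def age_at(birth, event):
    """Whole years between a birth date and an event date (hypothetical helper)."""
    years = event.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the event year
    if (event.month, event.day) < (birth.month, birth.day):
        years -= 1
    return years


def onset_group(birth, diagnosis_date, cutoff=18):
    """Label a diagnosis as early- or late-onset; the 18-year cutoff is an assumption."""
    return "early-onset" if age_at(birth, diagnosis_date) < cutoff else "late-onset"


# Hypothetical dates, for illustration only
print(age_at(date(2000, 6, 15), date(2015, 6, 14)))       # 14
print(onset_group(date(2000, 6, 15), date(2015, 6, 15)))  # early-onset (age 15)
print(onset_group(date(1960, 1, 1), date(2010, 3, 1)))    # late-onset (age 50)
```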

### (2) Find medications (and RxNorm codes for meds) taken by Asthma patients

**Find medications taken by child patients**

In [17]:
## Pull meds for the HUSH+ child patients and count how often each medication string occurs
meds = {}
for x in HUSHplusChildren:
    medList = x['medList']
    for m in medList.keys():
        meds[medList[m]] = meds.get(medList[m], 0) + 1
#pprint.pprint(meds)
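The loop above counts how many times each medication string appears in the child patients' `medList` dictionaries. The same tally can be written with `collections.Counter`, as in this sketch. The patient records are hypothetical stand-ins carrying only a `medList` field, and the helper name `count_meds` is illustrative. Unlike the loop above, which counts a `None` entry like any other key (the next cell then skips it), this version drops `None` up front.

```python
from collections import Counter


def count_meds(patients):
    """Count occurrences of each medication string across patient medList dicts."""
    counts = Counter()
    for patient in patients:
        # medList values are medication description strings; skip missing ones
        counts.update(med for med in patient['medList'].values() if med is not None)
    return counts


# Hypothetical records, for illustration only
patients = [
    {'medList': {'1': 'PREDNISONE 20 MG TABLET', '2': 'AZITHROMYCIN 500 MG TABLET'}},
    {'medList': {'7': 'PREDNISONE 20 MG TABLET', '8': None}},
]
print(count_meds(patients).most_common(1))  # [('PREDNISONE 20 MG TABLET', 2)]
```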

**Find RxNorm annotations (via the NCBO Annotator) for meds prescribed to children**

In [18]:
## For a given medication string get NCBO annotations
## We let NCBO match any ontology since just using RxNORM doesn't
## always give us just the drug name (e.g., "CLINDAMYCIN 15 MG/ML ORAL SOLUTION" is
## a valid RxNORM term)
def med2rxnorm(txt):
    url = 'http://data.bioontology.org/annotator?text=%s&apikey=b792dd1b-cdc2-4cc8-aaf2-4fa4fbf47e4e'
    txt = urlopen(url % quote_plus(txt)).read()
    resp = json.loads(txt)
    if len(resp) == 0: return([])
    annos = []
    for aresp in resp:
        annos.extend([ x['text'] for x in aresp['annotations'] ])
    ##annos = filter(lambda x: not any(d in x for d in'0123456789'), annos)
    return(annos)
#print med2rxnorm("CLINDAMYCIN 15 MG/ML ORAL SOLUTION")        
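`med2rxnorm` collects the `text` field of every annotation in the NCBO Annotator response. The sketch below isolates that parsing step so it can be tried without network access, and adds two optional clean-ups: removing duplicates while preserving order, and dropping strings that contain digits (the filter commented out in `med2rxnorm`). The helper name `annotation_texts` is hypothetical, and the sample is an invented, heavily trimmed payload that keeps only the fields `med2rxnorm` reads (`annotations` and `text`); real Annotator responses carry more fields.

```python
import json


def annotation_texts(resp, drop_numeric=False):
    """Return unique annotation strings from a parsed NCBO Annotator response."""
    seen = []
    for result in resp:
        for annotation in result.get('annotations', []):
            text = annotation['text']
            # Optionally skip strings with dose information such as "15 MG/ML"
            if drop_numeric and any(ch.isdigit() for ch in text):
                continue
            if text not in seen:
                seen.append(text)
    return seen


# Hypothetical, trimmed response for "CLINDAMYCIN 15 MG/ML ORAL SOLUTION"
sample = json.loads('''[
  {"annotations": [{"text": "CLINDAMYCIN"}]},
  {"annotations": [{"text": "CLINDAMYCIN 15 MG/ML ORAL SOLUTION"}]},
  {"annotations": [{"text": "CLINDAMYCIN"}, {"text": "ORAL SOLUTION"}]}
]''')
print(annotation_texts(sample))                     # three unique strings
print(annotation_texts(sample, drop_numeric=True))  # ['CLINDAMYCIN', 'ORAL SOLUTION']
```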

In [22]:
## Get RxNORM codes for medication strings
medrxnorm = {}
for med in meds.keys():
    if med is None: continue    
    annos = med2rxnorm(med)
    print 'Processing %s and found %d annotations' % (med, len(annos))
    medrxnorm[med] = {'count':meds[med], 'annos':annos}    

Processing Aspirin 81 MG Delayed Release Oral Tablet and found 5 annotations
Processing FENTANYL 12 MCG/HR TRANSDERMAL PATCH and found 1 annotations
Processing ALBUTEROL SULFATE HFA 90 MCG/ACTUATION AEROSOL INHALER and found 3 annotations
Processing METHYLPHENIDATE ER 36 MG TABLET,EXTENDED RELEASE 24 HR and found 1 annotations
Processing AZITHROMYCIN 500 MG TABLET and found 2 annotations
Processing FLUTICASONE 50 MCG/ACTUATION NASAL SPRAY,SUSPENSION and found 2 annotations
Processing Magnesium Oxide 400 MG Oral Tablet and found 5 annotations
Processing PREDNISONE 20 MG TABLET and found 2 annotations
Processing Cyclobenzaprine hydrochloride 5 MG Oral Tablet and found 5 annotations
Processing Morphine Sulfate and found 2 annotations
Processing METHYLPHENIDATE ER 54 MG TABLET,EXTENDED RELEASE 24 HR and found 1 annotations
Processing benzonatate 200 MG Oral Capsule and found 4 annotations
Processing prednisolone 3 MG/ML Oral Solution and found 4 annotations
Processing Oxycodone Hydrochlori

### (3) Compute drug status for drugs taken by Asthma patients

**Find all indications (as ICD-10 codes) for any of the drugs prescribed to children**

In [23]:
## Given a drug name (an annotation string from med2rxnorm), get the conditions for
## which the drug has a status (approved, phase 3, ...) via the NDK API
def drug2conditions(drug):
    from urllib import quote
    # Percent-encode the drug name, since it is used as a URL path segment
    url = "https://tripod.nih.gov/ndk/treatment/%s/conditions" % quote(drug)
    try:
        page = urlopen(url).read().strip()
    except URLError:
        # HTTPError (a subclass of URLError) and network failures
        return None
    if page == "":
        return None
    try:
        resp = json.loads(page)
    except ValueError:
        return None
    conds = []
    for aresp in resp:
        # One (status, condition name, list of ICD-10 codes) tuple per record;
        # records without an ICD10 field get an empty code list
        conds.append( (aresp['status'], aresp['name'], aresp.get('ICD10', [])) )
    return(conds)

# Given a medication, try its annotations in order and return the conditions
# for the first one that NDK returns results for
def med2conditions(m):
    annos = medrxnorm[m]['annos']
    for a in annos:
        r = drug2conditions(a)
        if r is not None:
            return r
    return None
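`drug2conditions` turns each NDK record into a `(status, name, ICD-10 codes)` tuple, and `med2conditions` returns the result for the first annotation that yields any. Both steps can be exercised offline with a stubbed lookup, as in this sketch. The helper names `parse_ndk_conditions`, `first_conditions`, and `lookup` are hypothetical, and the sample payload is invented; it contains only the `status`, `name`, and `ICD10` fields the notebook reads.

```python
import json


def parse_ndk_conditions(page):
    """Parse an NDK-style conditions payload into (status, name, icd10_codes) tuples."""
    page = page.strip()
    if not page:
        return None
    try:
        records = json.loads(page)
    except ValueError:
        return None
    # A missing ICD10 field becomes an empty code list
    return [(r['status'], r['name'], r.get('ICD10', [])) for r in records]


def first_conditions(annos, lookup):
    """Return the conditions for the first annotation whose lookup is not None."""
    for a in annos:
        conds = lookup(a)
        if conds is not None:
            return conds
    return None


# Hypothetical payloads keyed by annotation text, standing in for the NDK API
pages = {
    'PREDNISONE': '[{"status": "Approved", "name": "Asthma", "ICD10": ["J45"]},'
                  ' {"status": "Phase 3", "name": "Condition X"}]',
}


def lookup(anno):
    return parse_ndk_conditions(pages.get(anno, ''))


print(first_conditions(['PREDNISONE 20 MG TABLET', 'PREDNISONE'], lookup))
# [('Approved', 'Asthma', ['J45']), ('Phase 3', 'Condition X', [])]
```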

In [45]:
keys = list(medrxnorm.keys())
print keys
# Debugging aid: inspect the conditions found for one medication
# n = 10
# annos = medrxnorm[keys[n]]['annos']
# conds = []
# for anno in annos:
#     conds.extend(drug2conditions(anno) or [])
# print "##", keys[n], "##", conds

[u'Aspirin 81 MG Delayed Release Oral Tablet', u'FENTANYL 12 MCG/HR TRANSDERMAL PATCH', u'ALBUTEROL SULFATE HFA 90 MCG/ACTUATION AEROSOL INHALER', u'METHYLPHENIDATE ER 36 MG TABLET,EXTENDED RELEASE 24 HR', u'AZITHROMYCIN 500 MG TABLET', u'FLUTICASONE 50 MCG/ACTUATION NASAL SPRAY,SUSPENSION', u'Magnesium Oxide 400 MG Oral Tablet', u'PREDNISONE 20 MG TABLET', u'Cyclobenzaprine hydrochloride 5 MG Oral Tablet', u'Morphine Sulfate', u'ONDANSETRON 4 MG DISINTEGRATING TABLET', u'benzonatate 200 MG Oral Capsule', u'prednisolone 3 MG/ML Oral Solution', u'Oxycodone Hydrochloride 5 MG Oral Tablet', u'MELOXICAM 7.5 MG TABLET', u'Lisinopril 2.5 MG Oral Tablet', u'Promethazine Hydrochloride 12.5 MG Oral Tablet', u'Amoxicillin 80 MG/ML Oral Suspension', u'COENZYME Q10 200 MG CAPSULE', u'METHYLPHENIDATE ER 54 MG TABLET,EXTENDED RELEASE 24 HR', u'cetirizine hydrochloride 10 MG Oral Tablet', u'Atenolol 25 MG Oral Tablet', u'CHOLECALCIFEROL (VITAMIN D3) 1,000 UNIT TABLET', u'oxcarbazepine 300 MG Oral Tab

In [34]:
## Get approval status for each medication in a list. Returns one dictionary per status
## (Approved, Phase 4, Phase 3, Phase 2, Phase 1) mapping each medication to the ICD-10
## codes of the conditions it has that status for, plus a list of medications for which
## no conditions were found
def approvalStatus(meds):
    byStatus = {"Approved": {}, "Phase 4": {}, "Phase 3": {}, "Phase 2": {}, "Phase 1": {}}
    unknownMeds = []
    for m in meds:
        if m is None:
            continue
        x = med2conditions(m)
        if x is None:
            print "Unknown medication conditions for " + m
            print medrxnorm[m]['annos']
            unknownMeds.append(m)
            continue
        for (status, condname, icd10codes) in x:
            # Other status values are ignored
            if status in byStatus:
                for c in icd10codes:
                    byStatus[status].setdefault(m, []).append(c)

    return (byStatus["Approved"], byStatus["Phase 4"], byStatus["Phase 3"],
            byStatus["Phase 2"], byStatus["Phase 1"], unknownMeds)
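`approvalStatus` returns five parallel dictionaries, one per trial status, each mapping a medication string to the ICD-10 codes of the conditions it holds that status for. An equivalent nested layout, `{status: {medication: [codes]}}`, can be built with `collections.defaultdict`, as this sketch shows. The input mirrors the `(status, name, codes)` tuples produced by `drug2conditions`; the helper name `group_by_status`, the `TRACKED` tuple, and the sample medications and codes (including the placeholder `XXX`) are hypothetical.

```python
from collections import defaultdict

# Statuses tracked by approvalStatus, most advanced first
TRACKED = ("Approved", "Phase 4", "Phase 3", "Phase 2", "Phase 1")


def group_by_status(med_conditions):
    """Build {status: {med: [icd10 codes]}} from {med: [(status, name, codes)]}."""
    table = {status: defaultdict(list) for status in TRACKED}
    for med, conds in med_conditions.items():
        for status, _name, codes in conds:
            # Statuses outside TRACKED are ignored, as in approvalStatus
            if status in table:
                table[status][med].extend(codes)
    return table


# Hypothetical input, for illustration only
sample = {
    'DRUG A': [("Approved", "Asthma", ["J45"]), ("Phase 2", "Condition X", ["XXX"])],
    'DRUG B': [("Phase 3", "Asthma", ["J45"])],
}
table = group_by_status(sample)
print(dict(table["Approved"]))  # {'DRUG A': ['J45']}
print(dict(table["Phase 3"]))   # {'DRUG B': ['J45']}
```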


In [35]:
## For a given patient record (HUSH+ format), return the approval status of each of the
## patient's medications with respect to the patient's own ICD-10 diagnoses
def patientMedicationStatus(p):
    # Keep only ICD-10 diagnosis codes, with the "ICD10:" prefix stripped
    diags = [d[6:] for d in p['diag'] if d.startswith("ICD10:")]
    # Medication description strings (the values of medList); None entries are skipped
    med = [m for m in p['medList'].values() if m is not None]
    (approved,phase4,phase3,phase2,phase1,unknownMeds) = approvalStatus(med)

    # Statuses from most to least advanced; the first one whose condition list
    # contains any of the patient's diagnoses wins
    ranked = [("Approved", approved), ("Phase4", phase4), ("Phase3", phase3),
              ("Phase2", phase2), ("Phase1", phase1)]

    drugStatus = {}
    for x in med:
        if x in unknownMeds:
            # No annotations, or no NDK conditions for any annotation
            status = "Unknown"
        else:
            status = "offLabel"
            for (label, table) in ranked:
                if any(d in table.get(x, []) for d in diags):
                    status = label
                    break
        drugStatus[x] = status
    return drugStatus
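The status assignment in `patientMedicationStatus` gives Approved precedence over Phase 4, Phase 4 over Phase 3, and so on, falling back to `offLabel` when none of the patient's ICD-10 diagnoses appear in the medication's condition lists. This sketch isolates that precedence as a pure function that takes the highest-ranked match across all diagnoses. The name `label_status`, the `(label, {medication: [codes]})` table layout, and the sample data (including the placeholder code `XXX`) are hypothetical.

```python
def label_status(med, diags, ranked_tables):
    """Return the highest-ranked status label whose codes for `med` match a diagnosis.

    ranked_tables: list of (label, {med: [icd10 codes]}) pairs, most advanced first.
    """
    for label, table in ranked_tables:
        codes = table.get(med, [])
        if any(d in codes for d in diags):
            return label
    return "offLabel"


# Hypothetical status tables and diagnoses, for illustration only
ranked = [
    ("Approved", {'DRUG A': ['J45']}),
    ("Phase3", {'DRUG B': ['J45'], 'DRUG A': ['XXX']}),
    ("Phase1", {}),
]
print(label_status('DRUG A', ['XXX', 'J45'], ranked))  # Approved
print(label_status('DRUG B', ['J45'], ranked))         # Phase3
print(label_status('DRUG C', ['J45'], ranked))         # offLabel
```

Because the function takes the best match over all diagnoses, the result does not depend on the order of the codes in `p['diag']`.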


**For each patient, compute drug status for drugs on their medication list**

In [43]:
#compute drug status info for all child patients
patientDrugInfo = {}
for p in HUSHplusChildren:
    #print p
    print "Processing "+p['patient_id']
    patientDrugInfo[p['patient_id']] = patientMedicationStatus(p)

pprint.pprint(patientDrugInfo)

Processing 174943566
Processing 178871520
Processing 181065648
Processing 181799472
{u'174943566': {u'AZITHROMYCIN 500 MG TABLET': 'offLabel',
                u'Amoxicillin 80 MG/ML Oral Suspension': 'offLabel',
                u'Cyclobenzaprine hydrochloride 5 MG Oral Tablet': 'offLabel',
                u'FLUTICASONE 50 MCG/ACTUATION NASAL SPRAY,SUSPENSION': 'offLabel',
                u'HYDROCODONE 7.5 MG-ACETAMINOPHEN 500 MG/15 ML ORAL SOLUTION': 'Approved',
                u'MELOXICAM 7.5 MG TABLET': 'offLabel',
                u'ONDANSETRON 4 MG DISINTEGRATING TABLET': 'Approved',
                u'Promethazine Hydrochloride 12.5 MG Oral Tablet': 'offLabel'},
 u'178871520': {u'Acetaminophen 325 MG Oral Tablet': 'Approved',
                u'Aspirin 81 MG Delayed Release Oral Tablet': 'Approved',
                u'Atenolol 25 MG Oral Tablet': 'offLabel',
                u'CHOLECALCIFEROL (VITAMIN D3) 1,000 UNIT TABLET': 'offLabel',
                u'COENZYME Q10 200 MG CAPSULE': '

A next step could be to compare the on-/off-label status of drugs taken by patients diagnosed with Asthma during childhood vs. during adulthood.
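One way to make this comparison concrete is to tally the status labels in each group's `patientDrugInfo`-style dictionary (patient ID mapped to `{medication: status}`) and compare the share of medications labeled `offLabel`. This is a minimal sketch: the helper names `status_counts` and `off_label_share` and the two small example dictionaries are hypothetical, and the notebook above only computes `patientDrugInfo` for the child group.

```python
from collections import Counter


def status_counts(patient_drug_info):
    """Tally status labels over all (patient, medication) pairs."""
    counts = Counter()
    for statuses in patient_drug_info.values():
        counts.update(statuses.values())
    return counts


def off_label_share(patient_drug_info):
    """Fraction of (patient, medication) pairs labeled offLabel; None if there are none."""
    counts = status_counts(patient_drug_info)
    total = sum(counts.values())
    return counts["offLabel"] / float(total) if total else None


# Hypothetical per-group results in the patientDrugInfo format
children = {'p1': {'DRUG A': 'offLabel', 'DRUG B': 'Approved'},
            'p2': {'DRUG A': 'offLabel', 'DRUG C': 'Unknown'}}
adults = {'p3': {'DRUG A': 'Approved', 'DRUG D': 'offLabel'},
          'p4': {'DRUG E': 'Approved', 'DRUG F': 'Phase3'}}
print(status_counts(children)["offLabel"])                 # 2
print(off_label_share(children), off_label_share(adults))  # 0.5 0.25
```

Counting (patient, medication) pairs treats a drug taken by several patients as several observations; counting distinct medications per group instead would be an equally reasonable choice.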