**Question: What are the symptoms of Asthma subtypes?**
* Find patients diagnosed with Asthma
* Find symptoms for Asthma
* Find occurrences of symptoms in Asthma patients
* Find symptom clusters among Asthma patients

### Function and dataset definitions

In [None]:
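import urllib, urllib2
import pprint, json, requests
# mysql.connector and errorcode are used by the connection block below
import mysql.connector
from mysql.connector import errorcode
from greentranslator.api import GreenTranslator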

try:
    cnx = mysql.connector.connect(user='tadmin',
                                password='ncats_translator!',
                                database='umls',
                                host='translator.ceyknq0yekb3.us-east-1.rds.amazonaws.com')
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
    elif err.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
    else:
        print(err)

In [None]:
## Pull in disease to symptom mappings taken from the SI of
## https://www.nature.com/articles/ncomms5212. Takes a bit of time to pull down
SYMPTOM_URL = "https://www.nature.com/article-assets/npg/ncomms/2014/140626/ncomms5212/extref/ncomms5212-s4.txt"
raw_rows = [x.split("\t") for x in urllib2.urlopen(SYMPTOM_URL).read().split("\n")]
## Keep only well-formed rows with exactly four tab-separated fields
DISEASE2SYMPTOMS = [x for x in raw_rows if len(x) == 4]
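
The row filter above can be tried offline with a small sketch. The sample text below is invented for illustration, and the column order (symptom term, disease term, occurrence count, TFIDF score) is an assumption inferred from how later cells index the rows. `parse_rows` is a hypothetical helper name, not part of the notebook.

```python
# Hypothetical sample standing in for the downloaded file; all values are made up.
SAMPLE_TEXT = (
    "Cough\tAsthma\t120\t35.2\n"
    "malformed line without tabs\n"
    "Dyspnea\tAsthma\t98\t28.7\n"
    "\n"
)

def parse_rows(text):
    """Split tab-separated text into rows, keeping only rows with four fields."""
    rows = [line.split("\t") for line in text.split("\n")]
    return [row for row in rows if len(row) == 4]

sample_rows = parse_rows(SAMPLE_TEXT)
print(len(sample_rows))
```

Malformed and empty lines are dropped without raising an error, which is convenient for a trailing newline but can also hide a change in the file format.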

In [None]:
## Given disease/condition term, get back ICD codes from OHDSI
def findICD_ohdsi(txt, icd_version=9):
    if icd_version == 9:
        icd_type = 'ICD9CM'
    elif icd_version == 10:
        icd_type = 'ICD10'
    else:
        raise ValueError("Invalid ICD version specified")
    url_con = "http://api.ohdsi.org/WebAPI/vocabulary/search"
    headers = {'content-type': 'application/json'}
    params = {"QUERY": txt,
              "VOCABULARY_ID": [icd_type]}
    response = requests.post(url_con, data=json.dumps(params), headers=headers)
    # response.json() parses the JSON body; no manual decoding is needed
    data = response.json()
    return [d["CONCEPT_CODE"] for d in data]
print(findICD_ohdsi('asthma'))

# Get ICD10/ICD9 code for a given string from UMLS. By default we get back ICD10.
def findICD_umls(name, icd_version=10):
    if icd_version == 9:
        icd_type = 'ICD9CM'
    elif icd_version == 10:
        icd_type = 'ICD10'
    else:
        raise ValueError("Invalid ICD version specified")

    cursor = cnx.cursor()
    # Parameterized queries let the driver escape values, avoiding SQL injection
    cursor.execute("SELECT CUI FROM umls.MRCONSO WHERE STR=%s", (name,))
    cuis = cursor.fetchall()
    if not cuis:
        return "Undef"
    # Use the first CUI returned for the string
    cursor.execute("SELECT CODE FROM umls.MRCONSO WHERE SAB=%s AND CUI=%s",
                   (icd_type, cuis[0][0]))
    codes = cursor.fetchall()
    # Keep the last matching code. NOTE: if the CUI has no code in the requested
    # vocabulary, this returns 'U' (the first character of "Undef"), which
    # callers filter on.
    icd = codes[-1] if codes else "Undef"
    return icd[0]

print(findICD_umls('Asthma'))
print(findICD_umls('Asthma', 9))
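
The UMLS lookup is a two-step query: the string is first resolved to a concept identifier (CUI), and the CUI is then resolved to a code in the requested vocabulary (SAB). The sketch below reproduces that flow against an in-memory SQLite table so it can run without the remote database. The table is a stand-in for `umls.MRCONSO` with only four columns, the rows are hand-entered for illustration, and `build_demo_db` and `find_icd_demo` are hypothetical names. SQLite uses `?` placeholders where MySQL Connector/Python uses `%s`.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory stand-in for MRCONSO with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MRCONSO (CUI TEXT, SAB TEXT, CODE TEXT, STR TEXT)")
    conn.executemany(
        "INSERT INTO MRCONSO VALUES (?, ?, ?, ?)",
        [
            ("C0004096", "MSH", "D001249", "Asthma"),
            ("C0004096", "ICD10", "J45", "Asthma"),
            ("C0004096", "ICD9CM", "493", "Asthma"),
        ],
    )
    return conn

def find_icd_demo(conn, name, icd_type="ICD10"):
    """Resolve name -> CUI -> code; return None when either step finds nothing."""
    cui_row = conn.execute(
        "SELECT CUI FROM MRCONSO WHERE STR = ? LIMIT 1", (name,)
    ).fetchone()
    if cui_row is None:
        return None
    code_row = conn.execute(
        "SELECT CODE FROM MRCONSO WHERE SAB = ? AND CUI = ? LIMIT 1",
        (icd_type, cui_row[0]),
    ).fetchone()
    return code_row[0] if code_row else None

demo_conn = build_demo_db()
print(find_icd_demo(demo_conn, "Asthma"))
print(find_icd_demo(demo_conn, "Asthma", "ICD9CM"))
```

Returning `None` instead of a sentinel string is a design choice of this sketch only. It avoids the ambiguity of indexing into a sentinel such as `"Undef"`.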

In [None]:
## Given disease name, get back symptoms (defined using MeSH terms) along with TFIDF scores
## Taken from https://www.nature.com/articles/ncomms5212
def disease2symptoms(txt):
    # x[1] is the disease term; return (symptom term, TFIDF score) pairs
    s = [x for x in DISEASE2SYMPTOMS if txt.lower() in x[1].lower()]
    return [(x[0], x[3]) for x in s]
symps = disease2symptoms("Asthma")
print('Found %s symptom MeSH terms for %s' % (len(symps), "Asthma"))
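
Because the match is a case-insensitive substring test on the disease term, a query for "Asthma" also picks up any disease term that contains the word. The sketch below shows this on toy rows and ranks the matches by score. The rows and scores are invented, `top_symptoms` is a hypothetical helper, and it assumes the score column parses as a float.

```python
# Toy rows in the assumed order: symptom, disease, occurrence count, TFIDF score.
TOY_ROWS = [
    ["Cough", "Asthma", "120", "35.2"],
    ["Dyspnea", "Asthma", "98", "28.7"],
    ["Respiratory Sounds", "Asthma, Exercise-Induced", "40", "51.0"],
    ["Fever", "Influenza, Human", "300", "60.1"],
]

def top_symptoms(rows, disease, n=3):
    """Return up to n (symptom, score) pairs for matching diseases, best first."""
    needle = disease.lower()
    matches = [(r[0], float(r[3])) for r in rows if needle in r[1].lower()]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:n]

ranked = top_symptoms(TOY_ROWS, "asthma")
for symptom, score in ranked:
    print("%s: %.1f" % (symptom, score))
```

Whether substring matching is desirable depends on the question: it is convenient when subtypes should be included, and too broad when only the exact disease term is wanted.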

In [None]:
## Functions to retrieve patients from different sources - currently FHIR & UNC
def findPatients_fhir(code, count=1000):
    # Build the full query URL in one place; network errors propagate to the caller
    url = "http://ictrweb.johnshopkins.edu/rest/synthetic/Condition?icd_10=%s&_count=%d" % (code, count)
    response = urllib2.urlopen(url)
    return json.loads(response.read())

def findPatients_unc(age='8', sex='male', race='white', location='OUTPATIENT'):
    query = GreenTranslator().get_query()
    return query.clinical_get_patients(age, sex, race, location)

### Workflow for "_What are symptoms of Asthma subtypes?_"

In [None]:
# findICD_umls returns a single code string, so wrap it in a list before iterating
asthmaCodes = [findICD_umls("asthma")] # We go with ICD10 codes
## get patients with asthma. First from FHIR, then with UNC
tmp = [findPatients_fhir(icd) for icd in asthmaCodes] # not useful right now
p_unc = findPatients_unc() # TODO needs to be updated to latest code

Next we identify symptoms for asthma. Our starting point is a list of diseases and symptoms from [Zhou et al.](https://www.nature.com/articles/ncomms5212), derived from co-occurrence. The symptoms obtained this way are MeSH terms, which we then translate to ICD10 codes. For this translation we query both UMLS and OHDSI.

In [None]:
asthmaSymptoms = disease2symptoms("asthma")
print('Found %s symptom MeSH terms for %s' % (len(asthmaSymptoms), "asthma"))
# findICD_umls yields 'U' or 'Undef' when no code is found; drop those
umlsCodes = [findICD_umls(x[0], 10) for x in asthmaSymptoms]
asthmaSymptomCodes = [c for c in umlsCodes if c not in ('U', 'Undef')]

flatten = lambda l: [item for sublist in l for item in sublist]
tmp2 = flatten([findICD_ohdsi(x[0], 10) for x in asthmaSymptoms])
asthmaSymptomCodes.extend(tmp2)

# Remove duplicates across the two sources
asthmaSymptomCodes = list(set(asthmaSymptomCodes))
print('Mapped to %d unique ICD10 codes' % (len(asthmaSymptomCodes)))

Given the set of symptoms for the disease, we now identify patients matching these symptoms. Note that the boundaries between symptoms, conditions, and diagnoses are not always well defined.
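
One way to approach this step is to count, for each symptom code, how many patients carry it. The sketch below is a minimal illustration. The patient records are invented, the mapping from patient ID to a set of ICD10 codes is an assumed intermediate format (the FHIR and UNC response structures are not shown here), and `count_symptom_occurrences` is a hypothetical helper.

```python
from collections import Counter

def count_symptom_occurrences(patient_codes, symptom_codes):
    """Count how many patients carry each of the given symptom codes."""
    wanted = set(symptom_codes)
    counts = Counter()
    for codes in patient_codes.values():
        # Each patient contributes at most once per symptom code
        counts.update(wanted.intersection(codes))
    return counts

# Invented records: patient ID -> set of ICD10 codes recorded for that patient.
toy_patients = {
    "p1": {"J45", "R05", "R06.2"},
    "p2": {"J45", "R05"},
    "p3": {"J45"},
}
toy_symptom_codes = ["R05", "R06.2", "R06.0"]

occurrences = count_symptom_occurrences(toy_patients, toy_symptom_codes)
for code in toy_symptom_codes:
    print("%s: %d" % (code, occurrences[code]))
```

Codes that no patient carries report a count of zero, since a `Counter` returns 0 for missing keys. Exact code matching is assumed; matching on code prefixes would need a different comparison.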