### Question 1:
Among pediatric patients with an 'asthma-like phenotype', is exposure to particulate 
matter <=2.5 micrometers in diameter (PM2.5) and ozone associated with responsiveness
to treatment? (In other words, are exposures higher in patients who are non-responsive
to treatment than in patients who are responsive to treatment?)

*** Imports ***
Greentranslator gets us a Python [API](https://stars.renci.org/var/greentranslator/index.html?highlight=greenquery#greentranslator.api.GreenQuery) for Green services.

*** Query ***
The [GreenQuery](https://stars.renci.org/var/greentranslator/index.html?highlight=greenquery#greentranslator.api.GreenQuery) provides access to clinical, exposures, and chembio data. It tracks provenance and is able to report it in W3C PROV format.


import pprint
import json
import traceback
from datetime import datetime, timedelta
from dateutil.parser import parse as parse_date

from greent.core import GreenT # Clinical API is IP restricted to translator.ncats.io, must use core.
greent_core = GreenT ()

from greent import client
greent = client.GraphQL ("http://stars.renci.org:5000/graphql")


In [1]:
import pprint
import json
import traceback
from datetime import datetime, timedelta
from dateutil.parser import parse as parse_date
from greent.core import GreenT # Clinical API is IP restricted to translator.ncats.io, must use core. greent_core = GreenT ()
from greent.client import GraphQL
greent = GraphQL ("http://localhost:5000/graphql")
from greent.endotype import Endotype



### Clinical Query
For now, we can filter by age, sex, race, and location. Soon, the interface will change to specify an age range.


Here's an example response:
```
[{'birth_date': '2006-08-02 00:00:00',
  'diag': {'ICD10:B08.4': {'2016-07-08 00:00:00': 'OUTPATIENT'},
           ...
           'ICD9:V19.2': {'2006-12-19 00:00:00': 'OUTPATIENT'}},
  'geoCode': {'GEO:LAT': '35.22056', 'GEO:LONG': '-80.69664'},
  'medList': {'MDCTN:10427': '2016-03-10 00:00:00',
              ...
              'MDCTN:9502': '2017-01-20 00:00:00'},
  'patient_id': '32227752',
  'race': 'white',
  'sex': 'M'},
  ...]
```


In [3]:
patients = greent.get_patients (age=8, sex='male', race='white') #, location='OUTPATIENT')
pprint.pprint (patients)

------------------------> b'{"variables": {"age": 8, "race": "white", "sex": "male"}, "query": "\\nquery queryPatients ($age : Int, $sex : String, $race : String) {\\n  patients (age: $age, sex: $sex, race: $race) {\\n    birthDate\\n    race\\n    sex\\n    patientId\\n    diagnoses {\\n      diagnosis\\n    }\\n    geoCode {\\n      latitude\\n      longitude\\n    }\\n    prescriptions {\\n      medication\\n      date\\n    }\\n  }\\n}  \\n            "}'
None


In [3]:
meds = {}

for x in patients:
    medList = x['medList']
    # Collect the unique meds
    for m in medList.keys():
        try:
            meds[medList[m]] = meds[medList[m]]+1
        except KeyError:
            meds[medList[m]] = 1
pprint.pprint(meds)
            

{None: 115,
 '0.3 ML Enoxaparin sodium 100 MG/ML Prefilled Syringe': 1,
 '0.3 ML Epinephrine 0.5 MG/ML Auto-Injector [Epipen]': 4,
 '10 Ml Sodium Chloride 9 Mg/Ml Prefilled Syringe': 1,
 '120 ACTUAT Fluticasone propionate 0.05 MG/ACTUAT Nasal Inhaler': 1,
 '2 ML Diazepam 0.005 MG/MG Prefilled Applicator': 1,
 '2 ML Somatropin 5 MG/ML Prefilled Syringe [Nutropin]': 4,
 '200 ACTUAT Albuterol 0.09 MG/ACTUAT Metered Dose Inhaler': 2,
 '5 Ml Sodium Chloride 9 Mg/Ml Prefilled Syringe': 2,
 'ACETAMINOPHEN 160 MG/5 ML (5 ML) ORAL SUSPENSION': 6,
 'ACETAMINOPHEN 325 MG TABLET': 1,
 'ACETAMINOPHEN 80 MG CHEWABLE TABLET': 2,
 'ACETAMINOPHEN 80 MG RECTAL SUPPOSITORY': 2,
 'ALBENDAZOLE 200 MG TABLET': 2,
 'ALBENZA 200 MG TABLET': 1,
 'ALBUTEROL 90 MCG INHALER': 1,
 'ALBUTEROL SULF HFA 90 MCG INH': 1,
 'ALBUTEROL SULFATE 2.5 MG/3 ML (0.083 %) SOLUTION FOR NEBULIZATION': 4,
 'ALBUTEROL SULFATE HFA 90 MCG/ACTUATION AEROSOL INHALER': 7,
 'ALTEPLASE 2 MG SOLUTION FOR INJECTION': 1,
 'AMOXICILLIN 400 MG/

In [29]:
import requests
def med2rxnorm(txt):
    annos = []
    url = 'http://data.bioontology.org/annotator?text=%s&apikey=b792dd1b-cdc2-4cc8-aaf2-4fa4fbf47e4e'
    try:
        resp = requests.get(url % urllib.parse.quote(txt)).json()
        if len(resp) > 0:
            for aresp in resp:
                annos.extend([ x['text'] for x in aresp['annotations'] ])
    except TypeError:
        print ("type error: {}".format (txt))
    return (annos)

c = 0
for med in meds:
    if c > 50:
        continue
    c = c + 1
    print (med2rxnorm (med))

['RISPERIDONE 0.25 MG', 'RISPERIDONE']
['CHLORHEXIDINE GLUCONATE 1.2 MG/ML MOUTHWASH', 'CHLORHEXIDINE GLUCONATE 1.2 MG/ML', 'CHLORHEXIDINE GLUCONATE', 'CHLORHEXIDINE', 'GLUCONATE', 'MOUTHWASH']
['CEPHALEXIN 250 MG', 'CEPHALEXIN', 'ORAL SUSPENSION']
['CITRIC ACID 66.8 MG/ML / SODIUM CITRATE 100 MG/ML ORAL SOLUTION', 'CITRIC ACID 66.8 MG/ML', 'CITRIC ACID', 'SODIUM CITRATE 100 MG/ML', 'SODIUM CITRATE', 'SODIUM', 'CITRATE', 'ORAL SOLUTION']
['SODIUM CHLORIDE 0.154 MEQ/ML INJECTABLE SOLUTION', 'SODIUM CHLORIDE 0.154 MEQ/ML', 'SODIUM CHLORIDE', 'SODIUM', 'INJECTABLE SOLUTION']
['VANCOMYCIN']
['AUGMENTIN']
['LABETALOL']
['GLUCOSE 50 MG/ML', 'GLUCOSE', 'SODIUM CHLORIDE 0.0769 MEQ/ML INJECTABLE SOLUTION', 'SODIUM CHLORIDE 0.0769 MEQ/ML', 'SODIUM CHLORIDE', 'SODIUM', 'INJECTABLE SOLUTION']
['ACETAMINOPHEN 80 MG CHEWABLE TABLET', 'ACETAMINOPHEN 80 MG', 'ACETAMINOPHEN', 'CHEWABLE TABLET']
['MEPHYTON']
['SEVELAMER CARBONATE 26.7 MG/ML ORAL SUSPENSION', 'SEVELAMER CARBONATE 26.7 MG/ML', 'SEVELAMER 

### Exposures
Now we want to find exposures data corresponding to our clinical data. We iterate over each returned patient, over each visit for that patient. In the context of a visit, we calculate a date prior to the visit date by a week, look up exposure data in the patient's location during that time interval.

Response structure:
```
[{'end_time': datetime.datetime(2010, 1, 7, 23, 0, tzinfo=tzlocal()),
 'exposure_type': 'pm25',
 'latitude': '35.9131996',
 'longitude': '-79.0558445',
 'start_time': datetime.datetime(2010, 1, 7, 0, 0, tzinfo=tzlocal()),
 'units': '7dayrisk',
 'value': '4.714285714285714'},
 ...]
```

In [11]:
# We'll use the Environmental Exposures endpoint to address the second part of the question:
#  > ... is exposure to particulate matter <=2.5 micrometers in diameter (PM2.5) 
#    and ozone associated with responsiveness to treatment?
exposures = []
for patient in patients:
    for diagnosis, visit in patient['diag'].items ():
        for date, visit_type in visit.items ():
            
            visit_date = parse_date (date)
            start_date = (visit_date - timedelta(days=7)).isoformat ().split('T')[0]
            end_date = visit_date.isoformat ().split('T')[0]

            geo_location = "{0},{1}".format (patient['geoCode']['GEO:LAT'], patient['geoCode']['GEO:LONG'])

            if not start_date.startswith ("2010-1"):
                continue

            try:
                response = greent.get_exposure_scores (
                    exposure_type = 'pm25',
                    start_date = start_date,
                    end_date = end_date,
                    exposure_point = geo_location)
                
                exposure_set = json.loads (response['data']['exposureScore'])
                
                for exposure in exposure_set:
                    print ("start: {0} end: {1} loc: {2} value: {0} units: {1}".format (
                        start_date, end_date, geo_location, exposure['value'], exposure['units']))
                    exposures.append (exposure)
            except: 
                print ('.') 
                traceback.print_exc ()

start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-11-14 end: 2010-11-21 loc: 35.27658,-80.75361 value: 2010-11-14 units: 2010-11-21
start: 2010-12-23 end: 2010-12-30 loc: 35.27658,-80.75361 value: 2010-12-23 units: 2010-12-30
start: 2010-12-23 end: 2010-12-30 loc: 35.27658,-80.75361 value: 2010-12-23 units: 2010-12-30
start: 2010-12-23 end: 2010-12-30 loc: 35.27658,-80.75361 va

start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-11-17 end: 2010-11-24 loc: 44.51101,-73.01033 value: 2010-11-17 units: 2010-11-24
start: 2010-10-25 end: 2010-11-01 loc: 44.51101,-73.01033 value: 2010-10-25 units: 2010-11-01
start: 2010-10-25 end: 2010-11-01 loc: 44.51101,-73.01033 value: 2010-10-25 units: 2010-11-01
start: 2010-10-25 end: 2010-11-01 loc: 44.51101,-73.01033 va

start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-10-18 end: 2010-10-25 loc: 34.88018,-80.52109 value: 2010-10-18 units: 2010-10-25
start: 2010-12-26 end: 2011-01-02 loc: 44.31477,-73.10609 value: 2010-12-26 units: 2011-01-02
start: 2010-12-26 end: 2011-01-02 loc: 44.31477,-73.10609 value: 2010-12-26 units: 2011-01-02
start: 2010-12-26 end: 2011-01-02 loc: 44.31477,-73.10609 va

start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-16 end: 2010-11-23 loc: 35.56878,-80.50115 value: 2010-11-16 units: 2010-11-23
start: 2010-11-01 end: 2010-11-08 loc: 35.56878,-80.50115 value: 2010-11-01 units: 2010-11-08
start: 2010-11-01 end: 2010-11-08 loc: 35.56878,-80.50115 value: 2010-11-01 units: 2010-11-08
start: 2010-11-01 end: 2010-11-08 loc: 35.56878,-80.50115 va

start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27
start: 2010-10-20 end: 2010-10-27 loc: 35.26250,-80.99058 value: 2010-10-20 units: 2010-10-27


In [4]:
print (query.prov_json ())

{
  "prefix": {
    "expo.pm25-o3": "http://purl.translator.org/prov/expo.pm25-o3",
    "blazegraph": "http://purl.translator.org/prov/blazegraph",
    "biochem": "http://purl.translator.org/prov/biochem",
    "expo": "http://purl.translator.org/prov/expo",
    "default": "http://purl.translator.org/prov/",
    "clinical.med.prescribed": "http://purl.translator.org/prov/clinical.med.prescribed",
    "TODO": "http://purl.translator.org/prov/TODO"
  }
}


In [5]:
query.provenance.document.plot ()

TclError: no display name and no $DISPLAY environment variable

In [4]:
exposures = list(map(lambda exp : Endotype.create_exposure (**exp), [{
    "exposure_type": "pm25",
    "units"        : "",
    "value"        : 2
}]))
visits = list(map(lambda v : Endotype.create_visit(**v), [{
    "icd_codes"  : "ICD9:V12,ICD9:E002",
    "lat"        : "20",
    "lon"        : "20",
    "time"       : "2017-10-12 21:12:29",
    "visit_type" : "INPATIENT",
    "exposures"  : exposures
}]))
request = Endotype.create_request (dob= "2017-10-04", model_type="M0", race="1", sex="M", visits = visits)
print (json.dumps (request))
response = greent.get_endotypes (query = json.dumps(request))
print (response)
#endo = json.loads (response['endotype'])

{"race": "1", "model_type": "M0", "visits": [{"exposures": [{"value": 2, "exposure_type": "pm25", "units": ""}], "lat": "20", "time": "2017-10-12 21:12:29", "icd_codes": "ICD9:V12,ICD9:E002", "visit_type": "INPATIENT", "lon": "20"}], "date_of_birth": "2017-10-04", "sex": "M"}
------------------------> b'{"variables": {"query": "{\\"race\\": \\"1\\", \\"model_type\\": \\"M0\\", \\"visits\\": [{\\"exposures\\": [{\\"value\\": 2, \\"exposure_type\\": \\"pm25\\", \\"units\\": \\"\\"}], \\"lat\\": \\"20\\", \\"time\\": \\"2017-10-12 21:12:29\\", \\"icd_codes\\": \\"ICD9:V12,ICD9:E002\\", \\"visit_type\\": \\"INPATIENT\\", \\"lon\\": \\"20\\"}], \\"date_of_birth\\": \\"2017-10-04\\", \\"sex\\": \\"M\\"}"}, "query": "\\n                  query get_endotype ( $query : String) {\\n                      endotype (query:$query)\\n            }"}'
{'endotype': ['{\'periods\': [{\'end_time\': \'2017-10-12 21:12:29\', \'start_time\': \'2017-10-12 21:12:29\'}], \'endotype_id\': \'E0\', \'endotype_evi

In [22]:
def endotype_to_obj (response):
    result = None
    if 'endotype' in response and len(response['endotype']) > 0:
        print(response['endotype'][0])
        text = response['endotype'][0].replace ('"', '\\"').replace ("'", '"')
    return json.dumps (json.loads (text), indent=2)

print (endotype_to_obj (response))


{'periods': [{'end_time': '2017-10-12 21:12:29', 'start_time': '2017-10-12 21:12:29'}], 'endotype_id': 'E0', 'endotype_evidence': 'pre_ed %in% c("[0,0.5)", "[0.5,1.5)")', 'endotype_description': '[0,0.5)'}
{"periods": [{"end_time": "2017-10-12 21:12:29", "start_time": "2017-10-12 21:12:29"}], "endotype_id": "E0", "endotype_evidence": "pre_ed %in% c(\"[0,0.5)\", \"[0.5,1.5)\")", "endotype_description": "[0,0.5)"}
{
  "periods": [
    {
      "end_time": "2017-10-12 21:12:29",
      "start_time": "2017-10-12 21:12:29"
    }
  ],
  "endotype_evidence": "pre_ed %in% c(\"[0,0.5)\", \"[0.5,1.5)\")",
  "endotype_id": "E0",
  "endotype_description": "[0,0.5)"
}


In [None]:
%matplotlib inline

In [6]:
import seaborn as sns
import numpy as np
import pandas as pd

sns.set_style("darkgrid", { "font.family" : "serif" })
sns.jointplot(x="days",
              y="values",
              data = pd.DataFrame({
                  "days" : [ exposure.end_time.timetuple().tm_yday for exposure in exposures ],
                  "values" : [ float(exposure.value) for exposure in exposures ],
              }),
              kind='reg');

NameError: name 'exposures' is not defined

In [7]:
sns.set_style("darkgrid", { "font.family" : "serif" })
tips = sns.load_dataset("tips")
ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips)

TclError: no display name and no $DISPLAY environment variable

In [14]:
# http://bokeh.pydata.org/en/latest/docs/gallery/texas.html
# http://briank.im/determining-counties-from-longlat/
from bokeh.io import show, output_notebook
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LogColorMapper
)
from bokeh.palettes import Viridis6 as palette
from bokeh.plotting import figure

from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

palette.reverse()

counties = {
    code: county for code, county in counties.items() if county["state"] == "nc"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]
color_mapper = LogColorMapper(palette=palette)

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
))

TOOLS = "pan,wheel_zoom,reset,hover,save"

p = figure(
    plot_width=900,
    plot_height=380,
    title="NC Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None
)
p.grid.grid_line_color = None

p.patches('x', 'y', source=source,
          fill_color={ 'field': 'rate', 'transform': color_mapper },
          fill_alpha=0.7, line_color="white", line_width=0.5)

hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    ("Unemployment rate)", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]
output_notebook()
show(p, new='window')