In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("YMykng5HzUs", width=800, height=600)

# Identifying Patient Cohorts in [MIMIC-II](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124312/)


[MIMIC-II](https://physionet.org/mimic2/mimic2_clinical_overview.shtml) is a freely available database of ICU patients. To access the full database (now migrated to MIMIC-III), you must sign a data use agreement (DUA). However, there is a [demo data set](https://physionet.org/mimic2/demo/) based on 4000 deceased patients that can be used without signing a DUA.

## How to Use the MIMIC-II Database
* [MIMIC-II Cookbook](../Resources/MIMICIICookBook_v1.pdf)
* [MIMIC Data Dictionaries](http://physionet.incor.usp.br/physiobank/database/dictionaries/)


## The Varieties of...Data
The data set is very rich, which makes it a good resource for exploring the varieties of clinical data.

![William James](https://goo.gl/ABDNmD)

The data include free-text notes (nursing, radiology, discharge summaries, etc.), input/output events, test results, procedure codes, diagnosis codes, etc.

In [None]:
%matplotlib inline

In [None]:
import getpass

import pandas as pd
import pymysql
import seaborn as sns

In [None]:
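# Connect to the MySQL server that hosts the mimic2 database;
# the password is prompted for rather than stored in the notebook
conn = pymysql.connect(host="mysql",
                       port=3306,
                       user="jovyan",
                       passwd=getpass.getpass("Enter MySQL passwd for jovyan"),
                       db="mimic2")
cursor = conn.cursor()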

## Example Query: Identifying ICD9 Codes for Patients

In [None]:
icd9_codes = pd.read_sql('SELECT subject_id, code, description from icd9',conn)
icd9_codes.head()

In [None]:
icd9_codes["description"].value_counts(normalize=False)

In [None]:
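icd9_counts = icd9_codes["description"].value_counts().to_frame(name="ICD9 Counts")
# value_counts() already returns counts in descending order, so head(20) is the top 20.
# sort_values() returns a new frame; sorting ascending here puts the most
# frequent code at the top of the horizontal bar chart.
icd9_counts.head(20).sort_values(by="ICD9 Counts").plot(kind='barh')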

## Exercise

Come up with a visualization of the top ICD9 codes.

## Exercise

* Based on the query described on page 20 of the [MIMIC-II Cookbook](../Resources/MIMICIICookBook_v1.pdf), create a dataframe of urine output values from the database. Limit the query to a reasonable number of results.
* Create a visualization of the values.

## Selecting Cohorts

Our most interesting explorations will be when we use information from multiple tables to limit/select cases. Here is an example selecting radiology reports for patients with COPD.

## Select all the radiology reports for patients with COPD
### [Codes obtained from CDC](http://www.cdc.gov/niosh/pdfs/98-157-d.pdf)
* chronic bronchitis (ICD-9 codes 490-491)
* emphysema (ICD-9 code 492)
* bronchiectasis (ICD-9 code 494)
* chronic airway obstruction (ICD-9 code 496)
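
The code prefixes listed above can also be turned into a query condition programmatically instead of being typed out by hand. The following is a minimal sketch rather than part of the original notebook: the helper name `build_prefix_clause` is hypothetical, and it assumes the `%s` placeholder style that pymysql uses for query parameters.

```python
def build_prefix_clause(column, prefixes):
    """Return (sql_fragment, params) matching any of the given code prefixes.

    Hypothetical helper: emits one "column LIKE %s" condition per prefix and
    a matching list of parameters with the SQL wildcard appended.
    """
    conditions = " OR ".join(f"{column} LIKE %s" for _ in prefixes)
    params = [f"{prefix}%" for prefix in prefixes]
    return f"({conditions})", params


# ICD-9 prefixes for COPD taken from the list above
copd_prefixes = ["490", "491", "492", "494", "496"]
clause, params = build_prefix_clause("icd9.code", copd_prefixes)
print(clause)
print(params)
```

One way to use the result would be to splice `clause` into the `WHERE` part of a query string and pass `params=params` to `pd.read_sql`; this has not been checked against the MIMIC-II demo database.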

The `\` character at the end of a line indicates a line continuation.

In [None]:
copd_data = \
pd.read_sql("""SELECT noteevents.subject_id, 
                      noteevents.category, 
                      noteevents.text, 
                      icd9.code 
               FROM noteevents INNER JOIN icd9 ON 
                      noteevents.subject_id = icd9.subject_id 
               WHERE (   icd9.code LIKE '490%' OR
                         icd9.code LIKE '491%' OR
                         icd9.code LIKE '492%' OR
                         icd9.code LIKE '494%' OR
                         icd9.code LIKE '496%'
                      ) 
                      AND noteevents.category = 'RADIOLOGY_REPORT'""",conn)
copd_data.head(200)
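
Because the join pairs every note with every matching ICD9 row for the same patient, a patient who carries more than one COPD code will have each report repeated once per code. The sketch below illustrates this with Python's built-in `sqlite3` module and a few made-up rows (not MIMIC data); the table and column names mirror the query above, but the in-memory database and its contents are assumptions for illustration only.

```python
import sqlite3

# In-memory toy database with the same table/column names as the query above
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE icd9 (subject_id INTEGER, code TEXT);
    CREATE TABLE noteevents (subject_id INTEGER, category TEXT, text TEXT);
    -- patient 1 has two COPD codes, patient 2 has none
    INSERT INTO icd9 VALUES (1, '491.21'), (1, '496'), (2, '428.0');
    INSERT INTO noteevents VALUES
        (1, 'RADIOLOGY_REPORT', 'chest film A'),
        (2, 'RADIOLOGY_REPORT', 'chest film B');
""")

# Shared FROM/WHERE part, shortened to two of the COPD prefixes
from_where = """
    FROM noteevents INNER JOIN icd9
         ON noteevents.subject_id = icd9.subject_id
    WHERE (icd9.code LIKE '491%' OR icd9.code LIKE '496%')
      AND noteevents.category = 'RADIOLOGY_REPORT'
"""

# Selecting icd9.code yields one row per (note, matching code) pair
joined = demo.execute(
    "SELECT noteevents.subject_id, noteevents.text, icd9.code" + from_where
).fetchall()

# Dropping the code column and using DISTINCT yields one row per note
distinct = demo.execute(
    "SELECT DISTINCT noteevents.subject_id, noteevents.text" + from_where
).fetchall()

print(len(joined))   # the single report appears twice
print(distinct)
```

If only the reports themselves are of interest, selecting `DISTINCT` note columns (or calling `drop_duplicates` on the resulting dataframe) is one way to avoid counting the same report several times.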