# Retrieve the notes from our local MIMIC database

This notebook provides a Jupyter interface to the MIMIC2 demo data set.


In [None]:
import pymysql
import pandas as pd
import os
import pickle
import zipfile
import ipywidgets
from IPython.display import display

## 1. Set up the database connection

In [None]:
conn = pymysql.connect(host="mysql", port=3306, user="jovyan", password="jovyan", database="mimic2")
cursor = conn.cursor()
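The connection settings above are hard-coded in the cell. Below is a minimal sketch of one alternative, assuming you would rather read them from environment variables. The variable names (`MIMIC_DB_HOST` and so on) and the helper `mimic_connection_kwargs` are hypothetical, and the defaults simply mirror the notebook's values. `password` and `database` are PyMySQL's current keyword names for the older `passwd` and `db` aliases.

```python
import os


def mimic_connection_kwargs(env=None):
    """Build keyword arguments for pymysql.connect().

    Hypothetical helper: the environment variable names are illustrative,
    and the defaults mirror the values hard-coded in the notebook.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MIMIC_DB_HOST", "mysql"),
        "port": int(env.get("MIMIC_DB_PORT", "3306")),  # port must be an int
        "user": env.get("MIMIC_DB_USER", "jovyan"),
        "password": env.get("MIMIC_DB_PASSWORD", "jovyan"),
        "database": env.get("MIMIC_DB_NAME", "mimic2"),
    }


print(mimic_connection_kwargs({}))
```

In the notebook this would be used as `conn = pymysql.connect(**mimic_connection_kwargs())`. Calling `conn.close()` once you are finished releases the connection.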

## 2. Count the notes of each type

We use the following SQL query:
```sql
SELECT category,count(*) FROM noteevents GROUP BY category;
```
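To see what `GROUP BY category` combined with `count(*)` returns without access to the MIMIC server, here is a small sketch that swaps in Python's built-in `sqlite3` with an in-memory database. The single-column `noteevents` table and its rows are made-up toy data, not MIMIC content, and the `ORDER BY` clause is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the MIMIC2 MySQL server.
# The single-column table and its rows are toy data, not MIMIC content.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE noteevents (category TEXT)")
demo_conn.executemany(
    "INSERT INTO noteevents (category) VALUES (?)",
    [("DISCHARGE_SUMMARY",), ("DISCHARGE_SUMMARY",), ("TOY_CATEGORY",)],
)

# The notebook's query, plus ORDER BY so the row order is deterministic
rows = demo_conn.execute(
    "SELECT category, count(*) FROM noteevents GROUP BY category ORDER BY category"
).fetchall()
demo_conn.close()

# One output row per distinct category, with the number of notes in it
for category, n in rows:
    print(category, n)
# DISCHARGE_SUMMARY 2
# TOY_CATEGORY 1
```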

In [None]:
sql = 'SELECT category, count(*) FROM noteevents GROUP BY category'
category_counts = pd.read_sql(sql, conn)
# read_sql returns a pandas DataFrame, which Jupyter displays as a formatted table:
display(category_counts)

## 3. Retrieve the notes and zip them

We will read all discharge summary notes into a dataframe and write them into a zip file, so that we can download it from our Jupyter notebook. Here is the SQL query we will use:
```sql  
SELECT n.hadm_id, n.text FROM noteevents n WHERE category='DISCHARGE_SUMMARY';
```
Let's **read the notes into a dataframe**:

In [None]:
# Limit the number of notes for this exercise, so the query finishes quickly and the zip file does not fill up the shared server's disk.
sql="SELECT n.hadm_id, n.text FROM noteevents n WHERE category='DISCHARGE_SUMMARY' LIMIT 10"

discharge_summaries=pd.read_sql(sql, conn)
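The category and `LIMIT 10` above are written directly into the SQL string. A sketch of passing them as bound parameters instead is shown below, again using `sqlite3` as a stand-in database with made-up rows; `fetch_notes` is a hypothetical helper, not part of the original material. Placeholder style differs by driver: `sqlite3` uses `?`, while PyMySQL uses `%s`.

```python
import sqlite3


def fetch_notes(conn, category, limit):
    """Return up to `limit` (hadm_id, text) rows for one note category.

    Hypothetical helper: category and limit are bound parameters,
    not pasted into the SQL string.
    """
    sql = (
        "SELECT n.hadm_id, n.text FROM noteevents n "
        "WHERE n.category = ? ORDER BY n.hadm_id LIMIT ?"
    )
    return conn.execute(sql, (category, limit)).fetchall()


# Toy in-memory stand-in for the noteevents table (made-up rows)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE noteevents (hadm_id INTEGER, category TEXT, text TEXT)")
demo_conn.executemany(
    "INSERT INTO noteevents VALUES (?, ?, ?)",
    [
        (1, "DISCHARGE_SUMMARY", "toy note 1"),
        (2, "DISCHARGE_SUMMARY", "toy note 2"),
        (3, "TOY_CATEGORY", "toy note 3"),
    ],
)
notes = fetch_notes(demo_conn, "DISCHARGE_SUMMARY", 10)
first_only = fetch_notes(demo_conn, "DISCHARGE_SUMMARY", 1)
demo_conn.close()

print(notes)       # [(1, 'toy note 1'), (2, 'toy note 2')]
print(first_only)  # [(1, 'toy note 1')]
```

With PyMySQL, the equivalent call would presumably look like `pd.read_sql("SELECT n.hadm_id, n.text FROM noteevents n WHERE category=%s LIMIT %s", conn, params=('DISCHARGE_SUMMARY', 10))`; treat this as an untested sketch against the course server.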

If you want, you can **have a look** at what you have retrieved:

In [None]:
@ipywidgets.interact(i=ipywidgets.IntSlider(min=0, max=len(discharge_summaries) - 1))
def _view_markup(i):
    # Print the admission ID and the full text of the i-th retrieved note
    print('hadm_id =', discharge_summaries.loc[i, 'hadm_id'])
    print(discharge_summaries.loc[i, 'text'])
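If the slider does not render (for example, in a static HTML export of the notebook, or when ipywidgets is not installed), a plain function can print the same information. `preview_note` is a hypothetical helper, not part of the original material; it truncates long notes to `max_chars` characters.

```python
def preview_note(hadm_id, text, max_chars=300):
    """Return a short, printable preview of one note (hypothetical helper)."""
    # Keep short notes whole; cut long ones and mark the cut
    body = text if len(text) <= max_chars else text[:max_chars] + " [...]"
    return "hadm_id = {}\n{}".format(hadm_id, body)


print(preview_note(12345, "Toy discharge summary text.", max_chars=10))
# hadm_id = 12345
# Toy discha [...]
```

In the notebook, `print(preview_note(discharge_summaries.loc[0, 'hadm_id'], discharge_summaries.loc[0, 'text']))` would show the first retrieved note.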

Now let's create our **export zip function**:

In [None]:
def exportZip(filename, dataframe):
    """Write each note to <hadm_id>.txt inside a deflate-compressed zip file.

    If several notes share a hadm_id, only the first one is written.
    """
    ids = set()
    # The with-block closes the zip file even if writing fails
    with zipfile.ZipFile(filename, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        for hadm_id, text in zip(dataframe['hadm_id'], dataframe['text']):
            if hadm_id not in ids:
                zf.writestr(str(hadm_id) + ".txt", text)
                ids.add(hadm_id)
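To check the naming and de-duplication logic without the database, here is the same loop restated as a hypothetical `export_zip_rows` that takes plain `(hadm_id, text)` pairs and writes to an in-memory `io.BytesIO` buffer, then reads the archive back. The IDs and texts are made up. Note that because only the first note per `hadm_id` is kept, any later note for the same admission is silently skipped.

```python
import io
import zipfile


def export_zip_rows(fileobj, rows):
    """Write (hadm_id, text) pairs as <hadm_id>.txt entries, keeping the first note per hadm_id."""
    seen = set()
    with zipfile.ZipFile(fileobj, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for hadm_id, text in rows:
            if hadm_id not in seen:
                zf.writestr(str(hadm_id) + ".txt", text)
                seen.add(hadm_id)


buf = io.BytesIO()
export_zip_rows(buf, [(101, "first note"), (102, "second note"), (101, "duplicate, skipped")])

# Read the archive back to check what was written
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    first = zf.read("101.txt").decode("utf-8")

print(names)  # ['101.txt', '102.txt']
print(first)  # first note
```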

Then **zipping the notes** is pretty simple:

In [None]:
exportZip('discharge_summaries.zip',discharge_summaries)

Now go back to your notebook's file list and download `discharge_summaries.zip`.

<br/><hr/>This material was presented as part of the Fundamental Health Informatics course, Fall 2017, BMI, University of Utah. It is revised from the <a href="https://github.com/UUDeCART/decart_rule_based_nlp">material</a> of the 2017 DeCART Summer Program (Data, Exploration, Computation, and Analytics Real-world Training for the Health Sciences) at the University of Utah. <br/><br/>Original presenters: Dr. Wendy Chapman, Jianlin Shi and Kelly Peterson.<br/>
Revised by: Jianlin Shi and Dr. Wendy Chapman<br/>
<img align="left" src="https://wiki.creativecommons.org/images/1/10/Cc.org_cc_by_license.jpg" alt="Except where otherwise noted, this website is licensed under a Creative Commons Attribution 3.0 Unported License.">