In [1]:
from IPython.display import HTML
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
HTML(urlopen('https://gist.githubusercontent.com/mattlewissf/83989910849fdb4a04a72d431e84053f/raw/cefa015a9065665faccd0219774c7087be7d21a8/skeleton.css').read().decode('utf-8'))

### MIMIC Deep Dive: Preprocessing the Data
**[Basic structure](#preprocessing)**   
**[What is OMOP?](#omop)**  
**[The MIMIC Dataset](#mimic_dataset)**  
**[Setting up the database](#setting_up_db)** 

<a id='preprocessing'></a>
#### Or: I can't just use the CSV?

Here we leaned heavily on Jason (our mentor) to devise a sensible setup for getting data out of the database and into forms we could manipulate easily when extracting features for the model. To make the data easy to work with later, we want to:

1. Query tables out of the database using an ORM. We started with SQLAlchemy, a very popular ORM for Python folk. It has a ton of bells and whistles and does way more than we need, but it looked like it would work for our purposes.
2. Create a way to represent these tables as Python objects in our Mapper file.
3. Create a Person object in our Standard module, with attributes from each of the tables accessible from the Person class. This class is designed to conform to the [Common Data Model](http://omop.org/CDM), as outlined by the Observational Medical Outcomes Partnership (OMOP).
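The three steps above can be sketched end to end. This is a minimal, self-contained illustration rather than our actual code: it swaps SQLAlchemy for the standard library's `sqlite3` so it runs anywhere, and the tiny `patients` and `admissions` tables, the `PatientRow` and `AdmissionRow` tuples, the `Person` class, and the `load_person` helper are all hypothetical stand-ins for the real MIMIC tables, Mapper classes, and Standard module.

```python
import sqlite3
from collections import namedtuple

# Step 1: a toy database standing in for MIMIC (all rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT, dob TEXT);
    CREATE TABLE admissions (row_id INTEGER PRIMARY KEY, subject_id INTEGER, admittime TEXT);
    INSERT INTO patients VALUES (1, 'F', '1950-01-01'), (2, 'M', '1962-07-15');
    INSERT INTO admissions VALUES
        (10, 1, '2101-03-04'), (11, 1, '2101-09-20'), (12, 2, '2102-01-02');
""")

# Step 2: plain Python objects that mirror the tables (the "mapper" layer).
PatientRow = namedtuple("PatientRow", "subject_id gender dob")
AdmissionRow = namedtuple("AdmissionRow", "row_id subject_id admittime")


# Step 3: a person-centric object that gathers attributes from every table.
class Person(object):
    def __init__(self, patient, admissions):
        self.person_id = patient.subject_id
        self.gender = patient.gender
        self.dob = patient.dob
        self.visits = admissions


def load_person(conn, subject_id):
    """Query both tables for one subject and assemble a Person."""
    row = conn.execute(
        "SELECT subject_id, gender, dob FROM patients WHERE subject_id = ?",
        (subject_id,),
    ).fetchone()
    patient = PatientRow(*row)
    admissions = [
        AdmissionRow(*r)
        for r in conn.execute(
            "SELECT row_id, subject_id, admittime FROM admissions "
            "WHERE subject_id = ? ORDER BY admittime",
            (subject_id,),
        )
    ]
    return Person(patient, admissions)


person = load_person(conn, 1)
print(person.person_id, person.gender, len(person.visits))
```

The point of the layering is that code working with `Person` never needs to know which table an attribute came from.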

<br></br>

Here's the basic concept: 

![](http://i.imgur.com/ZFtvKtf.png)

<a id='omop'></a>
#### What is OMOP? 

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), version 4, is a standard representation of healthcare experiences, together with common vocabularies for coding clinical concepts, that facilitates analysis across disparate databases. We're using the CDM because it has emerged as a standard among applications of healthcare data that need common definitions for visits, patients, and observations, so that business rules can be applied consistently throughout the database.
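To make "common definitions across disparate databases" a bit more concrete, here is a small sketch. The two source record layouts, their field names, and the `CommonPerson` tuple are invented for illustration and are not the actual OMOP CDM tables. They only show the idea: each source gets its own small adapter, and everything downstream sees a single shape.

```python
from collections import namedtuple

# A single shared shape that downstream analysis code can rely on.
CommonPerson = namedtuple("CommonPerson", "person_id gender year_of_birth")


def from_hospital_a(record):
    """Hypothetical source A: MIMIC-like keys, ISO date string for DOB."""
    return CommonPerson(
        person_id=record["subject_id"],
        gender=record["gender"].upper(),
        year_of_birth=int(record["dob"][:4]),
    )


def from_hospital_b(record):
    """Hypothetical source B: different keys, spelled-out sex."""
    sex = {"male": "M", "female": "F"}[record["sex"].lower()]
    return CommonPerson(
        person_id=record["patient_no"],
        gender=sex,
        year_of_birth=record["birth_year"],
    )


people = [
    from_hospital_a({"subject_id": 1, "gender": "f", "dob": "1950-01-01"}),
    from_hospital_b({"patient_no": 77, "sex": "Male", "birth_year": 1962}),
]

# One rule, applied identically no matter where the record came from.
born_before_1960 = [p.person_id for p in people if p.year_of_birth < 1960]
print(born_before_1960)
```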

Here's a snippet showing what this mapping looked like in SQLAlchemy (check out the [rest of it here](https://github.com/mattlewissf/mimic/blob/master/mimic_package/data_model/mapper.py#L25)). The goal was to pull in the data from the MIMIC database and create a way of passing it around as relational data. Here's what the Admission class looked like in our Reader.py:

In [None]:
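from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import relationship

# Base, metadata, and table_override are defined outside this snippet.
class Admission(Base):
    # Reuse the reflected admissions table, declaring the primary key and
    # the foreign key to patients explicitly.
    __table__ = table_override(metadata.tables['mimiciii.admissions'],
               Base,
               {"row_id": Column(Integer, primary_key=True),
                "subject_id": Column(Integer, ForeignKey('mimiciii.patients.subject_id'))})
    # Related event tables, reachable as attributes of an Admission.
    callouts = relationship('Callout')
    chartevents = relationship('Chartevent')
    cptevents = relationship('CPTevent')
    datetimeevents = relationship('Datetimeevent')
    diagnoses_icd = relationship('Diagnosis_ICD')
    drgcodes = relationship('Drgcode')
    # etc...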

Once this was pulled in, we could use our Pythonic representations of the admission and patient tables (the Admission and Patient classes) to create an OMOPPerson object elsewhere:

In [None]:
class OMOPPerson(OMOPStandardData):
    """One patient, with data from the MIMIC tables gathered into a single object."""

    def __init__(self, person_id, DOB, DOD, gender, expire_flag, drug_exposures, visit_occurances, 
                 procedures, observations, conditions, death, index_admission):
        self.person_id = person_id
        self.DOB = DOB
        self.DOD = DOD
        self.gender = gender
        self.expire_flag = expire_flag
        self.drug_exposures = drug_exposures
        self.visit_occurances = visit_occurances
        self.procedures = procedures
        self.observations = observations
        self.conditions = conditions
        self.death = death 
        self.index_admission = index_admission

After a lot of coding and configuration (and consulting the SQLAlchemy docs), we were able to pull data from the database, transform it into the correct class, and then pass that to an OMOP-friendly Person object. Super excellent!

<br></br>

Except... SQLAlchemy is... very slow... On my aging MacBook, creating a single Person from the database was taking on the order of a few minutes. And with more than 46k different patients in the MIMIC database, this was going to be a problem.
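A quick back-of-the-envelope calculation shows how much of a problem. The per-person figure below is an assumption (I'm taking "a few minutes" to mean three), the patient count is rounded down to 46,000, and `total_days` is just a helper written for this estimate, so treat the result as a rough order of magnitude.

```python
def total_days(n_patients, minutes_per_person):
    """Wall-clock days needed to build every Person one after another."""
    return n_patients * minutes_per_person / 60.0 / 24.0


N_PATIENTS = 46000         # "more than 46k" patients, rounded down
MINUTES_PER_PERSON = 3.0   # assumed value for "a few minutes"

print("{:.1f} days".format(total_days(N_PATIENTS, MINUTES_PER_PERSON)))
```

Even at a single minute per person, one full pass over the patients would take about a month of uninterrupted compute.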

<a id='oreader'></a>
### Moving to OReader

more here