>**MIMIC-III Data Description**
>
>MIMIC-III is a relational database consisting of 26 tables. Tables are linked by identifiers which usually have the suffix ‘ID’. For example, SUBJECT_ID refers to a unique patient, HADM_ID refers to a unique admission to the hospital, and ICUSTAY_ID refers to a unique admission to an intensive care unit.
>
>Charted events such as notes, laboratory tests, and fluid balance are stored in a series of ‘events’ tables. For example the OUTPUTEVENTS table contains all measurements related to output for a given patient, while the LABEVENTS table contains laboratory test results for a patient.
>
>Tables prefixed with ‘D_’ are dictionary tables and provide definitions for identifiers. For example, every row of CHARTEVENTS is associated with a single ITEMID which represents the concept measured, but it does not contain the actual name of the measurement. By joining CHARTEVENTS and D_ITEMS on ITEMID, it is possible to identify the concept represented by a given ITEMID.
>
>Developing the MIMIC data model involved balancing simplicity of interpretation against closeness to ground truth. As such, the model is a reflection of underlying data sources, modified over iterations of the MIMIC database in response to user feedback. Care has been taken to avoid making assumptions about the underlying data when carrying out transformations, so MIMIC-III closely represents the raw hospital data.
>
>Broadly speaking, five tables are used to define and track patient stays: ADMISSIONS; PATIENTS; ICUSTAYS; SERVICES; and TRANSFERS. Another five tables are dictionaries for cross-referencing codes against their respective definitions: D_CPT; D_ICD_DIAGNOSES; D_ICD_PROCEDURES; D_ITEMS; and D_LABITEMS. The remaining tables contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.
>
>In some cases it would be possible to merge tables—for example, the D_ICD_PROCEDURES and CPTEVENTS tables both contain detail relating to procedures and could be combined—but our approach is to keep the tables independent for clarity, since the data sources are significantly different. Rather than combining the tables within MIMIC data model, we suggest researchers develop database views and transforms as appropriate.

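The dictionary lookup and the "database views" suggestion from the description above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the rows and `ITEMID` values are invented, the tables carry only a few of their real columns, and the view name `CHARTEVENTS_LABELLED` is hypothetical.

```python
import sqlite3

# Toy stand-ins for two MIMIC-III tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D_ITEMS (ITEMID INTEGER PRIMARY KEY, LABEL TEXT);
    CREATE TABLE CHARTEVENTS (SUBJECT_ID INTEGER, ITEMID INTEGER, VALUENUM REAL);
    INSERT INTO D_ITEMS VALUES (1, 'Heart Rate'), (2, 'Temperature');
    INSERT INTO CHARTEVENTS VALUES (10, 1, 82.0), (10, 2, 37.1), (11, 1, 95.0);

    -- A view leaves the base tables untouched while exposing readable labels.
    CREATE VIEW CHARTEVENTS_LABELLED AS
    SELECT c.SUBJECT_ID, c.ITEMID, d.LABEL, c.VALUENUM
    FROM CHARTEVENTS AS c
    JOIN D_ITEMS AS d ON d.ITEMID = c.ITEMID;
""")

# Query the view like any other table.
labelled = conn.execute(
    "SELECT SUBJECT_ID, LABEL, VALUENUM FROM CHARTEVENTS_LABELLED "
    "ORDER BY SUBJECT_ID, ITEMID"
).fetchall()
for row in labelled:
    print(row)
```

Each printed row pairs a measurement with the name of the concept it represents, which the bare `ITEMID` in `CHARTEVENTS` does not provide.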


We want to flatten these 26 tables into a single table in which each row holds the features of one patient. To do this, we join the patient-level tables on the `SUBJECT_ID` column and add derived features such as age and mortality.
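
One point worth keeping in mind is that a patient can have several admissions, so a plain join on `SUBJECT_ID` gives one row per admission rather than one row per patient. The sketch below shows one possible way to collapse the result, assuming toy data with invented rows and only a few of the real columns. The output names `N_ADMISSIONS` and `FIRST_ADMITTIME` are hypothetical, and the choice of aggregates is an assumption, not the method used by any particular preprocessing repo.

```python
import sqlite3

# Toy PATIENTS and ADMISSIONS tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PATIENTS (SUBJECT_ID INTEGER PRIMARY KEY, GENDER TEXT, DOB TEXT);
    CREATE TABLE ADMISSIONS (HADM_ID INTEGER PRIMARY KEY, SUBJECT_ID INTEGER,
                             ADMITTIME TEXT, DISCHTIME TEXT);
    INSERT INTO PATIENTS VALUES (10, 'F', '2100-01-01 00:00:00'),
                                (11, 'M', '2080-06-15 00:00:00');
    INSERT INTO ADMISSIONS VALUES
        (100, 10, '2150-03-01 08:00:00', '2150-03-05 12:00:00'),
        (101, 10, '2151-07-10 09:30:00', '2151-07-12 10:00:00'),
        (102, 11, '2140-01-20 22:15:00', '2140-01-25 14:00:00');
""")

# Join on SUBJECT_ID, then aggregate so that each patient appears exactly once.
per_patient = conn.execute("""
    SELECT p.SUBJECT_ID,
           p.GENDER,
           COUNT(a.HADM_ID) AS N_ADMISSIONS,
           MIN(a.ADMITTIME) AS FIRST_ADMITTIME
    FROM PATIENTS AS p
    LEFT JOIN ADMISSIONS AS a ON a.SUBJECT_ID = p.SUBJECT_ID
    GROUP BY p.SUBJECT_ID, p.GENDER
    ORDER BY p.SUBJECT_ID
""").fetchall()

for row in per_patient:
    print(row)
```

The `LEFT JOIN` keeps patients that have no matching admission row; an inner join would drop them silently.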

The five main tables contain the following columns:

| Table | Columns |
| --- | --- |
| `ADMISSIONS` | **ROW_ID**, **SUBJECT_ID**, **HADM_ID**, **ADMITTIME**, **DISCHTIME**, **DEATHTIME**, **ADMISSION_TYPE**, **ADMISSION_LOCATION**, **DISCHARGE_LOCATION**, **INSURANCE**, **LANGUAGE**, **RELIGION**, **MARITAL_STATUS**, **ETHNICITY**, **EDREGTIME**, **EDOUTTIME**, **DIAGNOSIS**, **HOSPITAL_EXPIRE_FLAG**, **HAS_CHARTEVENTS_DATA** |
| `PATIENTS` | ROW_ID, **SUBJECT_ID**, **GENDER**, **DOB**, **DOD**, DOD_HOSP, DOD_SSN, EXPIRE_FLAG |
| `ICUSTAYS` | ROW_ID, **SUBJECT_ID**, **HADM_ID**, **ICUSTAY_ID**, **DBSOURCE**, FIRST_CAREUNIT, LAST_CAREUNIT, FIRST_WARDID, LAST_WARDID, **INTIME**, **OUTTIME**, **LOS** |
| `SERVICES` | ROW_ID, **SUBJECT_ID**, **HADM_ID**, TRANSFERTIME, PREV_SERVICE, CURR_SERVICE |
| `TRANSFERS` | ROW_ID, **SUBJECT_ID**, **HADM_ID**, **ICUSTAY_ID**, **DBSOURCE**, EVENTTYPE, PREV_CAREUNIT, CURR_CAREUNIT, PREV_WARDID, CURR_WARDID, INTIME, OUTTIME, LOS |

[This repo](https://github.com/bmmalone/mimic-preprocessing) preprocesses MIMIC-III by keeping a subset of the columns; the columns in bold above are the ones it keeps. The following features do not exist as columns in the five main tables and have to be derived or obtained elsewhere:
- EPISODE = (DISCHTIME - ADMITTIME) ?
- EPISODE_NAME = 
- stay
- AGE = (ADMITTIME - DOB)
- MORTALITY_INUNIT = 
- MORTALITY = 1 if DOD is not null, 0 otherwise
- MORTALITY_INHOSPITAL = 1 if DOD_HOSP is not null, 0 otherwise
- HEIGHT
- WEIGHT
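
The features in the list that already have a definition (AGE, MORTALITY, MORTALITY_INHOSPITAL) can be sketched in plain Python. This is a minimal illustration, assuming each row is a dict of timestamp strings in `YYYY-MM-DD HH:MM:SS` form with `None` for missing values. The helper names `parse_time` and `derive_features`, the example rows, and the 365.25-day year are assumptions made for the example.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
DAYS_PER_YEAR = 365.25  # assumed average year length, used to express AGE in years


def parse_time(value):
    """Parse a timestamp string, passing missing values (None) through."""
    return datetime.strptime(value, TIME_FORMAT) if value else None


def derive_features(row):
    """Compute the derived features from one joined patient/admission row."""
    age_delta = parse_time(row["ADMITTIME"]) - parse_time(row["DOB"])
    return {
        # AGE = (ADMITTIME - DOB), converted from days to years
        "AGE": age_delta.days / DAYS_PER_YEAR,
        # MORTALITY = 1 if DOD is not null, 0 otherwise
        "MORTALITY": int(row["DOD"] is not None),
        # MORTALITY_INHOSPITAL = 1 if DOD_HOSP is not null, 0 otherwise
        "MORTALITY_INHOSPITAL": int(row["DOD_HOSP"] is not None),
    }


# Invented example rows: one survivor and one patient with a recorded death.
survivor = {"DOB": "2100-01-01 00:00:00", "ADMITTIME": "2150-01-01 00:00:00",
            "DOD": None, "DOD_HOSP": None}
deceased = {"DOB": "2080-06-15 00:00:00", "ADMITTIME": "2140-01-20 22:15:00",
            "DOD": "2140-01-24 00:00:00", "DOD_HOSP": "2140-01-24 00:00:00"}

survivor_features = derive_features(survivor)
deceased_features = derive_features(deceased)
print(survivor_features)
print(deceased_features)
```

The features left without a definition in the list (EPISODE_NAME, MORTALITY_INUNIT, HEIGHT, WEIGHT) are deliberately not covered here.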