# MIMIC-III Dataset

https://physionet.org/content/mimiciii/1.4/

DOI (version 1.4):
https://doi.org/10.13026/C2HM2Q

DOI (latest version):
https://doi.org/10.13026/jbmn-w042

Topics:
critical care, MIMIC, electronic health records

Project Website:
https://mimic.physionet.org


## Data Description

MIMIC-III is a relational database consisting of 26 tables. Tables are linked by identifiers which usually have the suffix ‘ID’. For example, SUBJECT_ID refers to a unique patient, HADM_ID refers to a unique admission to the hospital, and ICUSTAY_ID refers to a unique admission to an intensive care unit.

Charted events such as notes, laboratory tests, and fluid balance are stored in a series of ‘events’ tables. For example, the OUTPUTEVENTS table contains all measurements related to output for a given patient, while the LABEVENTS table contains laboratory test results for a patient.

Tables prefixed with ‘D_’ are dictionary tables and provide definitions for identifiers. For example, every row of CHARTEVENTS is associated with a single ITEMID which represents the concept measured, but it does not contain the actual name of the measurement. By joining CHARTEVENTS and D_ITEMS on ITEMID, it is possible to identify the concept represented by a given ITEMID.
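The dictionary join described above can be sketched as follows. This is a minimal illustration, not the project's own tooling: it uses Python's built-in `sqlite3` module with an in-memory database so that it runs without access to the restricted data, and every row (ITEMID values, labels, measurements) is made up. The `LABEL` and `VALUENUM` column names are assumed from the MIMIC-III schema, and the helper name `label_chart_events` is hypothetical.

```python
import sqlite3


def label_chart_events(conn):
    """Join CHARTEVENTS to D_ITEMS on ITEMID to attach a readable label."""
    return conn.execute(
        """
        SELECT ce.subject_id, di.label, ce.valuenum
        FROM chartevents ce
        JOIN d_items di ON ce.itemid = di.itemid
        ORDER BY ce.subject_id, di.label
        """
    ).fetchall()


# Toy stand-ins for the real tables; all values are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE d_items (itemid INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE chartevents (subject_id INTEGER, itemid INTEGER, valuenum REAL);
    INSERT INTO d_items VALUES (1, 'Heart Rate'), (2, 'Respiratory Rate');
    INSERT INTO chartevents VALUES (100, 1, 82.0), (100, 2, 18.0), (101, 1, 95.0);
    """
)

labeled = label_chart_events(conn)
for row in labeled:
    print(row)
conn.close()
```

The same `JOIN ... ON ce.itemid = di.itemid` query should carry over unchanged to a DuckDB database once both tables are loaded.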

Developing the MIMIC data model involved balancing simplicity of interpretation against closeness to ground truth. As such, the model is a reflection of underlying data sources, modified over iterations of the MIMIC database in response to user feedback. Care has been taken to avoid making assumptions about the underlying data when carrying out transformations, so MIMIC-III closely represents the raw hospital data.

Broadly speaking, five tables are used to define and track patient stays: ADMISSIONS; PATIENTS; ICUSTAYS; SERVICES; and TRANSFERS. Another five tables are dictionaries for cross-referencing codes against their respective definitions: D_CPT; D_ICD_DIAGNOSES; D_ICD_PROCEDURES; D_ITEMS; and D_LABITEMS. The remaining tables contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.
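To illustrate how the stay-tracking tables relate through SUBJECT_ID, HADM_ID, and ICUSTAY_ID, the sketch below counts hospital admissions and ICU stays per patient. It is an assumption-laden toy: the tables are reduced to their identifier columns, the rows are invented, and the built-in `sqlite3` module stands in for whichever database engine holds the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Identifier columns only; the real tables carry many more fields.
    CREATE TABLE patients (subject_id INTEGER PRIMARY KEY);
    CREATE TABLE admissions (hadm_id INTEGER PRIMARY KEY, subject_id INTEGER);
    CREATE TABLE icustays (icustay_id INTEGER PRIMARY KEY, hadm_id INTEGER, subject_id INTEGER);

    -- Made-up rows: patient 1 has two admissions and three ICU stays.
    INSERT INTO patients VALUES (1), (2);
    INSERT INTO admissions VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO icustays VALUES (100, 10, 1), (101, 10, 1), (102, 11, 1), (200, 20, 2);
    """
)

# One patient can have many admissions, and one admission many ICU stays.
stay_counts = conn.execute(
    """
    SELECT p.subject_id,
           COUNT(DISTINCT a.hadm_id)    AS n_admissions,
           COUNT(DISTINCT i.icustay_id) AS n_icu_stays
    FROM patients p
    LEFT JOIN admissions a ON a.subject_id = p.subject_id
    LEFT JOIN icustays i ON i.hadm_id = a.hadm_id
    GROUP BY p.subject_id
    ORDER BY p.subject_id
    """
).fetchall()
print(stay_counts)
conn.close()
```

`COUNT(DISTINCT ...)` is used because joining admissions to ICU stays repeats each admission once per ICU stay.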

In some cases it would be possible to merge tables (for example, the D_ICD_PROCEDURES and CPTEVENTS tables both contain detail relating to procedures and could be combined), but our approach is to keep the tables independent for clarity, since the data sources are significantly different. Rather than combining the tables within the MIMIC data model, we suggest researchers develop database views and transforms as appropriate.
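As one possible reading of the suggestion to build views, the sketch below defines a view that attaches diagnosis titles to coded diagnoses while leaving the base tables untouched. It uses the diagnosis tables (DIAGNOSES_ICD and D_ICD_DIAGNOSES) rather than the procedure tables mentioned above, the view name `diagnoses_labeled` is hypothetical, the codes and titles are placeholders, and `sqlite3` is used only so the example runs without the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE diagnoses_icd (subject_id INTEGER, hadm_id INTEGER, icd9_code TEXT);
    CREATE TABLE d_icd_diagnoses (icd9_code TEXT PRIMARY KEY, short_title TEXT);

    -- Placeholder codes and titles, not real ICD-9 entries.
    INSERT INTO d_icd_diagnoses VALUES
        ('A01', 'Example diagnosis one'),
        ('B02', 'Example diagnosis two');
    INSERT INTO diagnoses_icd VALUES (1, 10, 'A01'), (1, 10, 'B02'), (2, 20, 'A01');

    -- The view resolves each code to its title at query time.
    CREATE VIEW diagnoses_labeled AS
    SELECT d.subject_id, d.hadm_id, d.icd9_code, dd.short_title
    FROM diagnoses_icd d
    LEFT JOIN d_icd_diagnoses dd ON d.icd9_code = dd.icd9_code;
    """
)

view_rows = conn.execute(
    "SELECT subject_id, short_title FROM diagnoses_labeled "
    "ORDER BY subject_id, icd9_code"
).fetchall()
print(view_rows)
conn.close()
```

A `LEFT JOIN` keeps diagnoses whose code has no dictionary entry, which may or may not be the desired behavior for a given study.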

## Usage Notes

MIMIC-III is provided as a collection of comma-separated value (CSV) files, along with scripts to help with importing the data into database systems including PostgreSQL, MySQL, and MonetDB. As the database contains detailed information regarding the clinical care of patients, it must be treated with appropriate care and respect. Researchers are required to formally request access via a process documented on the MIMIC website. There are two key steps that must be completed before access is granted:

- the researcher must complete a recognized course in protecting human research participants that includes Health Insurance Portability and Accountability Act (HIPAA) requirements.
- the researcher must sign a data use agreement, which outlines appropriate data usage and security standards, and forbids efforts to identify individual patients.

Approval requires at least a week. Once an application has been approved, the researcher will receive emails containing instructions for downloading the database from PhysioNetWorks, a restricted access component of PhysioNet.

In [23]:
%pip install duckdb

Note: you may need to restart the kernel to use updated packages.


In [24]:
import duckdb

# Load a subset of the MIMIC-III CSV files into a local DuckDB database file.
# The paths assume the dataset was downloaded under datasets/physionet.org/.
conn = duckdb.connect('mimiciii.db', read_only=False)
try:
    df = conn.execute("""
        CREATE TABLE IF NOT EXISTS PATIENTS AS
            FROM read_csv_auto(
                        'datasets/physionet.org/files/mimiciii/1.4/PATIENTS.csv.gz',
                        header=true,
                        union_by_name=true,
                        files_to_sniff=-1,
                        filename=true
                        );

        CREATE TABLE IF NOT EXISTS CHARTEVENTS AS
            FROM read_csv_auto(
                        'datasets/physionet.org/files/mimiciii/1.4/CHARTEVENTS.csv.gz',
                        header=true,
                        union_by_name=true,
                        files_to_sniff=-1,
                        filename=true,
                        sample_size=-1
                        );

        -- Coded diagnoses; joins to D_ICD_DIAGNOSES on ICD9_CODE
        CREATE TABLE IF NOT EXISTS DIAGNOSES_ICD AS
            FROM read_csv_auto(
                        'datasets/physionet.org/files/mimiciii/1.4/DIAGNOSES_ICD.csv.gz',
                        header=true,
                        union_by_name=true,
                        files_to_sniff=-1,
                        filename=true,
                        sample_size=-1
                        );

        -- Dictionary of diagnosis codes, keyed by ICD9_CODE
        CREATE TABLE IF NOT EXISTS D_ICD_DIAGNOSES AS
            FROM read_csv_auto(
                        'datasets/physionet.org/files/mimiciii/1.4/D_ICD_DIAGNOSES.csv.gz',
                        header=true,
                        union_by_name=true,
                        files_to_sniff=-1,
                        filename=true,
                        sample_size=-1
                        );
    """).df()
    # All statements run; df holds the result of the last one.
    print(df)

    # conn.execute("COPY mytable TO 'output.csv' (HEADER, DELIMITER ',')")

except Exception as ex:
    print(ex)
finally:
    conn.close()


   Count
0  14567


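In [None]:
import duckdb

# Patients with a diagnosis whose short title starts with 'Depressive' or 'Anx'.
query = """
select
    p.subject_id,
    dd.short_title
from patients p
join diagnoses_icd d on p.subject_id = d.subject_id
join d_icd_diagnoses dd on d.icd9_code = dd.icd9_code
where dd.short_title like 'Depressive%'
   or dd.short_title like 'Anx%'
;
"""

# Open the database created above in read-only mode and run the query.
conn = duckdb.connect('mimiciii.db', read_only=True)
try:
    print(conn.execute(query).df())
finally:
    conn.close()

# Depressive ~ 3433
# Anxiety ~ 1583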