In [1]:
import os
import subprocess
from pathlib import Path
from contextlib import contextmanager

from pyicu.configs.load import load_src_cfg
from pyicu.data import MIMIC

## Download MIMIC III example data

A small subset of MIMIC III is openly available and requires neither data governance training nor a data access application. We use this subset here and in the other examples to show how ICU data can be imported into pyICU and then processed.

In [2]:
@contextmanager
def directory(path):
    # Code copied from https://stackoverflow.com/questions/299446/how-do-i-change-directory-back-to-my-original-working-directory-with-python
    oldpwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(oldpwd)
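
On Python 3.11 and later, the standard library provides `contextlib.chdir`, which behaves like the helper above. The sketch below uses it when available and otherwise falls back to an equivalent definition. `cwd_inside` is a hypothetical helper that exists only to show that the original working directory is restored afterwards.

```python
import os
import tempfile

try:
    from contextlib import chdir  # standard library, Python 3.11+
except ImportError:
    from contextlib import contextmanager

    @contextmanager
    def chdir(path):
        # Fallback with the same behavior as the helper defined above.
        old = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(old)


def cwd_inside(path):
    """Return the working directory as seen inside the `with` block."""
    with chdir(path):
        return os.getcwd()


# The working directory changes inside the block and is restored afterwards.
start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    print(cwd_inside(tmp))
print(os.getcwd() == start)
```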

In [3]:
url = "https://physionet.org/files/mimiciii-demo/1.4/"

download_dir = Path("examples/data")
download_dir.mkdir(parents=True, exist_ok=True)

with directory(download_dir):
    subprocess.run(["wget", "-r", "-N", "-c", "-np", url])

--2022-11-13 22:35:12--  https://physionet.org/files/mimiciii-demo/1.4/
Resolving physionet.org (physionet.org)... 18.18.42.54
Connecting to physionet.org (physionet.org)|18.18.42.54|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'physionet.org/files/mimiciii-demo/1.4/index.html'

     0K ...                                                     427M=0s

Last-modified header missing -- time-stamps turned off.
2022-11-13 22:35:12 (427 MB/s) - 'physionet.org/files/mimiciii-demo/1.4/index.html' saved [3582]

Loading robots.txt; please ignore errors.
--2022-11-13 22:35:12--  https://physionet.org/robots.txt
Reusing existing connection to physionet.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 22 [text/plain]
Saving to: 'physionet.org/robots.txt'

     0K                                                       100%  440 =0s

2022-11-13 22:35:13 (440 B/s) - 'physionet.org/robots.txt' saved [22/22]

--2022-11-13 22:35
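
As an alternative to changing the working directory of the notebook itself, `subprocess.run` accepts a `cwd` argument that applies only to the child process. The following is a minimal sketch of that approach. `run_in` is a hypothetical helper, and a child Python process stands in for the `wget` call so that the example runs without network access.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in(directory, command):
    """Run `command` with `directory` as the child's working directory.

    The working directory of the calling process is left untouched.
    Output is captured here for inspection; drop `capture_output` to
    stream a long-running tool's log instead.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    return subprocess.run(command, cwd=directory, capture_output=True, text=True)


# Stand-in for the download: ask a child process where it is running.
with tempfile.TemporaryDirectory() as tmp:
    result = run_in(tmp, [sys.executable, "-c", "import os; print(os.getcwd())"])
    print(result.returncode, result.stdout.strip())
```

Inspecting `result.returncode` afterwards (or passing `check=True`) makes a failed download visible instead of silently continuing.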

## Import data into pyICU
To use the downloaded data with pyICU, we must first import the raw CSV files. During import, the CSV files are converted to Apache Parquet files, a columnar format that allows fast access. Importing one of the data sources supported by pyICU takes two lines of code: we load the appropriate source configuration and then use it to import the downloaded data.

In [4]:
data_dir = download_dir/"physionet.org/files/mimiciii-demo/1.4"
mimic_cfg = load_src_cfg("mimic_demo")
mimic_cfg.do_import(data_dir)

Successfully imported 25 tables.


Note that if no `out_dir` is specified, the Parquet files will be generated in the same directory as the raw data.
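
If the import reports fewer tables than expected, it can help to check which raw files the download actually produced. The sketch below lists candidate tables using only the standard library. `list_raw_tables` is a hypothetical helper, not part of pyICU, and it assumes that each table is stored as a `.csv` or `.csv.gz` file directly inside the data directory.

```python
from pathlib import Path


def list_raw_tables(data_dir):
    """Return sorted, lower-cased table names for CSV files in `data_dir`.

    Assumes one file per table, named `<table>.csv` or `<table>.csv.gz`.
    Subdirectories and other files (e.g. index.html) are ignored.
    """
    names = set()
    for path in Path(data_dir).iterdir():
        if not path.is_file():
            continue
        name = path.name.lower()
        for suffix in (".csv.gz", ".csv"):
            if name.endswith(suffix):
                names.add(name[: -len(suffix)])
                break
    return sorted(names)
```

Calling `list_raw_tables(data_dir)` before `do_import` gives a quick list to compare against the number of tables reported by the import.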

## Access the raw data through pyICU
Now that we have imported the data into pyICU, we can create a `MIMIC` object and access the tables as attributes.

In [5]:
mimic = MIMIC(mimic_cfg, data_dir)

For example, the admissions table can be accessed like this:

In [6]:
mimic.admissions

# <SrcTbl>:  [129 x 19]
# Defaults:  `admission_type` (val)
# Time vars: `admittime`, `dischtime`, `deathtime`, `edregtime`, `edouttime`
   row_id  subject_id  hadm_id           admittime           dischtime  \
0   12258       10006   142345 2164-10-23 21:09:00 2164-11-01 17:15:00   
1   12263       10011   105331 2126-08-14 22:32:00 2126-08-28 18:59:00   
2   12265       10013   165520 2125-10-04 23:36:00 2125-10-07 15:13:00   
3   12269       10017   199207 2149-05-26 17:19:00 2149-06-03 18:42:00   
4   12270       10019   177759 2163-05-14 20:43:00 2163-05-15 12:00:00   

            deathtime admission_type         admission_location  \
0                 NaT      EMERGENCY       EMERGENCY ROOM ADMIT   
1 2126-08-28 18:59:00      EMERGENCY  TRANSFER FROM HOSP/EXTRAM   
2 2125-10-07 15:13:00      EMERGENCY  TRANSFER FROM HOSP/EXTRAM   
3                 NaT      EMERGENCY       EMERGENCY ROOM ADMIT   
4 2163-05-15 12:00:00      EMERGENCY  TRANSFER FROM HOSP/EXTRAM   

  discharge_loc

Besides the data, the header displays additional metadata, such as the table's default value variable and its time variables. The table is a pyarrow dataset and is not actually loaded into memory yet. To load the entire table into memory, we can call `.to_pandas()`.

In [7]:
mimic.admissions.to_pandas()

,row_id,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,discharge_location,insurance,language,religion,marital_status,ethnicity,edregtime,edouttime,diagnosis,hospital_expire_flag,has_chartevents_data
0,12258,10006,142345,2164-10-23 21:09:00,2164-11-01 17:15:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME HEALTH CARE,Medicare,,CATHOLIC,SEPARATED,BLACK/AFRICAN AMERICAN,2164-10-23 16:43:00,2164-10-23 23:00:00,SEPSIS,0,1
1,12263,10011,105331,2126-08-14 22:32:00,2126-08-28 18:59:00,2126-08-28 18:59:00,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,DEAD/EXPIRED,Private,,CATHOLIC,SINGLE,UNKNOWN/NOT SPECIFIED,NaT,NaT,HEPATITIS B,1,1
2,12265,10013,165520,2125-10-04 23:36:00,2125-10-07 15:13:00,2125-10-07 15:13:00,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,DEAD/EXPIRED,Medicare,,CATHOLIC,,UNKNOWN/NOT SPECIFIED,NaT,NaT,SEPSIS,1,1
3,12269,10017,199207,2149-05-26 17:19:00,2149-06-03 18:42:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,SNF,Medicare,,CATHOLIC,DIVORCED,WHITE,2149-05-26 12:08:00,2149-05-26 19:45:00,HUMERAL FRACTURE,0,1
4,12270,10019,177759,2163-05-14 20:43:00,2163-05-15 12:00:00,2163-05-15 12:00:00,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,DEAD/EXPIRED,Medicare,,CATHOLIC,DIVORCED,WHITE,NaT,NaT,ALCOHOLIC HEPATITIS,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,41055,44083,198330,2112-05-28 15:45:00,2112-06-07 16:50:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME,Private,ENGL,CATHOLIC,SINGLE,WHITE,2112-05-28 13:16:00,2112-05-28 17:30:00,PERICARDIAL EFFUSION,0,1
125,41070,44154,174245,2178-05-14 20:29:00,2178-05-15 09:45:00,2178-05-15 09:45:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Medicare,ENGL,PROTESTANT QUAKER,MARRIED,WHITE,2178-05-14 17:37:00,2178-05-14 22:08:00,ALTERED MENTAL STATUS,1,1
126,41087,44212,163189,2123-11-24 14:14:00,2123-12-30 14:31:00,NaT,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,REHAB/DISTINCT PART HOSP,Medicare,ENGL,UNOBTAINABLE,SINGLE,BLACK/AFRICAN AMERICAN,NaT,NaT,ACUTE RESPIRATORY DISTRESS SYNDROME;ACUTE RENA...,0,1
127,41090,44222,192189,2180-07-19 06:55:00,2180-07-20 13:00:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME,Medicare,ENGL,CATHOLIC,SINGLE,WHITE,2180-07-19 04:50:00,2180-07-19 08:23:00,BRADYCARDIA,0,1


This works for admissions, which is a relatively small table. Loading the chartevents table, however, would take much more memory. We can still view it as a pyarrow dataset at near-instant speed, because displaying it again loads only the first few rows.
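
The reason a preview stays cheap can be illustrated with the standard library alone: reading stops after the first few rows, so the cost does not grow with the size of the table. This is a stand-in for the idea only, not how pyarrow or pyICU implement it, and `preview_rows` is a hypothetical helper.

```python
import csv
from itertools import islice


def preview_rows(csv_path, n=5):
    """Return the header and at most `n` data rows of a CSV file.

    Parsing stops after `n` rows, so the remaining rows are never
    turned into Python objects.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, n))
    return header, rows
```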

In [8]:
mimic.chartevents

# <SrcTbl>:  [758355 x 15]
# Defaults:  `charttime` (index), `valuenum` (val), `valueuom` (unit)
# Time vars: `charttime`, `storetime`
     row_id  subject_id  hadm_id  icustay_id  itemid           charttime  \
0  85780442       10006   142345      206504      69 2164-10-23 21:10:00   
1  85780443       10006   142345      206504     762 2164-10-23 21:10:00   
2  85780444       10006   142345      206504     916 2164-10-23 21:10:00   
3  85780445       10006   142345      206504     917 2164-10-23 21:10:00   
4  85780446       10006   142345      206504     919 2164-10-23 21:10:00   

0 2164-10-23 21:10:00  14077                  NaN               NaN    NaN   
1 2164-10-23 21:10:00  14077                  NaN               NaN    NaN   
2 2164-10-23 21:10:00  14077  Lidocaine       NaN               NaN    NaN   
3 2164-10-23 21:10:00  14077      FEVER       NaN               NaN    NaN   
4 2164-10-23 21:10:00  14077        MED       NaN               NaN    NaN   

  resultstatus st