In [1]:
import numpy as np
import pandas as pd
import datetime
import copy
import time
import os
import re
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import operator

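# tqdm.auto selects the notebook progress bar when one is available,
# so a separate tqdm.notebook import (which would shadow this one) is not needed.
from tqdm.auto import tqdm, trange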
from datetime import timedelta

tqdm.pandas()

In [2]:
# Edit to point to your MIMIC directory.
dataDirStr = '/Users/gmessier/data/mimic-1.4/'

In [3]:
icustays_df = pd.read_csv(dataDirStr + "ICUSTAYS.csv")
icustays_df.columns = icustays_df.columns.str.lower()
icustays_df

Unnamed: 0,row_id,subject_id,hadm_id,icustay_id,dbsource,first_careunit,last_careunit,first_wardid,last_wardid,intime,outtime,los
0,365,268,110404,280836,carevue,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.2490
1,366,269,106296,206613,carevue,MICU,MICU,52,52,2170-11-05 11:05:29,2170-11-08 17:46:57,3.2788
2,367,270,188028,220345,carevue,CCU,CCU,57,57,2128-06-24 15:05:20,2128-06-27 12:32:29,2.8939
3,368,271,173727,249196,carevue,MICU,SICU,52,23,2120-08-07 23:12:42,2120-08-10 00:39:04,2.0600
4,369,272,164716,210407,carevue,CCU,CCU,57,57,2186-12-25 21:08:04,2186-12-27 12:01:13,1.6202
...,...,...,...,...,...,...,...,...,...,...,...,...
61527,59806,94944,143774,201233,metavision,CSRU,CSRU,15,15,2104-04-15 10:18:16,2104-04-17 14:51:00,2.1894
61528,59807,94950,123750,283653,metavision,CCU,CCU,7,7,2155-12-08 05:33:16,2155-12-10 17:24:58,2.4942
61529,59808,94953,196881,241585,metavision,SICU,SICU,57,57,2160-03-03 16:09:11,2160-03-04 14:22:33,0.9259
61530,59809,94954,118475,202802,metavision,CSRU,CSRU,15,15,2183-03-25 09:53:10,2183-03-27 17:55:03,2.3346


The `icustays` table records when each patient entered and left the ICU. `los` (length of stay) is expressed in fractional days, so 1.0 corresponds to 24 hours.

Each patient is identified by a unique `subject_id`, each hospital admission by a unique `hadm_id`, and each ICU stay by a unique `icustay_id`.

This means that one `subject_id` can be associated with multiple `hadm_id` values when a patient has had multiple admissions. Likewise, one `hadm_id` can be linked to multiple `icustay_id` values when a patient had multiple ICU stays during one admission (for example, after being readmitted to the ICU).
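The one-to-many relationships described above can be checked with a pair of `groupby` calls. The following is a minimal sketch on a toy frame; the IDs are hypothetical and are not taken from MIMIC.

```python
import pandas as pd

# Toy data (hypothetical IDs): patient 1 has two admissions,
# and the second admission (hadm_id 11) contains two ICU stays.
toy_df = pd.DataFrame({
    "subject_id": [1, 1, 1, 2],
    "hadm_id":    [10, 11, 11, 20],
    "icustay_id": [100, 101, 102, 200],
})

# Number of distinct admissions per patient.
adm_per_patient = toy_df.groupby("subject_id")["hadm_id"].nunique()

# Number of distinct ICU stays per admission.
stays_per_adm = toy_df.groupby("hadm_id")["icustay_id"].nunique()

print(adm_per_patient.to_dict())
print(stays_per_adm.to_dict())
```

Running the same two `groupby` lines on `icustays_df` should show how often each case occurs in the real table.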

In [4]:
print(f"Number of distinct ICU stays: {icustays_df.icustay_id.nunique()}")

Number of distinct ICU stays: 61532


`first_careunit` and `last_careunit` contain, respectively, the type of the first and last ICU in which the patient was cared for. Because an `icustay_id` groups all ICU admissions within 24 hours of each other, a patient can be transferred from one type of ICU to another and keep the same `icustay_id`.

Care units are derived from the `transfers` table.
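A stay that changed ICU type can be found by comparing the two care unit columns. The sketch below uses a small hypothetical frame (`units_df` and its IDs are made up for illustration); the fourth row of the table output above, which moves from MICU to SICU, is a real instance of the same pattern.

```python
import pandas as pd

# Hypothetical stays: only icustay_id 101 ends in a different unit type.
units_df = pd.DataFrame({
    "icustay_id":     [100, 101, 102],
    "first_careunit": ["MICU", "MICU", "CCU"],
    "last_careunit":  ["MICU", "SICU", "CCU"],
})

# Boolean mask: True where the stay started and ended in different unit types.
changed_unit = units_df["first_careunit"] != units_df["last_careunit"]

n_changed = int(changed_unit.sum())
print(f"Stays that changed care unit type: {n_changed}")
print(units_df.loc[changed_unit, "icustay_id"].tolist())
```

The same mask applied to `icustays_df` would give the share of stays that involved a change of unit type.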

In [5]:
c = icustays_df.first_careunit.value_counts()[:5]
p = icustays_df.first_careunit.value_counts(normalize=True).mul(100).round(2)[:5]
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
MICU,21088,34.27
CSRU,9312,15.13
SICU,8891,14.45
NICU,8100,13.16
CCU,7726,12.56


`first_wardid` and `last_wardid` contain the first and last ICU unit in which the patient stayed. The hospital database groups physical locations into "wards". Although ICUs are not called wards in practice, the database tracks each ICU as a "ward with an ICU cost center", so every ICU has an associated `wardid`.


In [7]:
c = icustays_df.first_wardid.value_counts()[:5]
p = icustays_df.first_wardid.value_counts(normalize=True).mul(100).round(2)[:5]
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
52,8482,13.78
56,7611,12.37
14,7444,12.1
23,6302,10.24
57,5736,9.32


In [8]:
c = icustays_df.last_wardid.value_counts()[:5]
p = icustays_df.last_wardid.value_counts(normalize=True).mul(100).round(2)[:5]
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
52,8077,13.13
56,7641,12.42
14,7345,11.94
23,6635,10.78
15,5820,9.46


`intime` provides the date and time the patient was transferred into the ICU. `outtime` provides the date and time the patient was transferred out of the ICU.

`los` is the length of stay for the patient for the given ICU stay, which may include one or more ICU units. The length of stay is measured in fractional days.
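Since `los` is in fractional days, it should equal the difference between `outtime` and `intime` divided by 24 hours. The sketch below checks this on the first three rows of the table output shown earlier; `sample_df` and `los_check` are names introduced here for illustration.

```python
import pandas as pd

# First three rows of the ICUSTAYS output shown above.
sample_df = pd.DataFrame({
    "intime":  ["2198-02-14 23:27:38", "2170-11-05 11:05:29", "2128-06-24 15:05:20"],
    "outtime": ["2198-02-18 05:26:11", "2170-11-08 17:46:57", "2128-06-27 12:32:29"],
    "los":     [3.2490, 3.2788, 2.8939],
})

# Parse the timestamp strings so they can be subtracted.
for col in ("intime", "outtime"):
    sample_df[col] = pd.to_datetime(sample_df[col])

# Length of stay in fractional days: elapsed seconds / seconds per day.
elapsed = sample_df["outtime"] - sample_df["intime"]
sample_df["los_check"] = elapsed.dt.total_seconds() / 86400

print(sample_df[["los", "los_check"]].round(4))
```

The recomputed values agree with the stored `los` to four decimal places for these rows.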

In [9]:
c = icustays_df.los.value_counts().nlargest(5)
p = icustays_df.los.value_counts(normalize=True).mul(100).round(2).nlargest(5)
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
1.009,10,0.02
0.1111,10,0.02
0.1064,10,0.02
1.0917,10,0.02
0.8438,9,0.01


In [10]:
print(f"Average length of stay: {icustays_df.los.mean()} days")

Average length of stay: 4.91797158089789 days
