# MIMIC-IV Dataset Exploratory Analysis

This notebook explores two pre-computed MIMIC-IV ARDS tables: the static table in `static_analysis_table_mimic(1).parquet` and the time-series table in `time_series_analysis_table_mimic(1).parquet`. It checks table shapes, completeness, proning prevalence, and mortality.


In [1]:
import pathlib

import pandas as pd
import pyarrow.parquet as pq

In [2]:
# Load the static table from Parquet and report (rows, columns)
static_path = pathlib.Path('../../data/raw/static_analysis_table_mimic(1).parquet')
static = pq.read_table(static_path).to_pandas()
static.shape

(90481, 14)

In [3]:
static.head()

Unnamed: 0,hospital_id,patient_id,hospitalization_id,admission_datetime,discharge_datetime,sex,age_at_admission,disposition_category,hospital_admit_source,mortality,icu_los_days,hospital_los_days,ventilator_free_days_28,race
0,BIDMC,10001884,26184834,2131-01-07 20:39:00,2131-01-20 05:15:00,F,77,Expired,EMERGENCY ROOM,1,9.171817,12.358333,0.0,Black
1,BIDMC,10001884,26184834,2131-01-07 20:39:00,2131-01-20 05:15:00,F,77,Expired,EMERGENCY ROOM,1,9.171817,12.358333,0.0,Black
2,BIDMC,10001884,26184834,2131-01-07 20:39:00,2131-01-20 05:15:00,F,77,Expired,EMERGENCY ROOM,1,9.171817,12.358333,0.0,Black
3,BIDMC,10001884,26184834,2131-01-07 20:39:00,2131-01-20 05:15:00,F,77,Expired,EMERGENCY ROOM,1,9.171817,12.358333,0.0,Black
4,BIDMC,10001884,26184834,2131-01-07 20:39:00,2131-01-20 05:15:00,F,77,Expired,EMERGENCY ROOM,1,9.171817,12.358333,0.0,Black
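
The five rows previewed above are identical, which suggests the static table may hold repeated rows per hospitalization. A minimal sketch of a duplicate check follows, assuming `hospitalization_id` is the intended key. The helper name `summarize_duplicates` and the toy frame are hypothetical and only stand in for `static`.

```python
import pandas as pd


def summarize_duplicates(df: pd.DataFrame, key: str = "hospitalization_id") -> dict:
    """Count rows, distinct key values, and fully duplicated rows."""
    return {
        "n_rows": len(df),
        "n_unique_keys": int(df[key].nunique()),
        # duplicated() marks every repeat of a row except its first occurrence
        "n_exact_duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame standing in for `static` (values are made up for illustration).
toy_static = pd.DataFrame({
    "hospitalization_id": [1, 1, 1, 2],
    "mortality": [1, 1, 1, 0],
})
summary = summarize_duplicates(toy_static)
print(summary)
```

Calling `summarize_duplicates(static)` on the real table would show whether the row count is far larger than the number of distinct hospitalizations.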


## Static table completeness

In [4]:
# Fraction of missing values per column, highest first
static.isnull().mean().sort_values(ascending=False).head(20)

hospital_id                0.0
patient_id                 0.0
hospitalization_id         0.0
admission_datetime         0.0
discharge_datetime         0.0
sex                        0.0
age_at_admission           0.0
disposition_category       0.0
hospital_admit_source      0.0
mortality                  0.0
icu_los_days               0.0
hospital_los_days          0.0
ventilator_free_days_28    0.0
race                       0.0
dtype: float64

## Time-series data

In [5]:
# Load the time-series table from Parquet and report (rows, columns)
time_path = pathlib.Path('../../data/raw/time_series_analysis_table_mimic(1).parquet')
time = pq.read_table(time_path).to_pandas()
time.shape

(1007358, 25)

In [6]:
time.head()

Unnamed: 0,hospital_id,patient_id,hospitalization_id,recorded_dttm,icu_in_time,icu_type,ARDS_onset_dttm,time_from_ARDS_onset,respiratory_device,ecmo_flag,...,height_cm,weight_kg,cisatracurium_dose,vecuronium_dose,rocuronium_dose,atracurium_dose,pancuronium_dose,position,new_tracheostomy,prone_flag
0,BIDMC,10004720,22081550,2186-11-12 18:02:00,2186-11-12 19:55:00,MICU,2186-11-12 20:00:00,-1.966667,,,...,183.0,70.0,,,,0.0,0.0,,,0
1,BIDMC,10004720,22081550,2186-11-12 20:00:00,2186-11-12 19:55:00,MICU,2186-11-12 20:00:00,0.0,Endotracheal tube,,...,,70.0,,,,0.0,0.0,Left Side,,0
2,BIDMC,10004720,22081550,2186-11-12 20:06:00,2186-11-12 19:55:00,MICU,2186-11-12 20:00:00,0.1,,,...,,,,,,0.0,0.0,,,0
3,BIDMC,10004720,22081550,2186-11-12 21:00:00,2186-11-12 19:55:00,MICU,2186-11-12 20:00:00,1.0,,,...,,,,,,0.0,0.0,,,0
4,BIDMC,10004720,22081550,2186-11-12 22:00:00,2186-11-12 19:55:00,MICU,2186-11-12 20:00:00,2.0,,,...,,,,,,0.0,0.0,Right Side,,0
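
The preview above shows many empty cells (for example in `height_cm` and `respiratory_device`), so the completeness check applied to the static table is likely worth repeating here. The sketch below wraps that check in a hypothetical helper, `missing_fraction`, and runs it on a made-up toy frame rather than on the real `time` table.

```python
import numpy as np
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, highest first."""
    return df.isnull().mean().sort_values(ascending=False)


# Toy frame standing in for `time` (values are made up for illustration).
toy_ts = pd.DataFrame({
    "hospitalization_id": [1, 1, 1, 1],
    "height_cm": [183.0, np.nan, np.nan, np.nan],
    "weight_kg": [70.0, 70.0, np.nan, np.nan],
    "prone_flag": [0, 0, 0, 0],
})
fractions = missing_fraction(toy_ts)
print(fractions)
```

On the real data, `missing_fraction(time).head(20)` would mirror the static-table cell.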


## Proning prevalence

In [7]:
# A hospitalization counts as proned if prone_flag is 1 on any of its recorded rows
proning = time.groupby('hospitalization_id')['prone_flag'].max().value_counts()
proning

prone_flag
0    4191
1     211
Name: count, dtype: int64
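
From the counts above, 211 of 4,402 hospitalizations (about 4.8%) have at least one proned record. A minimal sketch of computing that proportion directly is shown below. The helper name `ever_flag_rate` and the toy frame are hypothetical, and the flag column is assumed to be coded 0/1.

```python
import pandas as pd


def ever_flag_rate(df: pd.DataFrame, id_col: str, flag_col: str) -> float:
    """Share of groups in which a 0/1 flag is set on at least one row."""
    per_group = df.groupby(id_col)[flag_col].max()
    return float(per_group.mean())


# Toy frame standing in for `time` (values are made up for illustration).
toy_ts = pd.DataFrame({
    "hospitalization_id": [1, 1, 2, 2, 3],
    "prone_flag": [0, 1, 0, 0, 0],
})
rate = ever_flag_rate(toy_ts, "hospitalization_id", "prone_flag")
print(f"{rate:.3f}")
```

On the real table the call would be `ever_flag_rate(time, 'hospitalization_id', 'prone_flag')`.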

## Mortality

In [8]:
# Count rows of `static` by mortality flag (row-level, not per hospitalization)
mortality = static['mortality'].value_counts()
mortality

mortality
0    80408
1    10073
Name: count, dtype: int64
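
These counts are per row of `static` (90,481 rows), whereas the proning tally covers 4,402 hospitalizations, and the `static.head()` preview shows repeated rows. Row-level counts may therefore overweight hospitalizations that appear many times. The sketch below counts each hospitalization once, assuming `mortality` is a 0/1 flag that is constant within a hospitalization. The helper name and toy frame are hypothetical.

```python
import pandas as pd


def mortality_per_hospitalization(
    df: pd.DataFrame,
    id_col: str = "hospitalization_id",
    outcome_col: str = "mortality",
) -> pd.Series:
    """Count hospitalizations by outcome, using one value per hospitalization."""
    # max() collapses repeated rows; it equals the shared value when the
    # flag is constant within a hospitalization (an assumption here).
    per_stay = df.groupby(id_col)[outcome_col].max()
    return per_stay.value_counts().sort_index()


# Toy frame standing in for `static` (values are made up for illustration).
toy_static = pd.DataFrame({
    "hospitalization_id": [1, 1, 1, 2, 3],
    "mortality": [1, 1, 1, 0, 0],
})
row_level = toy_static["mortality"].value_counts().sort_index()
per_stay = mortality_per_hospitalization(toy_static)
print(row_level.to_dict(), per_stay.to_dict())
```

In the toy data the row-level tally gives three deaths, while the per-hospitalization tally gives one, which illustrates how repeated rows can distort the rate.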