# Exploring version 2 of MIMIC-IV

MIMIC-IV v2.0 was released on June 12, 2022: https://physionet.org/content/mimiciv/2.0/

Main changes from v1:
- The *core* module has been removed to simplify the schema.
    - The *admissions*, *patients*, and *transfers* tables are now in the *hosp* module.
- Neonates have been removed from the dataset. Those data are to be released in a separate project with NICU data.
- Out-of-hospital mortality has been added back in, so dod is now populated with additional records from state records (an additional 15,621 patients).
- The mechanism for determining which patients are included in MIMIC has changed; approximately 1% of stays are affected.
- A new table, *omr* (Online Medical Record), has been added.
- Other changes are explained in the source linked above.

The goal of this notebook is a cursory comparison of this new version with version 1, which I explored here: https://github.com/marymlucas/mimic_explore/blob/main/mimic_explore_1.ipynb

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Patients Table

In [2]:
df_patients = pd.read_csv('D:/data/physionet/mimic-iv-2.0/hosp/patients.csv.gz')

In [3]:
print(df_patients.shape)
df_patients['subject_id'].nunique()

(315460, 6)


315460

This is in contrast to v1, which had 382,278 distinct patients.

In [4]:
382278 - 315460

66818

In [5]:
df_patients.head()

Unnamed: 0,subject_id,gender,anchor_age,anchor_year,anchor_year_group,dod
0,10000032,F,52,2180,2014 - 2016,2180-09-09
1,10000048,F,23,2126,2008 - 2010,
2,10000068,F,19,2160,2008 - 2010,
3,10000084,M,72,2160,2017 - 2019,2161-02-13
4,10000102,F,27,2136,2008 - 2010,


In [6]:
df_patients['anchor_age'].describe()

count    315460.000000
mean         48.526035
std          20.887027
min          18.000000
25%          29.000000
50%          48.000000
75%          65.000000
max          91.000000
Name: anchor_age, dtype: float64

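The minimum age is now 18, reflecting the removal of neonates. In the v1 analysis there were 60,872 patients under age 18, which accounts for most, but not all, of the 66,818-patient drop computed above (5,946 remain unexplained by age alone).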
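To confirm the age floor directly rather than reading it off `describe()`, a small helper can count patients below a cutoff. This is a minimal sketch: the function name `count_under_age` is hypothetical, and the toy frame is made up for illustration, not real MIMIC-IV data.

```python
import pandas as pd


def count_under_age(patients: pd.DataFrame, cutoff: int = 18) -> int:
    """Count rows whose anchor_age is strictly below the cutoff."""
    return int((patients["anchor_age"] < cutoff).sum())


# Toy frame for illustration only; these are not real MIMIC-IV rows.
toy_patients = pd.DataFrame(
    {"subject_id": [1, 2, 3], "anchor_age": [18, 52, 91]}
)
n_minors = count_under_age(toy_patients)
print(n_minors)
```

Running `count_under_age(df_patients)` on the v2 table should return 0 if the minimum reported above holds.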

In [7]:
df_patients.groupby('gender').size()

gender
F    166899
M    148561
dtype: int64
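Since the release notes say dod is now populated from state records, one quick check is how many patients have a non-null dod. The sketch below assumes dod is read as a string column with missing values as NaN (the pandas default for an empty CSV field); `dod_summary` is a hypothetical helper name. The sample reuses the five rows shown by `df_patients.head()` above.

```python
import pandas as pd


def dod_summary(patients: pd.DataFrame) -> tuple[int, float]:
    """Return (number of patients with a dod, fraction of all patients)."""
    n_with_dod = int(patients["dod"].notna().sum())
    return n_with_dod, n_with_dod / len(patients)


# The five rows printed by df_patients.head() above.
sample = pd.DataFrame(
    {
        "subject_id": [10000032, 10000048, 10000068, 10000084, 10000102],
        "dod": ["2180-09-09", None, None, "2161-02-13", None],
    }
)
n_with_dod, frac_with_dod = dod_summary(sample)
print(n_with_dod, frac_with_dod)
```

On the full table this would be called as `dod_summary(df_patients)`.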

## Admissions Table

In [8]:
df_admissions = pd.read_csv('D:/data/physionet/mimic-iv-2.0/hosp/admissions.csv.gz')

In [9]:
print(df_admissions.shape)
df_admissions['subject_id'].nunique()

(454324, 15)


190279

v2 has 454,324 stays for 190,279 distinct patients, in contrast to v1, which had 523,740 stays for 256,878 distinct patients.
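The patients table lists 315,460 subjects while admissions covers only 190,279, so 125,181 patients would have no hospital admission, assuming every subject_id in admissions also appears in patients. A rough way to check this and the average number of stays per admitted patient is sketched below; `admission_coverage` is a hypothetical helper, and the toy frames are made up for illustration.

```python
import pandas as pd


def admission_coverage(patients: pd.DataFrame, admissions: pd.DataFrame) -> dict:
    """Summarize how many patients have at least one row in admissions."""
    # Distinct hospital stays per patient.
    stays_per_patient = admissions.groupby("subject_id")["hadm_id"].nunique()
    admitted = patients["subject_id"].isin(stays_per_patient.index)
    return {
        "patients_with_admission": int(admitted.sum()),
        "patients_without_admission": int((~admitted).sum()),
        "mean_stays_per_admitted_patient": float(stays_per_patient.mean()),
    }


# Toy frames for illustration only; these are not real MIMIC-IV rows.
toy_patients = pd.DataFrame({"subject_id": [1, 2, 3]})
toy_admissions = pd.DataFrame(
    {"subject_id": [1, 1, 3], "hadm_id": [10, 11, 12]}
)
coverage = admission_coverage(toy_patients, toy_admissions)
print(coverage)
```

On the real tables the call would be `admission_coverage(df_patients, df_admissions)`.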

In [10]:
df_admissions.columns

Index(['subject_id', 'hadm_id', 'admittime', 'dischtime', 'deathtime',
       'admission_type', 'admission_location', 'discharge_location',
       'insurance', 'language', 'marital_status', 'race', 'edregtime',
       'edouttime', 'hospital_expire_flag'],
      dtype='object')

In [11]:
df_admissions.head()

Unnamed: 0,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,discharge_location,insurance,language,marital_status,race,edregtime,edouttime,hospital_expire_flag
0,10000032,22595853,2180-05-06 22:23:00,2180-05-07 17:15:00,,URGENT,TRANSFER FROM HOSPITAL,HOME,Other,ENGLISH,WIDOWED,WHITE,2180-05-06 19:17:00,2180-05-06 23:30:00,0
1,10000032,22841357,2180-06-26 18:27:00,2180-06-27 18:49:00,,EW EMER.,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-06-26 15:54:00,2180-06-26 21:31:00,0
2,10000032,25742920,2180-08-05 23:44:00,2180-08-07 17:50:00,,EW EMER.,EMERGENCY ROOM,HOSPICE,Medicaid,ENGLISH,WIDOWED,WHITE,2180-08-05 20:58:00,2180-08-06 01:44:00,0
3,10000032,29079034,2180-07-23 12:35:00,2180-07-25 17:55:00,,EW EMER.,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-07-23 05:54:00,2180-07-23 14:00:00,0
4,10000068,25022803,2160-03-03 23:16:00,2160-03-04 06:26:00,,EU OBSERVATION,EMERGENCY ROOM,,Other,ENGLISH,SINGLE,WHITE,2160-03-03 21:55:00,2160-03-04 06:26:00,0
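The timestamp columns arrive as plain strings when the CSV is read without date parsing. As a sketch of one way to work with them, the helper below converts admittime and dischtime and derives a length of stay in hours. The name `add_length_of_stay` and the `los_hours` column are my own additions, not part of MIMIC-IV; the sample uses two of the rows printed above.

```python
import pandas as pd


def add_length_of_stay(admissions: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed timestamps and a los_hours column."""
    out = admissions.copy()
    out["admittime"] = pd.to_datetime(out["admittime"])
    out["dischtime"] = pd.to_datetime(out["dischtime"])
    # Elapsed time between admission and discharge, in hours.
    out["los_hours"] = (
        out["dischtime"] - out["admittime"]
    ).dt.total_seconds() / 3600
    return out


# Two rows taken from the df_admissions.head() output above.
sample = pd.DataFrame(
    {
        "hadm_id": [22595853, 25022803],
        "admittime": ["2180-05-06 22:23:00", "2160-03-03 23:16:00"],
        "dischtime": ["2180-05-07 17:15:00", "2160-03-04 06:26:00"],
    }
)
with_los = add_length_of_stay(sample)
print(with_los[["hadm_id", "los_hours"]])
```

The shifted years in MIMIC-IV (such as 2180) still fall within the range pandas timestamps can represent at nanosecond resolution, which ends in the year 2262.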


## Misc. Tables

In [12]:
df_omr = pd.read_csv('D:/data/physionet/mimic-iv-2.0/hosp/omr.csv.gz')

In [13]:
print(df_omr.shape)
df_omr.head()

(6770301, 5)


Unnamed: 0,subject_id,chartdate,seq_num,result_name,result_value
0,10000032,2180-04-27,1,Blood Pressure,110/65
1,10000032,2180-04-27,1,Weight (Lbs),94
2,10000032,2180-05-07,1,BMI (kg/m2),18.0
3,10000032,2180-05-07,1,Height (Inches),60
4,10000032,2180-05-07,1,Weight (Lbs),92.15


In [14]:
df_omr.groupby('result_name').size()

result_name
BMI                                     583
BMI (kg/m2)                         1747382
Blood Pressure                      2281254
Blood Pressure Lying                   2924
Blood Pressure Sitting                 3648
Blood Pressure Standing                 552
Blood Pressure Standing (1 min)        2722
Blood Pressure Standing (3 mins)        657
Height                                   39
Height (Inches)                      743399
Weight                                  368
Weight (Lbs)                        1986519
eGFR                                    254
dtype: int64
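Blood pressure values in omr are stored as a single "systolic/diastolic" string such as 110/65, so they need splitting before any numeric analysis. The following is a minimal sketch, assuming every blood pressure variant in result_name starts with "Blood Pressure" (as in the counts above) and that result_value is a string column; `split_blood_pressure` is a hypothetical helper name. Values that do not match the pattern become NaN.

```python
import pandas as pd


def split_blood_pressure(omr: pd.DataFrame) -> pd.DataFrame:
    """Keep blood pressure rows and add numeric systolic/diastolic columns."""
    is_bp = omr["result_name"].str.startswith("Blood Pressure")
    bp = omr[is_bp].copy()
    # Named groups become the column names of the extracted frame.
    parts = bp["result_value"].str.extract(
        r"^\s*(?P<systolic>\d+)\s*/\s*(?P<diastolic>\d+)\s*$"
    )
    bp["systolic"] = pd.to_numeric(parts["systolic"], errors="coerce")
    bp["diastolic"] = pd.to_numeric(parts["diastolic"], errors="coerce")
    return bp


# Two rows taken from the df_omr.head() output above.
sample = pd.DataFrame(
    {
        "subject_id": [10000032, 10000032],
        "chartdate": ["2180-04-27", "2180-04-27"],
        "result_name": ["Blood Pressure", "Weight (Lbs)"],
        "result_value": ["110/65", "94"],
    }
)
bp_sample = split_blood_pressure(sample)
print(bp_sample[["result_name", "systolic", "diastolic"]])
```

The counts above also show near-duplicate names (for example BMI and BMI (kg/m2), or Weight and Weight (Lbs)), so any similar helper for weight or BMI would need to decide whether the unlabeled variants share the same units.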