In [1]:
import numpy as np
import pandas as pd
import datetime
import copy
import time
import os
import re
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import operator

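# tqdm.auto selects the notebook progress bar when running inside Jupyter,
# so a separate tqdm.notebook import (which would shadow this one) is not needed.
from tqdm.auto import tqdm, trange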
from datetime import timedelta

tqdm.pandas()

In [2]:
# Edit to point to your MIMIC directory.
dataDirStr = '/Users/gmessier/data/mimic-1.4/'

In [3]:
inputevents_mv_df = pd.read_csv(dataDirStr + "INPUTEVENTS_MV.csv")
inputevents_mv_df.columns = inputevents_mv_df.columns.str.lower()
inputevents_mv_df

Unnamed: 0,row_id,subject_id,hadm_id,icustay_id,starttime,endtime,itemid,amount,amountuom,rate,...,totalamountuom,isopenbag,continueinnextdept,cancelreason,statusdescription,comments_editedby,comments_canceledby,comments_date,originalamount,originalrate
0,241,27063,139787,223259.0,2133-02-05 06:29:00,2133-02-05 08:45:00,225166,6.774532,mEq,,...,ml,0,0,1,Rewritten,,RN,2133-02-05 12:52:00,10.000000,0.050000
1,242,27063,139787,223259.0,2133-02-05 05:34:00,2133-02-05 06:30:00,225944,28.132997,ml,30.142497,...,ml,0,0,0,FinishedRunning,,,,28.132998,30.255817
2,243,27063,139787,223259.0,2133-02-05 05:34:00,2133-02-05 06:30:00,225166,2.813300,mEq,,...,ml,0,0,0,FinishedRunning,,,,2.813300,0.050426
3,244,27063,139787,223259.0,2133-02-03 12:00:00,2133-02-03 12:01:00,225893,1.000000,dose,,...,ml,0,0,2,Rewritten,RN,,2133-02-03 17:06:00,1.000000,1.000000
4,245,27063,139787,223259.0,2133-02-03 12:00:00,2133-02-03 12:01:00,220949,100.000000,ml,,...,ml,0,0,2,Rewritten,RN,,2133-02-03 17:06:00,100.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3618986,3568968,90959,136680,240741.0,2147-08-28 12:00:00,2147-08-28 12:01:00,221744,99.999997,mcg,,...,,0,0,0,FinishedRunning,,,,100.000000,100.000000
3618987,3568969,90959,136680,240741.0,2147-08-29 12:16:00,2147-08-29 15:04:00,225942,0.842267,mg,300.809532,...,ml,0,0,0,Paused,,,,2.500000,300.000000
3618988,3568970,90959,136680,240741.0,2147-08-29 12:16:00,2147-08-29 15:04:00,225943,16.845331,ml,6.016190,...,ml,0,0,0,Paused,,,,50.000000,6.000000
3618989,3568971,90959,136680,240741.0,2147-08-29 02:30:00,2147-08-29 02:31:00,221744,99.999997,mcg,,...,,0,0,0,FinishedRunning,,,,100.000000,100.000000


`INPUTEVENTS_MV` contains input events from the Metavision ICU database. Inputs are any fluids administered to the patient, such as oral or tube feedings or intravenous solutions containing medications.

In [4]:
print(f"There are {inputevents_mv_df.subject_id.nunique()} patients in the Metavision ICU database with input events")

There are 17680 patients in the Metavision ICU database with input events


`starttime` and `endtime` record the start and end time of an input event.
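
As a minimal sketch of working with these two columns: `pd.read_csv` loads them as plain strings unless `parse_dates` is passed, so they need converting before any time arithmetic. The toy frame below reuses two timestamp pairs from the preview above; the `duration_min` column name is an assumption made for illustration and does not exist in the table.

```python
import pandas as pd

# Toy rows with timestamps copied from the table preview.
events = pd.DataFrame({
    "starttime": ["2133-02-05 06:29:00", "2133-02-03 12:00:00"],
    "endtime": ["2133-02-05 08:45:00", "2133-02-03 12:01:00"],
})

# Convert the string columns to datetime64 so they can be subtracted.
for col in ["starttime", "endtime"]:
    events[col] = pd.to_datetime(events[col])

# Duration of each event in minutes (hypothetical helper column).
events["duration_min"] = (
    (events["endtime"] - events["starttime"]).dt.total_seconds() / 60
)
print(events)
```

The same conversion can be applied to `inputevents_mv_df` directly, or done at load time with `parse_dates=["STARTTIME", "ENDTIME"]`, assuming the CSV header is upper case as the `.str.lower()` call above suggests.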

`itemid` is the identifier for a single measurement type in the database. The items are defined in `d_items.parquet`.

In [5]:
c = inputevents_mv_df.itemid.value_counts()[:5]
p = inputevents_mv_df.itemid.value_counts(normalize=True).mul(100).round(2)[:5]
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
225158,527855,14.59
220949,406345,11.23
225943,246312,6.81
222168,178819,4.94
226452,135438,3.74
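
The counts above are keyed by raw `itemid`, which is hard to read on its own. A possible way to attach human-readable names is to merge the counts with the items dictionary. The sketch below uses a toy `d_items` frame; its `label` column and the placeholder label strings are assumptions for illustration, and the real labels would come from the items table itself.

```python
import pandas as pd

# Toy events and a toy items dictionary (labels are placeholders, not real names).
events = pd.DataFrame({"itemid": [225158, 225158, 220949, 225943]})
d_items = pd.DataFrame({
    "itemid": [225158, 220949, 225943],
    "label": ["placeholder A", "placeholder B", "placeholder C"],
})

# Count rows per itemid, then attach the label with a left merge so that
# items missing from the dictionary are kept (with a NaN label).
counts = events.groupby("itemid").size().rename("counts").reset_index()
labelled = (
    counts.merge(d_items, on="itemid", how="left")
    .sort_values("counts", ascending=False)
    .reset_index(drop=True)
)
print(labelled)
```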


`amount` and `amountuom` list the amount of a drug or substance administered to the patient between the `starttime` and `endtime`, and its unit of measurement.

`rate` and `rateuom` list the rate at which the drug or substance was administered to the patient between the `starttime` and `endtime`, and its unit of measurement.

`storetime` records the time at which an observation was manually input or manually validated by a member of the clinical staff.

`cgid` is the identifier for the caregiver who validated the given measurement.

In [6]:
c = inputevents_mv_df.cgid.value_counts()[:5]
p = inputevents_mv_df.cgid.value_counts(normalize=True).mul(100).round(2)[:5]
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
14891,30043,0.83
16915,28205,0.78
15659,25297,0.7
17600,24854,0.69
14435,23870,0.66


`orderid` links multiple items contained in the same solution. For example, when a solution of two different medications is prepared and administered, the rows for both components share the same `orderid`.

`linkorderid` links the same order across multiple instantiations. For example, if the rate of delivery for a solution of noradrenaline and normal saline is changed, two new rows that share a new `orderid` are generated, but their `linkorderid` stays the same as before.
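
The relationship between the two identifiers can be sketched on toy data. The ids, item numbers, and amounts below are invented for illustration, and the toy rows assume that `linkorderid` carries the `orderid` of the first instantiation. Grouping by `orderid` recovers the components of one solution, while grouping by `linkorderid` follows the order across a rate change.

```python
import pandas as pd

# Hypothetical two-component solution whose rate is changed once:
# the change produces two new rows with a new orderid but the same linkorderid.
rows = pd.DataFrame({
    "orderid":     [1001, 1001, 1002, 1002],
    "linkorderid": [1001, 1001, 1001, 1001],
    "itemid":      [1, 2, 1, 2],              # hypothetical item ids
    "amount":      [0.5, 50.0, 0.25, 25.0],   # hypothetical amounts
})

# Number of distinct components in each solution.
components_per_order = rows.groupby("orderid")["itemid"].nunique()

# Number of instantiations (e.g. rate changes) behind each linked order.
instantiations = rows.groupby("linkorderid")["orderid"].nunique()

# Total amount of each component across all instantiations of the order.
per_link = rows.groupby(["linkorderid", "itemid"])["amount"].sum()

print(components_per_order, instantiations, per_link, sep="\n\n")
```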

`ordercategoryname`, `secondaryordercategoryname`, `ordercomponenttypedescription`, and `ordercategorydescription` provide higher-level information about the order the medication or solution is part of. The category columns represent the type of administration, while `ordercomponenttypedescription` describes the role of the substance in the solution (i.e. main order parameter, additive, or mixed solution).

`patientweight` is the patient weight in kilograms.

In [7]:
inputevents_mv_df.patientweight.describe().apply(lambda x: format(x, 'f'))

count    3618991.000000
mean          85.558807
std           31.328380
min            1.000000
25%           68.400000
50%           81.400000
75%           98.000000
max         8106.000000
Name: patientweight, dtype: object
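
The maximum of 8106 kg in the summary above is not physically plausible, and the minimum of 1 kg is suspect as well, so the column likely contains some data-entry errors. One hedged way to handle this is to blank out values outside a plausible range before using the weight in any calculation. The bounds below are assumptions chosen for illustration, not values taken from the dataset documentation.

```python
import pandas as pd

# Toy weights, including the extreme min and max reported by describe().
weights = pd.Series([1.0, 68.4, 81.4, 98.0, 8106.0], name="patientweight")

# Assumed plausibility bounds in kilograms; adjust for the cohort at hand.
LOW_KG, HIGH_KG = 20.0, 300.0

# Keep values inside the bounds and replace everything else with NaN.
cleaned = weights.where(weights.between(LOW_KG, HIGH_KG))
print(cleaned.describe())
```

Replacing outliers with NaN rather than dropping the rows keeps the input events themselves available for analyses that do not depend on weight.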

`totalamount` and `totalamountuom` list the total amount of the fluid in the bag containing the solution given by `itemid`. Intravenous administrations are usually given by hanging a bag of fluid at the bedside for continuous infusion over a certain period of time. 

`statusdescription` states the ultimate status of the item, or more specifically, row. It is used to indicate why the delivery of the compound has ended. There are only six possible statuses:

`Changed` - The current delivery has ended as some aspect of it has changed (most frequently, the rate has been changed)
`Paused` - The current delivery has been paused
`FinishedRunning` - The delivery of the item has finished
`Stopped` - The delivery of the item has been terminated by the caregiver
`Rewritten` - Incorrect information was input, and so the information in this row was rewritten (these rows are primarily useful for auditing purposes - the rates/amounts described were not delivered and so should not be used if determining what compounds a patient has received)
`Flushed` - A line was flushed.
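
Because `Rewritten` rows describe amounts that were not delivered, they would typically be dropped before estimating what a patient received. A minimal sketch on toy data (item ids, amounts, and statuses taken from the first rows of the preview above); the `delivered` name is introduced here for illustration only:

```python
import pandas as pd

# Toy rows: two were rewritten (never delivered), one finished running.
events = pd.DataFrame({
    "itemid": [225166, 225944, 225893],
    "amount": [6.774532, 28.132997, 1.0],
    "statusdescription": ["Rewritten", "FinishedRunning", "Rewritten"],
})

# Keep only rows whose amounts were actually administered.
delivered = events[events["statusdescription"] != "Rewritten"]
print(delivered)
```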

In [8]:
c = inputevents_mv_df.statusdescription.value_counts()
p = inputevents_mv_df.statusdescription.value_counts(normalize=True).mul(100).round(2)
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
FinishedRunning,2002471,55.33
Rewritten,695521,19.22
Changed,667547,18.45
Stopped,157735,4.36
Paused,94820,2.62
Flushed,897,0.02


`continueinnextdept` is a binary flag. If the order ended on a patient transfer, it indicates whether the order continued into the next department.

In [9]:
c = inputevents_mv_df.continueinnextdept.value_counts()
p = inputevents_mv_df.continueinnextdept.value_counts(normalize=True).mul(100).round(2)
pd.concat([c,p], axis=1, keys=['counts', '%'])

Unnamed: 0,counts,%
0,3618906,100.0
1,85,0.0
