# Triage MIMIC - Emergency Department

This analysis relies on the emergency department data from the MIMIC-IV dataset (refer to https://physionet.org/content/mimic-iv-ed/1.0/ for the original dataset).

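First, download the data from the PhysioNet website, following the instructions given there (access requires a credentialed PhysioNet account; replace `USERNAME` below). The two commands fetch the MIMIC-IV-ED module and the MIMIC-IV core module, which provides patient ages.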

```bash
wget -r -N -c -np --user USERNAME --ask-password https://physionet.org/files/mimic-iv-ed/1.0/
wget -r -N -c -np --user USERNAME --ask-password https://physionet.org/files/mimiciv/1.0/core/
```

This creates a `physionet.org` folder. The ED tables are in `physionet.org/files/mimic-iv-ed/1.0/ed/` and the patient table is in `physionet.org/files/mimiciv/1.0/core/`.
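Before running the notebook, it can help to confirm that the three files it reads are in place. A minimal sketch: `REQUIRED_FILES` and `missing_files` are hypothetical helpers, and the relative paths are the ones used in the loading cell.

```python
import os

# Relative paths (under the `path` prefix) of the three files read by this notebook
REQUIRED_FILES = [
    'mimiciv/1.0/core/patients.csv.gz',
    'mimic-iv-ed/1.0/ed/triage.csv.gz',
    'mimic-iv-ed/1.0/ed/edstays.csv.gz',
]

def missing_files(root):
    """Return the required files that are not present under `root`."""
    return [f for f in REQUIRED_FILES if not os.path.isfile(os.path.join(root, f))]
```

For example, `missing_files('physionet.org/files/')` should return an empty list once both downloads have completed.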

In [1]:
```python
path = 'physionet.org/files/'
```

##### Extract data of interest

In [2]:
```python
from sklearn.preprocessing import StandardScaler
import pandas as pd
import os
```

In [3]:
```python
# Load the patient, triage and ED stay tables
# Index: subject_id for patients, (subject_id, stay_id) for triage and ED stays
demo = pd.read_csv(os.path.join(path, 'mimiciv/1.0/core/patients.csv.gz'), index_col = 0)
triage = pd.read_csv(os.path.join(path, 'mimic-iv-ed/1.0/ed/triage.csv.gz'), index_col = [0, 1])
ed = pd.read_csv(os.path.join(path, 'mimic-iv-ed/1.0/ed/edstays.csv.gz'), index_col = [0, 2], parse_dates = ['intime', 'outtime'])
```

In [4]:
```python
# Remove the free-text chief complaint and any stay with missing triage data
triage = triage.drop(columns = 'chiefcomplaint')
triage = triage.dropna(axis = 0, how = 'any')
triage
```


| subject_id | stay_id | temperature | heartrate | resprate | o2sat | sbp | dbp | pain | acuity |
|---|---|---|---|---|---|---|---|---|---|
| 15585360 | 37573921 | 97.0 | 87.0 | 18.0 | 100.0 | 150.0 | 71.0 | 10.0 | 3.0 |
| 15248757 | 32172727 | 97.1 | 112.0 | 20.0 | 100.0 | 147.0 | 97.0 | 8.0 | 4.0 |
| 16648037 | 38946064 | 98.5 | 59.0 | 18.0 | 99.0 | 160.0 | 86.0 | 2.0 | 2.0 |
| 13492931 | 39828574 | 100.6 | 90.0 | 16.0 | 96.0 | 107.0 | 55.0 | 0.0 | 3.0 |
| 11475777 | 38193311 | 97.1 | 85.0 | 16.0 | 100.0 | 138.0 | 86.0 | 7.0 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15913671 | 35574167 | 98.0 | 82.0 | 15.0 | 98.0 | 127.0 | 86.0 | 8.0 | 3.0 |
| 14913519 | 33280070 | 97.1 | 104.0 | 18.0 | 97.0 | 90.0 | 57.0 | 0.0 | 2.0 |
| 13537748 | 39146222 | 97.1 | 56.0 | 20.0 | 100.0 | 177.0 | 92.0 | 6.0 | 2.0 |
| 15608541 | 39109339 | 97.6 | 92.0 | 18.0 | 98.0 | 197.0 | 73.0 | 0.0 | 4.0 |


In [5]:
```python
# Nurse assignment
# Expertise and fatigue might play a role here; we use the day of the week of admission as a proxy for the assigned nurse
triage['nurse'] = ed.intime.dt.day_of_week[triage.index]
```
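Since `dt.day_of_week` encodes Monday as 0 and Sunday as 6, the nurse proxy takes seven values, which matches the 0 to 6 index in the verification output. A quick check on three hypothetical timestamps (1 January 2024 was a Monday):

```python
import pandas as pd

# Hypothetical admission times: a Monday, a Wednesday and a Sunday
intime = pd.Series(pd.to_datetime(['2024-01-01 08:00', '2024-01-03 23:30', '2024-01-07 12:00']))

# pandas encodes the day of the week as Monday = 0, ..., Sunday = 6
nurse = intime.dt.day_of_week
print(nurse.tolist())  # [0, 2, 6]
```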

In [6]:
```python
# Acuity binarization - D
# Human decision: ESI acuity levels 1 and 2 (the most urgent) are treated as high risk
triage['D'] = triage['acuity'] <= 2
```

In [7]:
```python
# Outcome - Y1
# Defined from hospital admission status: `hadm_id` is missing (NaN) for ED stays not linked to a hospital admission
triage['Y1'] = ed.hadm_id.isna()[triage.index]
```

In [8]:
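```python
# Outcome - Y2
# Defined as abnormal vital signs using Emergency Severity Index danger-zone thresholds
# (SpO2 < 92, respiratory rate > 20, heart rate > 100), or age above 75
triage['Y2'] = (triage.o2sat < 92) | (triage.resprate > 20) | (triage.heartrate > 100) | (triage.join(demo).anchor_age > 75)
```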

In [9]:
```python
# Concept - YC
# YC is defined as the union of Y1 and Y2
triage['YC'] = triage['Y1'] | triage['Y2']
```
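The label definitions above can be checked on a handful of rows. A toy sketch with made-up raw values: the `Y1` column is a hypothetical precomputed flag, and `anchor_age` stands in for the age joined from the patients table.

```python
import pandas as pd

# Hypothetical raw (unstandardized) triage records
toy = pd.DataFrame({
    'o2sat':      [98, 90, 97, 99, 96],
    'resprate':   [16, 18, 24, 14, 16],
    'heartrate':  [80, 85, 90, 70, 75],
    'anchor_age': [40, 50, 60, 45, 80],
    'acuity':     [3, 2, 1, 4, 3],
    'Y1':         [False, False, False, True, False],  # hypothetical precomputed flag
})

# D: high-acuity triage (ESI level 1 or 2)
toy['D'] = toy['acuity'] <= 2
# Y2: any danger-zone vital sign, or age above 75
toy['Y2'] = (toy.o2sat < 92) | (toy.resprate > 20) | (toy.heartrate > 100) | (toy.anchor_age > 75)
# YC: union of the two outcomes
toy['YC'] = toy['Y1'] | toy['Y2']
print(toy[['D', 'Y1', 'Y2', 'YC']])
```

Row 1 is flagged by low SpO2, row 2 by respiratory rate, row 4 by age, and row 3 enters the concept through Y1 only.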

In [10]:
```python
# Standardize all columns except the last five (nurse, D, Y1, Y2, YC)
triage.iloc[:, :-5] = StandardScaler().fit_transform(triage.iloc[:, :-5])
```
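Because `iloc[:, :-5]` selects columns by position, it is worth checking what it covers: at this point the last five columns are `nurse`, `D`, `Y1`, `Y2` and `YC`, so `acuity` is standardized along with the vital signs and pain. This does not affect D, which was computed earlier, and acuity is dropped from the covariates later. A small check on a toy frame with the same column layout (random, hypothetical values):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column layout of `triage` at this point in the notebook (values are random placeholders)
columns = ['temperature', 'heartrate', 'resprate', 'o2sat', 'sbp', 'dbp', 'pain',
           'acuity', 'nurse', 'D', 'Y1', 'Y2', 'YC']
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(loc=5.0, scale=3.0, size=(100, len(columns))), columns=columns)

# iloc[:, :-5] keeps every column except the last five
scaled_cols = toy.columns[:-5]
print(list(scaled_cols))  # includes 'acuity'

toy.iloc[:, :-5] = StandardScaler().fit_transform(toy.iloc[:, :-5])
# StandardScaler uses the population standard deviation (ddof = 0)
print(toy[scaled_cols].mean().round(6).tolist(), toy[scaled_cols].std(ddof=0).round(6).tolist())
```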

In [11]:
```python
# Optionally save the cleaned data
#triage.to_csv('triage_clean.csv')
```

### Verification

We check what proportion of the population has each of these characteristics, and how much the labels overlap.

In [12]:
```python
# Nurse assignment
triage['nurse'].value_counts().sort_index() / len(triage)
```

```
0    0.143822
1    0.142558
2    0.142089
3    0.143345
4    0.142443
5    0.142326
6    0.143418
Name: nurse, dtype: float64
```

In [13]:
```python
# Human decision D - Acuity
triage['D'].mean()
```

0.36397630728730407

In [14]:
```python
# Outcome - Y1
triage['Y1'].mean()
```

0.5445559610705596

In [15]:
```python
# Outcome - Y2
triage['Y2'].mean()
```

0.3019748913086833

In [16]:
```python
# Concept - YC
triage['YC'].mean()
```

0.7284187906345977

In [17]:
```python
# Intersection Y1 and Y2 (overlap coefficient: |Y1 & Y2| / min(|Y1|, |Y2|))
(triage['Y1'] & triage['Y2']).sum() / min(triage['Y1'].sum(), triage['Y2'].sum())
```

0.3911320614531135

In [18]:
```python
# Intersection Y1 and concept: share of YC-positive patients with Y1
(triage['Y1'] & triage['YC']).sum() / triage['YC'].sum()
```

0.7475863721145126

In [19]:
```python
# Intersection Y2 and concept: share of YC-positive patients with Y2
(triage['Y2'] & triage['YC']).sum() / triage['YC'].sum()
```

0.4145621930560071

In [20]:
```python
# Intersection D and concept: share of YC-positive patients with D
(triage['D'] & triage['YC']).sum() / triage['YC'].sum()
```

0.31115864405619537

In [21]:
```python
# Intersection D and Y1 (overlap coefficient)
(triage['D'] & triage['Y1']).sum() / min(triage['Y1'].sum(), triage['D'].sum())
```

0.31060107942248155

In [22]:
```python
# Intersection D and Y2 (overlap coefficient)
(triage['D'] & triage['Y2']).sum() / min(triage['Y2'].sum(), triage['D'].sum())
```

0.48884284216522333
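The intersection cells above use two ratios: for two flags, the overlap coefficient |A ∩ B| / min(|A|, |B|) (cells 17, 21 and 22), and for a flag against the concept, the share of YC-positive patients that are also flagged (cells 18 to 20). A minimal sketch of both as reusable helpers on toy boolean arrays; the function names are hypothetical, not part of the original notebook.

```python
import numpy as np

def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap of two boolean masks: |A & B| / min(|A|, |B|)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return (a & b).sum() / min(a.sum(), b.sum())

def share_covered(a, concept):
    """Fraction of concept-positive rows that are also positive in `a`."""
    a, concept = np.asarray(a, dtype=bool), np.asarray(concept, dtype=bool)
    return (a & concept).sum() / concept.sum()

y1 = np.array([1, 1, 0, 0, 1], dtype=bool)
y2 = np.array([1, 0, 1, 0, 0], dtype=bool)
yc = y1 | y2
print(overlap_coefficient(y1, y2))  # 1 shared / min(3, 2) = 0.5
print(share_covered(y1, yc))        # 3 / 4 = 0.75
```

Because YC is the union of Y1 and Y2, `share_covered(y1, yc)` equals `y1.sum() / yc.sum()`.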

----------

# Semi-synthetic labels for scenarios

We create semi-synthetic labels with tree-based models, which gives more control over the consistency scenarios.
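The idea behind the semi-synthetic labels: a tree fitted to an observed outcome assigns one probability per leaf, so thresholding its predictions yields a label that is a deterministic function of the covariates. A minimal sketch on simulated data, where `make_classification` stands in for the triage covariates and outcome:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Simulated stand-in for the covariates and an observed binary outcome
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# A shallow tree yields a piecewise-constant probability, one value per leaf
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
proba = tree.predict_proba(X)[:, 1]

# Thresholding gives the semi-synthetic label
synth = proba > 0.5
print(synth.mean(), len(np.unique(proba)), tree.get_n_leaves())
```

The tree depth controls how easy the synthetic label is to learn, which is how the notebook tunes Y1 (depth 9) and Y2 (depth 2).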

In [23]:
```python
from sklearn.metrics import roc_auc_score, precision_score
from sklearn.tree import DecisionTreeClassifier
import numpy as np
```

In [24]:
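```python
# To exhibit the desired properties in the synthetic dataset
## Have more (but random) nurses: 20 nurses assigned uniformly at random
triage['nurse'] = np.random.choice(np.arange(20), size = len(triage))
## Add a binary covariate for age above 70 (it keeps the column name `anchor_age`)
triage = triage.join((demo.anchor_age > 70).astype(float))
```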

In [25]:
```python
# Covariates: standardized vital signs, pain and the age indicator
covariates = triage.drop(columns = ['acuity', 'nurse', 'D', 'Y1', 'Y2', 'YC'])
```

**Scenario 1**: A single decision model is shared by all experts, and randomness is injected where the concept model has high precision for Y1. Experts therefore agree on Y2, which might benefit YC modelling.

In [26]:
```python
# Model for Y1
model_y1 = DecisionTreeClassifier(max_depth = 9, random_state = 42)
model_y1.fit(covariates, triage['Y1'])
synth_y1 = model_y1.predict_proba(covariates)[:, 1]
roc_auc_score(triage['Y1'], synth_y1)
```

0.7035090998058837

In [27]:
```python
# Model for Y2
model_y2 = DecisionTreeClassifier(max_depth = 2, random_state = 42)
model_y2.fit(covariates, triage['Y2'])
synth_y2 = model_y2.predict_proba(covariates)[:, 1]
roc_auc_score(triage['Y2'], synth_y2)
```

0.9555263589748657

In [28]:
```python
# Update labels: replace the observed outcomes with the thresholded tree predictions
triage['Y1'] = synth_y1 > 0.5
triage['Y2'] = synth_y2 > 0.5
triage['YC'] = triage['Y1'] | triage['Y2']
```

In [29]:
```python
# Model for D: fit a model for YC, then change some of the decisions in selected leaves
model_yc = DecisionTreeClassifier(max_depth = 4, random_state = 42)
model_yc.fit(covariates, triage['YC'])
synth_yc = model_yc.predict_proba(covariates)[:, 1]
roc_auc_score(triage['YC'], synth_yc)
```

0.9431259214041656

In [30]:
```python
# Leaf index of each point
final_leave_yc = model_yc.apply(covariates)

# Size of each leaf, and precision of the YC model for Y1 and Y2 within it
for leaf in np.unique(final_leave_yc):
    selection = final_leave_yc == leaf
    print('\n{} -> {} patients ({:.2f} % population)'.format(leaf, selection.sum(), 100*selection.mean()))
    print('Y1 -> {:.2f} precision - {:.2f} % patients'.format(precision_score(triage['Y1'][selection], synth_yc[selection] > 0.5, zero_division = 0), 100*triage['Y1'][selection].mean()))
    print('Y2 -> {:.2f} precision - {:.2f} % patients'.format(precision_score(triage['Y2'][selection], synth_yc[selection] > 0.5, zero_division = 0), 100*triage['Y2'][selection].mean()))
```


```
4 -> 20579 patients (5.13 % population)
Y1 -> 0.00 precision - 0.02 % patients
Y2 -> 0.00 precision - 0.00 % patients

5 -> 6331 patients (1.58 % population)
Y1 -> 0.00 precision - 0.03 % patients
Y2 -> 1.00 precision - 100.00 % patients

7 -> 5613 patients (1.40 % population)
Y1 -> 0.00 precision - 0.21 % patients
Y2 -> 0.00 precision - 29.40 % patients

8 -> 39656 patients (9.89 % population)
Y1 -> 0.77 precision - 76.94 % patients
Y2 -> 0.22 precision - 21.78 % patients

9 -> 28981 patients (7.22 % population)
Y1 -> 0.00 precision - 0.22 % patients
Y2 -> 1.00 precision - 100.00 % patients

13 -> 7673 patients (1.91 % population)
Y1 -> 0.00 precision - 32.83 % patients
Y2 -> 0.00 precision - 12.60 % patients

14 -> 14201 patients (3.54 % population)
Y1 -> 0.80 precision - 80.13 % patients
Y2 -> 0.12 precision - 12.10 % patients

15 -> 9527 patients (2.38 % population)
Y1 -> 0.00 precision - 0.07 % patients
Y2 -> 1.00 precision - 100.00 % patients

18 -> 255301 patients (63.64 % popu
```
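`apply` returns the index of the tree node each sample lands in. Node indices are shared with the internal (split) nodes, which is why the leaf numbers printed above (4, 5, 7, 8, ...) have gaps. A sketch on simulated data (hypothetical data and depth) that checks the returned nodes are leaves and computes a per-leaf summary with `groupby` instead of a loop:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, random_state=1)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# apply() returns, for each sample, the index of the leaf node it falls into
leaf = model.apply(X)
# Leaf nodes have no children: children_left == -1 in the fitted tree structure
is_leaf = model.tree_.children_left[leaf] == -1

# Per-leaf summary: size, outcome rate and share of the population
summary = pd.DataFrame({'leaf': leaf, 'y': y}).groupby('leaf').agg(
    patients=('y', 'size'), outcome_rate=('y', 'mean'))
summary['population_pct'] = 100 * summary['patients'] / len(y)
print(summary)
```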

In [31]:
```python
# In leaves with high precision for Y1, randomly change the decisions:
# about 90 % of the points (noise == True) receive the decision opposite to Y1
leaves_to_update = [8, 13, 14, 18, 19, 21]
synth_yc_sc1 = synth_yc.copy()

noise = np.random.uniform(size = len(final_leave_yc)) > 0.1
for leaf in leaves_to_update:
    selection = final_leave_yc == leaf
    synth_yc_sc1[selection & noise & triage.Y1] = 0
    synth_yc_sc1[selection & noise & ~triage.Y1] = 1
    print(leaf, np.mean(synth_yc_sc1[selection] > 0.5))
```

```
8 0.30567883800685897
13 0.6051088231460967
14 0.2773044151820294
18 0.24330496159435333
19 0.7325077399380805
21 0.7596532702915682
```
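The loop over `leaves_to_update` amounts to one vectorized assignment: every selected point in those leaves receives the decision opposite to Y1. A minimal sketch on hypothetical toy arrays (leaf ids, labels and decisions are made up), using `np.isin` in place of the loop and checking that both forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
# Hypothetical inputs: leaf ids, the Y1 label and the model's decision (1 = high risk)
leaf = rng.choice([8, 9, 13], size=n)
y1 = rng.uniform(size=n) > 0.5
decision = np.ones(n)

leaves_to_update = [8, 13]
# About 90 % of rows are selected for modification
noise = rng.uniform(size=n) > 0.1
mask = np.isin(leaf, leaves_to_update) & noise

# Vectorized form: selected rows get 0 if Y1 is positive, 1 otherwise
updated = decision.copy()
updated[mask] = (~y1[mask]).astype(float)

# Loop form, as in the notebook cell
looped = decision.copy()
for l in leaves_to_update:
    selection = leaf == l
    looped[selection & noise & y1] = 0
    looped[selection & noise & ~y1] = 1
```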


In [32]:
```python
triage['D'] = synth_yc_sc1 > 0.5
```

In [33]:
```python
triage.to_csv('triage_scenario_1.csv')
```

**Scenario 2**: Non-random assignment with bias. Older patients are assigned to one expert who is biased and underestimates their risk (D == 0).

In [34]:
```python
triages2 = triage.copy()
```

In [35]:
```python
# Non-random assignment: all older patients (age > 70) go to a new nurse (21)...
triages2.loc[triage.anchor_age == 1, 'nurse'] = 21
# ...who never flags them as high risk
triages2.loc[triage.anchor_age == 1, 'D'] = False
```

In [36]:
```python
triages2.to_csv('triage_scenario_2.csv')
```

**Scenario 3**: All experts biased against older patients.

In [37]:
```python
triages3 = triage.copy()
```

In [38]:
```python
# All nurses underestimate the risk of older patients (age > 70)
triages3.loc[triage.anchor_age == 1, 'D'] = False
```

In [39]:
```python
triages3.to_csv('triage_scenario_3.csv')
```
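Scenarios 2 and 3 differ only in whether older patients are also reassigned to a dedicated nurse. A toy illustration on four hypothetical rows, where `anchor_age` is the 0/1 indicator of age above 70 created earlier:

```python
import pandas as pd

# Hypothetical rows: anchor_age is the 0/1 indicator of age > 70
toy = pd.DataFrame({'anchor_age': [0.0, 1.0, 0.0, 1.0],
                    'nurse': [3, 7, 12, 0],
                    'D': [True, True, False, True]})
older = toy.anchor_age == 1

# Scenario 2: older patients all go to nurse 21, who never flags them
scenario2 = toy.copy()
scenario2.loc[older, 'nurse'] = 21
scenario2.loc[older, 'D'] = False

# Scenario 3: same bias, but the original nurse assignment is kept
scenario3 = toy.copy()
scenario3.loc[older, 'D'] = False
```

In Scenario 2 the bias is confined to one expert and is therefore identifiable from the nurse column; in Scenario 3 it is shared by every expert.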

**Scenario 4**: Expert-dependent noise. Experts differ in expertise, which we model with expert-specific noise.

In [40]:
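```python
# Expert-specific noise: noises[k] is True with probability 1 - proba_error[k] for nurse k
proba_error = np.random.uniform(size = len(np.unique(triage.nurse)))
noises = {nurse: np.random.uniform(size = len(triage)) > proba_error[nurse] for nurse in np.unique(triage.nurse)}
```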
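In the cell above, `noises[k]` is True with probability `1 - proba_error[k]`, so within the updated leaves nurse k modifies a decision at that rate. Because each row belongs to a single nurse, the dictionary of full-length arrays can be replaced by one draw per row compared against that row's nurse level. A minimal sketch with simulated nurse ids (hypothetical sizes); the random draws differ from the notebook's, but the per-nurse rates follow the same distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nurses, n = 20, 50_000
nurse = rng.choice(np.arange(n_nurses), size=n)

# One level per nurse, as in the cell above
proba_error = rng.uniform(size=n_nurses)
# Row i is modified when a fresh uniform draw exceeds its nurse's level,
# i.e. with probability 1 - proba_error[nurse[i]]
modified = rng.uniform(size=n) > proba_error[nurse]

# Empirical modification rate per nurse, to compare with 1 - proba_error
rates = np.bincount(nurse, weights=modified.astype(float)) / np.bincount(nurse)
print(np.round(rates - (1 - proba_error), 3))
```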

In [41]:
```python
triages4 = triage.copy()
```

In [42]:
```python
# Same update as in Scenario 1 (decision set opposite to Y1 in high-precision leaves),
# but each nurse modifies decisions at their own rate
synth_yc_sc4 = synth_yc.copy()

for leaf in leaves_to_update:
    selection = final_leave_yc == leaf

    for nurse in noises:
        selection_nurse = selection & noises[nurse] & (triage.nurse == nurse)
        synth_yc_sc4[selection_nurse & triage.Y1] = 0
        synth_yc_sc4[selection_nurse & ~triage.Y1] = 1
        print(leaf, nurse, np.mean(synth_yc_sc4[selection_nurse] > 0.5))
    print(leaf, np.mean(synth_yc_sc4[selection] > 0.5), '\n')
```

```
8 0 0.225177304964539
8 1 0.2328977709454266
8 2 0.27941176470588236
8 3 0.2300556586270872
8 4 0.24823529411764705
8 5 0.26666666666666666
8 6 0.23302752293577983
8 7 0.2082616179001721
8 8 0.23692636072572038
8 9 0.22321428571428573
8 10 0.24365942028985507
8 11 0.22608695652173913
8 12 0.23404255319148937
8 13 0.23057644110275688
8 14 0.21371428571428572
8 15 0.22844175491679275
8 16 0.24227740763173833
8 17 0.2143474503025065
8 18 0.27440633245382584
8 19 0.2153846153846154
8 0.6635818035101876

13 0 0.7129186602870813
13 1 0.6626984126984127
13 2 0.2
13 3 0.7037037037037037
13 4 0.7058823529411765
13 5 0.7592592592592593
13 6 0.6188118811881188
13 7 0.6446280991735537
13 8 0.609271523178808
13 9 0.7138157894736842
13 10 0.6771300448430493
13 11 0.6956521739130435
13 12 0.5753424657534246
13 13 0.652027027027027
13 14 0.6688963210702341
13 15 0.6714285714285714
13 16 0.6687306501547987
13 17 0.6521739130434783
13 18 0.639344262295082
13 19 0.75
13 0.28711064772579176

14 0 0.1702
```

In [43]:
```python
triages4['D'] = synth_yc_sc4 > 0.5
```

In [44]:
```python
triages4.to_csv('triage_scenario_4.csv')
```