# Triage MIMIC - Emergency Department

This analysis relies on the emergency department data from the MIMIC-IV dataset (refer to https://physionet.org/content/mimic-iv-ed/1.0/ for the original dataset).

First, download the data from the PhysioNet website, following the instructions given there.

```
wget -r -N -c -np --user USERNAME --ask-password https://physionet.org/files/mimic-iv-ed/1.0/  
wget -r -N -c -np --user USERNAME --ask-password https://physionet.org/files/mimiciv/1.0/core/
```

This will result in a `physionet.org` folder in which the `ed` directory contains the relevant emergency department data.
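
Before going further, it can help to confirm that the download produced the files read later in this notebook. The following is a minimal sketch; the helper `missing_files` and the list `EXPECTED` are hypothetical names introduced here, and the paths are the ones used in the loading cell below.

```python
import os

def missing_files(root, relative_paths):
    """Return the relative paths that do not exist under `root`."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(root, p))]

# Files read later in this notebook (hypothetical list name)
EXPECTED = [
    'mimiciv/1.0/core/patients.csv.gz',
    'mimic-iv-ed/1.0/ed/triage.csv.gz',
    'mimic-iv-ed/1.0/ed/edstays.csv.gz',
]

missing = missing_files('physionet.org/files/', EXPECTED)
print('Missing files:', missing if missing else 'none')
```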

In [None]:
path = 'physionet.org/files/'

##### Extract data of interest

In [None]:
from sklearn.preprocessing import StandardScaler
import pandas as pd
import os

In [None]:
# Open data
demo = pd.read_csv(os.path.join(path, 'mimiciv/1.0/core/patients.csv.gz'), index_col = 0)
triage = pd.read_csv(os.path.join(path, 'mimic-iv-ed/1.0/ed/triage.csv.gz'), index_col = [0, 1])
ed = pd.read_csv(os.path.join(path, 'mimic-iv-ed/1.0/ed/edstays.csv.gz'), index_col = [0, 2], parse_dates = ['intime', 'outtime'])

In [None]:
# Remove unnecessary columns and datapoints with any missing data
triage = triage.drop(columns = 'chiefcomplaint')
triage = triage.dropna(axis = 0, how = 'any')
triage

In [None]:
# Nurse assignment
# Expertise and tiredness might play a role here; we use the day of the week of admission as a proxy for these dimensions
triage['nurse'] = ed.intime.dt.day_of_week[triage.index]
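
As an illustration of this assignment, here is a small sketch on made-up data. The frames `ed_toy` and `triage_toy` and their values are hypothetical; the sketch aligns the two tables with `reindex` on the shared `(subject_id, stay_id)` index, which is one possible way to perform the lookup.

```python
import pandas as pd

# Toy stays indexed like the real tables: (subject_id, stay_id)
idx = pd.MultiIndex.from_tuples([(1, 10), (1, 11), (2, 20)],
                                names=['subject_id', 'stay_id'])
ed_toy = pd.DataFrame(
    {'intime': pd.to_datetime(['2024-01-01 08:00',
                               '2024-01-03 23:30',
                               '2024-01-07 12:00'])},
    index=idx)

# Triage rows exist for only two of the three stays
triage_toy = pd.DataFrame({'heartrate': [80.0, 110.0]}, index=idx[[0, 2]])

# Day of the week of admission (Monday = 0, Sunday = 6), aligned on the index
day = ed_toy['intime'].dt.dayofweek
triage_toy['nurse'] = day.reindex(triage_toy.index)
print(triage_toy)
```

With this encoding, `nurse` takes values from 0 to 6, so the proxy distinguishes at most seven "experts".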

In [None]:
# Acuity binarization - D
# Human decision
triage['D'] = triage['acuity'] <= 2

In [None]:
# Outcome - Y1
# Defined as admission to the hospital (a stay has a hospital admission id only if the patient was admitted)
triage['Y1'] = ed.hadm_id.notna()[triage.index]

In [None]:
# Outcome - Y2
# Defined as abnormal vital signs using Emergency Severity Index
triage['Y2'] = (triage.o2sat < 92) | (triage.resprate > 20) | (triage.heartrate > 100)

In [None]:
# Concept - YC
# YC is defined as the union of Y1 and Y2
triage['YC'] = triage['Y1'] | triage['Y2']
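
To make the three label definitions concrete, the sketch below applies the same thresholds to four made-up patients. The frame `toy` and the column `admitted` are hypothetical stand-ins for the real tables.

```python
import pandas as pd

# Four hypothetical patients
toy = pd.DataFrame({
    'o2sat':     [98, 90, 97, 99],
    'resprate':  [16, 18, 18, 14],
    'heartrate': [72, 88, 95, 120],
    'admitted':  [False, False, True, False],
})

toy['Y1'] = toy['admitted']                      # admission outcome
toy['Y2'] = ((toy.o2sat < 92)                    # low oxygen saturation
             | (toy.resprate > 20)               # fast breathing
             | (toy.heartrate > 100))            # fast heart rate
toy['YC'] = toy['Y1'] | toy['Y2']                # union of both outcomes
print(toy[['Y1', 'Y2', 'YC']])
```

The third patient has normal vital signs but is admitted, so only `Y1` contributes to `YC` for that row.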

In [None]:
# Normalize data
triage.iloc[:, :-5] = StandardScaler().fit_transform(triage.iloc[:, :-5])
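
The positional slice above relies on the five label columns being the last ones in the frame. As a sketch of an alternative under that same intent, the example below standardizes columns selected by name on made-up data; the frame `toy` and the list `features` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

toy = pd.DataFrame({
    'heartrate': [60.0, 80.0, 100.0, 120.0],
    'o2sat':     [99.0, 97.0, 95.0, 93.0],
    'D':         [False, False, True, True],   # label column, left untouched
})

features = ['heartrate', 'o2sat']  # select by name rather than by position
toy[features] = StandardScaler().fit_transform(toy[features])
print(toy)
```

After the transformation each feature column has zero mean and unit (population) standard deviation, while the label column is unchanged.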

In [None]:
triage.to_csv('triage_clean.csv')

### Verification

We study what proportion of the population have these characteristics.

In [None]:
# Nurse assignment
triage['nurse'].value_counts().sort_index() / len(triage)

In [None]:
# Human decision D - Acuity
triage['D'].mean()

In [None]:
# Outcome - Y1
triage['Y1'].mean()

In [None]:
# Outcome - Y2
triage['Y2'].mean()

In [None]:
# Concept - YC: proportion of patients with Y2 who also have Y1
(triage['Y1'] & triage['Y2']).sum() / triage['Y2'].sum()

----------

# Semi-synthetic labels for scenarios

We create semi-synthetic labels using tree-based models to allow more control over the consistency scenarios.

In [None]:
from sklearn.metrics import roc_auc_score, precision_score
from sklearn.tree import DecisionTreeClassifier
import numpy as np

**Scenario 1**: One model for each expert, with randomness added where consistency is high for Y1 (experts agree on Y2, which might therefore benefit YC modelling)

In [None]:
# Model for Y1
model_y1 = DecisionTreeClassifier(max_depth = 9, random_state = 42)
model_y1.fit(triage.iloc[:, :7], triage['Y1'])
synth_y1 = model_y1.predict_proba(triage.iloc[:, :7])[:, 1]
roc_auc_score(triage['Y1'], synth_y1)

In [None]:
# Model for Y2
model_y2 = DecisionTreeClassifier(max_depth = 2, random_state = 42)
model_y2.fit(triage.iloc[:, :7], triage['Y2'])
synth_y2 = model_y2.predict_proba(triage.iloc[:, :7])[:, 1]
roc_auc_score(triage['Y2'], synth_y2)

In [None]:
# Update labels
triage['Y1'] = synth_y1 > 0.5
triage['Y2'] = synth_y2 > 0.5
triage['YC'] = triage['Y1'] | triage['Y2']
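
The idea behind these semi-synthetic labels can be sketched on random data: a shallow tree is fitted to a noisy observed label, and its thresholded prediction replaces that label, so the new label is a deterministic function of the features. Everything in the example (data, noise level, depth) is made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # toy features
flipped = rng.random(500) < 0.1      # roughly 10% of labels are flipped
y = (X[:, 0] > 0) ^ flipped          # noisy observed label

tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
proba = tree.predict_proba(X)[:, 1]  # probability of the positive class
y_synth = proba > 0.5                # semi-synthetic label

agreement = np.mean(y_synth == y)
print('Agreement with observed label: {:.2f}'.format(agreement))
```

A shallower tree gives a simpler, more "learnable" label at the cost of lower agreement with the observed outcome.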

In [None]:
# Model for D: use a model for YC and change some of the leaf decisions with random noise
model_yc = DecisionTreeClassifier(max_depth = 4, random_state = 42)
model_yc.fit(triage.iloc[:, :7], triage['YC'])
synth_yc = model_yc.predict_proba(triage.iloc[:, :7])[:, 1]
roc_auc_score(triage['YC'], synth_yc)

In [None]:
# Compute the final leaf of each point
final_leave_yc = model_yc.apply(triage.iloc[:, :7])

# Compute precision with respect to Y1 and Y2 for each leaf
for leaf in np.unique(final_leave_yc):
    selection = final_leave_yc == leaf
    print('Y1 {} -> {:.2f} precision - {} patients'.format(leaf, 
            precision_score(triage['Y1'][selection], synth_yc[selection] > 0.5), selection.sum()))
    print('Y2 {} -> {:.2f} precision - {} patients'.format(leaf, 
            precision_score(triage['Y2'][selection], synth_yc[selection] > 0.5), selection.sum()))
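
For readers unfamiliar with `apply`, the sketch below shows on random data how leaf indices can be turned into a per-leaf table. It reports the positive rate and size of each leaf, which is a simpler summary than the precision printed above; all data and names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = X[:, 0] + 0.5 * rng.normal(size=400) > 0

tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
leaves = tree.apply(X)  # index of the leaf in which each sample ends

# One row per leaf: share of positive labels and number of samples
summary = (pd.DataFrame({'leaf': leaves, 'y': y})
           .groupby('leaf')['y']
           .agg(rate='mean', patients='size'))
print(summary)
```

Leaf indices refer to node ids in the fitted tree, so they are not contiguous and change whenever the tree is refitted on different data.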

In [None]:
# Change prediction with noise for leaves with high precision for Y1
leaves_to_update = [13, 20, 24]


eps = 2 # Noise to add
for leaf in leaves_to_update:
    selection = final_leave_yc == leaf
    noise = (np.random.random(np.sum(selection)) - 0.5) * 2 * eps
    synth_yc[selection] = np.minimum(np.maximum(synth_yc[selection] + noise, 0), 1)
    print(leaf, np.mean(synth_yc[selection] > 0.5))
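
The effect of this noise can be worked out directly. For a leaf with predicted probability p and uniform noise on [-eps, eps], the share of points that stay above 0.5 is (eps + p - 0.5) / (2 * eps), bounded to [0, 1]. With eps = 2 and p = 1 this gives 2.5 / 4 = 0.625. The sketch below checks this by simulation; the seeded generator is an assumption added here for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
eps = 2
p = np.full(100_000, 1.0)       # a leaf that predicts the positive class with certainty

noise = (rng.random(p.size) - 0.5) * 2 * eps  # uniform on [-eps, eps)
noisy = np.clip(p + noise, 0, 1)              # keep values in [0, 1]

share_positive = np.mean(noisy > 0.5)
expected = (eps + 1.0 - 0.5) / (2 * eps)      # analytical value for p = 1
print('Simulated: {:.3f}, analytical: {:.3f}'.format(share_positive, expected))
```

With eps this large, decisions in the selected leaves are close to a coin flip, which is what makes the expert inconsistent there.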

In [None]:
triage['D'] = synth_yc > 0.5

In [None]:
triage.to_csv('triage_scenario_1.csv')

**Scenario 2**: Non-random assignment with bias. Women are assigned to one expert who is biased and overestimates their risk (D == 1).

In [None]:
triages2 = triage.copy()
triages2['D'] = triages2['YC'] # Initialize close to oracle

In [None]:
gender = triages2.join(demo).gender
index_women = gender[gender == 'F'].sample(frac = 0.5, random_state = 42).index # Select 50% of women
triages2.loc[index_women, 'nurse'] = 1 # Non random assignment
triages2.loc[index_women, 'D'] = True # Increase from 75% to 100%
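
A small sketch on made-up data shows the intended effect: half of the women are drawn at random, routed to nurse 1, and given a positive decision, while everyone else is left unchanged. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'F', 'M', 'F'],
    'nurse':  [0, 3, 5, 2, 6, 4],
    'D':      [False, True, False, True, False, False],
})

# Filter to women first, then sample half of them
chosen = toy[toy['gender'] == 'F'].sample(frac=0.5, random_state=42).index

toy.loc[chosen, 'nurse'] = 1   # non-random assignment to one expert
toy.loc[chosen, 'D'] = True    # that expert always predicts high risk
print(toy)
```

Filtering before sampling matters: sampling the boolean mask itself would draw from all patients, not only from women.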

In [None]:
triages2.to_csv('triage_scenario_2.csv')

**Scenario 3**: Shared biases. All experts overestimate risk for women.

In [None]:
triages2.loc[gender == 'F', 'D'] = True # Biased against women

In [None]:
triages2.to_csv('triage_scenario_3.csv')

**Scenario 4**: Noise dependent on experts. Different experts come with different levels of expertise. We model this with one nurse who is correct 50% of the time and one who is never correct.

In [None]:
triages4 = triage.copy()
triages4['D'] = triages4['YC']

In [None]:
# Assign through .loc on triages4 itself so that the changes persist
nurse0 = triages4.nurse == 0
triages4.loc[nurse0, 'D'] = ~triages4.loc[nurse0, 'Y1'] # Always wrong

nurse1 = triages4[triages4.nurse == 1]
triages4.loc[nurse1.index, 'D'] = ~nurse1.Y1 # Always wrong

selection = nurse1.sample(frac = 0.5, random_state = 42).index
triages4.loc[selection, 'D'] = triages4.loc[selection, 'Y1'] # 50% right
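
The resulting per-expert accuracy can be verified on a toy frame. In the sketch below, decisions are written with `.loc` on the frame itself and the agreement between `D` and `Y1` is then computed per nurse; the frame `toy` and its values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'nurse': [0, 0, 0, 0, 1, 1, 1, 1],
    'Y1':    [True, False, True, False, True, False, True, False],
})
toy['D'] = toy['Y1']

# Nurse 0: always wrong
is0 = toy['nurse'] == 0
toy.loc[is0, 'D'] = ~toy.loc[is0, 'Y1']

# Nurse 1: start from always wrong, then make a random half right
idx1 = toy.index[toy['nurse'] == 1]
toy.loc[idx1, 'D'] = ~toy.loc[idx1, 'Y1']
right = toy.loc[idx1].sample(frac=0.5, random_state=42).index
toy.loc[right, 'D'] = toy.loc[right, 'Y1']

# Share of decisions that match Y1, per nurse
accuracy = (toy['D'] == toy['Y1']).groupby(toy['nurse']).mean()
print(accuracy)
```

A check of this kind on the real frame is a quick way to confirm that each expert has the intended error rate before the file is saved.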

In [None]:
triages4.to_csv('triage_scenario_4.csv')