Here, the data is loaded into a pandas DataFrame from my Google Drive. The dataset was published with the paper by Hou et al. and is available at https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-020-02620-5#availability-of-data-and-materials

In [25]:
import pandas as pd

sepsis_df = pd.read_csv("/content/drive/MyDrive/CS 598 - Deep Learning for Healthcare/Final Project/Reproducing Septic Paper/sepsis_data.csv")

Here, we can see the columns from this dataset, which include demographic information, vital signs, and lab results.

In [26]:
for col in sepsis_df.columns:
  print(col)

icustay_id
hadm_id
intime
outtime
dbsource
suspected_infection_time_poe
suspected_infection_time_poe_days
specimen_poe
positiveculture_poe
antibiotic_time_poe
blood_culture_time
blood_culture_positive
age
gender
is_male
ethnicity
race_white
race_black
race_hispanic
race_other
metastatic_cancer
diabetes
first_service
hospital_expire_flag
thirtyday_expire_flag
icu_los
hosp_los
sepsis_angus
sepsis_martin
sepsis_explicit
septic_shock_explicit
severe_sepsis_explicit
sepsis_nqf
sepsis_cdc
sepsis_cdc_simple
elixhauser_hospital
vent
sofa
lods
sirs
qsofa
qsofa_sysbp_score
qsofa_gcs_score
qsofa_resprate_score
aniongap_min
aniongap_max
bicarbonate_min
bicarbonate_max
creatinine_min
creatinine_max
chloride_min
chloride_max
glucose_min
glucose_max
hematocrit_min
hematocrit_max
hemoglobin_min
hemoglobin_max
lactate_min
lactate_max
lactate_mean
platelet_min
platelet_max
potassium_min
potassium_max
inr_min
inr_max
sodium_min
sodium_max
bun_min
bun_max
bun_mean
wbc_min
wbc_max
wbc_mean
heartrate_min
he

We confirm that there are no missing values in the target, which is thirtyday_expire_flag, indicating whether or not the patient died within 30 days.

In [27]:
sepsis_df["thirtyday_expire_flag"].isna().sum()

0

We look at the value counts for the target: 3670 patients survived and 889 patients died within 30 days. The classes are therefore imbalanced, with 80.5% of patients surviving.

In [28]:
sepsis_df["thirtyday_expire_flag"].value_counts()

0    3670
1     889
Name: thirtyday_expire_flag, dtype: int64
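
The 80.5% figure can be checked directly from the counts. The short sketch below rebuilds the two counts by hand (the Series `counts` is illustrative and not part of the notebook) and divides by the total; on the real data, `sepsis_df["thirtyday_expire_flag"].value_counts(normalize=True)` should give the same proportions.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({0: 3670, 1: 889}, name="thirtyday_expire_flag")

# Dividing each count by the total gives the class proportions.
proportions = counts / counts.sum()
print(proportions.round(3))
```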

Now, we select the target.

In [29]:
y = sepsis_df["thirtyday_expire_flag"]

Next, we drop columns that are not used for prediction, along with the target itself. Note that the raw gender and ethnicity columns are dropped, but their one-hot encoded versions (is_male and the race_* columns) remain in the dataset.

In [30]:
to_drop = ["icustay_id",
           "hadm_id",
           "intime",
           "outtime",
           "dbsource",
           "suspected_infection_time_poe",
           "suspected_infection_time_poe_days",
           "specimen_poe",
           "positiveculture_poe",
           "antibiotic_time_poe",
           "blood_culture_time",
           "blood_culture_positive",
           "gender",
           "ethnicity",
           "hospital_expire_flag",
           "thirtyday_expire_flag",
           "first_service"]

X = sepsis_df.drop(columns = to_drop)

We are now left with only numerical features. We perform mean value imputation on all columns to fill in missing values.

In [31]:
# Fill each column's missing values with that column's mean, then convert to a NumPy array.
X = X.fillna(X.mean()).values

Now, we use the StandardScaler from scikit-learn to standardize the data so that each feature has mean 0 and standard deviation 1.

In [32]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)

We split the data into a training set (80% of the data) and a testing set (20% of the data), which we will use to train and evaluate models. The split is stratified on the target, so both sets keep the same class balance.

In [33]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2, stratify = y, random_state = 0)
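
In the cells above, the imputation means and scaling statistics are computed on the full dataset before the split, so the test rows contribute to them. One possible variation, shown here as a minimal sketch on synthetic data rather than the sepsis file, wraps the steps in a scikit-learn pipeline so that they are fit on the training split only. The names `X_demo`, `y_demo`, and `pipe` are illustrative and do not appear in the notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sepsis features (assumption: not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=500) > 0.8).astype(int)
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan  # inject roughly 10% missing values

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# The imputer and scaler learn their statistics from the training split only;
# the same fitted statistics are then applied to the test split.
pipe = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LogisticRegression(max_iter=10000, random_state=0),
)
pipe.fit(X_tr, y_tr)
print("Test accuracy:", pipe.score(X_te, y_te))
```

Whether this changes the reported metrics on the sepsis data is not something the sketch can show; it only illustrates the ordering of the steps.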

We will use the memory_profiler package to check the memory requirements of each model, together with the %time magic to measure training time.

In [45]:
!pip install memory_profiler
%load_ext memory_profiler



Next, we train our first model, which is logistic regression. We use the LogisticRegression classifier from scikit-learn and fit it to the training data. We evaluate the performance by computing the accuracy, precision, recall, and AUC on the testing data.

In [46]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

clf = LogisticRegression(random_state = 0, max_iter = 10000)
%time %memit clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_pred))

peak memory: 296.12 MiB, increment: 0.02 MiB
CPU times: user 402 ms, sys: 160 ms, total: 562 ms
Wall time: 761 ms
Accuracy: 0.8848684210526315
Precision: 0.7967479674796748
Recall: 0.550561797752809
AUC: 0.758250926124361
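
The cell above passes the hard 0/1 predictions to `roc_auc_score`. With binary predictions the ROC curve has a single interior point, so the resulting value equals the balanced accuracy (the mean of sensitivity and specificity). The sketch below, which uses synthetic data and illustrative names (`X_demo`, `model`, `scores`) rather than the sepsis dataset, checks that equivalence and shows the alternative of passing predicted probabilities from `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: not the sepsis dataset).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

model = LogisticRegression(max_iter=10000, random_state=0).fit(X_tr, y_tr)

hard_labels = model.predict(X_te)         # 0/1 predictions at the default threshold
scores = model.predict_proba(X_te)[:, 1]  # estimated probability of class 1

auc_from_labels = roc_auc_score(y_te, hard_labels)
auc_from_scores = roc_auc_score(y_te, scores)

print("AUC from hard labels:", auc_from_labels)
print("Balanced accuracy:   ", balanced_accuracy_score(y_te, hard_labels))
print("AUC from scores:     ", auc_from_scores)
```

The score-based value measures ranking quality across all thresholds, which is the usual meaning of AUC in published comparisons.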


Our next model is a support vector machine classifier. Again, we use the scikit-learn implementation. The cell below uses SVC with its default RBF kernel; of the kernels I experimented with, this one gave the best performance.

In [47]:
from sklearn.svm import SVC

clf = SVC(random_state = 0)
%time %memit clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_pred))

peak memory: 296.14 MiB, increment: 0.03 MiB
CPU times: user 1.28 s, sys: 108 ms, total: 1.39 s
Wall time: 2.32 s
Accuracy: 0.8651315789473685
Precision: 0.8767123287671232
Recall: 0.3595505617977528
AUC: 0.6736444907081407
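
The kernel experiments themselves are not shown in the notebook. As a hedged sketch of how such a comparison could be run, the loop below scores four built-in SVC kernels with 5-fold cross-validation on synthetic data; the data, the `kernel_scores` dictionary, and the choice of AUC as the scoring metric are all assumptions made for illustration. On the real data, the same loop would be applied to `X_train` and `y_train` so the test set stays untouched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (assumption: not the sepsis dataset).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)

kernel_scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf_demo = SVC(kernel=kernel, random_state=0)
    # For SVC, the "roc_auc" scorer ranks samples using decision_function values.
    fold_scores = cross_val_score(clf_demo, X_demo, y_demo, cv=5, scoring="roc_auc")
    kernel_scores[kernel] = fold_scores.mean()
    print(f"{kernel:>8}: mean CV AUC = {kernel_scores[kernel]:.3f}")
```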


Next, I used a random forest classifier, again from scikit-learn, with its default hyperparameters.

In [48]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state = 0)
%time %memit clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_pred))

peak memory: 296.29 MiB, increment: 0.00 MiB
CPU times: user 2.85 s, sys: 39.2 ms, total: 2.89 s
Wall time: 5.53 s
Accuracy: 0.8739035087719298
Precision: 0.9315068493150684
Recall: 0.38202247191011235
AUC: 0.6876052414046474


Now, we try a gradient boosting classifier. Note that the results should be broadly comparable to XGBoost, which is built on the same gradient-boosted tree approach but adds its own regularization and typically runs faster.

In [49]:
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(random_state = 0)
%time %memit clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_pred))

peak memory: 296.34 MiB, increment: 0.00 MiB
CPU times: user 6.61 s, sys: 39.8 ms, total: 6.65 s
Wall time: 7.28 s
Accuracy: 0.8980263157894737
Precision: 0.8761061946902655
Recall: 0.5561797752808989
AUC: 0.768553102899305


Next, we try a $k$-nearest neighbors classifier. From experimenting with the value of $k$, we found that $k = 1$ gave the best performance (though this is most likely overfitting).

In [50]:
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors = 1)
%time %memit clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_pred))

peak memory: 296.34 MiB, increment: 0.00 MiB
CPU times: user 198 ms, sys: 98.6 ms, total: 296 ms
Wall time: 535 ms
Accuracy: 0.8026315789473685
Precision: 0.49166666666666664
Recall: 0.33146067415730335
AUC: 0.6241772035636654
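
The notebook does not show how the values of $k$ were compared. One way to reduce the risk of overfitting the choice of $k$ is to select it by cross-validation on the training data only. The sketch below does this with `GridSearchCV` on synthetic data; the candidate grid, the AUC scoring metric, and the names `X_demo`, `y_demo`, and `search` are assumptions for illustration, not the procedure used above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in data (assumption: not the sepsis dataset).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Each candidate k is scored by 5-fold cross-validated AUC.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 15]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_demo, y_demo)

print("Best k:", search.best_params_["n_neighbors"])
print("Mean CV AUC:", round(search.best_score_, 3))
```

On the sepsis data, `search.fit(X_train, y_train)` would play the same role, leaving the test set for a single final evaluation.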
