# Original Code

**The following code will compute an XGBoost model trained to predict exitus in the healthcare dataset from the mock midterm:**

In [4]:
# Load Libraries
%matplotlib inline
import numpy as np
import pandas as pd
import io
from google.colab import files
from timeit import default_timer
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier as model_constructor



# Read file
#uploaded = files.upload()
dat = pd.read_csv(io.BytesIO(uploaded['dataset_mock_midterm.csv']), sep = ",")


# One Hot Encoding
categorical_vars = set(['severity', 'origin', 'tip_adm', 'tip_grd', 'date'])
non_categorical_vars = set(dat.columns) - categorical_vars
categorical_vars = list(categorical_vars)
non_categorical_vars = list(non_categorical_vars)
ohe = OneHotEncoder(sparse_output = False)
ohe_fit = ohe.fit(dat[dat['dataset'] == 'train'][categorical_vars])
dat_ohe = pd.DataFrame(ohe.fit_transform(dat[categorical_vars]))
dat_ohe.columns = pd.DataFrame(ohe_fit.get_feature_names_out())
dat = pd.concat((dat_ohe, dat[non_categorical_vars].reset_index()), axis=1)



# 3) Define the model
model = model_constructor(n_estimators = 1000,
             learning_rate = 0.01,
             gamma = 0,
             max_depth = 100,
             min_child_weight = 1,
             subsample = 1,
             colsample_bytree = 1,
             num_parallel_tree = 20,
             reg_lambda = 0,
             random_state = 0,
             early_stopping_rounds = 10,
             eval_metric = 'auc')



# 4) Train the model
start_time = default_timer()
model.fit(dat[dat['dataset'] == 'train'].drop(['exitus', 'dataset'], axis = 1), dat[dat['dataset'] == 'train'].exitus.values,
eval_set = [(dat[dat['dataset'] == 'val'].drop(['exitus', 'dataset'], axis = 1), dat[dat['dataset'] == 'val'].exitus.values)])
time = default_timer() - start_time


# Print time required to train the model
print(time)

[0]	validation_0-auc:0.72559
[1]	validation_0-auc:0.72068
[2]	validation_0-auc:0.72237
[3]	validation_0-auc:0.73398
[4]	validation_0-auc:0.73406
[5]	validation_0-auc:0.74625
[6]	validation_0-auc:0.74473
[7]	validation_0-auc:0.74896
[8]	validation_0-auc:0.75258
[9]	validation_0-auc:0.75299
[10]	validation_0-auc:0.78569
[11]	validation_0-auc:0.78332
[12]	validation_0-auc:0.78184
[13]	validation_0-auc:0.77967
[14]	validation_0-auc:0.77585
[15]	validation_0-auc:0.77954
[16]	validation_0-auc:0.77948
[17]	validation_0-auc:0.77911
[18]	validation_0-auc:0.78781
[19]	validation_0-auc:0.78861
[20]	validation_0-auc:0.79617
[21]	validation_0-auc:0.79649
[22]	validation_0-auc:0.79241
[23]	validation_0-auc:0.82040
[24]	validation_0-auc:0.82196
[25]	validation_0-auc:0.82551
[26]	validation_0-auc:0.82624
[27]	validation_0-auc:0.83838
[28]	validation_0-auc:0.83997
[29]	validation_0-auc:0.84515
[30]	validation_0-auc:0.85164
[31]	validation_0-auc:0.85327
[32]	validation_0-auc:0.85360
[33]	validation_0-au

This model takes around 50-60 seconds to run in my Google Colab session, achieving an AUC over validation of 0.89.

# Exercise

**Modify the hyperparameters to reduce this computational time to less than 5 seconds (try to not reduce significantly the AUC over validation, or even improve it, obtained by the original model).**

In [5]:
# 3) Define the model
model = model_constructor(n_estimators = 1000,
             learning_rate = 1, # Modified!
             gamma = 0,
             max_depth = 20, # Modified!
             min_child_weight = 10, # Modified!
             subsample = 1,
             colsample_bytree = 1,
             num_parallel_tree = 20,
             reg_lambda = 0,
             random_state = 0,
             early_stopping_rounds = 10,
             eval_metric = 'auc')



# 4) Train the model
start_time = default_timer()
model.fit(dat[dat['dataset'] == 'train'].drop(['exitus', 'dataset'], axis = 1), dat[dat['dataset'] == 'train'].exitus.values,
eval_set = [(dat[dat['dataset'] == 'val'].drop(['exitus', 'dataset'], axis = 1), dat[dat['dataset'] == 'val'].exitus.values)])
time = default_timer() - start_time


# Print time required to train the model
print(time)

[0]	validation_0-auc:0.92258
[1]	validation_0-auc:0.92357
[2]	validation_0-auc:0.92025
[3]	validation_0-auc:0.91690
[4]	validation_0-auc:0.91996
[5]	validation_0-auc:0.91521
[6]	validation_0-auc:0.91319
[7]	validation_0-auc:0.91319
[8]	validation_0-auc:0.91150
[9]	validation_0-auc:0.90616
[10]	validation_0-auc:0.90859
[11]	validation_0-auc:0.90495
2.870255441999973


This model took less than 3 seconds and achieved 0.92 AUC over validation.

**Which hyperparameters where more important to speed up the code?**

In my case, I performed the following modifications:

- Increase *learning_rate*.
- Decrease *max_depth*.
- Increase *min_child_weight*.