# Introduction to the xgbsurv package - Breslow

This notebook introduces `xgbsurv` using the METABRIC dataset. It is structured in the following steps:

- Load data
- Load model
- Fit model
- Predict and evaluate model

The API follows scikit-learn conventions (`fit`, `predict`).

In [1]:
from xgbsurv.datasets import load_metabric
from xgbsurv import XGBSurv
from xgbsurv.models.utils import sort_X_y, transform_back
from xgbsurv.evaluation import cindex_censored
from pycox.evaluation import EvalSurv
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import os
# Parent of the current working directory, used below to locate the bundled data
current_path = os.getcwd()
one_level_up = os.path.abspath(os.path.join(current_path, ".."))

## Load Data

In [2]:
data = load_metabric(path=one_level_up+"/xgbsurv/datasets/data/", as_frame=False)
# Stratify the split by event status, which is encoded in the sign of the target
target_sign = np.sign(data.target)
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, stratify=target_sign)
# sort data
X_train, y_train = sort_X_y(X_train, y_train)
X_test, y_test = sort_X_y(X_test, y_test)

Values are being sorted!
Values are being sorted!
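
The split above stratifies on `np.sign(data.target)`, which suggests that the target packs duration and event status into a single signed number. Below is a minimal sketch of such an encoding, assuming that positive values mark observed events and negative values mark censored observations. The helper names `encode_target` and `decode_target` are hypothetical and are not part of `xgbsurv`.

```python
import numpy as np

def encode_target(durations, events):
    """Pack (duration, event) into one signed value: +t for an event, -t for censoring."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    return np.where(events, durations, -durations)

def decode_target(y):
    """Split a signed target back into durations and a 0/1 event indicator."""
    y = np.asarray(y, dtype=float)
    return np.abs(y), (y > 0).astype(int)

# Three samples: event at t=5, censored at t=3, event at t=8
y = encode_target([5.0, 3.0, 8.0], [1, 0, 1])
durations, events = decode_target(y)
```

This sketch only works for strictly positive durations, since a duration of zero would lose its sign.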


## Load Model

In [3]:
model = XGBSurv(
    n_estimators=100,
    objective="cind_objective",
    eval_metric="cind_loss",
    learning_rate=0.3,
    random_state=42,
    disable_default_eval_metric=1,
)
model

The available loss and objective functions can be listed as follows:

In [4]:
print(model.get_loss_functions().keys())
print(model.get_objective_functions().keys())

dict_keys(['breslow_loss', 'efron_loss', 'cind_loss', 'deephit_loss', 'aft_loss', 'ah_loss', 'eh_loss'])
dict_keys(['breslow_objective', 'efron_objective', 'cind_objective', 'deephit_objective', 'aft_objective', 'ah_objective', 'eh_objective'])


## Fit Model

In [5]:
# The training data doubles as the evaluation set, so validation_0 below reports the training loss
eval_set = [(X_train, y_train)]

In [6]:
model.fit(X_train, y_train, eval_set=eval_set)

[0]	validation_0-cind_loss:404.66783
[1]	validation_0-cind_loss:396.11028
[2]	validation_0-cind_loss:387.63888
[3]	validation_0-cind_loss:379.63556
[4]	validation_0-cind_loss:372.38694
[5]	validation_0-cind_loss:366.03649
[6]	validation_0-cind_loss:359.61474
[7]	validation_0-cind_loss:353.44977
[8]	validation_0-cind_loss:346.26213
[9]	validation_0-cind_loss:341.04881
[10]	validation_0-cind_loss:334.59096
[11]	validation_0-cind_loss:328.60658
[12]	validation_0-cind_loss:323.84192
[13]	validation_0-cind_loss:318.44819
[14]	validation_0-cind_loss:314.17773
[15]	validation_0-cind_loss:309.88120
[16]	validation_0-cind_loss:305.99119
[17]	validation_0-cind_loss:301.45461
[18]	validation_0-cind_loss:297.89785
[19]	validation_0-cind_loss:294.66603
[20]	validation_0-cind_loss:290.59012
[21]	validation_0-cind_loss:286.79064
[22]	validation_0-cind_loss:283.26882
[23]	validation_0-cind_loss:279.81337
[24]	validation_0-cind_loss:276.47015
[25]	validation_0-cind_loss:273.30644
[26]	validation_0-cind

The model can be saved as shown below. Note that `objective` and `eval_metric` are not saved.

In [7]:
#model.save_model("introduction_model_breslow.json")

## Predict

In [8]:
# output_margin=True returns the raw, untransformed model output
preds_train = model.predict(X_train, output_margin=True)
preds_test = model.predict(X_test, output_margin=True)

## Predict Survival Function

In [11]:
cindex_score_test = cindex_censored(y_test, preds_test)
cindex_score_test

0.6180119894047121
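
For intuition about the score above, here is a minimal sketch of a concordance index for right-censored data, written in plain NumPy. It assumes that a higher prediction means higher risk, and it counts tied predictions as half concordant. Whether `cindex_censored` uses the same conventions is not confirmed here, and the function name `concordance_index` is illustrative.

```python
import numpy as np

def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs in which the earlier event has the higher risk score."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk_scores = np.asarray(risk_scores, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(durations)):
        if not events[i]:
            continue  # a censored sample cannot be the earlier member of a comparable pair
        for j in range(len(durations)):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

c_good = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
c_bad = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The sketch raises a `ZeroDivisionError` when no comparable pair exists.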

## Evaluate

### Test

In [None]:
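# Recover durations and event indicators from the signed target
durations_test, events_test = transform_back(y_test)
time_grid = np.linspace(durations_test.min(), durations_test.max(), 100)
# NOTE: df_survival_function is not defined in the cells shown above. EvalSurv expects
# a pandas DataFrame of survival probabilities (index: time points, columns: test samples).
ev = EvalSurv(df_survival_function, durations_test, events_test, censor_surv='km')
print('Concordance Index', ev.concordance_td('antolini'))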

Concordance Index 0.5238516694244558
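
The time-dependent concordance needs survival curves, not just risk scores. For Cox-type objectives such as Breslow, one common way to obtain them is the Breslow baseline hazard estimator, sketched below in NumPy. This assumes the predictions are log hazard ratios, which may not hold for `cind_objective`. The function `breslow_survival` is a hypothetical helper, not an `xgbsurv` API.

```python
import numpy as np

def breslow_survival(train_durations, train_events, train_log_hazard, test_log_hazard):
    """Estimate survival curves S(t | x) = exp(-H0(t) * exp(eta_x)) via Breslow's estimator."""
    t = np.asarray(train_durations, dtype=float)
    e = np.asarray(train_events, dtype=bool)
    risk = np.exp(np.asarray(train_log_hazard, dtype=float))
    event_times = np.unique(t[e])  # sorted distinct event times
    # Baseline hazard increment: (number of events at time) / (summed risk of samples still at risk)
    increments = np.array([
        np.sum(e & (t == time)) / risk[t >= time].sum() for time in event_times
    ])
    cum_baseline = np.cumsum(increments)
    # Rows are event times, columns are test samples
    test_risk = np.exp(np.asarray(test_log_hazard, dtype=float))
    surv = np.exp(-np.outer(cum_baseline, test_risk))
    return event_times, surv

times, surv = breslow_survival([1.0, 2.0, 3.0], [1, 0, 1], [0.0, 0.0, 0.0], [0.0])
```

Wrapping the result as `pd.DataFrame(surv, index=times)` would give the time-by-sample layout that `EvalSurv` expects.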
