# Introduction to the xgbsurv package - Accelerated Failure Time

This notebook introduces `xgbsurv` using the METABRIC dataset. It is structured in the following steps:

- Load data
- Load model
- Fit model
- Predict and evaluate model

The API follows scikit-learn conventions (`fit`, `predict`).

In [1]:
from xgbsurv.datasets import load_metabric
from xgbsurv import XGBSurv
from sklearn.model_selection import train_test_split
import numpy as np
from xgbsurv.evaluation import cindex_censored, ibs
%load_ext autoreload
%autoreload 2


## Load Data

In [17]:
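data, target = load_metabric(path="/Users/JUSC/Documents/xgbsurv/xgbsurv/datasets/data/", as_frame=False)
# Stratify on the sign of the target so that both splits keep a similar
# censoring ratio (assumes censored observations carry a negative sign).
target_sign = np.sign(target)
X_train, X_test, y_train, y_test = train_test_split(data, target, stratify=target_sign)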
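
The stratification above relies on the sign of the target. The following is a minimal sketch of that idea, assuming the target stores the survival time with a negative sign for censored observations. This encoding is an assumption and is not confirmed by this notebook; `encode_signed_target` and `decode_signed_target` are hypothetical helpers, not part of `xgbsurv`.

```python
import numpy as np


def encode_signed_target(time, event):
    """Hypothetical helper: positive time = event observed, negative = censored."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    return np.where(event, time, -time)


def decode_signed_target(target):
    """Hypothetical helper: recover (time, event) from a signed target."""
    target = np.asarray(target, dtype=float)
    return np.abs(target), target > 0


time = np.array([5.0, 12.0, 3.5, 8.0])
event = np.array([1, 0, 1, 0])

target = encode_signed_target(time, event)
target_sign = np.sign(target)  # +1 for events, -1 for censored observations

decoded_time, decoded_event = decode_signed_target(target)
print(target, target_sign)
```

Under this assumption, `np.sign(target)` is simply an event indicator, which is why it can serve as the stratification label.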

## Load Model

In [18]:
model = XGBSurv(
    n_estimators=100,
    objective="aft_objective",
    eval_metric="aft_loss",
    learning_rate=0.3,
    random_state=7,
    disable_default_eval_metric=True,
    base_score=0.0,
)

The available loss and objective functions can be listed as follows:

In [19]:
print(model.get_loss_functions().keys())
print(model.get_objective_functions().keys())

dict_keys(['breslow_loss', 'efron_loss', 'cind_loss', 'deephit_loss', 'aft_loss', 'ah_loss'])
dict_keys(['breslow_objective', 'efron_objective', 'cind_objective', 'deephit_objective', 'aft_objective', 'ah_objective'])
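
As background for the `aft_objective` option: an accelerated failure time model treats the log survival time as a prediction plus scaled noise, log T = f(x) + sigma * eps. The sketch below computes the negative log-likelihood for right-censored data under a standard normal error. It is illustrative only: `aft_normal_nll` is a hypothetical name, and the loss implemented in `xgbsurv` may use a different error distribution, scaling, or target encoding.

```python
import math

import numpy as np


def aft_normal_nll(time, event, pred, sigma=1.0):
    """Mean negative log-likelihood of an AFT model with normal error.

    time  : observed times (> 0)
    event : 1 if the event was observed, 0 if right-censored
    pred  : predicted location of log(time)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    pred = np.asarray(pred, dtype=float)

    z = (np.log(time) - pred) / sigma

    # Observed events contribute the log density of T at the observed time.
    log_pdf = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)
    ll_event = log_pdf - np.log(sigma * time)

    # Censored observations contribute the log survival probability.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / np.sqrt(2.0)))
    ll_censored = np.log(np.clip(1.0 - cdf, 1e-12, None))

    return float(-np.mean(np.where(event, ll_event, ll_censored)))


# A prediction close to log(time) gives a lower loss than a distant one.
t = np.array([np.exp(2.0)])
good = aft_normal_nll(t, [1], [2.0])
bad = aft_normal_nll(t, [1], [0.0])
print(good, bad)
```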


## Fit Model

In [20]:
# Monitor the loss on the training data itself during fitting
eval_set = [(X_train, y_train)]

In [21]:
model.fit(X_train, y_train, eval_set=eval_set)

[0]	validation_0-aft_likelihood:2.25441
[1]	validation_0-aft_likelihood:2.25258
[2]	validation_0-aft_likelihood:2.25088
[3]	validation_0-aft_likelihood:2.24929
[4]	validation_0-aft_likelihood:2.24773
[5]	validation_0-aft_likelihood:2.24627
[6]	validation_0-aft_likelihood:2.24501
[7]	validation_0-aft_likelihood:2.24383
[8]	validation_0-aft_likelihood:2.24278
[9]	validation_0-aft_likelihood:2.24165
[10]	validation_0-aft_likelihood:2.24057
[11]	validation_0-aft_likelihood:2.23977
[12]	validation_0-aft_likelihood:2.23883
[13]	validation_0-aft_likelihood:2.23804
[14]	validation_0-aft_likelihood:2.23734
[15]	validation_0-aft_likelihood:2.23652
[16]	validation_0-aft_likelihood:2.23569
[17]	validation_0-aft_likelihood:2.23511
[18]	validation_0-aft_likelihood:2.23454
[19]	validation_0-aft_likelihood:2.23404
[20]	validation_0-aft_likelihood:2.23332
[21]	validation_0-aft_likelihood:2.23271
[22]	validation_0-aft_likelihood:2.23222
[23]	validation_0-aft_likelihood:2.23184
[24]	validation_0-aft_like

The fitted model can be saved. Note that `objective` and `eval_metric` are not saved with it.

## Predict

In [22]:
preds_train = model.predict(X_train, output_margin=True)
preds_test = model.predict(X_test, output_margin=True)

## Evaluate

In [24]:
# train
cindex_censored(y_train, preds_train)

0.7675853845693046

In [25]:
# test
cindex_censored(y_test, preds_test)

0.6170342828333754
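
To make the reported concordance values easier to interpret, here is a minimal sketch of Harrell's concordance index for right-censored data. It is not the implementation behind `cindex_censored`; `harrell_c_index` is a hypothetical name, and the convention that a higher score means an earlier event is an assumption. For predictions on a log-time scale, the score would be negated first.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when i had an observed event and
    time[i] < time[j]. Ties in the risk score count as half concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored observation cannot anchor a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]

perfect = harrell_c_index(time, event, [4.0, 3.0, 2.0, 1.0])
reversed_order = harrell_c_index(time, event, [1.0, 2.0, 3.0, 4.0])
constant = harrell_c_index(time, event, [1.0, 1.0, 1.0, 1.0])
print(perfect, reversed_order, constant)
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, so the train and test scores above sit between those two reference points.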

## With Early Stopping

In [26]:
data, target = load_metabric(path="/Users/JUSC/Documents/xgbsurv/xgbsurv/datasets/data/", as_frame=False)
target_sign = np.sign(target)
X_train, X_test, y_train, y_test = train_test_split(data, target, stratify=target_sign)

In [27]:
model = XGBSurv(
    n_estimators=100,
    objective="aft_objective",
    eval_metric="aft_loss",
    learning_rate=0.3,
    random_state=7,
    disable_default_eval_metric=True,
    base_score=0.0,
    early_stopping_rounds=20,
)

In [28]:
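# eval_test_size=0.1 appears to hold out 10% of the training data as a
# validation set (reported as validation_1 in the log)
model.fit(X_train, y_train, eval_test_size=0.1)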

[0]	validation_0-aft_likelihood:2.27951	validation_1-aft_likelihood:2.38658
[1]	validation_0-aft_likelihood:2.27738	validation_1-aft_likelihood:2.38560
[2]	validation_0-aft_likelihood:2.27522	validation_1-aft_likelihood:2.38508
[3]	validation_0-aft_likelihood:2.27331	validation_1-aft_likelihood:2.38422
[4]	validation_0-aft_likelihood:2.27152	validation_1-aft_likelihood:2.38346
[5]	validation_0-aft_likelihood:2.26987	validation_1-aft_likelihood:2.38274
[6]	validation_0-aft_likelihood:2.26831	validation_1-aft_likelihood:2.38212
[7]	validation_0-aft_likelihood:2.26686	validation_1-aft_likelihood:2.38143
[8]	validation_0-aft_likelihood:2.26542	validation_1-aft_likelihood:2.38114
[9]	validation_0-aft_likelihood:2.26406	validation_1-aft_likelihood:2.38036
[10]	validation_0-aft_likelihood:2.26286	validation_1-aft_likelihood:2.37959
[11]	validation_0-aft_likelihood:2.26169	validation_1-aft_likelihood:2.37917
[12]	validation_0-aft_likelihood:2.26075	validation_1-aft_likelihood:2.37871
[13]	vali
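
The early stopping rule used above can be sketched independently of the library: training stops once the validation loss has not improved for a fixed number of rounds. The function name `find_stopping_round` and the loss values below are hypothetical and only illustrate the rule; they are not taken from the log.

```python
import math


def find_stopping_round(val_losses, patience):
    """Return (best_round, stop_round) for a sequence of validation losses.

    stop_round is None when the patience is never exhausted.
    """
    best_loss = math.inf
    best_round = -1
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            return best_round, round_idx
    return best_round, None


# Illustrative loss values, not taken from the notebook output.
losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.4, 1.1]

stopped = find_stopping_round(losses, patience=3)
not_stopped = find_stopping_round(losses, patience=10)
print(stopped, not_stopped)
```

With a patience of 3 the final improvement at the last round is never reached, which illustrates the trade-off in choosing a small value for `early_stopping_rounds`.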

In [29]:
preds_train = model.predict(X_train, output_margin=True)
preds_test = model.predict(X_test, output_margin=True)
# train
cindex_censored(y_train, preds_train)

0.7500766520649215

In [30]:
# test
cindex_censored(y_test, preds_test)

0.6005902074011753