# Introduction to the xgbsurv package

This notebook introduces `xgbsurv` using the METABRIC dataset. It is structured into the following steps:

- Load data
- Load model
- Fit model
- Predict and evaluate model

The API follows scikit-learn conventions, with `fit` and `predict` methods on the model object.

In [1]:
from xgbsurv.datasets import load_metabric
from xgbsurv.models.utils import sort_X_y
from xgbsurv import XGBSurv
from sklearn.model_selection import train_test_split
import numpy as np
%load_ext autoreload
%autoreload 2


In [9]:
X, y = load_metabric(path="/Users/JUSC/Documents/xgbsurv/xgbsurv/datasets/data/", as_frame=False, return_X_y=True)
X

Unnamed: 0,horm_treatment,grade,menopause,age,n_positive_nodes,progesterone,estrogene
0,0,1,0,50,2,90,30
1,1,2,1,57,18,11,13
2,0,2,0,44,19,28,31
3,0,0,0,50,1,1,4
4,0,1,0,51,5,360,57
...,...,...,...,...,...,...,...
2227,0,1,1,80,1,875,534
2228,1,1,1,59,4,4,3
2229,0,1,0,43,1,22,0
2230,1,1,1,57,4,16,5


## Load Data

In [3]:
data, target = load_metabric(path="/Users/JUSC/Documents/xgbsurv/xgbsurv/datasets/data/", as_frame=False)
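# Stratify on the sign of the target so both groups are represented in each split
target_sign = np.sign(target)
X_train, X_test, y_train, y_test = train_test_split(data, target, stratify=target_sign)
# Sort each split with the package helper before fitting and evaluation
X_train, y_train = sort_X_y(X_train, y_train)
X_test, y_test = sort_X_y(X_test, y_test)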
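
The stratification on `np.sign(target)` suggests that the target packs the observed time and the event indicator into a single signed number. The sketch below illustrates one such encoding in plain Python. It assumes that a negative value marks a censored observation and that sorting is done by ascending absolute time; both are inferences from the cell above, not confirmed xgbsurv behaviour. The helpers `encode_target`, `decode_target`, and `sort_by_time` are hypothetical.

```python
def encode_target(times, events):
    """Pack (time, event) pairs into signed times: negative means censored (assumed convention)."""
    return [t if e else -t for t, e in zip(times, events)]


def decode_target(y):
    """Recover absolute times and event indicators from signed times."""
    times = [abs(v) for v in y]
    events = [v > 0 for v in y]
    return times, events


def sort_by_time(X, y):
    """Sort rows of X and entries of y by ascending absolute time (hypothetical stand-in for sort_X_y)."""
    order = sorted(range(len(y)), key=lambda i: abs(y[i]))
    return [X[i] for i in order], [y[i] for i in order]


times = [5.0, 2.0, 9.0, 4.0]
events = [True, False, True, False]
y = encode_target(times, events)             # [5.0, -2.0, 9.0, -4.0]
X = [[50, 2], [57, 18], [44, 19], [51, 5]]   # toy feature rows

X_sorted, y_sorted = sort_by_time(X, y)
print(y_sorted)                 # [-2.0, -4.0, 5.0, 9.0]
print(decode_target(y_sorted))  # ([2.0, 4.0, 5.0, 9.0], [False, False, True, True])
```

Under this assumed encoding, stratifying on the sign keeps the share of censored observations roughly equal in the training and test splits.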

## Load Model

In [3]:
model = XGBSurv(
    n_estimators=8000,
    objective="cind_objective",
    eval_metric="cind_loss",
    learning_rate=0.01,
    random_state=7,
    disable_default_eval_metric=True,
)

The available loss and objective functions can be listed as follows:

In [4]:
print(model.get_loss_functions().keys())
print(model.get_objective_functions().keys())

dict_keys(['breslow_loss', 'efron_loss', 'cind_loss', 'deephit_loss', 'aft_loss'])
dict_keys(['breslow_objective', 'efron_objective', 'cind_objective', 'deephit_objective', 'aft_objective'])
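
The two lists share their prefixes, and the model above pairs `cind_objective` with `cind_loss`. A small check like the one below could catch a mismatched or misspelled name before a long training run. It is only a sketch: the name lists are copied from the printed output, and `matching_loss` is a hypothetical helper, not part of xgbsurv.

```python
# Names copied from the printed output above.
LOSSES = ["breslow_loss", "efron_loss", "cind_loss", "deephit_loss", "aft_loss"]
OBJECTIVES = ["breslow_objective", "efron_objective", "cind_objective",
              "deephit_objective", "aft_objective"]


def matching_loss(objective):
    """Return the loss name sharing the objective's prefix (hypothetical helper)."""
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective!r}")
    prefix = objective.rsplit("_", 1)[0]   # "cind_objective" -> "cind"
    loss = prefix + "_loss"
    if loss not in LOSSES:
        raise ValueError(f"no loss listed for prefix {prefix!r}")
    return loss


print(matching_loss("cind_objective"))  # cind_loss
```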


## Fit Model

In [5]:
# Monitor the loss on the training data during fitting
eval_set = [(X_train, y_train)]

In [6]:
model.fit(X_train, y_train, eval_set=eval_set)

[0]	validation_0-cind_loss:-0.50024
[1]	validation_0-cind_loss:-0.50047
[2]	validation_0-cind_loss:-0.50071
[3]	validation_0-cind_loss:-0.50094
[4]	validation_0-cind_loss:-0.50118
[5]	validation_0-cind_loss:-0.50142
[6]	validation_0-cind_loss:-0.50165
[7]	validation_0-cind_loss:-0.50189
[8]	validation_0-cind_loss:-0.50213
[9]	validation_0-cind_loss:-0.50236
[10]	validation_0-cind_loss:-0.50260
[11]	validation_0-cind_loss:-0.50283
[12]	validation_0-cind_loss:-0.50307
[13]	validation_0-cind_loss:-0.50331
[14]	validation_0-cind_loss:-0.50354
[15]	validation_0-cind_loss:-0.50378
[16]	validation_0-cind_loss:-0.50402
[17]	validation_0-cind_loss:-0.50426
[18]	validation_0-cind_loss:-0.50449
[19]	validation_0-cind_loss:-0.50473
[20]	validation_0-cind_loss:-0.50497
[21]	validation_0-cind_loss:-0.50520
[22]	validation_0-cind_loss:-0.50544
[23]	validation_0-cind_loss:-0.50568
[24]	validation_0-cind_loss:-0.50592
[25]	validation_0-cind_loss:-0.50615
[26]	validation_0-cind_loss:-0.50639
[27]	valida

The fitted model can be saved as shown below. Note that `objective` and `eval_metric` are not saved with it.

In [7]:
model.save_model("cind_model.json")
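
Because the objective and evaluation metric are not written to the model file, one option is to record their names in a small sidecar file next to it. The sketch below uses only the standard library. The helper names and the `.meta.json` suffix are illustrative assumptions, not part of xgbsurv.

```python
import json
import os
import tempfile


def save_training_config(model_path, objective, eval_metric):
    """Write the objective and metric names to '<model_path>.meta.json' (hypothetical convention)."""
    meta_path = model_path + ".meta.json"
    with open(meta_path, "w") as f:
        json.dump({"objective": objective, "eval_metric": eval_metric}, f)
    return meta_path


def load_training_config(model_path):
    """Read back the names stored by save_training_config."""
    with open(model_path + ".meta.json") as f:
        return json.load(f)


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.json")
    save_training_config(model_path, "cind_objective", "cind_loss")
    config = load_training_config(model_path)

print(config)  # {'objective': 'cind_objective', 'eval_metric': 'cind_loss'}
```

The stored names could then be passed back to `XGBSurv(objective=..., eval_metric=...)` when the model is re-created.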



## Predict

In [8]:
# output_margin=True returns the raw, untransformed model scores
preds_train = model.predict(X_train, output_margin=True)
preds_test = model.predict(X_test, output_margin=True)

### Predict Cumulative Hazard

## Evaluate

In [9]:
#from sksurv.metrics import concordance_index_censored
from xgbsurv.evaluation import cindex_censored, ibs

In [10]:
# train
cindex_censored(y_train, preds_train)

0.23831960251401393

In [11]:
# test
cindex_censored(y_test, preds_test)

0.33928479547208645
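
To make the metric concrete, the sketch below computes Harrell's concordance index for signed-time targets in plain Python. It is not the xgbsurv implementation of `cindex_censored`. It assumes that negative times mark censored observations and that a higher score means higher risk (an earlier event). The function name `harrell_cindex` is hypothetical.

```python
def harrell_cindex(y, scores):
    """Harrell's C for signed times (negative = censored, assumed) and risk scores.

    A pair (i, j) is comparable when i has an observed event and a strictly
    shorter time than j. It is concordant when i also has the higher score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, yi in enumerate(y):
        if yi <= 0:  # censored observations cannot anchor a comparable pair
            continue
        for j, yj in enumerate(y):
            if abs(yj) > yi:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


y = [2.0, -3.0, 5.0, 8.0]      # second observation is censored at time 3
risk = [0.9, 0.7, 0.4, 0.1]    # higher score = earlier expected event

print(harrell_cindex(y, risk))                # 1.0
print(harrell_cindex(y, [-s for s in risk]))  # 0.0
```

Negating the scores maps a C-index of c to 1 - c, which is worth keeping in mind when interpreting values below 0.5.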