In [1]:
import os
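# Move two directory levels up so that the `model` package and the `data/` paths resolve.
# os.chdir returns None, so its result is not assigned.
os.chdir(os.path.dirname(os.path.dirname(os.getcwd())))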

In [2]:
import numpy as np
import pandas as pd

from model import model_architecture, output_results, utils
from model.utils import prepare_pseudobs
from sksurv.util import Surv as skSurv

Using TensorFlow backend.


# I. METABRIC data

We first choose the type of pseudo-observation from the following options:
- "pseudo_optim"
- "pseudo_km"
- "pseudo-continuous"
- "pseudo-discrete"
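
For readers unfamiliar with pseudo-observations, the following is a minimal sketch of the standard jackknife construction on the Kaplan-Meier estimator: the pseudo-observation of subject i at time t is n·S(t) − (n−1)·S₋ᵢ(t), where S₋ᵢ is the estimate computed without subject i. The function names `km_survival` and `jackknife_pseudo_obs` are hypothetical, and the four variants listed above are precomputed in this repository and may be built differently.

```python
import numpy as np

def km_survival(time, event, t_grid):
    """Kaplan-Meier estimate of S(t) evaluated at each time in t_grid."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])  # sorted distinct event times
    surv = np.ones(len(t_grid))
    for j, t in enumerate(t_grid):
        s = 1.0
        for u in event_times[event_times <= t]:
            at_risk = np.sum(time >= u)
            deaths = np.sum((time == u) & event)
            s *= 1.0 - deaths / at_risk
        surv[j] = s
    return surv

def jackknife_pseudo_obs(time, event, t_grid):
    """Pseudo-observation of subject i at time t: n*S(t) - (n-1)*S_{-i}(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    s_full = km_survival(time, event, t_grid)
    pseudo = np.empty((n, len(t_grid)))
    for i in range(n):
        keep = np.arange(n) != i  # leave subject i out
        s_loo = km_survival(time[keep], event[keep], t_grid)
        pseudo[i] = n * s_full - (n - 1) * s_loo
    return pseudo
```

Without censoring, each pseudo-observation reduces to the indicator that the subject is still event-free at time t, which gives a quick sanity check of the construction.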

In [3]:
name = "pseudo-discrete" 

We use the METABRIC data, combining clinical and pathological information with gene expression data. Missing values in the explanatory variables are imputed. The data are split into a training set and a test set, and both sets are normalized using the mean and standard deviation of the training set. The same training and test sets are used for all models.
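
The normalization step can be illustrated with a short sketch. The files loaded below are already preprocessed, so this is not the repository's code; the function name `standardize_with_train_stats` and the guard for constant columns are assumptions made for illustration.

```python
import numpy as np

def standardize_with_train_stats(x_train, x_test):
    """Scale both sets with the mean and std computed on the training set only."""
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # assumption: leave constant columns unscaled
    return (x_train - mean) / std, (x_test - mean) / std

# Toy data: two features, two training rows, one test row.
x_tr = np.array([[1.0, 10.0], [3.0, 30.0]])
x_te = np.array([[2.0, 40.0]])
z_tr, z_te = standardize_with_train_stats(x_tr, x_te)
```

Reusing the training statistics for the test set keeps information about the test distribution out of the preprocessing.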

In [4]:
df_train = pd.read_csv("data/real_data/metabric_train.csv")
data_train = skSurv.from_arrays(event=df_train['cen_train'], time=df_train['surv_train'])
x_train = df_train.drop(['surv_train','cen_train'], axis = 1)
y_train = pd.read_csv("data/real_data/meta_" + name +".csv")

df_test = pd.read_csv("data/real_data/metabric_test.csv")
x_test = np.array(df_test.drop(['surv_test','cen_test'], axis = 1), dtype = 'float32')
y_test = df_test[['surv_test','cen_test']]

In [5]:
x_train_all, y_train_all, _ = utils.prepare_pseudobs(x_train, y_train, df_train, x_test, df_test, name)

# II. Model's construction and training

The architecture parameters are those listed in the parameters dataframe; they were selected by 5-fold cross-validation among 100 candidate parameter sets.
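
The selection procedure itself is not shown in this notebook. As a rough sketch of how such a search could look, the code below samples candidate parameter sets at random and keeps the one with the lowest mean score over 5 folds. The search space values, the random sampling, the "lower is better" criterion, and the names `SEARCH_SPACE`, `sample_param_sets`, `kfold_indices`, `select_params` and `score_fn` are all assumptions.

```python
import random
import numpy as np

# Hypothetical search space; the notebook does not list the candidate values.
SEARCH_SPACE = {
    "neurons": [8, 16, 32, 64],
    "drop": [0.1, 0.2, 0.3, 0.4],
    "activation": ["relu", "elu", "selu"],
    "lr_opt": [0.001, 0.005, 0.01],
    "n_layers": [1, 2, 3],
}

def sample_param_sets(space, n_sets, seed=0):
    """Draw n_sets random combinations from the search space."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_sets)]

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; the fixed seed gives every candidate the same folds."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, folds[k]

def select_params(param_sets, n_samples, score_fn, n_folds=5):
    """Return the parameter set with the lowest mean validation score.

    score_fn(params, train_idx, val_idx) is a user-supplied (hypothetical)
    callable that trains on train_idx and returns a loss on val_idx.
    """
    best, best_score = None, float("inf")
    for params in param_sets:
        scores = [score_fn(params, tr, va)
                  for tr, va in kfold_indices(n_samples, n_folds)]
        mean_score = float(np.mean(scores))
        if mean_score < best_score:
            best, best_score = params, mean_score
    return best, best_score
```

With 100 candidates this would be called as `select_params(sample_param_sets(SEARCH_SPACE, 100), n_samples, score_fn)`, where `score_fn` wraps the model training.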

In [6]:
param = pd.read_csv("model/param_metabric.csv",sep=';', index_col = 0).T
param_final = param.loc[name]

In [7]:
print(param_final)

neurons          16
drop            0.4
activation      elu
lr_opt        0.005
optimizer     sgdwr
n_layers          3
Name: pseudo-discrete, dtype: object


In [8]:
neurons = int(param_final['neurons'])
drop = float(param_final['drop'])
activation = param_final['activation']
lr_opt = float(param_final['lr_opt'])
optimizer = param_final['optimizer']
n_layers = int(param_final['n_layers'])

The `objective_pseudobs` function defines the architecture of the neural network and returns the model together with its training callbacks.

In [9]:
model, callbacks = model_architecture.objective_pseudobs(x_train_all, neurons, drop, activation, lr_opt, optimizer, n_layers)
log = model.fit(x_train_all, y_train_all, batch_size = 32, epochs = 100, callbacks = callbacks, verbose=2)

Epoch 1/100
 - 2s - loss: 0.1825
Epoch 2/100
 - 1s - loss: 0.1359
Epoch 3/100
 - 1s - loss: 0.1340
Epoch 4/100
 - 1s - loss: 0.1333
Epoch 5/100
 - 1s - loss: 0.1310
Epoch 6/100
 - 1s - loss: 0.1311
Epoch 7/100
 - 1s - loss: 0.1302
Epoch 8/100
 - 1s - loss: 0.1299
Epoch 9/100
 - 1s - loss: 0.1301
Epoch 10/100
 - 1s - loss: 0.1292
Epoch 11/100
 - 1s - loss: 0.1296
Epoch 12/100
 - 1s - loss: 0.1296
Epoch 13/100
 - 1s - loss: 0.1293
Epoch 14/100
 - 1s - loss: 0.1291
Epoch 15/100
 - 1s - loss: 0.1291
Epoch 16/100
 - 1s - loss: 0.1288
Epoch 17/100
 - 1s - loss: 0.1288
Epoch 18/100
 - 1s - loss: 0.1283
Epoch 19/100
 - 1s - loss: 0.1288
Epoch 20/100
 - 1s - loss: 0.1284
Epoch 21/100
 - 1s - loss: 0.1285
Epoch 22/100
 - 1s - loss: 0.1282
Epoch 23/100
 - 1s - loss: 0.1279
Epoch 24/100
 - 1s - loss: 0.1277
Epoch 25/100
 - 1s - loss: 0.1281
Epoch 26/100
 - 1s - loss: 0.1276
Epoch 27/100
 - 1s - loss: 0.1280
Epoch 28/100
 - 1s - loss: 0.1278
Epoch 29/100
 - 1s - loss: 0.1277
Epoch 30/100
 - 1s - lo

# III. Results

We resample the test dataset 100 times to obtain bootstrapped results.
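
The internals of `output_bootstrap` are not shown here. The sketch below illustrates one common way to produce such a summary: resample the test indices with replacement, recompute a metric on each resample, and report the mean, percentile interval, and standard deviation. Sampling with replacement, the percentile interval, and the names `bootstrap_metric` and `metric_fn` are assumptions, not the repository's implementation.

```python
import numpy as np

def bootstrap_metric(y_true, y_pred, metric_fn, n_iterations=100, seed=0):
    """Summarize metric_fn over bootstrap resamples of the test set."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    values = np.empty(n_iterations)
    for b in range(n_iterations):
        idx = rng.integers(0, n, size=n)  # draw n indices with replacement
        values[b] = metric_fn(y_true[idx], y_pred[idx])
    return {
        "mean": values.mean(),
        "ci95_lo": np.percentile(values, 2.5),   # percentile interval (assumption)
        "ci95_hi": np.percentile(values, 97.5),
        "std": values.std(ddof=1),
        "count": n_iterations,
    }
```

Any metric that takes the resampled targets and predictions can be passed as `metric_fn`, for example a wrapper around a time-dependent AUC.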

In [10]:
n_iterations = 100
results_all = output_results.output_bootstrap(model, n_iterations, df_train, data_train, y_train, df_test,name)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99


We report the AUC and Uno's C-index at 5 and 10 years, with 95% confidence intervals.

In [11]:
results_all

            mean   ci95_lo   ci95_hi       std  count
auc5    0.708788  0.652492  0.762467  0.035913  100.0
auc10   0.736577  0.687853  0.788017  0.029735  100.0
unoc5   0.680362  0.636641  0.726988  0.030728  100.0
unoc10  0.680517  0.647172  0.713789  0.020426  100.0
