Time-to-event subgroup analysis for randomized controlled trials
```bash
git clone https://github.com/owkin/hte.git
```

Then:

```bash
conda create -n hte python=3.9
conda activate hte
pip install "poetry==1.5.1"
poetry install
```
Generate experiment parameters for one subgroup definition in `configs/data_configs/subgroups.py`, and a chosen dimension and prognostic configuration, by running:

```bash
python configs/data_configs/generate_parameters.py
```
To add a new subgroup function (see the sketch after this list):
- Add the subgroup function definition in `configs/data_configs/subgroups.py`
- Add the new subgroup function name to the `subgroup_index_dict` in `configs/data_configs/subgroups.py`
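For illustration, here is a minimal sketch of what such a function and its registration might look like. The function name and rule are hypothetical, and the exact signature and registration scheme expected by `subgroups.py` may differ:

```python
import numpy as np

def dim20_pred2_prog0_balanced(x: np.ndarray) -> int:
    """Hypothetical subgroup rule: a sample belongs to the subgroup (1)
    when its first two covariates are both positive, else 0."""
    return int(x[0] > 0 and x[1] > 0)

# Register the new function under its name so it can be selected
# via the -g command-line argument (mapping scheme assumed).
subgroup_index_dict = {
    "dim20_pred2_prog0_balanced": dim20_pred2_prog0_balanced,
}
```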
Parameter settings from LINK PAPER are available in the `configs/data_configs` folder. Currently, the existing parameter settings are:
- Dimension p=20, isotropic, 10 prognostic covariates (5 positive, 5 negative): `dim20_isotropic_pro(+)5_pro(-)5.json`
- Dimension p=100, isotropic, 10 prognostic covariates (5 positive, 5 negative), 50 noise covariates: `dim100_isotropic_pro(+)5_pro(-)5_noise50.json`
- Dimension p=1000, isotropic, 20 prognostic covariates (10 positive, 10 negative), 500 noise covariates: `dim1000_isotropic_pro(+)10_pro(-)10_noise500.json`
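A quick way to inspect one of these files from the repository root, using only the standard-library `json` module (the keys inside depend on the repo's config format, which is not assumed here):

```python
import json

# Load one of the provided parameter settings files.
with open("configs/data_configs/dim20_isotropic_pro(+)5_pro(-)5.json") as f:
    params = json.load(f)

# Print the top-level keys (assuming a JSON object at the top level).
print(sorted(params.keys()))
```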
Create, if needed, a `results_compute_arr` folder in the `data` folder.
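For example, from the repository root:

```bash
mkdir -p data/results_compute_arr
```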
Generate data respecting heterogeneity conditions by running the following when located in the `experiments` folder:

```bash
python launch_compute_arr.py [-h] [-f FILE] [-g GROUP] [-l MINVAL] [-u MAXVAL] [-n NBPOINTS] [-mc MONTECARLO] [-m MODEL] [-ha HAZARD] [-a A] [-b B] [-tp TIMEPOINT] [-ss SEMI_SYNTH]
```
Where:
- `FILE`: the path to the parameter settings file produced in step 1.
- `GROUP`: the name of the subgroup function defined in `configs/data_configs/subgroups.py`, of the form `dim<P>_pred<X>_prog<Y>_balanced`.
- `MINVAL`: the lowest value of $\beta$ to consider.
- `MAXVAL`: the highest value of $\beta$ to consider.
- `NBPOINTS`: the number of $\beta$ points for which to generate ARR.
- `MONTECARLO`: the Monte Carlo sampling size. Defaults to 1e6.
- `MODEL`: the name of the time-to-event model {Cox, AH, AFT}. Defaults to Cox.
- `HAZARD`: the name of the baseline hazard {Weibull, LogNormal}. Defaults to Weibull.
- `A`: the first parameter of the baseline hazard. Defaults to 1.
- `B`: the second parameter of the baseline hazard. Defaults to 2.
- `TIMEPOINT`: the timepoint at which the ARR is computed. Defaults to 1.
- `SEMI_SYNTH`: whether to use semi-synthetic data {True, False}. Defaults to False.
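Here, ARR refers to the absolute risk reduction between the treatment and control arms at the chosen timepoint. In this survival setting it presumably corresponds to the difference in arm-level survival probabilities at time $t$, $\mathrm{ARR}(t) = S_1(t) - S_0(t)$; see LINK PAPER for the exact definition used.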
Data generated and used in LINK PAPER are available in `data/results_compute_arr` as JSON files with names of the form `Cox_Weibull_1.0_2.0_<MONTH>_<MONTH-NUMBER>_<DAY>_<YEAR>_<HH:MM:SS>----nbpoints=<NBPOINTS>--minval=<MINVAL>--maxval=<MAXVAL>.json`.
Create, if needed, `results_expe/raw_results/` and `results_expe/processed_results/` folders in the `experiments` folder.
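For example, from the repository root:

```bash
mkdir -p experiments/results_expe/raw_results experiments/results_expe/processed_results
```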
Run experiments by running the following when located in the `experiments` folder:

```bash
python launch_expe.py [-h] [-f FILE] [-n NBPOINTS] [-s SAMPLESIZE] [-tr TRAINSIZE] [-r REPET] [-c CENSORED] [-sc SCALE] [-ss SEMI_SYNTH] [-m METHODS]
```
Where:
- `FILE`: the path to the generated data file produced in step 2.
- `NBPOINTS`: the number of ARR points to run the experiment for.
- `SAMPLESIZE`: the sample size.
- `TRAINSIZE`: the proportion of samples (out of `SAMPLESIZE`) assigned to the training set.
- `REPET`: the number of experiment repeats at each ARR point.
- `CENSORED`: whether the data are censored {True, False}. Defaults to False.
- `SCALE`: the censoring scenario {1, 2, 3} (as defined in LINK PAPER). Defaults to 1.
- `SEMI_SYNTH`: whether to use semi-synthetic data {True, False}. Defaults to False.
- `METHODS`: a list (as a string) of methods to benchmark. Defaults to "Oracle, Univariate interaction, Univariate t_test, Multivariate cox, Multivariate tree, MOB, ITree, ARDP".
Experiment results are stored in `experiments/results_expe/raw_results/` and `experiments/results_expe/processed_results/` as two csv files with the same name, of the form `Cox_Weibull_1.0_2.0_dim=<DIMENSION>_range=[<MINVAL>,<MAXVAL>]_nb=<NBPOINTS>_group=[<GROUP>]_rangeARR=[<LOWER-BOUND-ARR>, <UPPER-BOUND-ARR>]_nb=<NBPOINTS>_train=<TRAINSIZE>_test=<TESTSIZE>_repet=<REPET>_censored=<CENSORED>_scale=<SCALE>_<MONTH>_<MONTH-NUMBER>_<DAY>_<YEAR>_<HH:MM:SS>.csv`.
To add an additional method to the benchmark, follow the template in `hte/models/method_template.py`. The new method should be a class with at least the following methods (see the sketch after this list):
- A `fit(**args)` method
- A `pval_hte(**args)` method returning the p-value corresponding to the model's estimation of heterogeneity existence
- A `variables_ranking(**args)` method returning a ranking of variables based on their estimated contribution to heterogeneity
- A `predict(**args)` method returning the good/bad responders group assignment (0 or 1)
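As a minimal sketch of such a class (argument names and return conventions below are assumptions; defer to `hte/models/method_template.py` for the actual signatures):

```python
import numpy as np
import pandas as pd

class MyNewMethod:
    """Hypothetical benchmark method following the template's interface."""

    def fit(self, data: pd.DataFrame, **kwargs) -> None:
        # Fit the underlying model on the training data.
        self.columns_ = list(data.columns)

    def pval_hte(self, **kwargs) -> float:
        # p-value of the test for the existence of heterogeneity;
        # a placeholder value is returned here.
        return 1.0

    def variables_ranking(self, **kwargs) -> list:
        # Variables ranked by estimated contribution to heterogeneity.
        return self.columns_

    def predict(self, data: pd.DataFrame, **kwargs) -> np.ndarray:
        # Good/bad responders group assignment (0 or 1) for each sample.
        return np.zeros(len(data), dtype=int)
```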
Additionally:
- The file containing the new method class definition should be placed in `hte/models`
- Import the new method in `hte/experiments/run_experiments`
- Set up the new method's attributes in `hte/experiments/attributes.py`
Then follow the steps in B. In step B-3, modify the `[-m METHODS]` command-line argument to include the new method.
Processed results from the main figures of [LINK PAPER] are available in the `experiments/results_expe/processed_results/` folder.
In the `results_expe/` folder, analysis files are available; they can be converted to .ipynb notebooks using the following command:

```bash
jupytext --to notebook <FILENAME>.py
```

which will create a .ipynb file. The notebooks can be run to recreate the figures of our [LINK PAPER], as well as to produce exploration tables for the different experiment scenarios and research questions.
The parameter settings stored in `dim20_isotropic_pro(+)5_pro(-)5.json` are used.
Data are generated for a subgroup with 4 predictive variables and no prognostic variables, for 500 values of $\beta$ between -10 and 10, by running the following when located in the `experiments` folder:

```bash
python launch_compute_arr.py -f="../configs/data_configs/dim20_isotropic_pro(+)5_pro(-)5.json" -g="dim20_pred4_prog0_balanced" -l=-10. -u=10. -n=500
```
This step reproduces the following file, available in the `data/results_compute_arr` folder: `Cox_Weibull_1.0_2.0_dim=20_range=[-10.0,10.0]_nb=500_group=[dim20_pred4_prog0_balanced]_July_07_12_2023_15:15:24.json`
Experiments are run for 10 ARR points, with 100 repeats per point, a sample size of 500, a 50-50 train-test split, and censoring scenario (1), by running the following when located in the `experiments` folder:

```bash
python launch_expe.py -f="../data/results_compute_arr/Cox_Weibull_1.0_2.0_dim=20_range=[-10.0,10.0]_nb=500_group=[dim20_pred4_prog0_balanced]_July_07_12_2023_15:15:24.json" -n=10 -s=500 -tr=0.5 -r=100 -c=True -sc=1.0 -m="Oracle, Univariate interaction, Univariate t_test, Multivariate cox"
```
This step reproduces the following file, available in the `experiments/results_expe` folder: `Cox_Weibull_1.0_2.0_dim=20_range=[-10.0,10.0]_nb=500_group='dim20_pred4_prog0_balanced_rangeARR=[0.0, 0.432410680647274]_nb=10_train=250_test=250_repet=100_censored=True_scale=1.0_October_10_06_2023_16:19:45.csv`