Questions about the function "causal_survival_forest" #1099

ChenXinaha · 2022-02-20T03:51:04Z

Hello,
I am trying to use the causal survival forest prediction function, and I have some questions about the "causal_survival_forest" function:

1. I cannot use the arguments "target" and "horizon".
2. The details of the function (quoted below) say 'D[Y >= Y.max] <- 1' and 'Y[Y >= Y.max] <- Y.max'. Why is it not 'D[Y >= Y.max] <- 0' and 'Y[Y >= Y.max] <- Y.max'?
Details
An important assumption for identifying the conditional average treatment effect tau(X) is that there exists a fixed positive constant M such that the probability of observing an event time past the maximum follow-up time Y.max is at least M. This may be an issue with data where most endpoint observations are censored. The suggested resolution is to re-define the estimand as the treatment effect up to some suitable maximum follow-up time Y.max. One can do this in practice by thresholding Y before running causal_survival_forest: 'D[Y >= Y.max] <- 1' and 'Y[Y >= Y.max] <- Y.max'. For details see Cui et al. (2020). The computational complexity of this estimator scales with the cardinality of the event times Y. If the number of samples is large and the Y grid dense, consider rounding the event times (or supply a coarser grid with the 'failure.times' argument).
3. When I used "causal_survival_forest" to analyze my RCT (randomized controlled trial) data, I expected tau(X) = E[Y(1) - Y(0) | X = x] > 0 (1: treatment arm; 0: control arm), where E[Y] is the expected survival time = integral(survival function). However, the analysis gave tau(X) = E[Y(1) - Y(0) | X = x] < 0, contrary to what I expected, which seems very unreasonable.

Could you give me an answer?

Thanks very much!
Xin

erikcs · 2022-02-20T06:02:47Z

Hi @ChenXinaha ,

1. Those are only available in the development version of GRF. A revised paper will be posted soon, together with a new grf CRAN release that will contain these arguments.
2. Because we are estimating a mean truncated at Y.max, all patients that survive past Y.max can be treated as observed (D=1) at Y.max: if a patient was censored at Y.max + k (k>0), we know he must have been alive at Y.max. See figure 3, red line, in https://arxiv.org/pdf/2001.09887v2.pdf for what an appropriate Y.max would be.
3. See 2: for tau(X) to even be defined, a proper Y.max has to be chosen. If all samples past time h are censored, the mean past h isn't defined, since you never observe any events there.

The revised paper + new CRAN release will ideally be easier to read and use.

ChenXinaha · 2022-02-20T06:28:36Z

2. Because we are estimating a mean truncated at Y.max: all patients that survive past Y.max can be treated as observed (D=1) at Y.max since we know that if a patient was censored at Y.max + k (k>0) he must have been alive at Y.max. See figure 3, red line, in https://arxiv.org/pdf/2001.09887v2.pdf for what an appropriate Y.max would be

In the function "causal_survival_forest", "D" represents the event type (0: censored, 1: failure), so if a patient was censored at Y.max + k (k>0), he must have been alive at Y.max, and therefore D=0?

erikcs · 2022-02-20T16:58:38Z

No, the effective non-censoring indicator D is 1 in that case

ChenXinaha · 2022-02-21T03:37:59Z

No, the effective non-censoring indicator D is 1 in that case
However, in the function "causal_survival_forest", "D" represents the event type (0: censored, 1: failure). If a patient censored at Y.max + k (k>0) gets D = 1, what about a patient who died (failed) at Y.max - k (k>0), or a patient who was censored at Y.max - k (k>0)?

erikcs · 2022-02-21T22:37:20Z

Only samples observed past Y.max are relabeled
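
To make the relabeling rule concrete, here is a minimal sketch on a made-up toy data set. The values and the choice Y.max = 10 are purely illustrative assumptions, not taken from the thread; the `D.trunc` and `Y.trunc` column names are hypothetical as well.

```r
# Toy data (hypothetical values): D = 1 is an observed failure, D = 0 is censored.
Y.max <- 10
toy <- data.frame(
  case = c("failed before Y.max", "censored before Y.max",
           "censored after Y.max", "failed after Y.max"),
  Y = c(7, 7, 13, 13),
  D = c(1, 0, 0, 1)
)

# Relabel only the samples observed at or past Y.max:
# they are known to have survived up to Y.max, so the truncated outcome is observed.
past <- toy$Y >= Y.max
toy$D.trunc <- ifelse(past, 1, toy$D)
toy$Y.trunc <- pmin(toy$Y, Y.max)

print(toy)
```

The two rows before Y.max are left untouched (the censored one stays censored), while both rows past Y.max end up with Y = Y.max and D = 1.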

ChenXinaha · 2022-02-22T07:14:29Z

Thank you very much! I have been very confused about tau(x) in the function "causal_survival_forest". I think that tau(X) = E[Y(1) - Y(0) | X = x], where Y is the survival time up to a fixed maximum follow-up time, and Y(1) and Y(0) are the potential outcomes corresponding to the two possible treatment states. Can I interpret a larger tau(X) as a larger benefit from treatment (1) than from control (0)?
However, in my analysis (a randomized controlled trial), the effect of treatment was not significant in the larger tau(X) group when I used a Cox model (Surv(time, events) ~ Arm), while in the smaller tau(X) group the effect of treatment was clearly significant.

So I am confused about the association between tau(X) / ITE (individual treatment effect) estimated by the causal survival forest and the effect of treatment estimated using a Cox model. Also, how should I understand "The event time (may be negative)" in the arguments?

I hope you can answer my questions. Thanks very much!

erikcs · 2022-02-23T00:40:48Z

Can you post the R code you are using in the above paragraph? I'm not sure I understand what you are doing. It's fine if the data is not available, just show the commands.

ChenXinaha · 2022-02-23T01:21:29Z

library(grf)
library(survival)

# Fit a causal survival forest and return the ITE estimates with 95% confidence intervals.
CSF.function <- function(X, Y, W, D) {
  ITE.CSF <- causal_survival_forest(X, Y, W, D, num.trees = 5000, honesty = TRUE, seed = 2021)
  ITE.Pred <- predict(ITE.CSF, estimate.variance = TRUE)
  sigma.hat <- sqrt(ITE.Pred$variance.estimates)
  LCI <- ITE.Pred$predictions - 1.96 * sigma.hat
  UCI <- ITE.Pred$predictions + 1.96 * sigma.hat
  CSF.ITE <- data.frame(ITE = ITE.Pred$predictions, LCI, UCI)
  return(CSF.ITE)
}

ITE <- CSF.function(X = data[, 8:ncol(data)],
                    Y = data$time,   # survival time
                    W = data$Arm,    # treatment = 1, control = 0
                    D = data$Event)  # event indicator

ITE$subject_id <- data$subject_id

data.ITE <- merge(ITE, data, by = "subject_id")

# Split the sample at the median of the estimated ITE.
temp1 <- data.ITE[data.ITE$ITE <= quantile(data.ITE$ITE, 1/2), ]
temp2 <- data.ITE[data.ITE$ITE > quantile(data.ITE$ITE, 1/2), ]

coxph(Surv(time, Event) ~ as.factor(Arm), data = data)
# the treatment effect in the whole population, HR = 0.5, P < 0.05

coxph(Surv(time, Event) ~ as.factor(Arm), data = temp1)
# HR = 0.3, P < 0.05
# smaller ITE but larger treatment effect

coxph(Surv(time, Event) ~ as.factor(Arm), data = temp2)
# HR = 0.8, P > 0.05
# larger ITE but smaller treatment effect

Could you explain the meaning of tau(X) = E[Y(1) - Y(0) | X = x] in this function?
Thanks!

erikcs · 2022-02-23T02:59:03Z

tau(x) estimates the (truncated*) mean difference in survival time conditional on X.
If you want an ATE estimate over a subset, you can use the average_treatment_effect function with the subset argument.

*The truncation currently needs to be done manually. Have you already done that with your data? What does the following histogram look like?

Y=data$time
D=data$Event
library(ggplot2)
ggplot(data.frame(Y, Censored = factor(D, labels = c("Yes", "No"))), aes(x = Y, fill = Censored)) +
  geom_histogram(alpha = 0.5)

ChenXinaha · 2022-02-23T03:31:02Z

How should I choose the truncation point (Y.max)?

erikcs · 2022-02-23T03:35:15Z

Y.max should be at most around 19 (the time of the last green bar) from that plot. You need to do the truncation "relabeling" on the data before calling causal_survival_forest (the next release will make this easier).
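
As a rough numeric companion to reading the histogram, the sketch below looks at where observed events stop and how much of the sample is still under observation at a few candidate cutoffs, then applies the relabeling. It runs on simulated data; the candidate grid, the final choice of 15, and all variable names are hypothetical assumptions, not a rule from grf or from this thread.

```r
set.seed(1)
n <- 500
failure.time <- rexp(n, rate = 0.1)         # simulated (hypothetical) event times
censor.time <- runif(n, min = 0, max = 25)  # simulated (hypothetical) censoring times
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)

# No events are observed after this time, so Y.max should not exceed it.
last.event <- max(Y[D == 1])

# Share of samples still under observation at each candidate cutoff.
candidates <- c(5, 10, 15, 20)
share.past <- sapply(candidates, function(h) mean(Y >= h))
print(data.frame(Y.max = candidates, share.observed.past = share.past))

# Pick a cutoff (hypothetical choice) and relabel before fitting the forest.
Y.max <- min(15, last.event)
D.trunc <- ifelse(Y >= Y.max, 1L, D)
Y.trunc <- pmin(Y, Y.max)
```

The relabeled `Y.trunc` and `D.trunc` are what would then be passed to causal_survival_forest in place of the raw `Y` and `D`.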

ChenXinaha · 2022-02-23T03:36:13Z

Y.max should be at most around 19 (the time of the last green bar) from that plot. You need to do the truncation "relabeling" on the data before calling causal_survival_forest (the next release will make this easier).

I will have a try, thanks very much!

ChenXinaha · 2022-02-23T05:29:00Z

How should I understand the association between the ITE and the results of the Cox model?
In this analysis (a randomized controlled trial), the treatment effect was not significant in the larger tau(X) group when I used a Cox model (Surv(time, events) ~ Arm), while it was clearly significant in the smaller tau(X) group. In other words, the effect of treatment was more pronounced in the subgroup with the smaller ITE.

erikcs · 2022-02-23T16:15:49Z

Is this after you ran CSF with truncated Y?

ChenXinaha · 2022-02-24T00:39:40Z

Is this after you ran CSF with truncated Y?

Yes.

erikcs · 2022-02-25T05:33:08Z

How should I understand the associations between ITE and the results of Cox model.

There is no general relation between CSF and Cox. a) CSF is non-parametric, while Cox is semi-parametric and relies on the proportional hazards assumption (and can thus be mis-specified). b) They measure different things: CSF measures a difference in conditional (restricted) means, while Cox measures a hazard ratio (HR). The two need not coincide.
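
To illustrate that the two quantities live on different scales, here is a small sketch on simulated trial data (all numbers and the helper `rmst` are hypothetical, introduced only for this example). It computes a hazard ratio with coxph and a difference in restricted mean survival time (RMST) as the area under each arm's Kaplan-Meier curve up to a horizon.

```r
library(survival)

set.seed(42)
n <- 2000
W <- rbinom(n, 1, 0.5)                                  # 1 = treatment, 0 = control
failure.time <- rexp(n, rate = ifelse(W == 1, 0.5, 1))  # simulated: treatment halves the hazard
censor.time <- runif(n, 0, 4)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)
horizon <- 2

# Restricted mean survival time: area under the Kaplan-Meier step curve on [0, horizon].
rmst <- function(time, status, horizon) {
  fit <- survfit(Surv(time, status) ~ 1)
  keep <- fit$time < horizon
  t <- c(0, fit$time[keep], horizon)  # interval endpoints
  s <- c(1, fit$surv[keep])           # survival level on each interval
  sum(diff(t) * s)
}

rmst.diff <- rmst(Y[W == 1], D[W == 1], horizon) - rmst(Y[W == 0], D[W == 0], horizon)
hazard.ratio <- unname(exp(coef(coxph(Surv(Y, D) ~ W))))

print(c(rmst.diff = rmst.diff, hazard.ratio = hazard.ratio))
```

In this simulation the true hazard ratio is 0.5 and the true RMST difference at horizon 2 is about 0.4 time units. The hazard ratio is a relative measure, while the RMST difference also depends on the baseline hazard and the horizon, so ranking subgroups by one need not match ranking them by the other.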

erikcs · 2022-03-18T04:04:36Z

The latest CSF is on CRAN now (version 2.1.0), so I am closing this for now. Please feel free to ask more questions if anything is unclear.
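
For readers arriving later, a minimal sketch of what a call might look like with the "target" and "horizon" arguments, assuming grf 2.1.0 or later is installed. The data are simulated along the lines of the grf documentation example, and the median split is only an illustration of the average_treatment_effect subset argument mentioned earlier in the thread, not a recommended analysis.

```r
library(grf)

set.seed(123)
n <- 2000
p <- 5
X <- matrix(runif(n * p), n, p)
W <- rbinom(n, 1, 0.5)  # randomized treatment, propensity 0.5
horizon <- 1            # plays the role of Y.max; no manual relabeling here
failure.time <- pmin(rexp(n) * X[, 1] + W, horizon)
censor.time <- 2 * runif(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)

# target = "RMST" estimates the difference in restricted mean survival time up to `horizon`.
cs.forest <- causal_survival_forest(X, Y, W, D, W.hat = 0.5,
                                    target = "RMST", horizon = horizon,
                                    num.trees = 500, seed = 2021)
tau.hat <- as.numeric(predict(cs.forest)$predictions)

# Doubly robust ATE estimates over two subsets (illustrative median split on tau.hat).
high <- tau.hat > median(tau.hat)
ate.high <- average_treatment_effect(cs.forest, subset = high)
ate.low <- average_treatment_effect(cs.forest, subset = !high)
print(rbind(high = ate.high, low = ate.low))
```

Because both subset estimates are on the same restricted-mean scale as tau.hat, they can be compared directly, unlike a hazard ratio from a Cox model fitted within each subgroup.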

erikcs closed this as completed Mar 18, 2022
