Questions about the function "causal_survival_forest" #1099

Closed
ChenXinaha opened this issue Feb 20, 2022 · 17 comments

Comments

@ChenXinaha

Hello,
I am trying to use the causal survival forest prediction function, and I have some questions about the "causal_survival_forest" function:

  1. I cannot use the arguments "target" and "horizon".
  2. The Details section of the function's documentation (quoted below) recommends 'D[Y >= Y.max] <- 1' and 'Y[Y >= Y.max] <- Y.max'. Why not 'D[Y >= Y.max] <- 0' and 'Y[Y >= Y.max] <- Y.max'?
    Details
    An important assumption for identifying the conditional average treatment effect tau(X) is that there exists a fixed positive constant M such that the probability of observing an event time past the maximum follow-up time Y.max is at least M. This may be an issue with data where most endpoint observations are censored. The suggested resolution is to re-define the estimand as the treatment effect up to some suitable maximum follow-up time Y.max. One can do this in practice by thresholding Y before running causal_survival_forest: 'D[Y >= Y.max] <- 1' and 'Y[Y >= Y.max] <- Y.max'. For details see Cui et al. (2020). The computational complexity of this estimator scales with the cardinality of the event times Y. If the number of samples is large and the Y grid dense, consider rounding the event times (or supply a coarser grid with the 'failure.times' argument).
  3. When I used the "causal_survival_forest" function to analyze my RCT (randomized controlled trial) data, I expected tau(X) = E[Y(1) - Y(0) | X = x] > 0 (1: treatment arm; 0: control arm), where E[Y] is the expected survival time, i.e. the integral of the survival function. Instead, the analysis gave tau(X) = E[Y(1) - Y(0) | X = x] < 0, which is the opposite of what I expected and seems unreasonable.

Could you help me with these questions?

Thanks very much!
Xin

@erikcs
Member

erikcs commented Feb 20, 2022

Hi @ChenXinaha,

  1. Those are only available in the development version of GRF. A revised paper will be posted soon, together with a new grf CRAN release that will contain these arguments.
  2. Because we are estimating a mean truncated at Y.max: all patients who survive past Y.max can be treated as observed (D=1) at Y.max, since we know that a patient censored at Y.max + k (k>0) must have been alive at Y.max. See figure 3, red line, in https://arxiv.org/pdf/2001.09887v2.pdf for what an appropriate Y.max would be.
  3. See 2. For tau(X) to even be defined, a proper Y.max has to be chosen: if all samples past time h are censored, the mean past h isn't defined, since you never observe any events there.

The revised paper + new CRAN release will ideally be easier to read and use.
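
As a reading aid for point 2, the truncated mean can be written out explicitly. The notation here is introduced for illustration only: h stands for Y.max and T(w) for the potential survival time under arm w.

```latex
\tau(x) = \mathbb{E}\left[\, \min\{T(1), h\} - \min\{T(0), h\} \;\middle|\; X = x \,\right]
```

For any patient still under observation at h, min{T, h} equals h regardless of whether the later record is an event or a censoring. The truncated outcome of such a patient is therefore fully known, which is why D is set to 1 rather than 0.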

@ChenXinaha
Author

2. Because we are estimating a mean truncated at Y.max: all patients that survive past Y.max can be treated as observed (D=1) at Y.max since we know that if a patient was censored at Y.max + k (k>0) he must have been alive at Y.max. See figure 3, red line, in https://arxiv.org/pdf/2001.09887v2.pdf for what an appropriate Y.max would be

In the function "causal_survival_forest", "D" represents the event type (0: censored, 1: failure). So if a patient was censored at Y.max + k (k>0), he must have been alive at Y.max, which would mean D = 0?

@erikcs
Member

erikcs commented Feb 20, 2022

No, the effective non-censoring indicator D is 1 in that case

@ChenXinaha
Author

No, the effective non-censoring indicator D is 1 in that case
However, in the function "causal_survival_forest", "D" represents the event type (0: censored, 1: failure). If D = 1 for a patient censored at Y.max + k (k>0), what about a patient who died (failed) at Y.max - k (k>0), or a patient who was censored at Y.max - k (k>0)?

@erikcs
Member

erikcs commented Feb 21, 2022

Only samples observed past Y.max are relabeled
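
A minimal sketch of the relabeling on hypothetical toy data may make the three cases concrete. The five patients and the cutoff Y.max = 10 are invented for illustration; only the two assignments mirror the rule quoted from the documentation.

```r
# Hypothetical toy data: five patients and an assumed cutoff Y.max = 10.
Y <- c(12, 15, 4, 6, 10)   # observed follow-up time
D <- c(0, 1, 1, 0, 0)      # 1 = event observed, 0 = censored
Y.max <- 10

# Same rule as 'D[Y >= Y.max] <- 1' and 'Y[Y >= Y.max] <- Y.max',
# written without overwriting the original vectors.
past <- Y >= Y.max               # still under observation at Y.max
D.trunc <- ifelse(past, 1, D)    # relabel as observed at Y.max
Y.trunc <- pmin(Y, Y.max)        # cap follow-up time at Y.max

print(data.frame(Y, D, Y.trunc, D.trunc))
```

Patients 1, 2 and 5 reach Y.max and become (Y = 10, D = 1). Patient 3 (event at 4) and patient 4 (censored at 6) fall before Y.max and keep their original values.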

@ChenXinaha
Author

Thank you very much! I have been very confused about tau(x) in the function "causal_survival_forest". I understand tau(X) = E[Y(1) - Y(0) | X = x], where Y is the survival time up to a fixed maximum follow-up time, and Y(1) and Y(0) are the potential outcomes under the two possible treatment states. Can I take a larger tau(X) to mean a larger benefit from treatment (1) than from control (0)?
However, in my analysis (a randomized controlled trial), the treatment effect in the larger-tau(X) group was not significant when I used a Cox model (Surv(time, events) ~ Arm), while in the smaller-tau(X) group it was clearly significant.

So I am confused about the relationship between tau(X), i.e. the ITE (individual treatment effect) estimated by the causal survival forest, and the treatment effect estimated with the Cox model. Also, how should I understand "The event time (may be negative)" in the arguments?

I hope you can answer my questions. Thanks very much!

@erikcs
Member

erikcs commented Feb 23, 2022

Can you post the R code you are using in the above paragraph? I'm not sure I understand what you are doing. It's fine if the data is not available, just show the commands.

@ChenXinaha
Author

library(grf)
library(survival)
CSF.function <- function(X = x, Y = y, W = w, D = d) {
  ITE.CSF <- causal_survival_forest(X, Y, W, D, num.trees = 5000, honesty = TRUE, seed = 2021)
  ITE.Pred <- predict(ITE.CSF, estimate.variance = TRUE)
  sigma.hat <- sqrt(ITE.Pred$variance.estimates)
  # Pointwise 95% interval: prediction +/- 1.96 * standard error
  LCI <- ITE.Pred$predictions - 1.96 * sigma.hat
  UCI <- ITE.Pred$predictions + 1.96 * sigma.hat
  CSF.ITE <- data.frame(ITE = ITE.Pred$predictions, LCI, UCI)
  return(CSF.ITE)
}

ITE <- CSF.function(X = data [, c(8:ncol(data))],
Y = data$time, #Survival time
W = data$Arm, #treat=1,control = 0
D = data$Event)

ITE$subject_id <- data$subject_id

data.ITE <- merge(ITE, data, by = "subject_id")

temp1 <- data.ITE[data.ITE$ITE <= quantile(data.ITE$ITE, 1/2), ]
temp2 <- data.ITE[data.ITE$ITE > quantile(data.ITE$ITE, 1/2), ]

coxph(Surv(time, Event) ~ as.factor(Arm), data = data)
# treatment effect in the whole population: HR = 0.5, P < 0.05

coxph(Surv(time, Event) ~ as.factor(Arm), data = temp1)
# HR = 0.3, P < 0.05
# smaller ITE but larger treatment effect

coxph(Surv(time, Event) ~ as.factor(Arm), data = temp2)
# HR = 0.8, P > 0.05
# larger ITE but smaller treatment effect

Could you explain the meaning of tau(X) = E[Y(1) - Y(0) | X = x] in this function?
Thanks!

@erikcs
Member

erikcs commented Feb 23, 2022

tau(x) estimates the (truncated*) mean difference in survival time conditional on X.
If you want an ATE estimate over a subset, you can use the average_treatment_effect function with the subset argument.

*The truncation currently needs to be done manually. Have you already done that with your data? What does the following histogram look like?

Y=data$time
D=data$Event
library(ggplot2)
ggplot(data.frame(Y, Censored = factor(D, labels = c("Yes", "No"))), aes(x = Y, fill = Censored)) +
  geom_histogram(alpha = 0.5)

@ChenXinaha
Author

ChenXinaha commented Feb 23, 2022

How should I choose the truncation (truncated*)?

@erikcs
Member

erikcs commented Feb 23, 2022

Y.max should be at most around 19 (the time of the last green bar) from that plot. You need to do the truncation "relabeling" on the data before calling causal_survival_forest (the next release will make this easier).
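
The same upper bound can be read off numerically instead of from the plot. The sketch below assumes that the green bars in the histogram are the uncensored observations (D = 1), which is how ggplot2's default palette would usually colour the "No" level, and it uses hypothetical vectors in place of data$time and data$Event.

```r
# Hypothetical data standing in for data$time and data$Event.
Y <- c(2, 5, 7, 11, 14, 19, 21, 24, 25)
D <- c(1, 0, 1, 1, 0, 1, 0, 0, 0)   # 1 = event, 0 = censored

last.event <- max(Y[D == 1])        # latest time at which an event was observed
n.beyond <- sum(Y >= last.event)    # subjects still under observation at that time

print(c(last.event = last.event, n.beyond = n.beyond))
```

Here last.event plays the role of the "last green bar", so Y.max would be chosen at or below it. Checking n.beyond is a rough way to see whether enough subjects are still observed around the candidate cutoff.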

@ChenXinaha
Author

Y.max should be at most around 19 (the time of the last green bar) from that plot. You need to do the truncation "relabeling" on the data before calling causal_survival_forest (the next release will make this easier).

I will have a try, thanks very much!

@ChenXinaha
Author

How should I understand the association between the ITE and the results of the Cox model?
In this analysis (a randomized controlled trial), the treatment effect in the larger-tau(X) group was not significant when I used a Cox model (Surv(time, events) ~ Arm), while in the smaller-tau(X) group it was clearly significant. In other words, the treatment effect was more pronounced in the subgroup with the smaller ITE.

@erikcs
Member

erikcs commented Feb 23, 2022

Is this after you ran CSF with truncated Y?

@ChenXinaha
Author

Is this after you ran CSF with truncated Y?

Yes.

@erikcs
Member

erikcs commented Feb 25, 2022

How should I understand the associations between ITE and the results of Cox model.

There is no general relation between CSF and Cox. a) CSF is non-parametric, while Cox is semi-parametric and relies on the proportional hazards assumption (and can thus be mis-specified). b) They measure different things: CSF measures a difference in conditional (restricted) mean survival time, Cox measures a hazard ratio (HR). They need not coincide.
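
A small worked example shows how a stronger hazard ratio can go together with a smaller difference in restricted mean survival time. Everything here is hypothetical: two subgroups with exponential survival, invented baseline hazards and an assumed horizon h = 2. The closed form used is the integral of exp(-rate * t) from 0 to h.

```r
# Restricted mean survival time of an exponential distribution up to horizon h.
rmst.exp <- function(rate, h) (1 - exp(-rate * h)) / rate

h <- 2  # assumed horizon (plays the role of Y.max)

# Two hypothetical subgroups: the hazard ratios echo the HR values in the
# thread, the baseline (control) hazards are made up.
groups <- data.frame(
  group = c("low baseline hazard", "high baseline hazard"),
  rate.control = c(0.05, 1),
  hazard.ratio = c(0.3, 0.8)
)
groups$rate.treated <- groups$rate.control * groups$hazard.ratio
groups$rmst.diff <- rmst.exp(groups$rate.treated, h) - rmst.exp(groups$rate.control, h)

print(groups)
```

In this toy setup the subgroup with HR = 0.3 gains less restricted mean survival time than the subgroup with HR = 0.8, because few events occur before h when the baseline hazard is low. This is one possible mechanism for the pattern described above, not a diagnosis of the actual trial data.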

@erikcs
Member

erikcs commented Mar 18, 2022

The latest CSF is on CRAN now (version 2.1.0) so closing this for now, please feel free to ask more questions if anything is unclear
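
For readers arriving later, here is a minimal sketch of how the "target" and "horizon" arguments from question 1 might be used with grf 2.1.0 or newer. The data are simulated and every numeric value is an assumption. As far as I understand, horizon takes the place of the manual Y.max relabeling discussed above. The subgroup estimates use average_treatment_effect with the subset argument, as suggested earlier in the thread.

```r
library(grf)

# Simulated stand-in for a randomized trial (all values hypothetical).
set.seed(2021)
n <- 1000
p <- 5
X <- matrix(runif(n * p), n, p)
W <- rbinom(n, 1, 0.5)                        # 1 = treatment, 0 = control
horizon <- 1                                  # plays the role of Y.max
failure.time <- rexp(n) * X[, 1] + W          # treatment adds survival time
censor.time <- 2 * runif(n)
Y <- pmin(failure.time, censor.time)          # observed follow-up time
D <- as.integer(failure.time <= censor.time)  # 1 = event, 0 = censored

# target = "RMST": difference in mean survival time restricted to `horizon`.
# W.hat = 0.5 supplies the known randomization probability.
csf <- causal_survival_forest(X, Y, W, D, W.hat = 0.5,
                              target = "RMST", horizon = horizon,
                              num.trees = 500, seed = 2021)
tau.hat <- predict(csf)$predictions

# Doubly robust ATE estimates within each half of the predicted effects.
low <- tau.hat <= median(tau.hat)
ate.low <- average_treatment_effect(csf, subset = low)
ate.high <- average_treatment_effect(csf, subset = !low)
print(rbind(low = ate.low, high = ate.high))
```

Splitting on in-sample predictions and then estimating on the same data is only a quick diagnostic. The resulting subgroup estimates are differences in restricted mean survival time, so they are not directly comparable to hazard ratios from coxph.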

@erikcs erikcs closed this as completed Mar 18, 2022