You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I have problems using simsurv for more complex hazard functions (which I need for cases where the cumulative hazard is not analytically tractable).
A reproducible example is given below. The figure shows an example where the KM survival estimates from simsurv, which are generated from specifying the hazard function, deviate significantly from the true model.
Here, we take a bell shaped hazard function with sharp spike at 2 year timepoint, and we can see that simsurv, which is using numerical integration of the hazard function, does not reproduce the survival function at a large sample of N = 10,000.
In contrast, as shown in the figure, using simsurv and specifying the cumulative hazard (which exists for this example) does reproduce the survival function accurately.
This is due to having too few nodes limited at 15 for the Gauss-Kronrod quadrature.
The Stata version of this in survsim does not run into this problem, as nodes can be set e.g. to 100 or 200, which overcomes the problem in this case (though Stata also struggles for nodes = 15).
Can greater flexibility be added to the number of nodes in simsurv?
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(survminer)
library(simsurv)
set.seed(1024771823)
N <- 1e4 # sample for simsurv datasets.
x_cov <- tibble(id = 1:N)
# set parameters to define survival model.
mu <- 2
sd <- 0.1
haz_norm <- function(t, x = NULL, betas = NULL){
res <- dnorm(x=t, mean = mu, sd = sd)+0.2
res
}
cumhaz_norm <- function(t, x = NULL, betas = NULL){
res <- pnorm(q=t, mean = mu, sd = sd)+0.2*t
res
}
# simulate data using hazard function
sim_data_h <- simsurv(x = x_cov,
hazard = haz_norm,
nodes = 15)
# extract Kaplan-Meier estimates
km_h <- survfit(Surv(eventtime, status) ~ 1, data=sim_data_h )
km_h_plot <- ggsurvplot(km_h,data=sim_data_h)$data.survplot %>%
select(time, surv) %>%
mutate(simsurv_scale = "hazard",
alpha = 1,
linetype = 1)
# simulate data using cumulative hazard function
sim_data_ch <- simsurv(x = x_cov,
cumhazard = cumhaz_norm)
# extract Kaplan-Meier estimates
km_ch <- survfit(Surv(eventtime, status) ~ 1, data=sim_data_ch)
km_ch_plot <- ggsurvplot(km_ch,data=sim_data_ch)$data.survplot %>%
select(time, surv) %>%
mutate(simsurv_scale = "cumhazard",
alpha = 1,
linetype = 1)
# generate true values of survival function.
true_surv_plot <- tibble(time = seq(from = 0, to = 20, length.out = 1e4)) %>%
mutate(surv = exp(-(pnorm(q=time, mean = mu, sd = sd)+0.2*time))) %>%
mutate(simsurv_scale = "true model",
alpha = 1,
linetype = 2)
# combine values together into tidy format tibble
plot_df <- rbind(true_surv_plot, km_h_plot, km_ch_plot) %>%
filter(time <= 20) %>%
mutate(linetype = as.factor(linetype))
ggplot(aes(x = time, y= surv,
colour = simsurv_scale,
linetype = simsurv_scale), data = plot_df) +
theme_classic()+
geom_step()+
xlim(c(0,20))+
ylim(c(0,1))+
scale_colour_manual(name = "Simsurv scale", values = c("#c51b7d","#3288bd", "#1a1a1a"))+
scale_linetype_manual(name = "Simsurv scale", values = c("solid","solid","dashed"))+
ylab("Survival")
The text was updated successfully, but these errors were encountered:
irtimmins
changed the title
Insufficient nodes for integrating complex hazards
Insufficient nodes for simulating from complex hazards
Feb 6, 2024
Hi, I have problems using simsurv for more complex hazard functions (which I need for cases where the cumulative hazard is not analytically tractable).
A reproducible example is given below. The figure shows an example where the KM survival estimates from simsurv, which are generated from specifying the hazard function, deviate significantly from the true model.
Here, we take a bell shaped hazard function with sharp spike at 2 year timepoint, and we can see that simsurv, which is using numerical integration of the hazard function, does not reproduce the survival function at a large sample of N = 10,000.
In contrast, as shown in the figure, using simsurv and specifying the cumulative hazard (which exists for this example) does reproduce the survival function accurately.
This is due to having too few nodes limited at 15 for the Gauss-Kronrod quadrature.
The Stata version of this in survsim does not run into this problem, as nodes can be set e.g. to 100 or 200, which overcomes the problem in this case (though Stata also struggles for nodes = 15).
Can greater flexibility be added to the number of nodes in simsurv?
The text was updated successfully, but these errors were encountered: