Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insufficient nodes for simulating from complex hazards #20

Open
irtimmins opened this issue Feb 6, 2024 · 0 comments
Open

Insufficient nodes for simulating from complex hazards #20

irtimmins opened this issue Feb 6, 2024 · 0 comments

Comments

@irtimmins
Copy link

irtimmins commented Feb 6, 2024

Hi, I have problems using simsurv for more complex hazard functions (which I need for cases where the cumulative hazard is not analytically tractable).

A reproducible example is given below. The figure shows an example where the KM survival estimates from simsurv, which are generated from specifying the hazard function, deviate significantly from the true model.

Here, we take a bell shaped hazard function with sharp spike at 2 year timepoint, and we can see that simsurv, which is using numerical integration of the hazard function, does not reproduce the survival function at a large sample of N = 10,000.

In contrast, as shown in the figure, using simsurv and specifying the cumulative hazard (which exists for this example) does reproduce the survival function accurately.

This is due to having too few nodes limited at 15 for the Gauss-Kronrod quadrature.
The Stata version of this in survsim does not run into this problem, as nodes can be set e.g. to 100 or 200, which overcomes the problem in this case (though Stata also struggles for nodes = 15).

Can greater flexibility be added to the number of nodes in simsurv?

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(survminer)
library(simsurv)

set.seed(1024771823)

N <- 1e4 # sample for simsurv datasets.
x_cov <- tibble(id = 1:N)

# set parameters to define survival model.
mu <- 2
sd <- 0.1

haz_norm <- function(t, x = NULL, betas = NULL){
  res <- dnorm(x=t, mean = mu, sd = sd)+0.2
  res
}
cumhaz_norm <- function(t, x = NULL, betas = NULL){
  res <- pnorm(q=t, mean = mu, sd = sd)+0.2*t
  res
}

# simulate data using hazard function
sim_data_h <- simsurv(x = x_cov,
                      hazard = haz_norm,
                      nodes = 15)
# extract Kaplan-Meier estimates
km_h <- survfit(Surv(eventtime, status) ~ 1, data=sim_data_h )
km_h_plot <- ggsurvplot(km_h,data=sim_data_h)$data.survplot %>% 
  select(time, surv) %>%
  mutate(simsurv_scale = "hazard",
         alpha = 1,
         linetype = 1)

# simulate data using cumulative hazard function
sim_data_ch <- simsurv(x = x_cov,
                       cumhazard = cumhaz_norm)
# extract Kaplan-Meier estimates
km_ch <- survfit(Surv(eventtime, status) ~ 1, data=sim_data_ch)
km_ch_plot <- ggsurvplot(km_ch,data=sim_data_ch)$data.survplot %>% 
  select(time, surv) %>%
  mutate(simsurv_scale = "cumhazard",
         alpha = 1,
         linetype = 1)

# generate true values of survival function.
true_surv_plot <- tibble(time = seq(from = 0, to = 20, length.out = 1e4)) %>%
  mutate(surv = exp(-(pnorm(q=time, mean = mu, sd = sd)+0.2*time))) %>%
  mutate(simsurv_scale = "true model",
         alpha = 1,
         linetype = 2)

# combine values together into tidy format tibble
plot_df <- rbind(true_surv_plot, km_h_plot, km_ch_plot) %>%
  filter(time <= 20) %>%
  mutate(linetype = as.factor(linetype))

ggplot(aes(x = time, y= surv, 
           colour = simsurv_scale, 
           linetype = simsurv_scale), data = plot_df) +
  theme_classic()+
  geom_step()+
  xlim(c(0,20))+
  ylim(c(0,1))+
  scale_colour_manual(name = "Simsurv scale", values = c("#c51b7d","#3288bd", "#1a1a1a"))+
  scale_linetype_manual(name = "Simsurv scale", values = c("solid","solid","dashed"))+
  ylab("Survival")

simsurv_nodes

@irtimmins irtimmins changed the title Insufficient nodes for integrating complex hazards Insufficient nodes for simulating from complex hazards Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant