
Is cfr_rolling() suitable for small time-series and cfr_time_varying() for larger ones? #123

Closed
avallecam opened this issue Jan 31, 2024 · 7 comments
Labels
question Further information is requested

Comments

@avallecam (Member) commented Jan 31, 2024

From documentation:

To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is however not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how CFR changes.

However, I could not find a specific reference to a difference between, or a direct comparison of, cfr_rolling() and cfr_time_varying().

From the reprexes below, I find that:

  • cfr_rolling() is more suited to ebola1976 (small time-series), while
  • cfr_time_varying() is more suited to covid_data (larger time-series).

Can I conclude that cfr_rolling() would be useful for real-time estimation, while cfr_time_varying() would suit retrospective assessments?

small outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# Load the Ebola 1976 data provided with the package
data("ebola1976")

# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# Calculate the time varying daily CFR while correcting for delays
time_varying_cfr_corrected <- cfr_time_varying(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# combine the data for plotting
rolling_cfr_corrected$method <- "rolling"
time_varying_cfr_corrected$method <- "time_varying"

data_cfr <- rbind(
  rolling_cfr_corrected,
  time_varying_cfr_corrected
)

# visualise the corrected rolling and time-varying estimates together
ggplot(data_cfr) +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, colour = method)
  )
#> Warning: Removed 19 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

large outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# get data pre-loaded with the package
data("covid_data")
df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ]

# estimate time varying severity while correcting for delays
time_varying_cfr <- cfr_time_varying(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440),
  burn_in = 7L
)

covid_rolling <- cfr_rolling(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)

time_varying_cfr %>% 
  mutate(method = "time_varying") %>% 
  bind_rows(
    covid_rolling %>% 
      mutate(method = "rolling")
  ) %>% 
  ggplot() +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, color = method)
  )
#> Warning: Removed 68 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

@avallecam avallecam added the question Further information is requested label Jan 31, 2024
@pratikunterwegs (Collaborator)

Thanks @avallecam - the two functions are aimed at providing different functionality.

  • cfr_rolling() shows what the estimated CFR would be on each day of the outbreak, given that future data on cases and deaths is not available at the time. The final value of cfr_rolling() estimates is expected to be identical to the value of cfr_static() on the same data. This is not sensitive to the length of the outbreak (afaik).

  • cfr_time_varying() calculates the CFR over a moving window, which helps to understand changes in CFR as the epidemic changes, e.g. due to a new variant or due to increased immunity from vaccination. It performs poorly for short outbreaks because it discards some data at the start (the burn_in) and some at the end (due to the size of smoothing_window), returning NA in both cases, and it also returns NA where deaths < estimated deaths and estimated deaths > 0. I think the reason it is less suitable for smaller outbreaks is that these conditions are more common there, producing more NAs.
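A toy sketch of this point in base R (not using {cfr}; all values hypothetical and no delay correction applied): with few cases per day, a cumulative ratio settles down while a naive moving-window ratio keeps fluctuating, since each window contains little data.

```r
# Hypothetical small outbreak: ~5 cases/day for 30 days, true CFR of 10%
set.seed(1)
cases  <- rpois(30, lambda = 5)
deaths <- rbinom(30, size = cases, prob = 0.1)

# Cumulative ratio: uses all data up to each day, analogous in spirit to
# cfr_rolling() (but without the delay correction the package applies)
cumulative_cfr <- cumsum(deaths) / cumsum(cases)

# Naive 7-day moving-window ratio: only local data, analogous in spirit to
# the windowing behind cfr_time_varying() (again without delay correction)
window_cfr <- vapply(seq_along(cases), function(i) {
  idx <- max(1, i - 6):i
  sum(deaths[idx]) / sum(cases[idx])
}, numeric(1))

# The cumulative series converges towards the overall ratio; the windowed
# series remains noisy because each window holds only a handful of cases
```

This is only an intuition aid; the package functions additionally correct for the delay distribution and handle the NA conditions described above.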

@avallecam (Member, Author) commented Jan 31, 2024

Useful clarification.

  • cfr_rolling() shows, for each day, the estimate that cfr_static() would produce from the cumulative sum of cases and deaths up to that day
  • for cfr_time_varying(), we could say it is sensitive both to the length of the time-series (given the data discarded by burn_in and smoothing_window) and to the number of deaths in the outbreak (given the estimated-deaths constraint it requires).

About the different trends that we get from the two methods, any suggestions about how to discuss it?

@pratikunterwegs (Collaborator) commented Feb 1, 2024

About the different trends that we get from the two methods, any suggestions about how to discuss it?

I think the key is to discuss the static and time varying methods and where they apply. The rolling method is perhaps more useful to check whether an outbreak's CFR estimate has stabilised. The rolling and time-varying methods aren't really comparable, so I wouldn't really discuss them together. Is it worth mentioning some reasons not to interpret the rolling estimate in relation to the time-varying one?

@adamkucharski (Member)

Perhaps a useful rule of thumb is to discuss this in the context of sampling uncertainty, e.g. with 100 cases, the fatality risk estimate will, roughly speaking, have a 95% confidence interval of ±10% around the mean estimate (binomial CI). So if we have >100 cases with expected outcomes on a given day, we can get reasonable estimates of the time-varying CFR. But if we only have >100 cases over the course of the whole epidemic, we probably need to rely on the static version that uses the cumulative data.
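This rule of thumb can be checked with base R (the numbers below are illustrative, not from the thread's data): an exact binomial CI for 50 deaths among 100 cases, the worst case for interval width, has a half-width of roughly 10 percentage points.

```r
# Exact (Clopper-Pearson) 95% CI for a fatality risk estimated from 100 cases.
# Illustrative values: 50 deaths among 100 cases maximises the CI width.
ci_100 <- binom.test(x = 50, n = 100)$conf.int

# Half-width of the interval: roughly 0.10, i.e. +/- 10 percentage points
half_width <- diff(ci_100) / 2
```

Since the half-width shrinks roughly as 1/sqrt(n), having sufficiently many cases with expected outcomes on each day is what makes the time-varying estimate informative.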

@pratikunterwegs (Collaborator)

Thanks @adamkucharski! @avallecam, did you have any specific suggestions for where these clarifications should be added?

avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Feb 16, 2024
@avallecam (Member, Author)

did you have any specific suggestions for where these clarifications should be added?

In the tutorial episode drafted, we included the clarifications shared here. Please see that section in the working branch (edit suggestions welcome in the working PR). At the end, we refer to the vignette on cfr_time_varying().

Although this outbreak-size detail is mentioned briefly in the "Get started" vignette, it would probably be informative to complement the current vignette by showing that point in particular: how the time-varying method performs under different numbers of cases per day (our reference from the tutorials to the vignette could then be more specific). If this content gets too long, we could consider a separate vignette. If appropriate, it could be based on the reprexes above.

@pratikunterwegs (Collaborator)

Closed as answered, and this will be addressed by #128

avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024
avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024
avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024