
Is cfr_rolling() suitable for small time-series and cfr_time_varying() for larger ones? #123

Closed
avallecam opened this issue Jan 31, 2024 · 7 comments
Labels
question Further information is requested

Comments

@avallecam (Member) commented Jan 31, 2024

From documentation:

To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is however not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how CFR changes.

However, I could not find a specific reference to a difference between, or a direct comparison of, cfr_rolling() and cfr_time_varying().

From the reprexes below, I find that:

  • cfr_rolling() is more suited to ebola1976 (small time-series), while
  • cfr_time_varying() is more suited to covid_data (larger time-series).

Can I conclude that cfr_rolling() would be useful for real-time estimation, while cfr_time_varying() would suit retrospective assessments?

small outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# Load the Ebola 1976 data provided with the package
data("ebola1976")

# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# Calculate the time varying daily CFR while correcting for delays
time_varying_cfr_corrected <- cfr_time_varying(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# combine the data for plotting
rolling_cfr_corrected$method <- "rolling"
time_varying_cfr_corrected$method <- "time_varying"

data_cfr <- rbind(
  rolling_cfr_corrected,
  time_varying_cfr_corrected
)

# visualise the corrected rolling and time-varying estimates together
ggplot(data_cfr) +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, colour = method)
  )
#> Warning: Removed 19 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

large outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# get data pre-loaded with the package
data("covid_data")
df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ]

# estimate time varying severity while correcting for delays
time_varying_cfr <- cfr_time_varying(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440),
  burn_in = 7L
)

covid_rolling <- cfr_rolling(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)

time_varying_cfr %>% 
  mutate(method = "time_varying") %>% 
  bind_rows(
    covid_rolling %>% 
      mutate(method = "rolling")
  ) %>% 
  ggplot() +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, color = method)
  )
#> Warning: Removed 68 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

@avallecam avallecam added the question Further information is requested label Jan 31, 2024
@pratikunterwegs (Collaborator)

Thanks @avallecam - the two functions are aimed at providing different functionality.

  • cfr_rolling() shows what the estimated CFR would be on each day of the outbreak, given that future data on cases and deaths is not available at the time. The final value of cfr_rolling() estimates is expected to be identical to the value of cfr_static() on the same data. This is not sensitive to the length of the outbreak (afaik).

  • cfr_time_varying() calculates the CFR over a moving window, which helps to understand changes in CFR as the epidemic changes, e.g. due to a new variant or due to increased immunity from vaccination. It performs poorly for short outbreaks because it discards some data at the start (the burn_in) and some at the end (due to the size of smoothing_window), returning NA in both cases, and it also returns NA where deaths < estimated deaths and estimated deaths > 0. I think the reason it is less suitable for smaller outbreaks is that these conditions are more common there, producing more NAs.
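A toy sketch of this point in base R (not using {cfr}; all values hypothetical and no delay correction applied): with few cases per day, a cumulative ratio settles down while a naive moving-window ratio keeps fluctuating, since each window contains little data.

```r
# Hypothetical small outbreak: ~5 cases/day for 30 days, true CFR of 10%
set.seed(1)
cases  <- rpois(30, lambda = 5)
deaths <- rbinom(30, size = cases, prob = 0.1)

# Cumulative ratio: uses all data up to each day, analogous in spirit to
# cfr_rolling() (but without the delay correction the package applies)
cumulative_cfr <- cumsum(deaths) / cumsum(cases)

# Naive 7-day moving-window ratio: only local data, analogous in spirit to
# the windowing behind cfr_time_varying() (again without delay correction)
window_cfr <- vapply(seq_along(cases), function(i) {
  idx <- max(1, i - 6):i
  sum(deaths[idx]) / sum(cases[idx])
}, numeric(1))

# The cumulative series converges towards the overall ratio; the windowed
# series remains noisy because each window holds only a handful of cases
```

This is only an intuition aid; the package functions additionally correct for the delay distribution and handle the NA conditions described above.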

@avallecam (Member, Author) commented Jan 31, 2024

Useful clarification.

  • cfr_rolling() shows, for each day, the estimate that cfr_static() would produce from the cumulative sum of cases and deaths up to that day
  • for cfr_time_varying(), we could say it is sensitive both to the length of the time-series (given the data discarded by burn_in and smoothing_window) and to the number of deaths in the outbreak (given the estimated-deaths constraint it requires).

About the different trends that we get from the two methods, any suggestions about how to discuss it?

@pratikunterwegs (Collaborator) commented Feb 1, 2024

About the different trends that we get from the two methods, any suggestions about how to discuss it?

I think the key is to discuss the static and time varying methods and where they apply. The rolling method is perhaps more useful to check whether an outbreak's CFR estimate has stabilised. The rolling and time-varying methods aren't really comparable, so I wouldn't really discuss them together. Is it worth mentioning some reasons not to interpret the rolling estimate in relation to the time-varying one?

@adamkucharski (Member)

Perhaps a useful rule of thumb is to discuss this in the context of sampling uncertainty, e.g. with 100 cases, the fatality risk estimate will, roughly speaking, have a 95% confidence interval of ±10% around the mean estimate (binomial CI). So if we have >100 cases with expected outcomes on a given day, we can get reasonable estimates of the time-varying CFR. But if we only have >100 cases over the course of the whole epidemic, we probably need to rely on the static version that uses the cumulative data.
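This rule of thumb can be checked with base R (the numbers below are illustrative, not from the thread's data): an exact binomial CI for 50 deaths among 100 cases, the worst case for interval width, has a half-width of roughly 10 percentage points.

```r
# Exact (Clopper-Pearson) 95% CI for a fatality risk estimated from 100 cases.
# Illustrative values: 50 deaths among 100 cases maximises the CI width.
ci_100 <- binom.test(x = 50, n = 100)$conf.int

# Half-width of the interval: roughly 0.10, i.e. +/- 10 percentage points
half_width <- diff(ci_100) / 2
```

Since the half-width shrinks roughly as 1/sqrt(n), having sufficiently many cases with expected outcomes on each day is what makes the time-varying estimate informative.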

@pratikunterwegs (Collaborator)

Thanks @adamkucharski! @avallecam, did you have any specific suggestions for where these clarifications should be added?

avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Feb 16, 2024
@avallecam (Member, Author)

did you have any specific suggestions for where these clarifications should be added?

In the tutorial episode drafted, we included the clarifications shared here. Please see that section in the working branch (edit suggestions welcome in the working PR). At the end, we refer to the vignette on cfr_time_varying().

Although this outbreak-size detail is mentioned briefly in the "Get started" vignette, it would probably be informative to complement the current vignette by showing that point in particular: how the time-varying method performs under different numbers of cases per day (our reference from the tutorials to the vignette could then be more specific). If this content gets too long, we could consider a separate vignette. If appropriate, it could be based on the reprexes above.

@pratikunterwegs (Collaborator)

Closed as answered, and this will be addressed by #128

avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024
avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024
avallecam added a commit to epiverse-trace/tutorials-middle that referenced this issue Mar 25, 2024