diff --git a/DESCRIPTION b/DESCRIPTION index e0acd5211..d286fb8b8 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: epinowcast -Title: Hierarchical Nowcasting of Right Censored Epidemiological Counts +Title: Flexible Hierarchical Nowcasting Version: 0.1.0.1000 Authors@R: c(person(given = "Sam Abbott", @@ -21,14 +21,16 @@ Authors@R: email = "me.dewitt.jr@gmail.com", comment = c(ORCID = "0000-0001-8940-1967"))) Description: Tools to enable flexible and efficient hierarchical nowcasting of - right censored epidemiological counts using a semi-mechanistic Bayesian - method with support for both day of reference and day of report effects. - Nowcasting in this context is the estimation of the total notifications - (for example hospitalisations or deaths) that will be reported for a given - date based on those currently reported and the pattern of reporting for - previous days. This can be useful when tracking the spread of infectious - disease in real-time as otherwise changes in trends can be obfuscated by - partial reporting or their detection may be delayed due to the use of simpler methods like truncation. + right-truncated epidemiological time-series using a semi-mechanistic + Bayesian model with support for a range of reporting and generative + processes. Nowcasting, in this context, is gaining situational awareness + using currently available observations and the reporting patterns of + historical observations. This can be useful when tracking the spread of + infectious disease in real-time: without nowcasting, changes in trends can + be obfuscated by partial reporting or their detection may be delayed due to + the use of simpler methods like truncation. While the package has been + designed with epidemiological applications in mind, it could be applied + to any set of right-truncated time-series count data. License: MIT + file LICENSE URL: https://epiforecasts.io/epinowcast/, https://github.com/epiforecasts/epinowcast/ diff --git a/NEWS.md b/NEWS.md index 08bccc428..079c59ed2 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,6 +8,7 @@ This is a major release and contains multiple breaking changes. If needing the o ## Package +* Renamed the package and updated the description to give more clarity about the problem space it focusses on. See [#110](https://github.com/epiforecasts/epinowcast/pull/110) by [@seabbs](https://github.com/seabbs). * A new helper function `enw_delay_metadata()` has been added. This produces metadata about the delay distribution vector that may be helpful in future modelling. This prepares the way for [#4](https://github.com/epiforecasts/epinowcast/issues/4) where this data frame will be combined with the reference metadata in order to build non-parametric hazard reference and delay based models. In addition to adding this function, it has also been added to the output of `enw_preprocess_data()` in order to make the metadata readily available to end-users. See [#80](https://github.com/epiforecasts/epinowcast/pull/80) by [@seabbs](https://github.com/seabbs). * Two new helper functions `enw_filter_reference_dates()` and `enw_filter_report_dates()` have been added. These replace `enw_retrospective_data()` but allow users to similarly construct retrospective data. Splitting these functions out into components also allows for additional use cases that were not previously possible. Note that by definition it is assumed that a report date for a given reference date must be the equal or greater (i.e a report cannot happen before the event being reported occurs). See [#82](https://github.com/epiforecasts/epinowcast/pull/82) by [@sbfnk](https://github.com/sbfnk) and [@seabbs](https://github.com/seabbs). * The internal grouping variables have been refactored to reduce the chance of clashes with columns in the data frames supplied by the user. There will also be an error thrown in case of a variable clash, making preprocessing safer. See [#102](https://github.com/epiforecasts/epinowcast/pull/102) by [@adrian-lison](https://github.com/adrian-lison) and [@seabbs](https://github.com/seabbs), which solves [#99](https://github.com/epiforecasts/epinowcast/issues/99). diff --git a/R/epinowcast.R b/R/epinowcast.R index df6fe3956..82be5400a 100644 --- a/R/epinowcast.R +++ b/R/epinowcast.R @@ -1,4 +1,4 @@ -#' @title Nowcast right censored data +#' @title Nowcast using partially observed data #' #' @description Provides a user friendly interface around package functionality #' to produce a nowcast from observed preprocessed data, a reference model, and diff --git a/README.Rmd b/README.Rmd index 269af39a6..df8fb15f4 100644 --- a/README.Rmd +++ b/README.Rmd @@ -15,7 +15,7 @@ knitr::opts_chunk$set( ) ``` -# Hierarchical nowcasting of right censored epidemiological counts +# Flexible hierarchical nowcasting [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental) @@ -28,7 +28,7 @@ license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/ [![DOI](https://zenodo.org/badge/422611952.svg)](https://zenodo.org/badge/latestdoi/422611952) -This package contains tools to enable flexible and efficient hierarchical nowcasting of right censored epidemiological counts using a semi-mechanistic Bayesian method with support for both day of reference and day of report effects. Nowcasting in this context is the estimation of the total notifications (for example hospitalisations or deaths) that will be reported for a given date based on those currently reported and the pattern of reporting for previous days. This can be useful when tracking the spread of infectious disease in real-time as otherwise changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. +Tools to enable flexible and efficient hierarchical nowcasting of right-truncated epidemiological time-series using a semi-mechanistic Bayesian model with support for a range of reporting and generative processes. Nowcasting, in this context, is gaining situational awareness using currently available observations and the reporting patterns of historical observations. This can be useful when tracking the spread of infectious disease in real-time: without nowcasting, changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. While the package has been designed with epidemiological applications in mind, it could be applied to any set of right-truncated time-series count data. ## Installation diff --git a/README.md b/README.md index e2fa80283..3ba8a8507 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ -# Hierarchical nowcasting of right censored epidemiological counts +# Flexible hierarchical nowcasting [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental) @@ -17,17 +17,17 @@ contributors](https://img.shields.io/github/contributors/epiforecasts/epinowcast [![DOI](https://zenodo.org/badge/422611952.svg)](https://zenodo.org/badge/latestdoi/422611952) -This package contains tools to enable flexible and efficient -hierarchical nowcasting of right censored epidemiological counts using a -semi-mechanistic Bayesian method with support for both day of reference -and day of report effects. Nowcasting in this context is the estimation -of the total notifications (for example hospitalisations or deaths) that -will be reported for a given date based on those currently reported and -the pattern of reporting for previous days. This can be useful when -tracking the spread of infectious disease in real-time as otherwise -changes in trends can be obfuscated by partial reporting or their -detection may be delayed due to the use of simpler methods like -truncation. +Tools to enable flexible and efficient hierarchical nowcasting of +right-truncated epidemiological time-series using a semi-mechanistic Bayesian +model with support for a range of reporting and generative processes. +Nowcasting, in this context, is gaining situational awareness using +currently available observations and the reporting patterns of +historical observations. This can be useful when tracking the spread of +infectious disease in real-time: without nowcasting, changes in trends can be +obfuscated by partial reporting or their detection may be delayed due to +the use of simpler methods like truncation. While the package has been +designed with epidemiological applications in mind, it could be +applied to any set of right-truncated time-series count data. ## Installation @@ -215,12 +215,12 @@ nowcast <- epinowcast(pobs, ) #> Running MCMC with 2 parallel chains, with 2 thread(s) per chain... #> -#> Chain 1 finished in 71.8 seconds. -#> Chain 2 finished in 92.8 seconds. +#> Chain 2 finished in 50.2 seconds. +#> Chain 1 finished in 51.8 seconds. #> #> Both chains finished successfully. -#> Mean chain execution time: 82.3 seconds. -#> Total execution time: 92.9 seconds. +#> Mean chain execution time: 51.0 seconds. +#> Total execution time: 52.0 seconds. ``` ### Results @@ -237,11 +237,11 @@ nowcast #> metadelay time snapshots by groups max_delay max_date #> 1: 41 41 1 40 2021-08-22 #> fit data fit_args samples max_rhat -#> 1: 1000 1.03 +#> 1: 1000 1.02 #> divergent_transitions per_divergent_transitions max_treedepth #> 1: 0 0 8 #> no_at_max_treedepth per_at_max_treedepth run_time -#> 1: 271 0.271 92.9 +#> 1: 10 0.01 52 ``` Summarise the nowcast for the latest snapshot of data. @@ -263,26 +263,26 @@ nowcast |> #> 10: 2021-07-23 2021-08-22 1 86 DE 00+ 86 #> cum_prop_reported delay prop_reported mean median sd mad q5 #> 1: 1 39 0 72.000 72 0.0000000 0.0000 72 -#> 2: 1 38 0 69.044 69 0.2147326 0.0000 69 -#> 3: 1 37 0 47.081 47 0.3074631 0.0000 47 -#> 4: 1 36 0 65.194 65 0.4388127 0.0000 65 -#> 5: 1 35 0 50.272 50 0.5350722 0.0000 50 -#> 6: 1 34 0 36.220 36 0.4731002 0.0000 36 -#> 7: 1 33 0 94.457 94 0.6990276 0.0000 94 -#> 8: 1 32 0 91.705 91 0.9158678 0.0000 91 -#> 9: 1 31 0 100.101 100 1.1207386 1.4826 99 -#> 10: 1 30 0 87.198 87 1.1866778 1.4826 86 -#> q95 rhat ess_bulk ess_tail -#> 1: 72.00 NA NA NA -#> 2: 69.00 1.0001418 975.6787 970.3968 -#> 3: 48.00 0.9990071 900.4571 882.7382 -#> 4: 66.00 0.9994697 900.4178 758.2049 -#> 5: 51.00 1.0017612 707.0578 679.5653 -#> 6: 37.00 0.9997234 969.2497 963.5052 -#> 7: 96.00 0.9982308 950.4162 859.6649 -#> 8: 93.05 1.0002517 1004.3908 1008.6991 -#> 9: 102.00 1.0020085 948.5313 936.2672 -#> 10: 89.00 1.0000114 1041.1571 858.5731 +#> 2: 1 38 0 69.046 69 0.2189336 0.0000 69 +#> 3: 1 37 0 47.096 47 0.3144565 0.0000 47 +#> 4: 1 36 0 65.176 65 0.4552266 0.0000 65 +#> 5: 1 35 0 50.271 50 0.5251998 0.0000 50 +#> 6: 1 34 0 36.242 36 0.5096035 0.0000 36 +#> 7: 1 33 0 94.457 94 0.6637707 0.0000 94 +#> 8: 1 32 0 91.738 92 0.8945145 1.4826 91 +#> 9: 1 31 0 100.032 100 1.0545559 1.4826 99 +#> 10: 1 30 0 87.159 87 1.1344629 1.4826 86 +#> q95 rhat ess_bulk ess_tail +#> 1: 72 NA NA NA +#> 2: 69 1.0007765 1006.9282 1002.6983 +#> 3: 48 1.0000955 786.6752 783.7836 +#> 4: 66 0.9993755 940.8320 855.6696 +#> 5: 51 1.0023047 1000.5282 955.8951 +#> 6: 37 1.0006436 989.9089 867.0135 +#> 7: 96 0.9982502 1042.2918 882.8528 +#> 8: 94 1.0013474 919.6210 759.2589 +#> 9: 102 0.9993403 983.8782 930.9138 +#> 10: 89 1.0013534 861.8767 886.7324 ``` Plot the summarised nowcast against currently observed data (or @@ -331,17 +331,17 @@ samples[, (cols) := lapply(.SD, frollsum, n = 7), #> 33999: 2021-08-22 2021-08-22 1 45 DE 00+ 1093 #> 34000: 2021-08-22 2021-08-22 1 45 DE 00+ 1093 #> cum_prop_reported delay prop_reported .chain .iteration .draw sample -#> 1: 1 33 0 1 1 1 433 -#> 2: 1 33 0 1 2 2 433 -#> 3: 1 33 0 1 3 3 433 -#> 4: 1 33 0 1 4 4 435 -#> 5: 1 33 0 1 5 5 437 +#> 1: 1 33 0 1 1 1 435 +#> 2: 1 33 0 1 2 2 435 +#> 3: 1 33 0 1 3 3 438 +#> 4: 1 33 0 1 4 4 436 +#> 5: 1 33 0 1 5 5 433 #> --- -#> 33996: 1 0 1 2 496 996 2233 -#> 33997: 1 0 1 2 497 997 2024 -#> 33998: 1 0 1 2 498 998 2283 -#> 33999: 1 0 1 2 499 999 2109 -#> 34000: 1 0 1 2 500 1000 1806 +#> 33996: 1 0 1 2 496 996 2107 +#> 33997: 1 0 1 2 497 997 2374 +#> 33998: 1 0 1 2 498 998 2089 +#> 33999: 1 0 1 2 499 999 1975 +#> 34000: 1 0 1 2 500 1000 2044 latest_germany_hosp_7day <- copy(latest_germany_hosp)[ , confirm := frollsum(confirm, n = 7) @@ -372,14 +372,14 @@ following, #> #> To cite epinowcast in publications use: #> - #> Sam Abbott (2021). epinowcast: Hierarchical nowcasting of right - #> censored epidemiological counts, DOI: 10.5281/zenodo.5637165 + #> Sam Abbott, Adrian Lison, and Sebastian Funk (2021). epinowcast: Flexible + #> hierarchical nowcasting, DOI: 10.5281/zenodo.5637165 #> #> A BibTeX entry for LaTeX users is #> #> @Article{, - #> title = {epinowcast: Hierarchical nowcasting of right censored epidemiological counts}, - #> author = {Sam Abbott}, + #> title = {epinowcast: Flexible hierarchical nowcasting}, + #> author = {Sam Abbott and Adrian Lison and Sebastian Funk}, #> journal = {Zenodo}, #> year = {2021}, #> doi = {10.5281/zenodo.5637165}, diff --git a/_pkgdown.yml b/_pkgdown.yml index 42651577b..d582d650c 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -10,7 +10,7 @@ template: opengraph: image: src: man/figures/README-nowcast-1.png - alt: An R package for hierarchical nowcasting of right censored epidemiological counts + alt: An R package for flexible hierarchical nowcasting twitter: creator: "@seabbs" card: summary_large_image diff --git a/inst/CITATION b/inst/CITATION index c7e82edfd..0e8ef84f6 100644 --- a/inst/CITATION +++ b/inst/CITATION @@ -1,8 +1,9 @@ citHeader("To cite epinowcast in publications use:") citEntry(entry = "Article", - title = "epinowcast: Hierarchical nowcasting of right censored epidemiological counts", # nolint - author = personList(as.person("Sam Abbott")), + title = "epinowcast: Flexible hierarchical nowcasting", # nolint + author = personList(as.person("Sam Abbott"), as.person("Adrian Lison"), + as.person("Sebastian Funk")), journal = "Zenodo", year = "2021", volume = "", @@ -11,7 +12,7 @@ citEntry(entry = "Article", doi = "10.5281/zenodo.5637165", textVersion = - paste("Sam Abbott (2021).", - "epinowcast: Hierarchical nowcasting of right censored epidemiological counts, ", # nolint + paste("Sam Abbott, Adrian Lison, and Sebastian Funk (2021).", + "epinowcast: Flexible hierarchical nowcasting, ", # nolint "DOI: 10.5281/zenodo.5637165") ) diff --git a/man/epinowcast-package.Rd b/man/epinowcast-package.Rd index 6c6ab0d2c..33a1ae838 100644 --- a/man/epinowcast-package.Rd +++ b/man/epinowcast-package.Rd @@ -3,11 +3,11 @@ \docType{package} \name{epinowcast-package} \alias{epinowcast-package} -\title{epinowcast: Hierarchical Nowcasting of Right Censored Epidemiological Counts} +\title{epinowcast: Flexible Hierarchical Nowcasting} \description{ \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} -Tools to enable flexible and efficient hierarchical nowcasting of right censored epidemiological counts using a semi-mechanistic Bayesian method with support for both day of reference and day of report effects. Nowcasting in this context is the estimation of the total notifications (for example hospitalisations or deaths) that will be reported for a given date based on those currently reported and the pattern of reporting for previous days. This can be useful when tracking the spread of infectious disease in real-time as otherwise changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. +Tools to enable flexible and efficient hierarchical nowcasting of right truncated epidemiological time-series using a semi-mechanistic Bayesian method with support for a range of reporting and generative processes. Nowcasting, in this context, is gaining situational awareness using currently available observations and the reporting patterns of historical observations. This can be useful when tracking the spread of infectious disease in real-time as otherwise changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. While the package has been designed with the epidemiological application in mind, it could be applied to any set of right-truncated time-series count data. } \seealso{ Useful links: diff --git a/man/epinowcast.Rd b/man/epinowcast.Rd index 88d8a083f..e4bffc8ef 100644 --- a/man/epinowcast.Rd +++ b/man/epinowcast.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/epinowcast.R \name{epinowcast} \alias{epinowcast} -\title{Nowcast right censored data} +\title{Nowcast using partially observed data} \usage{ epinowcast( pobs, diff --git a/man/figures/README-nowcast-1.png b/man/figures/README-nowcast-1.png index e506ae486..acc5c89d7 100644 Binary files a/man/figures/README-nowcast-1.png and b/man/figures/README-nowcast-1.png differ diff --git a/man/figures/README-pp-1.png b/man/figures/README-pp-1.png index 592351f86..011bbdb81 100644 Binary files a/man/figures/README-pp-1.png and b/man/figures/README-pp-1.png differ diff --git a/man/figures/README-week_nowcast-1.png b/man/figures/README-week_nowcast-1.png index df1a189db..0f81a07e7 100644 Binary files a/man/figures/README-week_nowcast-1.png and b/man/figures/README-week_nowcast-1.png differ diff --git a/vignettes/germany-age-stratified-nowcasting.Rmd b/vignettes/germany-age-stratified-nowcasting.Rmd index 0e9e03c3e..b04753d08 100644 --- a/vignettes/germany-age-stratified-nowcasting.Rmd +++ b/vignettes/germany-age-stratified-nowcasting.Rmd @@ -41,7 +41,7 @@ options(mc.cores = 2) # Data -Nowcasting is effectively the estimation of reporting patterns for recently reported data. This requires data on these patterns for previous observations and typically this means the time series of data as reported on multiple consecutive days (in theory non-consecutive days could be used but this is not yet supported in `epinowcast`). +Nowcasting is effectively the estimation of reporting patterns for recently reported data. This requires data on these patterns for previous observations, which typically means the time series of data as reported on multiple consecutive days (in theory non-consecutive days could be used but this is not yet supported in `epinowcast`). Here we use COVID-19 hospitalisations by date of positive test in Germany stratified by age group available from up to the 1st of September 2020 (with 40 days of data included prior to this) as an example of data available in real-time and hospitalisations by date of positive test available up to 20th of October to represent hospitalisations as finally reported. These data are sourced from the [Robert Koch Institute via the Germany Nowcasting hub](https://github.com/KITmetricslab/hospitalization-nowcast-hub/wiki/Truth-data#role-an-definition-of-the-seven-day-hospitalization-incidence) where they are deconvolved from weekly data and days with negative reported hospitalisations are adjusted. @@ -69,7 +69,7 @@ retro_nat_germany #> 6027: 2021-07-23 DE 80+ 5 2021-09-01 ``` -Similarly we then find the data that were available on the 20th of October for these dates which will serve as the target "true" data. +Similarly we then find the data that were available on the 20th of October for these dates, which will serve as the target "true" data. ```r @@ -81,7 +81,7 @@ latest_nat_germany <- nat_germany_hosp |> # Data preprocessing -`epinowcast` works by assuming data has been preprocessed into the reporting format it requires coupled with meta data for both reference and report dates. `enw_preprocess_data()` can be used for this though users can also use the internal functions to produce their own custom preprocessing steps. It is at this stage that arbitrary groupings of observations can be defined which will then be propagated throughout all subsequent modelling steps. Here we have data stratified by age and so grouped by age group but in principle this could be any grouping or combination of groups independent of the reference and report date models. Here we also assume a maximum delay required to make the model identifiable. We set this to 40 days due to evidence of long reporting delays in this example data but note that in most cases the majority of right censoring occurs in the first few days and that increasing the maximum delay has a non-linear effect on run-time (i.e a 20 day delay will be much faster to fit a model for than a 40 day delay). Note also that under the current formulation delays longer than the maximum are ignored so that the adjusted estimate is really for data reported after the maximum delay rather than for finally reported data. +`epinowcast` works by assuming data has been preprocessed into the reporting format it requires, coupled with meta data for both reference and report dates. `enw_preprocess_data()` can be used for this, although users can also use the internal functions to produce their own custom preprocessing steps. It is at this stage that arbitrary groupings of observations can be defined, which will then be propagated throughout all subsequent modelling steps. Here we have data stratified by age, and hence grouped by age group, but in principle this could be any grouping or combination of groups independent of the reference and report date models. We furthermore assume a maximum delay required to make the model identifiable. We set this to 40 days due to evidence of long reporting delays in this example data. However, note that in most cases the majority of right truncation occurs in the first few days and that increasing the maximum delay has a non-linear effect on run-time (i.e. a model with a maximum delay of 20 days will be much faster to fit than with 40 days). Note also that under the current formulation delays longer than the maximum are ignored so that the adjusted estimate is really for data reported after the maximum delay rather than for finally reported data. Another key modelling choice we make at this stage is to model overall hospitalisations jointly with age groups rather than as an aggregation of age group estimates. This implicitly assumes that aggregated and non-aggregated data are not comparable (which may or may not be the case) but that the reporting process shares some of the same mechanisms. Another way to approach this would be to only model age stratified hospitalisations and then to aggregate the nowcast estimates into total counts after fitting the model. @@ -484,7 +484,7 @@ enw_plot_nowcast_quantiles( In all the models defined above we have assumed that the delay distribution, aside from report day effects, is parametric and has a lognormal distribution. Both of these assumptions may be less than optimal. Alternatives include assuming a different distributional form (such as the gamma distribution which is also supported by `epinowcast`) or assuming that the report delay is fully non-parametric which is not yet supported but will be in future package versions. -There are any number of additional models we could explore within the framework supported by `epinowcast` as well as a large number of alternative parameterisations that are not yet supported. For example, we could explore models with more complex reporting day effects, including holidays (supported in `epinowcast` either as a separate effect or by assuming they have the same reporting hazard as Sundays) and variation over time which would represent reporting delays changing independently of reference date (this would be similar to the time varying model we defined above but with this effect occurring in the report date model rather than in the reference date model). These choices are data dependent and domain knowledge needs to be used to assess the likely censoring mechanisms. +There are any number of additional models we could explore within the framework supported by `epinowcast` as well as a large number of alternative parameterisations that are not yet supported. For example, we could explore models with more complex reporting day effects, including holidays (supported in `epinowcast` either as a separate effect or by assuming they have the same reporting hazard as Sundays) and variation over time which would represent reporting delays changing independently of reference date (this would be similar to the time varying model we defined above but with this effect occurring in the report date model rather than in the reference date model). These choices are data dependent and domain knowledge needs to be used to assess the likely truncation mechanisms. If interested in expanding the functionality of the underlying model to address some of these issues note that `epinowcast` allows users to pass in their own models meaning that alternative parameterisations, for example altering the forecast model used for inferring expected observations, may be easily tested within the package infrastructure. Once this testing has been done alterations that increase the flexibility of the package model and improves its defaults are very welcome as pull requests. diff --git a/vignettes/germany-age-stratified-nowcasting.Rmd.orig b/vignettes/germany-age-stratified-nowcasting.Rmd.orig index e6aae2dd0..34155102d 100644 --- a/vignettes/germany-age-stratified-nowcasting.Rmd.orig +++ b/vignettes/germany-age-stratified-nowcasting.Rmd.orig @@ -47,7 +47,7 @@ options(mc.cores = 2) # Data -Nowcasting is effectively the estimation of reporting patterns for recently reported data. This requires data on these patterns for previous observations and typically this means the time series of data as reported on multiple consecutive days (in theory non-consecutive days could be used but this is not yet supported in `epinowcast`). +Nowcasting is effectively the estimation of reporting patterns for recently reported data. This requires data on these patterns for previous observations, which typically means the time series of data as reported on multiple consecutive days (in theory non-consecutive days could be used but this is not yet supported in `epinowcast`). Here we use COVID-19 hospitalisations by date of positive test in Germany stratified by age group available from up to the 1st of September 2020 (with 40 days of data included prior to this) as an example of data available in real-time and hospitalisations by date of positive test available up to 20th of October to represent hospitalisations as finally reported. These data are sourced from the [Robert Koch Institute via the Germany Nowcasting hub](https://github.com/KITmetricslab/hospitalization-nowcast-hub/wiki/Truth-data#role-an-definition-of-the-seven-day-hospitalization-incidence) where they are deconvolved from weekly data and days with negative reported hospitalisations are adjusted. @@ -62,7 +62,7 @@ retro_nat_germany <- nat_germany_hosp |> retro_nat_germany ``` -Similarly we then find the data that were available on the 20th of October for these dates which will serve as the target "true" data. +Similarly we then find the data that were available on the 20th of October for these dates, which will serve as the target "true" data. ```{r} latest_nat_germany <- nat_germany_hosp |> @@ -73,7 +73,7 @@ latest_nat_germany <- nat_germany_hosp |> # Data preprocessing -`epinowcast` works by assuming data has been preprocessed into the reporting format it requires coupled with meta data for both reference and report dates. `enw_preprocess_data()` can be used for this though users can also use the internal functions to produce their own custom preprocessing steps. It is at this stage that arbitrary groupings of observations can be defined which will then be propagated throughout all subsequent modelling steps. Here we have data stratified by age and so grouped by age group but in principle this could be any grouping or combination of groups independent of the reference and report date models. Here we also assume a maximum delay required to make the model identifiable. We set this to 40 days due to evidence of long reporting delays in this example data but note that in most cases the majority of right censoring occurs in the first few days and that increasing the maximum delay has a non-linear effect on run-time (i.e a 20 day delay will be much faster to fit a model for than a 40 day delay). Note also that under the current formulation delays longer than the maximum are ignored so that the adjusted estimate is really for data reported after the maximum delay rather than for finally reported data. +`epinowcast` works by assuming data has been preprocessed into the reporting format it requires, coupled with meta data for both reference and report dates. `enw_preprocess_data()` can be used for this, although users can also use the internal functions to produce their own custom preprocessing steps. It is at this stage that arbitrary groupings of observations can be defined, which will then be propagated throughout all subsequent modelling steps. Here we have data stratified by age, and hence grouped by age group, but in principle this could be any grouping or combination of groups independent of the reference and report date models. We furthermore assume a maximum delay required to make the model identifiable. We set this to 40 days due to evidence of long reporting delays in this example data. However, note that in most cases the majority of right truncation occurs in the first few days and that increasing the maximum delay has a non-linear effect on run-time (i.e. a model with a maximum delay of 20 days will be much faster to fit than with 40 days). Note also that under the current formulation delays longer than the maximum are ignored so that the adjusted estimate is really for data reported after the maximum delay rather than for finally reported data. Another key modelling choice we make at this stage is to model overall hospitalisations jointly with age groups rather than as an aggregation of age group estimates. This implicitly assumes that aggregated and non-aggregated data are not comparable (which may or may not be the case) but that the reporting process shares some of the same mechanisms. Another way to approach this would be to only model age stratified hospitalisations and then to aggregate the nowcast estimates into total counts after fitting the model.