Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to accumulate observations #534

Merged
merged 18 commits into from
Feb 15, 2024
Merged

Add option to accumulate observations #534

merged 18 commits into from
Feb 15, 2024

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Jan 22, 2024

This PR adds an argument na to obs_opts with an "accumulate" option that adds any NA data points to the following data point in the inference. The idea is that this will allow fitting data at any time interval (e.g. weekly) including where spacing is irregular (e.g., monthly, or weekly where on some occasions data is reported a day later because of a holiday etc.).

Two points for potential discussion are:

  1. When setting this to accumulate the first data point is ignored as we don't know when we should start accumulating for that one. This means that for a someone who would like that data point to be considered it would be advantageous to add a dummy data point to the beginning of the data set. We could add an option to do that, or document it, or leave it unmentioned, or find another solution.
  2. Because we either skip or accumulate NAs we can't combine them, e.g. to have weekly data with missing values. Enabling this would, I think, require some sort of that marks dates explicitly as missing vs. NA.

I've updated estimate_secondary to still work with the examples/tests but it doesn't work with NA values yet. This is for another issue/PR. It was necessary to update estimate_secondary in order to pass tests/checks. This now also works with NA values.

Closes #531

Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 99c3559 is merged into main:

  •   :ballot_box_with_check:default: 33.1s -> 32s [-21.89%, +15.59%]
  •   :ballot_box_with_check:no_delays: 32.9s -> 35.1s [-10.48%, +24.14%]
  •   :ballot_box_with_check:random_walk: 9.24s -> 14.5s [-76.54%, +191.13%]
  •   :ballot_box_with_check:stationary: 20.1s -> 18s [-24.6%, +3.67%]
  •   :ballot_box_with_check:uncertain: 51.2s -> 52s [-17.56%, +20.65%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@sbfnk sbfnk requested a review from seabbs January 22, 2024 15:16
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 7ee4d94 is merged into main:

  •   :ballot_box_with_check:default: 32.2s -> 33.1s [-14.22%, +20.2%]
  •   :ballot_box_with_check:no_delays: 34.5s -> 35.6s [-4.78%, +10.62%]
  •   :ballot_box_with_check:random_walk: 9.4s -> 9.55s [-10.87%, +13.95%]
  •   :ballot_box_with_check:stationary: 18.9s -> 18.2s [-14.84%, +7.46%]
  •   :ballot_box_with_check:uncertain: 49.4s -> 48s [-21.87%, +16.14%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice.

Enabling this would, I think, require some sort of that marks dates explicitly as missing vs. NA.

I think this would be my preferred option as it would be more general but I also think it can be addressed in its own review as it would be a superset of this PR.

My thought on how that would work is to have a new variable (accumulate) that indicates which days should be summed.

R/opts.R Show resolved Hide resolved
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good and is a nice feature to have. Some reservations about the precise implementation but as flagged can be addressed by follow up work.

This means that for a someone who would like that data point to be considered it would be advantageous to add a dummy data point to the beginning of the data set. We could add an option to do that, or document it, or leave it unmentioned, or find another solution.

I think my preferred option here would be to add a message when this method is used? (i.e dropping the first data point and use a dummy if you wish).

R/opts.R Outdated Show resolved Hide resolved
inst/stan/functions/observation_model.stan Show resolved Hide resolved
inst/stan/functions/observation_model.stan Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
seabbs
seabbs previously approved these changes Feb 14, 2024
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to me. I think the outstanding points are all for a new issues so if you agree we can resolve and merge?

R/opts.R Outdated Show resolved Hide resolved
inst/stan/functions/observation_model.stan Show resolved Hide resolved
inst/stan/functions/observation_model.stan Show resolved Hide resolved
@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 14, 2024

LGTM to me. I think the outstanding points are all for a new issues so if you agree we can resolve and merge?

Yes sounds good to me.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 14, 2024

I think my preferred option here would be to add a message when this method is used? (i.e dropping the first data point and use a dummy if you wish).

See

if (na == "accumulate") {

Might be annoying but will remind us to implement a better solution if so.

@seabbs
Copy link
Contributor

seabbs commented Feb 14, 2024

Might be annoying but will remind us to implement a better solution if so.

If things move over to {cli} this could be made a once per session thing to tone it down a bit.

Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 2a32f95 is merged into main:

  •   :ballot_box_with_check:default: 30.8s -> 32.1s [-12.54%, +20.89%]
  •   :ballot_box_with_check:no_delays: 33.4s -> 39.7s [-23.84%, +61.15%]
  •   :ballot_box_with_check:random_walk: 8.96s -> 10.6s [-14%, +50.41%]
  •   :ballot_box_with_check:stationary: 17.5s -> 19.3s [-7.64%, +27.77%]
  •   :ballot_box_with_check:uncertain: 51.7s -> 51.7s [-14.15%, +14.2%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if df1fdc8 is merged into main:

  •   :ballot_box_with_check:default: 32.5s -> 47.8s [-52.33%, +147.07%]
  •   :ballot_box_with_check:no_delays: 31.9s -> 38.9s [-26.42%, +70.59%]
  •   :ballot_box_with_check:random_walk: 9.09s -> 9.24s [-4.27%, +7.66%]
  •   :ballot_box_with_check:stationary: 18.7s -> 32.7s [-92.66%, +243.41%]
  •   :ballot_box_with_check:uncertain: 51.4s -> 49.6s [-16.86%, +9.73%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@seabbs seabbs merged commit 50fc3cf into main Feb 15, 2024
14 checks passed
@seabbs seabbs deleted the accumulate-na branch February 15, 2024 12:21
sbfnk added a commit that referenced this pull request May 3, 2024
* add option to accumulate observations

* accumulate in estimate_secondary model

* add test for weekly accumulation

* check there's data to fit initial growth model

* ignore first observation when accumulating

* document "na" argument

* add news item

* update obs_opts tests

* make logical operator scalar

* make NA option work with estimate_secondary

* add tests

* Apply suggestions from code review

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
sbfnk added a commit that referenced this pull request May 3, 2024
* add option to accumulate observations

* accumulate in estimate_secondary model

* add test for weekly accumulation

* check there's data to fit initial growth model

* ignore first observation when accumulating

* document "na" argument

* add news item

* update obs_opts tests

* make logical operator scalar

* make NA option work with estimate_secondary

* add tests

* Apply suggestions from code review

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accumulate observations where data are NAs
2 participants