Feature missing reference support code #138

Merged
merged 42 commits into from Aug 6, 2022


seabbs
Collaborator

@seabbs seabbs commented Jul 25, 2022

This PR adds support code for modelling missing reference dates (#43). This includes updating epinowcast() to take enw_missing() as an argument (it cannot be optional, as Stan complains about missing data even when it has zero dimensions), updating enw_missing() to have a "not on" mode, adding support for missing effects and priors, adding a new grouped version of the observation likelihood in which we can add the likelihood changes from #107, and adding a prototype function for simulating missing data (we should probably look at adding this and enw_incidence_to_cumulative() to the package, but I am aware we are getting lots of exported functions and it may be a bit overwhelming/hard to support in the long term).

Alongside #107, this leaves post-processing as the final step needed to support missing dates to the same level as non-missing dates.

Note that the grouped observation likelihood is likely not quite there yet, as keeping track of the indexing is tricky. It would be easier if both the grouped and snapshot versions could be used without missing data (as they should return the same thing), but this might be a bit annoying to support internally.

@seabbs seabbs added enhancement New feature or request high-priority labels Jul 25, 2022
@seabbs seabbs self-assigned this Jul 25, 2022
@seabbs seabbs requested a review from adrian-lison July 25, 2022 22:05
Collaborator

@adrian-lison adrian-lison left a comment


Again, great work, really close now to getting the missingness model running.

If I understand correctly, for the missingness likelihood, we

  • still need to integrate the ref_miss_prop
  • add a matrix storing the without-reference date obs while going through the snapshots
  • do the broadcasting as proposed in #107 (Add stan code changes for missing data model) for the without-reference date matrix
  • transform the matrix into a vector by log-sum-exping over the columns and
  • add another call to obs_lmpfs for this second vector

Anything else?
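
For illustration only (this is not code from the PR, which is written in Stan): a minimal Python sketch of the log-sum-exp step listed above. It assumes the without-reference date matrix holds log-scale values, with one row per report date and one column per reference date, and that "over the columns" means reducing across the columns of each row. The function names and the matrix layout are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scale values."""
    largest = max(values)
    if largest == -math.inf:
        # Every entry is log(0), so the sum is zero on the natural scale.
        return -math.inf
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def collapse_columns(log_matrix):
    """Reduce each row of a log-scale matrix to a single log-scale value.

    Assumed layout (hypothetical): rows are report dates, columns are
    reference dates.
    """
    return [log_sum_exp(row) for row in log_matrix]


# Two report dates, two reference dates each, stored on the log scale.
log_expected = [
    [math.log(1.0), math.log(2.0)],
    [math.log(3.0), math.log(4.0)],
]
log_by_report_date = collapse_columns(log_expected)
```

Subtracting the row maximum before exponentiating keeps the sum from underflowing when the log-scale values are very negative.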

@seabbs
Collaborator Author

seabbs commented Jul 27, 2022

Yes, I think you have got all the changes we need. We also need some postprocessing work to make plotting/summary etc part of the tooling. The only other things I want to add to this PR are:

  • some tests that the two delay_ functions return the same thing.
  • Expose the ability to switch between them to the user.
  • Add a warning to enw_missing() indicating that it is not currently supported in the available model.
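
For illustration only: a rough Python sketch of the kind of equivalence check the first item describes. It is not the package's test code. A Poisson log-probability stands in for the real observation model, and the function names, the flat layout (snapshots stored end to end), and the group labels are all assumptions.

```python
import math


def poisson_lpmf(k, lam):
    """Log-probability of count k under a Poisson with mean lam (stand-in model)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def lpmf_by_snapshot(obs, expected, snapshot_lengths):
    """Accumulate the log-likelihood one snapshot at a time."""
    total, start = 0.0, 0
    for n in snapshot_lengths:
        total += sum(
            poisson_lpmf(k, lam)
            for k, lam in zip(obs[start:start + n], expected[start:start + n])
        )
        start += n
    return total


def lpmf_by_group(obs, expected, snapshot_lengths, snapshot_groups):
    """Accumulate the log-likelihood per group, then sum across groups."""
    per_group, start = {}, 0
    for n, group in zip(snapshot_lengths, snapshot_groups):
        part = sum(
            poisson_lpmf(k, lam)
            for k, lam in zip(obs[start:start + n], expected[start:start + n])
        )
        per_group[group] = per_group.get(group, 0.0) + part
        start += n
    return sum(per_group.values())


# Three snapshots of lengths 2, 2 and 1, belonging to groups 0, 1 and 0.
obs = [3, 1, 0, 2, 4]
expected = [2.5, 1.0, 0.5, 2.0, 3.5]
lengths = [2, 2, 1]
groups = [0, 1, 0]

by_snapshot = lpmf_by_snapshot(obs, expected, lengths)
by_group = lpmf_by_group(obs, expected, lengths, groups)
```

Without missing data the two totals should agree up to floating point error, which is the property such a test would assert.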

@seabbs seabbs mentioned this pull request Jul 27, 2022
4 tasks
@seabbs
Collaborator Author

seabbs commented Jul 29, 2022

Note that rather than adding tests I instead added more lower level functions. I also used these to update the generated quantities to use the flat data structure.

@seabbs seabbs mentioned this pull request Jul 30, 2022
6 tasks
@seabbs
Collaborator Author

seabbs commented Jul 30, 2022

> still need to integrate the ref_miss_prop

Added in 706388c

@epinowcast epinowcast deleted a comment from codecov bot Jul 30, 2022
@codecov

codecov bot commented Jul 30, 2022

Codecov Report

Merging #138 (1b054a2) into develop (57d2673) will decrease coverage by 0.06%.
The diff coverage is 91.76%.

@@             Coverage Diff             @@
##           develop     #138      +/-   ##
===========================================
- Coverage    86.20%   86.14%   -0.07%     
===========================================
  Files           12       12              
  Lines         1196     1234      +38     
===========================================
+ Hits          1031     1063      +32     
- Misses         165      171       +6     
Impacted Files       Coverage Δ
R/check.R            73.46% <22.22%> (-11.54%) ⬇️
R/epinowcast.R       92.10% <100.00%> (+2.10%) ⬆️
R/model-modules.R    99.31% <100.00%> (+0.05%) ⬆️
R/model-tools.R      94.21% <100.00%> (+0.46%) ⬆️


@seabbs
Collaborator Author

seabbs commented Jul 30, 2022

I think everything planned for this PR is now in place, so I have pinged for another review. As there are quite a few png updates in this PR, I think we should go for a squash merge.

A quick summary of what is here (sorry got a little out of hand):

  • Passes in missing model input to stan code
  • Added model machinery for the missingness model effects to stan
  • Updates handling of enw_missing() to make it work with stan
  • Reorganises the stan code slightly to be easier to read, including dropping repeated comments
  • Splits delay_lmpf into two functions, one aggregated over snapshots and one over groups. This is exposed to the user as a choice, with the grouped option forced when the missing model is present.
  • Both of these functions now depend on a vectorised version of expected_obs_from_index called expected_obs_from_snaps.
  • This has also been used to refactor the generated quantities. I have also switched all parts of this to use the flattened data storage structure.
  • Added regression tests for posterior predictions, nowcasts and key parameters. These are rough and only a first pass but better than nothing
  • Added a new example for missing data along with some prototype functions for making missing data (these likely can be refined and added to the package in another PR).

I think that, as long as they are correct, most of these changes are fairly straightforward. There is an argument that we liked the gq being verbose and different from the other code as a check on our model, but to be honest I prefer unit tests etc., and hopefully sharing more code will make future additions easier.

Note: Going with a flat structure here poses some potential problems for allocating observations to report dates. One solution would be to pass in a lookup for report dates and groups and iterate over it. I think the proposed matrix approach will still work, but I may have made it harder.
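
For illustration only: a minimal Python sketch of the lookup idea, not the approach taken in the Stan code. It assumes a single group, snapshots stored end to end in one flat vector, integer reference times, and that the report time of an entry is its reference time plus its delay. All names are hypothetical.

```python
def snapshot_starts(lengths):
    """Start index of each snapshot within the flat vector."""
    starts, position = [], 0
    for n in lengths:
        starts.append(position)
        position += n
    return starts


def build_report_lookup(ref_times, lengths):
    """Map each report time to the flat indices that fall on it.

    Assumes report time = reference time + delay, with delays 0..n-1.
    """
    lookup = {}
    for ref, start, n in zip(ref_times, snapshot_starts(lengths), lengths):
        for delay in range(n):
            lookup.setdefault(ref + delay, []).append(start + delay)
    return lookup


def sum_by_report_time(flat_obs, lookup):
    """Total the flat observations allocated to each report time."""
    return {t: sum(flat_obs[i] for i in idx) for t, idx in sorted(lookup.items())}


# Three snapshots (reference times 0, 1, 2) with 3, 2 and 1 delays observed.
ref_times = [0, 1, 2]
lengths = [3, 2, 1]
flat_obs = [5, 3, 1, 4, 2, 6]

lookup = build_report_lookup(ref_times, lengths)
totals = sum_by_report_time(flat_obs, lookup)
```

The lookup is built once from the data, so iterating over it avoids recomputing indices for every evaluation. Supporting multiple groups would need the key to carry the group as well.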

@seabbs seabbs requested a review from adrian-lison July 30, 2022 18:59
@seabbs
Collaborator Author

seabbs commented Aug 4, 2022

Woops!

Collaborator

@adrian-lison adrian-lison left a comment


It all looks good to me. I added one proposal commit.

@seabbs seabbs merged commit ab48d8a into develop Aug 6, 2022
@seabbs seabbs deleted the feature-missing-support-code branch August 6, 2022 12:04