Maximum delay treated as observed and not nowcast #116
Hmm, interesting and important point! But I do not fully get it yet. Leaving your second point aside for a moment, I would have only expected that the "nowcast" for reference dates

And yes, the second issue is also something to discuss further. Theoretically, the maximum delay should be chosen such that no, or only very few, cases fall beyond it. But I agree that if it is chosen too short, it could bias the estimates of the expected cases over time. If we want to avoid this, I agree that we would have to switch to the "drop/ignore" strategy, meaning that we then risk biasing our case estimates downwards, but with no difference in the bias over time. I am not a big fan of the second-max-delay approach, as I still think that the maximum delay should be chosen large enough in the first place, and if we have efficiency issues we should try to solve them in a different way.
Yes exactly, this only impacts the final day (i.e. at horizon -max_delay), as the observation on that day is treated as an observation on that day (for

We don't want setting a short delay to introduce bias (as it currently does); we want it to make the uncertainty larger. What upside do you think there is to including this in the likelihood at the moment, given that it is biased by when we observe our data?
So I am thinking we roll back to the old definition: excluding counts beyond the maximum delay and normalising the probability distribution (as we want all expected cases to eventually be reported). If we don't do this, I think we are open to bias (if things remain the same), or we will have identifiability issues if the probability of report does not sum to 1 for the observed data. Very happy to discuss this more, though, as it is really key to get right. Shall we discuss this in detail at the next community call?
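To make the two strategies discussed here concrete, the following is a minimal numerical sketch (not epinowcast code; the function names and toy PMF are invented for illustration). It contrasts the "drop/ignore and renormalise" approach with the `develop` approach of bucketing all tail probability into the maximum-delay cell.

```python
# Hypothetical sketch, not epinowcast code: two ways to handle a delay
# PMF when the data only cover delays up to a maximum delay D.
import numpy as np

def truncate_and_renormalise(pmf, max_delay):
    """Drop probability mass beyond max_delay and rescale to sum to 1.

    This matches the 'drop/ignore' strategy: counts beyond D are excluded
    from the likelihood, and the delay distribution is conditioned on a
    report occurring within D days.
    """
    trunc = np.asarray(pmf[: max_delay + 1], dtype=float)
    return trunc / trunc.sum()

def bucket_tail_at_max(pmf, max_delay):
    """Add all probability mass beyond max_delay to the max_delay cell."""
    out = np.asarray(pmf[: max_delay + 1], dtype=float).copy()
    out[-1] += np.sum(pmf[max_delay + 1 :])
    return out

# A toy delay distribution over delays 0..9 (invented for illustration).
pmf = np.array([0.30, 0.25, 0.15, 0.10, 0.07, 0.05, 0.04, 0.02, 0.01, 0.01])
print(truncate_and_renormalise(pmf, 4))  # conditioned on delays 0..4
print(bucket_tail_at_max(pmf, 4))        # tail mass lumped into delay 4
```

Both variants give a distribution that sums to 1, but they answer different questions: the first models reporting conditional on being reported within D days, while the second pretends that all late reports arrive exactly at delay D, which only matches the data once every late report has actually arrived.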
In #121 I have hotfixed this by rolling back to excluding reports beyond the maximum delay and updating the documentation to this effect. The more I think about it, the more convinced I am that this is actually the correct strategy in general, but we definitely need to discuss it, as I know I am in the minority on this. My reasons are:
I think the way to frame this model, and the use of a maximum delay in general, is that we are shifting the question from nowcasting what will ever be reported to nowcasting what will be reported up to D. We should probably work on phrasing this better in the model vignette.
Closed in favour of #122
In develop, introduced in #113, we now model observations at the maximum delay by adding all of the probability that reports will occur after the maximum delay to the maximum delay. This works well for posterior predictions. Unfortunately, when we nowcast we treat all observed data as known and only nowcast unobserved dates. This means that we do not include reports beyond the maximum delay, either from the model or from the data (as they are not yet observed in real-world data). See here:
https://github.com/epiforecasts/epinowcast/blob/21b9311ca13d0939d9a9905f36e6d5284411ad48/inst/stan/epinowcast.stan#L218
The likely solution is to add 1 to the maximum delay, bucket the observations beyond this point at this additional date, and update the nowcast model code to nowcast this additional delay.
Another option is to alter the nowcast part of the generated quantities so that it uses a posterior prediction rather than the observation for the maximum delay.
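A minimal sketch of the first option above (all names and the toy PMF are invented for illustration, and this is not the actual Stan change): keep delays 0..D as observed data, collect all later reports in an extra bucket at D + 1, and treat that extra bucket as unobserved, i.e. something to be nowcast rather than conditioned on.

```python
# Hypothetical sketch of the "D + 1" option, not epinowcast code.
import numpy as np

def extend_with_tail_bucket(pmf, max_delay):
    """Return a PMF over delays 0..max_delay+1, with all tail mass in
    the extra final bucket. In the proposed scheme that final bucket
    would be nowcast, not treated as an observation."""
    ext = np.zeros(max_delay + 2)
    ext[: max_delay + 1] = pmf[: max_delay + 1]
    ext[max_delay + 1] = np.sum(pmf[max_delay + 1 :])
    return ext

# Same toy delay distribution over delays 0..9 as above.
pmf = np.array([0.30, 0.25, 0.15, 0.10, 0.07, 0.05, 0.04, 0.02, 0.01, 0.01])
ext = extend_with_tail_bucket(pmf, 4)
# ext[:5] are the delays observed as data; ext[5] holds the tail mass
# and would always be predicted by the model rather than conditioned on.
```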
This highlights another issue that we have potentially introduced: the count in the maximum-delay bucket keeps changing after the maximum delay has passed, being larger the further we are from the reference date (see the posterior predictions below). This likely introduces bias into the likelihood. One potential solution is to exclude the date which contains the bucketed additional reports (either the current max delay or one plus it). Another option is to only include it after some additional time has passed (in effect having a second maximum delay).
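The growth of the maximum-delay bucket over time can be seen in a small numerical sketch (invented toy numbers, not epinowcast output): with tail reports bucketed at delay D, the count "observed" in that cell depends on how long ago the reference date was, so the likelihood sees different values for the same cell depending on when the data snapshot was taken.

```python
# Hypothetical numerical sketch of the bias described above.
import numpy as np

D = 4                # maximum delay
total_cases = 1000
# Toy true report probabilities by delay 0..9 (invented for illustration).
pmf = np.array([0.30, 0.25, 0.15, 0.10, 0.07, 0.05, 0.04, 0.02, 0.01, 0.01])

for days_since_reference in [D, D + 2, D + 5]:
    # With tail reports bucketed at delay D, the count observed in that
    # cell by this snapshot is everything actually reported at delays
    # D..days_since_reference, so it grows as the reference date ages.
    observed_at_D = total_cases * pmf[D : days_since_reference + 1].sum()
    print(days_since_reference, round(observed_at_D))
```

Here the delay-D cell holds 70 cases when the reference date is exactly D days old, but 160 and 200 cases for older reference dates, even though the true reporting process is unchanged.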
Related to #114 (@adrian-lison).
The following code (using the feature-redesign-interface branch) reproduces this issue.