-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random effects and random walks (issues #223 and #166) #228
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #228 +/- ##
===========================================
- Coverage 97.31% 97.27% -0.05%
===========================================
Files 14 14
Lines 1527 1540 +13
===========================================
+ Hits 1486 1498 +12
- Misses 41 42 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There is still a remaining issue of trying to use the same random effect as used in a random walk (see below). library(epinowcast)
# U/se meta data for references dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
germany_covid19_hosp[location == "DE"][
age_group %in% c("0-04", "05-14", "15-34")
],
remove_days = 10
)
obs <- enw_filter_reference_dates(obs, include_days = 10)
pobs <- enw_preprocess_data(
obs, by = c("age_group", "location"), max_delay = 10
)
data <- pobs$metareference[[1]]
# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | week) + rw(week), data)
# Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, : ‘min’ not meaningful for factors I have resolved this by changing the ordering of processing in I have also added new tag |
The orginal reprex now looks as follows which I think is correct: library(epinowcast)
# U/se meta data for references dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
germany_covid19_hosp[location == "DE"],
remove_days = 40
)
obs <- enw_filter_reference_dates(obs, include_days = 40)
pobs <- enw_preprocess_data(obs, by = c("age_group", "location"))
data <- pobs$metareference[[1]]
# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | age_group) + rw(week), data)
#> $formula
#> [1] "~1 + (1 | age_group) + rw(week)"
#>
#> $parsed_formula
#> $parsed_formula$fixed
#> [1] "1"
#>
#> $parsed_formula$random
#> $parsed_formula$random[[1]]
#> 1 | age_group
#>
#>
#> $parsed_formula$rw
#> [1] "rw(week)"
#>
#>
#> $expanded_formula
#> [1] "~1 + age_group + cweek1 + cweek2 + cweek3 + cweek4 + cweek5"
#>
#> $fixed
#> $fixed$formula
#> [1] "~1 + age_group + cweek1 + cweek2 + cweek3 + cweek4 + cweek5"
#>
#> $fixed$design
#> (Intercept) age_group00-04 age_group00+ age_group05-14 age_group15-34
#> 1 1 0 1 0 0
#> 8 1 0 1 0 0
#> 15 1 0 1 0 0
#> 22 1 0 1 0 0
#> 29 1 0 1 0 0
#> 36 1 0 1 0 0
#> 42 1 1 0 0 0
#> 49 1 1 0 0 0
#> 56 1 1 0 0 0
#> 56 1 1 0 0 0
#> 63 1 1 0 0 0
#> 70 1 1 0 0 0
#> 77 1 1 0 0 0
#> 83 1 0 0 1 0
#> 90 1 0 0 1 0
#> 97 1 0 0 1 0
#> 104 1 0 0 1 0
#> 111 1 0 0 1 0
#> 118 1 0 0 1 0
#> 124 1 0 0 0 1
#> 131 1 0 0 0 1
#> 138 1 0 0 0 1
#> 145 1 0 0 0 1
#> 152 1 0 0 0 1
#> 159 1 0 0 0 1
#> 165 1 0 0 0 0
#> 172 1 0 0 0 0
#> 179 1 0 0 0 0
#> 186 1 0 0 0 0
#> 193 1 0 0 0 0
#> 200 1 0 0 0 0
#> 206 1 0 0 0 0
#> 213 1 0 0 0 0
#> 220 1 0 0 0 0
#> 227 1 0 0 0 0
#> 234 1 0 0 0 0
#> 241 1 0 0 0 0
#> 247 1 0 0 0 0
#> 254 1 0 0 0 0
#> 261 1 0 0 0 0
#> 268 1 0 0 0 0
#> 275 1 0 0 0 0
#> 282 1 0 0 0 0
#> age_group35-59 age_group60-79 age_group80+ cweek1 cweek2 cweek3 cweek4
#> 1 0 0 0 0 0 0 0
#> 8 0 0 0 1 0 0 0
#> 15 0 0 0 1 1 0 0
#> 22 0 0 0 1 1 1 0
#> 29 0 0 0 1 1 1 1
#> 36 0 0 0 1 1 1 1
#> 42 0 0 0 0 0 0 0
#> 49 0 0 0 1 0 0 0
#> 56 0 0 0 1 1 0 0
#> 63 0 0 0 1 1 1 0
#> 70 0 0 0 1 1 1 1
#> 77 0 0 0 1 1 1 1
#> 83 0 0 0 0 0 0 0
#> 90 0 0 0 1 0 0 0
#> 97 0 0 0 1 1 0 0
#> 104 0 0 0 1 1 1 0
#> 111 0 0 0 1 1 1 1
#> 118 0 0 0 1 1 1 1
#> 124 0 0 0 0 0 0 0
#> 131 0 0 0 1 0 0 0
#> 138 0 0 0 1 1 0 0
#> 145 0 0 0 1 1 1 0
#> 152 0 0 0 1 1 1 1
#> 159 0 0 0 1 1 1 1
#> 165 1 0 0 0 0 0 0
#> 172 1 0 0 1 0 0 0
#> 179 1 0 0 1 1 0 0
#> 186 1 0 0 1 1 1 0
#> 193 1 0 0 1 1 1 1
#> 200 1 0 0 1 1 1 1
#> 206 0 1 0 0 0 0 0
#> 213 0 1 0 1 0 0 0
#> 220 0 1 0 1 1 0 0
#> 227 0 1 0 1 1 1 0
#> 234 0 1 0 1 1 1 1
#> 241 0 1 0 1 1 1 1
#> 247 0 0 1 0 0 0 0
#> 254 0 0 1 1 0 0 0
#> 261 0 0 1 1 1 0 0
#> 268 0 0 1 1 1 1 0
#> 275 0 0 1 1 1 1 1
#> 282 0 0 1 1 1 1 1
#> cweek5
#> 1 0
#> 8 0
#> 15 0
#> 22 0
#> 29 0
#> 36 1
#> 42 0
#> 49 0
#> 56 0
#> 63 0
#> 70 0
#> 77 1
#> 83 0
#> 90 0
#> 97 0
#> 104 0
#> 111 0
#> 118 1
#> 124 0
#> 131 0
#> 138 0
#> 145 0
#> 152 0
#> 159 1
#> 165 0
#> 172 0
#> 179 0
#> 186 0
#> 193 0
#> 200 1
#> 206 0
#> 213 0
#> 220 0
#> 227 0
#> 234 0
#> 241 1
#> 247 0
#> 254 0
#> 261 0
#> 268 0
#> 275 0
#> 282
#
#> $fixed$index
#> [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4
#> [26] 4 4 4 5 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8
#> [51] 8 8 8 8 8 9 9 9 9 9 9 9 10 10 10 10 10 10 10 11 11 11 11 11 11
#> [76] 11 12 12 12 12 12 12 13 13 13 13 13 13 13 14 14 14 14 14 14 14 15 15 15 15
#> [101] 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 17 17 18 18 18 18 18 18 19 19
#> [126] 19 19 19 19 19 20 20 20 20 20 20 20 21 21 21 21 21 21 21 22 22 22 22 22 22
#> [151] 22 23 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 25 26 26 26 26
#> [176] 26 26 26 27 27 27 27 27 27 27 28 28 28 28 28 28 28 29 29 29 29 29 29 29 30
#> [201] 30 30 30 30 30 31 31 31 31 31 31 31 32 32 32 32 32 32 32 33 33 33 33 33 33
#> [226] 33 34 34 34 34 34 34 34 35 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37
#> [251] 37 37 37 38 38 38 38 38 38 38 39 39 39 39 39 39 39 40 40 40 40 40 40 40 41
#> [276] 41 41 41 41 41 41 42 42 42 42 42 42
#>
#>
#> $random
#> $random$formula
#> [1] "~0 + fixed + age_group + rw__week"
#>
#> $random$design
#> fixed age_group rw__week
#> 1 0 1 0
#> 2 0 1 0
#> 3 0 1 0
#> 4 0 1 0
#> 5 0 1 0
#> 6 0 1 0
#> 7 0 1 0
#> 8 0 0 1
#> 9 0 0 1
#> 10 0 0 1
#> 11 0 0 1
#> 12 0 0 1
#> attr(,"assign")
#> [1] 1 2 3
#>
#> $random$index
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12
#>
#>
#> attr(,"class")
#> [1] "enw_formula" "list" Created on 2023-04-12 with reprex v2.0.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is quite a complex PR I have also self-reviewed it. Everything looks as expected to me. The impact on the Germany vignette of the random walk/effect misspecification is fairly minimal which suggests to me the impact on users is also likely to be minimal in most cases where this issue comes into play.
…andom walk + erorr when same used as re and fixed
I've added an issue (#235) for refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a second round of self-review and rerun all the examples + compared these to outputs from running them using the code on develop
.
Everything looks sensible to me and I think the issue diagnosed is a real-one. Going to merge this here as we have the final review step when we move to release 0.2.1
.
As flagged by @Bisaloo in #223
no_contrasts
was defined but not used inenw_formula
. I have explored this potential bug in detail and found that it has no functional impact and hence dropped the definition ofno_contrasts
with no other changes.However, in revisiting this area of the code base I also revisited #166 and this appears to be more problematic. The essential issue appears to be that when the fixed design matrix is specified it is ordered fixed terms, random walk terms, and random effect terms but when the random effects design matrix is constructed it is ordered fixed, random effects, and random walk terms (I think this is the root of the issue @Gulfa was having which would make sense as it would really cause problems when trying to forecast).
This means that variances are improperly assigned (as the random effects variables are applied in order to the fixed effects design matrix) for formulas with both a random walk and a random effect. For most of the simple models we have been fitting this is not a significant issue but it definitely could be causing issues for users (for example potentially some of the work @jbracher and collabs have been doing) which would mostly show up as increased issues fitting their models though could also potentially bias posterior estimates. It is also an issue for the Germany showcase vignette (which I have updated here to highlight potential issues) but not the original application where design matrices were constructed manually.
An obvious follow-up is that the code pattern described above is quite brittle and prone to this kind of issue. It is out of scope for this PR but we should add another feature issue to refactor this area of the code base to be more robust (which we will need as more formula features are added).
Reprex (from #166):
Created on 2023-04-11 with reprex v2.0.2
This gives us the following metadata
data.frame
which as can be seen has a different order than the fixed effects design matrix. This means that variance parameters may be being improperly assigned as @Gulfa flagged.To do:
enw_replace_priors()
enw_replace_priors()
fixenw_replace_priors()
to update from a fitted model object.