Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random effects and random walks (issues #223 and #166) #228

Merged
merged 17 commits into from
Apr 17, 2023
Merged

Conversation

seabbs
Copy link
Collaborator

@seabbs seabbs commented Apr 11, 2023

As flagged by @Bisaloo in #223 no_contrasts was defined but not used in enw_formula. I have explored this potential bug in detail and found that it has no functional impact and hence dropped the definition of no_contrasts with no other changes.

However, in revisiting this area of the code base I also revisited #166 and this appears to be more problematic. The essential issue appears to be that when the fixed design matrix is specified it is ordered fixed terms, random walk terms, and random effect terms but when the random effects design matrix is constructed it is ordered fixed, random effects, and random walk terms (I think this is the root of the issue @Gulfa was having which would make sense as it would really cause problems when trying to forecast).

This means that variances are improperly assigned (as the random effects variables are applied in order to the fixed effects design matrix) for formulas with both a random walk and a random effect. For most of the simple models we have been fitting this is not a significant issue but it definitely could be causing issues for users (for example potentially some of the work @jbracher and collabs have been doing) which would mostly show up as increased issues fitting their models though could also potentially bias posterior estimates. It is also an issue for the Germany showcase vignette (which I have updated here to highlight potential issues) but not the original application where design matrices were constructed manually.

An obvious follow-up is that the code pattern described above is quite brittle and prone to this kind of issue. It is out of scope for this PR but we should add another feature issue to refactor this area of the code base to be more robust (which we will need as more formula features are added).

Reprex (from #166):

library(epinowcast)
# U/se meta data for references dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
  germany_covid19_hosp[location == "DE"],
  remove_days = 40
)
obs <- enw_filter_reference_dates(obs, include_days = 40)
pobs <- enw_preprocess_data(obs, by = c("age_group", "location"))
data <- pobs$metareference[[1]]

# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | age_group) + rw(week), data)
#> $formula
#> [1] "~1 + (1 | age_group) + rw(week)"
#>
#> $parsed_formula
#> $parsed_formula$fixed
#> [1] "1"
#>
#> $parsed_formula$random
#> $parsed_formula$random[[1]]
#> 1 | age_group
#>
#>
#> $parsed_formula$rw
#> [1] "rw(week)"
#>
#>
#> $expanded_formula
#> [1] "~1 + cweek1 + cweek2 + cweek3 + cweek4 + cweek5 + age_group"
#>
#> $fixed
#> $fixed$formula
#> [1] "~1 + cweek1 + cweek2 + cweek3 + cweek4 + cweek5 + age_group"
#>
#> $fixed$design
#>     (Intercept) cweek1 cweek2 cweek3 cweek4 cweek5 age_group00-04 age_group00+
#> 1             1      0      0      0      0      0              0            1
#> 8             1      1      0      0      0      0              0            1
#> 15            1      1      1      0      0      0              0            1
#> 22            1      1      1      1      0      0              0            1
#> 29            1      1      1      1      1      0              0            1
#> 36            1      1      1      1      1      1              0            1
#> 42            1      0      0      0      0      0              1            0
#> 49            1      1      0      0      0      0              1            0
#> 56            1      1      1      0      0      0              1            0
#> 63            1      1      1      1      0      0              1            0
#> 70            1      1      1      1      1      0              1            0
#> 77            1      1      1      1      1      1              1            0
#> 83            1      0      0      0      0      0              0            0
#> 90            1      1      0      0      0      0              0            0
#> 97            1      1      1      0      0      0              0            0
#> 104           1      1      1      1      0      0              0            0
#> 111           1      1      1      1      1      0              0            0
#> 118           1      1      1      1      1      1              0            0
#> 124           1      0      0      0      0      0              0            0
#> 131           1      1      0      0      0      0              0            0
#> 138           1      1      1      0      0      0              0            0
#> 145           1      1      1      1      0      0              0            0
#> 152           1      1      1      1      1      0              0            0
#> 159           1      1      1      1      1      1              0            0
#> 165           1      0      0      0      0      0              0            0
#> 172           1      1      0      0      0      0              0            0
#> 179           1      1      1      0      0      0              0            0
#> 186           1      1      1      1      0      0              0            0
#> 193           1      1      1      1      1      0              0            0
#> 200           1      1      1      1      1      1              0            0
#> 206           1      0      0      0      0      0              0            0
#> 213           1      1      0      0      0      0              0            0
#> 220           1      1      1      0      0      0              0            0
#> 227           1      1      1      1      0      0              0            0
#> 234              0              0              0              1            0
#> 241              0              0              0              1            0
#> 247              0              0              0              0            1
#> 254              0              0              0              0            1
#> 261              0              0              0              0            1
#> 268              0              0              0              0            1
#> 275              0              0              0              0            1
#> 282              0              0              0              0            1
#>
#> $fixed$index
#>   [1]  1  1  1  1  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  4  4  4  4
#>  [26]  4  4  4  5  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  7  8  8
#>  [51]  8  8  8  8  8  9  9  9  9  9  9  9 10 10 10 10 10 10 10 11 11 11 11 11 11
#>  [76] 11 12 12 12 12 12 12 13 13 13 13 13 13 13 14 14 14 14 14 14 14 15 15 15 15
#> [101] 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 17 17 18 18 18 18 18 18 19 19
#> [126] 19 19 19 19 19 20 20 20 20 20 20 20 21 21 21 21 21 21 21 22 22 22 22 22 22
#> [151] 22 23 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 25 26 26 26 26
#> [176] 26 26 26 27 27 27 27 27 27 27 28 28 28 28 28 28 28 29 29 29 29 29 29 29 30
#> [201] 30 30 30 30 30 31 31 31 31 31 31 31 32 32 32 32 32 32 32 33 33 33 33 33 33
#> [226] 33 34 34 34 34 34 34 34 35 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37
#> [251] 37 37 37 38 38 38 38 38 38 38 39 39 39 39 39 39 39 40 40 40 40 40 40 40 41
#> [276] 41 41 41 41 41 41 42 42 42 42 42 42
#>
#>
#> $random
#> $random$formula
#> [1] "~0 + fixed + age_group + week"
#>
#> $random$design
#>    fixed age_group week
#> 1      0         1    0
#> 2      0         1    0
#> 3      0         1    0
#> 4      0         1    0
#> 5      0         1    0
#> 6      0         1    0
#> 7      0         1    0
#> 8      0         0    1
#> 9      0         0    1
#> 10     0         0    1
#> 11     0         0    1
#> 12     0         0    1
#> attr(,"assign")
#> [1] 1 2 3
#>
#> $random$index
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12
#>
#>
#> attr(,"class")
#> [1] "enw_formula" "list"

Created on 2023-04-11 with reprex v2.0.2

This gives us the following metadata data.frame which as can be seen has a different order than the fixed effects design matrix. This means that variance parameters may be being improperly assigned as @Gulfa flagged.

Browse[2]> metadata
           effects fixed age_group week
 1: age_group00-04     0         1    0
 2:   age_group00+     0         1    0
 3: age_group05-14     0         1    0
 4: age_group15-34     0         1    0
 5: age_group35-59     0         1    0
 6: age_group60-79     0         1    0
 7:   age_group80+     0         1    0
 8:         cweek1     0         0    1
 9:         cweek2     0         0    1
10:         cweek3     0         0    1
11:         cweek4     0         0    1
12:         cweek5     0         0    1

To do:

  • Complete fix for enw_replace_priors()
  • Add unit tests for enw_replace_priors() fix
  • Add an example showing how to use enw_replace_priors() to update from a fitted model object.
  • Update Germany vignette

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Merging #228 (1cd429b) into develop (ec5ae4e) will decrease coverage by 0.05%.
The diff coverage is 96.87%.

❗ Current head 1cd429b differs from pull request most recent head c97e54c. Consider uploading reports for the commit c97e54c to get more accurate results

@@             Coverage Diff             @@
##           develop     #228      +/-   ##
===========================================
- Coverage    97.31%   97.27%   -0.05%     
===========================================
  Files           14       14              
  Lines         1527     1540      +13     
===========================================
+ Hits          1486     1498      +12     
- Misses          41       42       +1     
Impacted Files Coverage Δ
R/formula-tools.R 95.00% <96.15%> (-0.24%) ⬇️
R/model-design-tools.R 100.00% <100.00%> (ø)
R/model-tools.R 99.20% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@seabbs seabbs added bug Something isn't working help wanted Extra attention is needed high-priority labels Apr 11, 2023
@seabbs seabbs requested a review from pearsonca April 11, 2023 22:20
@seabbs
Copy link
Collaborator Author

seabbs commented Apr 12, 2023

There is still a remaining issue of trying to use the same random effect as used in a random walk (see below).

 library(epinowcast)
    # U/se meta data for references dates from the Germany COVID-19
    # hospitalisation data.
    obs <- enw_filter_report_dates(
      germany_covid19_hosp[location == "DE"][
        age_group %in% c("0-04", "05-14", "15-34")
      ],
      remove_days = 10
    )
    obs <- enw_filter_reference_dates(obs, include_days = 10)
    pobs <- enw_preprocess_data(
      obs, by = c("age_group", "location"), max_delay = 10
    )
    data <- pobs$metareference[[1]]
 
    # Model with a random effect for age group and a random walk
    enw_formula(~ 1 + (1 | week) + rw(week), data)
# Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,  :  ‘min’ not meaningful for factors

I have resolved this by changing the ordering of processing in enw_formula() so that random walks are handled first and then random effects. This means that variables are processed into factors (for use as random effects) after the new variables that are needed for random walks have been created. I have also added error checking etc. to give informative errors if this kind of issue reoccurs.

I have also added new tag rw__ to all random walk variance terms to make it clear what they are as well as adding an error message if users try and specified a fixed and random effect of the same kind (there could be an edge case for this but I am not aware of what it is).

@seabbs seabbs self-assigned this Apr 12, 2023
@seabbs
Copy link
Collaborator Author

seabbs commented Apr 12, 2023

The orginal reprex now looks as follows which I think is correct:

library(epinowcast)
# U/se meta data for references dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
 germany_covid19_hosp[location == "DE"],
 remove_days = 40
)
obs <- enw_filter_reference_dates(obs, include_days = 40)
pobs <- enw_preprocess_data(obs, by = c("age_group", "location"))
data <- pobs$metareference[[1]]

# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | age_group) + rw(week), data)
#> $formula
#> [1] "~1 + (1 | age_group) + rw(week)"
#>
#> $parsed_formula
#> $parsed_formula$fixed
#> [1] "1"
#>
#> $parsed_formula$random
#> $parsed_formula$random[[1]]
#> 1 | age_group
#>
#>
#> $parsed_formula$rw
#> [1] "rw(week)"
#>
#>
#> $expanded_formula
#> [1] "~1 + age_group + cweek1 + cweek2 + cweek3 + cweek4 + cweek5"
#>
#> $fixed
#> $fixed$formula
#> [1] "~1 + age_group + cweek1 + cweek2 + cweek3 + cweek4 + cweek5"
#>
#> $fixed$design
#>     (Intercept) age_group00-04 age_group00+ age_group05-14 age_group15-34
#> 1             1              0            1              0              0
#> 8             1              0            1              0              0
#> 15            1              0            1              0              0
#> 22            1              0            1              0              0
#> 29            1              0            1              0              0
#> 36            1              0            1              0              0
#> 42            1              1            0              0              0
#> 49            1              1            0              0              0
#> 56            1              1            0              0              0
#> 56            1              1            0              0              0
#> 63            1              1            0              0              0
#> 70            1              1            0              0              0
#> 77            1              1            0              0              0
#> 83            1              0            0              1              0
#> 90            1              0            0              1              0
#> 97            1              0            0              1              0
#> 104           1              0            0              1              0
#> 111           1              0            0              1              0
#> 118           1              0            0              1              0
#> 124           1              0            0              0              1
#> 131           1              0            0              0              1
#> 138           1              0            0              0              1
#> 145           1              0            0              0              1
#> 152           1              0            0              0              1
#> 159           1              0            0              0              1
#> 165           1              0            0              0              0
#> 172           1              0            0              0              0
#> 179           1              0            0              0              0
#> 186           1              0            0              0              0
#> 193           1              0            0              0              0
#> 200           1              0            0              0              0
#> 206           1              0            0              0              0
#> 213           1              0            0              0              0
#> 220           1              0            0              0              0
#> 227           1              0            0              0              0
#> 234           1              0            0              0              0
#> 241           1              0            0              0              0
#> 247           1              0            0              0              0
#> 254           1              0            0              0              0
#> 261           1              0            0              0              0
#> 268           1              0            0              0              0
#> 275           1              0            0              0              0
#> 282           1              0            0              0              0
#>     age_group35-59 age_group60-79 age_group80+ cweek1 cweek2 cweek3 cweek4
#> 1                0              0            0      0      0      0      0
#> 8                0              0            0      1      0      0      0
#> 15               0              0            0      1      1      0      0
#> 22               0              0            0      1      1      1      0
#> 29               0              0            0      1      1      1      1
#> 36               0              0            0      1      1      1      1
#> 42               0              0            0      0      0      0      0
#> 49               0              0            0      1      0      0      0
#> 56               0              0            0      1      1      0      0
#> 63               0              0            0      1      1      1      0
#> 70               0              0            0      1      1      1      1
#> 77               0              0            0      1      1      1      1
#> 83               0              0            0      0      0      0      0
#> 90               0              0            0      1      0      0      0
#> 97               0              0            0      1      1      0      0
#> 104              0              0            0      1      1      1      0
#> 111              0              0            0      1      1      1      1
#> 118              0              0            0      1      1      1      1
#> 124              0              0            0      0      0      0      0
#> 131              0              0            0      1      0      0      0
#> 138              0              0            0      1      1      0      0
#> 145              0              0            0      1      1      1      0
#> 152              0              0            0      1      1      1      1
#> 159              0              0            0      1      1      1      1
#> 165              1              0            0      0      0      0      0
#> 172              1              0            0      1      0      0      0
#> 179              1              0            0      1      1      0      0
#> 186              1              0            0      1      1      1      0
#> 193              1              0            0      1      1      1      1
#> 200              1              0            0      1      1      1      1
#> 206              0              1            0      0      0      0      0
#> 213              0              1            0      1      0      0      0
#> 220              0              1            0      1      1      0      0
#> 227              0              1            0      1      1      1      0
#> 234              0              1            0      1      1      1      1
#> 241              0              1            0      1      1      1      1
#> 247              0              0            1      0      0      0      0
#> 254              0              0            1      1      0      0      0
#> 261              0              0            1      1      1      0      0
#> 268              0              0            1      1      1      1      0
#> 275              0              0            1      1      1      1      1
#> 282              0              0            1      1      1      1      1
#>     cweek5
#> 1        0
#> 8        0
#> 15       0
#> 22       0
#> 29       0
#> 36       1
#> 42       0
#> 49       0
#> 56       0
#> 63       0
#> 70       0
#> 77       1
#> 83       0
#> 90       0
#> 97       0
#> 104      0
#> 111      0
#> 118      1
#> 124      0
#> 131      0
#> 138      0
#> 145      0
#> 152      0
#> 159      1
#> 165      0
#> 172      0
#> 179      0
#> 186      0
#> 193      0
#> 200      1
#> 206      0
#> 213      0
#> 220      0
#> 227      0
#> 234      0
#> 241      1
#> 247      0
#> 254      0
#> 261      0
#> 268      0
#> 275      0
#> 282  
#
#> $fixed$index
#>   [1]  1  1  1  1  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  4  4  4  4
#>  [26]  4  4  4  5  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  7  8  8
#>  [51]  8  8  8  8  8  9  9  9  9  9  9  9 10 10 10 10 10 10 10 11 11 11 11 11 11
#>  [76] 11 12 12 12 12 12 12 13 13 13 13 13 13 13 14 14 14 14 14 14 14 15 15 15 15
#> [101] 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 17 17 18 18 18 18 18 18 19 19
#> [126] 19 19 19 19 19 20 20 20 20 20 20 20 21 21 21 21 21 21 21 22 22 22 22 22 22
#> [151] 22 23 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 25 26 26 26 26
#> [176] 26 26 26 27 27 27 27 27 27 27 28 28 28 28 28 28 28 29 29 29 29 29 29 29 30
#> [201] 30 30 30 30 30 31 31 31 31 31 31 31 32 32 32 32 32 32 32 33 33 33 33 33 33
#> [226] 33 34 34 34 34 34 34 34 35 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37
#> [251] 37 37 37 38 38 38 38 38 38 38 39 39 39 39 39 39 39 40 40 40 40 40 40 40 41
#> [276] 41 41 41 41 41 41 42 42 42 42 42 42
#>
#>
#> $random
#> $random$formula
#> [1] "~0 + fixed + age_group + rw__week"
#>
#> $random$design
#>    fixed age_group rw__week
#> 1      0         1        0
#> 2      0         1        0
#> 3      0         1        0
#> 4      0         1        0
#> 5      0         1        0
#> 6      0         1        0
#> 7      0         1        0
#> 8      0         0        1
#> 9      0         0        1
#> 10     0         0        1
#> 11     0         0        1
#> 12     0         0        1
#> attr(,"assign")
#> [1] 1 2 3
#>
#> $random$index
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12
#>
#>
#> attr(,"class")
#> [1] "enw_formula" "list"

Created on 2023-04-12 with reprex v2.0.2

Copy link
Collaborator Author

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is quite a complex PR I have also self-reviewed it. Everything looks as expected to me. The impact on the Germany vignette of the random walk/effect misspecification is fairly minimal which suggests to me the impact on users is also likely to be minimal in most cases where this issue comes into play.

@seabbs
Copy link
Collaborator Author

seabbs commented Apr 17, 2023

I've added an issue (#235) for refactoring enw_formula() with a focus on robustness.

Copy link
Collaborator Author

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a second round of self-review and rerun all the examples + compared these to outputs from running them using the code on develop.

Everything looks sensible to me and I think the issue diagnosed is a real-one. Going to merge this here as we have the final review step when we move to release 0.2.1.

@seabbs seabbs merged commit 6c7b5ab into develop Apr 17, 2023
8 checks passed
@seabbs seabbs deleted the seabbs/issue223 branch April 17, 2023 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed high-priority self-reviewed
Projects
No open projects
Archived in project
1 participant